Changing the default setting for the "auto" chunk size parameter in Dask

Hi,

We are thinking about changing the default value behind the “auto” chunk size configuration. It has been 128 MiB for a very long time now. (You can stop reading if you are not using “auto” for your chunks.)
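For context, “auto” derives chunk shapes from the `array.chunk-size` configuration value, so you can inspect the current default yourself. A minimal sketch:

```python
import dask
import dask.array as da

# "auto" sizes chunks according to the array.chunk-size config value,
# which has long defaulted to 128 MiB.
print(dask.config.get("array.chunk-size"))  # "128MiB"

x = da.ones((50_000, 50_000), chunks="auto")
print(x.chunksize)  # roughly (4096, 4096): 4096 * 4096 * 8 bytes = 128 MiB
```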

NumPy is pretty fast on 128 MiB arrays, which often means that the scheduler becomes the bottleneck because each individual task is too short. Choosing a larger chunk size will alleviate a lot of that pressure on the scheduler.
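As a rough illustration (the exact numbers depend on your hardware), a cheap NumPy operation on a single 128 MiB chunk often finishes in milliseconds to tens of milliseconds, which is in the same ballpark as the fixed per-task overhead the scheduler adds:

```python
import time
import numpy as np

chunk = np.ones(128 * 2**20 // 8)  # one 128 MiB chunk of float64

start = time.perf_counter()
chunk.sum()  # a representative cheap per-chunk operation
elapsed = time.perf_counter() - start

# Typically on the order of milliseconds to tens of milliseconds,
# so per-task scheduling overhead is a meaningful fraction of runtime.
print(f"{elapsed * 1e3:.1f} ms")
```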

Additionally, a bigger value here directly reduces the size of the task graph for folks who are using the auto setting. The graph will get significantly smaller and thus easier to handle for all parties involved.
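To make that concrete, here is a sketch comparing the number of chunks (and thus tasks per operation) for the same array under two settings; 512 MiB is just an illustrative value here, not necessarily the proposed new default:

```python
import dask
import dask.array as da

shape = (100_000, 100_000)  # ~75 GiB of float64

with dask.config.set({"array.chunk-size": "128MiB"}):
    x = da.ones(shape, chunks="auto")

with dask.config.set({"array.chunk-size": "512MiB"}):
    y = da.ones(shape, chunks="auto")

# Roughly 4x fewer chunks with the larger setting, so every elementwise
# operation on the array produces roughly 4x fewer tasks.
print(x.npartitions)  # ~625 (about 25 x 25 chunks of ~4096 x 4096)
print(y.npartitions)  # ~169 (about 13 x 13 chunks of ~8192 x 8192)
```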

TLDR: A bigger default for the auto chunk size should translate to smaller graphs and less strain on the scheduler. It makes large-scale computations easier to handle and gives better performance.

Generally, the memory usage per worker will be a bit higher with bigger chunks. This should not make workloads fail that previously ran fine. It is only something to be concerned about if you deploy Dask with significantly less memory than the typical ~4 GB per core, e.g. 2 GB per thread or less.
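If that applies to your deployment, you can pin the current value explicitly and keep today's behavior regardless of what the default becomes. A minimal sketch:

```python
import dask

# Pin the chunk-size target that "auto" uses, overriding the default.
dask.config.set({"array.chunk-size": "128MiB"})
```

The same key can also be set persistently in your Dask configuration file, e.g. `array: {chunk-size: 128MiB}` in `dask.yaml`.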

This post is intended to get feedback, so please speak up if you have thoughts on this, either here or on the associated GitHub issue: [Better chunk size value for chunks=auto setting · Issue #11342 · dask/dask](https://github.com/dask/dask/issues/11342)

Greetings
