Any suggestion how to avoid memory error when using rechunker?

nikoojua · February 2, 2023, 3:19pm

Hello,

We want to rechunk a Zarr store with dimensions time: 57, lat: 21600, lon: 43200 from a chunk size of 5400 to 16 using a Dask cluster. Our data sits in an S3 bucket. Unfortunately, we are facing several issues that we have not been able to resolve so far:

Dask cluster workers dying because they run out of memory
Reshaping errors due to an incorrect “max_mem”
S3 rate limits being reached with >10 workers
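
To give a rough sense of the sizes involved, here is a back-of-the-envelope sketch using only the Python standard library. The dtype (float32, 4 bytes) and the per-dimension chunk shapes are assumptions made for illustration: only the chunk sizes 5400 and 16 are known, not which dimensions they apply to. The helper names `chunk_nbytes` and `n_chunks` are hypothetical.

```python
from math import ceil, prod

def chunk_nbytes(chunks, itemsize):
    """Uncompressed size in bytes of a single chunk."""
    return prod(chunks) * itemsize

def n_chunks(shape, chunks):
    """Number of chunks needed to tile an array of the given shape."""
    return prod(ceil(s / c) for s, c in zip(shape, chunks))

shape = (57, 21600, 43200)        # (time, lat, lon)
itemsize = 4                      # ASSUMPTION: float32 data

# ASSUMPTION: 5400 applies to lat/lon, one time step per source chunk.
source_chunks = (1, 5400, 5400)
# ASSUMPTION: 16 applies to lat/lon, full time series per target chunk.
target_chunks = (57, 16, 16)

src_bytes = chunk_nbytes(source_chunks, itemsize)
tgt_bytes = chunk_nbytes(target_chunks, itemsize)
src_count = n_chunks(shape, source_chunks)
tgt_count = n_chunks(shape, target_chunks)

print(f"source chunk: {src_bytes / 1e6:.1f} MB x {src_count} chunks")
print(f"target chunk: {tgt_bytes / 1e3:.1f} kB x {tgt_count} chunks")
```

Under these assumptions, each source chunk is about 117 MB uncompressed, so “max_mem” has to be at least that large but still well below what each worker can hold. The target layout would consist of millions of very small objects, which may be related to the S3 rate limits.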

We ran some successful tests using DEM data at 1 km resolution; however, it took 43 minutes with 10 workers to finish the tasks.
Rechunking 2 m temperature data at 25 km resolution with the same setup took 9 minutes.
So far we have had no luck rechunking the downscaled 2 m temperature data (1 km) because of the issues mentioned above.
We were wondering whether there is any way to decrease the run time for both the 1 km and the 25 km data (although the 1 km case is the most important one).
