Any suggestions on how to avoid memory errors when using rechunker?

Hello,

We want to rechunk a Zarr store with dimensions time: 57, lat: 21600, lon: 43200 from a spatial chunk size of 5400 down to 16 using a Dask cluster. Our data sits in an S3 bucket. Unfortunately we are facing multiple issues which we have not been able to resolve so far (a rough sketch of our setup is included after the list):

  • Dask cluster workers dying because they run out of memory
  • Reshaping errors due to an incorrect “max_mem” value
  • S3 rate limits being hit with more than 10 workers
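
For reference, here is a minimal sketch of the kind of rechunker call we are running. The bucket paths, the variable name temperature_2m, the target chunk sizes, and the max_mem value below are placeholders rather than our exact settings:

```python
import fsspec
import xarray as xr
from rechunker import rechunk

# Open the source Zarr store from S3 (placeholder bucket/path)
source = xr.open_zarr(fsspec.get_mapper("s3://our-bucket/source.zarr"))

# Target chunking: full time dimension, small spatial chunks
# (placeholder variable name and chunk sizes)
target_chunks = {
    "temperature_2m": {"time": 57, "lat": 16, "lon": 16},
    "time": None,  # leave coordinate chunking unchanged
    "lat": None,
    "lon": None,
}

# Intermediate and final stores, also on S3 (placeholder paths)
temp_store = fsspec.get_mapper("s3://our-bucket/tmp/intermediate.zarr")
target_store = fsspec.get_mapper("s3://our-bucket/target.zarr")

# max_mem is the per-task memory budget; we are not sure what value is safe
plan = rechunk(
    source,
    target_chunks=target_chunks,
    max_mem="2GB",
    target_store=target_store,
    temp_store=temp_store,
)
plan.execute()
```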

We were able to run some successful tests using DEM data at 1 km resolution, but it took 43 minutes with 10 workers to finish the task.
Rechunking 2m Temperature data at 25 km resolution with the same setup took 9 minutes.
So far we have had no luck rechunking the downscaled 2m Temperature (1 km) data because of the issues mentioned above.
We were wondering if there is any way to decrease the run time for both the 1 km and the 25 km case (although the most important one is 1 km).
