Xarray unable to allocate memory: how to "size up" the problem

Hi Axel! Thanks for sharing! Your frustration is certainly not unique. I think our community is guilty of making these tools seem a bit magical, as if they can just work at any scale with little user intervention. That’s clearly not the case.

It’s worth noting that the problem you have picked, calculating the daily climatology from a high-resolution dataset, is notoriously challenging. Just to benchmark your system, I’d start with something simpler like

```python
ds.mean(dim="time")
```

just to see if it completes. If that simple reduction fails, the climatology never will.
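
For concreteness, here is a minimal sketch of why the two computations behave so differently. The file glob, chunk sizes, and commented-out calls below are illustrative placeholders, not your actual setup:

```python
import xarray as xr

# Open lazily with Dask; nothing is loaded into memory yet.
# (The path glob and chunk size are placeholders.)
ds = xr.open_mfdataset("data/*.nc", chunks={"time": 24})

# Cheap benchmark: a single reduction over time. Dask can reduce
# each chunk independently, so peak memory stays near one chunk.
ds.mean(dim="time").compute()

# The hard problem: a daily climatology. Grouping by day of year
# has to combine every Jan 1, every Jan 2, etc. across all years,
# which shuffles data between chunks and is far more memory-hungry.
clim = ds.groupby("time.dayofyear").mean(dim="time")
# clim.compute()  # only attempt this once the plain mean succeeds
```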

Then you might want to read this thread, which eventually gets to some practical solutions to your problem.

Finally, I would note that code like this

```python
ds = xr.open_mfdataset(path, chunks={"time": 366 * nyears, "latitude": 200, "longitude": 200})
```

is potentially wishful thinking. There are underlying physical chunks on disk in the netCDF4 files, and if you tell Dask to use chunks that are not aligned with those, performance could be worse, not better. So you should first seek to understand what the chunks on disk are.
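
If it helps, here is one way to inspect the on-disk chunking, either by asking the netCDF4 library directly or by checking the encoding xarray records when it opens a file. The path and variable name below are placeholders:

```python
import netCDF4
import xarray as xr

path = "era5_t2m.nc"  # placeholder filename
var = "t2m"           # placeholder variable name

# Ask netCDF4 directly: chunking() returns a list of chunk sizes
# per dimension, or "contiguous" if the variable is unchunked.
with netCDF4.Dataset(path) as nc:
    print(nc.variables[var].chunking())

# xarray records the same information in the variable's encoding.
with xr.open_dataset(path) as ds:
    print(ds[var].encoding.get("chunksizes"))
```

Once you know the disk chunks, pick Dask chunks that are whole multiples of them. For example, if the file stores (1, 100, 100) chunks in (time, latitude, longitude), then chunks={"time": 24, "latitude": 100, "longitude": 100} keeps each Dask read aligned with whole disk chunks.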
