Xarray unable to allocate memory: how to "size up" the problem

Hi Axel! Thanks for sharing! Your frustration is certainly not unique. I think our community is guilty of making these tools seem a bit magical, as if they can just work at any scale with little user intervention. That’s clearly not the case.

It’s worth noting that the problem you have picked, calculating the daily climatology from a high-resolution dataset, is notoriously challenging. Just to benchmark your system, I’d start with something simpler like

```python
ds.mean(dim="time")
```

just to see if it completes. If that simple reduction fails, the climatology never will.
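
For concreteness, here is a minimal sketch of why the two computations behave so differently. The file glob, chunk sizes, and commented-out calls below are illustrative placeholders, not your actual setup:

```python
import xarray as xr

# Open lazily with Dask; nothing is loaded into memory yet.
# (The path glob and chunk size are placeholders.)
ds = xr.open_mfdataset("data/*.nc", chunks={"time": 24})

# Cheap benchmark: a single reduction over time. Dask can reduce
# each chunk independently, so peak memory stays near one chunk.
ds.mean(dim="time").compute()

# The hard problem: a daily climatology. Grouping by day of year
# has to combine every Jan 1, every Jan 2, etc. across all years,
# which shuffles data between chunks and is far more memory-hungry.
clim = ds.groupby("time.dayofyear").mean(dim="time")
# clim.compute()  # only attempt this once the plain mean succeeds
```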

Then you might want to read this thread, which eventually gets to some practical solutions to your problem.

Finally, I would note that code like this

```python
ds = xr.open_mfdataset(path, chunks={"time": 366 * nyears, "latitude": 200, "longitude": 200})
```

is potentially wishful thinking. There are underlying physical chunks on disk in the netCDF4 files, and if you tell Dask to use chunks that are not aligned with those, performance could be worse, not better. So you should first seek to understand what the chunks on disk are.
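
If it helps, here is one way to inspect the on-disk chunking, either by asking the netCDF4 library directly or by checking the encoding xarray records when it opens a file. The path and variable name below are placeholders:

```python
import netCDF4
import xarray as xr

path = "era5_t2m.nc"  # placeholder filename
var = "t2m"           # placeholder variable name

# Ask netCDF4 directly: chunking() returns a list of chunk sizes
# per dimension, or "contiguous" if the variable is unchunked.
with netCDF4.Dataset(path) as nc:
    print(nc.variables[var].chunking())

# xarray records the same information in the variable's encoding.
with xr.open_dataset(path) as ds:
    print(ds[var].encoding.get("chunksizes"))
```

Once you know the disk chunks, pick Dask chunks that are whole multiples of them. For example, if the file stores (1, 100, 100) chunks in (time, latitude, longitude), then chunks={"time": 24, "latitude": 100, "longitude": 100} keeps each Dask read aligned with whole disk chunks.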
