Hi Axel! Thanks for sharing! Your frustration is certainly not unique. I think our community is guilty of making it seem like these tools are a bit magic and can just work at any scale with little user intervention. That’s clearly not the case.
It’s worth noting that the problem you’ve picked (calculating the daily climatology from a high-resolution dataset) is notoriously challenging. To benchmark your system, I’d start with something simpler, like
ds.mean(dim="time")
just to see if it works. If that doesn’t work, the climatology never will.
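For comparison, here is a minimal sketch of what the daily-climatology computation itself usually looks like in xarray, run on a tiny synthetic dataset (the variable name `t2m` is just a placeholder, not from your data):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Tiny synthetic stand-in for a high-resolution dataset: two years of daily data.
time = pd.date_range("2000-01-01", periods=730, freq="D")
ds = xr.Dataset(
    {"t2m": ("time", np.random.rand(len(time)))},
    coords={"time": time},
)

# Daily climatology: average each calendar day-of-year across all years.
clim = ds.groupby("time.dayofyear").mean()
print(clim.sizes["dayofyear"])  # 366, since 2000 is a leap year
```

The `groupby("time.dayofyear")` step is exactly what makes this workload hard at scale: it shuffles data across all the time chunks, which is why the plain `mean` above is a useful sanity check first.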
Then you might want to read this thread, which at the end gets to some practical solutions to your problem.
Finally, I would note that code like this
ds = xr.open_mfdataset(path, chunks={"time": 366 * nyears, "latitude": 200, "longitude": 200})
is potentially wishful thinking. The netCDF4 files have underlying physical chunks on disk, and if you tell Dask to use chunks that are not aligned with those, performance can get worse, not better. So first seek to understand what the chunks on disk actually are.