Yeah, the core problem is in flox — a simple ds.resample(time="YS").sum().compute() raises a warning about a 300MB graph. flox is embedding data in the graph, and that data gets duplicated many times because the chunk size is so small.
Since you are calculating annual statistics and your input chunks are small (30MB), let's rechunk (ideally in your open_dataset call). You can use Xarray's new-ish TimeResampler objects to rechunk to a frequency (slick!):
from xarray.groupers import TimeResampler
ds.chunk(time=TimeResampler("10YS")) # aim for 200-300MB
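As a rough sketch of what this looks like end-to-end (the synthetic dataset, the variable name `tas`, and the initial 30-day chunking below are all made up for illustration), rechunking by frequency gives you blocks that each cover a whole number of years:

```python
import numpy as np
import pandas as pd
import xarray as xr
from xarray.groupers import TimeResampler

# Hypothetical daily dataset spanning 30 years, opened with small chunks.
time = pd.date_range("1990-01-01", "2019-12-31", freq="D")
ds = xr.Dataset(
    {"tas": ("time", np.random.rand(time.size))},
    coords={"time": time},
).chunk(time=30)

# Rechunk so each block covers exactly ten calendar years.
rechunked = ds.chunk(time=TimeResampler("10YS"))
print(rechunked.chunksizes["time"])  # three blocks, one per decade
```

You should also be able to pass the same spec when opening the file, something like `chunks={"time": TimeResampler("10YS")}` in open_dataset, so the small chunks never materialize in the first place.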
Once you do this, the problem is embarrassingly parallel. Your previous chunk size (468) did not line up with the "yearly" frequency, so some inter-block communication was required.
For a long time, I’ve wanted to figure out automated heuristics for this but haven’t had time to do so. (hah, I even started prototyping some automated rechunking here but never finished)