Best practices to go from 1000s of netcdf files to analyses on a HPC cluster?

Exactly, that could be the biggest pain point with Tom’s strategy: it might prevent it from working at some HPC facilities.

This point seems really important to me; it would be worth spending some time to see if something could be done.

Could we think of something intermediate between the two approaches: some sort of iterative rechunking? Say you need to rechunk over a full year: first rechunk by month, save the output, and then start from there to rechunk along the full year, shrinking the spatial dimension of the chunks again. Could this lead to graphs that are a little simpler and easier to execute? A rough sketch of the idea follows.
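Something like this minimal sketch, assuming the data is hourly and opened with xarray/dask. The file pattern, variable names, chunk sizes, and Zarr paths are all hypothetical, and this is just one way to stage the rechunk, not a settled recipe:

```python
import xarray as xr

# Stage 1: gather the source netCDF files and rechunk into monthly
# time chunks, keeping large spatial chunks, then persist the
# intermediate result so the first graph stays small.
ds = xr.open_mfdataset("data_2020-*.nc", combine="by_coords")
monthly = ds.chunk({"time": 744, "lat": 180, "lon": 360})  # ~1 month of hourly steps
monthly.to_zarr("intermediate_monthly.zarr", mode="w")

# Stage 2: start from the saved intermediate and rechunk along the
# full year, reducing the spatial chunk size to compensate.
inter = xr.open_zarr("intermediate_monthly.zarr")
yearly = inter.chunk({"time": -1, "lat": 45, "lon": 90})

# Drop the chunk encoding inherited from the intermediate store so
# the new chunking takes effect on write.
for var in yearly.variables:
    yearly[var].encoding.pop("chunks", None)

yearly.to_zarr("final_yearly.zarr", mode="w")
```

Because each stage reads from a store whose chunks already align with the target layout along part of the axis, each graph should be shallower than a single end-to-end rechunk, at the cost of writing the intermediate store to disk.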
