Best practices to go from 1000s of netcdf files to analyses on a HPC cluster?

Exactly, that could be the biggest pain point with Tom’s strategy: it might prevent it from working at some HPC facilities.

This point seems really important to me; it would be worth spending some time to see if something could be done.

Could we think of something intermediate between the two approaches: some sort of iterative rechunking? Say you need to rechunk over a full year: first rechunk by month, save the output, and then start from there to rechunk along the full year, shrinking the spatial dimension of the chunks again. Could this lead to graphs that are a little simpler and easier to execute? A rough sketch of the idea follows.
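Something like this minimal sketch, assuming the data is hourly and opened with xarray/dask. The file pattern, variable names, chunk sizes, and Zarr paths are all hypothetical, and this is just one way to stage the rechunk, not a settled recipe:

```python
import xarray as xr

# Stage 1: gather the source netCDF files and rechunk into monthly
# time chunks, keeping large spatial chunks, then persist the
# intermediate result so the first graph stays small.
ds = xr.open_mfdataset("data_2020-*.nc", combine="by_coords")
monthly = ds.chunk({"time": 744, "lat": 180, "lon": 360})  # ~1 month of hourly steps
monthly.to_zarr("intermediate_monthly.zarr", mode="w")

# Stage 2: start from the saved intermediate and rechunk along the
# full year, reducing the spatial chunk size to compensate.
inter = xr.open_zarr("intermediate_monthly.zarr")
yearly = inter.chunk({"time": -1, "lat": 45, "lon": 90})

# Drop the chunk encoding inherited from the intermediate store so
# the new chunking takes effect on write.
for var in yearly.variables:
    yearly[var].encoding.pop("chunks", None)

yearly.to_zarr("final_yearly.zarr", mode="w")
```

Because each stage reads from a store whose chunks already align with the target layout along part of the axis, each graph should be shallower than a single end-to-end rechunk, at the cost of writing the intermediate store to disk.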
