Is there any best practice for choosing chunk sizes when:
- Using xr.open_mfdataset to open 20 years of daily netCDF files (7305 files in total, giving a dataset with dimensions time: 7305, x: 1224, y: 1090). The dataset has two variables.
- Resampling over the time dimension, e.g. monthly means
- Writing the entire dataset from step 1 back to disk as one netCDF file (instead of 7305 individual files)
- Writing the result from step 2 to disk as one netCDF file
The data is of dtype float32. All of this runs on a simple standalone PC with 16 GB of RAM. If I don't specify the chunk size, writing step 3 to disk takes about 23 minutes, which I find rather time-consuming. The netCDF file from step 3 is about 36 GB.
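For context, the in-memory footprint can be estimated directly from the dimensions above (pure arithmetic; the time chunk of 20 is just an illustrative value, not something from my current setup):

```python
# Uncompressed in-memory size of one float32 variable (time, x, y):
n = 7305 * 1224 * 1090            # number of elements
gb = n * 4 / 1e9                  # float32 = 4 bytes
print(f"{gb:.1f} GB per variable")  # 39.0 GB per variable, far more than 16 GB of RAM

# With a hypothetical time chunk of 20 and full spatial slices,
# each chunk holds 20 * 1224 * 1090 * 4 bytes:
mb = 20 * 1224 * 1090 * 4 / 1e6
print(f"{mb:.0f} MB per chunk")     # 107 MB per chunk
```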
Any advice on how this can best be done?