Chunk size for reading and writing netcdfs

WilcoT · September 28, 2021, 8:43pm

Hi,

Is there any best practice for choosing chunk sizes when:

1. Using xr.open_mfdataset to open 20 years of daily netcdf files (7305 files in total, leading to a dataset with dimensions: time: 7305, x: 1224, y: 1090). Dataset has two variables.
2. Doing some resampling over dimension time, e.g. monthly means
3. Writing the entire dataset from step 1 back to disk as one netcdf file (instead of 7305 individual files)
4. Writing the result from step 2 to disk as one netcdf file
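
For concreteness, here is a minimal sketch of what those four steps might look like in code. The file pattern, output file names, the assumption that the dimensions are literally named time, y and x, and the chunk sizes in CHUNKS are all placeholders for illustration, not tuned recommendations. xarray is imported inside the function, so the small helper can be used on its own.

```python
# Sketch of the four steps. File pattern, output names and chunk sizes are
# hypothetical placeholders, not tuned recommendations.

# Illustrative chunking: long along time, modest tiles in space.
CHUNKS = {"time": 365, "y": 109, "x": 204}


def on_disk_chunks(dims, sizes, chunks=CHUNKS):
    """Return chunk sizes in the variable's own dimension order,
    capped at the length of each dimension."""
    return tuple(min(chunks.get(d, sizes[d]), sizes[d]) for d in dims)


def run(pattern="daily/*.nc", out_all="all_days.nc", out_monthly="monthly_means.nc"):
    import xarray as xr  # needs xarray, dask and netCDF4 installed

    # Step 1: open all daily files lazily as one dask-backed dataset,
    # then rechunk to the illustrative sizes above.
    ds = xr.open_mfdataset(pattern, combine="by_coords")
    ds = ds.chunk({d: n for d, n in CHUNKS.items() if d in ds.sizes})

    # Step 2: monthly means along time ("MS" = month-start frequency).
    monthly = ds.resample(time="MS").mean()

    # Step 3: write the full dataset as one file, asking netCDF4 for
    # on-disk chunks that match the dask chunks.
    encoding = {
        name: {"chunksizes": on_disk_chunks(var.dims, ds.sizes)}
        for name, var in ds.data_vars.items()
    }
    ds.to_netcdf(out_all, engine="netcdf4", encoding=encoding)

    # Step 4: write the (much smaller) monthly means.
    monthly.to_netcdf(out_monthly, engine="netcdf4")
```

Calling run() with the real file pattern would execute all four steps. Whether these particular chunk sizes help or hurt is exactly the open question here.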

The data is of dtype float32. All this is run on a simple standalone PC with 16 GB of RAM. If I don’t specify the chunk size it takes about 23 minutes to write step 3 to disk, which I find rather time consuming. The netcdf file size of step 3 is about 36 GB.
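
As a rough size check based only on the dimensions and dtype given above, the uncompressed footprint can be worked out directly. This is plain arithmetic, not a measurement, and the function name is just for illustration.

```python
import math


def nbytes(shape, itemsize=4):
    """Uncompressed size in bytes of an array with this shape (float32 = 4 bytes)."""
    return math.prod(shape) * itemsize


# One float32 variable over the full 20 years.
full_variable = nbytes((7305, 1090, 1224))
# One float32 variable for a single daily file (one time step).
one_day = nbytes((1, 1090, 1224))

print(full_variable)                    # 38984155200 bytes
print(round(full_variable / 2**30, 1))  # 36.3 (GiB)
print(round(one_day / 2**20, 1))        # 5.1 (MiB)
```

So a single variable is already more than twice the 16 GB of RAM, which means the data has to be streamed through in chunks rather than loaded at once. With one chunk per daily file, each variable is split into 7305 chunks of roughly 5 MiB.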

Any advice on how this can best be done?
