Processing xarray datasets that are too large for memory, and writing to netCDF

An alternative approach, if you know the total number of timesteps in advance, would be to initialize the whole Zarr store at the beginning with a lazy “template” dataset, i.e.

ds_template.to_zarr(tempdir, compute=False)
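
For concreteness, here is a minimal sketch of what such a template might look like. The variable name, dimensions, chunk sizes, and the store path are placeholders; adjust them to your data.

import pandas as pd
import xarray as xr
import dask.array as dsa

ntimesteps, ny, nx = 100, 180, 360   # placeholder sizes
tempdir = 'output.zarr'              # placeholder path to the Zarr store

# dask arrays define shape/dtype/chunks without holding any data,
# so the template costs essentially no memory; chunking time into
# single steps lines up with the one-timestep region writes below
ds_template = xr.Dataset(
    {'var': (('time', 'y', 'x'),
             dsa.zeros((ntimesteps, ny, nx), chunks=(1, ny, nx)))},
    coords={'time': pd.date_range('2000-01-01', periods=ntimesteps)},
)

# writes only the metadata and the eagerly loaded coordinates;
# none of the dask-backed data is computed here
ds_template.to_zarr(tempdir, compute=False)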

and then use the region argument to write, i.e. something like

for j in range(ntimesteps):
    dd.to_zarr(tempdir, region={'time': slice(j, j+1)})
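
Spelled out a bit more, the loop might look like the following. make_timestep is just a stand-in for however each timestep is actually produced in your workflow; the key detail is that variables and coordinates that don't span the region dimension have to be dropped before a region write.

for j in range(ntimesteps):
    # placeholder: produce one timestep as a Dataset with a length-1 time dim
    dd = make_timestep(j)

    # variables/coords without the 'time' dimension must be dropped for a
    # region write (to_zarr raises an error otherwise); they were already
    # written once by the template
    dd = dd.drop_vars([v for v in dd.variables if 'time' not in dd[v].dims])

    dd.to_zarr(tempdir, region={'time': slice(j, j + 1)})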

This might be faster because it avoids re-opening the Zarr store with Xarray on every iteration, which is what I think is happening with append.