Using to_zarr(region=) and extending the time dimension?

You can either use region= or extend along a single dimension with to_zarr(), but you can’t do both at the same time.

You can of course edit the Zarr metadata yourself, but personally I like to stick with Xarray for this sort of thing:

  1. Create a lazy Xarray dataset in Dask the size of the entire result. I would typically do this with some combination of indexing, xarray.zeros_likes and xarray.concat/expand_dims on a single time slice, like in this example from Xarray-Beam.
  2. Write the Zarr metadata using to_zarr() with compute=False
  3. Write each chunk using region= from a separate processes.

These notes may also be helpful. Beam is just a convenient way to map over many tasks – the same pattern also works for other distributed engines.

5 Likes