Best practice advice on parallel processing a suite of zarr files with Dask and xarray

:flexed_biceps:

Going back to your original question:

> I suppose my question comes down to what are good strategies to use once the data become too large

As this example illustrates, it is often a good idea to think about what your analysis is doing and how it lines up with the data layout (chunking). Unfortunately, working that out can take a fair amount of digging through various layers of code, so it's not the most user-friendly process.
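As a rough illustration of what that digging can look like, here is a minimal sketch of checking how an analysis lines up with the on-disk chunking of a Zarr store. The store path, variable name, dimension names, and chunk sizes below are all hypothetical, not taken from the thread:

```python
import xarray as xr

# Open lazily with Dask; chunks={} uses the chunk layout stored in the Zarr metadata.
# "store.zarr" and the variable "temperature" are placeholder names.
ds = xr.open_zarr("store.zarr", chunks={})

# Inspect the chunking before deciding how to compute:
print(ds["temperature"].encoding.get("chunks"))  # chunk shape written to disk
print(ds["temperature"].chunks)                  # Dask chunk structure in memory

# If the analysis reduces over time, rechunking so each task holds a full time
# series avoids re-reading every spatial chunk for every time step.
# Dimension names and chunk sizes here are assumptions about the dataset.
ds_rechunked = ds.chunk({"time": -1, "lat": 100, "lon": 100})
climatology = ds_rechunked["temperature"].mean("time").compute()
```

The point is not the specific numbers but the habit: look at what the store actually gives you (`encoding["chunks"]`, `.chunks`) and rechunk, or restructure the computation, so the reduction dimension is contiguous within each task.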
