Best practice advice on parallel processing a suite of zarr files with Dask and xarray

:flexed_biceps:

Going back to your original question:

> I suppose my question comes down to what are good strategies to use once the data become too large

As this example illustrates, it is often a good idea to think about what your analysis is doing and how it lines up with the data layout (chunking). Unfortunately, working that out can take a fair amount of digging through various layers of code, so it's not the most user-friendly process.
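As a rough illustration of what that digging can look like, here is a minimal sketch of checking how an analysis lines up with the on-disk chunking of a Zarr store. The store path, variable name, dimension names, and chunk sizes below are all hypothetical, not taken from the thread:

```python
import xarray as xr

# Open lazily with Dask; chunks={} uses the chunk layout stored in the Zarr metadata.
# "store.zarr" and the variable "temperature" are placeholder names.
ds = xr.open_zarr("store.zarr", chunks={})

# Inspect the chunking before deciding how to compute:
print(ds["temperature"].encoding.get("chunks"))  # chunk shape written to disk
print(ds["temperature"].chunks)                  # Dask chunk structure in memory

# If the analysis reduces over time, rechunking so each task holds a full time
# series avoids re-reading every spatial chunk for every time step.
# Dimension names and chunk sizes here are assumptions about the dataset.
ds_rechunked = ds.chunk({"time": -1, "lat": 100, "lon": 100})
climatology = ds_rechunked["temperature"].mean("time").compute()
```

The point is not the specific numbers but the habit: look at what the store actually gives you (`encoding["chunks"]`, `.chunks`) and rechunk, or restructure the computation, so the reduction dimension is contiguous within each task.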
