Synchronizer for Zarr + Dask on Kubernetes

sharkinsspatial · December 13, 2021, 7:28pm

Hi @Leonard_Strnad, I believe you now work with my awesome former colleague Martha. Welcome to the Discourse! There are two higher-level questions here that might be worth considering before committing to building a large archive. The first is what the ideal representation model for the underlying data is. Recently @TomAugspurger has begun thinking about alternative, efficient representations for sparse EO data: https://discourse.pangeo.io/t/tables-x-arrays-and-rasters/1945. It is definitely worth reviewing his thread, as there is still significant flux in the community around how to approach sparse datasets.

The second question concerns the underlying data storage mechanism. If your collection of TIFFs exists in GCS, STAC can provide a mechanism to reference and access the underlying bytes without the need to ingest them into a Zarr archive. As you noted, there is ongoing work on stackstac (gjoseph92/stackstac on GitHub, "Turn a STAC catalog into a dask-based xarray") to improve the interoperability between STAC and xarray.
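To make the "reference without ingesting" idea concrete, here is a minimal sketch using only the standard library. A STAC Item is a GeoJSON Feature whose assets carry an href pointing at the existing file, so the catalog stores metadata only and the TIFF bytes stay where they are. The bucket name, scene IDs, bounding boxes, and the helper names `make_item` and `hrefs_intersecting` are all hypothetical, made up for illustration.

```python
import json
from datetime import datetime, timezone


def make_item(item_id, href, bbox, when):
    """Build a minimal STAC Item (a GeoJSON Feature) that points at an existing GeoTIFF."""
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "bbox": [west, south, east, north],
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [west, south], [east, south], [east, north],
                [west, north], [west, south],
            ]],
        },
        "properties": {"datetime": when.isoformat()},
        # The asset only references the file; no pixel data is copied.
        "assets": {
            "data": {
                "href": href,
                "type": "image/tiff; application=geotiff",
                "roles": ["data"],
            }
        },
        "links": [],
    }


def hrefs_intersecting(items, bbox):
    """Return the asset hrefs of items whose bbox overlaps the query bbox."""
    qw, qs, qe, qn = bbox
    found = []
    for item in items:
        w, s, e, n = item["bbox"]
        if w <= qe and e >= qw and s <= qn and n >= qs:
            found.append(item["assets"]["data"]["href"])
    return found


# Hypothetical scenes in a hypothetical bucket.
items = [
    make_item("scene-001", "gs://example-bucket/tiffs/scene-001.tif",
              (-105.0, 39.0, -104.0, 40.0),
              datetime(2021, 12, 1, tzinfo=timezone.utc)),
    make_item("scene-002", "gs://example-bucket/tiffs/scene-002.tif",
              (-103.0, 39.0, -102.0, 40.0),
              datetime(2021, 12, 2, tzinfo=timezone.utc)),
]

# The whole catalog is plain JSON metadata.
catalog_json = json.dumps(items)
selected = hrefs_intersecting(items, (-104.5, 39.5, -104.4, 39.6))
```

With items like these in hand, `stackstac.stack(items)` is the entry point that builds a lazy, dask-backed `xarray.DataArray` from them; whether it fits depends on your TIFFs being readable in place (e.g. cloud-optimized) and on the metadata your items actually carry.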

The community has not yet embraced a single solution for this question. Decision factors include the spatial and temporal distribution of your data, the storage access patterns of your use case, and the mutability of your data collection over time. These are questions that many teams are grappling with at the moment, so it would be great to keep an open dialogue going in this thread (or another if that makes more sense) :]
