tl;dr: there is a large push on GitHub for Zarr to support sparse arrays, but nothing has been fully implemented yet. I want to propose this topic for discussion, as it seems like a meaningful contribution to both Zarr and geoscientific modeling.
Context:
Zarr does a good job of storing chunked arrays on disk and has an easy-to-use interface. However, there is no support for reading or writing sparse arrays using the Sparse API. From scanning the Zarr GitHub issues, the only implementation of sparse support I have found is "Converting sparse matrices directly to persistent zarr arrays" (Issue #152, zarr-developers/zarr-python), which was created in 2017.
There is also a large issue, "Adding sparse array support" (Issue #245, zarr-developers/zarr-specs), which references several other issues and proposes a prototype, but it has not seen an update since July 2023.
Dask appears to have sparse support (see "Sparse Arrays" in the Dask documentation), as long as your sparse array meets several requirements (see the article for more).
Goals:
- Start a discussion around an implementation of storing a sparse.COO matrix within Zarr
- Make some cool code that can be made into a PR