Sparse Array storage in Zarr

tl;dr: there is a strong push on GitHub for Zarr to support sparse arrays, but nothing has been fully implemented yet. I want to propose this topic for discussion, as it seems like a meaningful contribution to both Zarr and geoscientific modeling.

Context:

Zarr does a good job of storing chunked arrays on disk and has an easy-to-use interface. However, there is no support for reading or writing sparse arrays using the Sparse API. From scanning the Zarr GitHub issues, the only attempt at sparse support I have found is "Converting sparse matrices directly to persistent zarr arrays" (Issue #152, zarr-developers/zarr-python), which was created in 2017.

There is also a larger issue, "Adding sparse array support" (Issue #245, zarr-developers/zarr-specs), which references several other issues and proposes a prototype but has not been updated since July 2023.

Dask appears to have sparse support (see "Sparse Arrays" in the Dask documentation), as long as your sparse array meets several requirements, which that page lists.

Goals:

  • Start a discussion around an implementation of storing a sparse.coo matrix within Zarr
  • Make some cool code that can be made into a PR
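
As a starting point for that discussion, here is a minimal sketch of what a COO layout boils down to, using only NumPy. The helper names `to_coo` and `from_coo` are hypothetical and not part of Zarr or any sparse library. The point is that the coordinates, the values, and the shape are all ordinary dense data, so one possible layout (an assumption, not an agreed convention) would be two regular Zarr arrays plus a shape attribute.

```python
import numpy as np


def to_coo(dense, fill_value=0):
    """Split a dense array into COO parts: coords, data, shape (hypothetical helper)."""
    dense = np.asarray(dense)
    mask = dense != fill_value
    # One row per dimension, one column per stored element: shape (ndim, nnz).
    coords = np.stack(np.nonzero(mask)).astype(np.int64)
    data = dense[mask]  # values in the same C order as the coordinates
    return coords, data, dense.shape


def from_coo(coords, data, shape, fill_value=0):
    """Rebuild the dense array from its COO parts (hypothetical helper)."""
    out = np.full(shape, fill_value, dtype=data.dtype)
    out[tuple(coords)] = data
    return out


dense = np.zeros((4, 5))
dense[0, 1] = 1.5
dense[3, 4] = -2.0

coords, data, shape = to_coo(dense)
restored = from_coo(coords, data, shape)
```

Here `coords` and `data` are the pieces that would need to be persisted; how they map onto Zarr chunks is exactly the open question.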

If I were you I would not be afraid to resurrect this issue and have a go at the (helpful, concrete) suggestions made in that thread!

Thanks for the confidence boost. I just commented on the larger issue, #245, so we’ll see where it goes. It looks like a lot of the pieces needed to build the support are already there. It’s just a question of whether people would want a new core function, a convention (I’m not sure what that one is), etc.


There’s also a sparse array working group under the Scientific Python umbrella (mostly organized on their Discord, I believe?) with monthly meetings, though a couple of people from that group are participating in the discussion on the Zarr issue above.


Thank you @keewis for the suggestion to join the Discord. I talked with one of the Zarr contributors (d-v-b), and they suggested implementing a codec that supports sparse arrays.

Here are some codec docs. I’ll start putting something together this weekend.
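
As a rough first pass at what such a codec might do to a single chunk, here is a NumPy-only sketch. It does not implement Zarr's actual codec interface; `encode_chunk`, `decode_chunk`, and the byte layout (an 8-byte count, then flat indices, then values) are all assumptions made for illustration.

```python
import numpy as np


def encode_chunk(chunk, fill_value=0):
    """Pack the non-fill elements of a dense chunk into bytes (hypothetical layout)."""
    flat = np.ascontiguousarray(chunk).reshape(-1)
    idx = np.flatnonzero(flat != fill_value).astype("<i8")  # flat C-order positions
    vals = flat[idx]
    header = np.array([idx.size], dtype="<i8")  # number of stored elements
    return header.tobytes() + idx.tobytes() + vals.tobytes()


def decode_chunk(buf, shape, dtype, fill_value=0):
    """Rebuild the dense chunk; shape and dtype come from the array metadata."""
    dtype = np.dtype(dtype)
    nnz = int(np.frombuffer(buf, dtype="<i8", count=1)[0])
    idx = np.frombuffer(buf, dtype="<i8", count=nnz, offset=8)
    vals = np.frombuffer(buf, dtype=dtype, count=nnz, offset=8 + 8 * nnz)
    out = np.full(shape, fill_value, dtype=dtype)
    out.reshape(-1)[idx] = vals  # reshape(-1) is a view on the fresh C-ordered array
    return out


chunk = np.zeros((100, 100), dtype="f4")
chunk[3, 7] = 1.0
chunk[50, 2] = 2.5

buf = encode_chunk(chunk)  # 32 bytes, versus 40000 for the dense chunk
roundtrip = decode_chunk(buf, chunk.shape, chunk.dtype)
```

One appeal of doing this per chunk is that the rest of Zarr keeps seeing a dense array with a normal shape and chunk grid; only the stored bytes change.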
