Super large Zarr array limits

Hello,

We plan to generate a very large data array (lat, lon, time) from the 56,000 Sentinel-2 tiles. The longitude coordinate would hold about 3 million values, and the array would contain about 2 billion chunks per year.
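For concreteness, this is the rough back-of-envelope we are working from; the latitude extent, chunk shape, dtype and timestep count below are illustrative assumptions, not final choices:

```python
import math

# illustrative assumptions only -- not the final grid or chunking
n_lon = 3_000_000                    # ~3 million longitude coordinates
n_lat = 1_500_000                    # assumed latitude size
chunk_lat, chunk_lon = 1024, 1024    # assumed spatial chunk shape
timesteps_per_year = 365             # assumed one time slice per chunk per day

spatial_chunks = math.ceil(n_lat / chunk_lat) * math.ceil(n_lon / chunk_lon)
print(f"chunks per year ≈ {spatial_chunks * timesteps_per_year:,}")
# -> on the order of 1-2 billion chunk objects

# longitude coordinate stored as float64 (float32 would halve this)
print(f"lon coordinate ≈ {n_lon * 8 / 1e6:.0f} MB")
```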

  • What would the metadata size be for that? Is there an explicit mapping to each chunk, or is it computed from the index? (See the small sketch after this list.)
  • Would the coordinates be around 8 MB in size?
  • How fast would a query be resolved? Is Zarr-python efficient with such a large index?
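For reference, this is the kind of toy experiment one could run to inspect how the store is laid out (zarr-python 2.x API assumed):

```python
import zarr

# tiny in-memory array, just to inspect what the store contains
z = zarr.zeros((100, 200), chunks=(10, 10), dtype="f8", store=zarr.MemoryStore())
z[0:10, 0:10] = 1.0          # touch one chunk so a chunk object exists

# the array metadata is a single small JSON document (".zarray")
print(z.store[".zarray"].decode())

# chunk objects are keyed by their grid index ("0.0", "0.1", ...),
# so the key for any chunk can be computed from the requested indices
print(sorted(z.store.keys()))
```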

Thank you very much for your ideas.

We would also like to hear about your experience.

An alternative to one large Zarr (i.e. a single store) is a directory full of Zarrs (or NetCDF files), combined and queried with kerchunk (kerchunk/combine.py in the fsspec/kerchunk repository on GitHub). A sketch of that workflow is below.
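Roughly, the workflow looks like this; the file names and the choice of concat_dims=["time"] are hypothetical, and the exact kwargs depend on your kerchunk/fsspec versions:

```python
import fsspec
import xarray as xr
from kerchunk.hdf import SingleHdf5ToZarr
from kerchunk.combine import MultiZarrToZarr

# hypothetical per-tile NetCDF/HDF5 files
paths = ["tiles/t32ULC_2021-01-01.nc", "tiles/t32ULC_2021-01-06.nc"]

# 1. scan each file once and emit a reference dict describing its chunks
refs = []
for p in paths:
    with fsspec.open(p) as f:
        refs.append(SingleHdf5ToZarr(f, p, inline_threshold=100).translate())

# 2. combine the per-file references into one virtual Zarr store
combined = MultiZarrToZarr(refs, concat_dims=["time"]).translate()

# 3. open the virtual dataset lazily via fsspec's reference filesystem
ds = xr.open_dataset(
    "reference://",
    engine="zarr",
    backend_kwargs={
        "consolidated": False,
        "storage_options": {"fo": combined, "remote_protocol": "file"},
    },
)
print(ds)
```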