Okay, I've been wanting to ask this question and get some feedback for a while, and it has come up in recent discussions here, so let's go!
On our imagery production projects (at CNES, the French space agency), we keep coming back to this question: in which format should we write our products? This is basically CoG vs Zarr, with NetCDF sometimes in the mix. There is probably no single right answer…
Some advantages and drawbacks I have in mind so far:
- CoG: Well understood by the remote sensing community. One file per band, chunked inside each file. No parallel writes inside a file, but parallel reads. Overviews (nice for visualization).
- Zarr: More of an all-rounder. One file per band and per chunk (too many files?). Parallel writes and reads. No overviews, or only by multiplying files?
- Zipped Zarr: Solves the too-many-files problem (which can be heavy on infrastructure) and also the question of how to download a single product, but it does not feel like a clean solution to me.
- NetCDF: Well understood by many communities. One file per product. Parallel reading using kerchunk?
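To put a rough number on the "too many files?" worry, here is a back-of-the-envelope sketch. The band size, chunk size and band count are illustrative assumptions on my side, not taken from any product specification, and `chunk_count` is just a helper name I made up.

```python
import math

def chunk_count(shape, chunks):
    """Number of chunks needed to tile an array of `shape` with blocks of `chunks`."""
    return math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))

# Illustrative assumptions (hypothetical product layout):
band_shape = (10980, 10980)   # pixels per band
chunk_shape = (1024, 1024)    # pixels per chunk
n_bands = 4

per_band = chunk_count(band_shape, chunk_shape)
zarr_objects = per_band * n_bands   # chunk objects only, metadata files not counted
cog_files = n_bands                 # one CoG per band

print(f"Zarr: {per_band} chunks per band, {zarr_objects} chunk objects in total")
print(f"CoG:  {cog_files} files in total")
```

Under these assumptions a single product goes from a handful of CoG files to several hundred Zarr objects, which is where the pressure on the infrastructure comes from.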
As said elsewhere, (zipped) Zarr will be the next Sentinel-2 format. I feel there is no strong consensus, and no obviously good choice, in the field when it comes to a file format for a collection of remote sensing products.
Maybe GeoZarr will change the game here, or Zarr v3 with the sub-chunks (sharding) I heard about?
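As far as I understand Zarr v3 sharding, several inner chunks are packed into one stored object (a shard), so the number of objects depends on the shard shape rather than on the chunk shape. A minimal sketch of that arithmetic, with array, chunk and shard shapes that are purely my own assumptions and a hypothetical `grid_count` helper:

```python
import math

def grid_count(shape, block):
    """Number of blocks of shape `block` needed to cover an array of `shape`."""
    return math.prod(math.ceil(s / b) for s, b in zip(shape, block))

# Illustrative assumptions (hypothetical layout):
band_shape = (10980, 10980)   # pixels per band
inner_chunk = (1024, 1024)    # unit of partial reads
shard_shape = (4096, 4096)    # unit of storage, a multiple of the inner chunk

objects_without_sharding = grid_count(band_shape, inner_chunk)
objects_with_sharding = grid_count(band_shape, shard_shape)
chunks_per_shard = grid_count(shard_shape, inner_chunk)

print(f"without sharding: {objects_without_sharding} objects per band")
print(f"with sharding:    {objects_with_sharding} objects per band, "
      f"up to {chunks_per_shard} inner chunks each")
```

With these numbers the object count per band drops by more than a factor of ten while reads can still target individual inner chunks. The trade-off, as I understand it, is that a shard becomes the smallest unit that can be written independently.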
Maybe we got it all wrong, and in the future we should think more broadly in terms of a collection: a single Zarr store for all products?