Cloud-optimized access to Sentinel-2 JPEG2000

This sounds like potentially an ideal use case for the “virtual zarr” approach.

A weak definition of “cloud optimized”

Yes, this is a good definition, which fits well with the article I wrote about the topic last week :slight_smile:

We have performed simple benchmarks (see the Kayrros/sentinel-2-jp2-tlm repository on GitHub), and they show that enabling the option for future products would make Sentinel-2 imagery “cloud native”, with performance similar to COG or Zarr, while requiring only a minor change to the format.

Awesome! That’s very similar in spirit to the “cloud-optimized HDF” work.

I believe it should be possible to make a virtual datacube of Sentinel-2, exploiting the data in JPEG2000 on CDSE with no modification + the TLM indexes. I’d love to get some feedback on this.

Yes. If the individual files are already “cloud-optimized” at rest, the advantages of creating a virtual zarr store pointing at the data are:

  • downstream applications can access the data through the general-purpose zarr API (and therefore through xarray.open_zarr)
  • the entire dataset can be addressed as a single massive datacube, rather than users having to deal with large numbers of individual filepaths.
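
To make the two points above concrete, here is a minimal sketch of what a "virtual" store boils down to: a table mapping each chunk key to a byte range in an existing file, in the spirit of Kerchunk-style references (`[path, offset, length]`). Everything here is illustrative: the `read_chunk` helper, the `B04/0.0` keys and the temporary file standing in for a remote JPEG2000 product are assumptions of mine, not part of VirtualiZarr or Kerchunk, and the real libraries also handle array metadata, codecs and remote reads.

```python
import os
import tempfile


def read_chunk(refs, key):
    """Return the raw bytes that a chunk key points to (hypothetical helper)."""
    path, offset, length = refs[key]
    with open(path, "rb") as f:
        f.seek(offset)         # jump straight to the chunk
        return f.read(length)  # read only the bytes that belong to it


# Stand-in for one remote file: a 6-byte header followed by two 12-byte "tiles".
payload = b"HEADER" + b"tile-0-bytes" + b"tile-1-bytes"
tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".jp2")
tmp.write(payload)
tmp.close()

# Hypothetical references: chunk key -> [path, offset, length].
refs = {
    "B04/0.0": [tmp.name, 6, 12],
    "B04/0.1": [tmp.name, 18, 12],
}

chunk = read_chunk(refs, "B04/0.1")
os.unlink(tmp.name)  # clean up the stand-in file
```

With a real object store, the `seek`/`read` pair would become an HTTP range request, and the same table could span many files, which is what lets the whole archive look like one datacube.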

I would love to help you with this. The two things to understand first are:

  1. Are there any other properties of the Sentinel-2 data which would make it hard to map to Zarr (see FAQ — VirtualiZarr)?
  2. Has anyone written a VirtualiZarr/Kerchunk reader for JPEG2000 yet? If not, then you might want to look at the VirtualiZarr reader for HuggingFace’s SafeTensors format as an example.
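
On the second question, the core of such a reader could be quite small. A rough sketch, assuming the tile-part lengths have already been parsed from the TLM marker segment in the main header and that the tile-parts are stored back to back in the codestream: the byte range of each tile-part is then just a running sum. The function name `tile_part_ranges` and the numbers are made up for illustration; parsing the TLM segment itself and decoding the JPEG2000 data are not shown.

```python
def tile_part_ranges(first_tile_part_offset, tile_part_lengths):
    """Turn a list of tile-part lengths into (offset, length) byte ranges.

    Assumes the tile-parts are contiguous, starting at first_tile_part_offset.
    """
    ranges = []
    offset = first_tile_part_offset
    for length in tile_part_lengths:
        ranges.append((offset, length))
        offset += length  # the next tile-part starts where this one ends
    return ranges


# Made-up example: first tile-part at byte 1000, three tile-parts.
ranges = tile_part_ranges(1000, [300, 250, 400])
```

Each `(offset, length)` pair is what a reader would record as the reference for the corresponding chunk.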