Hi there,
I’m working on the development of new, interactive ways to exploit cloud-based Zarr datasets, and obviously got very excited when a lot of CMIP6 datasets were released on Amazon (AWS) in that format (cmip6-pds bucket).
However, looking closely into the ASDI CMIP6 buckets, I’ve seen that the datasets are chunked only along the time dimension and that the individual chunks are quite large (10-100 MB). This makes fast, interactive analysis very hard, since (as far as I know) obtaining a time series for a single location would require downloading every chunk of the variable, even in Python with xarray; from JavaScript, it would be even more of a show-stopper.
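To make the concern concrete, here is a rough sketch of the arithmetic. The grid size, chunk shapes, and float32 dtype are made-up assumptions for illustration, not values read from the actual bucket, and the sizes are uncompressed:

```python
def chunks_touched(chunks, selection):
    """Count the chunks overlapped by a selection.

    chunks: chunk length along each dimension
    selection: half-open (start, stop) index range per dimension
    """
    count = 1
    for size, (start, stop) in zip(chunks, selection):
        first = start // size          # index of first chunk hit
        last = (stop - 1) // size      # index of last chunk hit
        count *= last - first + 1
    return count


# Hypothetical variable: 60 years of daily data on a 1-degree grid.
shape = (21900, 180, 360)              # (time, lat, lon)
itemsize = 4                           # assuming float32

# Full time series at a single grid cell.
point = ((0, shape[0]), (90, 91), (180, 181))

time_chunked = (365, 180, 360)         # chunked along time only
space_chunked = (21900, 10, 10)        # hypothetical rechunked layout

for name, chunks in [("time-chunked", time_chunked),
                     ("space-chunked", space_chunked)]:
    n = chunks_touched(chunks, point)
    chunk_mb = chunks[0] * chunks[1] * chunks[2] * itemsize / 1e6
    print(f"{name}: {n} chunks, ~{n * chunk_mb:.0f} MB uncompressed")
```

With these assumed numbers, the time-only layout touches all 60 chunks of the variable for one grid cell, while the hypothetical space-chunked layout touches a single chunk.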
Maybe we’re missing something? We thought about HTTP range requests (à la COG), but (1) I’m not sure they’re supported by the Zarr format or by Zarr libraries; and (2) for geographic subsetting (say we want a small AOI) we would still be sending a lot of requests (one for each lat coordinate at every time step, since those rows won’t be stored contiguously in the file).
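Here is a small sketch of point (2), counting the byte ranges an AOI would need inside one chunk. It assumes the chunk is stored uncompressed in C (row-major) order with a float32 dtype, which may well not hold for these datasets; if the chunks are compressed, byte offsets into the stored object would not map onto array elements like this. The chunk shape and AOI indices are hypothetical:

```python
def byte_ranges(chunk_shape, itemsize, t_range, lat_range, lon_range):
    """Return (offset, length) byte ranges covering an AOI in one chunk.

    Assumes an uncompressed, C-ordered (time, lat, lon) chunk.
    Ranges that happen to be adjacent in the file are merged.
    """
    _, nlat, nlon = chunk_shape
    length = (lon_range[1] - lon_range[0]) * itemsize
    ranges = []
    for t in range(*t_range):
        for j in range(*lat_range):
            start = ((t * nlat + j) * nlon + lon_range[0]) * itemsize
            if ranges and ranges[-1][0] + ranges[-1][1] == start:
                # Contiguous with the previous range: extend it.
                ranges[-1] = (ranges[-1][0], ranges[-1][1] + length)
            else:
                ranges.append((start, length))
    return ranges


chunk_shape = (365, 180, 360)          # hypothetical (time, lat, lon) chunk

# A 10 x 10 cell AOI over every time step in the chunk.
aoi = byte_ranges(chunk_shape, 4, (0, 365), (80, 90), (170, 180))
print(len(aoi), "ranges of", aoi[0][1], "bytes each")
```

Under these assumptions the small AOI needs one tiny range per latitude row per time step (3650 ranges of 40 bytes for this chunk), so the request count, not the byte volume, becomes the bottleneck.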
Any ideas? Thanks in advance!