CMIP6 Zarr datasets on AWS — useful for interactive exploration?

Hi there,

I’m working on the development of new, interactive ways to exploit cloud-based Zarr datasets, and obviously got very excited when a lot of CMIP6 datasets were released on Amazon (AWS) in that format (cmip6-pds bucket).

However, looking closely into the ASDI CMIP6 buckets, I’ve seen that the datasets are chunked only along the time dimension and that the chunks are quite large (10-100 MB). This makes fast, interactive analysis very hard, since (as far as I know) obtaining a time series for a single location would require downloading every chunk of the variable, even in Python with xarray; from JavaScript, it would be even more of a show-stopper.
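
To put rough numbers on this, here is a back-of-the-envelope sketch in Python. The array shape, chunk shape and dtype are made-up illustrative values (I have not read them from the actual bucket), `chunks_touched` is just a helper written for this post, and the byte counts are uncompressed, so real transfers would be smaller.

```python
import math


def chunks_touched(chunks, selection):
    """Count the chunks that intersect a rectangular selection.

    chunks: chunk length along each dimension.
    selection: one (start, stop) index pair per dimension, stop exclusive.
    """
    count = 1
    for (start, stop), chunk_len in zip(selection, chunks):
        first = start // chunk_len        # index of the first chunk touched
        last = (stop - 1) // chunk_len    # index of the last chunk touched
        count *= last - first + 1
    return count


# Hypothetical layout: monthly data on a 1-degree grid, chunked along time only.
shape = (1980, 180, 360)    # (time, lat, lon), assumed
chunks = (120, 180, 360)    # each chunk spans the whole globe, assumed
itemsize = 4                # float32, assumed

# Full time series at a single grid cell.
point_series = ((0, shape[0]), (90, 91), (180, 181))

n_chunks = chunks_touched(chunks, point_series)   # 17
chunk_bytes = math.prod(chunks) * itemsize        # 31_104_000, about 31 MB
fetched_bytes = n_chunks * chunk_bytes            # uncompressed upper bound
useful_bytes = shape[0] * itemsize                # the values actually wanted

print(f"{n_chunks} chunks, {fetched_bytes / 1e6:.0f} MB fetched "
      f"for {useful_bytes / 1e3:.1f} kB of useful data")
```

With these assumed numbers, every chunk of the variable has to be fetched to obtain a few kilobytes of values, which is the mismatch I’m worried about.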

Maybe we’re missing something? We thought about HTTP range requests (à la COG), but (1) I’m not sure they’re supported in Zarr or Zarr libraries; and (2) for geographic subsetting (say we want a small AOI) we would still be sending a lot of requests (one for each lat coordinate, since the rows of the AOI won’t be stored contiguously in the file).
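
As a sketch of point (2), the snippet below counts how many separate byte ranges an AOI would need inside one chunk. It assumes the chunk is stored uncompressed in C (row-major) order with dimensions (time, lat, lon); with a compressed chunk, byte offsets would not map to array indices at all. The chunk shape and AOI are illustrative values, and `contiguous_runs` is a helper written for this post.

```python
def contiguous_runs(chunk_shape, selection):
    """Count contiguous byte runs covering a rectangular selection.

    Assumes an uncompressed, C-ordered (row-major) chunk.
    selection: one (start, stop) index pair per dimension, stop exclusive.
    """
    runs = 1
    inner_is_contiguous = True
    # Walk from the fastest-varying (last) dimension outwards.
    for size, (start, stop) in zip(reversed(chunk_shape), reversed(selection)):
        if inner_is_contiguous:
            # Fully selected inner dimensions merge into a single run. The
            # first partially selected one is still a single slice of bytes,
            # but everything outside it is separated by gaps.
            if (start, stop) != (0, size):
                inner_is_contiguous = False
        else:
            # Each index along this dimension starts a separate run.
            runs *= stop - start
    return runs


chunk_shape = (120, 180, 360)               # (time, lat, lon), assumed
aoi = ((0, 120), (80, 100), (170, 200))     # 20 lat rows x 30 lon columns

runs_per_chunk = contiguous_runs(chunk_shape, aoi)
runs_per_step = contiguous_runs(chunk_shape, ((0, 1), (80, 100), (170, 200)))

print(runs_per_step, "ranges per time step,", runs_per_chunk, "per chunk")
```

So even in the best case there is one range per lat row per time step, and the count multiplies quickly over a whole chunk.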

Any ideas? Thanks in advance!

Thanks for this interesting and useful question @guigrpa – and welcome to the forum!

I’ll try to write a detailed response within a few days. In the meantime, this other post contains many points that are relevant to your question: