Thanks a lot Tom! Yes, our former approach was actually very similar to the CMIP6 leap feedstock: it stored each dataset id, e.g. `cordex.output.EUR-11.SMHI.MPI-M-MPI-ESM-LR.rcp85.SMHI-RCA4.r1i1p1.day.tas.v20180817`, as a single zarr store. As far as I understood, also from reading through the discussions (e.g., Welcome, I need some support for the design of a forecast archive with Zarr), this is not the ideal approach performance-wise if I often want to open and merge several datasets. Usually I did that via an intake catalog search, opening all datasets and merging them (roughly as in the sketch below). In the ERA5 ARCO dataset, by contrast, I get all surface variables in one dataset/zarr store, which is very convenient, and I would aim for something similar instead of storing each variable in a separate zarr store.
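For reference, a minimal sketch of that search-open-merge workflow, assuming an intake-esm catalog; the catalog URL and search facet names here are placeholders for illustration:

```python
import intake
import xarray as xr

# hypothetical catalog URL and facets -- adjust to your CORDEX catalog
cat = intake.open_esm_datastore("https://example.org/cordex-catalog.json")
subset = cat.search(variable_id="tas", frequency="day", experiment_id="rcp85")

# one xarray dataset per matching zarr store
dsets = subset.to_dataset_dict()

# stack all matches along a new ensemble dimension
ens = xr.concat(list(dsets.values()), dim="source_id")
```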
To get an ensemble view, I could create a virtual dataset (e.g., all `tas` from all models) that simply references the existing data. Would this be a good approach, one that also allows me to update the virtual ensemble dataset when new models arrive? I think this option would be:
D. per-source + frequency Zarr stores + virtual ensemble

- Store all variables for one `source_id` and one frequency (e.g., daily, monthly) in a single Zarr store.
- Then build a virtual ensemble dataset (VirtualiZarr / kerchunk / Icechunk) across `source_id`s (see the sketch below).
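As a rough illustration of the virtual ensemble step, a sketch using VirtualiZarr; the store paths are made up, and how `open_virtual_dataset` reads existing Zarr stores varies across VirtualiZarr versions, so treat this as the shape of the workflow rather than a working recipe:

```python
import xarray as xr
from virtualizarr import open_virtual_dataset

# hypothetical per-source + frequency Zarr stores (the option D layout)
stores = [
    "s3://bucket/cordex/EUR-11/MPI-M-MPI-ESM-LR/day.zarr",
    "s3://bucket/cordex/EUR-11/NCC-NorESM1-M/day.zarr",
]

# open each store virtually: only chunk references are read, no data is copied
vdsets = [open_virtual_dataset(store) for store in stores]

# concatenate the members along a new ensemble dimension
ens = xr.concat(vdsets, dim="source_id", coords="minimal", compat="override")

# persist the reference-only ensemble; when a new model arrives, extend the
# store list and regenerate (or append via Icechunk instead)
ens.virtualize.to_kerchunk("cordex_ensemble_refs.json", format="json")
```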
So I'm probably trying to leverage the "Put as much as you can into a single Zarr group / Xarray dataset" recommendation and the "be flexible with extending the ensemble" idea!