Pangeo CMIP6 Catalog

christophrenkl · January 19, 2022, 12:42pm

Hello everyone,

I am using intake-esm and the Pangeo CMIP6 catalog to access CMIP6 model output, which is working great so far. So, first off, thanks to everyone involved in creating these tools!

I have a question about the completeness of the catalog with respect to the data on the ESGF server. Specifically, I am interested in the output of the HadGEM3-GC31-MM model and I am looking for, among others, surface downwelling longwave radiation (output variable name rlds). On the ESGF server, I can find 4 datasets (for 4 ensemble members) with daily and monthly output, and one dataset with 3-hourly output. Unfortunately, I am not able to locate the daily datasets in the Pangeo catalog. The following query results in an empty data frame.

import intake

url = "https://storage.googleapis.com/cmip6/pangeo-cmip6.json"
col = intake.open_esm_datastore(url)

# query: daily rlds from the HadGEM3-GC31-MM historical runs
query = dict(
    activity_id="CMIP",
    experiment_id="historical",
    source_id="HadGEM3-GC31-MM",
    variable_id=["rlds"],
    table_id="day"
)

cat_daily = col.search(**query)
cat_daily.df

All other variables I am interested in (including surface downwelling shortwave radiation) are available with at least daily resolution. So I am wondering: what are the criteria for a dataset to be included in the Pangeo catalog?
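
For anyone hitting the same empty result, one way to narrow it down is to drop table_id from the query and list which tables the catalog actually holds for each variable. Below is a minimal sketch of that check, not part of intake-esm. The helper name tables_by_variable is hypothetical, and the rows are synthetic placeholders rather than real catalog contents. With the real catalog, the rows could presumably be taken from col.df.to_dict("records").

```python
from collections import defaultdict


def tables_by_variable(rows, source_id, experiment_id):
    """Map each variable_id to the sorted table_ids found for one model and experiment.

    `rows` is an iterable of dicts keyed by catalog column names
    (source_id, experiment_id, variable_id, table_id).
    """
    found = defaultdict(set)
    for row in rows:
        if row["source_id"] == source_id and row["experiment_id"] == experiment_id:
            found[row["variable_id"]].add(row["table_id"])
    return {var: sorted(tables) for var, tables in found.items()}


# Synthetic rows for illustration only; these are NOT the real catalog contents.
# Assumption: with the real catalog you would use rows = col.df.to_dict("records").
rows = [
    {"source_id": "HadGEM3-GC31-MM", "experiment_id": "historical",
     "variable_id": "rsds", "table_id": "day"},
    {"source_id": "HadGEM3-GC31-MM", "experiment_id": "historical",
     "variable_id": "rsds", "table_id": "Amon"},
    {"source_id": "HadGEM3-GC31-MM", "experiment_id": "historical",
     "variable_id": "rlds", "table_id": "3hr"},
    {"source_id": "HadGEM3-GC31-MM", "experiment_id": "ssp585",
     "variable_id": "rlds", "table_id": "day"},
]

available = tables_by_variable(rows, "HadGEM3-GC31-MM", "historical")
for variable, tables in available.items():
    print(variable, tables)
```

A variable whose list has no "day" entry is simply not in the cloud catalog at daily resolution, as opposed to the query being mistyped.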

rabernat · January 19, 2022, 1:04pm

The complete ESGF CMIP6 catalog is ~ 20 PB. We have about 1 PB of data in Zarr format.

The data were populated based on a user-request form that is no longer supported. The process of ingesting data into the cloud is manual and relied on the heroic efforts of a scientist who has since retired.

We are trying to transition to a more sustainable system for continuing to expand the cloud data. It’s a big job because of the size and complexity of CMIP6.

You can get more details here: Pangeo / ESGF Cloud Data Working Group documentation

If you’re interested in joining the working group to help find a solution, you are welcome!

christophrenkl · January 19, 2022, 4:45pm

Thanks for the explanation, @rabernat! I figured that it had something to do with storage capacity. I guess in my case it’s easiest to just pull the missing dataset straight from ESGF.
