Hi Pangeo team,
I have been following the rechunker tutorial and am trying to rechunk data into my personal Google Cloud Storage bucket. However, I would like to use the GFDL CM2.6 data instead of the Copernicus Marine Environment data used in the example. The tutorial gives a URL for that dataset (‘gs://pangeo-cmems-duacs’), but I don’t know where this URL comes from, or how to get the corresponding GCS URL for any of the other datasets I might be interested in.
Where can I find the Google Storage URL for other Pangeo datasets that I may be interested in (in particular the GFDL CM2.6 ocean surface datasets)?
Hi @andrewbrettin – thanks for this interesting question.
The current “official” Pangeo catalog is an Intake catalog and is managed here:
And the catalog for CM2.6 is here:
This is turned into a website here:
We intend the data to be used via intake, e.g.
```python
from intake import open_catalog

cat = open_catalog("https://raw.githubusercontent.com/pangeo-data/pangeo-datastore/master/intake-catalogs/ocean/GFDL_CM2.6.yaml")
ds = cat["GFDL_CM2_6_control_ocean"].to_dask()
```
However, your question reveals two problems with this approach:
- If you don’t want to open the data with xarray / dask but would rather open it directly with zarr, or even just want to know the actual URL on cloud storage, intake doesn’t make that easy
- The catalog website also does not make that information obvious
These are two concrete things we could try to improve going forward.
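In the meantime, one possible workaround is to read the URL straight out of the catalog YAML. Intake catalog files list each dataset under `sources`, with the storage location normally given as `args.urlpath`. The sketch below assumes the YAML has already been parsed into a plain dict (for example with `yaml.safe_load`); the helper name `list_urlpaths` and the `gs://example-bucket/...` path are hypothetical placeholders, not the real CM2.6 location.

```python
def list_urlpaths(catalog: dict) -> dict:
    """Map each source name in a parsed Intake catalog to its args.urlpath.

    Sources without a urlpath (e.g. nested catalogs) are skipped.
    """
    urls = {}
    for name, entry in catalog.get("sources", {}).items():
        urlpath = entry.get("args", {}).get("urlpath")
        if urlpath is not None:
            urls[name] = urlpath
    return urls


# Hypothetical stand-in for a parsed catalog file; the urlpath below is a
# placeholder, NOT the actual bucket for this dataset.
example_catalog = {
    "sources": {
        "GFDL_CM2_6_control_ocean": {
            "driver": "zarr",
            "args": {
                "urlpath": "gs://example-bucket/path/to/store",
                "consolidated": True,
            },
        },
        # An entry with no urlpath is ignored by the helper.
        "entry_without_url": {"driver": "intake.catalog.local.YAMLFileCatalog"},
    }
}

urls = list_urlpaths(example_catalog)
for name, url in urls.items():
    print(f"{name}: {url}")
```

Applied to the real catalog file, the value printed for the entry you want should be the `gs://` URL that can be passed to rechunker or zarr directly.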
I hope this helps.