Google storage gs:// URLs for Pangeo datasets on GCS

andrewbrettin · October 25, 2020, 7:27pm

Hi Pangeo team,

I have been following the rechunker tutorial and am trying to rechunk data into my personal Google Cloud Storage bucket. However, I would like to use the GFDL CM2.6 data instead of the Copernicus Marine Environment data used in the example. The tutorial gives a URL for that dataset (‘gs://pangeo-cmems-duacs’), but I don’t know where this URL comes from, or how to find the corresponding GCS URL for any other dataset I might be interested in.

Where can I find the Google Storage URL for other Pangeo datasets that I may be interested in (in particular the GFDL CM2.6 ocean surface datasets)?

Thanks,

Andrew

rabernat · October 26, 2020, 2:42pm

Hi @andrewbrettin – thanks for this interesting question.

The current “official” Pangeo catalog is an Intake catalog and is managed in the pangeo-data/pangeo-datastore repository on GitHub.

And the catalog for CM2.6 is here:

github.com/pangeo-data/pangeo-datastore/blob/master/intake-catalogs/ocean/GFDL_CM2.6.yaml

plugins:
  source:
    - module: intake_xarray

sources:

  GFDL_CM2_6_control_ocean:
    description: "GFDL CM2.6 climate model control run monthly ocean fields"
    metadata:
      url: 'https://www.gfdl.noaa.gov/cm2-6/'
      tags:
        - ocean
        - model
    driver: zarr
    args:
      urlpath: gs://cmip6/GFDL_CM2_6/control/ocean
      consolidated: True
      storage_options:
        requester_pays: True

This file has been truncated.

This is turned into a website here:

We intend the data to be used via intake, e.g.


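from intake import open_catalog

# Open the CM2.6 catalog, then lazily load the control-run ocean fields
# as a dask-backed xarray Dataset.
cat = open_catalog("https://raw.githubusercontent.com/pangeo-data/pangeo-datastore/master/intake-catalogs/ocean/GFDL_CM2.6.yaml")
ds = cat["GFDL_CM2_6_control_ocean"].to_dask()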

However, your question reveals two problems with this approach:

1. If you don’t want to open the data with xarray / dask but would rather open it directly with zarr, or just want to know the actual URL on cloud storage, intake doesn’t make that easy for you.
2. The catalog website also does not make that information obvious.
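
As a possible workaround for both points, here is a minimal sketch: the catalog file is plain YAML, so the storage URL can be read from each source's `args.urlpath` field. The helper names `load_catalog` and `list_urlpaths` are made up for illustration, and `load_catalog` assumes PyYAML is installed. The inline dictionary copies the catalog entry shown above so that the example runs offline.

```python
from urllib.request import urlopen


def load_catalog(url):
    """Download an Intake catalog YAML file and parse it into a dict.

    Hypothetical helper; assumes PyYAML is installed.
    """
    import yaml  # imported lazily so list_urlpaths works without PyYAML

    with urlopen(url) as resp:
        return yaml.safe_load(resp.read())


def list_urlpaths(catalog):
    """Map each source name to its args.urlpath, skipping sources without one."""
    return {
        name: entry["args"]["urlpath"]
        for name, entry in catalog.get("sources", {}).items()
        if "urlpath" in entry.get("args", {})
    }


# Offline example: the (truncated) CM2.6 catalog entry quoted in this post.
catalog = {
    "sources": {
        "GFDL_CM2_6_control_ocean": {
            "driver": "zarr",
            "args": {
                "urlpath": "gs://cmip6/GFDL_CM2_6/control/ocean",
                "consolidated": True,
                "storage_options": {"requester_pays": True},
            },
        }
    }
}

for name, urlpath in list_urlpaths(catalog).items():
    print(f"{name}: {urlpath}")
```

With network access, `list_urlpaths(load_catalog("https://raw.githubusercontent.com/pangeo-data/pangeo-datastore/master/intake-catalogs/ocean/GFDL_CM2.6.yaml"))` should list every source in the full file, not just the one shown here.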

These are two concrete things we could try to improve going forward.
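
In the meantime, once the `urlpath` is known, the data can be opened without intake. The sketch below translates a catalog entry's `args` into keyword arguments for `xarray.open_zarr`. It assumes a recent xarray with zarr and gcsfs installed. The function names and the `my-billing-project` project ID are placeholders, not part of any Pangeo API. Because this bucket is marked `requester_pays`, reading it needs Google Cloud credentials and a project to bill.

```python
def open_kwargs(args, project=None):
    """Build xarray.open_zarr keyword arguments from a catalog entry's args.

    `project` is the Google Cloud project billed for requester-pays access;
    it is passed through storage_options to gcsfs.
    """
    storage_options = dict(args.get("storage_options", {}))
    if project is not None:
        storage_options["project"] = project
    return {
        "store": args["urlpath"],
        "consolidated": args.get("consolidated"),
        "storage_options": storage_options,
    }


def open_dataset(args, project=None):
    """Open the Zarr store lazily (needs xarray, zarr, gcsfs and GCS credentials)."""
    import xarray as xr  # imported lazily so open_kwargs can be used on its own

    return xr.open_zarr(**open_kwargs(args, project=project))


# The args block of the CM2.6 entry quoted in this post.
cm26_args = {
    "urlpath": "gs://cmip6/GFDL_CM2_6/control/ocean",
    "consolidated": True,
    "storage_options": {"requester_pays": True},
}

# "my-billing-project" is a placeholder; substitute your own project ID.
print(open_kwargs(cm26_args, project="my-billing-project"))
```

Calling `open_dataset(cm26_args, project="my-billing-project")` would then return the dataset, much as `.to_dask()` does in the intake example. The same `urlpath` is also what you would hand to rechunker as the source location.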

I hope this helps.
