CMIP6 Google Cloud Storage Region

Hi!

Would someone be able to help me figure out which region the CMIP6 data are stored in? It’s not mentioned in the dataset description in the Google Cloud console, and I don’t have the right permissions to pull the region from the bucket.

(Apologies if this is already documented somewhere; I looked through the Pangeo examples but didn’t see any mention of the bucket location.)

Thanks!


Hi @scharlottej13, I believe we are in US-CENTRAL1. I can double-check this in the next few days if that helps (I tried to get the region of the bucket via gsutil ls -Lb gs://cmip6 but ran into a permission error). I do not think this is documented anywhere, but we should definitely add it somewhere.
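
For reference, the equivalent check from Python hits the same wall. Here is a minimal sketch, assuming the google-cloud-storage package is installed; against gs://cmip6 it raises 403 Forbidden, just like the CLI, but it would print the region for a bucket whose metadata is publicly readable:

    from google.cloud import storage

    # Anonymous client -- the CMIP6 objects are public, so no credentials are needed.
    client = storage.Client.create_anonymous_client()

    # get_bucket() issues a buckets.get call, the same call gsutil ls -Lb makes.
    # For gs://cmip6 this currently raises google.api_core.exceptions.Forbidden.
    bucket = client.get_bucket("cmip6")
    print(bucket.location)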

Thanks @jbusecke! If you (or someone else) can confirm, that would be great. Yeah, gsutil ls -Lb gs://cmip6 is what I tried as well, and I hit a permissions error.

I’d be happy to open a PR (perhaps to the README in pangeo-data/pangeo-datastore on GitHub) to document this too if that’s helpful.

I also hit the same permission errors. @jbusecke, were you able to double-check?


Still waiting for a reply…


I’m able to run the gsutil command that is supposed to report a “location constraint”, but it doesn’t for me!

(base) rsignell@Elio:~$ gsutil ls -L -b gs://cmip6/CMIP6/ScenarioMIP/EC-Earth-Consortium/EC-Earth3/ssp245/r1i1p1f1/Omon/tos/gn/v20200918/.zgroup
gs://cmip6/CMIP6/ScenarioMIP/EC-Earth-Consortium/EC-Earth3/ssp245/r1i1p1f1/Omon/tos/gn/v20200918/.zgroup:
    Creation time:          Wed, 07 Apr 2021 20:29:51 GMT
    Update time:            Wed, 07 Apr 2021 20:29:51 GMT
    Storage class:          STANDARD
    Content-Language:       en
    Content-Length:         24
    Content-Type:           application/octet-stream
    Hash (crc32c):          REp3Bw==
    Hash (md5):             4gKXk15z3QFUEE1OpTBAqw==
    ETag:                   CPCF+9f87O8CEAE=
    Generation:             1617827391783664
    Metageneration:         1
    ACL:                    []
TOTAL: 1 objects, 24 bytes (24 B)

Oh, actually, I guess you need to run that command just on the bucket. So this works:

(base) rsignell@Elio:~$ gsutil ls -Lb gs://pangeo-era5
gs://pangeo-era5/ :
        Storage class:                  STANDARD
        Location type:                  region
        Location constraint:            US-CENTRAL1
        Versioning enabled:             None
        Logging configuration:          None
        Website configuration:          None
        CORS configuration:             None
        Lifecycle configuration:        None
        Requester Pays enabled:         None
        Labels:                         None
        Default KMS key:                None
        Time created:                   Wed, 09 Oct 2019 01:56:21 GMT
        Time updated:                   Wed, 09 Oct 2019 02:02:42 GMT
        Metageneration:                 3
        Bucket Policy Only enabled:     True
        Public access prevention:       inherited
        ACL:                            []
        Default ACL:                    []

and this works:

 gcloud storage buckets describe gs://pangeo-era5
creation_time: 2019-10-09T01:56:21+0000
default_storage_class: STANDARD
location: US-CENTRAL1
location_type: region
metageneration: 3
name: pangeo-era5
public_access_prevention: inherited
soft_delete_policy:
  effectiveTime: '2024-03-01T08:00:00+00:00'
  retentionDurationSeconds: '604800'
storage_url: gs://pangeo-era5/
uniform_bucket_level_access: true
update_time: 2019-10-09T02:02:42+0000

But I guess the top-level bucket for cmip6 is not accessible to the public?

(base) rsignell@Elio:~$ gcloud storage buckets describe gs://cmip6
ERROR: (gcloud.storage.buckets.describe) User [rsignell@gmail.com] does not have permission to access b instance [cmip6] (or it may not exist): rsignell@gmail.com does not have storage.buckets.get access to the Google Cloud Storage bucket. Permission 'storage.buckets.get' denied on resource (or it may not exist).

Yeah, there is something about the public dataset program that prevents you from finding this out, AFAICT…
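
For what it’s worth, the restriction is only on the bucket-level metadata: anonymous object listing and reads still work fine. A minimal sketch, assuming gcsfs is installed:

    import gcsfs

    # Anonymous filesystem -- object-level access to gs://cmip6 is public,
    # even though bucket-level metadata (buckets.get) is not.
    fs = gcsfs.GCSFileSystem(token="anon")

    # Listing the store from the gsutil example above works without credentials.
    store_path = ("cmip6/CMIP6/ScenarioMIP/EC-Earth-Consortium/EC-Earth3/"
                  "ssp245/r1i1p1f1/Omon/tos/gn/v20200918")
    print(fs.ls(store_path)[:5])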

@jbusecke, I tried running your 2023 AMS live_demo notebook in different GCP regions using Coiled (4 workers with 4 threads each, 16 threads total). The first time through I got:

  • us-west1: 250s
  • us-central1: 130s
  • us-east1: 89s

Then I ran the notebook again and got close to 90s for all three regions. Perhaps the public dataset program is caching data with Cloud CDN?
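
For anyone who wants to reproduce a rough version of this timing without the full notebook, here is a minimal sketch (assuming gcsfs and xarray, reading the same EC-Earth3 tos store shown in the gsutil example above); run it from machines or clusters in different regions and compare the wall times:

    import time
    import gcsfs
    import xarray as xr

    # Open one of the CMIP6 zarr stores anonymously.
    fs = gcsfs.GCSFileSystem(token="anon")
    store = fs.get_mapper(
        "gs://cmip6/CMIP6/ScenarioMIP/EC-Earth-Consortium/EC-Earth3/ssp245/"
        "r1i1p1f1/Omon/tos/gn/v20200918"
    )
    ds = xr.open_zarr(store, consolidated=True)

    # Time a small reduction; the first run measures cold-read latency to the bucket.
    start = time.perf_counter()
    ds["tos"].isel(time=slice(0, 12)).mean().compute()
    print(f"{time.perf_counter() - start:.1f}s")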