Hi!
Would someone be able to help me figure out which region the CMIP6 data are stored in? It’s not mentioned in the dataset description in the Google Cloud console, and I also don’t have the right permissions to pull the region from the bucket.
(Apologies if this is already documented somewhere; I looked through the Pangeo examples but didn’t see any mention of the bucket location.)
Thanks!
Hi @scharlottej13 I believe we are in US-CENTRAL1. I can double-check this in the next few days if that helps (I tried to get the region of the bucket via gsutil ls -Lb gs://cmip6 but ran into a permission error). I don’t think this is documented anywhere, but we should definitely add it somewhere.
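In case it’s useful, the same metadata can be read from Python with the google-cloud-storage client (a minimal sketch; it needs the same storage.buckets.get permission as gsutil ls -Lb, so on gs://cmip6 it currently hits the same permission error):

from google.cloud import storage

# Minimal sketch: read bucket metadata with the Cloud Storage Python client.
# This requires storage.buckets.get on the bucket, just like gsutil ls -Lb,
# so it currently raises Forbidden for gs://cmip6.
client = storage.Client()            # uses your default application credentials
bucket = client.get_bucket("cmip6")
print(bucket.location)               # e.g. "US-CENTRAL1"
print(bucket.location_type)          # e.g. "region"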
Thanks @jbusecke! If you (or someone else) are able to confirm, that would be great. Yeah, gsutil ls -Lb gs://cmip6 is what I tried as well and hit a permissions error.
I’d be happy to open a PR (perhaps to the README in the pangeo-data/pangeo-datastore repo on GitHub) to document this too, if that’s helpful.
I also hit the same permission errors. @jbusecke were you able to double check?
Still waiting for a reply…
I’m able to run the gsutil command that is supposed to report a “location constraint”, but it doesn’t for me!
(base) rsignell@Elio:~$ gsutil ls -L -b gs://cmip6/CMIP6/ScenarioMIP/EC-Earth-Consortium/EC-Earth3/ssp245/r1i1p1f1/Omon/tos/gn/v20200918/.zgroup
gs://cmip6/CMIP6/ScenarioMIP/EC-Earth-Consortium/EC-Earth3/ssp245/r1i1p1f1/Omon/tos/gn/v20200918/.zgroup:
Creation time: Wed, 07 Apr 2021 20:29:51 GMT
Update time: Wed, 07 Apr 2021 20:29:51 GMT
Storage class: STANDARD
Content-Language: en
Content-Length: 24
Content-Type: application/octet-stream
Hash (crc32c): REp3Bw==
Hash (md5): 4gKXk15z3QFUEE1OpTBAqw==
ETag: CPCF+9f87O8CEAE=
Generation: 1617827391783664
Metageneration: 1
ACL: []
TOTAL: 1 objects, 24 bytes (24 B)
Oh, actually, I guess you need to run that command on the bucket itself. So this works:
(base) rsignell@Elio:~$ gsutil ls -Lb gs://pangeo-era5
gs://pangeo-era5/ :
Storage class: STANDARD
Location type: region
Location constraint: US-CENTRAL1
Versioning enabled: None
Logging configuration: None
Website configuration: None
CORS configuration: None
Lifecycle configuration: None
Requester Pays enabled: None
Labels: None
Default KMS key: None
Time created: Wed, 09 Oct 2019 01:56:21 GMT
Time updated: Wed, 09 Oct 2019 02:02:42 GMT
Metageneration: 3
Bucket Policy Only enabled: True
Public access prevention: inherited
ACL: []
Default ACL: []
and this works:
gcloud storage buckets describe gs://pangeo-era5
creation_time: 2019-10-09T01:56:21+0000
default_storage_class: STANDARD
location: US-CENTRAL1
location_type: region
metageneration: 3
name: pangeo-era5
public_access_prevention: inherited
soft_delete_policy:
  effectiveTime: '2024-03-01T08:00:00+00:00'
  retentionDurationSeconds: '604800'
storage_url: gs://pangeo-era5/
uniform_bucket_level_access: true
update_time: 2019-10-09T02:02:42+0000
But I guess the top level bucket for cmip6 is not accessible to the public?
(base) rsignell@Elio:~$ gcloud storage buckets describe gs://cmip6
ERROR: (gcloud.storage.buckets.describe) User [rsignell@gmail.com] does not have permission to access b instance [cmip6] (or it may not exist): rsignell@gmail.com does not have storage.buckets.get access to the Google Cloud Storage bucket. Permission 'storage.buckets.get' denied on resource (or it may not exist).
Yeah, there is something about the public dataset program that keeps you from finding this out, AFAICT…
@jbusecke, I tried running your 2023 AMS live_demo notebook in different GCP regions using Coiled (4 workers with 4 threads each, i.e. 16 threads total). The first time I got:
us-west1: 250s
us-central1: 130s
us-east1: 89s
Then I tried running the notebooks again and got close to 90s for all three. Perhaps the public dataset program is caching data with Cloud CDN?
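In case anyone wants to reproduce the comparison without the full notebook, here is a minimal sketch of that kind of timing test (the store path is the tos store shown earlier in this thread; the exact Coiled keywords for region and worker size may differ between versions, and the mean is just a placeholder workload):

import time
import coiled
import xarray as xr
from distributed import Client

# Zarr store from earlier in this thread (the directory holding the .zgroup).
store = ("gs://cmip6/CMIP6/ScenarioMIP/EC-Earth-Consortium/EC-Earth3/"
         "ssp245/r1i1p1f1/Omon/tos/gn/v20200918/")

for region in ["us-west1", "us-central1", "us-east1"]:
    # 4 workers x 4 threads = 16 threads total.
    cluster = coiled.Cluster(n_workers=4, worker_cpu=4, region=region)
    client = Client(cluster)
    ds = xr.open_zarr(store, consolidated=True,
                      storage_options={"token": "anon"})
    t0 = time.perf_counter()
    ds["tos"].mean().compute()  # force the data to be read through the workers
    print(f"{region}: {time.perf_counter() - t0:.0f}s")
    client.close()
    cluster.close()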