Hello all,
I’m finally getting back to some more climate model data analysis on the cloud, and I was hoping to get some insight into what, if anything, is happening with the Zarr-format CMIP6 data in the s3://cmip6-pds bucket. I’ve been moving some of my analysis over to that data, and it had been working fine for a couple of weeks. This week, however, queries against that bucket have been slow, and files that were previously there seem to be missing. Either that, or requests time out before returning anything and I get key-not-found errors for “.zmetadata”.
Even simple AWS CLI commands like:
aws s3 ls s3://cmip6-pds
only sometimes return, and when they do, it often takes several minutes. Listing any subkeys is also very hit-or-miss.
I browsed through the Pangeo Discourse forum and didn’t see anything about work being done on that bucket. Since this behavior seemed more like an S3 issue, I emailed the AWS Sustainability Data Initiative team. Their response was that someone appeared to be doing a massive reorganization of the data in that bucket, and that I may be “running into that”.
On closer inspection, it does look like the data is being moved to include a new “version” subkey in the dataset paths, to be more consistent with how ESGF stores the data. Could anyone on the Pangeo team describe what is going on with that bucket? For now, the bucket is effectively unusable for me, since I can’t get queries against its files to complete reliably.
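To illustrate why an added “version” level could break existing code, here is a minimal Python sketch that builds a store URL with and without that level. The facet order, the "CMIP6/" root prefix, and the helper names cmip6_store_url and zmetadata_url are assumptions for illustration, modeled loosely on the ESGF/CMIP6 directory structure rather than taken from the bucket's documentation. The facet values in the usage lines are illustrative, and the version string is a placeholder, not a checked key.

```python
from typing import Optional

# ASSUMED facet order, modeled on the ESGF/CMIP6 directory structure.
# Check it against the bucket's own listing before relying on it.
FACETS = ("activity_id", "institution_id", "source_id", "experiment_id",
          "member_id", "table_id", "variable_id", "grid_label")


def cmip6_store_url(facets: dict, version: Optional[str] = None,
                    bucket: str = "cmip6-pds", root: str = "CMIP6") -> str:
    """Hypothetical helper: build an s3:// URL for one Zarr store.

    With version=None this gives the (assumed) older, unversioned layout;
    with a version string it appends the new version level at the end.
    """
    parts = [root] + [facets[name] for name in FACETS]
    if version is not None:
        parts.append(version)
    return f"s3://{bucket}/" + "/".join(parts) + "/"


def zmetadata_url(store_url: str) -> str:
    """Hypothetical helper: the consolidated-metadata key inside a store."""
    return store_url.rstrip("/") + "/.zmetadata"


if __name__ == "__main__":
    # Illustrative facet values; "v20180701" is a placeholder version.
    example = {"activity_id": "CMIP", "institution_id": "NOAA-GFDL",
               "source_id": "GFDL-CM4", "experiment_id": "historical",
               "member_id": "r1i1p1f1", "table_id": "Amon",
               "variable_id": "tas", "grid_label": "gr1"}
    print(zmetadata_url(cmip6_store_url(example)))
    print(zmetadata_url(cmip6_store_url(example, version="v20180701")))
```

If the reorganization works this way, code that still builds the unversioned URL would point at a prefix with no “.zmetadata” directly beneath it, which would be consistent with the key errors described above. That is a guess, not a confirmed diagnosis.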
This is an amazing resource that this group is providing, and it’s emblematic of everything I enjoy about the Pangeo community. I really hope to be able to re-engage with the community in the coming months as I get back into things again!
Thanks!
Luke Madaus