Scratch Bucket is not working on new Pangeo Cloud cluster

I’m no longer able to write to the scratch bucket.

import os
import fsspec

os.environ['PANGEO_SCRATCH']
# -> gcs://pangeo-integration-te-3eea-prod-scratch-bucket/rabernat

with['PANGEO_SCRATCH'] + '/test-path.txt', mode='w') as fp:
    fp.write('Hello world')
# -> FileNotFoundError: b/pangeo-integration-te-3eea-prod-scratch-bucket/o/
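As an aside, a stdlib-only sketch of building the same target path, with a guard for the case where PANGEO_SCRATCH is unset (the fallback URL "gs://example-scratch/user" is made up for illustration and is not a real bucket):

```python
import os

# Build the scratch target path the same way as the snippet above, but
# fall back to an illustrative placeholder so this runs outside Pangeo
# Cloud. An unset PANGEO_SCRATCH would otherwise surface later as a
# confusing path error.
scratch = os.environ.get("PANGEO_SCRATCH", "gs://example-scratch/user")
target = scratch + "/test-path.txt"
print(target)
```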

My impression is that the bucket either does not exist or is not configured with permissions that allow us to see it. I also tried the same code from staging and found the same result.

You may want to suggest that gcsfs provide more (optional) logging, and/or that its FileNotFoundError include its __cause__ for better understanding (at the cost of longer tracebacks).
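The __cause__ suggestion can be sketched with plain exception chaining. Note that GCSHttpError and _get_object below are hypothetical stand-ins, not real gcsfs names:

```python
# Sketch: if the library chained the low-level error into the
# FileNotFoundError it raises, the original failure would survive
# in the traceback as __cause__.

class GCSHttpError(Exception):
    """Stand-in for the low-level error the storage backend returns."""

def _get_object(path):
    # Pretend the API returned a 404 with useful detail.
    raise GCSHttpError("404 on b/bucket/o/%s" % path)

def open_for_read(path):
    try:
        return _get_object(path)
    except GCSHttpError as exc:
        # "raise ... from ..." preserves the original failure as __cause__.
        raise FileNotFoundError(path) from exc

try:
    open_for_read("rabernat/test-path.txt")
except FileNotFoundError as err:
    cause_name = type(err.__cause__).__name__

print(cause_name)  # -> GCSHttpError
```

With chaining, the traceback shows both the user-facing FileNotFoundError and the underlying HTTP failure that caused it.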

The existing traceback should tell you whether the error occurs while looking up the file info or while reading data - the two operations need different permissions.

I received the following update from Sarah about this issue:

It turns out there’s a mismatch between our Terraform and JupyterHub configs - so while a bucket did exist, it didn’t have the name that JupyterHub was expecting, so they missed one another. I’ve rectified this by manually creating a bucket with the expected naming convention and additionally added the Storage Object Creator role to the Service Account so the bucket is writable. The engineering team will schedule some development time to fix the config mismatch soon (this will affect all hubs/clusters, not just Pangeo).

@paigem - maybe you can check whether things are working for you now?

Yes, I can now save to the Scratch bucket! Thanks @rabernat et al!

Related question (apologies if this is a very basic one): is there a way to view which items are currently in my scratch folder? E.g. something like a unix ‘ls’ command. I use something like'path/to/my/bucket') to quickly see what’s in my own storage bucket, but this does not seem to work for the scratch bucket path (or perhaps I’m missing something?). It would be great to know if there is something similar for the scratch bucket. is exactly what you need. However, gcsfs caches listings, so if you have been writing to the bucket from other processes, the listing can be out of date - call gcs.invalidate_cache() to clear it.


Thanks @martindurant!