I’m working with @cgentemann on a project and we would like to write some converted files to
s3://pangeo-data-upload-oregon. However, we’re not sure how to get write access to this bucket. Is there a process to request access to this bucket (or a different bucket)? I was recently introduced to pangeo-forge, so that may also be an option.
Any insights would be very welcome.
I’m not sure about that specific bucket. There are some general questions to answer first:
- Do you just need a place to store temporary data? If so, consider using the Pangeo Scratch Bucket approach.
- Do you need to make your data public?
- How much data?
- Does your project have its own budget? (If the answer is yes, you should just create your own bucket.)
Thanks for asking about this @lsterzinger. See rabernat’s questions above to judge whether your dataset might be a good candidate for pangeo-forge/pangeo-forge-recipes, the Python library for building Pangeo Forge recipes.
s3://pangeo-data-upload-oregon is a relic from the past that we used to test some I/O from S3 when setting up the hub on AWS. I think you can read from it when on the hub, but you won’t be able to write to it.
As described in the Pangeo Cloud page of the Pangeo documentation, the recommended approach is to manage your own S3 storage and connect to it from one of the Pangeo hubs.
The only bucket you have write-access to by default is
s3://pangeo-scratch. So from a hub terminal you could do
aws s3 sync mydata s3://pangeo-scratch/lsterzinger/mydata. Note that data in that location is automatically deleted after 24 hours, so it is only good for testing. People have also used rclone (https://rclone.org) successfully to move data from places like Google Drive to s3://pangeo-scratch for testing.
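For anyone who prefers to do the scratch upload from a notebook rather than a terminal, here is a minimal Python sketch. It assumes s3fs is installed on the hub, that the hub's default credentials allow writes to the scratch bucket, and that the JUPYTERHUB_USER environment variable holds your username. The function names scratch_path and upload_directory are made up for illustration. Note that fs.put copies everything each time, so it is only roughly comparable to aws s3 sync.

```python
import os

SCRATCH_BUCKET = "s3://pangeo-scratch"


def scratch_path(name, user=None, environ=None):
    """Return the per-user scratch location for `name` (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    # Assumption: the hub exposes the username as JUPYTERHUB_USER.
    user = user or environ.get("JUPYTERHUB_USER")
    if not user:
        raise ValueError("pass user= or set JUPYTERHUB_USER")
    return f"{SCRATCH_BUCKET}/{user}/{name.strip('/')}"


def upload_directory(local_dir, name, user=None):
    """Copy a local directory into the scratch bucket (hypothetical helper)."""
    import s3fs  # imported lazily so scratch_path works without s3fs installed

    # No key/secret passed: this relies on the hub's default credentials.
    fs = s3fs.S3FileSystem()
    fs.put(local_dir, scratch_path(name, user), recursive=True)
```

Calling upload_directory("mydata", "mydata", user="lsterzinger") would target the same prefix as the CLI command above, and the data would still disappear after 24 hours.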
To create your own S3 bucket, you have a couple of options depending on timeframe:
- Fast track: set up a free tier account.
- Apply for research credits, with a turnaround of a couple of months.
Thanks for the info. Turns out we have our own AWS buckets available. I was able to connect to a bucket and upload data from my local machine, but when I generate a second IAM key pair and add it to Pangeo with
aws configure, I am not able to do the same.
I’m not even able to list the contents of a public bucket. I get the error:
An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied
Is there something specific to the AWS CLI configuration on Pangeo that I should know about? It’s weird that I’m able to do this on one machine and not the other.
edit: I created a new key pair and assigned it to both my local machine and my Pangeo instance (both using
aws configure). Once again, listing/uploading files works on my local machine but not on Pangeo
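One possible way to narrow this down, offered as a sketch rather than a diagnosis: the AWS CLI and boto3-based libraries look at certain environment variables before the credentials file that aws configure writes, so variables preset in the hub environment could shadow a new key pair. Whether this hub sets any of them is an assumption, not something confirmed in this thread. The helper name find_credential_overrides is made up for illustration.

```python
import os

# Environment variables that can change which credentials the AWS CLI and
# boto3-based libraries end up using, regardless of what `aws configure` wrote.
CREDENTIAL_ENV_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
    "AWS_PROFILE",
    "AWS_ROLE_ARN",
    "AWS_WEB_IDENTITY_TOKEN_FILE",
)


def find_credential_overrides(environ=None):
    """Return the names of credential-related variables that are set."""
    environ = os.environ if environ is None else environ
    return [name for name in CREDENTIAL_ENV_VARS if environ.get(name)]


if __name__ == "__main__":
    found = find_credential_overrides()
    if found:
        # Only names are printed, never values, to avoid leaking secrets.
        print("Set in this environment:", ", ".join(found))
    else:
        print("No credential-related AWS environment variables are set.")
```

If anything shows up here on the hub but not on the local machine, that difference is worth investigating. Running aws sts get-caller-identity in both places also shows which identity each environment actually resolves to.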
Are you passing your credentials to s3fs? Or are you expecting s3fs to automatically detect them?
Try explicitly passing the credentials, i.e.
fs = s3fs.S3FileSystem(key=key, secret=secret)
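A slightly fuller sketch of that suggestion, assuming s3fs is installed and that you keep the key pair in two environment variables instead of typing it into a notebook. The variable names MY_BUCKET_KEY and MY_BUCKET_SECRET and both function names are hypothetical, chosen so they do not collide with anything the hub may already set.

```python
import os


def s3_credentials_from_env(environ=None):
    """Build keyword arguments for s3fs.S3FileSystem (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    # Hypothetical variable names; use whatever you export on the hub.
    key = environ.get("MY_BUCKET_KEY")
    secret = environ.get("MY_BUCKET_SECRET")
    if not key or not secret:
        raise KeyError("set MY_BUCKET_KEY and MY_BUCKET_SECRET first")
    return {"key": key, "secret": secret}


def upload_file(local_path, remote_path, credentials):
    """Upload one file using explicitly passed credentials."""
    import s3fs  # imported lazily so the helper above works without s3fs

    # Passing key/secret explicitly bypasses whatever credentials the
    # environment would otherwise supply.
    fs = s3fs.S3FileSystem(**credentials)
    fs.put(local_path, remote_path)
```

For example, upload_file("file.txt", "s3://<bucket-name>/file.txt", s3_credentials_from_env()) would attempt the same upload as the CLI command discussed in this thread, with the bucket name left for you to fill in.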
I think that may be the problem. I was attempting to use a terminal environment and use the AWS CLI to upload a test file (
aws s3 cp <file.txt> s3://<bucket-name>/). Is that not supported in Pangeo? I understand the containerized nature of Pangeo instances makes things different than running on persistent machines.
I’ll try using s3fs with my credentials instead and see if that works
Also, if it’s a public bucket you need to explicitly use anonymous access, otherwise the hub credentials kick in. On a personal laptop you may be using AWS credentials with broader permissions (for example an AWS IAM user with admin access). On the hub, here are two examples of accessing a public bucket:
aws --no-sign-request s3 ls s3://sentinel-s1-rtc-indigo/ or with s3fs don’t forget
fs = s3fs.S3FileSystem(anon=True)
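Spelled out a little more, the anonymous listing might look like the sketch below. It assumes s3fs is installed and that the bucket allows anonymous listing; the function names are made up for illustration.

```python
def public_bucket_options():
    """Options that make s3fs send unsigned (anonymous) requests."""
    return {"anon": True}


def list_public_bucket(bucket, options=None):
    """List the top level of a public bucket without using any credentials."""
    import s3fs  # imported lazily so the options helper works without s3fs

    fs = s3fs.S3FileSystem(**(options or public_bucket_options()))
    return fs.ls(bucket)
```

For instance, list_public_bucket("sentinel-s1-rtc-indigo") is the s3fs counterpart of the aws --no-sign-request command above. Anonymous access only covers reading, so writes still need credentials passed explicitly.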
Okay, interesting. So the bucket I’m using is public access, but only for reading. I’m trying to write to it, is that not possible in Pangeo then? I’m an IAM admin for that bucket. Would write access then only be available through
s3fs with specifying
Yes. My comment above is about reading data (it’s unusual to have a bucket with public/anonymous write permissions). Buckets are non-public by default, and there are many configuration options for limiting read/write/list access; unfortunately, there is no standard configuration. But hopefully just passing your keys to s3fs works!
I have the bucket with public read but private write (of course) limited to those with IAM access. I should be able to do anything I need to via
fsspec. Thanks for the help!