Accessing the new Element84 S2 (L1C) STAC catalogue

Hey folks, I was excited to see the new Element84 Sentinel-2 STAC catalogue which now includes L1C data: https://www.element84.com/earth-search/. I was following this guide on loading data from the previous (v0) catalogue which works great, but when I upgrade this to try and work with the new v1 catalogue I’m getting Access Denied issues on compute. I’ve tried various AWS configs (configure_rio) but without joy so wondering if anyone else has taken a look yet!?

This is likely because all of the asset URLs are incorrect for the L1C Items – the bucket was accidentally set to the sentinel-s2-l2a bucket instead of the sentinel-s2-l1c bucket. For example, this asset href on the first result I got: “s3://sentinel-s2-l2a/tiles/33/X/WB/2023/6/8/0/B02.jp2” should be “s3://sentinel-s2-l1c/tiles/33/X/WB/2023/6/8/0/B02.jp2”. The workaround is to rewrite the URL to fix the bucket name if you can. We’re working on fixing this, but it’s likely several weeks out at least.
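
For anyone who wants to script that workaround, here is a minimal sketch using only the standard library. The function name `fix_l1c_href` is made up for illustration; the two bucket names are the ones quoted above. It rewrites only the bucket part of an `s3://` URL and leaves every other href untouched, which is a bit safer than a blanket string replace.

```python
from urllib.parse import urlsplit, urlunsplit

# Bucket names as quoted in the post above.
WRONG_BUCKET = "sentinel-s2-l2a"
RIGHT_BUCKET = "sentinel-s2-l1c"


def fix_l1c_href(href: str) -> str:
    """Return href with the L2A bucket swapped for the L1C bucket.

    Hypothetical helper: only s3:// URLs whose bucket is exactly
    WRONG_BUCKET are rewritten; anything else is returned unchanged.
    """
    parts = urlsplit(href)
    if parts.scheme == "s3" and parts.netloc == WRONG_BUCKET:
        return urlunsplit(parts._replace(netloc=RIGHT_BUCKET))
    return href


print(fix_l1c_href("s3://sentinel-s2-l2a/tiles/33/X/WB/2023/6/8/0/B02.jp2"))
```

This should only be applied to Items from the L1C collection, since genuine L2A Items are meant to point at the L2A bucket.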

Here’s an issue that is likely part of the OP’s problem: Collection sentinel-2-l2c items have asset hrefs that reference sentinel-s2-l2a bucket · Issue #3 · Element84/earth-search · GitHub

(note: @philvarner pointed me to this issue, so I just thought I’d link it in to this conversation as well)

Ah gotcha, yep that looks to be the likely culprit! Thanks for that. Interesting that the earthsearch:s3_path is still listed as l1c though?

[EDIT] I’m not in much of a rush on this, so I could just wait for the fix and maybe play with L2A for now. I did try a simple string-replace approach, but I’m still getting Access Denied:

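import rasterio
import rasterio.plot  # the plot submodule must be imported explicitly

# S2_items comes from the earlier STAC search; swap the bucket name in the href
href = S2_items[0]["assets"]["red"]["href"].replace("l2a", "l1c")
with rasterio.open(href) as dataset:
    rasterio.plot.show(dataset)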

Hi @akpetty! Here’s some code that might help. Some of my team members were working on L1C data recently (you should know Lilly :wink:) and were facing the same issue, so you’re in luck!

First, import some libraries

import os

import pystac_client
import rioxarray
import stackstac

Next, set up the STAC query and patch the href URLs of the STAC assets. This is an example using stackstac:

client = pystac_client.Client.open(url="https://earth-search.aws.element84.com/v1/")
search = client.search(
    collections="sentinel-2-l1c",
    bbox=[-20.7, 64.5, -19.5, 64.8],  # xmin, ymin, xmax, ymax
    datetime="2023-02-01/2023-02-28",
)
stac_items = search.items()
stac_item = next(stac_items)  # <Item id=S2B_27WWM_20230228_0_L1C>

for stac_asset in stac_item.assets.values():
    stac_asset.href = stac_asset.href.replace(
        "s3://sentinel-s2-l2a/", "s3://sentinel-s2-l1c/"
    )

At this point, you should be able to read the metadata (even without authentication).

dataarray = stackstac.stack(items=stac_item, dtype="float16", resolution=10)
print(dataarray)

produces

<xarray.DataArray 'stackstac-40b6166a40b446f90241b2283a6022d7' (time: 1,
                                                                band: 14,
                                                                y: 10980,
                                                                x: 10980)>
dask.array<fetch_raster_window, shape=(1, 14, 10980, 10980), dtype=float16, chunksize=(1, 1, 1024, 1024), chunktype=numpy.ndarray>
Coordinates: (12/39)
  * time                              (time) datetime64[ns] 2023-02-28T13:03:...
    id                                (time) <U24 'S2B_27WWM_20230228_0_L1C'
  * band                              (band) <U8 'blue' 'cirrus' ... 'visual'
  * x                                 (x) float64 5e+05 5e+05 ... 6.098e+05
  * y                                 (y) float64 7.2e+06 7.2e+06 ... 7.09e+06
    processing:software               object {'sentinel2-to-stac': '0.1.0'}
    ...                                ...
    raster:bands                      (band) object [{'nodata': 0, 'data_type...
    gsd                               (band) object 10 60 60 10 ... 20 20 None
    common_name                       (band) object 'blue' 'cirrus' ... None
    center_wavelength                 (band) object 0.49 1.3735 ... 2.19 None
    full_width_half_max               (band) object 0.098 0.075 ... 0.242 None
    epsg                              int64 32627
Attributes:
    spec:        RasterSpec(epsg=32627, bounds=(499980, 7090200, 609780, 7200...
    crs:         epsg:32627
    transform:   | 10.00, 0.00, 499980.00|\n| 0.00,-10.00, 7200000.00|\n| 0.0...
    resolution:  10

Now this is where it gets tricky: you’ll need to set up AWS requester pays somehow. There are probably a couple of ways, but the way I have it set up is to edit the ~/.aws/credentials file so it has three lines like this:

[default]
aws_access_key_id = ABCDEFGHIJKLMNOPQRST
aws_secret_access_key = MnOpQrStUvWxYz1a2B3c4D5e6f7G8h9IjKlMnOpQ

Now you can set some environment variables and plot the Sentinel L1C data:

os.environ["AWS_REQUEST_PAYER"] = "requester"
os.environ["AWS_PROFILE"] = "default"
da_rgb = dataarray.sel(band=["red", "green", "blue"]).squeeze()[:, :100, :100]  # get subset
da_rgb.astype("int").plot.imshow(rgb="band", robust=True)

produces

[Image: RGB plot of the 100 × 100 pixel Sentinel-2 L1C subset]

Notes:

  • I had this running on the CryoCloud Hub at AWS us-west-2, might need to pip install stackstac first if running there too.
  • According to https://element84.com/blog/introducing-earth-search-v1-new-datasets-now-available, the sentinel-s2-l1c collection is now named sentinel-2-l1c, but I couldn’t get the new one to work for some reason, getting RuntimeError: Error opening 's3://sentinel-2-l1c/tiles/27/W/WM/2023/2/28/0/B04.jp2': RasterioIOError("'/vsis3/sentinel-2-l1c/tiles/27/W/WM/2023/2/28/0/B04.jp2' does not exist in the file system, and is not recognized as a supported dataset name."). Might be some delay in the renaming?

Thanks @weiji14, I had a feeling I should have reached out to you directly! I copied exactly your example and am also running on CryoCloud but unfortunately I am still getting Access Denied issues, so maybe there’s something else going on my end. It seems like my credentials are being used now at the very least (when I tried a fake ID I got a different error!) so I’ll keep exploring…

Yeah, we should catch up sometime :smiley: Anyway, on the Access Denied part, I think the CryoCloud Hub has set up requester pays for the USGS Landsat S3 bucket (see Requester pays fix needed · Issue #52 · CryoInTheCloud/hub-image · GitHub), but you may need a different access key (i.e. your personal or institutional one) for this particular Element84 bucket. The documentation around this isn’t particularly good, but I found Downloading objects in Requester Pays buckets - Amazon Simple Storage Service, which might be helpful if you want to check on the CLI first that your credentials work.
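
The same credentials check can be done from Python, as a rough sketch. This assumes boto3 is installed (it is imported lazily so the URL helper works without it); `split_s3_url` and `can_read_requester_pays` are illustrative names, not part of any library. It sends a single HEAD request with the requester-pays flag, so it tests the key without going through rasterio or stackstac.

```python
from urllib.parse import urlsplit


def split_s3_url(href):
    """Split 's3://bucket/key' into (bucket, key). Hypothetical helper."""
    parts = urlsplit(href)
    if parts.scheme != "s3" or not parts.netloc:
        raise ValueError(f"not an s3:// URL: {href!r}")
    return parts.netloc, parts.path.lstrip("/")


def can_read_requester_pays(href, profile_name=None):
    """Return True if a requester-pays HEAD request on href succeeds."""
    import boto3  # third-party; assumed to be installed
    from botocore.exceptions import ClientError

    bucket, key = split_s3_url(href)
    s3 = boto3.Session(profile_name=profile_name).client("s3")
    try:
        # RequestPayer="requester" acknowledges that your account is billed
        s3.head_object(Bucket=bucket, Key=key, RequestPayer="requester")
    except ClientError as err:
        print(f"Request failed: {err}")
        return False
    return True


# Example call (needs network access and valid credentials):
# can_read_requester_pays("s3://sentinel-s2-l1c/tiles/27/W/WM/2023/2/28/0/B04.jp2")
```

If this returns False with one access key and True with another, the problem is with the key or account permissions rather than the STAC or rasterio side.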

OK finally got it working hurrah, had to create and use a new personal access key instead of the access key created on my institutional account, so maybe our NASA admins have blocked requester_pay buckets (or I messed up something, I’ve pinged them an email). Anyway thanks so much for your help @weiji14, couldn’t have got there without that input. Excited to show you what I’m working on once I’ve polished it up.

Definitely agree people have created some amazing resources out there to explain the stac side of all this, but that final AWS link is pretty unclear.
