Pystac client query to get cloud masked data

I hope someone can give me a quick pointer here. I’m using the pystac_client to query the Element 84 AWS stac API (and still finding my way around the API). Okay, fine. I see catalog “sentinel-s2-l2a-cogs” from which I’ve pulled an example tile, and I see catalog “sentinel-s2-l1c”, where I understand the cloud mask data lives. Do I need to pull data from both, in two operations, and merge to then filter the L2 product pixels on the cloud mask? Or is there a convenient way to do a query (one data “pull”) on the L2 product as I have been doing but specify the cloud mask product as a filter to only get the non-cloudy pixels? If that makes sense.

Apologies for the haste of this post, I’m happy with an “RTFM” answer with a link to the FM, but I have a 6 month child howling and wanting a feed and could do with a pointer to get me starting first thing in the morning back at work!

1 Like


Okay, so without a fractious child on my lap, I believe I’ve found the cloud mask I’d want is actually in the SCL asset. From an example I was playing with, I’ve been using stac_load with a stac_cfg of

cfg = """---
      data_type: uint16
      nodata: 0
      unit: '1'
      data_type: uint8
      nodata: 0
      unit: '1'
      data_type: uint8
      nodata: 0
      unit: '1'
  aliases:  # Alias -> Canonical Name
    red: B04
    green: B03
    blue: B02
  warnings: ignore # Disable warnings about duplicate common names
cfg = yaml.load(cfg, Loader=yaml.SafeLoader)

This gives me a red, green, and blue variable in my xarray Dataset.

The docs say, for stac_cfg

Mostly used to specify “missing” metadata like pixel data types.

and I’ve not found a very comprehensive documentation for creating this config.


Saving the best til last, I just added the SCL asset to the list of bands to extract, thus

xx = stac_load(
    bands=("red", "green", "blue", "SCL"),
    resolution=10 * zoom,
    chunks={},  # <-- use Dask

and now have an extra variable, SCL, in my Dataset. Yay. And first glance suggests it contains features that look rather like clouds. So this is making progress.

Perhaps my question now boils down to ('cos I’m going to bed now):
Is the numeric encoding of SCL contained in a convenient attribute in the data somewhere, or do I need to refer back to the Sentinel docs to know what value corresponds to cloud free, cirrus etc?