Accessing Metadata as Variables in Xarray Datasets Using odc-stac?

Hi everyone,

I’m currently working with the odc-stac library to access and process geospatial data, particularly Sentinel-1 RTC datasets. I’ve been able to load the data successfully into an Xarray Dataset, but I’m interested in retrieving and incorporating metadata associated with the images—such as orbit information, instrument mode, acquisition date, and other relevant details—directly as variables within the Xarray Dataset.

So far, I haven’t found a clear way to achieve this using odc-stac. Ideally, I would like these metadata elements to be loaded alongside the data itself, making it easier to analyze and utilize them programmatically without having to manually extract or merge this information from separate sources.

My questions are:

  • Is there a built-in way in odc-stac to load these metadata fields as variables within the Xarray Dataset?
  • If not, what would be the best approach to access and integrate this kind of metadata? Are there recommended workarounds, such as modifying the STAC item properties or using additional libraries?
  • Any tips on handling metadata from Sentinel-1 RTC or similar datasets would be greatly appreciated!

I’ve gone through the documentation, but the information on metadata handling seems limited, so any pointers or examples would be super helpful.

Thanks in advance for your guidance!

There is another package similar to odc-stac, called stackstac, which should load the STAC Item properties without issues.
Here is an interesting discussion about the differences between the two packages, with a great breakdown by @kirill.kzb (maintainer of odc-stac) in this comment. It also includes the following section on this topic:

  • Access to the original STAC metadata
    • odc-stac doesn’t really expose any of that, and there is a fundamental design choice that makes it impossible to do in a general case, but we can certainly add it for special case data loading in the future.
    • stackstac exposes all the metadata fields in the returned xarray, combined with delayed computation enabled by Dask this can be very handy as you can leverage all the xarray conveniences to filter out unwanted data.
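
A rough sketch of the kind of filtering that last point enables. It assumes a DataArray that, like stackstac's output, carries a per-item property such as `sat:orbit_state` as a coordinate along `time`. The data and the helper name `select_by_property` are hypothetical, made up for illustration. Because the coordinate lives in memory, the selection should not need to compute any Dask-backed pixel data.

```python
import pandas as pd
import xarray as xr


def select_by_property(da: xr.DataArray, name: str, value) -> xr.DataArray:
    """Keep only the time slices whose 1-D coordinate `name` equals `value` (hypothetical helper)."""
    # Elementwise comparison of the per-time coordinate, as a boolean NumPy array along "time".
    mask = (da[name] == value).values
    # Positional boolean selection along the time dimension.
    return da.isel(time=mask)


# Hypothetical stand-in for a stackstac-style result: one value per acquisition,
# with a STAC property attached as a non-dimension coordinate along "time".
times = pd.to_datetime(["2023-01-01", "2023-01-07", "2023-01-13"])
da = xr.DataArray(
    [1.0, 2.0, 3.0],
    dims="time",
    coords={
        "time": times,
        "sat:orbit_state": ("time", ["ascending", "descending", "ascending"]),
    },
)

ascending = select_by_property(da, "sat:orbit_state", "ascending")
print(ascending["time"].values)
```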

Thank you very much for the response! To be honest, I have visited that discussion quite a few times and never noticed that point. It’s certainly a problem for me, because the associated metadata is important. I would like to hear more from @kirill.kzb about the decision not to expose it.

@Fran_Martin You can add STAC metadata fields via ds.assign_coords; just take care with any grouping that you’re doing with odc.stac.load.

The basic approach is:

  1. Create a dataframe from your STAC items.
  2. Match the time index of the dataframe with the xr.Dataset time index from odc.stac.
  3. Optionally, do groupby operations on the dataframe.
  4. Use ds.assign_coords to apply various metadata fields as coordinates.
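
A minimal sketch of those four steps. It assumes the items are available as STAC Item dicts (for example via pystac's `Item.to_dict()`) and that `ds` was loaded with `groupby="time"`, so each time slice corresponds to one distinct item timestamp. The helper names `items_to_dataframe` and `attach_metadata` are hypothetical. The property names (`sat:orbit_state`, `sat:relative_orbit`, from the STAC sat extension) are only examples, so check which fields your collection's items actually carry. If you group differently (e.g. `groupby="solar_day"`), the values on `ds.time` may not equal the item datetimes exactly, and step 2 would need a looser matching rule.

```python
import pandas as pd
import xarray as xr


def items_to_dataframe(items, fields):
    """Step 1: one row per STAC Item dict, indexed by acquisition time (hypothetical helper)."""
    records = []
    for item in items:
        props = item["properties"]
        row = {"id": item["id"], "time": props["datetime"]}
        # Missing properties become None rather than raising.
        row.update({field: props.get(field) for field in fields})
        records.append(row)
    df = pd.DataFrame.from_records(records)
    # STAC datetimes are UTC ISO 8601 strings; drop the timezone so they can be
    # compared with the timezone-naive datetime64 values on the xarray time axis.
    df["time"] = pd.to_datetime(df["time"], utc=True).dt.tz_localize(None)
    return df.set_index("time").sort_index()


def attach_metadata(ds: xr.Dataset, df: pd.DataFrame, fields) -> xr.Dataset:
    """Steps 2-4: align the dataframe with ds.time and attach fields as coordinates."""
    # Step 3 (optional): collapse items that share a timestamp into one row,
    # keeping the first non-null value of each column.
    grouped = df.groupby(level="time").first()
    # Step 2: align rows with the dataset's time index (exact match; NaN where absent).
    aligned = grouped.reindex(pd.DatetimeIndex(ds["time"].values))
    # Step 4: attach each field as a non-dimension coordinate along "time".
    return ds.assign_coords(
        {field: ("time", aligned[field].to_numpy()) for field in fields}
    )
```

With pystac items this might be called as `attach_metadata(ds, items_to_dataframe([i.to_dict() for i in items], fields), fields)`. Taking `first()` per timestamp assumes the chosen fields are constant within a group; for fields that vary between the grouped items (such as item IDs) you would need a different aggregation.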

If you’re working with RTC, check out this repository, in particular this function:

Here is another example with Landsat: CoCoLessons/10_rioxarray.ipynb (CodeToCommunicate/CoCoLessons on GitHub)

Hello @scottyhq,

First of all, I really appreciate your explanation. I managed to get all the metadata I need by following the exact steps in your how-to, so in case it helps anyone: it works 🙂