STAC and Earth Systems datasets

The datasets that motivated this are now cataloged on our site. A few links (there's a short snippet after this list for fetching one of them programmatically):

  1. STAC Collection for Hawaii Daymet at daily frequency: https://planetarycomputer.microsoft.com/api/stac/v1/collections/daymet-daily-hi
  2. The HTML summary, generated from that STAC collection: Planetary Computer
  3. We did the same for TerraClimate (STAC, HTML).
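
To poke at one of these programmatically, something like the snippet below should work. It's just a sketch: `requests` is one convenient way to fetch the JSON, and the `cube:dimensions` / `cube:variables` keys assume the collection carries datacube-style metadata (which is what the HTML summary is built from), so the `.get()` calls guard against them being absent.

>>> import requests
>>> url = "https://planetarycomputer.microsoft.com/api/stac/v1/collections/daymet-daily-hi"
>>> collection = requests.get(url).json()  # plain JSON for the STAC Collection
>>> collection["id"], collection["license"]
>>> collection.get("cube:dimensions", {}).keys()  # dimensions duplicated from the Zarr store, if present
>>> collection.get("cube:variables", {}).keys()   # per-variable metadata, if present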

A few things to note:

  • This models the datasets as Collections. My primary use case was generating STAC objects that could be cataloged at the same level as Landsat, Sentinel, etc. on the Planetary Computer.
  • The STAC collections were generated using xstac, which needs to be moved out of my personal GitHub but should be somewhat functional.
  • The STAC collection includes (i.e., duplicates) much of the metadata available in the Zarr store, like the dimensions, coordinates, shapes, and chunks. This is all used to build the HTML summary.
  • We noticed a need for communicating fsspec/xarray-specific things, which we’ve written up as a small STAC extension called xarray-assets. The usage from a typical Pangeo stack would be something like:
>>> import fsspec, xarray, pystac
>>> collection = pystac.read_file("examples/collection.json")
>>> asset = collection.assets["example"]
>>> asset.media_type  # use this to choose `xr.open_zarr` vs. `xr.open_dataset`
'application/vnd+zarr'
>>> store = fsspec.get_mapper(asset.href, **asset.extra_fields["xarray:storage_options"])
>>> ds = xarray.open_zarr(store, **asset.extra_fields["xarray:open_kwargs"])
>>> ds

(Or people would just use intake-stac, which would use this pattern internally.) The main point is that you can go from STAC → Dataset without having to know anything other than the URL of the Collection and the name of the asset.
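
For reference, here's a rough sketch of what an asset carrying those fields might look like when building the Collection with pystac. The href, storage options, and asset key are made up for illustration; only the `xarray:storage_options` / `xarray:open_kwargs` field names and the Zarr media type come from the example above.

>>> import pystac
>>> asset = pystac.Asset(
...     href="az://some-container/some-dataset.zarr",  # hypothetical Zarr store location
...     media_type="application/vnd+zarr",
...     roles=["data"],
...     extra_fields={
...         "xarray:storage_options": {"account_name": "example-account"},  # forwarded to fsspec.get_mapper
...         "xarray:open_kwargs": {"consolidated": True},                   # forwarded to xarray.open_zarr
...     },
... )
>>> collection.assets["zarr-example"] = asset  # attach it, mirroring `collection.assets["example"]` above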
