The datasets that motivated this are now cataloged on our site. A few links:
- STAC Collection for Hawaii Daymet at daily frequency: https://planetarycomputer.microsoft.com/api/stac/v1/collections/daymet-daily-hi
- The HTML summary, generated from that STAC collection: Planetary Computer
- We did the same for TerraClimate (STAC, HTML).
A few things to note:
- This models the datasets as Collections. My primary use case was generating STAC objects that could be cataloged at the same level as Landsat, Sentinel, etc. at the Planetary Computer.
- The STAC collections were generated using xstac, which needs to be moved out of my personal GitHub but should be somewhat functional.
- The STAC collection includes (duplicates) much of the metadata available in the Zarr store, such as the dimensions, coordinates, shapes, and chunks. This is all used to build the HTML summary.
- We noticed a need for communicating fsspec / xarray-specific things, which we’ve written up as a small STAC extension called xarray-assets. The usage from a typical Pangeo stack would be something like:
>>> import fsspec, xarray, pystac
>>> collection = pystac.read_file("examples/collection.json")
>>> asset = collection.assets["example"]
>>> asset.media_type  # use this to choose `xarray.open_zarr` vs. `xarray.open_dataset`
'application/vnd+zarr'
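>>> # pystac 1.0+ exposes the extension fields as `asset.extra_fields` (older releases used `asset.properties`)
>>> store = fsspec.get_mapper(asset.href, **asset.extra_fields["xarray:storage_options"])
>>> ds = xarray.open_zarr(store, **asset.extra_fields["xarray:open_kwargs"])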
>>> ds
(or people would just use intake-stac, which would use this pattern internally). The main point is you can go from STAC → DataArray without having to know anything other than the URL to the Collection and the name of the asset.
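If you wanted to wrap that pattern in a helper, a minimal sketch might look like the following. It works on the asset's plain JSON dict rather than a pystac object, and the names `plan_open`, `open_asset`, and the example href and storage options are all made up for illustration; they are not part of xarray-assets, pystac, or intake-stac. The non-Zarr branch is an assumption about how one might fall back to `xarray.open_dataset`, not something the extension prescribes.

```python
ZARR_MEDIA_TYPE = "application/vnd+zarr"  # media type shown in the session above


def plan_open(asset: dict) -> dict:
    """Collect what is needed to open a STAC asset that uses xarray-assets fields."""
    # Choose the xarray opener from the asset's media type ("type" in STAC JSON).
    opener = "open_zarr" if asset.get("type") == ZARR_MEDIA_TYPE else "open_dataset"
    return {
        "opener": opener,
        "href": asset["href"],
        # Both extension fields are treated as optional here (an assumption).
        "storage_options": asset.get("xarray:storage_options", {}),
        "open_kwargs": asset.get("xarray:open_kwargs", {}),
    }


def open_asset(asset: dict):
    """Open the asset with xarray (hypothetical helper; needs fsspec and xarray)."""
    # Imported lazily so that plan_open stays usable without these packages.
    import fsspec
    import xarray

    plan = plan_open(asset)
    if plan["opener"] == "open_zarr":
        store = fsspec.get_mapper(plan["href"], **plan["storage_options"])
        return xarray.open_zarr(store, **plan["open_kwargs"])
    # Assumed fallback: hand xarray a file-like object opened through fsspec.
    fileobj = fsspec.open(plan["href"], **plan["storage_options"]).open()
    return xarray.open_dataset(fileobj, **plan["open_kwargs"])


# Hypothetical asset, shaped like one entry of a Collection's "assets" mapping.
example_asset = {
    "href": "abfs://example-container/example.zarr",
    "type": "application/vnd+zarr",
    "xarray:storage_options": {"account_name": "exampleaccount"},
    "xarray:open_kwargs": {"consolidated": True},
}

plan = plan_open(example_asset)
print(plan["opener"], plan["open_kwargs"])
```

Splitting the decision (`plan_open`) from the I/O (`open_asset`) keeps the first half easy to check without network access or cloud credentials.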