Creating searchable STAC catalog from COGs in S3

Hi everyone,

I’m trying to create a STAC catalog to facilitate searching some COGs in S3.
The data in question are 30 m Copernicus DEM GeoTIFs. I’ve been following the tutorial here. Basically, I want to be able to query this collection of data and load just the area I want. The rub is that it’s just a collection of GeoTIFFs, not a STAC catalog. Okay, I’ll create one that references them, I thought…

I saved the catalog using

catalog.normalize_and_save(os.path.join('/home/guy/data/Agreed/Copernicus/DEM/', 'stac'), catalog_type=pystac.CatalogType.ABSOLUTE_PUBLISHED)

(I also tried RELATIVE_PUBLISHED - I must admit I’m not really sure of the meanings of these.) This gave me a directory tree full of .json files, with a ‘catalog.json’ in the ‘stac’ root. I load the catalog with

catalog = pystac_client.Client.open('/home/guy/data/Agreed/Copernicus/DEM/stac/catalog.json')

and it seems to work (does something). If I get the collections in this catalog:

for c in catalog.get_all_collections():
    print(c)

I get:

<CollectionClient id=dem-30m>

which makes sense, because I put one collection in that catalog with that id.

Using code I’ve used before to run a query with an image’s bounds:

query =
    collections=["dem-30m"],
    limit=100,
    bbox=bbox
)

I get:

NotImplementedError                       Traceback (most recent call last)
Cell In [82], line 1
----> 1 query =
      2     collections=["dem-30m"],
      3     limit=100,
      4     bbox=bbox
      5 )

File ~/anaconda3/envs/geocube/lib/python3.10/site-packages/pystac_client/, in, method, max_items, limit, ids, collections, bbox, intersects, datetime, query, filter, filter_lang, sortby, fields)
    321 """Query the ``/search`` endpoint using the given parameters.
    323 This method returns an :class:`~pystac_client.ItemSearch` instance. See that
    419         a ``"rel"`` type of ``"search"``.
    420 """
    421 if not self._conforms_to(ConformanceClasses.ITEM_SEARCH):
--> 422     raise NotImplementedError(
    423         "This catalog does not support search because it "
    424         f'does not conform to "{ConformanceClasses.ITEM_SEARCH}"'
    425     )
    426 search_link = self.get_search_link()
    427 if search_link:

NotImplementedError: This catalog does not support search because it does not conform to "ConformanceClasses.ITEM_SEARCH"

(Note, I get the same error if I deliberately mistype the collection id, whereas I was expecting more of a lookup/key error.)

Could anyone give me some pointers? I want to create a catalog.json that references these GeoTIFFs and allows me to query and then grab just the area of interest. In my local directory, I have a tree that refers to all the various files and the json in each does seem to contain metadata such as spatial extent etc, so in theory the information should be there that supports a spatial query…


I might be wrong, but as far as I remember static catalogs cannot be searched using pystac_client (though there might be a different project that helps you do that).
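You can, however, still filter a static catalog client-side by walking its items with plain pystac and doing the bbox test yourself. A rough sketch (the path is illustrative, and this naive overlap check assumes simple [minx, miny, maxx, maxy] boxes with no antimeridian handling):

```python
def bboxes_intersect(a, b):
    # True if two [minx, miny, maxx, maxy] boxes overlap.
    # Naive check: no antimeridian or CRS handling.
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

# Walking the static catalog (illustrative path, requires pystac):
# import pystac
# catalog = pystac.Catalog.from_file('/home/guy/data/Agreed/Copernicus/DEM/stac/catalog.json')
# bbox = [-1.0, 50.0, 0.0, 51.0]
# hits = [item for item in catalog.get_all_items()
#         if item.bbox and bboxes_intersect(item.bbox, bbox)]
```

This gets slow for large catalogs, which is part of why a database-backed API is the usual answer.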

I’m certainly no expert, but what I eventually ended up doing for experimentation purposes (deployment is a different issue, but I don’t have any experience with that) is use stac-fastapi’s docker-compose file to spin up a suitable PostgreSQL database and a STAC endpoint, and use pypgstac’s command line tool to ingest the static catalog.

I might be wrong, but as far as I remember static catalogs cannot be searched using pystac_client (though there might be a different project that helps you do that).

That’s correct.

If you can load your STAC items into geopandas, the stac-geopandas.ipynb notebook has some methods that mimic the spatio-temporal queries using geopandas APIs. Depending on the size and how often the data change, you might want to save your STAC items as a single ItemCollection.
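Saving a single ItemCollection is just writing the items out as one GeoJSON FeatureCollection. A minimal sketch with a hand-made placeholder item (real items would come from iterating over your catalog; the id and bbox here are made up):

```python
import json

# Placeholder STAC item dicts; in practice these would come from
# the static catalog, e.g. item.to_dict() in pystac.
items = [
    {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": "dem-tile-0",               # made-up id
        "bbox": [-1.0, 50.0, 0.0, 51.0],  # made-up extent
        "geometry": None,
        "properties": {},
        "assets": {},
        "links": [],
    }
]

# An ItemCollection is a GeoJSON FeatureCollection of STAC items.
item_collection = {"type": "FeatureCollection", "features": items}
with open("dem-30m-items.json", "w") as f:
    json.dump(item_collection, f)
```

With real geometries filled in, geopandas can read that file straight into a GeoDataFrame for the spatial queries mentioned above.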

@Guy_Maskall depending on your goal, we may have sort of done this already for you.

That is, we’ve derived a HAND dataset from the AWS Open Data copy of the Copernicus GLO-30 DEM, and put a STAC catalog together for it, and we included the link to the source DEM tiles as a related link item. E.g.:

will have:

{
  "id": "Copernicus_DSM_COG_10_S90_00_W180_00_HAND",
  "bbox": [...],
  "type": "Feature",
  "links": [
    {
      "rel": "related",
      "href": "",
      "type": "image/tiff; application=geotiff",
      "title": "GLO-30 Public Copernicus Digital Elevation Model GeoTIFF used as input to create this HAND GeoTIFF"
    }
  ],
  "assets": {
    "data": {...}
  },
  "geometry": {...},
  "collection": "glo-30-hand",
  "properties": {...},
  "stac_version": "1.0.0",
  "stac_extensions": []
}

The HAND tiles are all pixel-aligned to their corresponding DEM tiles and have the exact same coverage, so geo-searches should work the same; you’d just grab the related link instead of the assets.

The collection is here:
And you can view it via the STAC browser here:

All the code we used to make the HAND collection is here:

And we might be interested in also just providing a GLO-30 DEM collection as well if that’s helpful.

Oh, I missed that this was for Copernicus DEM. If it’s helpful, we host that at Planetary Computer. @Guy_Maskall if you really want to use the COGs from AWS (not the ones from Azure) you could in theory query the Planetary Computer’s STAC API to get STAC items matching your query, and then rewrite the URLs to point to the ones you want to access from AWS.
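The URL rewrite itself can be a simple prefix swap. Both prefixes below are assumptions (I haven’t checked that the Azure and AWS copies share the same key layout), so verify against real asset hrefs first:

```python
# Hypothetical prefixes -- verify before use. "copernicus-dem-30m" is the
# AWS Open Data bucket name for the 30 m DEM; the Azure host/container
# here is a placeholder.
AZURE_PREFIX = ""
AWS_PREFIX = ""

def rewrite_to_aws(href: str) -> str:
    """Point a Planetary Computer asset href at the AWS copy instead."""
    if href.startswith(AZURE_PREFIX):
        return AWS_PREFIX + href[len(AZURE_PREFIX):]
    return href  # leave non-matching hrefs untouched
```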

@jhkennedy in case you decide to catalog GLO-30, the stactools package we used is at GitHub - stactools-packages/cop-dem: stactools package for working with Copernicus DEM data.


Thanks for the replies, everyone. I love this community.

So, in a hand-wavy way, whilst you can create a STAC catalog in the form of a catalog.json file, this is a static catalog and doesn’t support such spatial searching. To do that, you’d need to implement something that traversed the catalog to find the data of interest. But if the catalog is behind a hosted API somewhere, the backend on the host does this. Is that roughly correct? If so, what are examples of the use case for such static catalogs? My primary use case and interest is the ability to load data from areas of interest from COGs. I admit I still need to properly read the documentation on STAC.

Bonus beer for anyone who points me at hosted DTM COGs built upon Copernicus data. :wink:


That’s right. In the Planetary Computer’s case, we use stac-fastapi for the webserver and pgstac / PostgreSQL / PostGIS for the database backend. But you can swap in whatever webserver or database you want. The important thing is that the user submits a query (following the STAC API specification) and the backend does the actual work to find the matching items.
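To make the split concrete: the client’s whole job is to serialize your parameters into a small JSON body and POST it to the API’s /search endpoint; the server does the spatial work. A sketch of that body (field names follow the STAC API item-search spec):

```python
def build_search_body(collections, bbox, limit=100):
    # Minimal STAC API /search request body (item-search spec field names).
    return {"collections": collections, "bbox": bbox, "limit": limit}

body = build_search_body(["dem-30m"], [-1.0, 50.0, 0.0, 51.0])
# pystac_client assembles an equivalent payload under the hood when you call
#, bbox=bbox, limit=100).
```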
