Pangeo Showcase: "High-performance Python STAC tooling, backed by Rust" (Feb 5, 2025)

maxrjones · January 31, 2025, 12:46am

Title: “High-performance Python STAC tooling, backed by Rust”
Invited Speaker: Pete Gadomski (ORCID: 0000-0003-4877-7217)
When: Wednesday, February 05, 2025 at 12 PM EST
Where: Launch Meeting - Zoom
Abstract:

The SpatioTemporal Asset Catalog (STAC) specification is an open, community-developed specification that enables large-scale, distributed search and discovery of geospatial assets. Part of the success of STAC has been due to its community-built tooling, written mostly in Python and JavaScript, that was developed in tandem with the specification itself. As the specification and its usage have matured, we’ve seen the need to improve the software tooling ecosystem both through direct feature work on the existing libraries and by creating new libraries to cover new use cases. In this talk, I’ll walk through the existing Python STAC ecosystem and showcase new developments, including stac-geoparquet innovations, STAC API queries using DuckDB, and cloud-storage-agnostic access for STAC and its assets. Much of this new tooling is written in Rust and exposed with Python bindings, so I’ll talk a bit about how that works, the benefits, and the drawbacks. Finally, I’ll make some not-so-bold predictions on where I think the STAC ecosystem might be headed in the next few years, and talk a bit about the relationship between STAC and other open specifications that are heavily used in the scientific geospatial community, specifically Zarr.

Agenda:

~15 minutes - Showcase presentation
10 - 30 minutes - Discussion
15 - 30 minutes - Community check-in

thwllms · February 3, 2025, 6:51pm

@maxrjones will this be recorded? Would love to join but I’ve got a conflict.

maxrjones · February 3, 2025, 7:21pm

Yes, it will be recorded and uploaded to YouTube within a week.

maxrjones · February 6, 2025, 3:47pm

@thwllms the link to the YouTube recording is now available at the top of the thread.

Michael_Sumner · February 8, 2025, 2:19am

thanks! excellent and very interesting

sotosoul · February 11, 2025, 12:08pm

Awesome presentation!

I’m particularly interested in the STAC-FastAPI-GeoParquet stack, hoping that it will eliminate the need to maintain (and pay for…) a database. I’m curious how well it will scale (I’m thinking of spatially querying a collection of (tens of) billions of Items – Postgres does it well enough), and how it will work with appending or deleting Items.

When spatially querying one of our APIs that uses GeoParquet as its storage backend, the arbitrary query geometry is first intersected against a known, predefined grid that has been loaded into memory as a GeoDataFrame. The relevant grid tile IDs are then used to query the GeoParquet store, and a second intersects operation is run, this time against the actual GeoParquet dataset.

This approach requires that every operation respects the grid, which can be a good thing and a bad thing.
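For readers who want to see the shape of that two-stage lookup, here is a minimal pure-Python sketch. It is not our actual implementation: the 10-degree grid, the `ITEMS` list, and every function name are hypothetical, and plain bounding boxes stand in for the real geometries and the GeoDataFrame grid.

```python
from math import floor

TILE_SIZE = 10.0  # degrees; hypothetical grid resolution


def tiles_for_bbox(bbox, tile_size=TILE_SIZE):
    """Return the set of (col, row) grid tile ids that a bbox touches."""
    minx, miny, maxx, maxy = bbox
    cols = range(floor(minx / tile_size), floor(maxx / tile_size) + 1)
    rows = range(floor(miny / tile_size), floor(maxy / tile_size) + 1)
    return {(col, row) for col in cols for row in rows}


def bbox_intersects(a, b):
    """True if two (minx, miny, maxx, maxy) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# Hypothetical stand-in for the items stored in the GeoParquet dataset.
ITEMS = [
    {"id": "a", "bbox": (1.0, 1.0, 2.0, 2.0)},
    {"id": "b", "bbox": (8.0, 8.0, 9.0, 9.0)},
    {"id": "c", "bbox": (25.0, 25.0, 26.0, 26.0)},
]


def build_index(items):
    """Map each grid tile id to the items whose bbox touches that tile."""
    index = {}
    for item in items:
        for tile in tiles_for_bbox(item["bbox"]):
            index.setdefault(tile, []).append(item)
    return index


def search(index, query_bbox):
    """Stage 1: pick candidates by tile id. Stage 2: exact intersects test."""
    candidates = {}
    for tile in tiles_for_bbox(query_bbox):
        for item in index.get(tile, []):
            candidates[item["id"]] = item
    return sorted(
        item_id
        for item_id, item in candidates.items()
        if bbox_intersects(item["bbox"], query_bbox)
    )


INDEX = build_index(ITEMS)
```

In this toy data, items "a" and "b" share a tile, so a query near "a" pulls both in as candidates and the second intersects pass drops "b". That second pass is why the grid only needs to be a coarse filter.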

TomAugspurger · February 11, 2025, 12:29pm

I’m curious how well it will scale (I’m thinking spatially querying a collection of (tens of) billions of Items – postgres does it well enough)

It might take a bit of work on both the writing side (to organize the data well) and client side (to make sure the query exploits the data’s organization), but it should scale well to large datasets, and many concurrent readers.
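As a rough illustration of how the writing side and the client side interact, here is a toy pure-Python sketch. It assumes a deliberately simplified model: items are grouped into fixed-size chunks (loosely analogous to Parquet row groups), each chunk records a bounding box, and the reader skips chunks whose box misses the query. The names `write_chunks` and `query` are hypothetical, and sorting by `minx` is a crude stand-in for a proper spatial ordering such as a space-filling curve.

```python
def bbox_intersects(a, b):
    """True if two (minx, miny, maxx, maxy) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def union_bbox(bboxes):
    """Smallest box covering every box in the list."""
    return (
        min(b[0] for b in bboxes),
        min(b[1] for b in bboxes),
        max(b[2] for b in bboxes),
        max(b[3] for b in bboxes),
    )


def write_chunks(items, chunk_size, spatially_sorted=True):
    """Writer side: optionally sort items spatially, then split into chunks."""
    if spatially_sorted:
        items = sorted(items, key=lambda item: item["bbox"][0])
    chunks = []
    for start in range(0, len(items), chunk_size):
        rows = items[start:start + chunk_size]
        chunks.append({"bbox": union_bbox([r["bbox"] for r in rows]), "rows": rows})
    return chunks


def query(chunks, query_bbox):
    """Reader side: skip chunks by their bbox, then test the remaining rows."""
    hits, chunks_read = [], 0
    for chunk in chunks:
        if not bbox_intersects(chunk["bbox"], query_bbox):
            continue  # pruned without reading any rows
        chunks_read += 1
        hits.extend(
            row["id"] for row in chunk["rows"]
            if bbox_intersects(row["bbox"], query_bbox)
        )
    return sorted(hits), chunks_read


# Hypothetical items laid out along the x axis, arriving in arbitrary order.
ITEMS = [
    {"id": f"item-{x}", "bbox": (float(x), 0.0, x + 0.5, 0.5)}
    for x in (5, 0, 3, 7, 2, 6, 1, 4)
]
QUERY = (2.1, 0.0, 3.2, 1.0)

sorted_hits, sorted_reads = query(write_chunks(ITEMS, 2), QUERY)
unsorted_hits, unsorted_reads = query(write_chunks(ITEMS, 2, spatially_sorted=False), QUERY)
```

Both layouts return the same two items, but in this toy case the sorted layout reads one chunk while the unsorted one reads all four.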

appending or deleting Items.

You’ll probably want a table format like Delta Lake or Apache Iceberg, both of which build on top of Parquet.

sotosoul · February 13, 2025, 12:59pm

Regarding the status and the structure of the stac-fastapi-geoparquet project, does it involve developing a new CRUD CoreClient (plus relevant TransactionsClient and Extensions) that plug into the existing stac-fastapi implementation similar to how pgstac does (stac-fastapi-pgstac)?

Is there an (un)official repository yet?

gadomski · February 13, 2025, 2:54pm

Yup!

Is there an (un)official repository yet?

Not yet, but I’ll be making it public within a month or so (I’m working on it in the background right now). I’ll have an accompanying blog post comparing its performance with other backends, etc.
