Pangeo Showcase Talk by Campbell Watson at IBM Research
Campbell and Johannes are from IBM Research. With a team across IBM’s global research labs, they’re building the IBM Geospatial Discovery Network. Campbell was trained in atmospheric science and Johannes was trained in physics and math. Today their research centers on big data technologies, machine learning and artificial intelligence for geospatial problems.
When confronted with a large amount of data, an immediate question is how to search it. For traditional database systems, this often means efficiently retrieving a single entry such as a business transaction or a sensor measurement. Such queries are accelerated by building suitable indices over individual entries. For geospatial data, the single entry (a pixel) is often of little interest. Rather, a user might want to find groups of pixels that define a storm, a heat wave, an atmospheric river, or a set of clouds obscuring a satellite image.
With this in mind, we introduce the concept of overview indices: regional indices that summarize statistics across groups of pixels. They go beyond the tile-level metadata often found in STAC-like catalogues by allowing us to identify relevant subsets of the data without loading the raw pixel-level data.
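To make the idea concrete, here is a minimal sketch of a regional summary index in plain Python. It is an illustration under stated assumptions, not the design presented in the talk: the names `OverviewEntry`, `build_overview_index`, and `candidate_blocks`, the fixed square blocks, and the choice of min/max/mean as statistics are all hypothetical, and the index lives in an in-memory list rather than in GeoParquet.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OverviewEntry:
    """Summary statistics for one square block of pixels (hypothetical schema)."""
    row0: int    # row of the block's top-left pixel
    col0: int    # column of the block's top-left pixel
    vmin: float
    vmax: float
    mean: float


def build_overview_index(grid: List[List[float]], block: int) -> List[OverviewEntry]:
    """Scan the raster once and keep only per-block statistics."""
    n_rows, n_cols = len(grid), len(grid[0])
    entries = []
    for r in range(0, n_rows, block):
        for c in range(0, n_cols, block):
            values = [
                grid[i][j]
                for i in range(r, min(r + block, n_rows))
                for j in range(c, min(c + block, n_cols))
            ]
            entries.append(
                OverviewEntry(r, c, min(values), max(values), sum(values) / len(values))
            )
    return entries


def candidate_blocks(index: List[OverviewEntry], threshold: float) -> List[OverviewEntry]:
    """Return blocks that may contain a pixel above `threshold`.

    A block whose maximum is at or below the threshold cannot contain
    such a pixel, so its raw data never needs to be loaded.
    """
    return [entry for entry in index if entry.vmax > threshold]


# Toy 4x4 temperature raster with a warm patch in the lower-right corner.
grid = [
    [10, 11, 12, 13],
    [11, 12, 13, 14],
    [12, 13, 35, 36],
    [13, 14, 36, 38],
]
index = build_overview_index(grid, block=2)
hot = candidate_blocks(index, threshold=30.0)
```

In this toy case the query touches only the four index entries, and `hot` holds the single block that would need its pixels read to delineate the warm patch.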
Our talk will discuss the design of these indices, which are based on GeoParquet, as well as their various uses. These uses extend to questions of data federation and query optimization, where minimizing costs (data transfer costs in particular) is needed for cross-data-center and cross-region operations.