Large Scale Geospatial Benchmarks

Hi All,

We’re looking to build out a collection of large-scale, end-to-end geospatial benchmarks to ensure that tools like Xarray, Dask, etc. operate smoothly up to the 100-TB scale. @mrocklin and I wrote down a few characteristics we think make a good benchmark based on our previous experience using TPC-H benchmarks to improve Dask DataFrame.
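To give a concrete flavor of the kind of workload we have in mind, here's a minimal sketch of an end-to-end Xarray + Dask run; the dataset shape, chunking, and reduction are illustrative placeholders, not anything from the actual benchmark suite.

```python
# Hedged sketch of one possible end-to-end geospatial benchmark step.
# A real benchmark would open a large cloud-hosted Zarr/NetCDF archive;
# here a synthetic "temperature" cube stands in for it.
import numpy as np
import xarray as xr
import dask.array as da

# Synthetic daily global field for 2020, chunked along time and space.
data = da.random.random((365, 1800, 3600), chunks=(30, 900, 900))
ds = xr.Dataset(
    {"t2m": (("time", "lat", "lon"), data)},
    coords={
        "time": np.arange("2020-01-01", "2020-12-31", dtype="datetime64[D]"),
        "lat": np.linspace(-90, 90, 1800),
        "lon": np.linspace(-180, 180, 3600),
    },
)

# End-to-end step: reduce to a monthly climatology, then persist the result.
monthly = ds.t2m.groupby("time.month").mean("time")
monthly.to_dataset(name="t2m_monthly").to_zarr("benchmark_output.zarr", mode="w")
```

Scaled up to the 100-TB range, a pattern like this stresses chunk scheduling, groupby reductions, and I/O throughput all at once, which is roughly the end-to-end coverage we're after.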

If folks here have thoughts on geospatial benchmarks they think would be a good fit, we’d love to collaborate. Please leave a comment in the GitHub discussion linked above.


Just wanted to say, I love the initiative.

I’ve been working on some pretty hefty time series workflows with geospatial data that fall squarely within what you’re targeting here, and I’d love to contribute to these benchmarks.
