As someone who primarily works with Python and SQL to manipulate climate data using xarray, I wanted to know how an entry-level climate data engineer might leverage Rust, either immediately or in a couple of years.
I’ve seen Rust mentioned a couple of times in this space as “plug-ins” to speed up operations. I’d like to develop my skills towards contributing to the Pangeo packages and was wondering what sort of appetite exists for the application of Rust.
To boil it down, does Pangeo need Rust? If so, how? What does it offer? Any articles and opinions are welcome.
You can ‘use’ Rust by using Polars.
Building a Rust backend for xarray would be an interesting use case. There is a Rust reader for GRIB.
I wrote up some thoughts on Rust and geospatial here: gadom.ski • Rust and geospatial
TL;DR: it’s great for improving “solved” problems (e.g. implementing specs) and for making tools/apps. It might be less useful for data analysis, where you can continue to use things like Polars and stay in Python. developmentseed/obstore (simple, fast integration with Amazon S3, Google Cloud Storage, Azure Storage, and S3-compliant APIs like Cloudflare R2) is another (newer) example of Python tooling that uses Rust to enhance performance.
Besides gribberish, there are now several experimental Rust-based backend engines for xarray for different file formats, e.g.:
These libraries add an xarray BackendEntrypoint, so you can call xarray.open_dataset(..., engine="name-of-rust-backend")
to open those datasets with a Rust-based engine while still getting a regular xarray object back in Python.
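To make the entrypoint mechanism a bit more concrete, here is a minimal sketch of what such a backend looks like from the Python side. It is not taken from any of the libraries mentioned in this thread: `ToyRustBackend`, `_fake_rust_read` and the `.toy` extension are hypothetical names, and the "Rust" part is faked with NumPy where a real package would call into a compiled extension module.

```python
import numpy as np
import xarray as xr
from xarray.backends import BackendEntrypoint


def _fake_rust_read(path):
    """Hypothetical stand-in for a function exported by a compiled Rust extension."""
    # A real backend would decode the file at `path` in Rust and return arrays.
    # Here we ignore `path` and fabricate a tiny 2x3 array purely for illustration.
    return {
        "temperature": np.arange(6, dtype="float64").reshape(2, 3),
        "time": np.array([0, 1]),
        "x": np.array([10, 20, 30]),
    }


class ToyRustBackend(BackendEntrypoint):
    """Hypothetical backend; only the BackendEntrypoint interface is real xarray API."""

    description = "Toy backend illustrating the entrypoint mechanism"

    def open_dataset(self, filename_or_obj, *, drop_variables=None):
        raw = _fake_rust_read(filename_or_obj)
        ds = xr.Dataset(
            {"temperature": (("time", "x"), raw["temperature"])},
            coords={"time": raw["time"], "x": raw["x"]},
        )
        if drop_variables:
            ds = ds.drop_vars(drop_variables, errors="ignore")
        return ds

    def guess_can_open(self, filename_or_obj):
        # Lets xarray pick this backend automatically for matching file names.
        return str(filename_or_obj).endswith(".toy")


# Passing the class directly avoids having to register an entry point.
ds = xr.open_dataset("example.toy", engine=ToyRustBackend)
print(ds["temperature"].mean().item())
```

An installed package would normally register the class under the `xarray.backends` entry-point group in its packaging metadata, which is what makes the string form `engine="name-of-rust-backend"` work.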
There are also some Rust-based I/O libraries/crates for Zarr stores that may be more relevant for climate science:
People are usually excited about Rust because it can (sometimes, not always) improve performance via more efficient multi-threading/async operations; see e.g. Announcing Icechunk! | Earthmover, where Icechunk reads Zarr v3 faster than using Dask. This isn’t true for everything though: some core libraries written in low-level languages like C/Fortran with Python/R bindings (e.g. GDAL, PROJ, etc.) wouldn’t get much of a speedup if you rewrote them in Rust (though there are other benefits like memory safety).
That said, for newer libraries (e.g. the Zarr v3 readers above), I’d like to think that it can be good to use Rust instead of C/C++, because it’s relatively easy to create bindings in Python/R/Julia/etc. to a core Rust library. The GeoRust organization is doing a good job of implementing many core geospatial operations in Rust, mostly for vector points/lines/polygons, but I’m hoping that raster/n-dimensional datasets get their fair share of efficient Rust tooling once more people become familiar with the language.