Why learn Rust? A Pangeo perspective

As someone who primarily works with Python and SQL to manipulate climate data using xarray, I wanted to know how an entry-level climate data engineer might leverage Rust, either immediately or in a couple of years.

I’ve seen Rust mentioned a couple of times in this space as “plug-ins” to speed up operations. I’d like to develop my skills towards contributing to the Pangeo packages and was wondering what sort of appetite exists for the application of Rust.

To boil it down: does Pangeo need Rust? If so, how? What does it offer? Any articles and opinions are welcome.

You can ‘use’ Rust by using Polars.
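For instance, a minimal sketch (the file and column names here are made up; `group_by` is the current Polars spelling, older versions used `groupby`) of a query that runs on Polars’ multi-threaded Rust engine without the user writing any Rust:

```python
import polars as pl

# Lazy query: Polars builds a plan and executes it on its Rust engine,
# using all CPU cores, with no Rust code on the user's side.
# "observations.csv" and its columns are hypothetical.
result = (
    pl.scan_csv("observations.csv")
    .filter(pl.col("quality_flag") == 0)
    .group_by("station_id")
    .agg(pl.col("air_temperature").mean())
    .collect()
)
print(result)
```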

Building a Rust backend for Xarray would be an interesting use case. There is a Rust reader for GRIB.

I wrote up some thoughts on Rust and geospatial here: gadom.ski • Rust and geospatial

Tl;dr: it’s great for improving “solved” problems (e.g. implementing specs) and making tools/apps. Might be less useful for data analysis, where you can continue to use things like polars and stay in Python. GitHub - developmentseed/obstore: Simple, fast integration with Amazon S3, Google Cloud Storage, Azure Storage, and S3-compliant APIs like Cloudflare R2 is another (newer) example of Python tooling that uses Rust to enhance performance.

Besides gribberish, there are now several experimental Rust-based backend engines for xarray for different file formats, e.g.:

These libraries add an xarray BackendEntrypoint so that you can do xarray.open_dataset(..., engine="name-of-rust-backend") to open those datasets with a Rust-based engine from within Python.
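As a rough sketch of what that looks like in practice (the file path is made up; take the actual engine name from the specific project’s docs, I’m using gribberish as the example here):

```python
import xarray as xr

# The Rust-backed package registers itself as an xarray BackendEntrypoint,
# so opening a file only differs in the `engine` argument.
# "forecast.grib2" is a placeholder path.
ds = xr.open_dataset("forecast.grib2", engine="gribberish")
print(ds)
```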

There are also some Rust-based I/O libraries/crates for Zarr stores that may be more relevant for climate science:

People are usually excited about Rust because it can (sometimes, not always) improve performance via more efficient multi-threading/async operations; see e.g. Announcing Icechunk! | Earthmover, where Icechunk is faster at reading Zarr v3 than using Dask. This isn’t true for everything, though: some core libraries written in low-level languages like C/Fortran with Python/R bindings (e.g. GDAL, PROJ, etc.) wouldn’t get much of a speedup if you rewrote them in Rust (though there are other benefits, like memory safety).

That said, for newer libraries (e.g. the Zarr v3 readers above), I’d like to think that it can be good to use Rust instead of C/C++, because it’s relatively easy to create bindings in Python/R/Julia/etc to a core Rust library. The GeoRust organization is doing a good job at implementing many core geospatial operations in Rust, mostly for vector points/lines/polygons, but I’m hoping that raster/n-dimensional datasets get their fair share of efficient Rust tooling once more people become familiar with the language.

Do folks believe there would be any benefits to implementing any parts of xarray’s computational machinery in Rust?

The particular use-case I have in mind is processing large out-of-core chunked datasets (like numerical weather predictions) on a single machine, where we want to make full use of all CPU cores, but we also want to be as CPU-efficient and memory-efficient as possible, and to have minimal start-up latencies. And where we want to interleave IO with compute.

Perhaps this would already be possible by using xarray’s “flexible arrays” (based on NumPy’s “duck arrays”)? i.e. we wouldn’t need to “re-write” any of the core xarray code; instead we’d have new Rust code (with a Python API) that slots neatly into xarray’s backend APIs for flexible arrays and indexing?

Is there any point to this?! 🙂

I’ve heard it said that xarray can be quite slow for some use-cases, but I haven’t done any rigorous benchmarking myself. I’ve also heard people say that the last thing the community needs is yet another framework for computation!

So I’m very keen to hear other peoples’ views!

FWIW, I asked Gemini “Are there any specific problems with xarray that could be solved by writing xarray backends in Rust?”. Gemini certainly “thinks” there’s useful work to be done; especially around indexing, alignment, chunked array operations, apply_ufunc, memory management, parallelizing internal loops, and interleaving IO with compute. (But, as we all know, it’s dangerous to fully believe what LLMs tell us!) Here’s Gemini’s 3-page response.

implementing any parts of xarray’s computational machinery in Rust?

i.e. we wouldn’t need to “re-write” any of the core xarray code, instead we’d have new Rust code (with a Python API)

You could do that, but xarray’s flexible arrays mean that you’re basically suggesting rewriting NumPy in Rust.

I’ve heard it said that xarray can be quite slow for some use-cases

Statements like this aren’t super informative because “xarray” is really xarray + a backend for understanding a format (e.g. h5netcdf/zarr-python) + a way to get at remote chunks (e.g. fsspec) + a DAG (e.g. dask/cubed) + an algorithm represented in the DAG (e.g. flox) + a DAG scheduler (dask.distributed) + the actual low-level computation (e.g. numpy/cupy).
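To make that stack concrete, here’s a sketch of a typical workflow with the layer each name corresponds to noted in comments (the file name is made up; swap in whichever backend and scheduler you actually use):

```python
import xarray as xr
from dask.distributed import Client  # the DAG scheduler

client = Client()  # local cluster on a single machine

# The backend (h5netcdf) understands the file format; chunks={} hands the
# arrays to the DAG layer (dask); fsspec would handle any remote paths.
ds = xr.open_dataset("met_data.nc", engine="h5netcdf", chunks={})

# flox (if installed) supplies the groupby algorithm represented in the DAG;
# numpy does the actual low-level computation inside each task.
monthly = ds.groupby("time.month").mean()

result = monthly.compute()  # dask.distributed executes the graph
```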

I’ve also heard people say that the last thing the community needs is yet another framework for computation!

Interesting - IMO there should be several options for computation to promote competition, rather than relying on dask all the way down.

In most cases (certainly in cloud workflows) the actual bottleneck in xarray workflows is IO, not compute, which is why the rust core of Icechunk can deliver big speedups.

writing xarray backends in Rust?

If you rewrote an xarray backend in Rust, the best you could do would likely be to approach the performance of using VirtualiZarr with Icechunk, as Icechunk with virtual chunks is effectively a general Rust-powered async backend for reading any filetype that can be virtualized. This is focused on object storage, though.

The other place that stuff tends to go wrong is the dask graph. But again that’s not solved by rewriting the numpy functions that dask is calling.

IIUC there is little reason to think that Rust is faster than NumPy, because both use low-level languages under the hood.

But having said all that, a Rust array library that was lazy, and hence internally had a query plan that could be optimized, could be much faster. The use case would be when you don’t want the overhead of dask, but you also don’t want every step in the array computation to be eagerly computed in the way it likely would be with numpy. There’s some possibility for crossover here with Cubed’s ability to run on a single machine, adding a model of memory management and allowing for out-of-core computation. That is similar to the use case you’re describing.

Something you didn’t mention that could be powerful would be implementing any new xarray “Flexible Indexes” (e.g. RangeIndex) in Rust.

As always any effort like this should start with profiling a simple case.

A couple of us have worked on GitHub - numbagg/numbagg: Fast N-dimensional aggregation functions with Numba to do exactly this. It’s wired up to most Xarray reductions and will be used if installed.
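To illustrate what “will be used if installed” means in practice (array sizes here are arbitrary, and exactly which reductions dispatch to numbagg depends on your xarray/numbagg versions):

```python
import numpy as np
import xarray as xr

# If numbagg is installed, xarray dispatches reductions like these to its
# Numba-compiled kernels; the user-facing code is unchanged.
da = xr.DataArray(
    np.random.rand(365, 100, 100),
    dims=("time", "y", "x"),
)

mean_map = da.mean(dim="time", skipna=True)  # nan-skipping aggregation
smoothed = da.rolling(time=7).mean()         # moving-window reduction
```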

What computations are you actually running?

IIUC Numba is not good when you need to write a custom data structure for the computation you want (e.g. t-digest, sliding window median). This is something I’ve wanted to explore with Rust but haven’t had a chance to yet.

EDIT: oops, I missed the “out-of-core” piece; for in-memory data, numbagg is great.

Thank you Tom & Deepak for your quick replies!

Yeah, you said it better than I did! This is exactly what I have in mind (sorry I didn’t articulate myself very well!)

And this is all very much inspired by Dask, Cubed, xarray, and the vision outlined by Tom N & Tom W in their Cubed blog post, where xarray could be thought of like SQL, and the heavy lifting can be delegated to specialised code.

(Although I should say that, right now, I don’t have the capacity to actually implement any of this! I’m just curious. Although, if folks think it could be useful, then I’ll definitely try to free up capacity to work on this…)

I think that what I have in mind is something that exposes a Python Array API, and so can be used in xarray as a duck array. The implementation would be 100% focused on processing out-of-core multi-dimensional arrays efficiently and quickly on a single machine by utilising all CPU cores, and submitting concurrent read requests to IO. As you say, Tom, it would internally create, optimise, and execute a query plan. It would interleave IO with compute. By default, it would keep intermediate arrays in RAM, unless that would result in running out of RAM, in which case it’d borrow from Cubed’s approach and write intermediate data to disk.
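To make that a bit more concrete, here’s a very rough sketch (nothing here is real code from any project; the class name and the idea of delegating to a Rust core are purely hypothetical) of the kind of duck array I mean, with just enough of the array protocols for xarray to wrap it:

```python
import numpy as np
import xarray as xr


class LazyRustArray:
    """Hypothetical duck array whose operations would delegate to a Rust core.

    Only the bare minimum xarray needs to recognise a duck array is sketched
    here (shape/dtype/ndim plus the NumPy protocols); a real implementation
    would expose the Python Array API, build a query plan lazily, and talk
    directly to IO (e.g. a Zarr reader) itself.
    """

    def __init__(self, shape, dtype=np.float32):
        self.shape = tuple(shape)
        self.dtype = np.dtype(dtype)

    @property
    def ndim(self):
        return len(self.shape)

    def __array__(self, dtype=None):
        # Where the Rust side would execute its plan and hand back a NumPy
        # array; stubbed out with zeros here.
        return np.zeros(self.shape, dtype=dtype or self.dtype)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # A real implementation would record the op in its query plan;
        # this sketch just materialises and falls back to NumPy.
        inputs = tuple(np.asarray(x) if isinstance(x, LazyRustArray) else x
                       for x in inputs)
        return getattr(ufunc, method)(*inputs, **kwargs)

    def __array_function__(self, func, types, args, kwargs):
        # Same fallback for NumPy functions (only top-level args handled here).
        args = tuple(np.asarray(a) if isinstance(a, LazyRustArray) else a
                     for a in args)
        return func(*args, **kwargs)


# xarray can wrap the duck array directly in a DataArray.
da = xr.DataArray(LazyRustArray((24, 181, 360)), dims=("time", "lat", "lon"))
print(da.shape)
```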

Please may I ask a potentially very naive question!? I note that the API for alternative xarray chunked array types includes several methods which take Python functions as arguments (e.g. apply_gufunc(func: Callable, ...)). PyO3 allows Rust to call Python functions. But I’m guessing that calling Python functions from Rust might kill performance? So would we achieve better performance by “hiding” the chunking from xarray, by only exposing the Python Array API to xarray? (This “feels” wrong; but I don’t know enough about xarray to know for sure!) I’m guessing this would mean that the new Rust compute module would have to talk directly to IO (e.g. zarrs).

And, having said all this, it might be best to wait for Python’s free-threading to stabilise, so we can easily exploit all CPU cores from a single process, implemented entirely in Python code!

I think we’re actually very close to this already using Cubed. You can already run Cubed today on a single machine utilizing all cores (through threads or processes), and it internally creates a query plan that it optimizes.
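For example, a minimal single-machine Cubed sketch (array sizes, the memory limit, and the work directory are all illustrative):

```python
import cubed
import cubed.array_api as xp

# Single-machine Spec: bounded memory per task and a local work_dir for any
# intermediate Zarr data.
spec = cubed.Spec(work_dir="/tmp/cubed-work", allowed_mem="2GB")

a = xp.ones((10_000, 10_000), chunks=(1_000, 1_000), spec=spec)
b = xp.sum(a + 1, axis=0)   # builds a plan; nothing is computed yet
result = b.compute()        # optimizes the plan and runs it on local cores
print(result)
```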

By default, it would keep intermediate arrays in RAM, unless that would result in running out of RAM, in which case it’d borrow from Cubed’s approach and write intermediate data to disk.

Doing the rechunk in-memory is the only part that’s missing. A while ago I raised an issue suggesting how Cubed could be generalized to do exactly that. One way to implement that idea might be to have some kind of special in-memory zarr store or an implementation of the rechunk primitive that does the rechunking in memory in Rust as fast as possible. You could potentially steal whatever algorithm dask uses to do rechunks now. This could be a very impactful but still tightly-scoped thing for you to write.

In which case it’d borrow from Cubed’s approach, and write intermediate data to disk.

This is the same idea as the “memory hierarchy” analogy @tomwhite made in that issue I just linked. It could also alternatively be implemented by spilling to disk during big rechunks like dask now does.

But I’m guessing that calling Python functions from Rust might kill performance?

If you use the kwarg dask='allowed' (this is now badly-named; IIRC the same kwarg works the same way for Cubed too), then the responsibility of mapping the python function over the chunks is handed off to the function itself. In other words, you only call the python function once per apply_ufunc call, not once per chunk. This should mean any cost of calling Rust from Python would amortize with respect to scale.
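To spell that out, a sketch of apply_ufunc with dask="allowed" (the detrend function and array here are stand-ins for whatever would hand its whole array off to Rust in a single call):

```python
import numpy as np
import xarray as xr

def detrend(arr):
    # With dask="allowed" this is called once, receiving the whole
    # (possibly chunked) array, and is responsible for mapping over chunks
    # itself - so it could pass the array to a Rust extension in one
    # Python->Rust call rather than once per chunk.
    return arr - arr.mean(axis=-1, keepdims=True)

da = xr.DataArray(
    np.random.rand(10, 1000), dims=("station", "time")
).chunk({"time": 250})

result = xr.apply_ufunc(
    detrend,
    da,
    input_core_dims=[["time"]],
    output_core_dims=[["time"]],
    dask="allowed",
)
```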

“hiding” the chunking from xarray, by only exposing the Python Array API to xarray?

You can also do this - that’s what Arkouda does - but I suspect here it will mean you end up needing to rewrite much of Dask/Cubed’s logic for handling chunked arrays in your Rust library. I suggest starting with the rechunk approach I described above.

Also note that you don’t actually have to understand xarray’s API for alternative chunked array types or other internals in order to try out any of this. The MVP can be written entirely in terms of Cubed’s Array API, cubed.from_zarr, and cubed.rechunk - the wrapping by xarray isn’t the performance-critical part.
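To make that concrete, a rough sketch of such an MVP using only Cubed’s own API (the Zarr paths, chunk shape, and memory limit are all placeholders):

```python
import cubed

# Single-machine spec with bounded memory; work_dir holds intermediates.
spec = cubed.Spec(work_dir="/tmp/cubed-work", allowed_mem="4GB")

# Read straight from a Zarr store - no xarray wrapping needed for the MVP.
a = cubed.from_zarr("input.zarr", spec=spec)

# This rechunk step is where a fast in-memory (Rust) implementation of the
# rechunk primitive would slot in.
b = cubed.rechunk(a, chunks=(1, 720, 1440))

cubed.to_zarr(b, "rechunked.zarr")
```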