implementing any parts of xarray's computational machinery in Rust?
i.e. we wouldn't need to "re-write" any of the core xarray code; instead we'd have new Rust code (with a Python API)
You could do that, but xarray's support for flexible arrays means that you're basically suggesting rewriting numpy in Rust.
I've heard it said that xarray can be quite slow for some use-cases.
Statements like this aren't super informative, because "xarray" is really xarray + a backend for understanding a format (e.g. h5netcdf/zarr-python) + a way to get at remote chunks (e.g. fsspec) + a DAG (e.g. dask/cubed) + an algorithm represented in the DAG (e.g. flox) + a DAG scheduler (e.g. dask.distributed) + the actual low-level computation (e.g. numpy/cupy).
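To make that concrete, here's a minimal sketch of a typical cloud workflow and which layer of that stack each step exercises (the bucket path and variable name are just placeholders):

```python
import xarray as xr

# xarray + zarr-python (format backend) + fsspec (remote chunk access)
ds = xr.open_zarr("s3://some-bucket/some-dataset.zarr", chunks={"time": 100})

# flox (if installed) provides the groupby algorithm, expressed as a dask graph (DAG)
monthly = ds["temperature"].groupby("time.month").mean()

# the dask scheduler executes the graph; numpy does the low-level compute on each chunk
result = monthly.compute()
```

A slowdown can come from any one of those layers, which is why "xarray is slow" on its own doesn't tell you much.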
I've also heard people say that the last thing the community needs is yet another framework for computation!
Interesting - IMO there should be several options for computation to promote competition, rather than relying on dask all the way down.
In most cases (certainly in cloud workflows) the actual bottleneck is IO, not compute, which is why the Rust core of Icechunk can deliver big speedups.
writing xarray backends in Rust?
If you rewrote an xarray backend in Rust, the best you could do would likely be to approach the performance of using VirtualiZarr with Icechunk, as Icechunk with virtual chunks is effectively a general Rust-powered async backend for reading any filetype that can be virtualized. This is focused on object storage though.
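For reference, the Python-side shim for a Rust-powered backend would go through xarray's documented `BackendEntrypoint` plugin API, roughly like the sketch below. `my_rust_reader` is a hypothetical compiled extension (e.g. built with PyO3/maturin), not a real package, and the stand-in data is just to keep the sketch runnable:

```python
import numpy as np
import xarray as xr
from xarray.backends import BackendEntrypoint


class RustBackendEntrypoint(BackendEntrypoint):
    description = "Read FOO files via a hypothetical Rust reader"

    def open_dataset(self, filename_or_obj, *, drop_variables=None):
        # In a real backend this call would cross into Rust and hand back
        # decoded arrays (or lazy array wrappers) plus metadata, e.g.:
        # header, arrays = my_rust_reader.read(filename_or_obj)
        data = np.arange(10.0)  # stand-in for Rust-decoded data
        return xr.Dataset({"foo": ("x", data)})

    def guess_can_open(self, filename_or_obj):
        return str(filename_or_obj).endswith(".foo")


# Usage (assuming recent xarray, which accepts an entrypoint class as engine):
# ds = xr.open_dataset("data.foo", engine=RustBackendEntrypoint)
```

Note that all the orchestration (dims, coords, indexing, dask integration) stays in Python; Rust only replaces the decode/read hot path.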
The other place where things tend to go wrong is the dask graph. But again, that's not solved by rewriting the numpy functions that dask is calling.
IIUC there is little reason to think that Rust would be faster than numpy for the kernels themselves, because numpy's kernels are already written in low-level languages (C) under the hood.
But having said all that, a Rust array library that was lazy, and hence internally had a query plan that could be optimized, could be much faster. The use case would be when you don't want the overhead of dask, but you also don't want every step in the array computation to be eagerly computed the way it would be with numpy. There's some possible crossover with Cubed's ability to run on a single machine here, to add a model of memory management allowing for out-of-core computation. That is similar to the use case you're describing.
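To illustrate what "lazy with a query plan" buys you, here's a toy sketch (not a real library): operations record themselves into a plan instead of executing eagerly, so an optimizer could fuse or reorder them before a single evaluation pass, which is where a Rust implementation would do the heavy lifting:

```python
import numpy as np


class LazyArray:
    def __init__(self, source, ops=()):
        self.source = source   # concrete numpy array (or a reference to data on disk)
        self.ops = list(ops)   # the recorded query plan

    def __add__(self, other):
        return LazyArray(self.source, self.ops + [("add", other)])

    def __mul__(self, other):
        return LazyArray(self.source, self.ops + [("mul", other)])

    def compute(self):
        # A real implementation would optimize the plan here (e.g. fuse the
        # elementwise ops into one pass over memory) before executing it.
        out = self.source
        for op, operand in self.ops:
            out = out + operand if op == "add" else out * operand
        return out


x = LazyArray(np.arange(1_000_000.0))
y = (x + 1.0) * 2.0     # nothing computed yet; the plan is [add, mul]
print(y.ops)            # [('add', 1.0), ('mul', 2.0)]
print(y.compute()[:3])  # [2. 4. 6.]
```

With eager numpy, `(x + 1.0) * 2.0` allocates and walks an intermediate array; a planned/fused execution avoids that, which is exactly the kind of win a lazy Rust core could offer without pulling in a full distributed scheduler.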
Something you didn't mention that could be powerful would be implementing any new xarray "Flexible Indexes" (e.g. RangeIndex) in Rust.
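Very roughly, the division of labour could look like the sketch below. `xarray.Index` is the documented base class for custom indexes, but the method signatures here are simplified, and `rust_index` is a hypothetical extension module, not a real package:

```python
import xarray as xr


class RustRangeIndex(xr.Index):
    """Custom index whose label lookups would be done by compiled Rust code."""

    def __init__(self, start, stop, step, dim):
        self.start, self.stop, self.step, self.dim = start, stop, step, dim

    @classmethod
    def from_variables(cls, variables, *, options):
        # Inspect the coordinate variable(s) and build the index state.
        ...

    def sel(self, labels, **kwargs):
        # This is the hot path: translating labels into integer positions.
        # A Rust implementation could do this without materializing the
        # coordinate values in memory, e.g.:
        # positions = rust_index.range_lookup(self.start, self.step, labels)
        ...
```

The appeal is that an index's selection logic is a small, self-contained hot path, so it's a much more tractable target for Rust than rewriting the array machinery itself.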
As always, any effort like this should start with profiling a simple case.
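For example, something as simple as this (path and variable name are placeholders) will already tell you whether IO or compute dominates:

```python
import cProfile

import xarray as xr


def workload():
    # A small but representative slice of the real pipeline.
    ds = xr.open_dataset("example.nc", chunks={"time": 100})
    return ds["temperature"].mean("time").compute()


# Cumulative timings show whether time is spent reading/decoding bytes
# (backend + IO) or in the numerical kernels (numpy/dask).
cProfile.run("workload()", sort="cumulative")
```

If the profile is dominated by IO and decoding, that points at Icechunk/VirtualiZarr-style work; if it's dominated by graph overhead or kernels, that points at the scheduler or a lazy array layer instead.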