I think we’re actually very close to this already using Cubed. You can run Cubed today on a single machine utilizing all cores (through threads or processes), and it internally creates a query plan that it optimizes.
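To make that concrete, here’s a minimal sketch of driving Cubed locally (untested - the creation-function and executor details vary a little between Cubed versions, so check the current docs rather than treating this as definitive):

```python
import cubed
import cubed.array_api as xp

# Tell Cubed how much RAM each task may use and where intermediates may go;
# it plans (and optimizes) the computation graph around this budget.
spec = cubed.Spec(work_dir="/tmp/cubed-intermediates", allowed_mem="2GB")

a = xp.ones((20_000, 20_000), chunks=(5_000, 5_000), spec=spec)
b = (a + 1) * 2  # builds the plan lazily; nothing has been computed yet

# Pass an executor here to use all cores - Cubed ships thread- and
# process-based local executors (see its docs for the current names).
result = b.compute()
```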
> By default, it would keep intermediate arrays in RAM; unless this would result in running out of RAM. In which case it’d borrow from Cubed’s approach, and write intermediate data to disk.
Doing the rechunk in-memory is the only part that’s missing. A while ago I raised an issue suggesting how Cubed could be generalized to do exactly that. One way to implement that idea might be to have some kind of special in-memory zarr store, or an implementation of the rechunk primitive that does the rechunking in memory in rust as fast as possible. You could potentially steal whatever algorithm dask uses for rechunks now. This could be a very impactful but still tightly-scoped thing for you to write.
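Just to show the shape of that idea, here’s a hypothetical sketch of the kind of interface such a primitive might expose to Python. None of these names exist anywhere today, and the NumPy body is only a naive stand-in for what a rust implementation (e.g. via PyO3) would do without materializing the whole array, and in parallel.

```python
from typing import Dict, Tuple

import numpy as np

Key = Tuple[int, ...]           # position of a chunk in the chunk grid
Chunks = Dict[Key, np.ndarray]  # a chunked array held entirely in RAM


def rechunk_in_memory(chunks: Chunks, shape, old_chunks, new_chunks) -> Chunks:
    """Naive pure-NumPy stand-in for a rust-backed in-memory rechunk.

    Assembles the source chunks into one array, then re-slices it to the
    target chunking. A real implementation would move data chunk-to-chunk
    without this full materialization, which is where rust would help.
    """
    dtype = next(iter(chunks.values())).dtype
    full = np.empty(shape, dtype=dtype)
    for key, block in chunks.items():
        src = tuple(
            slice(k * c, k * c + b)
            for k, c, b in zip(key, old_chunks, block.shape)
        )
        full[src] = block

    out: Chunks = {}
    grid = tuple(-(-s // c) for s, c in zip(shape, new_chunks))  # ceil(s / c)
    for key in np.ndindex(*grid):
        dst = tuple(
            slice(k * c, min((k + 1) * c, s))
            for k, c, s in zip(key, new_chunks, shape)
        )
        out[key] = full[dst].copy()
    return out
```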
> In which case it’d borrow from Cubed’s approach, and write intermediate data to disk.
This is the same idea as the “memory hierarchy” analogy @tomwhite made in that issue I just linked. Alternatively, it could be implemented by spilling to disk during big rechunks, like dask does now.
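As a rough illustration of that hierarchy only (the store classes here are zarr v2 names; newer zarr versions spell them differently, so treat this as a sketch rather than a recipe): keep an intermediate in an in-memory store while it fits a RAM budget, and fall back to an on-disk store when it doesn’t.

```python
import zarr


def intermediate_store(nbytes_estimate: int, ram_budget: int, work_dir: str):
    """Choose where the intermediate array for a rechunk should live.

    Illustrative only: stay in RAM while the intermediate fits the budget,
    otherwise spill to local disk - the "memory hierarchy" idea.
    """
    if nbytes_estimate <= ram_budget:
        return zarr.MemoryStore()          # fast path: stays in RAM
    return zarr.DirectoryStore(work_dir)   # spill path: backed by local disk
```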
> But I’m guessing that calling Python functions from Rust might kill performance?
If you use the kwarg `dask='allowed'` (this is now badly-named; IIRC the same kwarg works the same way for Cubed too) then the responsibility for mapping the python function over the chunks is handed off to that function itself. In other words, you only call the python function once per `apply_ufunc` call, not once per chunk. This should mean any cost of calling rust from python would amortize with respect to scale.
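A sketch of what that looks like from the xarray side - here `scale` is a stand-in for whatever rust-backed routine you’d expose (the `my_rust_lib` module is made up), and the tutorial dataset is just a convenient chunked input:

```python
import xarray as xr
# import my_rust_lib  # hypothetical PyO3 extension module

def scale(arr, factor):
    # With dask='allowed', `arr` is the whole underlying chunked array
    # (dask or cubed), not a single chunk - mapping over chunks is our job.
    # A rust-backed routine would be called here instead, e.g.
    #   return my_rust_lib.scale(arr, factor)   # hypothetical
    return arr * factor

ds = xr.tutorial.open_dataset("air_temperature").chunk({"time": 500})
result = xr.apply_ufunc(
    scale,
    ds["air"],
    kwargs={"factor": 2.0},
    dask="allowed",  # `scale` is called once, with the whole chunked array
)
```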
> “hiding” the chunking from xarray, by only exposing the Python Array API to xarray?
You can also do this - that’s what Arkouda does - but I suspect it would mean rewriting much of Dask/Cubed’s logic for handling chunked arrays inside your rust library. I suggest starting with the rechunk approach I described above.
Also note that you don’t actually have to understand xarray’s API for alternative chunked array types or other internals in order to try out any of this. The MVP can be written entirely in terms of Cubed’s Array API, `cubed.from_zarr`, and `cubed.rechunk` - the wrapping by xarray isn’t the performance-critical part.
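i.e. the whole MVP could be roughly this small (a sketch, untested - check the exact signatures of `from_zarr`/`rechunk`/`to_zarr` against Cubed’s docs, and adjust paths, chunks, and memory budget to your data):

```python
import cubed

spec = cubed.Spec(work_dir="/tmp/cubed-intermediates", allowed_mem="2GB")

# Open an existing zarr store lazily as a Cubed array...
arr = cubed.from_zarr("/path/to/input.zarr", spec=spec)

# ...ask for a different chunking...
rechunked = cubed.rechunk(arr, (1000, 1000))

# ...and write it back out, which is what actually triggers the computation.
cubed.to_zarr(rechunked, "/path/to/output.zarr")
```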