Applications and testers wanted! New Ramba distributed arrays through xarray

Todd_Anderson · November 24, 2021, 5:34pm

For the past year, Intel Labs has been developing a new Python distributed array package named Ramba. Ramba is (largely) a NumPy drop-in replacement in which distributed arrays operations are lazily evaluated, which allows Ramba to collect a series of operations, form a custom function from those operations, send that function to the nodes of the cluster where arrays segments are stored as large chunks, and to then compile/auto-parallelize that function using Numba and execute the compiled code on those chunks. The result of this approach is that in many cases Ramba performance is close (within 10%) to the performance of an equivalent C/MPI program. Moreover, Ramba can be hundreds of times faster than Dask distributed arrays and have better scaling. Recently, we did some work to get Ramba operating in the xarray context with some simple xarray examples. Now, we are looking to find a more significant application to drive our development and show the benefits of Ramba. So, if anyone reading this has an application that they think might benefit from Ramba (particularly those already using Dask distributed arrays), either directly or through xarray, and would like to work with us to try out Ramba, please let me know.

RichardScottOZ · November 28, 2021, 1:05am

Any type of cluster?

Todd_Anderson · November 28, 2021, 2:21am

Can you say more about what type of cluster you are interested in? Our system can do the launching of its internal remote tasks with Ray or with MPI so if either of those run on the cluster then I would assume that Ramba would work there but I’d need a bit more details about your cluster to give you a more definitive answer. We’ve mostly tested on a Linux cluster of 32+ nodes with 112 cores per node (two sockets per node) connected by some high-speed network like 10GbE or Infiniband. We have some basic NUMA support built-in for such environments.

thanks,

Todd

RichardScottOZ · November 28, 2021, 9:54pm

Lots of Kubernetes clusters in Pangeo type things, for example.

Todd_Anderson · November 29, 2021, 7:16pm

Are you currently using Dask within xarray in your Kubernetes cluster? If so, then the configuration you need to define your Dask cluster would be similar (in terms of content, not syntax) to what you would have to do to create a MPI or Ray cluster in order to use Ramba within xarray. If you don’t prelaunch a cluster then Ramba will just run on the current node so in some circumstances that is an option if data sizes are small enough. At the moment, Dask is probably more heterogeneous friendly than Ramba so for best results there might be a bit of additional effort to make sure that the cluster definition is sufficiently homogeneous. So, we do believe Ramba can be made to run in Kubernetes.

Topic		Replies	Views
New Working Group for Distributed Array Computing News & Announcements	58	4243	February 3, 2025
Numba guvectorize and distributed	4	778	August 18, 2020
Large-scale data processing benchmarks for Xarray-Beam	6	1587	June 13, 2022
Pangeo Showcase: "Arkouda as an XArray backend for HPC!" Pangeo Showcase	3	236	December 4, 2024
Pangeo Showcase: "Dask Array: Scaling Up for Terabyte-Level Performance" (April 9, 2025 at 12 PM ET) Pangeo Showcase	3	324	April 9, 2025

Applications and testers wanted! New Ramba distributed arrays through xarray

Related topics