Applications and testers wanted! New Ramba distributed arrays through xarray

For the past year, Intel Labs has been developing a new Python distributed array package named Ramba. Ramba is (largely) a NumPy drop-in replacement in which distributed array operations are lazily evaluated. This lets Ramba collect a series of operations, form a custom function from them, send that function to the cluster nodes where array segments are stored as large chunks, and then compile and auto-parallelize that function with Numba and execute the compiled code on those chunks. As a result, in many cases Ramba's performance is close to (within 10% of) that of an equivalent C/MPI program. Moreover, Ramba can be hundreds of times faster than Dask distributed arrays and scale better.

Recently, we did some work to get Ramba operating in the xarray context with some simple xarray examples. Now we are looking for a more significant application to drive our development and show the benefits of Ramba. So, if anyone reading this has an application that they think might benefit from Ramba (particularly one already using Dask distributed arrays), either directly or through xarray, and would like to work with us to try out Ramba, please let me know.
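To make the lazy-evaluation idea concrete, here is a minimal sketch in plain NumPy. It is not Ramba's code or API: the `LazyArray` class and its methods are hypothetical names made up for illustration, the chunks all live in one process rather than on cluster nodes, and the fused function is an ordinary Python loop rather than something compiled by Numba. It only shows the general pattern of recording operations, fusing them into one function, and applying that function once per chunk.

```python
import numpy as np


class LazyArray:
    """Toy stand-in for a distributed array (hypothetical, not Ramba's API).

    Holds a list of chunks plus a tuple of operations that have been
    requested but not yet executed.
    """

    def __init__(self, chunks, ops=()):
        self.chunks = chunks
        self.ops = tuple(ops)

    def _defer(self, op):
        # Record the operation instead of running it.
        return LazyArray(self.chunks, self.ops + (op,))

    def __add__(self, scalar):
        return self._defer(lambda x: x + scalar)

    def __mul__(self, scalar):
        return self._defer(lambda x: x * scalar)

    def sqrt(self):
        return self._defer(np.sqrt)

    def compute(self):
        ops = self.ops

        def fused(chunk):
            # One function that applies every recorded operation in order.
            # A real system could compile this and ship it to the workers.
            for op in ops:
                chunk = op(chunk)
            return chunk

        # Apply the fused function once per chunk, then reassemble.
        return np.concatenate([fused(c) for c in self.chunks])


data = np.arange(8, dtype=np.float64)
lazy = LazyArray(np.array_split(data, 4))  # four chunks of two elements

expr = (lazy * 2.0 + 1.0).sqrt()  # nothing is computed yet
result = expr.compute()           # all three operations run in one pass per chunk
```

The point of deferring is that each chunk is visited once by a single function covering the whole expression, instead of once per operation with a temporary array in between.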


Any type of cluster?

Can you say more about what type of cluster you are interested in? Our system can launch its internal remote tasks with either Ray or MPI, so if either of those runs on the cluster, I would expect Ramba to work there, but I'd need a few more details about your cluster to give you a more definitive answer. We've mostly tested on a Linux cluster of 32+ nodes with 112 cores per node (two sockets per node), connected by a high-speed network such as 10GbE or InfiniBand. We have some basic NUMA support built in for such environments.



Lots of Kubernetes clusters in Pangeo type things, for example.

Are you currently using Dask within xarray in your Kubernetes cluster? If so, the configuration you need to define your Dask cluster would be similar (in terms of content, not syntax) to what you would need to create an MPI or Ray cluster in order to use Ramba within xarray. If you don't prelaunch a cluster, Ramba will just run on the current node, so that is an option in some circumstances if data sizes are small enough. At the moment, Dask is probably friendlier to heterogeneous clusters than Ramba is, so for best results there might be a bit of additional effort to make sure that the cluster definition is sufficiently homogeneous. So, we do believe Ramba can be made to run in Kubernetes.
