TLDR: We’d like to have insight from xarray advanced users/devs on the design decisions of our software. Problem: We want to do repeated interpolation of millions of points on top of xarray datasets. We can use vectorized interpolation (under “Advanced Interpolation” in the docs) of an array of points, which is very performant (similar to our previous version), but requires a different execution path than we currently have and also a mental shift for all our current users code. I just want to make sure that we have our bases covered before making this jump. What considerations should we have surrounding:
- data loading and caching - how does vector interpolation in xarray load and cache data for repeated interpolation on neighbouring points?
- how would chunking work (where our access patterns is interpolation heavy)
I work on Parcels (site, repo), a highly customisable Lagrangian simulator primarily used in oceanography. We’re currently working on a new major version of our software that is xarray native (as opposed to the old version which explicitly loaded and stored numpy arrays).
A 30s overview of Parcels:
- Users provide a “FieldSet”, i.e., hydrodynamical flow fields (previously as netcdf file paths, now as xarray datasets) which are lazily coerced and represented internally in a consistent format.
- Users define a “particleset” to reside in the flow field
- Users define “kernels”
- these are Python functions that define what the particle should do in response to the flow field. This could be “follow the flow”, or “follow the flow and slowly sink down”
- on the kernel level, we have access to the particle location, and are able to interpolate the flow field at the particle location. These interpolation code in Parcels works also in the curvilinear case[1], and (in version 4, with help from uxarray) the unstructured case.
 
With this functionality, Parcels is very flexible to model all sorts of physical phenomena (plastic transport, water transport, marine organisms, oil spills, icebergs etc etc).
Version 3 of Parcels
Version 3 of our software was a mix of Python, C, as well as custom tooling to compile Python functions to C code. It worked something like this:
- Users define a FieldSet
- Users define their kernels
- these kernels operate per particle (of which there can be millions - this is an important note)
- e.g., interpolate the U/V velocity at my location, add that to my lon/lat
 
 
- these kernels operate per particle (of which there can be millions - this is an important note)
- Users define their particles
- what variables they have, and what the initial conditions are for those variables
 
- Users execute their particles
- their Python kernels are compiled to C using a custom code transpiler. This creates a C execution loop which can be passed the field/grid data and the particle data as C structs. This execution loop goes through each particle in the particle data and feeds it through the kernel
- loop until end of simulation
- a timeslice of their fieldset is loaded and made into cstructs, then the C execution loop is run for some timesteps
- execution is passed back to Python for particle position writing
 
 
The above approach (dubbed “JIT mode” in our project) is quite performant compared to pure Python (which we were also able to run in, which was dubbed “Scipy mode”):
- JIT mode: most execution is done in C (including interpolation, and running of user code)
Difficulties:
- Custom kernel transpiling is bespoke tooling that has a maintenance burden, is slightly unintuitive, and also has more limited support than Numba (or just using Python functions)
- (more importantly) Interop layer with C makes clear distinction in the code between “Python world” and “C world”. This makes it impossible (?) to leverage xarray functionalities such as spatial chunking (we don’t know where particles are and we can’t dynamically request chunks). This creates IO bottlenecks, and also saturates memory limiting our temporal interpolation to linear (i.e., only loading a few timeslices). This IO bottleneck is intense since for each particle we’re mostly only interested in the the flow in the 1-neighbourhood of that particle.
aside: our benchmarking approach can be improved (we didn’t previously have a fleshed out benchmark suite to benchmark and profile the code). Once we have working and well designed code, a targeted eye on performance will help us squeeze out more performance.
Towards version 4
Regarding difficulties (1) and (2), we have a couple ideas for how they can be solved.
Using Numba for kernel compiling
This would resolve (1), allowing users to write more complex kernels, but I don’t know if it would help with (2). I don’t know what Xarray/Numba interoperability looks like, but AFAICT it would require loading the arrays before passing to Numba meaning that (2) is still a problem
Move to a vectorized approach
Rather than interpolating one particle at a time (in C), move towards interpolating and updating the whole particleset at a time. Keep everything in Python, and rely on xarray at the interpolation level.
Benefits:
- Free performance improvements as dependencies improve (e.g., xarray vectorized interp)
- Can spend dev hours contributing upstream to xarray to improve this
Drawback(?):
- users have to rewrite their kernels to be vectorized and deal with the whole particleset at a time. This breaks kernel backwards compatibility (a minor concern given its a major version, but something to consider)
- other drawbacks?
Questions
- data loading and caching - how does vector interpolation in xarray load and cache data for repeated interpolation on neighbouring points?
- how would chunking work (where our access patterns is interpolation heavy)
- are these identified difficulties accurate? Or are there any considerations that I’m missing here?
(interested to contribute to this problem? We’re sprinting online and in Utrecht during our anniversary event 1-3 Oct)
- We implement this by searching first for the particle in the grid, and computing its barycentric coordinates (highlighted in the Parcels v2.0 paper). Naively using xarray to interpolate doesn’t work as xarray doesn’t support interpolation on curvilinear grids Does interp() work on curvilinear grids (2D coordinates) ? · Issue #2281 · pydata/xarray · GitHub . ↩︎