Apply_ufunc with varying number of outputs

Hi everyone,

I am trying to use apply_ufunc for a ds (lat, lon, depth, time). I have done it in the past with simple functions but now I am struggling with scipy.signal.argrelextrema. I want to find the local maxima/minima over depth for the entire dataset.

argrelextrema gives a tuple of ndarrays even for 1D data, so for one water column profile I would have:

data = 283 + 5 * np.random.randn(5, 3, 4, 10)
var = xr.DataArray(data, dims=['time', 'lat', 'lon', 'depth'])

maxidx = argrelextrema(var.isel(time=0, lat=0, lon=0).values, np.greater)[0]

I am trying to extend this for the entire dataset to avoid looping through time, lat and lot (I’ve done the looping but it’s taking too long).

I am not sure if it’s even feasible with applu_ufunc given that the length of maxidx would be different for all points. Here is my try:

maxidx = xr.apply_ufunc(argrelextrema, var, kwargs=dict(comparator=np.greater, axis=-1),
                                    input_core_dims=[["depth"]], dask='parallelized', output_core_dims=[[]],
                                 dask_gufunc_kwargs={'allow_rechunk': True})

but I am getting the error:

{ValueError}ValueError(“applied function returned data with an unexpected number of dimensions. Received 0 dimension(s) but expected 3 dim…, 3, 5, 2, 5, 4, 6, 5, 7, 2, 4, 7, 2, 5, 8, 2, 7, 3, 5, 8,\n 1, 3, 5, 7, 5, 8, 2, 4, 8, 2, 6, 3, 5, 7])), dtype=object)”)

I don’t know how to specify that I want the 0th position in the tuple (so the ndarray) and how to assing that to a new dimension that takes the length of the largest???

Any thoughts/ideas? Thanks!

1 Like

Hi @andreall

(This is a pure-xarray usage question so raising this on the xarray discussions as a usage question you might get more input.)

You can get closer…

Your function works element-wise on non-core dimensions so you need vectorize=True. You also don’t want allow_rechunk.

I don’t know how to specify that I want the 0th position in the tuple (so the ndarray)

Easiest way would be to make a small wrapper function like

def argrelextrema_ufunc(arr: np.ndarray, comparator=np.greater, axis) -> np.ndarray:
    return argrelextrema(arr, comparator, axis)[0]

and call that with apply_ufunc.

But all of that doesn’t matter because I don’t think this function meets the definition of a numpy ufunc. Without some kind of padding, you can’t construct the result of application of the function as a single (or even as a fixed number of) ND numpy arrays. The results would technically be “ragged” arrays instead.

how to assing that to a new dimension that takes the length of the largest???

AFAIK there is no way to do this with apply_ufunc, because you’re trying to apply something that’s not a ufunc.

Padding all results to be the same length along the new dimension is one possible solution, and others might have other ideas.

1 Like

Thanks a lot! In truth, this is part of a larger function and actually at the end I don’t need all the local maxima, but just how many and the shallowest and deepest value so those are just new variables in the ds indexed in (time, lat, lon)…so maybe I can apply the ufunc on the upper level function with vectorize. I tried doing it but didn’t managed so I went to the basic functions trying to “convert” those to xarray. I will try your advice and if not will raise an issue directly on xarray. Thanks again.

yeah, I believe that’s your only option without using ragged (and ragged doesn’t support reshaping yet, so not sure how to do that).

implementation attempt with padding
import xarray as xr
import numpy as np
from scipy.signal import argrelextrema

data = 283 + 5 * np.random.randn(5, 3, 4, 10)
var = xr.DataArray(data, dims=["time", "lat", "lon", "depth"])


def f(arr, comparator, axis):
    result = argrelextrema(arr, comparator, axis)[0]
    if result.size < arr.size:
        result = np.pad(
            result,
            pad_width=(0, arr.size - result.size),
            mode="constant",
            constant_values=-1,
        )
    return result


xr.apply_ufunc(
    f,
    var,
    kwargs=dict(comparator=np.greater, axis=-1),
    input_core_dims=[["depth"]],
    dask="parallelized",
    output_core_dims=[["extrema"]],
    vectorize=True,
)

If you only need those four values (number of local maxima, number of local minima, shallowest value, deepest value), then that can be your dimension and you can use that as the consistently-sized output of your ufunc.

If I understand the desired end result correctly, something like this then -

import xarray as xr
import numpy as np
from scipy.signal import argrelextrema

data = 283 + 5 * np.random.randn(5, 3, 4, 10)
var = xr.DataArray(data, dims=["time", "lat", "lon", "depth"])

def deep_shallow_ufunc(arr, axis):
    maxidxs = argrelextrema(arr, np.greater, axis=axis)[0]
    maxvals = arr[maxidxs]
    deepest_val = np.max(maxvals, axis=axis)

    minidxs = argrelextrema(arr, np.less, axis=axis)[0]
    minvals = arr[minidxs]
    shallowest_val = np.min(minvals, axis=axis)

    num_maxima = maxidxs.shape[axis]
    num_minima = minidxs.shape[axis]

    return np.array([deepest_val, shallowest_val, num_maxima, num_minima])

extrema = xr.apply_ufunc(
    deep_shallow_ufunc,
    var,
    kwargs={"axis": -1},
    input_core_dims=[["depth"]],
    dask="parallelized",
    output_core_dims=[["extrema_info"]],
    vectorize=True,
)

extrema["extrema_info"] = ["deepest", "shallowest", "num_maxima", "num_minima"]

# and then to get each result array as its own variable, just select them
deepest_value = extrema.sel(extrema_info="deepest")
shallowest_value = extrema.sel(extrema_info="shallowest")
total_rel_maxima = extrema.sel(extrema_info="num_maxima")
total_rel_minima = extrema.sel(extrema_info="num_minima")

I’m only just recently getting into apply_ufunc myself though, so maybe I’m missing something here that is suboptimal.

3 Likes

if the function would return scalars (I totally missed that before), this can be further simplified:

import xarray as xr
import numpy as np
from scipy.signal import argrelextrema

data = 283 + 5 * np.random.randn(5, 3, 4, 10)
var = xr.DataArray(data, dims=["time", "lat", "lon", "depth"])

def deep_shallow_ufunc(arr, axis):
    maxidxs = argrelextrema(arr, np.greater, axis=axis)[0]
    maxvals = arr[maxidxs]
    deepest_val = np.max(maxvals, axis=axis)

    minidxs = argrelextrema(arr, np.less, axis=axis)[0]
    minvals = arr[minidxs]
    shallowest_val = np.min(minvals, axis=axis)

    num_maxima = maxidxs.shape[axis]
    num_minima = minidxs.shape[axis]

    return deepest_val, shallowest_val, num_maxima, num_minima

deepest_val, shallowest_val, num_maxima, num_minima = xr.apply_ufunc(
    deep_shallow_ufunc,
    var,
    kwargs={"axis": -1},
    input_core_dims=[["depth"]],
    dask="parallelized",
    output_core_dims=[[], [], [], []],
    vectorize=True,
)
3 Likes