Usage of xhistogram compared to np.digitize

I have recently been going through a course on geospatial Python which uses xarray and rioxarray :grinning:

There are a few times in the course where the authors move to numpy functions. There is nothing wrong with that, but I would like to raise awareness of xarray functionality, or packages in the Pangeo stack, that do something similar and stay within an xarray/dask framework for scalability. For example, I found a place where the authors could use quantile on an xarray object instead of np.percentile on the underlying values (use da.quantile in notebook 05 · Issue #43 · carpentries-incubator/geospatial-python · GitHub).
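To illustrate that substitution concretely (with a made-up array, not the course data), `da.quantile` on an xarray object should match `np.percentile` on the raw values, once you account for the 0–1 vs 0–100 convention:

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
arr = rng.lognormal(mean=1, sigma=1.0, size=(100, 100))
da = xr.DataArray(arr, dims=["y", "x"])

# np.percentile takes percentages (0-100); da.quantile takes fractions (0-1)
expected = np.percentile(arr, 75)
actual = da.quantile(0.75)

np.testing.assert_almost_equal(float(actual), expected)
```

And `da.quantile` keeps working when `da` is dask-backed (with some chunking caveats, see the end of this post).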

Similarly, I’m curious whether xhistogram (which leverages the functionality of xarray and dask) can be used instead of np.digitize.

np.digitize is used in the lesson Raster Calculations in Python – Introduction to Geospatial Raster and Vector Data with Python, in the section on “Classifying Continuous Rasters in Python”.
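For anyone unfamiliar with np.digitize, the lesson's classification step boils down to mapping each pixel to the index of the bin it falls in (toy values here, not the lesson's raster):

```python
import numpy as np

raster = np.array([[-0.5, 1.2, 15.0],
                   [3.3, 25.0, 0.1]])
bins = [-1, 2, 10, 20, np.inf]

# Each element becomes the index i of its bin, with bins[i-1] <= x < bins[i]
classes = np.digitize(raster, bins)
# classes is [[1, 1, 3], [2, 4, 1]]
```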

Rather than using the data from the course, here is an MCVE:

# numpy version
import numpy as np
arr = np.random.lognormal(mean=1, sigma=1.0, size=(1, 1367, 1697)) - 1
bins = [-1, 2, 10, 20, np.inf]
expected = np.digitize(arr, bins)

# xhistogram version
import xarray as xr
da = xr.DataArray(arr, name="foo")
actual = ...  # the xhistogram equivalent (what goes here?)

# Check both give same result
import numpy.testing as npt
npt.assert_almost_equal(expected, actual)
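For completeness, in case xhistogram turns out not to be the right tool (my understanding is that it computes bin *counts*, rather than a per-pixel bin index like np.digitize), one way to stay in the xarray/dask framework is to wrap np.digitize in xr.apply_ufunc. This is a sketch of an alternative, not the xhistogram answer I'm asking about:

```python
import numpy as np
import xarray as xr

arr = np.random.lognormal(mean=1, sigma=1.0, size=(1, 136, 169)) - 1
bins = [-1, 2, 10, 20, np.inf]
da = xr.DataArray(arr, name="foo")

# dask="parallelized" keeps the computation lazy and chunk-wise
# when da is dask-backed; on a numpy-backed array it applies eagerly
classified = xr.apply_ufunc(
    np.digitize,
    da,
    kwargs={"bins": bins},
    dask="parallelized",
    output_dtypes=[np.int64],
)

np.testing.assert_array_equal(classified.values, np.digitize(arr, bins))
```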

Or do quantile and percentile have dask and multidimensional limitations that I'm missing, which is why the course falls back to numpy?