Xarray quantile fails on groupby over daily data!

While working with daily climate data in xarray, I ran into an error when computing the 95th or 99th percentile for each year using:


[code screenshot not available]

This used to work, but it now throws an error when applied to multi-year daily datasets.

I found that removing February 29 from the dataset (standardizing all years to 365 days) resolves the issue.
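For reference, here is one way to drop February 29 before grouping. It is a minimal sketch on synthetic data (the date range and values are made up, and a pandas standard-calendar axis is assumed); the same `.dt.month`/`.dt.day` mask should also work on a cftime time axis.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily series (assumption: standard calendar via pandas; 2000 is a leap year)
time = pd.date_range("2000-01-01", "2003-12-31", freq="D")
da = xr.DataArray(np.arange(time.size, dtype=float), dims="time", coords={"time": time}, name="pr")

# Boolean mask that is True only on February 29
is_feb29 = (da.time.dt.month == 2) & (da.time.dt.day == 29)

# Keep every other day, so each year ends up with exactly 365 time steps
da_noleap = da.sel(time=~is_feb29)

print(da.sizes["time"], da_noleap.sizes["time"])  # 1461 1460
```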

My questions:

- Why does xarray fail here? Is it due to inconsistent group lengths (365 vs. 366 days)?
- Was this always the intended behavior, or has something changed recently in xarray or dask that now causes this failure?
- Is there a recommended way to handle leap years when using `.groupby().quantile()`?
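Whether the yearly groups actually differ in length is easy to check by counting the days that fall into each `time.year` group. A minimal sketch on synthetic data (the dates and values are made up for illustration):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Assumption: a pandas (standard-calendar) time axis; 2000 is a leap year
time = pd.date_range("2000-01-01", "2003-12-31", freq="D")
da = xr.DataArray(np.ones(time.size), dims="time", coords={"time": time})

# Number of (non-NaN) daily values in each "time.year" group
days_per_year = da.groupby("time.year").count()
sizes = dict(zip(days_per_year["year"].values.tolist(), days_per_year.values.tolist()))
print(sizes)  # {2000: 366, 2001: 365, 2002: 365, 2003: 365}
```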

Any insights or suggestions would be greatly appreciated.
Thanks!

Can you post a reproducible example please?

Or at least the repr of `ds`.

Here’s the structure of ds.


I hope this is fine!

Can you expand the time array, please? It looks like cftime is involved.
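A quick way to see how the time coordinate is stored is to look at its index type. The helper below is hypothetical (`describe_time_axis` is not part of xarray) and is only a sketch: it reports whether the index is an `xr.CFTimeIndex` (cftime objects) or a regular pandas index, plus its first and last entries.

```python
import numpy as np
import pandas as pd
import xarray as xr

def describe_time_axis(obj, dim="time"):
    """Summarize how the time coordinate `dim` of `obj` is indexed (hypothetical helper)."""
    index = obj.indexes[dim]
    return {
        # True when the coordinate holds cftime objects (e.g. use_cftime=True or a non-standard calendar)
        "uses_cftime": isinstance(index, xr.CFTimeIndex),
        "index_type": type(index).__name__,
        "first": str(index[0]),
        "last": str(index[-1]),
        "length": len(index),
    }

# Example on a synthetic pandas-backed axis
time = pd.date_range("2000-01-01", "2003-12-31", freq="D")
da = xr.DataArray(np.zeros(time.size), dims="time", coords={"time": time})
print(describe_time_axis(da))
```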

This works for me (with and without flox), so can you try this snippet, please?

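```python
import dask.array
import xarray as xr

# Daily cftime axis (default "standard" calendar) from 1850-01-01 to 2014-12-31
time = xr.date_range("1850-01-01", "2014-12-31", freq="D", use_cftime=True)
da = xr.DataArray(
    dask.array.ones((60265, 144, 192), chunks=(300, -1, -1)),
    dims=("time", "lat", "lon"),
    name="pr",
    coords={"time": time},
)
# Rechunk so chunk boundaries line up with years, then take the 99th percentile per year
da.chunk(time=xr.groupers.TimeResampler("YS")).groupby("time.year").quantile(q=0.99, dim="time", skipna=True)
```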
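To check the per-year quantiles themselves on something small, the same `groupby("time.year").quantile(...)` call can be compared against `np.quantile` one year at a time. This is a sketch on made-up data, assuming a pandas (standard-calendar) time axis and a numpy-backed array so that it needs neither cftime nor dask; note that it deliberately includes a 366-day leap year.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Small numpy-backed stand-in for the real data (2000 is a leap year)
time = pd.date_range("2000-01-01", "2003-12-31", freq="D")
rng = np.random.default_rng(42)
da = xr.DataArray(
    rng.random((time.size, 3, 4)),
    dims=("time", "lat", "lon"),
    coords={"time": time},
    name="pr",
)

q = da.groupby("time.year").quantile(q=0.99, dim="time", skipna=True)

# Cross-check each year against numpy directly (both default to linear interpolation)
for year in q.year.values:
    block = da.sel(time=str(year)).values  # all days of that year
    expected = np.quantile(block, 0.99, axis=0)
    np.testing.assert_allclose(q.sel(year=year).transpose("lat", "lon").values, expected)
```

If this passes, groups of 365 and 366 days are being reduced independently as expected.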

Thanks! I tried the exact snippet you shared, and I’m still getting the same error.


The code sample works for me in a fresh environment, which to me indicates there’s something wrong with your environment.

Can you post the output of xr.show_versions(), please?

I still can’t reproduce this, but your environment looks a bit odd: xarray=2025.4.0 (the version you supposedly have) requires matplotlib>=3.8, but you have matplotlib=3.4.3.

Could you try creating a new environment with

```shell
mamba create -n test python=3.13 xarray numpy flox dask cftime ipython
```

and then rerun the code sample Deepak posted? If it still fails, I have no idea how to resolve it (short of tracking down import paths, which is really tricky); otherwise, we know it’s the environment that needs to be sorted out.
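One starting point for tracking down import paths is to print, for each package, the version reported by the imported module, the version recorded in the installed distribution metadata, and the file the module was loaded from. A mismatch, or a path outside the active environment, hints at a mixed install. This is only a sketch; `report_packages` is a hypothetical helper and the package list is an assumption.

```python
import importlib
import importlib.metadata

def report_packages(names):
    """Compare each imported module with its installed distribution metadata (hypothetical helper)."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None  # not importable in this environment
            continue
        try:
            dist_version = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            dist_version = None  # e.g. stdlib modules, or no metadata found
        report[name] = {
            "module_version": getattr(module, "__version__", None),
            "dist_version": dist_version,
            "path": getattr(module, "__file__", None),  # where the module was actually loaded from
        }
    return report

if __name__ == "__main__":
    for name, info in report_packages(["xarray", "dask", "flox", "numpy", "matplotlib"]).items():
        print(name, info)
```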


Something is very weird. I don’t see why apply_ufunc is in the code path either. We shouldn’t be using it here.

Yes, it worked.

```
In [1]: import dask.array
   ...: import xarray as xr
   ...: 
   ...: time = xr.date_range("1850-01-01", "2014-12-31", freq="D", use_cftime=True)
   ...: da = xr.DataArray(
   ...:    dask.array.ones((60265, 144, 192), chunks=(300, -1, -1)),
   ...:    dims=("time", "lat", "lon"),
   ...:    name="pr",
   ...:    coords={"time": time},
   ...: )
   ...: da.chunk(time=xr.groupers.TimeResampler("YS")).groupby("time.year").quantile(q=0.99, dim="time", skipna=True)
Out[1]: 
<xarray.DataArray 'pr' (year: 165, lat: 144, lon: 192)> Size: 36MB
dask.array<transpose, shape=(165, 144, 192), dtype=float64, chunksize=(1, 144, 192), chunktype=numpy.ndarray>
Coordinates:
    quantile  float64 8B 0.99
  * year      (year) int64 1kB 1850 1851 1852 1853 1854 ... 2011 2012 2013 2014
Dimensions without coordinates: lat, lon
```
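The sizes in the snippet and its output can be checked by hand. The 60265 time steps are 165 years including 40 leap days, so the working example also mixes 365- and 366-day groups, and a 165 × 144 × 192 float64 result is about 36.5 MB, consistent with the 36MB in the repr. A sketch of that arithmetic, assuming cftime's "standard" calendar follows the Gregorian leap-year rule for these years (it does for dates after 1582):

```python
import calendar

# calendar.isleap implements the Gregorian rule (1900 is not a leap year, 2000 is)
years = range(1850, 2015)
leap_years = [y for y in years if calendar.isleap(y)]
n_days = sum(366 if calendar.isleap(y) else 365 for y in years)
print(len(years), len(leap_years), n_days)  # 165 40 60265

# One 144 x 192 float64 grid (8 bytes per value) per year
result_bytes = len(years) * 144 * 192 * 8
print(result_bytes)  # 36495360
```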
