Thanks @keewis for the idea! I have not yet been able to implement it on my end - I was struggling to get the `categories` variable into the correct format, although I did have `skimage.measure.label` working on test 2D arrays (without a time dimension).
I did end up with an approach that works for me, based on xarray's `cumsum`. It was inspired by the Stack Overflow question "Convert cumsum() output to binary array in xarray".
First, for value 1, calculate the cumulative sum, but reset it each time a 0 is found:

```python
cumsum = cube.cumsum(dim='time') - cube.cumsum(dim='time').where(cube == 0).ffill(dim='time').fillna(0)
```
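To see what this reset does, here is the same trick replayed in plain NumPy on a made-up 1-D series (`np.maximum.accumulate` stands in for the `ffill`, which works because the running total never decreases):

```python
import numpy as np

# made-up 1-D series of 0/1 flags
cube = np.array([1, 1, 0, 1, 1, 1, 0, 1])

total = np.cumsum(cube)                    # plain running sum of 1s
# running sum at the most recent 0, carried forward (ffill equivalent)
at_zero = np.maximum.accumulate(np.where(cube == 0, total, 0))
cumsum = total - at_zero                   # restarts after every 0
print(cumsum)  # [1 2 0 1 2 3 0 1]
```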
Next, find the groups of 1s that meet the condition (with `thresh = 3`, that is at least 3 consecutive observations of 1, skipping NaNs; all observations that are part of such a group will be kept):
```python
grps1 = xr.full_like(cumsum, fill_value=0)
grps1 = xr.where(cumsum >= thresh, 1, grps1)
grps1 = xr.where((cumsum > 0) & (cumsum < thresh), np.nan, grps1)
grps1 = grps1.bfill(dim='time')
```
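The marking logic can be checked in plain NumPy on a made-up reset-cumsum series (the small `bfill` helper below is only a stand-in for xarray's `.bfill`):

```python
import numpy as np

def bfill(a):
    """Fill NaNs from the next valid value to the right (like .bfill)."""
    b = a[::-1]
    idx = np.maximum.accumulate(np.where(~np.isnan(b), np.arange(b.size), 0))
    return b[idx][::-1]

thresh = 3
# made-up reset-cumsum series: a run of two 1s, a run of three, a trailing 1
cumsum = np.array([1, 2, 0, 1, 2, 3, 0, 1], dtype=float)

grps1 = np.where(cumsum >= thresh, 1.0,          # run reached thresh: keep
                 np.where(cumsum > 0, np.nan,    # mid-run: decide via bfill
                          0.0))                  # outside any run: drop
grps1 = bfill(grps1)
print(grps1.tolist())  # [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, nan]
```

The short run is backfilled with 0 from the reset that follows it, the run that reaches `thresh` is backfilled with 1, and a run still open at the end of the series stays NaN (undecided).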
Then, flip the input cube and repeat this process to find the groups of 0s (which after flipping have value 1, so the same cumsum logic applies):
```python
cube_flip = xr.where(cube == 1, 0, cube)
cube_flip = xr.where(cube == 0, 1, cube_flip)
# Repeat the steps above on cube_flip to obtain grps0
```
Finally, create a cleaned cube based on these groups, and re-insert the NaNs from the original cube to get the desired result (outliers removed, but otherwise the original observations/NaNs remain intact):
```python
cube_clean = xr.where(grps1 == 1, 1, np.nan)
cube_clean = xr.where(grps0 == 1, 0, cube_clean)
cube_clean = xr.where(cube.isnull(), np.nan, cube_clean)
```
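As a sanity check, the whole recipe can be replayed end-to-end in plain NumPy on one made-up pixel (the array, `thresh = 3`, and the `bfill`/`mark_groups` helpers are illustrative, not part of the actual xarray code):

```python
import numpy as np

def bfill(a):
    """Fill NaNs from the next valid value to the right (like .bfill)."""
    b = a[::-1]
    idx = np.maximum.accumulate(np.where(~np.isnan(b), np.arange(b.size), 0))
    return b[idx][::-1]

def mark_groups(cube, thresh):
    # reset-cumsum: running count of 1s, restarting at every 0 (NaNs skipped)
    total = np.nancumsum(cube)
    cumsum = total - np.maximum.accumulate(np.where(cube == 0, total, 0))
    # keep runs that reach thresh, drop shorter ones, backfill the undecided
    return bfill(np.where(cumsum >= thresh, 1.0,
                          np.where(cumsum > 0, np.nan, 0.0)))

nan = np.nan
# made-up pixel: long runs, one lone 1, one lone 0, one missing observation
cube = np.array([1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, nan, 0, 0, 0])
thresh = 3

grps1 = mark_groups(cube, thresh)
cube_flip = np.where(cube == 1, 0.0, cube)
cube_flip = np.where(cube == 0, 1.0, cube_flip)
grps0 = mark_groups(cube_flip, thresh)

clean = np.where(grps1 == 1, 1.0, nan)
clean = np.where(grps0 == 1, 0.0, clean)
clean = np.where(np.isnan(cube), nan, clean)
print(clean)
# the lone 1 (index 7) and lone 0 (index 8) become NaN, while the original
# NaN (index 12) and the longer runs of 1s and 0s are left untouched
```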
This is not as short and clean as `skimage.measure.label` could potentially be, and it does not scale well if there are many groups to clean, but it is fully dask-compliant, so it runs pretty quickly when the data is later loaded into memory.