Hello,
I am using geospatial packages from the open source Python ecosystem (e.g., stackstac, xarray, dask) to do some work and am stuck on a step so I thought I would make a post here. Please let me know if there is a better place to share these sorts of questions, thanks!
At this point in my workflow I have an xarray DataArray representing one tile of a larger area (77 timesteps, 2001 x 2001 pixels, backed by a float32 numpy.ndarray). It is a classification composed of 1s (presence of a ground feature), 0s (absence of that feature, but still a valid ground observation), and NaNs (no ground observation, e.g., clouds).
I am interested in understanding things like how long the feature exists on the ground, when it arrives, when it leaves, etc. However, before I can calculate this, I need to clean the time-series cube. Take this single-pixel time series, for example (with NaNs dropped for visibility):
The underlying time-series for this pixel looks like this (last few values shown):
'2018-06-15': 0.0,
'2018-06-18': 0.0,
'2018-06-20': nan,
'2018-06-21': nan,
'2018-06-23': 0.0,
'2018-06-25': nan,
'2018-06-28': nan,
'2018-06-30': 0.0,
'2018-07-08': nan,
'2018-07-10': 0.0,
'2018-07-15': 1.0,
'2018-07-18': nan,
'2018-07-23': 0.0,
'2018-07-25': nan,
'2018-07-28': nan,
'2018-07-30': nan
The single-date spike to 1 on 2018-07-15, near the end of the series, is a clear classification error that I want to correct before creating my outputs, while leaving the other values as they are.
Specifically, I am looking for ways to correct (i.e., set to NaN or to the opposite value) these types of outliers, where one or maybe two 0s/1s are surrounded in time by the other value. I need to keep all other values as they are: I cannot simply drop NaNs, since each timestep has NaNs in different places, and it is important to preserve the observed start and end dates of the correctly classified portions for my later calculations. So essentially, I want to look at each 0/1, compare it with its nearest non-NaN neighbors in time, and set it to NaN if those neighbors are the opposite value.
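To make the comparison concrete, here is a rough sketch in plain NumPy of what I have in mind. It is only an illustration under some assumptions: the function name `mask_isolated_observations` is made up, time is assumed to be the first axis unless `axis` says otherwise, only single isolated observations are handled (not runs of two), and values with a valid neighbor on only one side (the start or end of the series) are left untouched.

```python
import numpy as np


def mask_isolated_observations(arr, axis=0):
    """Set a 0/1 value to NaN when its nearest non-NaN neighbors in time
    (one before, one after) agree with each other but not with the value.

    Hypothetical helper for illustration; expects a float array with NaN gaps.
    """
    a = np.moveaxis(np.asarray(arr), axis, 0)
    n_time = a.shape[0]
    valid = ~np.isnan(a)

    # Time index broadcast against the remaining (spatial) dimensions.
    t = np.arange(n_time).reshape((n_time,) + (1,) * (a.ndim - 1))

    # Index of the last valid observation at or before each step (-1 if none),
    # then shifted by one step so the value itself is excluded.
    prev_at = np.maximum.accumulate(np.where(valid, t, -1), axis=0)
    prev_idx = np.concatenate([np.full_like(prev_at[:1], -1), prev_at[:-1]], axis=0)

    # Index of the first valid observation at or after each step (n_time if none),
    # again shifted so the value itself is excluded.
    next_at = np.minimum.accumulate(np.where(valid, t, n_time)[::-1], axis=0)[::-1]
    next_idx = np.concatenate([next_at[1:], np.full_like(next_at[:1], n_time)], axis=0)

    # Look up the neighbor values; positions without a neighbor become NaN.
    prev_val = np.take_along_axis(a, np.clip(prev_idx, 0, n_time - 1), axis=0)
    prev_val = np.where(prev_idx >= 0, prev_val, np.nan)
    next_val = np.take_along_axis(a, np.clip(next_idx, 0, n_time - 1), axis=0)
    next_val = np.where(next_idx < n_time, next_val, np.nan)

    # Comparisons involving NaN are False, so missing neighbors never flag a value.
    spike = valid & (prev_val == next_val) & (prev_val != a)

    out = np.where(spike, np.nan, a)
    return np.moveaxis(out, 0, axis)


# The pixel series from this post, in date order (2018-06-15 ... 2018-07-30).
series = np.array(
    [0, 0, np.nan, np.nan, 0, np.nan, np.nan, 0,
     np.nan, 0, 1, np.nan, 0, np.nan, np.nan, np.nan],
    dtype=np.float32,
)
cleaned = mask_isolated_observations(series)
print(cleaned)  # the 2018-07-15 value (index 10) becomes NaN; the rest is unchanged
```

Neighbors are taken from the original array, so all values are checked at once and one correction does not influence another. I assume something like this could be applied to the full cube through `xr.apply_ufunc` with the time dimension as the core dimension (which moves it to the last axis, so `axis=-1`), but I have not confirmed that this is the idiomatic xarray way to do it.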
I have been playing around with various xarray options (e.g., rolling, interp, resample) but have not gotten the output I want. Maybe there is some means to find local outliers I can use with xarray? I generally struggle to do local time-series operations like this with xarray.
Thank you for your time!