Hi @rabernat
I’m a bit stubbory, and stick to the OpenDataCube way of working, and you’ve explained most of the differences above, but there are a few things that I would do differently.
Firstly, with odc.stac.load
there’s an option to pass in groupby="solar_day"
, which will merge adjacent scenes and do de-duplication of overlapping pixels. This is pretty important, as it merges satellite passes into single layers, removing the number of scenes to do the median from. I’m honestly not sure how stacstack handles this.
Second, with masking, we usually explicitly pick cloud and cloud shadow and then do a few operations to expand/contract that, to clean up single pixels and fill holes.
A simple example of this is this code, untested:
from odc.algo import erase_bad, mask_cleanup
CLOUD_SHADOWS = 3
CLOUD_HIGH_PROBABILITY = 9
for field in [CLOUD_SHADOWS, CLOUD_HIGH_PROBABILITY]:
bitmask |= 1 << field
cloud_mask = xr.SCL.astype("uint16") & bitmask != 0
cloud_mask = mask_cleanup(cloud_mask, [("closing", 5), ("opening", 5)])
final_data = erase_bad(xr, cloud_mask)
Note that the odc.algo tools there are important, as they retain the int
values in the data, which makes a difference when. you’re working with a lot of data. Using xarray.where converts data to float, which is nice because it means you can use nan
for nodata, but sometimes it’s unwanted.
This process is fairly well tested and rigorous, and we used it to do the cloud-free mosaics over Africa, for example. (With variations on the closing/opening logic for tweaking the cloud mask.)
A worked example of the above workflow is available here using HLS data.
And again, I wanted to note that I think there’s a gap in some of the EO tooling now in that we should have a reasonably simple, agreed way to mask clouds the right way, and since we really have two main optical products (Landsat and Sentinel-2), we can make a lot of the parameters for this opinionated.
And I’m looking for support to write this tooling! Maybe I should start a cloud masking crowd funding mini-project!