Feedback on new qa mask python package

Hi all. Recently I was doing some work with modis data and spent a lot of time figuring out the QA bands which are packed into bits. There are a few packages which deal with this but are pretty limited, and every tutorial I found on it in python essentially just did it manually. So I put together a package to deal with this which I tried to make as straightforward as possible.

You all do a lot of remote sensing & python stuff so I was hoping I could get some feedback on it and any suggestions and/or improvements. The package and docs are here:

https://sdtaylor.github.io/unpackqa

It’s currently limited to just some modis and landsat QA bands, but additions should be pretty easy once I’m confident in everything else. There is also the capability of specifying how bit flags are organized manually.

1 Like

Very cool! Thanks so much for sharing–looks super useful.

Here is a little feedback as requested. A lot of folks in the Pangeo community really like to work with raster data via Xarray. Here’s an example of how you would open that same data using Xarray (plus pooch for download):

import xarray as xr
import pooch

L8_QA_PIXEL_FILE = 'https://landsat.usgs.gov/sites/default/files/C2_Sample_Data/LC08_L1TP_140041_20130503_20200912_02_T1.zip'

file_path = pooch.retrieve(
    L8_QA_PIXEL_FILE,
    known_hash=None,
    processor=pooch.Unzip()
)
my_file = [f for f in file_path if 'PIXEL' in f][0]
ds = xr.open_dataset(my_file)
print(ds)
<xarray.Dataset>
Dimensions:      (band: 1, x: 7551, y: 7351)
Coordinates:
  * band         (band) int64 1
  * x            (x) float64 3.708e+05 3.708e+05 ... 5.973e+05 5.973e+05
  * y            (y) float64 3.144e+06 3.144e+06 ... 2.924e+06 2.924e+06
    spatial_ref  int64 0
Data variables:
    band_data    (band, y, x) float32 ...
ds.band_data.coarsen(x=10, y=10, boundary='pad').mean().plot()

It would be great if your documentation showed how to use unpackqa with Xarray data, perhaps via apply_ufunc.

(As a bonus, your documentation could reference files based on globally accessible URLs [as in my example], rather than private file paths. This would make it easier for anyone to run the examples on their own computer.)

Thanks Ryan! I’ve used apply_ufunc before so it should be doable. Quick question, how did you get xr.open_dataset to read a tif file? I’ve only done that with xr.open_rasterio.

In the latest versions of xarray, you can do open_dataset(..., engine='rasterio'). It will also automatically resolve the .TIF file extension to the rasterio engine.

1 Like

The stackstac package is considering including a utility that does some of this functionality… Here is the link to an open PR - `expand_bits` function by gjoseph92 · Pull Request #41 · gjoseph92/stackstac · GitHub that might be helpful!

Open Data Cube should have some work along these lines

So here is a full example using xarray with apply_ufunc

import xarray as xr
import numpy as np
import pooch

import unpackqa

L8_QA_PIXEL_FILE = 'https://landsat.usgs.gov/sites/default/files/C2_Sample_Data/LC08_L1TP_140041_20130503_20200912_02_T1.zip'
L8_qa_product = 'LANDSAT_8_C2_L2_QAPixel'

file_path = pooch.retrieve(
    L8_QA_PIXEL_FILE,
    known_hash=None,
    processor=pooch.Unzip()
)

my_file = [f for f in file_path if 'QA_PIXEL' in f][0]
ds = xr.open_rasterio(my_file, chunks={'x':2000,'y':2000})

# Drop the band axis since the file has a single band.
ds = ds.isel(band=0)

#-------------------------------
# with unpack_to_array
# resulting in flags as an added dimension
L8_flag_names = unpackqa.list_qa_flags(L8_qa_product)

flag_ds = xr.apply_ufunc(
    unpackqa.unpack_to_array,
    ds,
    kwargs = dict(product=L8_qa_product),
    output_core_dims = [['flag']],
    dask_gufunc_kwargs = dict(output_sizes={'flag':len(L8_flag_names)}),
    output_dtypes=[np.uint8],
    vectorize=False,
    dask='parallelized'
    )

# put labels on the flag coordinates
flag_ds['flag'] = L8_flag_names

# execute all lazy computations and load full dataset into memory.
flag_ds.load()

# Another option would be to have each flag as it's own data variable.
# That can be done with some rearranging.
flag_ds2 = [flag_ds.sel(flag=flag_name).drop('flag').rename(flag_name) for flag_name in L8_flag_names]
flag_ds2 = xr.merge(flag_ds2)

The best way to do it is using unpack_to_array inside the apply_ufunc function. Which adds an extra dimension for the flags.

flag_ds
<xarray.DataArray (y: 7351, x: 7551, flag: 12)>
array([[[1, 0, 0, ..., 0, 0, 0],
        [1, 0, 0, ..., 0, 0, 0],
        [1, 0, 0, ..., 0, 0, 0],
...
        [1, 0, 0, ..., 0, 0, 0],
        [1, 0, 0, ..., 0, 0, 0],
        [1, 0, 0, ..., 0, 0, 0]]], dtype=uint8)
Coordinates:
    band     int64 1
  * y        (y) float64 3.144e+06 3.144e+06 3.144e+06 ... 2.924e+06 2.924e+06
  * x        (x) float64 3.708e+05 3.708e+05 3.709e+05 ... 5.973e+05 5.973e+05
  * flag     (flag) <U23 'Fill' 'Dilated_Cloud' ... 'Cirrus_Confidence'

Alternatively, it might be nice to have the flags as distinct variables which can be referenced. Which ends up like:

flag_ds2
<xarray.Dataset>
Dimensions:                  (y: 7351, x: 7551)
Coordinates:
    band                     int64 1
  * y                        (y) float64 3.144e+06 3.144e+06 ... 2.924e+06
  * x                        (x) float64 3.708e+05 3.708e+05 ... 5.973e+05
Data variables:
    Fill                     (y, x) uint8 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
    Dilated_Cloud            (y, x) uint8 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0
    Cirrus                   (y, x) uint8 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0
    Cloud                    (y, x) uint8 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0
    Cloud_Shadow             (y, x) uint8 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0
    Snow                     (y, x) uint8 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0
    Clear                    (y, x) uint8 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0
    Water                    (y, x) uint8 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0
    Cloud_Confidence         (y, x) uint8 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0
    Cloud_Shadow_Confidence  (y, x) uint8 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0
    Snow_Ice_Confidence      (y, x) uint8 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0
    Cirrus_Confidence        (y, x) uint8 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0
1 Like