Reading MODIS thermal anomalies into Xarray

I’d appreciate some help with loading MODIS thermal anomaly data into Xarray and am asking here because I’m not really sure where in the stack this question belongs.

Overview

MODIS/Terra Thermal Anomalies/Fire 5-Min L2 Swath 1km data includes a “Fire Pixel Table”, which includes information about fire pixels as 27 separate SDSs (HDF4 scientific dataset objects). If there are no fire pixels in the granule, these SDSs have len 0 dimensions and fail to load in both NetCDF4 and pyhdf, causing issues downstream with xarray.open_dataset. The HDF4 SDSs are variables, rather than groups, so using group="fire mask" in xr.open_dataset doesn’t work, nor does drop_variables becuase the backends fail before that kwarg is used.

Does anyone have a simple solution for ignoring these 27 extra variables, a different workaround, or a suggestion for where to raise an issue about this?

MVCE

import os

import earthaccess
from netCDF4 import Dataset

earthaccess.login()

# Download some MODIS/Terra Thermal Anomalies/Fire 5-Min L2 Swath 1km data (100 files)
results = earthaccess.search_data(
    concept_id="C2271754179-LPCLOUD", count=1, temporal=("2015-06-01", "2015-12-31")
)
file = earthaccess.download(results, "earthaccess_data")[0]

cwd = os.getcwd()
fh = Dataset(os.path.join(cwd, file), mode="r")
# Fire mask works
print(fh.variables["fire mask"].shape)
# FP_line (part of the fire pixel table) doesn't
print(fh.variables["FP_line"].shape)

Error:

Traceback (most recent call last):
  File "/Users/max/miniforge3/envs/modis-readers/lib/python3.12/runpy.py", line 198, in _run_module_as_main
    return _run_code(code, main_globals, None,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/max/miniforge3/envs/modis-readers/lib/python3.12/runpy.py", line 88, in _run_code
    exec(code, run_globals)
  File "/Users/max/.vscode/extensions/ms-python.debugpy-2024.10.0-darwin-arm64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
    cli.main()
  File "/Users/max/.vscode/extensions/ms-python.debugpy-2024.10.0-darwin-arm64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/Users/max/.vscode/extensions/ms-python.debugpy-2024.10.0-darwin-arm64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/Users/max/.vscode/extensions/ms-python.debugpy-2024.10.0-darwin-arm64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path
    return _run_module_code(code, init_globals, run_name,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/max/.vscode/extensions/ms-python.debugpy-2024.10.0-darwin-arm64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/Users/max/.vscode/extensions/ms-python.debugpy-2024.10.0-darwin-arm64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "/Users/max/Documents/Code/maxrjones/modis-readers/MVCE.py", line 17, in <module>
    print(fh.variables["FP_line"].shape)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "src/netCDF4/_netCDF4.pyx", line 4563, in netCDF4._netCDF4.Variable.shape.__get__
  File "src/netCDF4/_netCDF4.pyx", line 3734, in netCDF4._netCDF4.Dimension.__len__
  File "src/netCDF4/_netCDF4.pyx", line 2113, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: HDF error

Full demo

modis-readers/find-errors.ipynb at main · maxrjones/modis-readers · GitHub shows that NetCDF4 fails on all the granules that do not have any fire pixels.

1 Like

I don’t have much to offer, but I’m trying to follow along and I see different behaviour - can you please print out what you see with

print(fh)

fwiw, GDAL sees the three vars (dropping the 27 that are presumably 1D ?), and I can open them with rasterio using GDAL SDS syntax, but I can’t get xarray to open at all (afaik it would have to use GDAL via rioxarray/rasterio to do this).

Subdatasets:
  SUBDATASET_1_NAME=HDF4_SDS:UNKNOWN:"earthaccess_data/MOD14.A2015151.2355.061.2021323231035.hdf":0
  SUBDATASET_1_DESC=[2030x1354] fire mask (8-bit unsigned integer)
  SUBDATASET_2_NAME=HDF4_SDS:UNKNOWN:"earthaccess_data/MOD14.A2015151.2355.061.2021323231035.hdf":1
  SUBDATASET_2_DESC=[2030x1354] algorithm QA (32-bit unsigned integer)
  SUBDATASET_3_NAME=HDF4_SDS:UNKNOWN:"earthaccess_data/MOD14.A2015151.2355.061.2021323231035.hdf":29
  SUBDATASET_3_DESC=[20947x8] CMG_night (16-bit unsigned integer)

I don’t actually understand how you can open this at all without “engine = ‘rasterio’”, do you know what it’s using as the backend ?

I don’t actually understand how you can open this at all without “engine = ‘rasterio’”

note that this is using the netcdf4 library directly (from netCDF4 import Dataset), so no xarray involved and hence no engine kwarg.

Oh! Awesome thank you, when did netcdf lib start opening HDF4 ?? Lol, I’ll explore :pray::+1:

the conda-forge distribution of netCDF4 currently includes HDF4 support. the custom build instructions indicates the option was added in version 4.1.

The MVCE stripped the example down to only using the netcdf library without xarray, but at the bottom of this notebook you can see that xr.open_dataset(fp, engine="netcdf4") leads to the same issue.

Using import rioxarray; rioxarray.open_rasterio(fp) doesn’t fail, but also doesn’t ever load the Fire Pixel Table SDSs even if they do contain data.

1 Like

Cool ta, netcdf lib now includes some HDF4. That’s going to be interesting :smiley: I’ll get my build up to date so it’s not just through GDAL :ok_hand:

Does fh.variables["FP_line"][:] work? Either way, you’ll have to figure out how to get it into memory with Python before teaching Xarray to load it.

My experimental code sees three arrays in that file,

  • ‘fire mask’, chunks: [10, 1354], dtype: ‘u1’, dims: [2030, 1354]
  • ‘algorithm QA’, chunks: [10, 1354], dtype: ‘u4’, dims: [2030, 1354]
  • ‘CMG_night’, chunks: [2000, 8], dtype: ‘u2’, dims: [20947, 8]

‘FP_line’ (‘granule line of fire pixel’) has dims [0]. I think the embedded value is also 0 (but it’s confusing).

For making xarray understand the data correctly via any of the available interfaces, I’d say the important thing is, to know what the real values, the ground truth, are supposed to be!

Thanks, Martin. I believe that an empty array is the ground truth in this case. Since there are no fire pixels in the ‘fire mask’ array all of the vector arrays (e.g., FP_line) should be empty but all the interfaces get confused by this. To be honest, I’m not sure what the embedded value being 0 means.

Silly “add on” question. Is it possible to load directly to Xarray on Dask (without an intermediate step)?

@maxrjones you can pass the GDAL Subdataset path syntax that @Michael_Sumner shows above along with engine=‘rasterio’ to read a specific subdataset ignoring the others:

# HDF4_SDS:UNKNOWN:earthaccess_data/MOD14.A2015151.2355.061.2021323231035.hdf:0
gdal_subdataset = f'HDF4_SDS:UNKNOWN:{files[0]}:0'
xr.open_dataset(gdal_subdataset, engine='rasterio')

I’m not sure off the top of my head how to get those strings programmatically, but you’ll see them in the output of gdalinfo MOD14.A2015151.2355.061.2021323231035.hdf

1 Like

This is really helpful to know, thank you Scott for this advice and everyone for the helpful discussion!

@rlourenco do you mean for MODIS data in particular or in general? I think you’ll want to look at the chunks parameter in xarray.open_dataset for loading data into dask arrays.

1 Like

More in general (although we are using Sentinel 1 and 2 ).

I am majorly a HPC user, with offline data transfer. But some lab mates are willing to get data from Earth Engine, but as data transfer is capped, I was considering to change for a STAC repository and store in netCDF or Zarr (using a Google Cloud instance with Coiled)

Thanks for pointing out @maxrjones . I will try it later then.