Failed to open MODIS .hdf4 files

Hello everyone!
For the second day in a row I’ve been trying to read MODIS HDF4 (.hdf) products, without success. I’m using the open_rasterio function from the rioxarray library in Python and get the following error: “not recognized as a supported file format”.

data = rioxarray.open_rasterio('MYD09A1.A2018361.h20v03.006.2019010171535.hdf',  masked=True)

I’ve also tried reinstalling rasterio with this command, with no result:
pip3 install rasterio --force-reinstall --no-binary rasterio

I also found that when I try to open an HDF4 file from the command line with gdalinfo, I get the same error. I then tried updating GDAL, but nothing changed. As I understand it, the GDAL and rioxarray libraries are connected somehow, so maybe the solution lies there.

Does anyone know what’s wrong? It’s my first time working with such files, so I have no other choice but to ask here.
My OS: Fedora 35
GDAL version: 3.3.2 (3.3 doesn’t work either)

Can you post the full traceback you see?

And can you share how you’re accessing the data? I’m able to read the MODIS HDF files from the Planetary Computer (these are version 061, i.e. Collection 6.1, rather than 006):

import xarray as xr
import planetary_computer
import pystac
import urllib.request
import rioxarray

item = planetary_computer.sign(
    pystac.read_file("https://planetarycomputer.microsoft.com/api/stac/v1/collections/modis-09A1-061/items/MYD09A1.A2022321.h35v10.061.2022330082933")
)
filename, _ = urllib.request.urlretrieve(item.assets["hdf"].href)
print(rioxarray.open_rasterio(filename))

which outputs

<xarray.Dataset>
Dimensions:               (band: 1, x: 2400, y: 2400)
Coordinates:
  * band                  (band) int64 1
  * x                     (x) float64 1.89e+07 1.89e+07 ... 2.001e+07 2.001e+07
  * y                     (y) float64 -1.112e+06 -1.113e+06 ... -2.224e+06
    spatial_ref           int64 0
Data variables: (12/13)
    sur_refl_b01          (band, y, x) int16 ...
    sur_refl_vzen         (band, y, x) int16 ...
    sur_refl_raz          (band, y, x) int16 ...
    sur_refl_state_500m   (band, y, x) uint16 ...
    sur_refl_day_of_year  (band, y, x) uint16 ...
    sur_refl_b02          (band, y, x) int16 ...
    ...                    ...
    sur_refl_b04          (band, y, x) int16 ...
    sur_refl_b05          (band, y, x) int16 ...
    sur_refl_b06          (band, y, x) int16 ...
    sur_refl_b07          (band, y, x) int16 ...
    sur_refl_qc_500m      (band, y, x) uint32 ...
    sur_refl_szen         (band, y, x) int16 ...
Attributes: (12/134)
    ASSOCIATEDINSTRUMENTSHORTNAME.1:    MODIS
    ASSOCIATEDPLATFORMSHORTNAME.1:      Aqua
    ASSOCIATEDSENSORSHORTNAME.1:        MODIS
    AUTOMATICQUALITYFLAG.1:             Passed
    AUTOMATICQUALITYFLAGEXPLANATION.1:  Always Passed
    CHARACTERISTICBINANGULARSIZE250M:   7.5
    ...                                 ...
    SPSOPARAMETERS:                     2015
    SYSTEMFILENAME:                     MYD09GQ.A2022321.h35v10.061.202232303...
    TileID:                             51035010
    VERSIONID:                          61
    VERTICALTILENUMBER:                 10
    WESTBOUNDINGCOORDINATE:             172.622524004598

Yes, sure, this is the traceback:

KeyError                                  Traceback (most recent call last)
File ~/.local/lib/python3.10/site-packages/xarray/backends/file_manager.py:209, in CachingFileManager._acquire_with_cache_info(self, needs_lock)
    208 try:
--> 209     file = self._cache[self._key]
    210 except KeyError:

File ~/.local/lib/python3.10/site-packages/xarray/backends/lru_cache.py:55, in LRUCache.__getitem__(self, key)
     54 with self._lock:
---> 55     value = self._cache[key]
     56     self._cache.move_to_end(key)

KeyError: [<function open at 0x7f7889eace50>, ('MYD09A1.A2018361.h20v03.006.2019010171535.hdf',), 'r', (('engine', 'rasterio'), ('sharing', False)), '7f094700-6d53-41a7-9cb8-0e7015ae5031']

During handling of the above exception, another exception occurred:

CPLE_OpenFailedError                      Traceback (most recent call last)
File rasterio/_base.pyx:307, in rasterio._base.DatasetBase.__init__()

File rasterio/_base.pyx:218, in rasterio._base.open_dataset()

File rasterio/_err.pyx:221, in rasterio._err.exc_wrap_pointer()

CPLE_OpenFailedError: 'MYD09A1.A2018361.h20v03.006.2019010171535.hdf' not recognized as a supported file format.

During handling of the above exception, another exception occurred:

RasterioIOError                           Traceback (most recent call last)
Input In [138], in <cell line: 1>()
----> 1 data = rioxarray.open_rasterio('MYD09A1.A2018361.h20v03.006.2019010171535.hdf', engine = 'rasterio', masked=True)

File ~/.local/lib/python3.10/site-packages/rioxarray/_io.py:1087, in open_rasterio(filename, parse_coordinates, chunks, cache, lock, masked, mask_and_scale, variable, group, default_name, decode_times, decode_timedelta, band_as_variable, **open_kwargs)
   1085     else:
   1086         manager = URIManager(file_opener, filename, mode="r", kwargs=open_kwargs)
-> 1087     riods = manager.acquire()
   1088     captured_warnings = rio_warnings.copy()
   1090 if band_as_variable:

File ~/.local/lib/python3.10/site-packages/xarray/backends/file_manager.py:191, in CachingFileManager.acquire(self, needs_lock)
    176 def acquire(self, needs_lock=True):
    177     """Acquire a file object from the manager.
    178 
    179     A new file is only opened if it has expired from the
   (...)
    189         An open file object, as returned by ``opener(*args, **kwargs)``.
    190     """
--> 191     file, _ = self._acquire_with_cache_info(needs_lock)
    192     return file

File ~/.local/lib/python3.10/site-packages/xarray/backends/file_manager.py:215, in CachingFileManager._acquire_with_cache_info(self, needs_lock)
    213     kwargs = kwargs.copy()
    214     kwargs["mode"] = self._mode
--> 215 file = self._opener(*self._args, **kwargs)
    216 if self._mode == "w":
    217     # ensure file doesn't get overridden when opened again
    218     self._mode = "a"

File ~/.local/lib/python3.10/site-packages/rasterio/env.py:444, in ensure_env_with_credentials.<locals>.wrapper(*args, **kwds)
    441     session = DummySession()
    443 with env_ctor(session=session):
--> 444     return f(*args, **kwds)

File ~/.local/lib/python3.10/site-packages/rasterio/__init__.py:304, in open(fp, mode, driver, width, height, count, crs, transform, dtype, nodata, sharing, **kwargs)
    301 path = _parse_path(raw_dataset_path)
    303 if mode == "r":
--> 304     dataset = DatasetReader(path, driver=driver, sharing=sharing, **kwargs)
    305 elif mode == "r+":
    306     dataset = get_writer_for_path(path, driver=driver)(
    307         path, mode, driver=driver, sharing=sharing, **kwargs
    308     )

File rasterio/_base.pyx:309, in rasterio._base.DatasetBase.__init__()

RasterioIOError: 'MYD09A1.A2018361.h20v03.006.2019010171535.hdf' not recognized as a supported file format.

I’m accessing the data using the modis_tools library:

from modis_tools.auth import ModisSession
from modis_tools.resources import CollectionApi, GranuleApi
from modis_tools.granule_handler import GranuleHandler

username = ""
password = ""

session = ModisSession(username=username, password=password)
collection_client = CollectionApi(username=username, password=password)

collections = collection_client.query(short_name="MYD09A1") 

granule_client = GranuleApi.from_collection(collections[0], session=session)

coordinates = [57.024, 57.043, 59.529, 59.578]
granules = granule_client.query(start_date="2018-12-30", end_date="2018-12-31", bounding_box = coordinates)

GranuleHandler.download_from_granules(granules, session)

Hmm, the problem seems to be in the way I get the file, because your script works fine for me. It’s strange…

It’s also possible that the HDF file from USGS is somehow corrupted. I think the HDF-EOS format they use is supposed to be compatible with HDF4, so all the files should be readable by rasterio / GDAL, but that might be worth verifying too.
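
One way to do that verification, as a rough sketch: look at the first bytes of the downloaded file. HDF4, HDF5 and classic NetCDF each start with a fixed signature, and a download that needs a login can end up saving an HTML page instead of the data. The function name sniff_format is made up for illustration, and a matching signature only shows the file starts like HDF4, not that the rest of it is intact.

```python
# Leading-byte signatures of the container formats relevant to this thread.
SIGNATURES = {
    b"\x0e\x03\x13\x01": "HDF4",
    b"\x89HDF\r\n\x1a\n": "HDF5 / NetCDF-4",
    b"CDF\x01": "NetCDF classic",
    b"CDF\x02": "NetCDF 64-bit offset",
}


def sniff_format(path):
    """Guess the container format from the first bytes of the file."""
    with open(path, "rb") as f:
        head = f.read(64)
    for magic, name in SIGNATURES.items():
        if head.startswith(magic):
            return name
    # A saved web page usually means the download did not authenticate.
    if head.lstrip().lower().startswith((b"<!doc", b"<html")):
        return "HTML page, not data"
    return "unknown"


# Example call (file name taken from the question):
# print(sniff_format("MYD09A1.A2018361.h20v03.006.2019010171535.hdf"))
```

If this reports HDF4 and GDAL still refuses the file, the file itself is probably not the problem.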

The point is that GDAL refuses to open these HDF-EOS files even from the command line. However, another library (pyhdf) reads them just fine, so I suppose the root of the problem is in GDAL itself…

We have a local copy of that file, and I can open it without problems with “raw” GDAL (gdal.VersionInfo() = '3050000') and with rioxarray (v0.12.2). If gdalinfo doesn’t work on the file from the console, rioxarray won’t work either, so try that first:

$ gdalinfo MYD09A1.A2018361.h20v03.006.2019010171535.hdf
Driver: HDF4/Hierarchical Data Format Release 4
Files: MYD09A1.A2018361.h20v03.006.2019010171535.hdf
Size is 512, 512
Metadata:
  ASSOCIATEDINSTRUMENTSHORTNAME.1=MODIS
  ASSOCIATEDPLATFORMSHORTNAME.1=Aqua
  ASSOCIATEDSENSORSHORTNAME.1=MODIS
[...]
Subdatasets:
  SUBDATASET_1_NAME=HDF4_EOS:EOS_GRID:"MYD09A1.A2018361.h20v03.006.2019010171535.hdf":MOD_Grid_500m_Surface_Reflectance:sur_refl_b01
  SUBDATASET_1_DESC=[2400x2400] sur_refl_b01 MOD_Grid_500m_Surface_Reflectance (16-bit integer)
  SUBDATASET_2_NAME=HDF4_EOS:EOS_GRID:"MYD09A1.A2018361.h20v03.006.2019010171535.hdf":MOD_Grid_500m_Surface_Reflectance:sur_refl_b02
  SUBDATASET_2_DESC=[2400x2400] sur_refl_b02 MOD_Grid_500m_Surface_Reflectance (16-bit integer)
  SUBDATASET_3_NAME=HDF4_EOS:EOS_GRID:"MYD09A1.A2018361.h20v03.006.2019010171535.hdf":MOD_Grid_500m_Surface_Reflectance:sur_refl_b03
  SUBDATASET_3_DESC=[2400x2400] sur_refl_b03 MOD_Grid_500m_Surface_Reflectance (16-bit integer)
  SUBDATASET_4_NAME=HDF4_EOS:EOS_GRID:"MYD09A1.A2018361.h20v03.006.2019010171535.hdf":MOD_Grid_500m_Surface_Reflectance:sur_refl_b04
  SUBDATASET_4_DESC=[2400x2400] sur_refl_b04 MOD_Grid_500m_Surface_Reflectance (16-bit integer)
  SUBDATASET_5_NAME=HDF4_EOS:EOS_GRID:"MYD09A1.A2018361.h20v03.006.2019010171535.hdf":MOD_Grid_500m_Surface_Reflectance:sur_refl_b05
  SUBDATASET_5_DESC=[2400x2400] sur_refl_b05 MOD_Grid_500m_Surface_Reflectance (16-bit integer)
  SUBDATASET_6_NAME=HDF4_EOS:EOS_GRID:"MYD09A1.A2018361.h20v03.006.2019010171535.hdf":MOD_Grid_500m_Surface_Reflectance:sur_refl_b06
  SUBDATASET_6_DESC=[2400x2400] sur_refl_b06 MOD_Grid_500m_Surface_Reflectance (16-bit integer)
  SUBDATASET_7_NAME=HDF4_EOS:EOS_GRID:"MYD09A1.A2018361.h20v03.006.2019010171535.hdf":MOD_Grid_500m_Surface_Reflectance:sur_refl_b07
  SUBDATASET_7_DESC=[2400x2400] sur_refl_b07 MOD_Grid_500m_Surface_Reflectance (16-bit integer)
  SUBDATASET_8_NAME=HDF4_EOS:EOS_GRID:"MYD09A1.A2018361.h20v03.006.2019010171535.hdf":MOD_Grid_500m_Surface_Reflectance:sur_refl_qc_500m
  SUBDATASET_8_DESC=[2400x2400] sur_refl_qc_500m MOD_Grid_500m_Surface_Reflectance (32-bit unsigned integer)
  SUBDATASET_9_NAME=HDF4_EOS:EOS_GRID:"MYD09A1.A2018361.h20v03.006.2019010171535.hdf":MOD_Grid_500m_Surface_Reflectance:sur_refl_szen
  SUBDATASET_9_DESC=[2400x2400] sur_refl_szen MOD_Grid_500m_Surface_Reflectance (16-bit integer)
  SUBDATASET_10_NAME=HDF4_EOS:EOS_GRID:"MYD09A1.A2018361.h20v03.006.2019010171535.hdf":MOD_Grid_500m_Surface_Reflectance:sur_refl_vzen
  SUBDATASET_10_DESC=[2400x2400] sur_refl_vzen MOD_Grid_500m_Surface_Reflectance (16-bit integer)
  SUBDATASET_11_NAME=HDF4_EOS:EOS_GRID:"MYD09A1.A2018361.h20v03.006.2019010171535.hdf":MOD_Grid_500m_Surface_Reflectance:sur_refl_raz
  SUBDATASET_11_DESC=[2400x2400] sur_refl_raz MOD_Grid_500m_Surface_Reflectance (16-bit integer)
  SUBDATASET_12_NAME=HDF4_EOS:EOS_GRID:"MYD09A1.A2018361.h20v03.006.2019010171535.hdf":MOD_Grid_500m_Surface_Reflectance:sur_refl_state_500m
  SUBDATASET_12_DESC=[2400x2400] sur_refl_state_500m MOD_Grid_500m_Surface_Reflectance (16-bit unsigned integer)
  SUBDATASET_13_NAME=HDF4_EOS:EOS_GRID:"MYD09A1.A2018361.h20v03.006.2019010171535.hdf":MOD_Grid_500m_Surface_Reflectance:sur_refl_day_of_year
  SUBDATASET_13_DESC=[2400x2400] sur_refl_day_of_year MOD_Grid_500m_Surface_Reflectance (16-bit unsigned integer)
Corner Coordinates:
Upper Left  (    0.0,    0.0)
Lower Left  (    0.0,  512.0)
Upper Right (  512.0,    0.0)
Lower Right (  512.0,  512.0)
Center      (  256.0,  256.0)
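
If gdalinfo rejects the file, it may be worth checking whether the installed GDAL was built with the HDF4 driver at all, since that driver is optional. Below is a small sketch that parses the output of gdalinfo --formats. The helper names are made up for illustration, and the parsing assumes the GDAL 3.x line layout (short name, then a "-raster-" style marker), so treat it as a starting point rather than a definitive check.

```python
import shutil
import subprocess


def driver_names(formats_text):
    """Extract driver short names from `gdalinfo --formats` output."""
    names = set()
    for line in formats_text.splitlines():
        tokens = line.split()
        # Assumed layout: "  HDF4 -raster- (ros): Hierarchical Data Format ..."
        if len(tokens) >= 2 and tokens[1].startswith("-"):
            names.add(tokens[0])
    return names


def gdal_has_driver(name="HDF4"):
    """Return True or False, or None when gdalinfo is not on PATH."""
    exe = shutil.which("gdalinfo")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--formats"], capture_output=True, text=True, check=False
    )
    return name in driver_names(result.stdout)


if __name__ == "__main__":
    print("HDF4 driver available:", gdal_has_driver("HDF4"))
```

A False here would mean no reinstall of rasterio or rioxarray on top of that GDAL can help; the GDAL build itself would need HDF4 support.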

That was the point: GDAL couldn’t open the file through gdalinfo, and I guessed that was why rioxarray couldn’t read it either.
I ended up using the pyhdf library, which worked fine…
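
For anyone landing here later, a minimal sketch of what reading one band with pyhdf can look like. The wrapper read_sds is a made-up helper, not part of pyhdf, and the dataset name sur_refl_b01 is taken from the gdalinfo listing earlier in the thread. The values returned are the raw stored integers, so fill values and scaling still have to be handled using the returned attributes.

```python
def read_sds(path, name):
    """Read one scientific dataset (SDS) and its attributes from an HDF4 file."""
    # Imported lazily so this module loads even where pyhdf is not installed.
    from pyhdf.SD import SD, SDC

    hdf = SD(path, SDC.READ)
    try:
        sds = hdf.select(name)
        data = sds.get()          # NumPy array of the raw stored values
        attrs = sds.attributes()  # dict of per-dataset attributes
        sds.endaccess()
    finally:
        hdf.end()
    return data, attrs


# Example call (file name taken from the question):
# data, attrs = read_sds(
#     "MYD09A1.A2018361.h20v03.006.2019010171535.hdf", "sur_refl_b01"
# )
# print(data.shape, data.dtype, sorted(attrs))
```

Unlike rioxarray, this returns a plain array with no coordinates or CRS attached, so georeferencing has to be added separately if it is needed.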