Making kerchunk as simple as a toggle?

This, I think: hdf4doc/DSpec/html_FM/DS.pdf at master · HDFGroup/hdf4doc · GitHub

in case examples are useful, these are ones I’m exploring:

this one has matching GeoTIFF, it’s a very simple single-array case:
https://data.seaice.uni-bremen.de/amsr2/asi_daygrid_swath/s6250/2024/aug/Antarctic/

these are more complicated, with multiple 3D arrays (needs earthdata auth)

At the ESIP 2024 Summer Meeting, @jgallagher59701 from OPeNDAP gave a talk that included discussion of some of the crazy stuff that goes on inside those NASA HDF4 files. Check out his talk VirtualiZarr and DMR++.

Thanks Rich! I should point out that the work was all done by Ayush Nag.
If anyone has questions about reading/decoding HDF4, I’ll try and help.

ah sweet, thanks very much :pray:

you might want to get in touch with @martindurant then, he appears to be building a metadata / chunk location reader in Maybe support hdf4 by martindurant · Pull Request #494 · fsspec/kerchunk · GitHub

Yep, giving this a stab, and I can confirm that the internals of HDF4 are pure evil. That branch is not ready for anyone to use; it's more for those who want to actively help out. It does follow linked lists and finds chunk tables, but doesn’t put things into groups (which array goes with which dimensions) yet, and probably other stuff is missing too (how to INFLATE with zlib?).
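
On the INFLATE question, here is a minimal sketch using only Python's standard `zlib` module. It assumes the bytes of a compressed chunk are either a zlib-wrapped stream or a bare DEFLATE stream; which of the two HDF4 actually stores is not confirmed here and would need checking against the spec. `inflate_chunk` and the sample payload are hypothetical names for illustration, not part of kerchunk.

```python
import zlib


def inflate_chunk(raw: bytes, raw_deflate: bool = False) -> bytes:
    """Decompress the bytes of one chunk (hypothetical helper).

    raw_deflate=False expects a zlib-wrapped stream (header + Adler-32).
    raw_deflate=True expects a bare DEFLATE stream with no header.
    Which form HDF4 uses is an assumption to verify.
    """
    # Negative wbits tells zlib not to expect a header or trailer.
    wbits = -zlib.MAX_WBITS if raw_deflate else zlib.MAX_WBITS
    decompressor = zlib.decompressobj(wbits)
    out = decompressor.decompress(raw)
    out += decompressor.flush()
    return out


# Round trip on made-up data, not on a real HDF4 file.
payload = bytes(range(256)) * 4
assert inflate_chunk(zlib.compress(payload)) == payload
```

Using `decompressobj` rather than `zlib.decompress` makes it easy to switch between the two framings with one parameter, and any bytes after the end of the stream land in `unused_data` instead of raising.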

@jgallagher59701 is there HDF4 metadata extraction/transformation code in DMR++ that could be used here?

We have talked in the past about a dmr++ ↔ kerchunk bridge, but I don’t think there’s any work in that direction yet.

In principle, ~ v-zarr could ingest both ~ (this exists?) rather than converting to kerchunk, but that probably is about the same amount of work.

~ v-zarr could ingest both ~ (this exists?)

Virtualizarr can ingest kerchunk references, or directly from files. See this comment

Virtualizarr can also now ingest DMR++ (it just needs to be merged, probably upstream in earthaccess instead).

There is, on two levels. First, you can look at the DMR++ and the kinds of elements it uses to see various semantic details of HDF4 illustrated (e.g., a ‘chunk’ in HDF4 is not atomic; it can be made up of several discontiguous ‘blocks’. Yeah…).
On another level, you can look at the C++ code we have to see how we work with these files.
Bonus: To actually make most of NASA’s data usable, you need to use/understand the HDF4-EOS2 API since that is how the latitude and longitude values for the level 2 and 3 (swath and grid) data are computed.
Disclaimer: I am not the author of this code, so the real details will take me some time to dig up.
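
To make the "chunk made of several discontiguous blocks" point concrete, here is a rough sketch of what a reader has to do: gather each block by byte range and concatenate them in order before any decoding. The `(offset, length)` pair layout, the `read_chunk` helper and the in-memory sample file are all assumptions for illustration; this is not how DMR++ or the OPeNDAP C++ code represents blocks.

```python
import io
from typing import BinaryIO, Iterable, Tuple


def read_chunk(f: BinaryIO, blocks: Iterable[Tuple[int, int]]) -> bytes:
    """Concatenate the (offset, length) blocks of one logical chunk.

    Hypothetical helper: the block list would come from the file's
    chunk table or from a DMR++ document.
    """
    parts = []
    for offset, length in blocks:
        f.seek(offset)
        data = f.read(length)
        if len(data) != length:
            raise EOFError(f"short read at offset {offset}")
        parts.append(data)
    return b"".join(parts)


# Made-up file: the chunk bytes "ABCDEF" are stored as two separated blocks.
f = io.BytesIO(b"..ABC....DEF.")
chunk = read_chunk(f, [(2, 3), (9, 3)])
assert chunk == b"ABCDEF"
```

This is also why such files are awkward for reference formats that record a single offset and length per chunk: one logical chunk here needs a list of byte ranges.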
