Hi all,
I wanted to kick off a discussion about using the Pangeo ecosystem to handle GEDI data in HDF5 format, particularly the L2A and L2B products. See the “Layers” tab for an overview of the file contents. Here is an example file if anyone wants to play around (link is valid until 2022-02-15).
Does anyone here have experience opening GEDI data with Xarray, and potentially saving a subset as a Zarr file? Some of my experiences so far:
Opening a file using xr.open_dataset results in an empty xarray.Dataset:
import xarray as xr
ds = xr.open_dataset(gedi_file_path)
The closest I’ve come to loading GEDI data is by specifying the h5netcdf engine and a specific group of layers, which was referenced in the documentation here:
group = 'BEAM0101/land_cover_data'
ds = xr.open_dataset(gedi_file_path, engine='h5netcdf', group=group, phony_dims='sort')
The problem here is that I care most about some of the layers that sit directly under the beam group rather than in a subgroup such as land_cover_data, e.g. rh100 in the following example, which is the height above ground of the received waveform signal start:
/
├── BEAM0101
│ ├── algorithmrun_flag (800,) uint8
│ ├── beam (800,) uint16
│ ├── channel (800,) uint8
│ ├── cover (800,) float32
│ ├── cover_z (800, 30) float32
│ ├── rh100 (800,) int16
│ ├── land_cover_data
│ │ ├── landsat_treecover (800,) float64
│ │ ├── landsat_water_persistence (800,) uint8
│ │ ├── leaf_off_doy (800,) int16
│ │ ├── leaf_off_flag (800,) uint8
│ │ ├── leaf_on_cycle (800,) uint8
│ │ ...
... ...
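One thing that may be worth trying for layers like rh100 is passing the beam group itself as the group argument, since xarray reads the variables stored directly in the named group. Below is a minimal sketch on a tiny stand-in file, not a real GEDI granule: the file name, the 8-shot size, and all values are made up for illustration, and only the group and layer names follow the tree above. Whether a real L2B file behaves the same way is an assumption that would need checking.

```python
import os
import tempfile

import h5py
import numpy as np
import xarray as xr

# Build a tiny stand-in file that mimics the layout shown above.
# Hypothetical data: 8 shots instead of 800, values are placeholders.
demo_path = os.path.join(tempfile.mkdtemp(), "gedi_like_demo.h5")
n_shots = 8
with h5py.File(demo_path, "w") as f:
    beam = f.create_group("BEAM0101")
    beam.create_dataset("rh100", data=np.arange(n_shots, dtype="int16"))
    beam.create_dataset("cover", data=np.linspace(0, 1, n_shots, dtype="float32"))
    beam.create_dataset("cover_z", data=np.zeros((n_shots, 30), dtype="float32"))
    sub = beam.create_group("land_cover_data")
    sub.create_dataset("leaf_off_doy", data=np.full(n_shots, 300, dtype="int16"))

# Open the beam group itself rather than one of its subgroups.
# phony_dims="sort" lets h5netcdf invent dimension names for datasets
# that have no netCDF dimension scales attached.
ds_beam = xr.open_dataset(
    demo_path, engine="h5netcdf", group="BEAM0101", phony_dims="sort"
)
var_names = sorted(ds_beam.data_vars)   # layers stored directly in BEAM0101
rh100 = ds_beam["rh100"].values         # load the values before closing
ds_beam.close()

print(var_names)
print(rh100)
```

In this sketch the layers inside land_cover_data are not part of the result, so subgroups would still need to be opened separately and combined if both are required.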
Regarding Zarr, I’ve tried to migrate a file as described in the Zarr tutorial (zarr 2.13.3 documentation).
Unfortunately, that ends in TypeError: Object of type bytes_ is not JSON serializable.
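A likely cause, though this is an assumption since the full traceback is not shown, is that some HDF5 attributes come back as NumPy byte strings, which cannot be encoded as JSON when the attributes are written to the Zarr store. The sketch below shows one way to convert such values into plain Python types before writing. The helper names to_json_safe and clean_attrs are hypothetical, and the sample attributes are invented rather than taken from a GEDI file.

```python
import json

import numpy as np


def to_json_safe(value):
    """Convert an HDF5-style attribute value into something json.dumps accepts."""
    if isinstance(value, bytes):            # also matches np.bytes_
        return value.decode("utf-8", errors="replace")
    if isinstance(value, np.ndarray):       # arrays become (nested) lists
        return to_json_safe(value.tolist())
    if isinstance(value, (list, tuple)):    # clean each element recursively
        return [to_json_safe(v) for v in value]
    if isinstance(value, np.generic):       # NumPy scalars become Python scalars
        return value.item()
    return value


def clean_attrs(attrs):
    """Return a copy of an attribute mapping with JSON-friendly keys and values."""
    return {str(key): to_json_safe(val) for key, val in attrs.items()}


# Invented attributes of the kind that can trip up the JSON encoder.
raw_attrs = {
    "units": np.bytes_(b"m"),
    "valid_range": np.array([0, 100], dtype=np.int16),
    "_FillValue": np.float32(-9999.0),
}
clean = clean_attrs(raw_attrs)
encoded = json.dumps(clean)  # raises TypeError if given raw_attrs instead
print(encoded)
```

The same cleaning could be applied to the attributes of each dataset or group before they are assigned to the corresponding Zarr array or group.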
Any advice and shared experiences are more than welcome!