Experience with GEDI L2A/B data (HDF5 format)

Hi all,

I wanted to kick off a discussion about using the Pangeo ecosystem to handle GEDI data in HDF5 format, particularly the L2A and L2B products. See the “Layers” tab for an overview of the file contents. Here is an example file if anyone wants to play around (link is valid until 2022-02-15).

Does anyone here have experience opening GEDI data with Xarray, and potentially saving a subset as a Zarr store? Some of my experiences so far:

Opening a file using xr.open_dataset results in an empty xarray.Dataset:

import xarray as xr

# gedi_file_path: path to a local GEDI L2A/L2B .h5 file
ds = xr.open_dataset(gedi_file_path)

The closest I’ve come to loading GEDI data is by specifying the h5netcdf engine and a specific group of layers, which was referenced in the documentation here:

group = 'BEAM0101/land_cover_data'
ds = xr.open_dataset(gedi_file_path, engine='h5netcdf', group=group, phony_dims='sort')

The problem here is that I care most about some of the layers that are not stored in subgroups but sit directly under the beam group, e.g. rh100 in the following example, which is the height above ground of the received waveform signal start:

/
 ├── BEAM0101
 │   ├── algorithmrun_flag (800,) uint8
 │   ├── beam (800,) uint16
 │   ├── channel (800,) uint8
 │   ├── cover (800,) float32
 │   ├── cover_z (800, 30) float32
 │   ├── rh100 (800,) int16
 │   ├── land_cover_data
 │   │   ├── landsat_treecover (800,) float64
 │   │   ├── landsat_water_persistence (800,) uint8
 │   │   ├── leaf_off_doy (800,) int16
 │   │   ├── leaf_off_flag (800,) uint8
 │   │   ├── leaf_on_cycle (800,) uint8
 │   │   ...
 ...  ...
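
One thing that may be worth trying is passing the beam group itself as `group`, so that only the variables stored directly under it are read. Below is a minimal sketch, assuming h5py, h5netcdf and xarray are installed. `open_beam` is a hypothetical helper, and the file is a tiny synthetic stand-in that only mimics the layout above, not real GEDI data.

```python
import os
import tempfile

import h5py
import numpy as np
import xarray as xr


def open_beam(path, beam="BEAM0101"):
    """Load the variables stored directly under one beam group (hypothetical helper)."""
    # phony_dims="sort" lets h5netcdf invent dimension names for datasets
    # that carry no HDF5 dimension scales.
    with xr.open_dataset(path, engine="h5netcdf", group=beam, phony_dims="sort") as ds:
        return ds.load()


# Build a small synthetic file with the same nesting as the tree above.
demo_path = os.path.join(tempfile.mkdtemp(), "synthetic_gedi_like.h5")
with h5py.File(demo_path, "w") as f:
    beam_group = f.create_group("BEAM0101")
    beam_group.create_dataset("rh100", data=np.arange(8, dtype="int16"))
    beam_group.create_dataset("cover_z", data=np.zeros((8, 30), dtype="float32"))
    sub = beam_group.create_group("land_cover_data")
    sub.create_dataset("landsat_treecover", data=np.ones(8, dtype="float64"))

beam_ds = open_beam(demo_path)
print(beam_ds)
```

In this sketch the subgroup `land_cover_data` is not part of the result; it would still need its own `open_dataset` call with `group='BEAM0101/land_cover_data'`.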

Regarding Zarr, I’ve tried to migrate a file as described here: Tutorial — zarr 2.13.3 documentation
Unfortunately, that ends in a TypeError: Object of type bytes_ is not JSON serializable.
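
My guess (an assumption, not something I have confirmed on this file) is that the error comes from HDF5 attributes that arrive as NumPy byte strings, while Zarr stores attributes as JSON. A minimal sketch of a workaround is to clean the attributes before copying them. `to_json_safe` and `clean_attrs` are hypothetical helper names, and the sample attributes are made up for illustration.

```python
import json


def to_json_safe(value):
    """Recursively convert a value into something json.dumps accepts."""
    if isinstance(value, bytes):  # also matches numpy.bytes_, a bytes subclass
        return value.decode("utf-8", errors="replace")
    if isinstance(value, dict):
        return {str(k): to_json_safe(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_json_safe(v) for v in value]
    if hasattr(value, "tolist"):  # NumPy scalars and arrays
        return to_json_safe(value.tolist())
    return value


def clean_attrs(attrs):
    """Return a plain dict of attributes that can be serialized as JSON."""
    return {str(k): to_json_safe(v) for k, v in attrs.items()}


# Made-up attributes standing in for what an HDF5 reader might return.
raw_attrs = {"short_name": b"GEDI_L2B", "levels": (b"L2A", b"L2B"), "count": 3}
safe_attrs = clean_attrs(raw_attrs)
print(json.dumps(safe_attrs))
```

The cleaned dictionary could then be assigned to the attributes of the target Zarr group or array instead of copying the HDF5 attributes verbatim.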

Any advice and shared experiences are more than welcome!

I have never worked with that particular dataset, but I think you might be interested in the datatree library. With that, you can open all groups at once and access them in a tree-like structure where each node “wraps” a dataset (the only caveat is that the library is still pretty new).

cc @TomNicholas

Code to open the dataset and access subgroups
In [1]: import datatree

In [2]: tree = datatree.open_datatree("/tmp/processed_GEDI02_B_2020119111517_O07799_04_T03552_02_003_01_V002.h5", engine="h5netcdf", phony_dims="sort")
   ...: tree
Out[2]: 
DataTree('None', parent=None)
│   Dimensions:  ()
│   Data variables:
│       *empty*
│   Attributes:
│       Processing Parameters:  This file was gernerated by the ICESat-2 Service ...
│       short_name:             GEDI_L2B
├── DataTree('BEAM0101')
│   │   Dimensions:                 (phony_dim_0: 2281, phony_dim_1: 30,
│   │                                phony_dim_2: 101818)
│   │   Dimensions without coordinates: phony_dim_0, phony_dim_1, phony_dim_2
│   │   Data variables: (12/38)
│   │       algorithmrun_flag       (phony_dim_0) uint8 ...
│   │       beam                    (phony_dim_0) uint16 ...
│   │       channel                 (phony_dim_0) uint8 ...
│   │       cover                   (phony_dim_0) float32 ...
│   │       cover_z                 (phony_dim_0, phony_dim_1) float32 ...
│   │       delta_time              (phony_dim_0) datetime64[ns] ...
│   │       ...                      ...
│   │       selected_mode_flag      (phony_dim_0) uint8 ...
│   │       selected_rg_algorithm   (phony_dim_0) uint8 ...
│   │       sensitivity             (phony_dim_0) float32 ...
│   │       shot_number             (phony_dim_0) uint64 ...
│   │       stale_return_flag       (phony_dim_0) uint8 ...
│   │       surface_flag            (phony_dim_0) uint8 ...
│   │   Attributes:
│   │       description:        Full power beam
│   │       wp-l2-l2b_githash:  229d7ee52e17b675fc9ca7b5d95ffa12a6f17055
│   │       wp-l2-l2b_version:  20200110.0.0
│   ├── DataTree('ancillary')
│   │       Dimensions:                         (phony_dim_3: 1)
│   │       Dimensions without coordinates: phony_dim_3
│   │       Data variables:
│   │           dz                              (phony_dim_3) float64 ...
│   │           l2a_alg_count                   (phony_dim_3) int64 ...
│   │           maxheight_cuttoff               (phony_dim_3) float64 ...
│   │           rg_eg_constraint_center_buffer  (phony_dim_3) int32 ...
│   │           rg_eg_mpfit_max_func_evals      (phony_dim_3) uint16 ...
│   │           rg_eg_mpfit_maxiters            (phony_dim_3) uint16 ...
│   │           rg_eg_mpfit_tolerance           (phony_dim_3) float64 ...
│   │           signal_search_buff              (phony_dim_3) float64 ...
│   │           tx_noise_stddev_multiplier      (phony_dim_3) float64 ...
│   ├── DataTree('geolocation')
│   │       Dimensions:                  (phony_dim_4: 2281)
│   │       Coordinates:
│   │           delta_time               (phony_dim_4) datetime64[ns] ...
│   │           lat_highestreturn        (phony_dim_4) float64 ...
│   │           lat_lowestmode           (phony_dim_4) float64 ...
│   │           latitude_bin0            (phony_dim_4) float64 ...
│   │           latitude_lastbin         (phony_dim_4) float64 ...
│   │           lon_highestreturn        (phony_dim_4) float64 ...
│   │           lon_lowestmode           (phony_dim_4) float64 ...
│   │           longitude_bin0           (phony_dim_4) float64 ...
│   │           longitude_lastbin        (phony_dim_4) float64 ...
│   │       Dimensions without coordinates: phony_dim_4
│   │       Data variables: (12/19)
│   │           degrade_flag             (phony_dim_4) int8 ...
│   │           digital_elevation_model  (phony_dim_4) float32 ...
│   │           elev_highestreturn       (phony_dim_4) float32 ...
│   │           elev_lowestmode          (phony_dim_4) float32 ...
│   │           elevation_bin0           (phony_dim_4) float64 ...
│   │           elevation_bin0_error     (phony_dim_4) float32 ...
│   │           ...                       ...
│   │           local_beam_elevation     (phony_dim_4) float32 ...
│   │           longitude_bin0_error     (phony_dim_4) float32 ...
│   │           longitude_lastbin_error  (phony_dim_4) float32 ...
│   │           shot_number              (phony_dim_4) uint64 ...
│   │           solar_azimuth            (phony_dim_4) float32 ...
│   │           solar_elevation          (phony_dim_4) float32 ...
│   ├── DataTree('land_cover_data')
│   │       Dimensions:                    (phony_dim_5: 2281)
│   │       Dimensions without coordinates: phony_dim_5
│   │       Data variables: (12/14)
│   │           landsat_treecover          (phony_dim_5) float64 ...
│   │           landsat_water_persistence  (phony_dim_5) uint8 ...
│   │           leaf_off_doy               (phony_dim_5) float32 ...
│   │           leaf_off_flag              (phony_dim_5) uint8 ...
│   │           leaf_on_cycle              (phony_dim_5) float32 ...
│   │           leaf_on_doy                (phony_dim_5) float32 ...
│   │           ...                         ...
│   │           modis_treecover            (phony_dim_5) float64 ...
│   │           modis_treecover_sd         (phony_dim_5) float64 ...
│   │           pft_class                  (phony_dim_5) uint8 ...
│   │           region_class               (phony_dim_5) uint8 ...
│   │           urban_focal_window_size    (phony_dim_5) uint8 ...
│   │           urban_proportion           (phony_dim_5) uint8 ...
│   └── DataTree('rx_processing')
│           Dimensions:                   (phony_dim_6: 2281)
│           Dimensions without coordinates: phony_dim_6
│           Data variables: (12/109)
│               algorithmrun_flag_a1      (phony_dim_6) uint8 ...
│               algorithmrun_flag_a2      (phony_dim_6) uint8 ...
│               algorithmrun_flag_a3      (phony_dim_6) uint8 ...
│               algorithmrun_flag_a4      (phony_dim_6) uint8 ...
│               algorithmrun_flag_a5      (phony_dim_6) uint8 ...
│               algorithmrun_flag_a6      (phony_dim_6) uint8 ...
│               ...                        ...
│               rx_energy_a2              (phony_dim_6) float32 ...
│               rx_energy_a3              (phony_dim_6) float32 ...
│               rx_energy_a4              (phony_dim_6) float32 ...
│               rx_energy_a5              (phony_dim_6) float32 ...
│               rx_energy_a6              (phony_dim_6) float32 ...
│               shot_number               (phony_dim_6) uint64 ...
├── DataTree('BEAM0110')
│   │   Dimensions:                 (phony_dim_7: 2264, phony_dim_8: 30,
│   │                                phony_dim_9: 94358)
│   │   Dimensions without coordinates: phony_dim_7, phony_dim_8, phony_dim_9
│   │   Data variables: (12/38)
│   │       algorithmrun_flag       (phony_dim_7) uint8 ...
│   │       beam                    (phony_dim_7) uint16 ...
│   │       channel                 (phony_dim_7) uint8 ...
│   │       cover                   (phony_dim_7) float32 ...
│   │       cover_z                 (phony_dim_7, phony_dim_8) float32 ...
│   │       delta_time              (phony_dim_7) datetime64[ns] ...
│   │       ...                      ...
│   │       selected_mode_flag      (phony_dim_7) uint8 ...
│   │       selected_rg_algorithm   (phony_dim_7) uint8 ...
│   │       sensitivity             (phony_dim_7) float32 ...
│   │       shot_number             (phony_dim_7) uint64 ...
│   │       stale_return_flag       (phony_dim_7) uint8 ...
│   │       surface_flag            (phony_dim_7) uint8 ...
│   │   Attributes:
│   │       description:        Full power beam
│   │       wp-l2-l2b_githash:  229d7ee52e17b675fc9ca7b5d95ffa12a6f17055
│   │       wp-l2-l2b_version:  20200110.0.0
│   ├── DataTree('ancillary')
│   │       Dimensions:                         (phony_dim_10: 1)
│   │       Dimensions without coordinates: phony_dim_10
│   │       Data variables:
│   │           dz                              (phony_dim_10) float64 ...
│   │           l2a_alg_count                   (phony_dim_10) int64 ...
│   │           maxheight_cuttoff               (phony_dim_10) float64 ...
│   │           rg_eg_constraint_center_buffer  (phony_dim_10) int32 ...
│   │           rg_eg_mpfit_max_func_evals      (phony_dim_10) uint16 ...
│   │           rg_eg_mpfit_maxiters            (phony_dim_10) uint16 ...
│   │           rg_eg_mpfit_tolerance           (phony_dim_10) float64 ...
│   │           signal_search_buff              (phony_dim_10) float64 ...
│   │           tx_noise_stddev_multiplier      (phony_dim_10) float64 ...
│   ├── DataTree('geolocation')
│   │       Dimensions:                  (phony_dim_11: 2264)
│   │       Coordinates:
│   │           delta_time               (phony_dim_11) datetime64[ns] ...
│   │           lat_highestreturn        (phony_dim_11) float64 ...
│   │           lat_lowestmode           (phony_dim_11) float64 ...
│   │           latitude_bin0            (phony_dim_11) float64 ...
│   │           latitude_lastbin         (phony_dim_11) float64 ...
│   │           lon_highestreturn        (phony_dim_11) float64 ...
│   │           lon_lowestmode           (phony_dim_11) float64 ...
│   │           longitude_bin0           (phony_dim_11) float64 ...
│   │           longitude_lastbin        (phony_dim_11) float64 ...
│   │       Dimensions without coordinates: phony_dim_11
│   │       Data variables: (12/19)
│   │           degrade_flag             (phony_dim_11) int8 ...
│   │           digital_elevation_model  (phony_dim_11) float32 ...
│   │           elev_highestreturn       (phony_dim_11) float32 ...
│   │           elev_lowestmode          (phony_dim_11) float32 ...
│   │           elevation_bin0           (phony_dim_11) float64 ...
│   │           elevation_bin0_error     (phony_dim_11) float32 ...
│   │           ...                       ...
│   │           local_beam_elevation     (phony_dim_11) float32 ...
│   │           longitude_bin0_error     (phony_dim_11) float32 ...
│   │           longitude_lastbin_error  (phony_dim_11) float32 ...
│   │           shot_number              (phony_dim_11) uint64 ...
│   │           solar_azimuth            (phony_dim_11) float32 ...
│   │           solar_elevation          (phony_dim_11) float32 ...
│   ├── DataTree('land_cover_data')
│   │       Dimensions:                    (phony_dim_12: 2264)
│   │       Dimensions without coordinates: phony_dim_12
│   │       Data variables: (12/14)
│   │           landsat_treecover          (phony_dim_12) float64 ...
│   │           landsat_water_persistence  (phony_dim_12) uint8 ...
│   │           leaf_off_doy               (phony_dim_12) float32 ...
│   │           leaf_off_flag              (phony_dim_12) uint8 ...
│   │           leaf_on_cycle              (phony_dim_12) float32 ...
│   │           leaf_on_doy                (phony_dim_12) float32 ...
│   │           ...                         ...
│   │           modis_treecover            (phony_dim_12) float64 ...
│   │           modis_treecover_sd         (phony_dim_12) float64 ...
│   │           pft_class                  (phony_dim_12) uint8 ...
│   │           region_class               (phony_dim_12) uint8 ...
│   │           urban_focal_window_size    (phony_dim_12) uint8 ...
│   │           urban_proportion           (phony_dim_12) uint8 ...
│   └── DataTree('rx_processing')
│           Dimensions:                   (phony_dim_13: 2264)
│           Dimensions without coordinates: phony_dim_13
│           Data variables: (12/109)
│               algorithmrun_flag_a1      (phony_dim_13) uint8 ...
│               algorithmrun_flag_a2      (phony_dim_13) uint8 ...
│               algorithmrun_flag_a3      (phony_dim_13) uint8 ...
│               algorithmrun_flag_a4      (phony_dim_13) uint8 ...
│               algorithmrun_flag_a5      (phony_dim_13) uint8 ...
│               algorithmrun_flag_a6      (phony_dim_13) uint8 ...
│               ...                        ...
│               rx_energy_a2              (phony_dim_13) float32 ...
│               rx_energy_a3              (phony_dim_13) float32 ...
│               rx_energy_a4              (phony_dim_13) float32 ...
│               rx_energy_a5              (phony_dim_13) float32 ...
│               rx_energy_a6              (phony_dim_13) float32 ...
│               shot_number               (phony_dim_13) uint64 ...
├── DataTree('BEAM1000')
│   │   Dimensions:                 (phony_dim_14: 2222, phony_dim_15: 30,
│   │                                phony_dim_16: 65904)
│   │   Dimensions without coordinates: phony_dim_14, phony_dim_15, phony_dim_16
│   │   Data variables: (12/38)
│   │       algorithmrun_flag       (phony_dim_14) uint8 ...
│   │       beam                    (phony_dim_14) uint16 ...
│   │       channel                 (phony_dim_14) uint8 ...
│   │       cover                   (phony_dim_14) float32 ...
│   │       cover_z                 (phony_dim_14, phony_dim_15) float32 ...
│   │       delta_time              (phony_dim_14) datetime64[ns] ...
│   │       ...                      ...
│   │       selected_mode_flag      (phony_dim_14) uint8 ...
│   │       selected_rg_algorithm   (phony_dim_14) uint8 ...
│   │       sensitivity             (phony_dim_14) float32 ...
│   │       shot_number             (phony_dim_14) uint64 ...
│   │       stale_return_flag       (phony_dim_14) uint8 ...
│   │       surface_flag            (phony_dim_14) uint8 ...
│   │   Attributes:
│   │       description:        Full power beam
│   │       wp-l2-l2b_githash:  229d7ee52e17b675fc9ca7b5d95ffa12a6f17055
│   │       wp-l2-l2b_version:  20200110.0.0
│   ├── DataTree('ancillary')
│   │       Dimensions:                         (phony_dim_17: 1)
│   │       Dimensions without coordinates: phony_dim_17
│   │       Data variables:
│   │           dz                              (phony_dim_17) float64 ...
│   │           l2a_alg_count                   (phony_dim_17) int64 ...
│   │           maxheight_cuttoff               (phony_dim_17) float64 ...
│   │           rg_eg_constraint_center_buffer  (phony_dim_17) int32 ...
│   │           rg_eg_mpfit_max_func_evals      (phony_dim_17) uint16 ...
│   │           rg_eg_mpfit_maxiters            (phony_dim_17) uint16 ...
│   │           rg_eg_mpfit_tolerance           (phony_dim_17) float64 ...
│   │           signal_search_buff              (phony_dim_17) float64 ...
│   │           tx_noise_stddev_multiplier      (phony_dim_17) float64 ...
│   ├── DataTree('geolocation')
│   │       Dimensions:                  (phony_dim_18: 2222)
│   │       Coordinates:
│   │           delta_time               (phony_dim_18) datetime64[ns] ...
│   │           lat_highestreturn        (phony_dim_18) float64 ...
│   │           lat_lowestmode           (phony_dim_18) float64 ...
│   │           latitude_bin0            (phony_dim_18) float64 ...
│   │           latitude_lastbin         (phony_dim_18) float64 ...
│   │           lon_highestreturn        (phony_dim_18) float64 ...
│   │           lon_lowestmode           (phony_dim_18) float64 ...
│   │           longitude_bin0           (phony_dim_18) float64 ...
│   │           longitude_lastbin        (phony_dim_18) float64 ...
│   │       Dimensions without coordinates: phony_dim_18
│   │       Data variables: (12/19)
│   │           degrade_flag             (phony_dim_18) int8 ...
│   │           digital_elevation_model  (phony_dim_18) float32 ...
│   │           elev_highestreturn       (phony_dim_18) float32 ...
│   │           elev_lowestmode          (phony_dim_18) float32 ...
│   │           elevation_bin0           (phony_dim_18) float64 ...
│   │           elevation_bin0_error     (phony_dim_18) float32 ...
│   │           ...                       ...
│   │           local_beam_elevation     (phony_dim_18) float32 ...
│   │           longitude_bin0_error     (phony_dim_18) float32 ...
│   │           longitude_lastbin_error  (phony_dim_18) float32 ...
│   │           shot_number              (phony_dim_18) uint64 ...
│   │           solar_azimuth            (phony_dim_18) float32 ...
│   │           solar_elevation          (phony_dim_18) float32 ...
│   ├── DataTree('land_cover_data')
│   │       Dimensions:                    (phony_dim_19: 2222)
│   │       Dimensions without coordinates: phony_dim_19
│   │       Data variables: (12/14)
│   │           landsat_treecover          (phony_dim_19) float64 ...
│   │           landsat_water_persistence  (phony_dim_19) uint8 ...
│   │           leaf_off_doy               (phony_dim_19) float32 ...
│   │           leaf_off_flag              (phony_dim_19) uint8 ...
│   │           leaf_on_cycle              (phony_dim_19) float32 ...
│   │           leaf_on_doy                (phony_dim_19) float32 ...
│   │           ...                         ...
│   │           modis_treecover            (phony_dim_19) float64 ...
│   │           modis_treecover_sd         (phony_dim_19) float64 ...
│   │           pft_class                  (phony_dim_19) uint8 ...
│   │           region_class               (phony_dim_19) uint8 ...
│   │           urban_focal_window_size    (phony_dim_19) uint8 ...
│   │           urban_proportion           (phony_dim_19) uint8 ...
│   └── DataTree('rx_processing')
│           Dimensions:                   (phony_dim_20: 2222)
│           Dimensions without coordinates: phony_dim_20
│           Data variables: (12/109)
│               algorithmrun_flag_a1      (phony_dim_20) uint8 ...
│               algorithmrun_flag_a2      (phony_dim_20) uint8 ...
│               algorithmrun_flag_a3      (phony_dim_20) uint8 ...
│               algorithmrun_flag_a4      (phony_dim_20) uint8 ...
│               algorithmrun_flag_a5      (phony_dim_20) uint8 ...
│               algorithmrun_flag_a6      (phony_dim_20) uint8 ...
│               ...                        ...
│               rx_energy_a2              (phony_dim_20) float32 ...
│               rx_energy_a3              (phony_dim_20) float32 ...
│               rx_energy_a4              (phony_dim_20) float32 ...
│               rx_energy_a5              (phony_dim_20) float32 ...
│               rx_energy_a6              (phony_dim_20) float32 ...
│               shot_number               (phony_dim_20) uint64 ...
├── DataTree('BEAM1011')
│   │   Dimensions:                 (phony_dim_21: 2208, phony_dim_22: 30,
│   │                                phony_dim_23: 75669)
│   │   Dimensions without coordinates: phony_dim_21, phony_dim_22, phony_dim_23
│   │   Data variables: (12/38)
│   │       algorithmrun_flag       (phony_dim_21) uint8 ...
│   │       beam                    (phony_dim_21) uint16 ...
│   │       channel                 (phony_dim_21) uint8 ...
│   │       cover                   (phony_dim_21) float32 ...
│   │       cover_z                 (phony_dim_21, phony_dim_22) float32 ...
│   │       delta_time              (phony_dim_21) datetime64[ns] ...
│   │       ...                      ...
│   │       selected_mode_flag      (phony_dim_21) uint8 ...
│   │       selected_rg_algorithm   (phony_dim_21) uint8 ...
│   │       sensitivity             (phony_dim_21) float32 ...
│   │       shot_number             (phony_dim_21) uint64 ...
│   │       stale_return_flag       (phony_dim_21) uint8 ...
│   │       surface_flag            (phony_dim_21) uint8 ...
│   │   Attributes:
│   │       description:        Full power beam
│   │       wp-l2-l2b_githash:  229d7ee52e17b675fc9ca7b5d95ffa12a6f17055
│   │       wp-l2-l2b_version:  20200110.0.0
│   ├── DataTree('ancillary')
│   │       Dimensions:                         (phony_dim_24: 1)
│   │       Dimensions without coordinates: phony_dim_24
│   │       Data variables:
│   │           dz                              (phony_dim_24) float64 ...
│   │           l2a_alg_count                   (phony_dim_24) int64 ...
│   │           maxheight_cuttoff               (phony_dim_24) float64 ...
│   │           rg_eg_constraint_center_buffer  (phony_dim_24) int32 ...
│   │           rg_eg_mpfit_max_func_evals      (phony_dim_24) uint16 ...
│   │           rg_eg_mpfit_maxiters            (phony_dim_24) uint16 ...
│   │           rg_eg_mpfit_tolerance           (phony_dim_24) float64 ...
│   │           signal_search_buff              (phony_dim_24) float64 ...
│   │           tx_noise_stddev_multiplier      (phony_dim_24) float64 ...
│   ├── DataTree('geolocation')
│   │       Dimensions:                  (phony_dim_25: 2208)
│   │       Coordinates:
│   │           delta_time               (phony_dim_25) datetime64[ns] ...
│   │           lat_highestreturn        (phony_dim_25) float64 ...
│   │           lat_lowestmode           (phony_dim_25) float64 ...
│   │           latitude_bin0            (phony_dim_25) float64 ...
│   │           latitude_lastbin         (phony_dim_25) float64 ...
│   │           lon_highestreturn        (phony_dim_25) float64 ...
│   │           lon_lowestmode           (phony_dim_25) float64 ...
│   │           longitude_bin0           (phony_dim_25) float64 ...
│   │           longitude_lastbin        (phony_dim_25) float64 ...
│   │       Dimensions without coordinates: phony_dim_25
│   │       Data variables: (12/19)
│   │           degrade_flag             (phony_dim_25) int8 ...
│   │           digital_elevation_model  (phony_dim_25) float32 ...
│   │           elev_highestreturn       (phony_dim_25) float32 ...
│   │           elev_lowestmode          (phony_dim_25) float32 ...
│   │           elevation_bin0           (phony_dim_25) float64 ...
│   │           elevation_bin0_error     (phony_dim_25) float32 ...
│   │           ...                       ...
│   │           local_beam_elevation     (phony_dim_25) float32 ...
│   │           longitude_bin0_error     (phony_dim_25) float32 ...
│   │           longitude_lastbin_error  (phony_dim_25) float32 ...
│   │           shot_number              (phony_dim_25) uint64 ...
│   │           solar_azimuth            (phony_dim_25) float32 ...
│   │           solar_elevation          (phony_dim_25) float32 ...
│   ├── DataTree('land_cover_data')
│   │       Dimensions:                    (phony_dim_26: 2208)
│   │       Dimensions without coordinates: phony_dim_26
│   │       Data variables: (12/14)
│   │           landsat_treecover          (phony_dim_26) float64 ...
│   │           landsat_water_persistence  (phony_dim_26) uint8 ...
│   │           leaf_off_doy               (phony_dim_26) float32 ...
│   │           leaf_off_flag              (phony_dim_26) uint8 ...
│   │           leaf_on_cycle              (phony_dim_26) float32 ...
│   │           leaf_on_doy                (phony_dim_26) float32 ...
│   │           ...                         ...
│   │           modis_treecover            (phony_dim_26) float64 ...
│   │           modis_treecover_sd         (phony_dim_26) float64 ...
│   │           pft_class                  (phony_dim_26) uint8 ...
│   │           region_class               (phony_dim_26) uint8 ...
│   │           urban_focal_window_size    (phony_dim_26) uint8 ...
│   │           urban_proportion           (phony_dim_26) uint8 ...
│   └── DataTree('rx_processing')
│           Dimensions:                   (phony_dim_27: 2208)
│           Dimensions without coordinates: phony_dim_27
│           Data variables: (12/109)
│               algorithmrun_flag_a1      (phony_dim_27) uint8 ...
│               algorithmrun_flag_a2      (phony_dim_27) uint8 ...
│               algorithmrun_flag_a3      (phony_dim_27) uint8 ...
│               algorithmrun_flag_a4      (phony_dim_27) uint8 ...
│               algorithmrun_flag_a5      (phony_dim_27) uint8 ...
│               algorithmrun_flag_a6      (phony_dim_27) uint8 ...
│               ...                        ...
│               rx_energy_a2              (phony_dim_27) float32 ...
│               rx_energy_a3              (phony_dim_27) float32 ...
│               rx_energy_a4              (phony_dim_27) float32 ...
│               rx_energy_a5              (phony_dim_27) float32 ...
│               rx_energy_a6              (phony_dim_27) float32 ...
│               shot_number               (phony_dim_27) uint64 ...
└── DataTree('METADATA')
    └── DataTree('DatasetIdentification')
            Dimensions:  ()
            Data variables:
                *empty*
            Attributes: (12/15)
                PGEVersion:                  003
                VersionID:                   01
                abstract:                    The GEDI L2B standard data product contains ...
                characterSet:                utf8
                creationDate:                2021-05-11T00:56:02.950287Z
                credit:                      The software that generates the L2B product ...
                ...                          ...
                purpose:                     The purpose of the L2B dataset is to extract...
                shortName:                   GEDI_L2B
                spatialRepresentationType:   along-track
                status:                      onGoing
                topicCategory:               geoscientificInformation
                uuid:                        96969ba7-7af0-49d1-be0a-00885baec6

In [3]: tree["BEAM0101/land_cover_data"]
Out[3]: 
DataTree('land_cover_data', parent="BEAM0101")
    Dimensions:                    (phony_dim_2: 2281)
    Dimensions without coordinates: phony_dim_2
    Data variables: (12/14)
        landsat_treecover          (phony_dim_2) float64 ...
        landsat_water_persistence  (phony_dim_2) uint8 ...
        leaf_off_doy               (phony_dim_2) float32 ...
        leaf_off_flag              (phony_dim_2) uint8 ...
        leaf_on_cycle              (phony_dim_2) float32 ...
        leaf_on_doy                (phony_dim_2) float32 ...
        ...                         ...
        modis_treecover            (phony_dim_2) float64 ...
        modis_treecover_sd         (phony_dim_2) float64 ...
        pft_class                  (phony_dim_2) uint8 ...
        region_class               (phony_dim_2) uint8 ...
        urban_focal_window_size    (phony_dim_2) uint8 ...
        urban_proportion           (phony_dim_2) uint8 ...
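
In the output above every group gets its own phony dimension (phony_dim_0, phony_dim_4, ...), even though several of them have the same length. If you want beam-level and geolocation variables in one dataset, one option is to rename those dimensions to a shared name before merging. This is only a sketch: it assumes that dimensions of equal length index the same shots, which I have not verified (shot_number appears in both groups and could be used to check). `align_on_shot` and the dimension name "shot" are hypothetical, and the data is synthetic.

```python
import numpy as np
import xarray as xr


def align_on_shot(ds, n_shots, name="shot"):
    """Rename every dimension of length n_shots to a shared name (hypothetical helper)."""
    # Assumes at most one dimension per dataset has this length.
    mapping = {
        dim: name
        for dim, size in ds.sizes.items()
        if size == n_shots and dim != name
    }
    return ds.rename_dims(mapping)


# Synthetic stand-ins for a beam node and its geolocation subgroup.
beam = xr.Dataset(
    {
        "cover": ("phony_dim_0", np.linspace(0, 1, 5, dtype="float32")),
        "cover_z": (("phony_dim_0", "phony_dim_1"), np.zeros((5, 30), dtype="float32")),
    }
)
geo = xr.Dataset(
    {
        "lat_lowestmode": ("phony_dim_4", np.arange(5.0)),
        "lon_lowestmode": ("phony_dim_4", np.arange(5.0) + 10),
    }
)

merged = xr.merge([align_on_shot(beam, 5), align_on_shot(geo, 5)])
print(merged)
```

With real files the two inputs would come from the tree nodes (for example their underlying datasets) rather than from hand-built arrays.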

Thanks a lot, @keewis!
This seems to work pretty well, even better than another option I found earlier here:

import arviz as az

datagroup = az.from_netcdf(filename=gedi_file_path)
datagroup

This exposes all layers that are not stored in groups.

I would second the use of datatree. The GEDI data structure should be quite similar to ICESat-2 (since they’re both laser altimeter instruments). Check out this comment at Reader.load() issues with ATL06 - #2 by weiji14 on using datatree.open_datatree.

Oh, and if you’ve got some working code for getting GEDI data from HDF5 to Zarr, I would love to check it out! There’s some discussion from the NASA Science team on getting a cloud-optimized format (e.g. Zarr or another format) for ICESat-2 (see GitHub - CryoInTheCloud/IS2CloudOptimizedData: For work to build a cloud-optimized data format standard for NASA ICESat-2 data; nothing much there yet, though). I’m wondering if GEDI might be close enough as a point cloud data structure to get involved in some of that discussion.

Kerchunk is a possibility too, I would think.

For example, for a more esoteric HDF5 file I did this: Geoh5py reading in s3 · Issue #138 · fsspec/kerchunk · GitHub
