Hi everyone,
I’ve been working with ICESat-2 datasets (particularly ATL20) for about a year now, and recently I’ve been thinking more broadly about the end-to-end workflow involved in working with these data, from discovery and access to preprocessing and analysis.
I’m curious to hear from others who regularly use ICESat-2 products:
- Which parts of the workflow do you find most time-consuming?
- What are the biggest challenges or frustrations you encounter?
- Are there tasks that you find yourself repeatedly doing across different projects?
I’m interested in understanding how different researchers approach these datasets and where the major bottlenecks tend to arise.
Looking forward to hearing your thoughts. Thanks!
Hey Ram,
I’m not a regular ICESat-2 user, but I did some work with Shane Grigsby on aggregating point observations to a DGGS, using ATL06 as a case study, and can share a bit about the challenges I ran into. I found that lock contention in h5coro led to poor performance when working with many files, and that untuned use led to memory bloat from open file handles.
I think an async interface to HDF5 files could help a lot. I did a small amount of experimentation in virtual-zarr/async-hdf5 on GitHub (an experiment with a Zarr proxy for HDF5, backed by Rust) but ran out of time to go further. I’ve also been wondering for a while how fast and easy access could be with virtual Zarr and zarr-datafusion-search, but similarly haven’t found time to experiment.
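To make the open-file-handle point concrete, here is a minimal sketch of the general pattern, using only the Python standard library. It is not the h5coro or async-hdf5 API: `MAX_OPEN`, `read_header`, `read_all`, and `demo` are hypothetical names, and the "granules" are placeholder files that contain only the 8-byte HDF5 signature. The idea is that a semaphore caps how many files are open at once, while blocking reads run in worker threads so the event loop stays responsive.

```python
import asyncio
import os
import tempfile

MAX_OPEN = 4  # hypothetical cap on simultaneously open files
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # first 8 bytes of an HDF5 file


def _read_first_bytes(path, n=8):
    # Blocking read; the file handle is closed as soon as the block exits.
    with open(path, "rb") as f:
        return f.read(n)


async def read_header(path, sem, stats):
    # The semaphore bounds how many reads (and open handles) are in flight.
    async with sem:
        stats["open"] += 1
        stats["peak"] = max(stats["peak"], stats["open"])
        try:
            # Run the blocking read in a worker thread (Python 3.9+).
            return await asyncio.to_thread(_read_first_bytes, path)
        finally:
            stats["open"] -= 1


async def read_all(paths, max_open=MAX_OPEN):
    sem = asyncio.Semaphore(max_open)
    stats = {"open": 0, "peak": 0}
    results = await asyncio.gather(
        *(read_header(p, sem, stats) for p in paths)
    )
    return results, stats["peak"]


def demo(n_files=20):
    # Write placeholder files (signature only, not valid HDF5 granules).
    with tempfile.TemporaryDirectory() as d:
        paths = []
        for i in range(n_files):
            p = os.path.join(d, f"granule_{i}.bin")
            with open(p, "wb") as f:
                f.write(HDF5_SIGNATURE)
            paths.append(p)
        return asyncio.run(read_all(paths))


if __name__ == "__main__":
    headers, peak = demo()
    print(f"read {len(headers)} files, peak concurrent reads: {peak}")
```

With real granules the read function would be replaced by whatever HDF5 reader is in use; the cap itself is the part that keeps memory from growing with the number of files.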
Cheers,
Max