Pangeo Showcase: "Optimizations for Kerchunk aggregation and Zarr I/O at scale for Machine Learning"

Title: “Optimizations for Kerchunk aggregation and Zarr I/O at scale for Machine Learning”
Invited Speaker: David Stuebe (ORCID: 0009-0000-2804-7191), Camus Energy
When: Wednesday, March 6, 12 PM EST
Where: Launch Meeting - Zoom
Abstract: We have recently contributed enhancements that make working with NODD GRIB weather forecasts more efficient at scale. By sharing this work with the Pangeo community we hope that folks will both find benefit and help advocate for these enhancements to be enabled in a more generalized way.

  • 20 minutes - Community Showcase
  • 40 minutes - Showcase discussion/Community check-ins

I forgot to mention during the talk that one of the key advantages of the parallel_chunk_getitems implementation is that it is fault tolerant: if a GRIB file is corrupted, it returns the fill value for that chunk instead of failing the whole read.
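
For anyone curious what that fault-tolerant pattern looks like in general, here is a minimal sketch using only the Python standard library. It is not the actual parallel_chunk_getitems code. The names `decode_chunk`, `get_chunk_or_fill`, `parallel_get_chunks`, `FILL_VALUE`, and `CHUNK_SIZE` are all hypothetical, and the "decoder" is a stand-in for real GRIB decoding.

```python
from concurrent.futures import ThreadPoolExecutor

FILL_VALUE = -9999.0  # assumed fill value, for illustration only
CHUNK_SIZE = 4        # assumed number of values per chunk


def decode_chunk(raw):
    """Hypothetical stand-in for GRIB decoding; raises on corrupt input."""
    if raw is None or len(raw) != CHUNK_SIZE:
        raise ValueError("corrupted chunk")
    return [float(v) for v in raw]


def get_chunk_or_fill(store, key):
    """Decode one chunk; on any failure, return a chunk of fill values."""
    try:
        return decode_chunk(store[key])
    except Exception:
        # A missing key or a corrupt payload both fall back to the fill value.
        return [FILL_VALUE] * CHUNK_SIZE


def parallel_get_chunks(store, keys, max_workers=4):
    """Fetch many chunks concurrently; one bad chunk does not fail the rest."""
    keys = list(keys)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda k: get_chunk_or_fill(store, k), keys)
        return dict(zip(keys, results))


# Usage: "t/1" is truncated (corrupt) and "t/3" is missing entirely.
store = {"t/0": [1, 2, 3, 4], "t/1": [9, 9], "t/2": [5, 6, 7, 8]}
chunks = parallel_get_chunks(store, ["t/0", "t/1", "t/2", "t/3"])
```

The point of the design is that the error handling lives inside the per-chunk task, so a single bad file degrades to fill values for that chunk while the other reads complete normally.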

A version of @emfdavid’s approach is now documented in the Kerchunk docs: Aggregation special cases — kerchunk documentation

(I can’t take any credit! I didn’t work on this documentation! I’m just linking it for anyone who stumbles on this Pangeo thread in the future…)
