Pangeo Showcase: "Ensemble Weather Forecast Data Streaming with Kerchunk / VirtualiZarr" (November 12, 2025 at 12 PM ET)

Title: “Ensemble Weather Forecast Data Streaming with Kerchunk / VirtualiZarr”
Invited Speaker: Nishadh Kalladath

When: Wednesday, November 12, 2025 at 12 PM EST (2025-11-12T17:00:00Z)
Where: Launch Meeting - Zoom
Abstract:

Ensemble Prediction Systems (EPS) are central to risk assessment and impact-based forecasting in modern early warning systems. By using extreme value analysis of long-term observations to define hazard thresholds, and then estimating from the ensemble the probability that a forecast exceeds them, decision-makers can continuously monitor evolving risks in near real time. Under the CRAF’D-funded initiative at ICPAC, Kenya, this work advances the use of cloud-optimized datasets for forecast-to-impact assessments, emphasizing the need for efficient data streaming and scalable analysis.
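
To make the threshold idea concrete, here is a minimal sketch of an ensemble exceedance probability at a single grid point: the fraction of members whose forecast value is above a hazard threshold. The function name, the member values, and the 50 mm threshold are all hypothetical and are not taken from the presentation; an operational workflow would apply the same calculation across full gridded fields.

```python
def exceedance_probability(member_values, threshold):
    """Return the fraction of ensemble members whose value exceeds threshold."""
    if not member_values:
        raise ValueError("need at least one ensemble member")
    exceeding = sum(1 for value in member_values if value > threshold)
    return exceeding / len(member_values)


# Hypothetical 24-hour rainfall totals (mm) from 10 ensemble members.
members_mm = [12.0, 48.5, 51.2, 30.1, 75.0, 49.9, 60.3, 22.4, 55.0, 41.7]

# Assumed hazard threshold, standing in for one derived from extreme value analysis.
hazard_threshold_mm = 50.0

prob = exceedance_probability(members_mm, hazard_threshold_mm)
print(f"P(rain > {hazard_threshold_mm} mm) = {prob:.0%}")
```

Four of the ten invented members exceed the threshold, so the sketch reports a 40% exceedance probability.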

Kerchunk and VirtualiZarr enable on-demand access to multi-year GRIB datasets held in cloud object storage, such as the S3 buckets listed in the AWS open data registry, by exposing them as virtual Zarr datasets without duplicating or downloading the data. This approach is critical for regional services in East Africa that depend on timely access to GEFS and ECMWF forecast data for anticipatory action.

This presentation showcases methods to parse GEFS and ECMWF datasets in AWS S3, use GRIB index files to reduce the need for full GRIB scans, and address challenges in transitioning from Zarr version 2 to 3, particularly around metadata and codec compatibility in large-scale ensemble data processing. Insights gained from the Zarr Summit were instrumental in adopting version 3 features and developing a forecast-specific GRIB parser in Kerchunk for integration with VirtualiZarr.
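
As a rough illustration of why index files help, the sketch below parses a colon-delimited index of the wgrib2 style that accompanies NOAA GRIB2 files and derives the byte range of each message, so a single variable can be fetched without scanning the whole file. It is not the parser described in the talk. The sample index lines, the offsets, the file size, and the helper names `parse_grib_index` and `range_header` are invented for illustration.

```python
def parse_grib_index(idx_text, file_size=None):
    """Parse wgrib2-style index text into a list of message descriptions.

    Each line is assumed to look like
    'message:offset:date:variable:level:forecast:...'.
    A message's length is the gap to the next message's offset; the last
    message's length is known only if the total file size is supplied.
    """
    entries = []
    for line in idx_text.strip().splitlines():
        parts = line.split(":")
        entries.append({
            "message": int(parts[0]),
            "offset": int(parts[1]),
            "date": parts[2],
            "variable": parts[3],
            "level": parts[4],
            "forecast": parts[5],
        })
    for current, following in zip(entries, entries[1:]):
        current["length"] = following["offset"] - current["offset"]
    if entries:
        last = entries[-1]
        last["length"] = None if file_size is None else file_size - last["offset"]
    return entries


def range_header(entry):
    """Build an HTTP Range header value (inclusive end byte) for one message."""
    end = entry["offset"] + entry["length"] - 1
    return f"bytes={entry['offset']}-{end}"


# Invented sample index; real files contain many more messages.
sample_idx = """\
1:0:d=2025111200:HGT:500 mb:6 hour fcst:ENS=+1
2:85231:d=2025111200:TMP:2 m above ground:6 hour fcst:ENS=+1
3:171042:d=2025111200:APCP:surface:0-6 hour acc fcst:ENS=+1
"""

entries = parse_grib_index(sample_idx, file_size=260000)  # file size is assumed
tmp = next(e for e in entries if e["variable"] == "TMP")
print(tmp["offset"], tmp["length"], range_header(tmp))
```

Each resulting offset and length, paired with the file's URL, is the kind of (URL, offset, length) triple that a Kerchunk-style reference records for a chunk.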

Agenda:

  • ~15 minutes - Showcase presentation
  • 10 - 30 minutes - Discussion
  • 15 - 30 minutes - Community check-in