I would like to provide a user perspective on the discussion surrounding
proprietary/payment-based cloud infrastructure for the Pangeo ecosystem, which
consists of Free and Open-Source Software (FOSS). This perspective comes from a
team's experience testing Impact-Based Forecasting (IBF) with the NOAA Global
Ensemble Forecast System (GEFS), an ensemble prediction system available through
the Registry of Open Data on AWS. The testing was conducted to operationalize IBF
at the Disaster Operations Centre, DRM program, IGAD Climate Prediction and
Applications Centre, Kenya.

The method used Kerchunk and Coiled to generate JSON metadata files for each
6-hourly cycle of GEFS operational output, from which Zarr datasets were then
created. Coiled was used to create Dask clusters that read approximately 2400
GRIB files (30 ensemble members) every 6 hours. The downstream workflow then
covered stamp-map creation, IMPROVER-based post-processing and probability-map
creation, with the plots merged into a PDF (using LaTeX) for internal use in
understanding the risk.
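
The probability-map step can be illustrated with a minimal sketch: the
exceedance probability at each grid point is taken as the fraction of ensemble
members whose forecast exceeds a threshold. This assumes the members are already
loaded as a NumPy array with a leading member dimension; the function name, the
50 mm threshold and the synthetic rainfall data are hypothetical, and this is not
the team's IMPROVER-based implementation.

```python
import numpy as np


def exceedance_probability(members: np.ndarray, threshold: float) -> np.ndarray:
    """Fraction of ensemble members exceeding `threshold` at each grid point.

    `members` is assumed to have shape (n_members, ny, nx); the result has
    shape (ny, nx) with values in [0, 1].
    """
    if members.ndim != 3:
        raise ValueError("expected an array shaped (member, y, x)")
    # Boolean exceedance per member, averaged over the member axis.
    return (members > threshold).mean(axis=0)


# Hypothetical example: 30 members of 24 h rainfall (mm) on a 2 x 2 grid.
rng = np.random.default_rng(0)
rain = rng.gamma(shape=2.0, scale=20.0, size=(30, 2, 2))
prob = exceedance_probability(rain, threshold=50.0)
print(prob)
```

In a real run, `members` would come from the Zarr dataset opened through the
Kerchunk references (for example, one precipitation variable at one lead time),
and the resulting field is what would be plotted as a probability map.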

The workflows ran on Coiled atop AWS and used Dask clusters and functions to
meet their computing needs. The JSON metadata files produced by Kerchunk are
stored in S3 buckets, and downstream workflows read this metadata as Zarr
datasets to create plots and post-processed outputs. Apart from the major
computational improvement, this workflow significantly reduces data storage
requirements: from 10 GB per day without Kerchunk to just a few MB with
Kerchunk.
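
The reason the storage footprint drops so sharply can be seen from the shape of
a Kerchunk reference file (version 1 of its reference format): instead of
copying array bytes, it stores, for each Zarr chunk key, the URL of the original
GRIB file plus a byte offset and length. The sketch below builds such a
dictionary by hand; the S3 URL, variable name, grid shape, offset and length are
all hypothetical, and in practice Kerchunk's GRIB scanner generates these
entries, including the codec needed to decode GRIB messages.

```python
import json


def chunk_ref(url: str, offset: int, length: int) -> list:
    """A Kerchunk v1 reference: point one Zarr chunk at a byte range of a remote file."""
    return [url, offset, length]


# Hypothetical GRIB object and byte range, for illustration only.
grib_url = "s3://example-bucket/gefs/gep01.f000.grib2"
refs = {
    # Inline metadata is stored as JSON-encoded strings.
    ".zgroup": json.dumps({"zarr_format": 2}),
    "tp/.zarray": json.dumps({
        "zarr_format": 2,
        "shape": [361, 720],   # illustrative 0.5-degree global grid
        "chunks": [361, 720],
        "dtype": "<f4",
        "compressor": None,    # simplified: real GRIB references also specify a GRIB-decoding codec
        "fill_value": None,
        "filters": None,
        "order": "C",
    }),
    # The actual data stays in the GRIB file; only its location is recorded.
    "tp/0.0": chunk_ref(grib_url, offset=1_048_576, length=250_000),
}
reference_file = {"version": 1, "refs": refs}

text = json.dumps(reference_file)
referenced_bytes = sum(v[2] for v in refs.values() if isinstance(v, list))
print(f"reference JSON: {len(text)} bytes, pointing at {referenced_bytes} bytes of GRIB data")
```

Downstream, a file like this is typically opened through fsspec's `reference`
filesystem and read with xarray's Zarr engine, so the GRIB bytes are fetched from
S3 only when a chunk is actually needed.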

Running this workflow gave insight into the DevOps architecture it depends on,
such as a data API and a Web Processing Service (WPS), and into how crucial these
components are for sustaining and using this workflow in operational disaster
monitoring centers. In this architecture, Coiled acts as the WPS. This avoids
in-house development, possibly using infrastructure-as-code approaches such as
Terraform, with its long development cycle and heavy customization. It also
supports sustainability by simplifying onboarding through better documentation
and third-party support, which largely reduces capacity-development
requirements. Coiled thus simplifies the architecture needed to meet the WPS
requirement of a DevOps setup for IBF, in line with projects like Xclim.

Regarding the data API, the current method stores ARCO (Analysis-Ready,
Cloud-Optimized) data in-house; Arraylake would meet this requirement. ARCO data
formats are crucial for operational IBF, which conducts risk assessment and
evaluation by combining climate/weather data with GIS data to provide actionable
insights to decision-makers. A searchable, documented and regularly updated data
catalog is vital for the IBF workflow. With advances in ARCO-related GIS datasets
such as COOP, a tangible workflow integrating climate/weather data and GIS
datasets can be achieved.
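
As a rough illustration of combining weather data with GIS data, the sketch
below weights a gridded exceedance-probability field by a population raster on
the same grid to estimate the expected number of people exposed, and lists the
cells at or above an alert level. The aligned grids, the 0.6 alert level and the
population counts are hypothetical; an operational workflow would first regrid
or rasterize the GIS layers onto the forecast grid.

```python
import numpy as np


def expected_exposure(prob: np.ndarray, population: np.ndarray) -> float:
    """Expected number of people exposed: sum over cells of P(exceedance) * population.

    Both arrays are assumed to be on the same (y, x) grid.
    """
    if prob.shape != population.shape:
        raise ValueError("probability and population grids must be aligned")
    return float((prob * population).sum())


def alert_cells(prob: np.ndarray, level: float) -> list[tuple[int, int]]:
    """(row, col) indices of cells whose probability is at or above the alert level."""
    rows, cols = np.nonzero(prob >= level)
    return list(zip(rows.tolist(), cols.tolist()))


# Hypothetical 2 x 3 grids: exceedance probability and population per cell.
prob = np.array([[0.1, 0.7, 0.9],
                 [0.0, 0.4, 0.6]])
population = np.array([[1000, 5000, 2000],
                       [300, 8000, 1500]])

print(expected_exposure(prob, population))
print(alert_cells(prob, level=0.6))
```

The probability-weighted sum keeps the full ensemble information, while the list
of alert cells gives decision-makers a simpler threshold-based view of where
action may be needed.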

In summary, the Pangeo ecosystem and the associated proprietary/payment-based
cloud infrastructure have a major role to play in setting up climate/weather data
workflows and impact modeling (IBF).