What's Next - Cloud - Partner-managed Infrastructure

mrocklin · December 27, 2023, 4:24pm

A few notes on this from a Coiled perspective:

Most users don’t pay us any money; the free tier is more than enough for them.
Even if they do go past the free tier, it’s easy to pay with credits through the marketplace.
Coiled can happily launch notebooks for you, just like JupyterHub. The only difference is that we store files and dev state on your machine rather than on a cloud EBS volume. Some folks prefer this, some don’t. We certainly have customers who migrated from JupyterHub to Coiled and are very happy with the move.

rsignell · December 28, 2023, 12:15am

@mrocklin, I did not know that you could pay for Coiled with credits – that’s awesome!

Does Coiled run on Azure? If not, does it have plans to? (I ask because I’m working with NATO researchers and currently they are required to only use Azure)

mrocklin · December 28, 2023, 12:56am

The team is working on Azure now. Last I heard we had something working internally and are aiming for something to use with early partners in late January.

Probably it won’t have all the bells and whistles, but my guess is that it’ll still be a cut above any other option in terms of UX for easy and cheap scalability. I’d be curious to learn more about NATO’s needs if y’all have the time. That’d be a cool user to have.

nishadhka · March 13, 2024, 2:44pm

I would like to provide a user perspective on the discussion surrounding
proprietary/payment-based cloud infrastructure for the Pangeo ecosystem, which
consists of Free and Open-Source Software (FOSS). This perspective stems from
the experience of a team testing Impact-Based Forecasting (IBF) with an
ensemble prediction system, the NOAA Global Ensemble Forecast System (GEFS),
available through the Registry of Open Data on AWS. The
testing was conducted to operationalize IBF at the Disaster Operations Centre,
DRM program, IGAD Climate Prediction and Applications Centre, Kenya.

The method involved using Kerchunk and Coiled to generate metadata JSON files,
followed by creating Zarr datasets from each 6-hourly GEFS operational
output. Coiled was used to create Dask clusters, reading approximately 2400
GRIB files (30 member forecasts) every 6 hours. Subsequently, the workflow
included stamp map creation, IMPROVER-based
post-processing, and probability map creation, with plots merged into
PDF format (using LaTeX) for internal use in understanding the risk.
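
As a rough illustration of the scale described above, the sketch below enumerates the inputs for one 6-hourly cycle. The 30 members and roughly 2400 files come from the post; the 80 files per member is simply 2400 / 30, and the bucket name, path template, and step index are hypothetical placeholders rather than the real GEFS layout.

```python
from datetime import datetime, timezone

N_MEMBERS = 30          # from the post: 30 member forecasts
FILES_PER_CYCLE = 2400  # from the post: approximately 2400 GRIB files per cycle
STEPS_PER_MEMBER = FILES_PER_CYCLE // N_MEMBERS  # derived: 80 files per member

# Hypothetical path template; the real bucket layout is not given in the post.
TEMPLATE = (
    "s3://example-bucket/gefs.{date}/{hour:02d}/"
    "member{member:02d}.step{step:03d}.grib2"
)


def latest_cycle(now: datetime) -> datetime:
    """Round a timestamp down to the most recent 00/06/12/18 cycle."""
    return now.replace(
        hour=now.hour - now.hour % 6, minute=0, second=0, microsecond=0
    )


def cycle_files(cycle: datetime) -> list[str]:
    """List one path per (member, step index) pair for a single cycle."""
    return [
        TEMPLATE.format(
            date=cycle.strftime("%Y%m%d"), hour=cycle.hour, member=m, step=s
        )
        for m in range(1, N_MEMBERS + 1)
        for s in range(STEPS_PER_MEMBER)  # step is an index, not a forecast hour
    ]


cycle = latest_cycle(datetime(2024, 3, 13, 14, 44, tzinfo=timezone.utc))
files = cycle_files(cycle)
print(cycle.isoformat(), len(files), files[0])
```

A list like this is the kind of input that could be mapped over the workers of a Dask cluster, with each worker scanning one GRIB file.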

The workflows relied on Coiled atop AWS and used Dask clusters and
functions to address computing needs. The JSON metadata files created by Kerchunk
are stored in S3 buckets, and downstream workflows read this metadata
as Zarr to create plots and post-processed outputs. Apart from the major
computational improvement, this workflow significantly reduces data storage requirements,
from 10 GB per day without Kerchunk to just a few MB with Kerchunk.
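
One way to see why the stored metadata is so small: a Kerchunk reference set (version 1 of its JSON format) maps each chunk key to a pointer of the form `[url, offset, length]` into the original file, so no data is copied. The sketch below builds such a structure by hand with the standard library only. The bucket name, variable names, message count, and message size are hypothetical, and it omits the `.zarray`/`.zattrs` entries a real reference set also carries.

```python
import json


def build_refs(n_files: int, msgs_per_file: int, msg_bytes: int) -> dict:
    """Build a simplified Kerchunk-style reference set (version 1 layout)."""
    refs = {".zgroup": json.dumps({"zarr_format": 2})}
    for f in range(n_files):
        url = f"s3://example-bucket/file{f:04d}.grib2"  # hypothetical location
        for m in range(msgs_per_file):
            # chunk key -> [url, byte offset, byte length]; no data is copied
            refs[f"var{m}/{f}.0.0"] = [url, m * msg_bytes, msg_bytes]
    return {"version": 1, "refs": refs}


# Hypothetical sizes: 4 GRIB messages of 1 MB each in every file.
refs = build_refs(n_files=2400, msgs_per_file=4, msg_bytes=1_000_000)
ref_bytes = len(json.dumps(refs).encode("utf-8"))
data_bytes = 2400 * 4 * 1_000_000
print(f"references: {ref_bytes / 1e6:.2f} MB, data pointed to: {data_bytes / 1e9:.1f} GB")
```

The reference JSON grows with the number of chunks, not with their size, which is why it stays orders of magnitude smaller than the GRIB data it indexes.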

Running this workflow revealed how DevOps components such as a data API and a Web Processing
Service (WPS) are crucial for sustaining
and utilizing this workflow in operational disaster monitoring centers.
In this architecture, Coiled acts as a WPS, which avoids the need for in-house development
(perhaps using infrastructure-as-code approaches like Terraform) with its long development cycle and high customization.
It also ensures sustainability by simplifying onboarding with improved documentation
and third-party support, largely reducing capacity development requirements.

Coiled thus simplifies the architecture for the WPS requirement of DevOps for IBF,
aligning with projects like Xclim.
Regarding the data API, while the current method
involves storing ARCO formats in-house, Arraylake could meet this
requirement. The ARCO data format is crucial for operational IBF to conduct risk
assessment and evaluation, combining climate/weather data with GIS data to provide
actionable insights to decision-makers. A searchable, documented, regularly
updated data catalog is vital for the IBF workflow. With advancements in
ARCO-related GIS datasets such as COOP, a tangible workflow integrating climate/weather data and GIS datasets can be achieved.

In summary, the Pangeo ecosystem and the associated proprietary/payment-based cloud infrastructure
have a major role to play in setting up workflows for climate/weather data and impact modeling (IBF).
