Pangeo Showcase: "How to transform thousands of CMIP6 datasets to zarr with Pangeo Forge - And why we should never do this again!"

rsignell · November 26, 2023, 5:22pm

Title: “How to transform thousands of CMIP6 datasets to Zarr with Pangeo Forge and why we should never do this again!”
Invited Speakers: Julius Busecke (ORCID: 0000-0001-8571-865X), Charles Stern (ORCID: 0000-0002-4078-0852)
When: Wednesday, Nov 29, 12PM EST
Where: Launch Meeting - Zoom
Abstract:
The Pangeo CMIP6 working group has maintained an analysis-ready, cloud-optimized (ARCO) Zarr copy of hundreds of thousands of CMIP6 datasets, but the process was incredibly labor-intensive and manual. While retracted datasets were removed, many newly available and requested datasets were never added to the ARCO Zarr stores…until now.

Using the newest version of Pangeo Forge, which is built on Apache Beam, we are able to ingest thousands of datasets from the ESGF catalog and transform them into ARCO Zarr stores based on user requests. To realize this workflow, we implemented several features as part of Apache Beam pipelines, such as dynamic (at recipe runtime) chunking, testing, and cataloging.

We welcome new dataset requests to scale this operation further and increase the impact of the cloud-based CMIP6 data. Despite our success in ingesting these datasets, I will highlight the need for future CMIP generations to be delivered in a cloud-native way, so that efforts like this are no longer necessary.
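The post does not show the pipeline code. As a rough illustration of what "dynamic (at recipe runtime) chunking" can mean, here is a minimal sketch, assuming a strategy that keeps the spatial dimensions whole and sizes the time chunks from each dataset's actual shape and dtype to hit a target chunk size (100 MiB here). The function name `dynamic_time_chunks`, the target size, and the strategy itself are hypothetical; this is not the Pangeo Forge or CMIP6 working group implementation.

```python
import math
from typing import Dict, Sequence


def dynamic_time_chunks(
    shape: Sequence[int],
    dims: Sequence[str],
    itemsize: int,
    target_mib: float = 100.0,
    chunk_dim: str = "time",
) -> Dict[str, int]:
    """Pick chunk sizes at runtime so each chunk stays close to target_mib.

    Hypothetical strategy: every dimension except `chunk_dim` is kept whole,
    and only `chunk_dim` is split.
    """
    if len(shape) != len(dims):
        raise ValueError("shape and dims must have the same length")
    sizes = dict(zip(dims, shape))
    if chunk_dim not in sizes:
        raise KeyError(f"{chunk_dim!r} not in dims {list(dims)}")

    # Bytes in a single step along chunk_dim (all other dims kept whole).
    bytes_per_step = itemsize * math.prod(
        n for d, n in sizes.items() if d != chunk_dim
    )
    target_bytes = int(target_mib * 2**20)

    # As many steps as fit in the target: at least 1, at most the full length.
    steps = max(1, target_bytes // bytes_per_step)
    steps = min(steps, sizes[chunk_dim])

    return {d: (steps if d == chunk_dim else n) for d, n in sizes.items()}


if __name__ == "__main__":
    # Example: 165 years of monthly float32 data (4 bytes) on a 1-degree grid.
    print(dynamic_time_chunks((1980, 180, 360), ("time", "lat", "lon"), itemsize=4))
    # {'time': 404, 'lat': 180, 'lon': 360}
```

Because the chunk shape is computed from each dataset's own dimensions rather than hard-coded, the same recipe can handle datasets with very different grid resolutions and record lengths.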

20 minutes - Community Showcase
40 minutes - Showcase discussion/Community check-ins
