I know many people are curious about the Pangeo cloud-based environment and what a real hackathon project might look like there. We are working on a comprehensive contributor guide with guidelines on how to structure your project, best practices for working with data in the cloud, repository templates, and more. However, a few details regarding the data catalog still need to be worked out, so we aren’t quite ready to release that guide yet.
In the meantime, to whet your appetite, I have created a bare-bones demo notebook of a semi-realistic workflow.
The calculation was inspired by @apendergrass’s work on precipitation statistics (e.g. this paper or this website).
This example includes:
- Searching the data catalog and finding all available models (technically `source_id`s) with 3-hourly precip data for both the `historical` and `ssp585` experiments (only four at this point)
- Calculating zonal-mean precipitation histograms with the xhistogram package, using dask to speed up and parallelize the calculation
- Visualizing the changes under a global warming scenario.
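To give a flavor of the workflow, here is a rough sketch of the first two steps. This is not the actual notebook: the catalog URL, the facet values (`variable_id="pr"`, `table_id="3hr"`), and the histogram bins are my assumptions based on standard CMIP6 and intake-esm conventions.

```python
# A minimal sketch, not the actual notebook. The catalog URL, facet values,
# and bin edges below are assumptions based on CMIP6 conventions.
import intake
import numpy as np
from xhistogram.xarray import histogram

# Open the Pangeo cloud CMIP6 catalog with intake-esm
col = intake.open_esm_datastore(
    "https://storage.googleapis.com/cmip6/pangeo-cmip6.json"
)

# Step 1: find all source_ids with 3-hourly precip ("pr" in the "3hr" table)
# in both the historical and ssp585 experiments
query = col.search(
    variable_id="pr",
    table_id="3hr",
    experiment_id=["historical", "ssp585"],
)
print(query.df.source_id.unique())

# Load the historical run of one model as dask-backed xarray Datasets
dsets = query.search(
    source_id=query.df.source_id.unique()[0],
    experiment_id="historical",
).to_dataset_dict()
ds = list(dsets.values())[0]

# If the dataset aggregates several ensemble members, pick the first one
pr = ds.pr.isel(member_id=0) if "member_id" in ds.pr.dims else ds.pr

# Step 2: zonal-mean precipitation histogram. Binning over time and
# longitude leaves a (latitude, precip-bin) array; the computation stays
# lazy until .plot() (or .compute()) triggers it, in parallel via dask.
pr_bins = np.logspace(-7, -2, 50)  # kg m-2 s-1; arbitrary edges
hist = histogram(pr, bins=[pr_bins], dim=["time", "lon"])
hist.plot()
```

The nice part is that `histogram` just builds a dask graph over the cloud-backed arrays, so nothing heavy happens until the final plot call.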
The results for one model look something like this:

*[figure: zonal-mean precipitation histograms for one model]*
I don’t have enough expertise on this topic to know whether this is a scientifically interesting calculation, but it makes a decent demo. In particular, it shows how easy it is to work with very high-frequency 3-hourly data in the cloud environment. The whole calculation takes just a couple of minutes.
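For context, parallelizing a calculation like this in the Pangeo cloud environment typically takes only a few lines before the computation. The sketch below assumes the hub provides dask-gateway; some deployments use a different cluster manager.

```python
# A minimal sketch, assuming the hub runs dask-gateway
from dask_gateway import Gateway

gateway = Gateway()              # connect to the hub's gateway server
cluster = gateway.new_cluster()  # request a new dask cluster
cluster.scale(20)                # ask for 20 workers (number is arbitrary)
client = cluster.get_client()    # dask computations now run on the cluster
```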
This example is available as a binder, so you can try it yourself.
I hope this demo helps clarify the sort of workflow we will be using for the hackathon projects.