I know many people are curious about the Pangeo cloud-based environment and what a real hackathon project might look like there. We are working on a comprehensive contributor guide with guidelines on how to structure your project, best practices for working with data in the cloud, repository templates, and more. However, a few details regarding the data catalog still need to be worked out, so we aren’t quite ready to release that guide yet.
In the meantime, to whet your appetite, I have created a bare-bones demo notebook of a semi-realistic workflow.
The calculation was inspired by @apendergrass’s work on precipitation statistics (e.g. this paper or this website).
This example includes:
- Searching the data catalog and finding all available models (technically `source_id`s) with 3-hourly precip data for both the `historical` and `ssp585` experiments (only four at this point)
- Calculating zonal-mean precipitation histograms with the xhistogram package, using dask to speed up and parallelize the calculation
- Visualizing the changes under a global warming scenario.
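To give a flavor of the workflow, here is a rough sketch of the first two steps. This is not the actual notebook: the catalog URL, the facet values (`variable_id="pr"`, `table_id="3hr"`), and the histogram bins are my assumptions based on standard CMIP6 and intake-esm conventions.

```python
# A minimal sketch, not the actual notebook. The catalog URL, facet values,
# and bin edges below are assumptions based on CMIP6 conventions.
import intake
import numpy as np
from xhistogram.xarray import histogram

# Open the Pangeo cloud CMIP6 catalog with intake-esm
col = intake.open_esm_datastore(
    "https://storage.googleapis.com/cmip6/pangeo-cmip6.json"
)

# Step 1: find all source_ids with 3-hourly precip ("pr" in the "3hr" table)
# in both the historical and ssp585 experiments
query = col.search(
    variable_id="pr",
    table_id="3hr",
    experiment_id=["historical", "ssp585"],
)
print(query.df.source_id.unique())

# Load the historical run of one model as dask-backed xarray Datasets
dsets = query.search(
    source_id=query.df.source_id.unique()[0],
    experiment_id="historical",
).to_dataset_dict()
ds = list(dsets.values())[0]

# If the dataset aggregates several ensemble members, pick the first one
pr = ds.pr.isel(member_id=0) if "member_id" in ds.pr.dims else ds.pr

# Step 2: zonal-mean precipitation histogram. Binning over time and
# longitude leaves a (latitude, precip-bin) array; the computation stays
# lazy until .plot() (or .compute()) triggers it, in parallel via dask.
pr_bins = np.logspace(-7, -2, 50)  # kg m-2 s-1; arbitrary edges
hist = histogram(pr, bins=[pr_bins], dim=["time", "lon"])
hist.plot()
```

The nice part is that `histogram` just builds a dask graph over the cloud-backed arrays, so nothing heavy happens until the final plot call.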
The results for one model look something like this:

*[figure: zonal-mean precipitation histograms for one model]*
I don’t have enough expertise on this topic to know whether this is a scientifically interesting calculation, but it makes a decent demo. In particular, it shows how easy it is to work with very high-frequency 3-hourly data in the cloud environment. The whole calculation takes just a couple of minutes.
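For context, parallelizing a calculation like this in the Pangeo cloud environment typically takes only a few lines before the computation. The sketch below assumes the hub provides dask-gateway; some deployments use a different cluster manager.

```python
# A minimal sketch, assuming the hub runs dask-gateway
from dask_gateway import Gateway

gateway = Gateway()              # connect to the hub's gateway server
cluster = gateway.new_cluster()  # request a new dask cluster
cluster.scale(20)                # ask for 20 workers (number is arbitrary)
client = cluster.get_client()    # dask computations now run on the cluster
```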
This example is available as a binder, so you can try it yourself.
I hope this demo helps clarify the sort of workflow we will be using for the hackathon projects.