Pangeo Example Gallery

jukent · February 11, 2020, 4:49pm

Kevin and I were discussing how best to host a gallery of our example notebooks demonstrating Pangeo-approved tools.

It occurred to us that rather than creating another resource that is NCAR specific, we should as much as possible contribute to the existing Pangeo gallery. In order to do this, we may need to work together with the authors of other extant notebooks to create a common environment.

I want to make sure I understand the mission statement of the Pangeo gallery. I’m sure it isn’t altogether surprising, but I want to make sure I don’t step on someone’s toes / make sure that I understand the work that has already been done and synthesize our efforts into that existing work.

If anyone has any insights, comments, or would like to be a part of the effort to refresh the Pangeo gallery – let’s discuss!

rabernat · February 11, 2020, 7:46pm

Thanks for raising this important issue, Julia! I totally agree our galleries need a major refactor.

My goal for this is to create a “self-service” system, similar to conda-forge, where anyone can contribute a gallery. The key requirements for the gallery are:

Users can completely specify their own custom environment (could be an image from Pangeo Stacks using binder)
Notebooks are linted
Gallery examples are all executable in [Pangeo] binder
But notebooks are also executed by CI and rendered into static HTML in the same environment

The last requirement is tricky, because some of the examples will require major compute resources, and possibly special cloud things like dask-kubernetes, data credentials, etc. The clever way to solve this is to use Binder itself to execute the notebooks, via its API. I have a proof of concept that this is possible.
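
As a rough sketch of what driving Binder through its API could involve (this is not the proof of concept mentioned above): BinderHub serves a build endpoint of the form `/build/<provider>/<spec>` that streams server-sent events, each carrying a JSON payload with a `phase` field. The snippet below only builds that URL and parses event lines. The helper names and the sample repo are illustrative assumptions.

```python
import json
from urllib.parse import quote


def binder_build_url(host, org, repo, ref="master"):
    """Return the BinderHub build-endpoint URL for a GitHub repo."""
    # Each part is percent-encoded so a ref containing "/" stays one segment.
    spec = "/".join(quote(part, safe="") for part in (org, repo, ref))
    return f"{host.rstrip('/')}/build/gh/{spec}"


def parse_event_line(line):
    """Decode one server-sent-event line; return None for non-data lines."""
    line = line.strip()
    if not line.startswith("data:"):
        return None
    return json.loads(line[len("data:"):].strip())


def ready_server(lines):
    """Scan event lines and return (url, token) once the phase is 'ready'."""
    for line in lines:
        event = parse_event_line(line)
        if event and event.get("phase") == "ready":
            return event["url"], event["token"]
    return None


if __name__ == "__main__":
    # Hypothetical target: the example gallery on the Pangeo binder.
    print(binder_build_url("https://binder.pangeo.io",
                           "pangeo-gallery", "example-gallery"))
```

In a CI job, the lines passed to `ready_server` would come from streaming the build URL over HTTP; the returned URL and token could then be used to talk to the running Jupyter server.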

Plugged all together, the architecture would be something like this:

I have created the pangeo-gallery GitHub org as a place to start playing around with this:
https://github.com/pangeo-gallery.

I envision this would eventually work kind of like conda-forge. To add a new gallery, users would fork a “staging” repo, make a PR, and the CI would create a new gallery repo in the org.

For now, we can skip that piece and just try to plug the following elements together.

pangeo-gallery/test-example: a repo with an example gallery
pangeo-gallery/pangeo-gallery: the sphinx site (can be modeled on https://github.com/pangeo-data/pangeo)

Julia, if you’d like to work on this, I recommend you start by creating these repos and populating them with some basic content. (You should have admin rights on the pangeo-gallery org.) Once that is up and running, we can discuss more specific next steps via GitHub issues.

jukent · February 12, 2020, 12:34am

Ryan, thank you for this detailed outline of your goals for the gallery. One challenge is that some of the NCAR-specific notebooks use data that is not available on the cloud, so they may not be able to be binderized. We might want a static version of some notebooks.

One of the questions we had was whether you can have multiple environment files within the repo that stores the example notebooks. From what you are saying, it sounds like we can! Can you confirm this?

rabernat · February 12, 2020, 1:25pm

I imagine that we will need some sort of config file in which you could specify whether or not to render the notebooks using binder. I really encourage you to try to binderize as much as possible. (Or work in the NCAR binder idea.)

Almost. Instead, the gallery will consist of many repos (in my diagram, one is oceanography, another is cmip). Each repo is a binder and so has its own environment file. All the notebooks in the repo share the same environment. If you needed a different environment, you would need a different repo.
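
To make the "one repo, one environment" rule concrete, here is a minimal sketch of how the shared environment for a repo might be located. It assumes the usual repo2docker convention that a `binder/` or `.binder/` directory, when present, takes precedence over the repo root; the function name and the list of file names checked are illustrative choices.

```python
from pathlib import Path

# Common repo2docker configuration files (an illustrative subset).
ENV_FILENAMES = ("environment.yml", "requirements.txt", "Dockerfile")


def find_binder_config(repo_root):
    """Return the environment files that would define a repo's binder."""
    root = Path(repo_root)
    # Assumed precedence: binder/, then .binder/, then the repo root.
    for sub in ("binder", ".binder"):
        if (root / sub).is_dir():
            config_dir = root / sub
            break
    else:
        config_dir = root
    return [config_dir / name for name in ENV_FILENAMES
            if (config_dir / name).is_file()]


def notebooks_sharing_env(repo_root):
    """List every notebook in the repo; all of them run in the same environment."""
    return sorted(Path(repo_root).rglob("*.ipynb"))
```

Every notebook returned by `notebooks_sharing_env` runs against the single environment found by `find_binder_config`, which is why a notebook with different dependencies belongs in a separate repo.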

Make sense?

kmpaul · February 13, 2020, 5:27pm

@rabernat Thanks for the excellent starting points! I think this is a great plan. Julia and I will start looking into this.

Within this scenario, it definitely sounds like an NCAR BinderHub would be the best way to share NCAR examples. An NCAR BinderHub would have to be access-limited to use Cheyenne or Casper, but the BinderHub can use NCAR’s existing JupyterHub for authentication, as far as I understand it. I will try to push on this to see if we can stand something up like that, which will allow us to add NCAR-specific examples to the community gallery.

Until we get that operational, though, we can submit static renderings of the NCAR notebooks to the same gallery.

Thanks, again, for the pointers!

kmpaul · February 13, 2020, 6:02pm

Actually, I amend my previous statement about standing up a BinderHub at NCAR. I just found this discussion on the Jupyter Discourse site. For the TL;DR crowd, the summary is that Binder is “intrinsically” linked to repo2docker's container-based workflow. In the HPC setting, Docker-based containers are problematic, so an alternative to repo2docker would need to be created (e.g., repo2conda to create a conda environment that was not containerized).

As to whether Binder itself could ever accommodate something like repo2conda instead of repo2docker, it was unclear from the discussion, but it sounded pretty unlikely. Instead, they recommended using the JupyterHub API directly with a custom Spawner. …So, more research is required.

rabernat · February 13, 2020, 6:27pm

The easy solution to static, non-binder-executed examples is to just commit them directly to the “Gallery Repo”.

I have started putting this Gallery Repo together and will push it up to github later today.

rabernat · February 13, 2020, 7:15pm

I have created the gallery sphinx site:

It is being deployed here: http://gallery.pangeo.io/

@jukent - please go nuts with customizing, tweaking, improving this layout and content

In the meantime, I will work on the next step: making the CI that connects example repos to the gallery.

aaronspring · February 21, 2020, 4:50am

Hi guys,

Inspired by the OSM20 tutorial, I would like to contribute a climpred.readthedocs.io/ CMIP6 DCPP cloud demo to the Pangeo gallery. Computing in the cloud might make sense because one variable in DCPP can easily take up 60 inits x 30 members x 120 lead months of timesteps.
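
For a sense of scale, the dimensions quoted above can be multiplied out. The grid resolution and data type below are assumptions made only for illustration (a 1-degree global grid stored as float32); actual DCPP output varies by model.

```python
# Dimensions quoted in the post.
inits, members, lead_months = 60, 30, 120
n_fields = inits * members * lead_months  # number of 2-D fields

# Assumptions, not from the post: 1-degree global grid, float32 values.
nlat, nlon, bytes_per_value = 180, 360, 4
total_bytes = n_fields * nlat * nlon * bytes_per_value

print(f"{n_fields} fields, about {total_bytes / 1e9:.0f} GB uncompressed")
```

Under these assumptions a single variable comes to 216,000 fields and roughly 56 GB uncompressed, which is awkward to download but manageable when processed lazily next to the data.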

I created https://github.com/aaronspring/climpred-cloud-demo with https://binder.pangeo.io/ and https://pangeo-binder.readthedocs.io/en/latest/.

However, I am unsure about an effective workflow. So far I have just copy-pasted a few cells from the OSM20 tutorial on binder, changed them for my purposes, and put them into a new repo that I created locally and pushed to https://github.com/aaronspring/climpred-cloud-demo. There the notebook isn’t executed, and I guess there is a better way than copy-pasting. Is there a way to save a notebook from the cloud to my local machine so I can commit it to the repo? Is https://jupyterhub.github.io/nbgitpuller/ a solution? What workflow would you propose?

Another related question: did I start from the most up-to-date template? Or should I rather just copy one of the existing pangeo-gallery projects?

rabernat · March 17, 2020, 6:19pm

A very rushed update on where this stands.

I have created binderbot, a CLI for running binders.
I created an example gallery that uses binderbot as part of its CI. The CI runs the notebooks inside binder, downloads the executed notebooks, and pushes them to the binderbot-built branch. This repo needs to be converted to a cookiecutter or template.
I have linked this to a static sphinx gallery site at https://github.com/pangeo-gallery/pangeo-gallery. The example gallery is a submodule: https://github.com/pangeo-gallery/pangeo-gallery/tree/master/repos
The built site is here: http://gallery.pangeo.io

This is all a bit messy and there are lots of things that need to be worked out. But the fundamentals do work.
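
One check a CI job like this might run before pushing to the built branch is confirming that the downloaded notebooks really were executed. The sketch below is not part of binderbot; it is a hypothetical helper that relies only on the standard notebook JSON layout, where each code cell carries an `execution_count` that is null until the cell has run.

```python
import json


def unexecuted_cells(notebook_json):
    """Return indices of non-empty code cells that have never been run."""
    nb = json.loads(notebook_json)
    return [
        index
        for index, cell in enumerate(nb.get("cells", []))
        if cell.get("cell_type") == "code"
        and cell.get("source")  # skip empty cells
        and cell.get("execution_count") is None
    ]


if __name__ == "__main__":
    # Tiny hand-written notebook: one executed cell, one not.
    sample = json.dumps({
        "nbformat": 4,
        "cells": [
            {"cell_type": "markdown", "source": "# Demo"},
            {"cell_type": "code", "source": "1 + 1", "execution_count": 1},
            {"cell_type": "code", "source": "2 + 2", "execution_count": None},
        ],
    })
    print(unexecuted_cells(sample))
```

An empty result means every non-empty code cell carries an execution count, which is a reasonable proxy for "this notebook was run end to end".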

rabernat · March 24, 2020, 4:30am

I am continuing to work on this. Have not had as much time as I hoped, but some progress has been made.

If someone would like to try this (@kmpaul?), the steps would be the following:

Try to copy the structure of https://github.com/pangeo-gallery/example-gallery in a new repo
Make a PR to add this as a new submodule in https://github.com/pangeo-gallery/pangeo-gallery
See what breaks
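
The second step above amounts to a `git submodule add` in a fork of pangeo-gallery. A small sketch of composing that command follows; the `repos` directory is taken from the gallery repo's layout linked earlier in the thread, while the function name and the `my-gallery` repo are hypothetical.

```python
import shlex


def submodule_add_command(repo_url, name, base_dir="repos"):
    """Build the git command that registers a gallery repo as a submodule."""
    return ["git", "submodule", "add", repo_url, f"{base_dir}/{name}"]


if __name__ == "__main__":
    # Hypothetical new gallery repo copied from the example-gallery structure.
    cmd = submodule_add_command(
        "https://github.com/your-org/my-gallery", "my-gallery"
    )
    # Print instead of executing so the sketch has no side effects;
    # pass cmd to subprocess.run(cmd, check=True) inside a clone to apply it.
    print(shlex.join(cmd))
```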

There are lots of details that need to be sorted out. It would be useful for me to talk these through with someone if anyone is game. Particularly someone who has a deeper understanding of github actions.

kmpaul · March 24, 2020, 2:45pm

Yes! I would love to try this out. I’ll try to take some of the notebooks that we have in our https://github.com/NCAR/notebook-gallery and add them to the Pangeo gallery.

This is really cool, @rabernat!

rabernat · March 24, 2020, 2:47pm

The catch is that the system currently assumes that the notebook will be built with binder. To add a static notebook, just add the submodule to pangeo-gallery without setting up the CI in the submodule repository.

JessicaS11 · June 30, 2020, 6:10pm

Found this thread as I searched for more info on the pangeo-gallery organization. I’m not entirely clear on where the pangeo gallery repos should live (for instance, there’s a pangeo-data org repo with gallery examples for Pangeo, but then other gallery example repos are housed in the pangeo-gallery org).

rabernat · July 1, 2020, 1:57am

You’re correct that things are somewhat unclear and inconsistent. Par for the course with Pangeo…

When I first set up Pangeo Gallery, I thought that all the repos would have to live in the pangeo-gallery Github org. But while building it, I realized they could actually live anywhere on Github. So the pangeo-gallery org is somewhat superfluous. For consistency, I would favor moving all example repos out of pangeo-data into pangeo-gallery.

JessicaS11 · July 1, 2020, 9:31pm

Thanks! I’m more than happy to follow your direction, but I’m wondering if there are many situations where it actually makes more sense to have the Pangeo gallery information hosted in the parent organization/repo? I suppose it depends on the balance of Pangeo gallery examples that are tied to other projects (e.g. icepyx) versus those that were created explicitly for the Pangeo gallery, and I don’t have a sense of which type is more prevalent. I’m just thinking in terms of overall maintenance. In the case where all repos are housed in the pangeo-gallery org, they must be maintained by the Pangeo team directly (even if it’s just accepting PRs). In the case where an individual repo contains the Pangeo gallery files it needs, these files can be maintained directly by that project as the project is updated, and the Pangeo gallery should only need to update once to add that repo to the gallery. I hope that makes sense and reflects an accurate understanding of how the gallery works!

rsignell · October 27, 2020, 8:56pm

@rabernat or other binderbot folks, I tried to add two new notebooks to the ESIP gallery but github actions failed in some way I don’t understand: https://github.com/rsignell-usgs/esip-gallery/actions/runs/331931145

I’m hoping someone more familiar with the workflow sees something obvious for me to fix. Thanks!

rabernat · October 27, 2020, 9:29pm

Looks like it could be a fluke. Can you restart the workflow and see if it repeats the same error?

rsignell · October 28, 2020, 10:09am

Indeed @rabernat, simply clicking “re-run” jobs worked! Thanks!
