Pangeo Example Gallery

Kevin and I were discussing how best to host a gallery of our example notebooks demonstrating Pangeo-approved tools.

It occurred to us that rather than creating another NCAR-specific resource, we should contribute to the existing Pangeo gallery as much as possible. To do this, we may need to work with the authors of other existing notebooks to create a common environment.

I want to make sure I understand the mission statement of the Pangeo gallery. I’m sure it isn’t altogether surprising, but I don’t want to step on anyone’s toes, and I want to understand the work that has already been done so that we can fold our efforts into it.

If anyone has any insights, comments, or would like to be a part of the effort to refresh the Pangeo gallery – let’s discuss!

Thanks for raising this important issue, Julia! I totally agree that our galleries need a major refactor.

My goal for this is to create a “self-service” system, similar to conda-forge, where anyone can contribute a gallery. The key requirements for the gallery are:

  • User can completely specify their own custom environment (could be an image from Pangeo Stacks using binder)
  • Notebooks are linted
  • Gallery examples are all executable in [Pangeo] binder
  • But notebooks are also executed by CI and rendered into static HTML in the same environment

The last requirement is tricky, because some of the examples will require major compute resources, and possibly special cloud things like dask-kubernetes, data credentials, etc. The clever way to solve this is to use Binder itself to execute the notebooks, via its API. I have a proof of concept that this is possible.
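For illustration only (this is not the proof of concept mentioned above), here is a minimal sketch of the first half of that idea: asking a BinderHub to build and launch a repo through its `/build/<provider>/<spec>` endpoint and reading the event stream until a server is ready. The repo name in the usage note is hypothetical, and the code assumes the hub emits `ready` events carrying `url` and `token` fields, as the public BinderHub API does. Actually running the notebooks against the launched server is left out.

```python
import json
from urllib.parse import quote

# Assumption: a BinderHub that exposes the public build endpoint.
BINDER_URL = "https://mybinder.org"


def build_url(org, repo, ref="HEAD", binder_url=BINDER_URL):
    """Return the BinderHub build endpoint for a GitHub repo at a given ref."""
    spec = f"{quote(org)}/{quote(repo)}/{quote(ref, safe='')}"
    return f"{binder_url.rstrip('/')}/build/gh/{spec}"


def parse_event(line):
    """Decode one line of the build event stream; None for non-data lines."""
    if not line.startswith("data:"):
        return None
    return json.loads(line[len("data:"):].strip())


def wait_for_server(lines):
    """Scan event-stream lines until the build is ready or has failed."""
    for line in lines:
        event = parse_event(line)
        if event is None:
            continue  # blank separators and keepalive comments
        if event.get("phase") == "ready":
            # The launched Jupyter server's address and access token.
            return event["url"], event["token"]
        if event.get("phase") == "failed":
            raise RuntimeError(event.get("message", "Binder build failed"))
    raise RuntimeError("event stream ended before the server was ready")
```

In CI, the lines could come from a streaming HTTP request, for example `requests.get(build_url("pangeo-gallery", "example"), stream=True).iter_lines(decode_unicode=True)`; the returned URL and token then give access to the running server.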

Plugged all together, the architecture would be something like this:

I have created the pangeo-gallery GitHub org as a place to start playing around with this:
https://github.com/pangeo-gallery.

I envision this would eventually work kind of like conda-forge. To add a new gallery, users would fork a “staging” repo, make a PR, and the CI would create a new gallery repo in the org.

For now, we can skip that piece and just try to plug the following elements together.

Julia, if you’d like to work on this, I recommend you start by creating these repos and populating them with some basic content. (You should have admin rights on the pangeo-gallery org.) Once that is up and running, we can discuss more specific next steps via GitHub issues.

Ryan, thank you for this detailed outline of your goals for the gallery. One challenge is that some of the NCAR-specific notebooks use data that is not available on the cloud and may not be able to be binderized. We might want a static version of some notebooks.

One of the questions we had was if you can have multiple environment files within the repo that stores the example notebooks. From what you are saying it sounds like we can! Can you confirm this?

I imagine that we will need some sort of config file in which you could specify whether or not to render the notebooks using binder. I really encourage you to try to binderize as much as possible. (Or work on the NCAR binder idea :wink:)
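To make the config idea concrete, here is a rough sketch of what such a switch could look like. No format has been agreed on: the keys, the mode names, and the notebook filename below are all hypothetical, and the config is written as a Python dict only to keep the example self-contained.

```python
# Hypothetical gallery config: every key and filename here is illustrative.
GALLERY_CONFIG = {
    "default_render": "binder",  # execute via Binder unless overridden
    "notebooks": {
        # e.g. data that is not reachable from the cloud
        "ncar_glade_example.ipynb": "static",
    },
}

VALID_MODES = {"binder", "static"}


def render_mode(notebook, config):
    """Return how a notebook should be rendered, falling back to the default."""
    default = config.get("default_render", "binder")
    mode = config.get("notebooks", {}).get(notebook, default)
    if mode not in VALID_MODES:
        raise ValueError(f"unknown render mode {mode!r} for {notebook}")
    return mode


def split_by_mode(notebooks, config):
    """Group notebook names by render mode, preserving input order."""
    groups = {mode: [] for mode in sorted(VALID_MODES)}
    for notebook in notebooks:
        groups[render_mode(notebook, config)].append(notebook)
    return groups
```

With this shape, CI would execute the `binder` group and simply copy the committed output of the `static` group.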

Almost. Instead, the gallery will consist of many repos (in my diagram, one is oceanography, another is cmip). Each repo is a binder and so has its own environment file. All the notebooks in the repo share the same environment. If you needed a different environment, you would need a different repo.
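As a small sketch of the “one repo, one environment” rule, the helper below finds the single conda environment file that a gallery repo would use. It assumes repo2docker’s convention that a top-level `binder/` or `.binder/` directory, when present, takes precedence over config files in the repo root; the function name is made up for this example, and it only looks for `environment.yml`.

```python
from pathlib import Path


def find_environment_file(repo_root):
    """Return the environment.yml a Binder build would use, or None.

    Assumes repo2docker's rule: if binder/ or .binder/ exists at the top
    level, only config files inside it are considered.
    """
    root = Path(repo_root)
    for name in ("binder", ".binder"):
        config_dir = root / name
        if config_dir.is_dir():
            candidate = config_dir / "environment.yml"
            return candidate if candidate.is_file() else None
    candidate = root / "environment.yml"
    return candidate if candidate.is_file() else None
```

A gallery CI could call this once per repo and fail early if it returns `None`.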

Make sense?

@rabernat Thanks for the excellent starting points! I think this is a great plan. Julia and I will start looking into this.

Within this scenario, it definitely sounds like an NCAR BinderHub would be the best way to share NCAR examples. An NCAR BinderHub would have to be access-limited to use Cheyenne or Casper, but the BinderHub can use NCAR’s existing JupyterHub for authentication, as far as I understand it. I will try to push on this to see if we can stand something up like that, which will allow us to add NCAR-specific examples to the community gallery.

Until we get that operational, though, we can submit static renderings of the NCAR notebooks to the same gallery.

Thanks, again, for the pointers!

Actually, I amend my previous statement about standing up a BinderHub at NCAR. I just found this discussion on the Jupyter Discourse site. For the TL;DR crowd, the summary is that Binder is “intrinsically” linked to repo2docker's container-based workflow. In the HPC setting, Docker-based containers are problematic, so an alternative to repo2docker would need to be created (e.g., repo2conda to create a conda environment that was not containerized).

As to whether Binder itself could ever accommodate something like repo2conda instead of repo2docker, it was unclear from the discussion, but it sounded pretty unlikely. Instead, they recommended using the JupyterHub API directly with a custom Spawner. …So, more research is required.

The easy solution to static, non-binder-executed examples is to just commit them directly to the “Gallery Repo”.

I have started putting this Gallery Repo together and will push it up to GitHub later today.

I have created the gallery Sphinx site:

It is being deployed here: http://gallery.pangeo.io/

@jukent - please go nuts with customizing, tweaking, improving this layout and content

In the meantime, I will work on the next step: making the CI that connects example repos to the gallery.