Kevin and I were discussing how best to host a gallery of our example notebooks demonstrating Pangeo-approved tools.
It occurred to us that rather than creating another resource that is NCAR specific, we should as much as possible contribute to the existing Pangeo gallery. In order to do this, we may need to work together with the authors of other extant notebooks to create a common environment.
I want to make sure I understand the mission statement of the Pangeo gallery. I’m sure it isn’t altogether surprising, but I want to make sure I don’t step on someone’s toes / make sure that I understand the work that has already been done and synthesize our efforts into that existing work.
If anyone has any insights, comments, or would like to be a part of the effort to refresh the Pangeo gallery – let’s discuss!
Thanks for raising this important issue, Julia! I totally agree our galleries need a major refactor.
My goal for this is to create a “self-service” system, similar to conda-forge, where anyone can contribute a gallery. The key requirements for the gallery are:
- User can completely specify their own custom environment (could be an image from Pangeo Stacks) using binder
- Notebooks are linted
- Gallery examples are all executable in [Pangeo] binder
- But notebooks are also executed by CI and rendered into static HTML in the same environment
The last requirement is tricky, because some of the examples will require major compute resources, and possibly special cloud things like dask-kubernetes, data credentials, etc. The clever way to solve this is to use Binder itself to execute the notebooks, via its API. I have a proof of concept that this is possible.
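A minimal sketch of what driving Binder through its API could look like. It is not the proof of concept mentioned above. It assumes BinderHub's build endpoint (`/build/gh/<org>/<repo>/<ref>`), which streams `data: {...}` events whose JSON carries a `phase` field and, once the phase is `ready`, a `url` and `token` for the running server. The repo name `pangeo-gallery/example` and the helper names are hypothetical.

```python
import json
from urllib.parse import quote

# The Pangeo binder mentioned in this thread; any BinderHub base URL would do.
BINDER_URL = "https://binder.pangeo.io"


def build_url(repo, ref="master", binder_url=BINDER_URL):
    """Return the Binder build endpoint for a GitHub repo given as 'org/name'."""
    return f"{binder_url}/build/gh/{repo}/{quote(ref, safe='')}"


def parse_event(line):
    """Decode one 'data: {...}' line of the event stream; return None otherwise."""
    if not line.startswith("data:"):
        return None
    return json.loads(line[len("data:"):].strip())


def wait_for_server(lines):
    """Scan event-stream lines until the build is ready; return (url, token)."""
    for line in lines:
        event = parse_event(line)
        if event is None:
            continue  # blank keep-alive lines and comments
        if event.get("phase") == "failed":
            raise RuntimeError(event.get("message", "build failed"))
        if event.get("phase") == "ready":
            return event["url"], event["token"]
    raise RuntimeError("event stream ended before the server was ready")


# Example of the request a CI job would make (hypothetical repo name).
print(build_url("pangeo-gallery/example"))
```

In a CI job, the lines could come from a streaming HTTP request to that URL (for example `requests.get(url, stream=True).iter_lines(decode_unicode=True)`), and the returned URL and token would then be used to talk to the Jupyter server that executes the notebooks.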
Plugged all together, the architecture would be something like this:
I have created the pangeo-gallery GitHub org as a place to start playing around with this:
I envision this would eventually work kind of like conda-forge. To add a new gallery, users would fork a “staging” repo, make a PR, and the CI would create a new gallery repo in the org.
For now, we can skip that piece and just try to plug the following elements together.
Julia, if you’d like to work on this, I recommend you start by creating these repos and populating them with some basic content. (You should have admin rights on the pangeo-gallery org.) Once that is up and running, we can discuss more specific next steps via GitHub issues.
Ryan, thank you for this detailed outline of your goals for the gallery. One challenge is that some of the NCAR-specific notebooks use data not available on the cloud and may not be able to be binderized. We might want a static version of some notebooks.
One of the questions we had was if you can have multiple environment files within the repo that stores the example notebooks. From what you are saying it sounds like we can! Can you confirm this?
I imagine that we will need some sort of config file in which you could specify whether or not to render the notebooks using binder. I really encourage you to try to binderize as much as possible. (Or work on the NCAR binder idea.)
Almost. Instead, the gallery will consist of many repos (in my diagram, one is oceanography, another is cmip). Each repo is a binder and so has its own environment file. All the notebooks in the repo share the same environment. If you needed a different environment, you would need a different repo.
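To make the "one repo, one environment" rule concrete, here is a small sketch that locates the single conda environment file a binder repo would use. It assumes repo2docker's documented lookup order (a `binder/` folder, then `.binder/`, then the repo root, with the first existing folder taking precedence). The function name `find_environment_file` is hypothetical and not part of any Pangeo tool.

```python
from pathlib import Path


def find_environment_file(repo_root):
    """Return the environment.yml that applies to a binder repo, or None.

    Assumption: configuration lives in binder/ or .binder/ if either folder
    exists, and in the repository root otherwise.
    """
    root = Path(repo_root)
    for name in ("binder", ".binder"):
        config_dir = root / name
        if config_dir.is_dir():
            break  # a config folder exists, so only it is considered
    else:
        config_dir = root  # no config folder: fall back to the repo root
    candidate = config_dir / "environment.yml"
    return candidate if candidate.is_file() else None
```

Every notebook in the repo would run against whatever file this returns, which is why a notebook needing different packages has to live in a different repo.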
@rabernat Thanks for the excellent starting points! I think this is a great plan. Julia and I will start looking into this.
Within this scenario, it definitely sounds like an NCAR BinderHub would be the best way of sharing NCAR examples. An NCAR BinderHub would have to be access-limited to use Cheyenne or Casper, but as far as I understand it, the BinderHub can use NCAR’s existing JupyterHub for authentication. I will try to push on this to see if we can stand something up like that, which will allow us to add NCAR-specific examples to the community gallery.
Until we get that operational, though, we can submit static renderings of the NCAR notebooks to the same gallery.
Thanks, again, for the pointers!
Actually, I amend my previous statement about standing up a BinderHub at NCAR. I just found this discussion on the Jupyter Discourse site. For the TL;DR crowd, the summary is that Binder is “intrinsically” linked to repo2docker's container-based workflow. In the HPC setting, Docker-based containers are problematic, so an alternative to repo2docker would need to be created (e.g., repo2conda to create a conda environment that was not containerized).
As to whether Binder itself could ever accommodate something like repo2conda, instead of repo2docker, it was unclear from the discussion, but it sounded pretty unlikely. Instead, they recommended using the JupyterHub API directly with a custom Spawner. …So, more research is required.
The easy solution to static, non-binder-executed examples is to just commit them directly to the “Gallery Repo”.
I have started putting this Gallery Repo together and will push it up to github later today.
I have created the gallery sphinx site:
It is being deployed here: http://gallery.pangeo.io/
@jukent - please go nuts with customizing, tweaking, improving this layout and content
In the meantime, I will work on the next step: making the CI that connects example repos to the gallery.
Inspired by the OSM20 tutorial, I would like to contribute a climpred.readthedocs.io/ CMIP6 DCPP cloud demo to the Pangeo gallery. Computing in the cloud might make sense because one variable in DCPP can easily take up 60 inits x 30 members x 120 lead months of time steps of data.
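As a rough back-of-the-envelope check of that size, the arithmetic can be sketched as below. The 60 initializations, 30 members, and 120 lead months come from the post; the 180 x 360 grid and 4-byte floats are assumptions chosen only for illustration, and real DCPP output will differ by model.

```python
# Counts taken from the post above.
n_inits = 60
n_members = 30
n_leads = 120

# Assumptions for illustration only: a 1-degree global grid stored as float32.
ny, nx = 180, 360
bytes_per_value = 4

# One 2D field per (initialization, member, lead month) combination.
n_fields = n_inits * n_members * n_leads
total_bytes = n_fields * ny * nx * bytes_per_value

print(f"{n_fields:,} fields, about {total_bytes / 1e9:.0f} GB for one variable")
```

Under these assumptions a single variable comes to roughly 56 GB uncompressed, which is awkward to download but manageable when computed next to the data in the cloud.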
I created https://github.com/aaronspring/climpred-cloud-demo with https://binder.pangeo.io/ and https://pangeo-binder.readthedocs.io/en/latest/.
However, I am unsure about an effective workflow. So far I have just copy-pasted a few cells from the OSM20 tutorial on binder, changed them for my purposes, and put them into this new repo, which I created locally and pushed to https://github.com/aaronspring/climpred-cloud-demo. There it isn’t executed, and I guess there is a better way than copy-pasting. Is there a way to save a notebook configuration from the cloud to local to commit it to the repo? Is https://jupyterhub.github.io/nbgitpuller/ a solution? What workflow would you propose?
Another related question: did I start from the most up-to-date template? Or should I rather just copy one of the existing pangeo-gallery projects?
A very rushed update on where this stands.
This is all a bit messy and there are lots of things that need to be worked out. But the fundamentals do work.
I am continuing to work on this. Have not had as much time as I hoped, but some progress has been made.
If someone would like to try this (@kmpaul?), the steps would be the following:
There are lots of details that need to be sorted out. It would be useful for me to talk these through with someone if anyone is game, particularly someone who has a deeper understanding of GitHub Actions.
Yes! I would love to try this out. I’ll try to take some of the notebooks that we have in our https://github.com/NCAR/notebook-gallery and add them to the Pangeo gallery.
This is really cool, @rabernat!
The catch is that the system currently assumes that the notebook will be built with binder. To add a static notebook, just add the submodule to pangeo-gallery without setting up the CI in the submodule repository.
Found this thread as I searched for more info on the pangeo-gallery organization. I’m not entirely clear on where the pangeo gallery repos should live (for instance, there’s a pangeo-data org repo with gallery examples for Pangeo, but then other gallery example repos are housed in the pangeo-gallery org).
You’re correct that things are somewhat unclear and inconsistent. Par for the course with Pangeo…
When I first set up Pangeo Gallery, I thought that all the repos would have to live in the pangeo-gallery GitHub org. But while building it, I realized they could actually live anywhere on GitHub. So the pangeo-gallery org is somewhat superfluous. For consistency, I would favor moving all example repos out of
Thanks! I’m more than happy to follow your direction, but I’m wondering if there are many situations where it actually makes more sense to have the Pangeo gallery information hosted in the parent organization/repo? I suppose it depends on the balance of Pangeo gallery examples that are tied to other projects (e.g. icepyx) versus those that were created explicitly for the Pangeo gallery, and I don’t have a sense of which type is more prevalent. I’m just thinking in terms of overall maintenance. In the case where all repos are housed in the pangeo-gallery org, they must be maintained by the Pangeo team directly (even if it’s just accepting PRs). In the case where an individual repo contains the Pangeo gallery files it needs, these files can be maintained directly by that project as the project is updated, and the Pangeo gallery should only need to update once to add that repo to the gallery. I hope that makes sense and reflects an accurate understanding of how the gallery works!
@rabernat or other binderbot folks, I tried to add two new notebooks to the ESIP gallery, but GitHub Actions failed in some way I don’t understand: https://github.com/rsignell-usgs/esip-gallery/actions/runs/331931145
I’m hoping someone more familiar with the workflow sees something obvious for me to fix. Thanks!
Looks like it could be a fluke. Can you restart the workflow and see if it repeats the same error?
Indeed @rabernat, simply clicking “Re-run jobs” worked! Thanks!