Kevin and I were discussing how best to host a gallery of our example notebooks demonstrating Pangeo-approved tools.
It occurred to us that rather than creating another NCAR-specific resource, we should contribute as much as possible to the existing Pangeo gallery. To do this, we may need to work with the authors of other existing notebooks to create a common environment.
I want to make sure I understand the mission statement of the Pangeo gallery. This probably isn’t altogether surprising, but I don’t want to step on anyone’s toes, and I want to understand the work that has already been done so we can fold our efforts into it.
If anyone has any insights, comments, or would like to be a part of the effort to refresh the Pangeo gallery – let’s discuss!
Thanks for raising this important issue, Julia! I totally agree our galleries need a major refactor.
My goal for this is to create a “self-service” system, similar to conda-forge, where anyone can contribute a gallery. The key requirements for the gallery are:
- Users can completely specify their own custom environment using Binder (this could be an image from Pangeo Stacks)
- Notebooks are linted
- Gallery examples are all executable in [Pangeo] Binder
- But notebooks are also executed by CI and rendered into static HTML in the same environment
The last requirement is tricky, because some of the examples will require major compute resources and possibly special cloud features such as dask-kubernetes, data credentials, etc. The clever way to solve this is to use Binder itself to execute the notebooks via its API. I have a proof of concept showing that this is possible.
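As a rough illustration of what "executing via the Binder API" involves: a BinderHub exposes a build endpoint (`/build/<provider>/<spec>`) that streams server-sent events, each carrying a JSON message with a `phase` field; when the phase reaches `ready`, the message includes the notebook server's `url` and `token`. The sketch below only builds the URL and parses such a stream. The helper names, the default `ref="master"`, and the sample events are hypothetical, and this is not necessarily how the proof of concept above works.

```python
import json
from typing import Iterable, Tuple
from urllib.parse import quote


def build_url(binder_url: str, repo: str, ref: str = "master") -> str:
    """Return the BinderHub build-endpoint URL for a GitHub ("gh") repository."""
    # The GitHub provider spec has the form <org>/<repo>/<ref>.
    return f"{binder_url.rstrip('/')}/build/gh/{repo}/{quote(ref, safe='')}"


def wait_for_server(lines: Iterable[str]) -> Tuple[str, str]:
    """Read build events until the server is ready; return (server_url, token)."""
    for line in lines:
        # Server-sent events put their JSON payload on lines starting with "data:".
        if not line.startswith("data:"):
            continue
        event = json.loads(line[len("data:"):])
        phase = event.get("phase")
        if phase == "ready":
            return event["url"], event["token"]
        if phase == "failed":
            raise RuntimeError(event.get("message", "Binder build failed"))
    raise RuntimeError("event stream ended before the server was ready")


# Usage with a hypothetical, hand-written event stream:
sample_events = [
    'data: {"phase": "building", "message": "Step 1/10"}',
    "",
    'data: {"phase": "ready", "url": "https://hub.example.org/user/abc/", "token": "xyz"}',
]
server_url, token = wait_for_server(sample_events)
print(server_url, token)
```

Against a live BinderHub, the lines would come from a streaming HTTP request, for example `requests.get(build_url(...), stream=True).iter_lines(decode_unicode=True)` if the third-party `requests` package is available. With the server URL and token in hand, CI can upload, run, and download notebooks through the standard Jupyter server APIs.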
Plugged all together, the architecture would be something like this:
I envision this would eventually work much like conda-forge. To add a new gallery, users would fork a “staging” repo and make a PR, and the CI would create a new gallery repo in the org.
For now, we can skip that piece and just try to plug the following elements together.
pangeo-gallery/test-example: a repo with an example gallery
Julia, if you’d like to work on this, I recommend you start by creating these repos and populating them with some basic content. (You should have admin rights on the pangeo-gallery org.) Once that is up and running, we can discuss more specific next steps via GitHub issues.
Ryan, thank you for this detailed outline of your goals for the gallery. One challenge is that some of the NCAR-specific notebooks use data that is not available on the cloud, so they may not be binderizable. We might want a static version of some notebooks.
One of our questions was whether you can have multiple environment files within the repo that stores the example notebooks. From what you are saying, it sounds like you can! Can you confirm this?
I imagine that we will need some sort of config file in which you could specify whether or not to render the notebooks using Binder. I really encourage you to try to binderize as much as possible (or pursue the NCAR Binder idea).
Almost. Instead, the gallery will consist of many repos (in my diagram, one is oceanography, another is cmip). Each repo is a binder and so has its own environment file. All the notebooks in the repo share the same environment. If you needed a different environment, you would need a different repo.
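To make the "one repo, one environment" rule concrete, here is a small sketch of how repo2docker picks the configuration for a repository: if a top-level `binder/` directory exists, configuration files there are used and those in the root are ignored; otherwise the root is used. Every notebook in the repo then runs in that single environment. The helper names are hypothetical and `ENV_FILES` is only a subset of the files repo2docker recognizes.

```python
from pathlib import Path
from typing import Dict, List

# A few of the configuration files repo2docker recognizes (not a complete list).
ENV_FILES = ("environment.yml", "requirements.txt", "Pipfile", "install.R", "Dockerfile")


def environment_dir(repo: Path) -> Path:
    """Return the directory repo2docker reads configuration from."""
    # If a top-level binder/ directory exists, config files there take
    # precedence and files in the repository root are ignored.
    binder_dir = repo / "binder"
    return binder_dir if binder_dir.is_dir() else repo


def notebooks_by_environment(repo: Path) -> Dict[str, List[str]]:
    """Group every notebook in the repo under the repo's single environment."""
    env_dir = environment_dir(repo)
    env_files = [
        (env_dir / name).relative_to(repo).as_posix()
        for name in ENV_FILES
        if (env_dir / name).is_file()
    ]
    notebooks = sorted(
        p.relative_to(repo).as_posix()
        for p in repo.rglob("*.ipynb")
        if ".ipynb_checkpoints" not in p.parts
    )
    # One repo -> one Binder image -> one environment shared by all notebooks.
    key = ", ".join(env_files) or "(repo2docker default)"
    return {key: notebooks}
```

The returned mapping always has exactly one key, which is the point: a notebook that needs a different environment has to live in a different repo.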
@rabernat Thanks for the excellent starting points! I think this is a great plan. Julia and I will start looking into this.
Within this scenario, it definitely sounds like an NCAR BinderHub would be the best way to share NCAR examples. An NCAR BinderHub would have to be access-limited in order to use Cheyenne or Casper, but as far as I understand it, the BinderHub could use NCAR’s existing JupyterHub for authentication. I will try to push on this to see if we can stand something like that up, which would allow us to add NCAR-specific examples to the community gallery.
Until we get that operational, though, we can submit static renderings of the NCAR notebooks to the same gallery.
Actually, I amend my previous statement about standing up a BinderHub at NCAR. I just found this discussion on the Jupyter Discourse site. For the TL;DR crowd, the summary is that Binder is “intrinsically” linked to repo2docker's container-based workflow. In the HPC setting, Docker-based containers are problematic, so an alternative to repo2docker would need to be created (e.g., repo2conda to create a conda environment that was not containerized).
As to whether Binder itself could ever accommodate something like repo2conda instead of repo2docker, the discussion was unclear, but it sounded pretty unlikely. Instead, they recommended using the JupyterHub API directly with a custom Spawner. …So, more research is required.
Inspired by the OSM20 tutorial, I would like to contribute a CMIP6 DCPP cloud demo of climpred (climpred.readthedocs.io/) to the Pangeo gallery. Computing in the cloud might make sense because a single DCPP variable can easily span 60 inits x 30 members x 120 lead-month timesteps of data.
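For a sense of scale, the arithmetic below multiplies out those dimensions. The grid size (a 1-degree global grid, 180 x 360) and float32 storage are assumptions chosen only for illustration; real DCPP model grids and data types vary.

```python
# Hypothetical example: one monthly DCPP variable on an assumed 1-degree global
# grid (180 x 360) stored as float32; actual model grids and dtypes vary.
inits, members, leads = 60, 30, 120
nlat, nlon, bytes_per_value = 180, 360, 4

fields = inits * members * leads              # number of 2-D time slices
size_bytes = fields * nlat * nlon * bytes_per_value
print(f"{fields:,} fields, about {size_bytes / 1e9:.0f} GB")
```

Under these assumptions this prints `216,000 fields, about 56 GB` for a single variable, which is more than most laptops handle comfortably in memory.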
However, I am unsure about an effective workflow. Currently, I have just copy-pasted a few cells from the OSM20 tutorial on Binder, adapted them for my purposes, and pushed them to a new repo I created locally: https://github.com/aaronspring/climpred-cloud-demo. There the notebook isn’t executed, and I guess there is a better way than copy-pasting. Is there a way to save a notebook from the cloud to my local machine so I can commit it to the repo? Is https://jupyterhub.github.io/nbgitpuller/ a solution? What’s the recommended workflow?
A related question: did I start from the most up-to-date template, or should I just copy one of the existing pangeo-gallery projects?
I have created binderbot, a CLI for running binders.
I created an example gallery that uses binderbot as part of its CI. The CI runs the notebooks inside Binder, downloads the executed notebooks, and pushes them to the binderbot-built branch. This repo still needs to be converted to a cookiecutter or template.
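The "download the executed notebooks" step can be done against any running Jupyter server through its standard Contents REST API (`GET /api/contents/<path>`, authenticated with an `Authorization: token ...` header), whose response model holds the full notebook under `content`. The sketch below is one possible way to do that, not binderbot's actual implementation; the function names are hypothetical, and the server URL and token are assumed to come from the Binder launch.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen


def contents_request(server_url: str, token: str, path: str) -> Request:
    """Build a Jupyter Contents API request for a notebook on a running server."""
    url = f"{server_url.rstrip('/')}/api/contents/{quote(path)}?content=1"
    # Jupyter servers accept the API token in an Authorization header.
    return Request(url, headers={"Authorization": f"token {token}"})


def notebook_json(model: dict) -> str:
    """Extract the notebook document from a Contents API response model."""
    if model.get("type") != "notebook":
        raise ValueError(f"{model.get('path')} is not a notebook")
    # For notebooks, the "content" field holds the full nbformat JSON document.
    return json.dumps(model["content"], indent=1)


def download_executed_notebook(server_url: str, token: str, path: str, dest: str) -> None:
    """Fetch an executed notebook from the server and save it locally."""
    with urlopen(contents_request(server_url, token, path)) as resp:
        model = json.load(resp)
    with open(dest, "w", encoding="utf-8") as f:
        f.write(notebook_json(model))
```

In a CI job, the saved files would then be committed and pushed to the built branch with ordinary git commands.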
There are lots of details that need to be sorted out. It would be useful to talk these through with someone if anyone is game, particularly someone with a deeper understanding of GitHub Actions.
Yes! I would love to try this out. I’ll try to take some of the notebooks that we have in our https://github.com/NCAR/notebook-gallery and add them to the Pangeo gallery.
The catch is that the system currently assumes that each notebook will be built with Binder. To add a static notebook, just add its repository as a submodule of pangeo-gallery without setting up the CI in the submodule repository.
Found this thread as I searched for more info on the pangeo-gallery organization. I’m not entirely clear on where the pangeo gallery repos should live (for instance, there’s a pangeo-data org repo with gallery examples for Pangeo, but then other gallery example repos are housed in the pangeo-gallery org).
You’re correct that things are somewhat unclear and inconsistent. Par for the course with Pangeo…
When I first set up Pangeo Gallery, I thought that all the repos would have to live in the pangeo-gallery GitHub org. But while building it, I realized they could actually live anywhere on GitHub, so the pangeo-gallery org is somewhat superfluous. For consistency, I would favor moving all example repos out of pangeo-data into pangeo-gallery.
Thanks! I’m more than happy to follow your direction, but I’m wondering whether there are many situations where it actually makes more sense to host the Pangeo gallery information in the parent organization/repo. I suppose it depends on the balance of Pangeo gallery examples that are tied to other projects (e.g. icepyx) versus those that were created explicitly for the Pangeo gallery, and I don’t have a sense of which type is more prevalent. I’m just thinking in terms of overall maintenance. If all repos are housed in the pangeo-gallery org, they must be maintained by the Pangeo team directly (even if it’s just accepting PRs). If an individual project’s repo contains the Pangeo gallery files it needs, those files can be maintained directly by that project as it is updated, and the Pangeo gallery should only need to be updated once to add that repo. I hope that makes sense and reflects an accurate understanding of how the gallery works!