Jupytext for version controlling jupyter notebooks on a Binder

I recently posted on Twitter asking for advice on version controlling Jupyter notebooks. I’m working collaboratively on a Jupyter notebook to be used in an Oceanography Camp for Girls in a few weeks, and resolving conflicts was a nightmare with all of that extra “stuff” Jupyter notebooks save under the hood (JSON?). @rabernat saved the day with a recommendation for Jupytext, which works really well so far! Essentially, it links your .ipynb file to a plain .py file, and git tracks that instead of the .ipynb file. Jupytext then knows how to turn your .py file back into a notebook each time you open it.
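For anyone curious what the paired .py file looks like, here is a minimal sketch, assuming the "percent" format (one of the text formats Jupytext supports). The lesson title and the temperature values are made up for illustration, and the header that Jupytext actually writes usually carries more metadata than shown here.

```python
# ---
# jupyter:
#   jupytext:
#     formats: ipynb,py:percent
# ---

# %% [markdown]
# # Saildrone lesson (hypothetical example)
# Markdown cells are stored as comments, so they diff cleanly in git.

# %%
# Code cells are stored as ordinary Python.
temperatures_c = [12.1, 12.4, 13.0]  # made-up values for illustration

# %%
mean_temperature_c = sum(temperatures_c) / len(temperatures_c)
print(f"Mean temperature: {mean_temperature_c:.2f} degC")
```

Because the file is plain Python with comment markers, a merge conflict shows up as a normal text conflict rather than a conflict inside notebook JSON.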

However, I had a few challenges getting it set up to work on Binder. The disconnect for me was that not only do you have to add jupytext to the dependencies in the environment file, but you also have to add a postBuild script to install jupytext when the Binder is building. The answer is not that well detailed in the installation instructions, and I’m a newbie, so I struggled to figure it out. There is a section in the Jupytext FAQ about Binder, but I hope this is helpful for others who might be looking to get Jupytext set up.

Another solution is the nice tool called nbdime for rich visual diff and merge of notebooks.

I wrote up a quick tutorial on this topic for the CMIP6 Hackathon back in 2019:

Thanks, Brian! I’ll check out those other tools.

In the meantime, do you (or maybe @cgentemann?) know why jupytext would be working on mybinder but not on the Pangeo Binder? I just tried to switch to the Pangeo Binder and now the .py file isn’t opening as a notebook anymore. This is the repo: Williams-OBGC-Lab/OCG_Saildrone on GitHub (Oceanography Camp for Girls Saildrone Lesson).

Jupytext works on this one : [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/Williams-OBGC-Lab/OCG_Saildrone/HEAD)

But not on either of these: [![badge](https://img.shields.io/static/v1.svg?logo=Jupyter&label=Pangeo+Binder&message=GCE+us-central1&color=blue)](https://binder.pangeo.io/v2/gh/pangeo-gallery/default-binder/master?urlpath=git-pull?repo=https://github.com/Williams-OBGC-Lab/OCG_Saildrone)

[![badge](https://img.shields.io/static/v1.svg?logo=Jupyter&label=Pangeo+Binder&message=AWS+us-west-2&color=orange)](https://aws-uswest2-binder.pangeo.io/v2/gh/pangeo-gallery/default-binder/master?urlpath=git-pull?repo=https://github.com/Williams-OBGC-Lab/OCG_Saildrone)

Hi @nwilliams6, are you accessing data in Google or AWS? Using a dask cluster? If not, you may want to just stick with mybinder.org.

The binder links using Pangeo are rather complicated, as they separate the environment definition (pangeo-gallery/default-binder) from the repository with content (Williams-OBGC-Lab/OCG_Saildrone). This is accomplished with the nbgitpuller package (see the nbgitpuller link generator documentation). But as a consequence of that setup, note that your environment.yml with jupytext is not being used.

If you just want to use a Pangeo BinderHub the same way that you’re using mybinder.org, you only need to change the first part of the URL.

For example to run on Google:
https://binder.pangeo.io/v2/gh/Williams-OBGC-Lab/OCG_Saildrone/HEAD

Or run on AWS:
https://aws-uswest2-binder.pangeo.io/v2/gh/Williams-OBGC-Lab/OCG_Saildrone/HEAD
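The pattern in those links can be sketched in a few lines of Python. The helper name `binder_url` is hypothetical (it is not part of any Binder tooling); it only assembles the host and repository path in the same way as the URLs above.

```python
def binder_url(hub: str, org: str, repo: str, ref: str = "HEAD") -> str:
    """Build a launch URL for a GitHub repository on a given BinderHub host."""
    return f"https://{hub}/v2/gh/{org}/{repo}/{ref}"


# The three hosts mentioned in this thread; only the first part of the URL changes.
HUBS = ["mybinder.org", "binder.pangeo.io", "aws-uswest2-binder.pangeo.io"]

for hub in HUBS:
    print(binder_url(hub, "Williams-OBGC-Lab", "OCG_Saildrone"))
```

With this form of link, the hub builds the environment from the repository itself, so the environment.yml in that repository is the one that gets used.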

Thanks for the reply, @scottyhq! That makes sense. I am not using any cloud data in that notebook at the moment, but plan to. I think the main reason Chelle suggested I switch was for the longer timeout period and more memory. Does this “non-cloud” Pangeo BinderHub have those traits?

I’ve gone ahead and switched to https://binder.pangeo.io/v2/gh/Williams-OBGC-Lab/OCG_Saildrone/HEAD but am now getting a different error during the Binder build:
Pip subprocess error: ERROR: Package 'xarray' requires a different Python: 3.6.11 not in '>=3.7'

CondaEnvException: Pip failed

How should I go about resolving that? Thanks again for your help!

The question of comparing binderhubs comes up a lot, so I made a little table:

Note that mybinder.org can send you to 1 of 3 cloud providers or you can use a prefix to use a specific service provider:

| BinderHub | vCPU | RAM (GB) | Cloud provider | Max session (hr) | Dask-gateway |
| --- | --- | --- | --- | --- | --- |
| binder.pangeo.io | 4 | 8 | Google us-central1 | 3 | yes |
| aws-uswest2-binder.pangeo.io | 4 | 8 | AWS us-west-2 | 3 | yes |
| - | - | - | - | - | - |
| gke.mybinder.org | 1 | 2 | Google us-central1 | 6 | no |
| ovh.mybinder.org | 1 | 2 | OVH ? | ? | no |
| gesis.mybinder.org | 2 | 8 | Custom Server | 6 | no |

On a JupyterHub / BinderHub you can get limits running this command:
printenv | grep LIMIT
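If you would rather check from inside a notebook cell, here is a rough Python equivalent of that command. The function name `limit_variables` is my own, and which variables actually exist depends on how the hub is configured, so treat this as a sketch.

```python
import os
from typing import Dict, Mapping


def limit_variables(environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return environment variables whose NAME=value line contains 'LIMIT'.

    This mirrors `printenv | grep LIMIT`, which matches on the whole line,
    not only on the variable name.
    """
    return {
        name: value
        for name, value in environ.items()
        if "LIMIT" in f"{name}={value}"
    }


# Print whatever the current session exposes (may be nothing outside a hub).
for name, value in sorted(limit_variables().items()):
    print(f"{name}={value}")
```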

Time limits are harder to find. If the binderhub repository is public, you can look at the values.yml configuration file.

CondaEnvException: Pip failed ERROR: Package ‘xarray’ requires a different Python: 3.6.11 not in ‘>=3.7’

This is an issue with versions in your environment.yml (the error message suggests you have pinned Python=3.6, but some of your packages require Python>=3.7). This unfortunately is quite common when only some packages are pinned, and new releases of unpinned packages lead to incompatible version requirements. Here is a nice blog post on the topic: “Managing dependencies for reproducible (scientific) software” by Noah D. Brenowitz.
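To make the error message concrete, here is a simplified sketch of the comparison pip is reporting. It only handles a single `>=` requirement with purely numeric versions (real tools use full version-specifier parsing), and the function names are hypothetical.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted numeric version such as '3.6.11' into (3, 6, 11)."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies_minimum(python_version: str, requirement: str) -> bool:
    """Check a version against a requirement of the form '>=X.Y'."""
    if not requirement.startswith(">="):
        raise ValueError("this sketch only understands '>=' requirements")
    minimum = parse_version(requirement[2:])
    return parse_version(python_version) >= minimum


# The values from the build log: Python 3.6.11 against a package needing >=3.7.
print(satisfies_minimum("3.6.11", ">=3.7"))
```

The check fails for 3.6.11, which is why the build stops: either the Python pin has to move to 3.7 or later, or the package has to be pinned to an older release that still supports 3.6.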

I got it working, thanks for your help!