Intake Notebook: accessing notebooks from an intake catalogue and executing them using Papermill

In November last year I visited the Met Office with @kaedonkers. We spent some time looking at scientific workflows and parameterising notebooks. To this end, we came up with the concept of a set of parameterised notebooks that are accessible via an intake catalogue, with their outputs then held in a catalogue as well.

The way it works at the moment is that I have two drivers.

Notebook: this provides access to the notebooks, so a YAML catalogue can be made like:


  - "{{ CATALOG_DIR }}/experiment/calculate_enso.ipynb"
  - "{{ CATALOG_DIR }}/experiment/calculate_enso_clim.ipynb"
    description: ''
    driver: intake_notebook.notebook_source.NotebookSource
    metadata: {}

calculate_enso_clim.ipynb has a cell:

''' calculate the climatology for the ENSO area given a start and end date
    catfile : file path to intake catalog that has a variable called sst
    startdate : date of first point to average (must be string e.g. "1974-1-1")
    enddate : date of last point to average (must be string e.g. "1984-12-31")
'''
startdate = "1974-1-1"
enddate = "1984-12-31"

This cell is read and monkey-patched by the driver to form an execute function.

import os
import intake

data_cat = os.path.abspath("C:/data/providance/sst.yml")
books = intake.open_catalog('C:/data/providance/notebook.yml')
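
How the driver reads the parameters cell isn't shown here. As a rough sketch only, assuming the notebook follows Papermill's convention of tagging that cell with `parameters`, the defaults could be pulled out with the standard library as below. `read_default_parameters` and the example notebook dictionary are made up for illustration and are not part of intake_notebook.

```python
import ast


def read_default_parameters(notebook: dict) -> dict:
    """Collect literal assignments from code cells tagged "parameters".

    `notebook` is the parsed JSON of an .ipynb file (nbformat v4 layout).
    Hypothetical helper, not the driver's actual code.
    """
    defaults = {}
    for cell in notebook.get("cells", []):
        tags = cell.get("metadata", {}).get("tags", [])
        if cell.get("cell_type") != "code" or "parameters" not in tags:
            continue
        source = cell["source"]
        if isinstance(source, list):  # .ipynb files store source as a list of lines
            source = "".join(source)
        for node in ast.parse(source).body:
            # Only keep simple `name = literal` statements; the docstring is skipped.
            if (
                isinstance(node, ast.Assign)
                and len(node.targets) == 1
                and isinstance(node.targets[0], ast.Name)
            ):
                try:
                    defaults[node.targets[0].id] = ast.literal_eval(node.value)
                except (ValueError, TypeError):
                    pass  # not a plain literal, so it cannot be a simple default
    return defaults


# A stand-in for the parameters cell of calculate_enso_clim.ipynb.
example_notebook = {
    "cells": [
        {
            "cell_type": "code",
            "metadata": {"tags": ["parameters"]},
            "source": [
                "'''calculate the climatology for the ENSO area'''\n",
                'startdate = "1974-1-1"\n',
                'enddate = "1984-12-31"\n',
            ],
        }
    ]
}

defaults = read_default_parameters(example_notebook)
print(defaults)  # {'startdate': '1974-1-1', 'enddate': '1984-12-31'}
```

The resulting dictionary is the kind of thing that could become the keyword arguments of an execute function, with the notebook's own values as defaults.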

Executing the notebook creates a new directory, named with a UUID, in the current directory. The resulting notebook output and parameters are stored there.
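
To make the layout concrete, here is a minimal sketch of that step, not the driver's real implementation. The file names `parameters.json` and `output.ipynb` and both function names are my own assumptions; the only library call is Papermill's `execute_notebook(input_path, output_path, parameters=...)`, which is imported lazily so the directory handling can be tried without Papermill installed.

```python
import json
import tempfile
import uuid
from pathlib import Path


def prepare_run_directory(base_dir, parameters):
    """Create <base_dir>/<uuid>/ and store the parameters next to the output."""
    run_dir = Path(base_dir) / str(uuid.uuid4())
    run_dir.mkdir(parents=True)
    # "parameters.json" is a hypothetical file name chosen for this sketch.
    (run_dir / "parameters.json").write_text(json.dumps(parameters, indent=2))
    return run_dir


def execute_notebook_run(notebook_path, parameters, base_dir="."):
    """Run the notebook with Papermill and write the result into a fresh run directory."""
    import papermill as pm  # lazy import: only needed when a notebook is actually run

    run_dir = prepare_run_directory(base_dir, parameters)
    pm.execute_notebook(
        str(notebook_path),
        str(run_dir / "output.ipynb"),  # hypothetical output name
        parameters=parameters,
    )
    return run_dir


# Try the directory handling on its own, without executing a notebook.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = prepare_run_directory(
        tmp, {"startdate": "1974-1-1", "enddate": "1984-12-31"}
    )
    run_name = run_dir.name
    stored = json.loads((run_dir / "parameters.json").read_text())
```

Keeping the parameters in a small file beside the executed notebook means each run directory describes itself, which is what makes it possible to catalogue the runs afterwards.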

Next, I can load all the output as a catalogue using my experiment driver:

cat = intake.open_experiment('.')

This returns all the parameters used as a pandas DataFrame. It also looks for data outputs and concatenates them together.
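
The parameter-gathering half of this can be sketched with the standard library alone. This is an illustration under assumptions, not the experiment driver's code: it supposes each run directory holds a `parameters.json` file (a hypothetical name), and `collect_parameters` is likewise invented for the example.

```python
import json
import tempfile
import uuid
from pathlib import Path


def collect_parameters(base_dir):
    """Return one row (a dict) per run directory found under base_dir."""
    rows = []
    for params_file in sorted(Path(base_dir).glob("*/parameters.json")):
        row = {"run_id": params_file.parent.name}  # the UUID directory name
        row.update(json.loads(params_file.read_text()))
        rows.append(row)
    return rows


# Build two fake run directories so the function has something to read.
with tempfile.TemporaryDirectory() as tmp:
    for start, end in [("1974-1-1", "1984-12-31"), ("1984-1-1", "1994-12-31")]:
        run_dir = Path(tmp) / str(uuid.uuid4())
        run_dir.mkdir()
        (run_dir / "parameters.json").write_text(
            json.dumps({"startdate": start, "enddate": end})
        )
    rows = collect_parameters(tmp)
```

Passing `rows` to `pandas.DataFrame(rows)` would then give one row per run and one column per parameter. Concatenating the data outputs is a separate step that this sketch does not attempt.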

This was just an experiment, and I’m keen to hear people’s thoughts about it!