In November last year I visited the Met Office with @kaedonkers. We spent some time looking at scientific workflows and parameterising notebooks. To this end, we came up with the concept of a set of parameterised notebooks accessible via an intake catalogue, with their output then held in a catalogue as well.
The way it works at the moment is that I have two drivers.
Notebook: this provides access to the notebooks, so a YAML catalogue can be made like:
sources:
  enso:
    args:
      urlpath:
        - "{{ CATALOG_DIR }}/experiment/calculate_enso.ipynb"
        - "{{ CATALOG_DIR }}/experiment/calculate_enso_clim.ipynb"
    description: ''
    driver: intake_notebook.notebook_source.NotebookSource
    metadata: {}
calculate_enso_clim.ipynb has a cell:
#parameters
''' Calculate the climatology for the ENSO area given a start and end date
Parameters
----------
catfile : file path to an intake catalog that has a variable called sst
startdate : date of first point to average (must be a string, e.g. "1974-1-1")
enddate : date of last point to average (must be a string, e.g. "1984-12-31")
'''
'''
catfile='c:/data/providance/sst.yml'
startdate = "1974-1-1"
enddate = "1984-12-31"
The driver reads this cell and monkey-patches an execute function onto the notebook, which takes those parameters as keyword arguments:
import os
import intake

data_cat = os.path.abspath("C:/data/providance/sst.yml")
books = intake.open_catalog('C:/data/providance/notebook.yml')
nb = books.enso.read()
nb.calculate_enso_clim.execute(startdate='1960-1-1', enddate='2020-1-1', catfile=data_cat)
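To make the idea of turning a parameters cell into an execute function concrete, here is a minimal sketch using only the standard library. It is not the actual driver code: `read_parameters`, `make_execute` and the `runner` callback are hypothetical names, and it assumes the parameters cell is a code cell starting with `#parameters` that holds simple `name = literal` assignments.

```python
import ast
import json


def read_parameters(notebook_path):
    """Return defaults from the first code cell that starts with '#parameters'."""
    with open(notebook_path, encoding="utf-8") as fh:
        nb = json.load(fh)

    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        # The notebook JSON stores source as a string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if cell.get("cell_type") != "code":
            continue
        if not source.lstrip().startswith("#parameters"):
            continue

        defaults = {}
        for node in ast.parse(source).body:
            # Keep only simple `name = literal` assignments; the docstring
            # and anything more complex are skipped.
            if (isinstance(node, ast.Assign) and len(node.targets) == 1
                    and isinstance(node.targets[0], ast.Name)):
                try:
                    defaults[node.targets[0].id] = ast.literal_eval(node.value)
                except ValueError:
                    pass  # right-hand side is not a literal
        return defaults
    return {}


def make_execute(notebook_path, runner):
    """Build execute(**overrides); `runner` (hypothetical) performs the run."""
    defaults = read_parameters(notebook_path)

    def execute(**overrides):
        unknown = set(overrides) - set(defaults)
        if unknown:
            raise TypeError(f"unknown parameters: {sorted(unknown)}")
        # Overrides win over the defaults found in the notebook.
        return runner(notebook_path, {**defaults, **overrides})

    return execute
```

The resulting function could then be attached to whatever object represents the notebook, for example with `setattr(nb_obj, "execute", make_execute(path, runner))`.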
Executing the notebook creates a new directory, named with a UUID, in the current directory. The resulting notebook output and the parameters are stored there.
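As a rough illustration of that per-run layout (not the driver's real code), the sketch below creates a UUID-named directory and records the parameters in it. The function name `new_run_dir` and the file name `params.json` are assumptions made for the example.

```python
import json
import uuid
from pathlib import Path


def new_run_dir(params, base="."):
    """Create <base>/<uuid4>/ and record the parameters used for this run."""
    run_dir = Path(base) / str(uuid.uuid4())
    run_dir.mkdir(parents=True)
    # "params.json" is a hypothetical file name chosen for this sketch.
    (run_dir / "params.json").write_text(
        json.dumps(params, indent=2), encoding="utf-8"
    )
    return run_dir
```

The executed notebook itself would be written into the same directory, so that each run is self-describing.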
As a next step, I can load all the output as a catalogue using my experiment driver:
cat = intake.open_experiment('.')
cat.get_params()
This returns all the parameters used as a pandas DataFrame. It also looks for data outputs and concatenates them together.
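A sketch of how the parameter gathering could work, under the assumption that each run directory holds a `params.json` file (a hypothetical name; the real driver may store parameters differently). `collect_params` is also a made-up name, and it returns plain dictionaries to keep the example free of dependencies.

```python
import json
from pathlib import Path


def collect_params(base="."):
    """Return one dict per run directory found directly under `base`."""
    rows = []
    # Assumes a layout of <base>/<run id>/params.json.
    for params_file in sorted(Path(base).glob("*/params.json")):
        row = {"run": params_file.parent.name}
        row.update(json.loads(params_file.read_text(encoding="utf-8")))
        rows.append(row)
    return rows
```

Passing the result to `pandas.DataFrame(rows)` gives a table with one row per run and one column per parameter.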
This was just an experiment and I’m keen to hear people’s thoughts about it!