I was looking into machine-learning experiment tracking frameworks, and decided to see if we could get MLflow running on our JupyterHubs. With a bit of work, users can actually do this today by themselves:
- Install MLflow into the user environment:

  ```shell
  pip install --user mlflow
  ```

  (It's possible this will break other stuff in your environment: http://pangeo.io/cloud.html#software-environment.) If there's demand for this, we'll include it in the pangeo-notebook environment that's loaded by default.
- Start the MLflow server:

  ```shell
  PATH=/home/jovyan/.local/bin:$PATH mlflow server -w 1
  ```

  This starts a server on `localhost:5000` inside your running pod.
- Run an experiment (from the MLflow quickstart: https://mlflow.org/docs/latest/quickstart.html):
  ```python
  import os
  from random import random, randint

  from mlflow import log_metric, log_param, log_artifacts

  if __name__ == "__main__":
      # Log a parameter (key-value pair)
      log_param("param1", randint(0, 100))

      # Log a metric; metrics can be updated throughout the run
      log_metric("foo", random())
      log_metric("foo", random() + 1)
      log_metric("foo", random() + 2)

      # Log an artifact (output file)
      if not os.path.exists("outputs"):
          os.makedirs("outputs")
      with open("outputs/test.txt", "w") as f:
          f.write("hello world!")
      log_artifacts("outputs")
  ```
- View the results at `https://us-central1-b.gcp.pangeo.io/user/<your-username>/proxy/5000/`. I'm not sure why, but the trailing `/` is important: my browser blocked the MLflow page from loading without it. So mine was at https://us-central1-b.gcp.pangeo.io/user/tomaugspurger/proxy/5000/
Side-note: this is a standard use of jupyter-server-proxy to route connections to a server running inside a JupyterHub session through Jupyter itself.
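The proxied URL is just a deterministic function of the hub host, the username, and the local port. A minimal sketch of how it's assembled (the hub hostname and username are the examples from this post; `proxy_url` is a hypothetical helper, not part of any library):

```python
# Sketch: how a jupyter-server-proxy URL is formed for a server
# listening on a local port inside a user's pod.
# The hub hostname below is the example host from this post.
HUB_HOST = "us-central1-b.gcp.pangeo.io"

def proxy_url(username: str, port: int) -> str:
    # Keep the trailing slash; the MLflow UI may not load without it.
    return f"https://{HUB_HOST}/user/{username}/proxy/{port}/"

print(proxy_url("tomaugspurger", 5000))
# https://us-central1-b.gcp.pangeo.io/user/tomaugspurger/proxy/5000/
```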
MLflow's "database" is just a bunch of files on disk, and we started the server in your home directory, so your runs will persist across Binder sessions.
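If you'd rather be explicit about where that data lives, `mlflow server` takes flags for the backend store and artifact root. A hedged sketch (the flags are from MLflow's CLI; the exact paths are assumptions, not what we tested above):

```shell
# Pin the tracking data to explicit home-directory paths so runs
# persist across sessions. The /home/jovyan/mlruns paths are illustrative.
PATH=/home/jovyan/.local/bin:$PATH mlflow server -w 1 \
    --backend-store-uri /home/jovyan/mlruns \
    --default-artifact-root /home/jovyan/mlruns
```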
This may be enough for people. But we could go further and deploy MLflow alongside our JupyterHubs, registering it as a JupyterHub service. There are a few things I don't understand yet:
- When a user runs `mlflow run` to execute a project, which Linux user runs the actual process? And what software environment does it run in? (I think it builds conda environments on the fly.)
- Where should the data be stored? For the Hubs we have a SQLite database on a PVC. Can we place the MLflow database next to it? And do users interact with that storage at all (e.g. writing results to disk), or does MLflow abstract all of that away?
- Would all the "users" just be `jovyan`, or can we get the actual user (from the JupyterHub label) stored in the database?
Beyond perhaps including `mlflow` in the standard environment, I don't plan to look into this any further unless there's some real interest. But hopefully this little bit is useful!