Question about how to best store self-written functions, when using dask_gateway

I have several notebooks, some using LocalCluster and a couple using Gateway, and I reuse some functions across notebooks. If a dask.delayed object uses a function imported from a .py file in the /home/jovyan/ directory, the remote worker responds with ModuleNotFoundError, which is understandable. However, defining the same function in a notebook cell first solves that, which is what I am currently doing.

This is not a problem; I just wanted to ask whether it is possible to keep the functions in a .py file and still use a remote cluster, instead of defining the same functions in every notebook.

If so, how do other users do this?

I read that installing packages with PipInstallPlugin is possible, but then I would have to turn the functions into a package, which I am not familiar with creating. I will gladly learn how to do that if that is the recommended solution.


First, to explain: dask uses pickle/cloudpickle to bundle your tasks for sending to the cluster. When a function can be imported from a .py file, only a reference to it (the module name and function name) is sent, and the worker has to import that module itself. Only code defined in __main__ is serialised in full, which is why your notebook cells work but code in files doesn't.
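To illustrate the by-reference behaviour, here is a minimal sketch using only the standard library's pickle (no dask involved). The module name my_helpers_demo and its double function are hypothetical stand-ins for a .py file in /home/jovyan/. The snippet pickles a function from that file, then makes the module unimportable, which roughly mimics a worker that has never seen the file.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

# Hypothetical helper module, standing in for a .py file next to the notebook.
MODULE_NAME = "my_helpers_demo"
SOURCE = "def double(x):\n    return 2 * x\n"


def pickle_then_forget():
    """Pickle a function from a file-backed module, then make that module unimportable."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, f"{MODULE_NAME}.py").write_text(SOURCE)
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module(MODULE_NAME)
            payload = pickle.dumps(module.double)
        finally:
            # Simulate a remote worker: the file is gone and the module is not loaded.
            sys.path.remove(tmp)
            sys.modules.pop(MODULE_NAME, None)
    importlib.invalidate_caches()
    return payload


payload = pickle_then_forget()

# The payload carries the module and function names, not the function's code.
contains_reference = MODULE_NAME.encode() in payload and b"double" in payload

try:
    pickle.loads(payload)
    load_error = None
except ModuleNotFoundError as exc:
    # Unpickling tries to import the module by name and fails.
    load_error = exc

print(contains_reference, type(load_error).__name__)
```

A function defined in a notebook cell lives in __main__, so cloudpickle ships its code instead of a name, and the worker never needs to import anything.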

The pip loader plugin is the best solution when you have a shared directory you are editing (local, or clustered storage).

For your situation, you probably want the upload_file() method, which puts the file you send into an importable location on the workers. It should reload the module on every upload, but if the file changes or new workers appear, you will need to call the method again each time.
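A rough sketch of how that could look, assuming you already have a connected client (for example one obtained from a Gateway cluster). The helper name upload_and_call and the file name my_functions.py are hypothetical; the only dask calls used are client.upload_file, client.submit and the result method on the returned future.

```python
from pathlib import Path


def upload_and_call(client, module_path, func_name, *args):
    """Upload a .py file to the workers, then call one of its functions remotely.

    `client` is assumed to be an already-connected distributed.Client.
    """
    module_path = Path(module_path)
    module_name = module_path.stem

    # Ship the file to the workers that are currently connected.
    client.upload_file(str(module_path))

    def _call(*call_args):
        # This import runs on the worker, where the uploaded file is importable.
        import importlib

        module = importlib.import_module(module_name)
        return getattr(module, func_name)(*call_args)

    return client.submit(_call, *args).result()


# Hypothetical usage in a notebook, assuming my_functions.py defines double(x):
#   client = cluster.get_client()
#   upload_and_call(client, "my_functions.py", "double", 21)
```

Once the file has been uploaded, importing the module in the notebook and passing its functions to dask.delayed directly should also work, since the workers can then resolve the reference.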

Thanks a lot, I will try out the ‘client.upload_file’ method.