Connecting to a Dask Gateway cluster from outside a JupyterHub

On the pangeo call, @jeffdlb had a question about connecting to a Dask Gateway
cluster from outside the hub. I’ll document it properly somewhere later,
but for now…

On the hub this is as simple as

from dask_gateway import Gateway
from dask.distributed import Client
gateway = Gateway()
cluster = gateway.new_cluster() 
client = Client(cluster)

When you’re outside the JupyterHub, you’re responsible for two additional things:

  1. Specifying the connection parameters to the Gateway (including authentication).
  2. Ensuring that package versions match between your local machine and the cluster.

To connect to the gateway, you’ll need the URLs for the gateway and an API
token. The easiest way to get the URLs is probably to just ask. They may also be
in the output of the logs on https://github.com/pangeo-data/pangeo-cloud-federation, but that’s annoying to dig through. Hopefully we’ll publish them somewhere that’s easy to find.

For authentication, you’ll need an API token for JupyterHub (the Gateway is re-using JupyterHub’s auth). You should:

  • Log into the Hub (staging.hub.pangeo.io in this example)
  • Navigate to the Control Panel (File > Hub Control Panel)
  • Navigate to the “Token” screen (button up top)
  • Click the “Request new API token” button, which will display your API token.

Then, on your client machine (your laptop, say), you’ll connect to the gateway:

>>> from dask_gateway import Gateway
>>> from dask_gateway.auth import JupyterHubAuth
>>> auth = JupyterHubAuth(api_token="<your token here>")
>>> gateway = Gateway(
...    address="https://staging.hub.pangeo.io/services/dask-gateway/",  # URL to the hub + /services/dask-gateway
...    proxy_address="tls://35.225.202.35:8786",  # Ask for this URL
...    auth=auth
... )
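
If you’d rather not paste the token into a script or notebook, one option is to keep the connection settings in environment variables. Here’s a rough sketch of that idea. The variable names (PANGEO_GATEWAY_ADDRESS, PANGEO_GATEWAY_PROXY, PANGEO_HUB_TOKEN) and both helper functions are made up for this example; they aren’t something dask-gateway or the hub defines.

```python
import os

# Hypothetical variable names, chosen for this sketch only.
REQUIRED = ("PANGEO_GATEWAY_ADDRESS", "PANGEO_GATEWAY_PROXY", "PANGEO_HUB_TOKEN")


def gateway_settings(env=os.environ):
    """Collect the Gateway connection settings from environment variables."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {
        "address": env["PANGEO_GATEWAY_ADDRESS"],
        "proxy_address": env["PANGEO_GATEWAY_PROXY"],
        "api_token": env["PANGEO_HUB_TOKEN"],
    }


def connect(env=os.environ):
    """Build a Gateway the same way as the snippet above, minus the literals."""
    # Imported lazily so gateway_settings() works without dask-gateway installed.
    from dask_gateway import Gateway
    from dask_gateway.auth import JupyterHubAuth

    settings = gateway_settings(env)
    auth = JupyterHubAuth(api_token=settings.pop("api_token"))
    return Gateway(auth=auth, **settings)
```

With the three variables exported in your shell, gateway = connect() should give you the same object as the explicit version, and the token stays out of anything you might commit or share.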

Now you’re connected to the Gateway, and can create clusters. This is where your second responsibility comes in: ensure that versions of your libraries match. Ideally all the packages match, but nothing will work unless the versions of dask, distributed, and dask-gateway match.

>>> from dask.distributed import Client
>>> cluster = gateway.new_cluster()
>>> client = Client(cluster)
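
To check the version requirement, a small helper along these lines can compare what’s installed locally against what the hub has. This is only a sketch: the helper names are invented for this example, and the remote versions are assumed to be something you copy by hand (for instance from pip list or conda list in a terminal on the hub).

```python
from importlib import metadata

# The three packages the post calls out as must-match.
CRITICAL = ("dask", "distributed", "dask-gateway")


def local_versions(packages=CRITICAL):
    """Return {package: installed version or None} for the local environment."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed locally
    return versions


def find_mismatches(local, remote):
    """Return {package: (local, remote)} for every package whose versions differ."""
    names = sorted(set(local) | set(remote))
    return {
        name: (local.get(name), remote.get(name))
        for name in names
        if local.get(name) != remote.get(name)
    }
```

Calling find_mismatches(local_versions(), remote), where remote is the dict you filled in from the hub, returns an empty dict when the critical packages line up. Once the client is connected, distributed can also do a check for you: client.get_versions(check=True) compares the client, scheduler, and workers and complains about mismatches.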

Then you should be all set.
