Dask Kubernetes Setup Issue

Hello everyone!

I’ve been a user of Pangeo Cloud and its distributed computing capabilities through Dask. The last time I used it (about a year ago), everything was working perfectly. However, I recently tried setting up a Dask cluster using dask_kubernetes, and I’ve encountered an issue.

The Code

Here’s the snippet of code I’ve been using:

from dask.distributed import Client, progress
from dask_kubernetes import KubeCluster

cluster = KubeCluster()
cluster.adapt(minimum=1, maximum=20)

client = Client(cluster)
print(client)

The Problem

When I run the above code, I get the following error:

ModuleNotFoundError: No module named 'dask_kubernetes'

Given that this setup worked in the past, I’m wondering:

  • Have there been any significant changes to Pangeo Cloud in the last year that might affect this?
  • Is it possible that dask_kubernetes has been deprecated or replaced with another module within the Pangeo Cloud environment?

Hello @lei, I’m surprised that code worked a year ago, as the Pangeo JupyterHubs switched from dask_kubernetes to dask-gateway many years ago. In any case, you should be able to set up your cluster as documented on the Pangeo Cloud page of the Pangeo documentation and still run your code:

from dask_gateway import GatewayCluster

cluster = GatewayCluster()
cluster.adapt(minimum=2, maximum=10)  # or cluster.scale(n) to a fixed size.
client = cluster.get_client()
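
If you are unsure which of the two packages your current environment actually provides, a quick check like the one below may help before you change any code. This is only a minimal sketch: the helper name `installed_modules` is hypothetical, and the two module names are simply the ones mentioned in this thread.

```python
import importlib.util

# Module names taken from this thread; the helper itself is hypothetical.
CANDIDATES = ("dask_gateway", "dask_kubernetes")


def installed_modules(names=CANDIDATES):
    """Return the subset of `names` that can be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is not None]


# Report which cluster backends are importable without actually importing them.
print("Importable cluster backends:", installed_modules() or "none")
```

If only dask_gateway shows up, the ModuleNotFoundError for dask_kubernetes is expected and the GatewayCluster snippet above is the one to use.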

Apologies, I have not used Pangeo in a while so I was mistaken. Thank you for correcting me on the proper way to initialize a Dask cluster on Pangeo Cloud.