I’ve been trying to parallelize a high-resolution (2 m DEM) Xarray workflow using Dask and run it on Pangeo. I’ve encountered a host of issues, from memory leaks crashing the cluster (changing my chunk size seemed to help with that) to cancellation errors. I think the cancellation error is ultimately a memory problem: the final exception is `asyncio.exceptions.CancelledError`, but the stack trace also includes

```
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate (_ssl.c:1124)
distributed.client - ERROR - Failed to reconnect to scheduler after 10.00 seconds, closing client
_GatheringFuture exception was never retrieved
```

(I’m happy to share the whole traceback; it’s quite long and not my primary question at the moment.)
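For context, the chunk-size change that helped with the memory problem looked roughly like this. This is a minimal sketch with a synthetic array and illustrative sizes, not my actual DEM or workflow:

```python
import dask.array as da

# Stand-in for a large 2 m DEM raster loaded lazily (sizes are illustrative).
dem = da.zeros((40000, 40000), dtype="float32", chunks=(10000, 10000))

# Smaller chunks keep per-task memory modest (~16 MB per chunk here),
# at the cost of more tasks for the scheduler to track.
dem = dem.rechunk((2000, 2000))

print(dem.chunksize)   # (2000, 2000)
print(dem.numblocks)   # (20, 20)
```

The trade-off seems to be chunk size versus task count: chunks small enough that a few per worker fit in memory, but not so small that scheduler overhead dominates.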
I encountered the above errors after setting up my cluster as:
```python
from dask_gateway import GatewayCluster

cluster = GatewayCluster()
cluster.adapt(minimum=2, maximum=10)  # or cluster.scale(n) for a fixed size
client = cluster.get_client()
client
```
While troubleshooting this I wanted to experiment with my cluster settings, but I wasn’t getting real-time info from the Dask dashboard in a separate browser window (I’m on a tired, eight-year-old computer while my new one is out for repair). I’ve seen the Pangeo + Dask integration demoed a few times, and I wanted to launch my cluster through JupyterLab instead so I could use those features. I’m able to start a Dask cluster (though it takes a while; is that normal? I remember it being faster in workshops), but when I inject the client code:
```python
from dask.distributed import Client

client = Client("gateway://traefik-icesat2-prod-dask-gateway.icesat2-prod:80/icesat2-prod.c09995cf7b1340609256f1c8460b5e0b")
client
```
into any notebook and run it, I get a `ValueError` (full stack trace at the end of this post). I wanted to report this as unexpected behavior, and also to ask: how can I get the Dask dashboard as a panel within my JupyterLab environment? I’m working on the stable Pangeo image.
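As far as I can tell from the last frame of the trace, `distributed.Client` only recognizes a fixed set of address schemes, and `gateway://` isn’t one of them. A quick sketch of what the address parsing sees, using `distributed`’s `parse_address` helper (the address below is the one from my snippet):

```python
from distributed.comm import parse_address

# Split the gateway address into its scheme and location parts,
# the same way distributed does before looking up a comm backend.
scheme, loc = parse_address(
    "gateway://traefik-icesat2-prod-dask-gateway.icesat2-prod:80/icesat2-prod.c09995cf7b1340609256f1c8460b5e0b"
)
print(scheme)  # 'gateway' -- not among distributed's known schemes
               # ('inproc', 'tcp', 'tls', 'ucx'), hence the ValueError
```

So I suspect the injected snippet should be going through `dask_gateway` (e.g. `Gateway().connect(...)` or `cluster.get_client()`) rather than a raw `Client(...)`, but I’d appreciate confirmation on what the intended pattern is here.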
Full stack trace:
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-5de97e35304a> in <module>
      1 from dask.distributed import Client
      2
----> 3 client = Client("gateway://traefik-icesat2-prod-dask-gateway.icesat2-prod:80/icesat2-prod.ef3014285d52450084bc4f1fbfde6f94")
      4 client

/srv/conda/envs/notebook/lib/python3.8/site-packages/distributed/client.py in __init__(self, address, loop, timeout, set_as_default, scheduler_file, security, asynchronous, name, heartbeat_interval, serializers, deserializers, extensions, direct_to_workers, connection_limit, **kwargs)
    746             ext(self)
    747
--> 748         self.start(timeout=timeout)
    749         Client._instances.add(self)
    750

/srv/conda/envs/notebook/lib/python3.8/site-packages/distributed/client.py in start(self, **kwargs)
    954             self._started = asyncio.ensure_future(self._start(**kwargs))
    955         else:
--> 956             sync(self.loop, self._start, **kwargs)
    957
    958     def __await__(self):

/srv/conda/envs/notebook/lib/python3.8/site-packages/distributed/utils.py in sync(loop, func, callback_timeout, *args, **kwargs)
    338     if error:
    339         typ, exc, tb = error
--> 340         raise exc.with_traceback(tb)
    341     else:
    342         return result

/srv/conda/envs/notebook/lib/python3.8/site-packages/distributed/utils.py in f()
    322             if callback_timeout is not None:
    323                 future = asyncio.wait_for(future, callback_timeout)
--> 324             result = yield future
    325         except Exception as exc:
    326             error = sys.exc_info()

/srv/conda/envs/notebook/lib/python3.8/site-packages/tornado/gen.py in run(self)
    760
    761                     try:
--> 762                         value = future.result()
    763                     except Exception:
    764                         exc_info = sys.exc_info()

/srv/conda/envs/notebook/lib/python3.8/site-packages/distributed/client.py in _start(self, timeout, **kwargs)
   1044
   1045         try:
-> 1046             await self._ensure_connected(timeout=timeout)
   1047         except (OSError, ImportError):
   1048             await self._close()

/srv/conda/envs/notebook/lib/python3.8/site-packages/distributed/client.py in _ensure_connected(self, timeout)
   1101
   1102         try:
-> 1103             comm = await connect(
   1104                 self.scheduler.address, timeout=timeout, **self.connection_args
   1105             )

/srv/conda/envs/notebook/lib/python3.8/site-packages/distributed/comm/core.py in connect(addr, timeout, deserialize, handshake_overrides, **connection_args)
    264
    265     scheme, loc = parse_address(addr)
--> 266     backend = registry.get_backend(scheme)
    267     connector = backend.get_connector()
    268     comm = None

/srv/conda/envs/notebook/lib/python3.8/site-packages/distributed/comm/registry.py in get_backend(scheme)
     79         )
     80     if backend is None:
---> 81         raise ValueError(
     82             "unknown address scheme %r (known schemes: %s)"
     83             % (scheme, sorted(backends))

ValueError: unknown address scheme 'gateway' (known schemes: ['inproc', 'tcp', 'tls', 'ucx'])
```