I’ve been trying to parallelize a high-resolution (2 m DEM) Xarray workflow using Dask and run it on Pangeo. I’ve encountered a host of issues, from memory leaks crashing the cluster (changing my chunk size seemed to help with that) to cancellation errors. I think the cancellation error is ultimately a memory problem: the final exception is `asyncio.exceptions.CancelledError`, but the stack trace also includes

```
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate (_ssl.c:1124)
distributed.client - ERROR - Failed to reconnect to scheduler after 10.00 seconds, closing client
_GatheringFuture exception was never retrieved
```

(I’m happy to share the whole traceback; it’s quite long and not my primary question at the moment.)
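For context, the chunk-size change that helped with the memory problem looked roughly like this. This is a minimal sketch with a synthetic array and illustrative sizes, not my actual DEM or workflow:

```python
import dask.array as da

# Stand-in for a large 2 m DEM raster loaded lazily (sizes are illustrative).
dem = da.zeros((40000, 40000), dtype="float32", chunks=(10000, 10000))

# Smaller chunks keep per-task memory modest (~16 MB per chunk here),
# at the cost of more tasks for the scheduler to track.
dem = dem.rechunk((2000, 2000))

print(dem.chunksize)   # (2000, 2000)
print(dem.numblocks)   # (20, 20)
```

The trade-off seems to be chunk size versus task count: chunks small enough that a few per worker fit in memory, but not so small that scheduler overhead dominates.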
I encountered the above errors after setting up my cluster as:
```python
from dask_gateway import GatewayCluster

cluster = GatewayCluster()
cluster.adapt(minimum=2, maximum=10)  # or cluster.scale(n) for a fixed size
client = cluster.get_client()
client
```
While troubleshooting this I wanted to experiment with my cluster settings, but I wasn’t getting real-time info from the Dask dashboard in a separate browser window (I’m on a tired, eight-year-old computer while my new one is out for repair). I’ve seen the Pangeo + Dask integration demoed a few times, and I wanted to launch my cluster through JupyterLab instead so I could use those features. I’m able to start a Dask cluster (though it takes a while; is that normal? I remember it being faster in workshops), but when I inject the client code:
```python
from dask.distributed import Client

client = Client("gateway://traefik-icesat2-prod-dask-gateway.icesat2-prod:80/icesat2-prod.c09995cf7b1340609256f1c8460b5e0b")
client
```
into any notebook and run it, I get a `ValueError` (full stack trace at the end of this post). I wanted to report this as unexpected behavior, and also to ask: how can I get the Dask dashboard as a panel within my JupyterLab environment? I’m working on the stable Pangeo image.
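As far as I can tell from the last frame of the trace, `distributed.Client` only recognizes a fixed set of address schemes, and `gateway://` isn’t one of them. A quick sketch of what the address parsing sees, using `distributed`’s `parse_address` helper (the address below is the one from my snippet):

```python
from distributed.comm import parse_address

# Split the gateway address into its scheme and location parts,
# the same way distributed does before looking up a comm backend.
scheme, loc = parse_address(
    "gateway://traefik-icesat2-prod-dask-gateway.icesat2-prod:80/icesat2-prod.c09995cf7b1340609256f1c8460b5e0b"
)
print(scheme)  # 'gateway' -- not among distributed's known schemes
               # ('inproc', 'tcp', 'tls', 'ucx'), hence the ValueError
```

So I suspect the injected snippet should be going through `dask_gateway` (e.g. `Gateway().connect(...)` or `cluster.get_client()`) rather than a raw `Client(...)`, but I’d appreciate confirmation on what the intended pattern is here.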
Full stack trace:
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-5de97e35304a> in <module>
      1 from dask.distributed import Client
      2
----> 3 client = Client("gateway://traefik-icesat2-prod-dask-gateway.icesat2-prod:80/icesat2-prod.ef3014285d52450084bc4f1fbfde6f94")
      4 client

/srv/conda/envs/notebook/lib/python3.8/site-packages/distributed/client.py in __init__(self, address, loop, timeout, set_as_default, scheduler_file, security, asynchronous, name, heartbeat_interval, serializers, deserializers, extensions, direct_to_workers, connection_limit, **kwargs)
    746             ext(self)
    747
--> 748         self.start(timeout=timeout)
    749         Client._instances.add(self)
    750

/srv/conda/envs/notebook/lib/python3.8/site-packages/distributed/client.py in start(self, **kwargs)
    954             self._started = asyncio.ensure_future(self._start(**kwargs))
    955         else:
--> 956             sync(self.loop, self._start, **kwargs)
    957
    958     def __await__(self):

/srv/conda/envs/notebook/lib/python3.8/site-packages/distributed/utils.py in sync(loop, func, callback_timeout, *args, **kwargs)
    338     if error:
    339         typ, exc, tb = error
--> 340         raise exc.with_traceback(tb)
    341     else:
    342         return result

/srv/conda/envs/notebook/lib/python3.8/site-packages/distributed/utils.py in f()
    322             if callback_timeout is not None:
    323                 future = asyncio.wait_for(future, callback_timeout)
--> 324             result = yield future
    325         except Exception as exc:
    326             error = sys.exc_info()

/srv/conda/envs/notebook/lib/python3.8/site-packages/tornado/gen.py in run(self)
    760
    761                     try:
--> 762                         value = future.result()
    763                     except Exception:
    764                         exc_info = sys.exc_info()

/srv/conda/envs/notebook/lib/python3.8/site-packages/distributed/client.py in _start(self, timeout, **kwargs)
   1044
   1045         try:
-> 1046             await self._ensure_connected(timeout=timeout)
   1047         except (OSError, ImportError):
   1048             await self._close()

/srv/conda/envs/notebook/lib/python3.8/site-packages/distributed/client.py in _ensure_connected(self, timeout)
   1101
   1102         try:
-> 1103             comm = await connect(
   1104                 self.scheduler.address, timeout=timeout, **self.connection_args
   1105             )

/srv/conda/envs/notebook/lib/python3.8/site-packages/distributed/comm/core.py in connect(addr, timeout, deserialize, handshake_overrides, **connection_args)
    264
    265     scheme, loc = parse_address(addr)
--> 266     backend = registry.get_backend(scheme)
    267     connector = backend.get_connector()
    268     comm = None

/srv/conda/envs/notebook/lib/python3.8/site-packages/distributed/comm/registry.py in get_backend(scheme)
     79         )
     80     if backend is None:
---> 81         raise ValueError(
     82             "unknown address scheme %r (known schemes: %s)"
     83             % (scheme, sorted(backends))

ValueError: unknown address scheme 'gateway' (known schemes: ['inproc', 'tcp', 'tls', 'ucx'])
```