AttributeError: 'EntryPoint' object has no attribute '_key', stops computation after a process has been running


I am testing computations on the Google Cloud Deployment (Server-options: 16 CPUs), and want to ask for help with interpreting the error that shows up on several of my delayed-dask-processes, so that I can understand more of the underlying process with dask on Pangeo, and can troubleshoot this further.

Running a cell with results = dask.compute(*list_of_delayed_functions) starts fine, and the dask processes are fully visible in the dashboard. After a while the computation stops with the message AttributeError: 'EntryPoint' object has no attribute '_key'.

I am not familiar with this error message and would appreciate directions or general advice.

  • Rerunning the same computation cell makes it run for a while again, but it then stops at a different function, so no single function appears to trigger the error.
  • The default notebook-environment is updated in the terminal with conda update -n notebook --all

I could not fit the error message in one screenshot, so I have copied the full message, preformatted, below.


AttributeError                            Traceback (most recent call last)
Cell In [14], line 1
----> 1 results = dask.compute(*functions_at_a_gridpoint)

File /srv/conda/envs/notebook/lib/python3.9/site-packages/dask/, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
    597     keys.append(x.__dask_keys__())
    598     postcomputes.append(x.__dask_postcompute__())
--> 600 results = schedule(dsk, keys, **kwargs)
    601 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])

File /srv/conda/envs/notebook/lib/python3.9/site-packages/distributed/, in Client.get(self, dsk, keys, workers, allow_other_workers, resources, sync, asynchronous, direct, retries, priority, fifo_timeout, actors, **kwargs)
   3055         should_rejoin = False
   3056 try:
-> 3057     results = self.gather(packed, asynchronous=asynchronous, direct=direct)
   3058 finally:
   3059     for f in futures.values():

File /srv/conda/envs/notebook/lib/python3.9/site-packages/distributed/, in Client.gather(self, futures, errors, direct, asynchronous)
   2224 else:
   2225     local_worker = None
-> 2226 return self.sync(
   2227     self._gather,
   2228     futures,
   2229     errors=errors,
   2230     direct=direct,
   2231     local_worker=local_worker,
   2232     asynchronous=asynchronous,
   2233 )

File /srv/conda/envs/notebook/lib/python3.9/site-packages/distributed/, in SyncMethodMixin.sync(self, func, asynchronous, callback_timeout, *args, **kwargs)
    337     return future
    338 else:
--> 339     return sync(
    340         self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
    341     )

File /srv/conda/envs/notebook/lib/python3.9/site-packages/distributed/, in sync(loop, func, callback_timeout, *args, **kwargs)
    404 if error:
    405     typ, exc, tb = error
--> 406     raise exc.with_traceback(tb)
    407 else:
    408     return result

File /srv/conda/envs/notebook/lib/python3.9/site-packages/distributed/, in sync.<locals>.f()
    377         future = asyncio.wait_for(future, callback_timeout)
    378     future = asyncio.ensure_future(future)
--> 379     result = yield future
    380 except Exception:
    381     error = sys.exc_info()

File /srv/conda/envs/notebook/lib/python3.9/site-packages/tornado/, in
    759 exc_info = None
    761 try:
--> 762     value = future.result()
    763 except Exception:
    764     exc_info = sys.exc_info()

File /srv/conda/envs/notebook/lib/python3.9/site-packages/distributed/, in Client._gather(self, futures, errors, direct, local_worker)
   2087         exc = CancelledError(key)
   2088     else:
-> 2089         raise exception.with_traceback(traceback)
   2090     raise exc
   2091 if errors == "skip":

Cell In [9], line 7, in optimize()
      1 def optimize(lat1,lon1):
      3     """
      4     Computes gridded coefficients of fit based on basisfunctions and dynamic height data within a local window
      5     """
----> 7     ds = xr.open_zarr(mapper, consolidated=True, chunks=None, decode_times=False)
      8     distance = xr.apply_ufunc(great_circle_distance, lat1, lon1, ds.latitude.load(), ds.longitude.load())
      9     ds_radius = ds.where(distance < window_size, drop=True)

File /srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/backends/, in open_zarr()
    778 if kwargs:
    779     raise TypeError(
    780         "open_zarr() got unexpected keyword arguments " + ",".join(kwargs.keys())
    781     )
    783 backend_kwargs = {
    784     "synchronizer": synchronizer,
    785     "consolidated": consolidated,
    786     "overwrite_encoded_chunks": overwrite_encoded_chunks,
    787     "chunk_store": chunk_store,
    788     "storage_options": storage_options,
--> 789     "stacklevel": 4,
    790 }
    792 ds = open_dataset(
    793     filename_or_obj=store,
    794     group=group,
    805     use_cftime=use_cftime,
    806 )
    807 return ds

File /srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/backends/, in open_dataset()
    514 if engine is None:
    515     engine = plugins.guess_engine(filename_or_obj)
--> 517 backend = plugins.get_backend(engine)
    519 decoders = _resolve_decoders_kwargs(
    520     decode_cf,
    521     open_backend_dataset_parameters=backend.open_dataset_parameters,
    527     decode_coords=decode_coords,
    528 )
    530 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)

File /srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/backends/, in get_backend()
    159 """Select open_dataset method based on current engine."""
    160 if isinstance(engine, str):
--> 161     engines = list_engines()
    162     if engine not in engines:
    163         raise ValueError(
    164             f"unrecognized engine {engine} must be one of: {list(engines)}"
    165         )

File /srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/backends/, in list_engines()
    104 else:
    105     entrypoints = entry_points().get("xarray.backends", ())
--> 106 return build_engines(entrypoints)

File /srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/backends/, in build_engines()
     89     if backend.available:
     90         backend_entrypoints[backend_name] = backend
---> 91 entrypoints = remove_duplicates(entrypoints)
     92 external_backend_entrypoints = backends_dict_from_pkg(entrypoints)
     93 backend_entrypoints.update(external_backend_entrypoints)

File /srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/backends/, in remove_duplicates()
     20 unique_entrypoints = []
     21 for name, matches in entrypoints_grouped:
     22     # remove equal entrypoints
---> 23     matches = list(set(matches))
     24     unique_entrypoints.append(matches[0])
     25     matches_len = len(matches)

File /srv/conda/envs/notebook/lib/python3.9/site-packages/setuptools/_vendor/importlib_metadata/, in __eq__()
    238 def __eq__(self, other):
--> 239     return self._key() == other._key()

AttributeError: 'EntryPoint' object has no attribute '_key'
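
Reading the last frame of the traceback, the vendored importlib_metadata EntryPoint.__eq__ calls other._key(), and the object on the other side is an EntryPoint without that method, presumably an instance of a different EntryPoint implementation. The toy classes below are hypothetical stand-ins (they are not the real library classes) and only sketch how deduplicating a mixed collection with set() can raise this kind of error. It is an illustration under that assumption, not a confirmed diagnosis.

```python
class VendoredEntryPoint:
    """Hypothetical stand-in: compares through a private _key() method."""

    def __init__(self, name, value, group):
        self.name, self.value, self.group = name, value, group

    def _key(self):
        return (self.name, self.value, self.group)

    def __eq__(self, other):
        # Fails if `other` comes from an implementation without _key().
        return self._key() == other._key()

    def __hash__(self):
        return hash(self._key())


class StdlibLikeEntryPoint:
    """Hypothetical stand-in: same data, but no _key() method."""

    def __init__(self, name, value, group):
        self.name, self.value, self.group = name, value, group

    def __eq__(self, other):
        mine = (self.name, self.value, self.group)
        theirs = (other.name, other.value, other.group)
        return mine == theirs

    def __hash__(self):
        return hash((self.name, self.value, self.group))


def deduplicate(entrypoints):
    """Mirror the list(set(matches)) step shown in the traceback."""
    return list(set(entrypoints))


vendored = VendoredEntryPoint("zarr", "pkg:Backend", "xarray.backends")
stdlib_like = StdlibLikeEntryPoint("zarr", "pkg:Backend", "xarray.backends")

# Equal hashes force set() to fall back on __eq__, which is where it can fail.
for order in ([vendored, stdlib_like], [stdlib_like, vendored]):
    try:
        print(len(deduplicate(order)), "unique entry point(s)")
    except AttributeError as exc:
        print("failed:", exc)
```

In this toy, whether the error appears depends on which object's __eq__ ends up being called. That is a property of the sketch only, and it is not established here that the same thing happens on the hub.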

This sounds like a version clash between your environments. Can you please check your version numbers, and specify how you set up your environments?
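
One way to collect the version numbers asked for here is the standard-library importlib.metadata module. This is a minimal sketch: the package list is an assumption based on the names that appear in this thread and can be adjusted freely.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed list of relevant packages; edit as needed.
PACKAGES = ["dask", "distributed", "xarray", "zarr", "setuptools", "importlib-metadata"]


def installed_versions(packages=PACKAGES):
    """Return {package name: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name:20s} {ver}")
```

Running this in a notebook cell reports the versions that the notebook kernel itself sees, which can differ from what conda list shows if packages were also installed elsewhere.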

Hi, thanks,

I am new to using Pangeo Cloud, and use the existing environment ‘notebook’. I do not install other packages. The notebook-environment has the versions pasted below, but I see some version-numbers are slightly lower than in pangeo-docker-images/packages.txt. If this is the problem, what is the correct way to update the environments?

I use data stored in Google cloud storage, with

import json
import gcsfs
import xarray as xr

with open('token.json') as f:
    token = json.load(f)
gcs = gcsfs.GCSFileSystem(token=token)

mapper = gcs.get_mapper('path to .zarr-file')

and start computing the delayed functions with

def function(x,y,p):
    ds = xr.open_zarr(mapper, consolidated=True, chunks=None)
    return b

from dask_gateway import GatewayCluster
cluster = GatewayCluster()

from distributed import Client
client = Client(cluster)

from dask import delayed
import dask
function_delayed = delayed(function)

functions = []

for x, y in zip(xs, ys):
    functions.append(function_delayed(x, y, p))

results = dask.compute(*functions)

Sometimes rerunning the cell results = dask.compute(*functions) works, as in the attached picture: I ran the computation on the same 500 points twice, and it completed the second time after first failing with the AttributeError.

The environment on the dask workers needs to be the same as in your main session.

  • The default notebook-environment is updated in the terminal with conda update -n notebook --all

The key point is that updating the notebook environment this way is not propagated to the dask workers. Have a close read of these docs: Pangeo Cloud — Pangeo documentation
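
To make such a mismatch visible, the versions seen by the main session can be compared with those reported by each worker. The helper below is a hypothetical sketch that works on plain dictionaries, and the version numbers in the example are made up for illustration.

```python
def version_mismatches(client_versions, worker_versions):
    """Return {worker: {package: (client_version, worker_version)}} for differing packages."""
    mismatches = {}
    for worker, versions in worker_versions.items():
        diff = {
            pkg: (client_versions.get(pkg), versions.get(pkg))
            for pkg in sorted(set(client_versions) | set(versions))
            if client_versions.get(pkg) != versions.get(pkg)
        }
        if diff:
            mismatches[worker] = diff
    return mismatches


# Made-up version numbers, for illustration only.
client_side = {"dask": "2022.9.1", "xarray": "2022.10.0"}
workers = {
    "worker-a": {"dask": "2022.9.1", "xarray": "2022.6.0"},
    "worker-b": {"dask": "2022.9.1", "xarray": "2022.10.0"},
}
print(version_mismatches(client_side, workers))
```

On a live cluster, the per-worker dictionaries could be gathered with client.run(some_function), which returns results keyed by worker address. distributed also offers client.get_versions(check=True) as a built-in comparison of its core dependencies.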

Hi @scottyhq and @martindurant, thanks. From what I read, the image of the main session in JupyterHub matches the image used on the dask cluster workers. I am guessing my mismatch arose because I updated packages in the main session environment while troubleshooting this error. I previously thought that the notebook environment is reset to the default image each time one logs in to the deployment's main JupyterHub session, but I could very well be wrong. What would be the correct way to set the main session environment back to its default?


Currently when you log in you'll see a message like 2022-10-17T19:56:12Z [Normal] Pulling image "pangeo/pangeo-notebook:2022.09.21" (you can also check the default image from a terminal with echo $JUPYTER_IMAGE).

You can check the list of packages in that default environment here pangeo-docker-images/packages.txt at 2022.09.21 · pangeo-data/pangeo-docker-images · GitHub

Only files under /home/jovyan persist across sessions, so if you have installed packages there with pip or conda, or changed configuration files there, try deleting those files (for example /home/jovyan/.local/lib/PYTHON/site-packages/).
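
A quick way to see whether anything was installed into the per-user location is to list the distribution metadata directories there. This is a minimal sketch using the standard-library site module; the function name is hypothetical.

```python
import site
from pathlib import Path


def user_site_distributions(user_site=None):
    """List metadata directories found in the per-user site-packages directory."""
    root = Path(user_site or site.getusersitepackages())
    if not root.is_dir():
        return []
    return [
        entry.name
        for entry in sorted(root.iterdir())
        if entry.suffix in (".dist-info", ".egg-info")
    ]


print("user site-packages:", site.getusersitepackages())
print("user-installed distributions:", user_site_distributions())
```

An empty list suggests nothing under the home directory is shadowing the packages from the image, at least not through the per-user site-packages path.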


  • The notebook-environment has the same packages and version-numbers as in the list of default environment packages, using conda list. In addition it has singleton-decorator and jupyterlab-s3-browser, the only extra packages I can see.

  • I removed my repositories and all files under /home/jovyan/. I logged out and back in to the us-central Google Cloud deployment to let the image be pulled, uploaded the code again, and the error showed up on the first run.

  • It happens both when running on the GatewayCluster as displayed above, and when using the LocalCluster on the local cloud-server.

I wonder if you have advice on what I might be doing wrong and what I should check?

These warnings show up during log-in. Could they explain something about the error that I am not picking up? Since the image is pulled successfully, I am not sure they are relevant:

Event log
Server requested
2022-11-01T19:22:23Z [Warning] 0/39 nodes are available: 11 node(s) didn't match Pod's node affinity/selector, 25 node(s) had taint {k8s.dask.org_dedicated: worker}, that the pod didn't tolerate, 3 Insufficient memory.
2022-11-01T19:22:30Z [Normal] pod triggered scale-up: [{ 3->4 (max: 100)}]
2022-11-01T19:22:57Z [Warning] 0/40 nodes are available: 1 node(s) had taint { }, that the pod didn't tolerate, 11 node(s) didn't match Pod's node affinity/selector, 25 node(s) had taint {k8s.dask.org_dedicated: worker}, that the pod didn't tolerate, 3 Insufficient memory.
2022-11-01T19:23:07Z [Normal] Successfully assigned prod/jupyter-ofk123 to gke-pangeo-hubs-cluster-nb-huge-795c064d-d6ws
2022-11-01T19:23:11Z [Normal] Pulling image "busybox"
2022-11-01T19:23:12Z [Normal] Successfully pulled image "busybox" in 1.079886367s
2022-11-01T19:23:12Z [Normal] Created container volume-mount-ownership-fix
2022-11-01T19:23:12Z [Normal] Started container volume-mount-ownership-fix
2022-11-01T19:23:16Z [Normal] Pulling image "pangeo/pangeo-notebook:2022.09.21"
2022-11-01T19:24:15Z [Normal] Successfully pulled image "pangeo/pangeo-notebook:2022.09.21" in 59.343933498s
2022-11-01T19:24:15Z [Normal] Created container notebook
2022-11-01T19:24:15Z [Normal] Started container notebook

@ofk123 it seems possible we got a bit sidetracked with the version mismatches. A way forward for troubleshooting your 'EntryPoint' object has no attribute '_key' message would be to share the data and code. Since you're using the Pangeo JupyterHub, others could then dig into it further.

  1. To share a zarr dataset you can use the object storage scratch space: Cloud Object Storage — Hub Service Guide

  2. To share a notebook two great options are:

Hi @scottyhq, thanks a lot for the help in the office hours and for the suggestions!
Here is a notebook showing the issue (one where I use GatewayCluster is similar).
I am new to using Pangeo and assume the error stems from some setting or configuration I have missed or accidentally changed, and/or from code that is not set up correctly.
The packages and version numbers should be the same (at least from checking manually), and I have tried different alternatives for the file-loading code, the cluster type, and some cluster settings.
I will double-check the packages again (Edit: packages are equal; in addition it has singleton-decorator and jupyterlab-s3-browser). I appreciate any advice on how to narrow down the issue further.

  • After the error-message occurs, running the cell dask.compute(run()) a second time always seems to work.
  • One thing that makes troubleshooting hard is that sometimes the error message
    does not show up and everything works perfectly. I test this by running all cells in the notebook, changing the zarr filename, and then choosing “Restart Kernel and Run All Cells”. The error message occurs in roughly 6 out of 10 of these runs.
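
Since a second run reportedly succeeds, a retry wrapper can serve as a stopgap while the cause is investigated. The helper below is a hypothetical sketch. It only masks the intermittent failure and does not fix it.

```python
import time


def compute_with_retry(compute, attempts=3, delay=1.0, retry_on=(AttributeError,)):
    """Call compute() up to `attempts` times, retrying only on the listed exceptions."""
    for attempt in range(1, attempts + 1):
        try:
            return compute()
        except retry_on as exc:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            print(f"attempt {attempt} failed with {exc!r}; retrying")
            time.sleep(delay)
```

With the names from the posts above, usage might look like results = compute_with_retry(lambda: dask.compute(*functions)). The retries parameter visible in the Client.get signature in the traceback may offer a similar effect at the task level, though that was not tested here.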


Hi @ofk123 - I dug into your example notebook

I simplified the code further to be

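import os
import xarray as xr
import dask

# Assumption: the hub exposes the scratch bucket as an environment variable
SCRATCH_BUCKET = os.environ["SCRATCH_BUCKET"]
path_to_zarrfile = f'{SCRATCH_BUCKET}/087.zarr'
ds = xr.tutorial.open_dataset("rasm")  # load example data
ds_chunked = ds.chunk({"time":1, "y":205, "x":275})
ds_chunked.to_zarr(path_to_zarrfile, consolidated=True)  # write data

@dask.delayed  # make each call a lazy task that runs on the cluster
def load(mapper):
    ds = xr.open_dataset(mapper, engine="zarr", consolidated=True)
    return None

# Unpack a list of delayed tasks; a bare generator would not be traversed
result = dask.compute(*[load(path_to_zarrfile) for i in range(10000)])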

and ran it with up to 10000 items. Using a LocalCluster, I couldn’t reproduce the error.

However, I did get it on my first time on a GatewayCluster. Then it went away the second time I ran it, and never reappeared.

As a second data point, I tried this exact same thing on the LEAP JupyterHub, which is similar but runs a newer image. It worked the first time.

This leads me to conclude that this is some sort of intermittent bug with the particular version of Dask that is installed on this hub. In this PR:

we are updating the version on your hub. Once that is done, I’m optimistic that your problem will go away.

Hi @rabernat, thanks for taking the time to look into this. I appreciate it. I look forward to trying it out with the updated version of dask. I am optimistic too!