Struggling with reading Zarr files in Azure Blob Storage using Databricks

After experiencing very long read and slice times on the monthly GRIB files in my own GRIB database, I searched the internet and found that Zarr might be my solution.

I have managed to generate a Zarr store, but I am struggling to read it. I get the error KeyError: '.zmetadata' even though the file is in there. Please see the picture attached.

I would appreciate any clue as to why the library cannot find the .zmetadata.

Code:
import zarr
import xarray as xr
from azure.storage.blob import BlobServiceClient

store = zarr.ABSStore(prefix='zarr_example', client=client, blob_service_kwargs={'is_emulated': False})

compressor = zarr.Blosc(cname='zstd', clevel=3)
encoding = {vname: {'compressor': compressor} for vname in ds.data_vars}
ds.to_zarr(store=store, encoding=encoding, consolidated=True)

zarr_ds = xr.open_zarr(store='zarr_example', consolidated=True)
zarr_ds

Error:

KeyError                                  Traceback (most recent call last)
in
----> 1 zarr_ds = xr.open_zarr(store='zarr_example', consolidated=True)
      2 zarr_ds

/databricks/python/lib/python3.8/site-packages/xarray/backends/zarr.py in open_zarr(store, group, synchronizer, chunks, decode_cf, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, consolidated, overwrite_encoded_chunks, chunk_store, storage_options, decode_timedelta, use_cftime, **kwargs)
    785     }
    786
--> 787     ds = open_dataset(
    788         filename_or_obj=store,
    789         group=group,

/databricks/python/lib/python3.8/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, backend_kwargs, **kwargs)
    537
    538     overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 539     backend_ds = backend.open_dataset(
    540         filename_or_obj,
    541         drop_variables=drop_variables,

/databricks/python/lib/python3.8/site-packages/xarray/backends/zarr.py in open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, group, mode, synchronizer, consolidated, chunk_store, storage_options, stacklevel)
    846
    847         filename_or_obj = _normalize_path(filename_or_obj)
--> 848         store = ZarrStore.open_group(
    849             filename_or_obj,
    850             group=group,

/databricks/python/lib/python3.8/site-packages/xarray/backends/zarr.py in open_group(cls, store, mode, synchronizer, group, consolidated, consolidate_on_close, chunk_store, storage_options, append_dim, write_region, safe_chunks, stacklevel)
    398         elif consolidated:
    399             # TODO: an option to pass the metadata_key keyword
--> 400             zarr_group = zarr.open_consolidated(store, **open_kwargs)
    401         else:
    402             zarr_group = zarr.open_group(store, **open_kwargs)

/databricks/python/lib/python3.8/site-packages/zarr/convenience.py in open_consolidated(store, metadata_key, mode, **kwargs)
   1298
   1299     # setup metadata store
-> 1300     meta_store = ConsolidatedStoreClass(store, metadata_key=metadata_key)
   1301
   1302     # pass through

/databricks/python/lib/python3.8/site-packages/zarr/storage.py in __init__(self, store, metadata_key)
   2859
   2860         # retrieve consolidated metadata
-> 2861         meta = json_loads(self.store[metadata_key])
   2862
   2863         # check format of consolidated metadata

/databricks/python/lib/python3.8/site-packages/zarr/storage.py in __getitem__(self, key)
   1072             return self._fromfile(filepath)
   1073         else:
-> 1074             raise KeyError(key)
   1075
   1076     def __setitem__(self, key, value):

KeyError: '.zmetadata'


I find it helpful to check that you can access the file with the blob storage client directly, to make sure the storage account, container name, and prefix are sorted out properly.

Using the zarr store at “https://daymeteuwest.blob.core.windows.net/daymet-zarr/annual/hi.zarr” as an example, the storage account is daymeteuwest, the container is daymet-zarr, and the prefix is annual/hi.zarr.

In [12]: import zarr, azure.storage.blob, xarray as xr

In [13]: client = azure.storage.blob.ContainerClient("https://daymeteuwest.blob.core.windows.net", "daymet-zarr")

In [14]: assert client.get_blob_client("annual/hi.zarr/.zmetadata").exists()

In [15]: store = zarr.ABSStore(client=client, prefix="annual/hi.zarr")

In [16]: ds = xr.open_dataset(store, engine="zarr")
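
Applied to your case, the same pattern would mean handing xarray the ABSStore object rather than the bare string 'zarr_example'. A rough sketch, assuming client is the same ContainerClient you used when writing the store (I can't verify your account/container names from here):

import zarr
import xarray as xr

# Assumption: `client` is the azure.storage.blob ContainerClient used for the write.
store = zarr.ABSStore(client=client, prefix="zarr_example")

# Pass the store object itself, not a path string, so zarr can look up .zmetadata in blob storage.
zarr_ds = xr.open_zarr(store, consolidated=True)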

Thank you very much for the answer.

I got True from get_blob_client…

client.get_blob_client("zarr_example/.zmetadata").exists()
Out[24]: True
