Struggling with reading Zarr files in Azure Blob Storage using Databricks

INAVAS · November 14, 2022, 12:28pm

After running into very long times when reading and slicing the monthly GRIB files in my own GRIB database, I searched online and found that Zarr might be the solution.

I have managed to generate a Zarr store, but I am struggling to read it. I get the error KeyError: '.zmetadata' even though the file is there. Please see the attached picture.

I would appreciate any clue as to why the library cannot find '.zmetadata'.

Code:
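```
import xarray as xr
import zarr
from azure.storage.blob import BlobServiceClient

# `client` (a ContainerClient) and `ds` (the xarray Dataset to be written)
# are created earlier; that code is not shown.
store = zarr.ABSStore(prefix='zarr_example', client=client, blob_service_kwargs={'is_emulated': False})

# Compress every data variable with Blosc/zstd and write consolidated metadata.
compressor = zarr.Blosc(cname='zstd', clevel=3)
encoding = {vname: {'compressor': compressor} for vname in ds.data_vars}
ds.to_zarr(store=store, encoding=encoding, consolidated=True)

zarr_ds = xr.open_zarr(store='zarr_example', consolidated=True)
zarr_ds
```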

Error:

```
KeyError                                  Traceback (most recent call last)
in
----> 1 zarr_ds = xr.open_zarr(store='zarr_example', consolidated=True)
      2 zarr_ds

/databricks/python/lib/python3.8/site-packages/xarray/backends/zarr.py in open_zarr(store, group, synchronizer, chunks, decode_cf, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, consolidated, overwrite_encoded_chunks, chunk_store, storage_options, decode_timedelta, use_cftime, **kwargs)
    785     }
    786
--> 787     ds = open_dataset(
    788         filename_or_obj=store,
    789         group=group,

/databricks/python/lib/python3.8/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, backend_kwargs, **kwargs)
    537
    538     overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 539     backend_ds = backend.open_dataset(
    540         filename_or_obj,
    541         drop_variables=drop_variables,

/databricks/python/lib/python3.8/site-packages/xarray/backends/zarr.py in open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, group, mode, synchronizer, consolidated, chunk_store, storage_options, stacklevel)
    846
    847         filename_or_obj = _normalize_path(filename_or_obj)
--> 848         store = ZarrStore.open_group(
    849             filename_or_obj,
    850             group=group,

/databricks/python/lib/python3.8/site-packages/xarray/backends/zarr.py in open_group(cls, store, mode, synchronizer, group, consolidated, consolidate_on_close, chunk_store, storage_options, append_dim, write_region, safe_chunks, stacklevel)
    398         elif consolidated:
    399             # TODO: an option to pass the metadata_key keyword
--> 400             zarr_group = zarr.open_consolidated(store, **open_kwargs)
    401         else:
    402             zarr_group = zarr.open_group(store, **open_kwargs)

/databricks/python/lib/python3.8/site-packages/zarr/convenience.py in open_consolidated(store, metadata_key, mode, **kwargs)
   1298
   1299     # setup metadata store
-> 1300     meta_store = ConsolidatedStoreClass(store, metadata_key=metadata_key)
   1301
   1302     # pass through

/databricks/python/lib/python3.8/site-packages/zarr/storage.py in __init__(self, store, metadata_key)
   2859
   2860         # retrieve consolidated metadata
-> 2861         meta = json_loads(self.store[metadata_key])
   2862
   2863         # check format of consolidated metadata

/databricks/python/lib/python3.8/site-packages/zarr/storage.py in __getitem__(self, key)
   1072             return self._fromfile(filepath)
   1073         else:
-> 1074             raise KeyError(key)
   1075
   1076     def __setitem__(self, key, value):

KeyError: '.zmetadata'
```

TomAugspurger · November 14, 2022, 12:39pm

I find it helpful to first check that you can access the file with the blob storage client directly. That way you can confirm that the storage account, container name, and prefix are all correct.

Using the zarr store at “https://daymeteuwest.blob.core.windows.net/daymet-zarr/annual/hi.zarr” as an example, the storage account is daymeteuwest, the container is daymet-zarr, and the prefix is annual/hi.zarr.
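The mapping from a blob URL to those three pieces can be sketched with the standard library. This is a minimal sketch that assumes the public endpoint form https://<account>.blob.core.windows.net/<container>/<prefix>; emulator (Azurite) or custom-domain URLs are laid out differently and are not handled. `split_blob_url` is a hypothetical helper name.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_blob_url(url: str) -> Tuple[str, str, str]:
    """Split a public Azure Blob Storage URL into (account, container, prefix).

    Assumes the https://<account>.blob.core.windows.net/<container>/<prefix>
    layout; other endpoint styles are not handled.
    """
    parsed = urlparse(url)
    # The storage account is the first label of the host name.
    account = parsed.netloc.split(".", 1)[0]
    # The first path segment is the container; the rest is the prefix.
    container, _, prefix = parsed.path.lstrip("/").partition("/")
    return account, container, prefix.rstrip("/")


account, container, prefix = split_blob_url(
    "https://daymeteuwest.blob.core.windows.net/daymet-zarr/annual/hi.zarr"
)
print(account, container, prefix)  # daymeteuwest daymet-zarr annual/hi.zarr
```

The container name goes to ContainerClient, and the prefix goes to zarr.ABSStore, as in the session that follows.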

```
In [12]: import zarr, azure.storage.blob, xarray as xr

In [13]: client = azure.storage.blob.ContainerClient("https://daymeteuwest.blob.core.windows.net", "daymet-zarr")

In [14]: assert client.get_blob_client("annual/hi.zarr/.zmetadata").exists()

In [15]: store = zarr.ABSStore(client=client, prefix="annual/hi.zarr")

In [16]: ds = xr.open_dataset(store, engine="zarr")
```
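The existence check in In [14] can be wrapped in a small helper and pointed at any prefix. This is a minimal sketch, assuming `container_client` behaves like azure.storage.blob.ContainerClient (only `get_blob_client(name).exists()` is called); `check_zarr_root` is a hypothetical name. In zarr v2, '.zgroup' marks a group and '.zmetadata' is written only when the metadata is consolidated, e.g. with `to_zarr(..., consolidated=True)`.

```python
from typing import Dict, Tuple

# Root keys of a zarr v2 group: ".zgroup" marks the group itself, and
# ".zmetadata" is only present if the metadata was consolidated.
ZARR_V2_ROOT_KEYS: Tuple[str, ...] = (".zgroup", ".zmetadata")


def check_zarr_root(container_client, prefix: str) -> Dict[str, bool]:
    """Return {key: exists} for the zarr v2 root metadata blobs under `prefix`.

    `container_client` is assumed to behave like
    azure.storage.blob.ContainerClient; only get_blob_client(name).exists()
    is called.
    """
    prefix = prefix.strip("/")
    found = {}
    for key in ZARR_V2_ROOT_KEYS:
        # Blob names always use "/" as the separator.
        blob_name = f"{prefix}/{key}" if prefix else key
        found[key] = container_client.get_blob_client(blob_name).exists()
    return found


# Usage with a real ContainerClient (not executed here):
# print(check_zarr_root(client, "annual/hi.zarr"))
```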

INAVAS · November 14, 2022, 12:49pm

Thank you very much for the answer.

I got True from get_blob_client:

```
client.get_blob_client("zarr_example/.zmetadata").exists()
Out[24]: True
```
