Working with file level metadata in Zarr

shanecglass · April 7, 2021, 4:34pm

Hey all,

My team is examining different ways to expose NetCDF file-level metadata. Our current approach requires us to read the entire NetCDF file into memory just to access the metadata, which does not scale. What would a similar process look like if we accessed the metadata through a Zarr array instead of directly through the original NetCDF files?

rabernat · April 9, 2021, 12:04pm

Hi Shane, and welcome to the forum! This is a topic that we have obsessed over in Pangeo, so I’m happy to share what we have learned.

The goal of quickly peeking into netCDF files was indeed one of the main reasons that brought us to experiment with Zarr. You will find that what you want to do is trivial with Zarr + Xarray.

I would recommend just trying out converting your data to Zarr and playing around with it:

import xarray as xr
import zarr

ds_nc = xr.open_dataset('file.nc')
print(ds_nc.attrs)  # display file-level metadata
print(ds_nc.foo.attrs)  # display variable-level metadata ('foo' is a placeholder variable name)
# the consolidated metadata option makes reading faster
ds_nc.to_zarr('file.zarr', consolidated=True)  # could also be an s3 / gs path
ds_zarr = xr.open_zarr('file.zarr', consolidated=True)
print(ds_zarr.attrs)  # it's all there
print(ds_zarr.foo.attrs)

For even faster access to the metadata, bypass xarray completely:

zgroup = zarr.open_consolidated('file.zarr')
print(dict(zgroup.attrs))
print(dict(zgroup['foo'].attrs))  # item access; 'foo' is a placeholder variable name
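
To make it clearer why the consolidated route is fast: in the Zarr v2 storage layout, the consolidated metadata is a single JSON document named .zmetadata at the root of the store, so one small read returns every attribute. Here is a rough sketch that parses that file with nothing but the standard library. It assumes a Zarr v2 store on a local filesystem; read_consolidated_attrs and make_demo_store are hypothetical helper names, and the demo store is hand-written with made-up values rather than produced by to_zarr.

```python
import json
import tempfile
from pathlib import Path


def read_consolidated_attrs(store_path):
    """Return {path: attrs} parsed from a Zarr v2 .zmetadata file.

    The root group's attributes are stored under the empty-string key.
    """
    with open(Path(store_path) / ".zmetadata") as f:
        consolidated = json.load(f)
    attrs = {}
    for key, value in consolidated["metadata"].items():
        # Keys look like ".zattrs" (root) or "foo/.zattrs" (variable foo).
        if key == ".zattrs":
            attrs[""] = value
        elif key.endswith("/.zattrs"):
            attrs[key[: -len("/.zattrs")]] = value
    return attrs


def make_demo_store(root):
    """Write a hand-made .zmetadata file (illustrative values only)."""
    store = Path(root) / "demo.zarr"
    store.mkdir()
    consolidated = {
        "zarr_consolidated_format": 1,
        "metadata": {
            ".zgroup": {"zarr_format": 2},
            ".zattrs": {"title": "demo dataset"},
            "foo/.zattrs": {"units": "K"},
        },
    }
    (store / ".zmetadata").write_text(json.dumps(consolidated))
    return store


with tempfile.TemporaryDirectory() as tmp:
    demo_attrs = read_consolidated_attrs(make_demo_store(tmp))
    print(demo_attrs)
```

For a real store you would point read_consolidated_attrs at the directory written by to_zarr(..., consolidated=True). On object storage the same idea applies, but you would fetch the .zmetadata object with your storage client instead of open().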

You could also play around with different cloud-optimized formats (e.g. TileDB) or try out the just-released Zarr-enabled netCDF library.
