This is a repost from the xarray discussions, as I am not getting any response there.
I am trying to save data of type np.uint32 to a Zarr store:
import numpy as np
import os
import shutil
import xarray as xr
values = np.full((3,6), 50, dtype=np.uint32)
var_name = 'test'
ds = xr.Dataset(data_vars={var_name: xr.DataArray(data=values)})
print(f'dtype in xr.Dataset: {ds[var_name].dtype}')
# dtype of uint32 is not preserved when written to Zarr if _FillValue is provided: trying to force _FillValue to be of np.uint32 type does not help
encodings_dtype_fillvalue = {var_name: {'_FillValue': np.uint32(0), 'dtype': np.uint32}}
encodings_dtype = {var_name: {'dtype': np.uint32}}
test_file = 'test.zarr'
for each_encoding in [encodings_dtype_fillvalue, encodings_dtype]:
    print(f'Using encoding: {each_encoding}')
    _ = ds.to_zarr(test_file, encoding=each_encoding, consolidated=True)
    # Read data back in and verify the type
    with xr.open_zarr(test_file, consolidated=True) as ds_from_zarr:
        print(f'--->Type of data from zarr: {ds_from_zarr[var_name].dtype}')
    # Cleanup
    if os.path.exists(test_file):
        shutil.rmtree(test_file)
It seems that the dtype of the data is ignored on output when I provide a _FillValue setting in the encoding, thus "transforming" the data to float64 instead of the requested uint32. Please see the example above, which produces this output:
dtype in xr.Dataset: uint32
Using encoding: {'test': {'_FillValue': 0, 'dtype': <class 'numpy.uint32'>}}
--->Type of data from zarr: float64
Using encoding: {'test': {'dtype': <class 'numpy.uint32'>}}
--->Type of data from zarr: uint32
Does anybody have a suggestion on how to specify the _FillValue without messing up the type of the data stored in Zarr format? Or is it a bug in xarray? Or am I not aware of something?
Any help or suggestion is much appreciated!
Thanks in advance.