Scale_factor and add_offset with xarray.to_netcdf


I am trying to write an xarray Dataset to netCDF using ds.to_netcdf. I am calculating the scale_factor and add_offset values for each individual data variable in the dataset using the code below, because I would like the data to be written as int16. mapMax is the maximum map value calculated for each data variable in the dataset. With the code below, some cells (but not all) in variables whose maximum map value is zero end up with weird values (something like -0.0027…) when I view the written netCDF in Panoply or QGIS. The maps are written correctly if I do not pack the dataset using scale_factor and add_offset. Does anyone know how this is possible? Could this be a bug in the .to_netcdf function?

encoding = {}

def compute_scale_and_offset(minValue, maxValue, n):
    """Compute scale_factor and add_offset for packing [minValue, maxValue] into an n-bit integer."""
    # stretch/compress data to the available packed range
    scale_factor = (maxValue - minValue) / (2 ** n - 1)
    # translate the range to be symmetric about zero
    add_offset = minValue + 2 ** (n - 1) * scale_factor
    return scale_factor, add_offset

# Missing value is set to -9999 and is the lowest value in each data array
mv = -9999
ds = xr.where(ds.isnull(), mv, ds)

for data_var in ds.data_vars:
    scale, offset = compute_scale_and_offset(mv, mapMax, 16)
    # Calculate the packed value for mv
    fvalue = (mv - offset) / scale
    encoding[data_var] = {'_FillValue': fvalue, 'scale_factor': scale,
                         'add_offset': offset, 'dtype': 'int16'}

ds.to_netcdf(os.path.join(s.outPath, ''), encoding=encoding, engine='h5netcdf')
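For reference, the pack/unpack round trip that this encoding performs can be reproduced outside of xarray with plain NumPy. This is a minimal sketch under the assumptions from the question (mv = -9999, 16-bit packing, and a variable whose maximum map value is 0); the truncating cast is an assumption about how older xarray versions converted to int16, and it shows how a true value of exactly zero can come back slightly off after unpacking:

```python
import numpy as np

# Same formula as compute_scale_and_offset in the question,
# with mapMax hardcoded to 0 (the problematic case)
mv, mapMax, n = -9999, 0.0, 16
scale = (mapMax - mv) / (2 ** n - 1)
offset = mv + 2 ** (n - 1) * scale

value = 0.0  # a cell whose true value is zero
packed_float = (value - offset) / scale  # ideally exactly 32767.0

# Casting a float to int16 truncates toward zero; rounding first keeps
# the packed value within half a step of the ideal one
packed_trunc = np.int16(np.trunc(packed_float))
packed_round = np.int16(np.round(packed_float))

# Unpack the way a netCDF reader does: packed * scale_factor + add_offset
unpacked_trunc = packed_trunc * scale + offset
unpacked_round = packed_round * scale + offset
print(packed_float, unpacked_trunc, unpacked_round)
```

Because scale and offset are themselves rounded float64 values, packed_float may land just below the ideal integer; a truncating cast then drops a whole packing step, so the unpacked value can be off by up to one scale step, while rounding keeps the error within half a step.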

This is not an answer to your question (sorry), but this issue is possibly related: Writing and reopening introduces bad values · Issue #5739 · pydata/xarray · GitHub


Hi @WilcoT - thanks for asking this question on the Pangeo forum. My impression is that you would get a better response by opening an issue on the Xarray issue tracker.