Xarray trouble decoding NetCDF with compressed integers

jalder · October 30, 2024, 7:27pm

Hi all, I just came across a curious case where xarray emits this warning for a compressed integer variable stored in a NetCDF4 file:

RuntimeWarning: overflow encountered in scalar absolute
vlim = max(abs(vmin - center), abs(vmax - center))

Here is the encoding as seen by xarray:
{'dtype': dtype('int16'),
'zlib': True,
'szip': False,
'zstd': False,
'bzip2': False,
'blosc': False,
'shuffle': False,
'complevel': 1,
'fletcher32': False,
'contiguous': False,
'chunksizes': (151, 79, 118),
'preferred_chunks': {'time': 151, 'lat': 79, 'lon': 118},
'original_shape': (151, 474, 944),
'missing_value': -999,
'_FillValue': -999}

Xarray doesn’t seem to be correctly decoding and applying the missing value -999. As a side note, Panoply does. Here is the ncdump output shown in Panoply:

short CDD(time=151, lat=474, lon=944);
:missing_value = -999S; // short
:_FillValue = -999S; // short
:long_name = "Consecutive Dry Days";
:units = "days";
:_ChunkSizes = 151U, 79U, 118U; // uint

xarray is loading the variable as an int64, but since it isn’t catching the missing_value correctly, the missing grid cells (i.e. ocean and lake bodies in these data) are loading as -9223372036854775808 rather than NaN, which is messing up spatial averaging. My other compressed variables, which have a scale_factor and add_offset, are correctly decoded into floats with NaN for missing values.

Is the missing_value = -999S as a short forcing xarray to load the variable as an integer rather than a float with NaNs?

This is publicly released data, so I can’t change the source NetCDF files, but maybe I can write an xarray preprocess function to correct the typing.
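A minimal sketch of what such a preprocess function could look like. It assumes the file is opened with mask_and_scale=False, so the raw int16 values and the _FillValue / missing_value attributes are still on the variable. The function name mask_integer_fill and the tiny synthetic dataset are hypothetical stand-ins, not part of the data release, and this has not been run against the real files.

```python
import numpy as np
import xarray as xr

SENTINEL_ATTRS = ("_FillValue", "missing_value")


def mask_integer_fill(ds: xr.Dataset) -> xr.Dataset:
    """Convert integer variables that carry a fill value to float64 with NaNs.

    Assumes the dataset was opened with mask_and_scale=False, so the
    sentinel attributes have not already been consumed by xarray.
    """
    out = ds.copy()
    for name, var in ds.data_vars.items():
        if not np.issubdtype(var.dtype, np.integer):
            continue
        # Collect whichever sentinel attributes are present on the variable.
        fills = [var.attrs[k] for k in SENTINEL_ATTRS if k in var.attrs]
        if not fills:
            continue
        # Cast to float first so NaN can be represented, then mask sentinels.
        masked = var.astype("float64").where(~var.isin(fills))
        # Drop the sentinel attributes so they are not applied a second time.
        masked.attrs = {k: v for k, v in var.attrs.items() if k not in SENTINEL_ATTRS}
        out[name] = masked
    return out


# Synthetic stand-in for the CDD variable (not the real data).
raw = xr.Dataset(
    {
        "CDD": (
            ("lat", "lon"),
            np.array([[-999, 12], [30, -999]], dtype="int16"),
            {"_FillValue": np.int16(-999), "missing_value": np.int16(-999), "units": "days"},
        )
    }
)
fixed = mask_integer_fill(raw)
print(fixed["CDD"].dtype, float(fixed["CDD"].mean()))
```

With real files this would be used along the lines of mask_integer_fill(xr.open_dataset(path, mask_and_scale=False, decode_times=False)), or passed as preprocess= to xr.open_mfdataset with the same keyword arguments. Note that turning off mask_and_scale also disables scale_factor / add_offset decoding, so variables that rely on those would need separate handling.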

Thanks for any suggestions.

ThomasMGeo · November 2, 2024, 2:16am

Can you share some example data?

jalder · November 4, 2024, 4:59pm

Sure, the data release is from here. There are many files, but the CMIP6-LOCA2_Thresholds_AllModels_grid_R3in.tar.gz file is the smallest and demonstrates the compressed integer issue.

import numpy as np
import xarray as xr

sample_file = '/Volumes/head4/published_projects/ScienceBase_Alder_2024_CMIP6-LOCA2_Thresholds/CMIP6-LOCA2_Thresholds_AllModels_grid/R3in/CMIP6-LOCA2_Thresholds_R3in_ACCESS-CM2.ssp245.r1i1p1f1_1950-2100_16thdeg_grid.nc'
ds = xr.open_dataset(sample_file, decode_times=False)

ds.R3in.isel(time=0).plot()

print(ds.R3in[0,0,0].data)

This prints -9223372036854775808, which is the value xarray produces for water bodies in these data. The missing_value is -999S, but xarray is still loading the variable as int64 rather than as a float or double that can support NaNs.
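As a quick diagnostic, the printed number is the smallest int64, which is also how NumPy stores NaT (the datetime/timedelta counterpart of NaN). The check below only confirms that equivalence. Whether timedelta decoding is actually involved here is an untested guess; the only hint is that the CDD variable above has units of "days".

```python
import numpy as np

# Smallest value an int64 can hold.
int64_min = np.iinfo(np.int64).min

# NumPy's "not a time" marker, viewed as its underlying 64-bit integer.
nat_as_int = np.timedelta64("NaT").astype("int64")

print(int64_min, nat_as_int, int64_min == nat_as_int)
```

If that guess is right, checking ds.R3in.dtype and passing decode_timedelta=False to xr.open_dataset would be two things to try.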

rabernat · November 5, 2024, 1:43am

Thanks for sharing! Hopefully someone can look into this. Sounds like an issue for the Xarray issue tracker.

Perhaps one unintentional takeaway from this thread is that a .tar.gz file containing NetCDFs is not a particularly accessible or easy way to share and distribute data! I wanted to investigate this myself, but I gave up when I saw the amount of friction ahead of me to get to an actual Xarray dataset. (Lazy, I know, but that’s probably typical.)
