`save_cog_with_dask`: Cannot convert fill_value 999999 to dtype uint8

Hi there :wave:,

I am currently trying to write COGs via Dask with odc.geo.cog.save_cog_with_dask.
Most of the time it works fine, but with the same code a deserialisation issue sometimes occurs:

  File "C:\Users\rbraun\Anaconda3\envs\eoprocesses\Lib\site-packages\numpy\ma\core.py", line 489, in _check_fill_value
    fill_value = np.asarray(fill_value, dtype=ndtype)
OverflowError: Python integer 999999 out of bounds for uint8

My code is:


I triple-checked: everything is fine in the nodata area, for both the rioxarray and odc accessors and the encoding. The 0 stored as float shouldn’t be an issue, as the values are safely cast to uint8.

Has anybody ever encountered something like this?

Sorry, it’s difficult to provide a minimal working example, as the rasters I’m using are very heavy (and commercial data).

NB: It’s a repost from `save_cog_with_dask`: Cannot convert fill_value 999999 to dtype uint8 - Dask Forum
I don’t know whether it’s a bug, so I’d rather start the discussion here before creating an issue there.

It appears that you are attempting to create an array of type uint8, which can only hold integer values in the range [0, 255], but are attempting to insert the value 999999, which is far outside that range. You either need to choose a value within that range, or use a dtype whose range includes your large value.
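As a minimal sketch of that range check: the helper name `fits_dtype` is hypothetical (it is not part of NumPy or odc-geo), and it only relies on `np.iinfo` to read the bounds of an integer dtype.

```python
import numpy as np


def fits_dtype(value: int, dtype) -> bool:
    """Return True if the integer `value` is representable in the integer `dtype`."""
    info = np.iinfo(dtype)  # min/max bounds of the integer dtype
    return info.min <= value <= info.max


# 0 (the nodata value used in this thread) fits in uint8, 999999 does not.
print(fits_dtype(0, np.uint8))
print(fits_dtype(999999, np.uint8))
# A wider unsigned dtype can hold 999999.
print(fits_dtype(999999, np.uint32))
```

This only confirms that 999999 cannot be stored as uint8; it does not say where the value comes from.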

This is where I don’t understand: where does this value 999999 come from?
You can see on the screenshot that all the nodata values are set to 0.

To help us help you, please share the output of

data = xds.fillna(nodata).astype(dtype).rio.set_nodata(nodata)

Hi @rabernat !

I have a DataArray, not a Dataset, so the info() function doesn’t seem to exist :thinking:

Here is the print of the DataArray:

<xarray.DataArray 'COG of EMSR773_AOI17_GRA_CONSOLIDATION_AERIAL_20241111_1300_ORTHO_cog' (
                                                                                           band: 3,
                                                                                           y: 41909,
                                                                                           x: 41362)> Size: 5GB
dask.array<astype, shape=(3, 41909, 41362), dtype=uint8, chunksize=(1, 2048, 2048), chunktype=numpy.ndarray>
  * band         (band) int32 12B 1 2 3
  * x            (x) float64 331kB -5.481e+04 -5.481e+04 ... -4.24e+04 -4.24e+04
  * y            (y) float64 335kB 4.754e+06 4.754e+06 ... 4.741e+06 4.741e+06
    spatial_ref  int32 4B 0
    quantile     float64 8B 0.02
    long_name:  ['RED', 'GREEN', 'BLUE']

This is where I don’t understand: where does this value 999999 come from?

This is most likely coming from NumPy’s defaults for masked arrays - numpy.ma.default_fill_value — NumPy v2.1 Manual.
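A small sketch of where such a value can originate, assuming only NumPy is installed. It inspects the default fill values that `numpy.ma` assigns for 64-bit integer and float data and compares the integer default with the uint8 maximum. It deliberately does not query the default for uint8 itself, since that behaviour may differ between NumPy versions.

```python
import numpy as np

# Default fill values NumPy's masked-array module uses for common dtypes.
int_default = np.ma.default_fill_value(np.dtype("int64"))
float_default = np.ma.default_fill_value(np.dtype("float64"))

# A masked array created without an explicit fill_value picks up the default.
masked = np.ma.masked_array(
    np.array([1, 2, 3], dtype="int64"),
    mask=[False, True, False],
)

uint8_max = np.iinfo(np.uint8).max

print("integer default:", int_default)
print("float default:", float_default)
print("masked.fill_value:", masked.fill_value)
print("fits in uint8:", int_default <= uint8_max)
```

So a 999999 can appear without anyone setting it as nodata, as soon as a masked array is involved somewhere in the pipeline.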

Based on the traceback mentioning get_data and deserialize I think the actual error is probably with reading the data from the original file rather than the writing process. What happens if you call a .load early in the workflow? Do you get the same error?

You mean do a load, recreate a dask array from the loaded array and resave it with save_cog_with_dask?

My suggestion was a debugging approach rather than a final solution.

If there’s no existing shared solution, no one from these forums has encountered the exact same issue (thank you for asking!), and the data cannot be shared, I think you’ll probably need to do some digging to find out what is causing the issue. I was suggesting a mechanism for narrowing down where in the workflow it’s happening, by seeing if you get similar errors with:

# Test just opening the dataset
# Test filling missing data

Of course you may have already tried this all and only find the issue when passing the output from all the piped operations to save_cog_with_dask, so apologies if my suggestion was unhelpful.
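The staged approach described above could be sketched roughly as follows. The helper `first_failing_stage` and the placeholder stages are hypothetical; in the real workflow each stage would be one actual step (opening the dataset, filling missing data, casting, writing), each forced to run with `.load()` or `.compute()`.

```python
from typing import Any, Callable, Optional

Stage = tuple[str, Callable[[Any], Any]]


def first_failing_stage(stages: list[Stage], initial: Any = None) -> Optional[str]:
    """Run the stages in order, feeding each result into the next.

    Returns the name of the first stage that raises, or None if all pass.
    """
    value = initial
    for name, step in stages:
        try:
            value = step(value)
        except Exception as exc:
            print(f"stage {name!r} failed: {type(exc).__name__}: {exc}")
            return name
    return None


# Placeholder stages standing in for the real raster operations.
stages = [
    ("open", lambda _: [1.0, float("nan"), 3.0]),
    ("fill", lambda xs: [0.0 if x != x else x for x in xs]),  # NaN != NaN
    ("cast", lambda xs: [int(x) for x in xs]),
]
print(first_failing_stage(stages))
```

The first stage name reported is the earliest point at which the error reproduces, which is what the `.load` suggestion was meant to reveal.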

What I can say is that it works like a charm when I save this without Dask, with rioxarray (after a compute), with to_raster(..., driver="COG", windowed=True).
It also seems to work with numpy < 2.

However, it fails with numpy > 2.
I also tested with numpy 2.1.0, but the error is slightly different (same topic, though):
TypeError: Cannot cast scalar from dtype('int64') to dtype('uint8') according to the rule 'same_kind'


  File "C:\Users\rbraun\Anaconda3\envs\eoprocesses\Lib\site-packages\sertit\rasters.py", line 258, in wrapper
    raise ex
  File "C:\Users\rbraun\Anaconda3\envs\eoprocesses\Lib\site-packages\sertit\rasters.py", line 254, in wrapper
    out = function(any_raster_type, *args, **kwargs)
  File "C:\Users\rbraun\Anaconda3\envs\eoprocesses\Lib\site-packages\sertit\rasters.py", line 1181, in write
  File "C:\Users\rbraun\Anaconda3\envs\eoprocesses\Lib\site-packages\dask\base.py", line 372, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "C:\Users\rbraun\Anaconda3\envs\eoprocesses\Lib\site-packages\dask\base.py", line 660, in compute
    results = schedule(dsk, keys, **kwargs)
  File "C:\Users\rbraun\Anaconda3\envs\eoprocesses\Lib\site-packages\dask\array\reductions.py", line 477, in chunk_max
    return np.max(x, axis=axis, keepdims=keepdims)
  File "C:\Users\rbraun\Anaconda3\envs\eoprocesses\Lib\site-packages\numpy\_core\fromnumeric.py", line 3199, in max
    return _wrapreduction(a, np.maximum, 'max', axis, None, out,
  File "C:\Users\rbraun\Anaconda3\envs\eoprocesses\Lib\site-packages\numpy\_core\fromnumeric.py", line 84, in _wrapreduction
    return reduction(axis=axis, out=out, **passkwargs)
  File "C:\Users\rbraun\Anaconda3\envs\eoprocesses\Lib\site-packages\numpy\ma\core.py", line 6091, in max
    np.copyto(result, result.fill_value, where=newmask)
TypeError: Cannot cast scalar from dtype('int64') to dtype('uint8') according to the rule 'same_kind'

My guess is that it doesn’t fail with numpy < 2 because this cast, which is now illegal, used to be allowed.
It is therefore more likely a bug, so I’ll open an issue on odc-geo.
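The casting rule named in the TypeError can be checked on the dtypes alone, without any raster data. This sketch only uses `np.can_cast`; it does not reproduce the odc-geo failure itself. As far as I understand, NumPy 2 no longer looks at the value of a scalar when deciding whether a cast is allowed (NEP 50), which would fit the guess above, but treat that link as an assumption rather than a diagnosis.

```python
import numpy as np

# The cast reported in the traceback: int64 scalar into a uint8 array.
signed_to_uint8 = np.can_cast(np.int64, np.uint8, casting="same_kind")

# For comparison: a cast within the unsigned kind, and an explicit unsafe cast.
unsigned_to_uint8 = np.can_cast(np.uint64, np.uint8, casting="same_kind")
unsafe_to_uint8 = np.can_cast(np.int64, np.uint8, casting="unsafe")

print("int64 -> uint8 (same_kind):", signed_to_uint8)
print("uint64 -> uint8 (same_kind):", unsigned_to_uint8)
print("int64 -> uint8 (unsafe):", unsafe_to_uint8)
```

A signed-to-unsigned cast is not "same kind", so a fill value held as int64 cannot be copied into a uint8 result under that rule.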

Thanks for the help :pray:

I created the issue at odc-geo and I worked on a minimal working example after all (you can find it there, in order to avoid creating multiple parallel threads about this):
'save_cog_with_dask' fails with numpy > 2 · Issue #189 · opendatacube/odc-geo · GitHub

