xarray: ValueError: cannot reindex or align along dimension 'x' because the index has duplicate values

I have around 5000 GeoTIFF files, each of shape (25000, 25000) with float32 values.
I want to calculate .argmax() across the 5000 arrays.
So I first tried merging the data arrays:

import xarray
import rioxarray
rasters = [rioxarray.open_rasterio(f) for f in files]
merged = xarray.concat(rasters, dim="band")

I am stuck on the following error.

Traceback (most recent call last):
File "", line 42, in
File "/home/laura/.local/lib/python3.6/site-packages/xarray/core/concat.py", line 192, in concat
objs, dim, data_vars, coords, compat, positions, fill_value, join, combine_attrs
File "/home/laura/.local/lib/python3.6/site-packages/xarray/core/concat.py", line 527, in _dataarray_concat
File "/home/laura/.local/lib/python3.6/site-packages/xarray/core/concat.py", line 384, in _dataset_concat
align(*datasets, join=join, copy=False, exclude=[dim], fill_value=fill_value)
File "/home/laura/.local/lib/python3.6/site-packages/xarray/core/alignment.py", line 354, in align
copy=copy, fill_value=fill_value, indexers=valid_indexers
File "/home/laura/.local/lib/python3.6/site-packages/xarray/core/dataset.py", line 2630, in reindex
File "/home/laura/.local/lib/python3.6/site-packages/xarray/core/dataset.py", line 2661, in _reindex
File "/home/laura/.local/lib/python3.6/site-packages/xarray/core/alignment.py", line 567, in reindex_variables
"index has duplicate values" % dim
ValueError: cannot reindex or align along dimension 'x' because the index has duplicate values

I tried setting compat="override" and compat="no_conflicts", but I got the same error.

Could someone help me?

What kind of joining / alignment behavior do you expect here? Are all the rasters supposed to have the exact same footprint, and if not, how do you want to align things? By doing a concat, you’re telling xarray you want a single array, and it has to figure out how to do things (maybe you want join="override"?)

It’s also worth asking if you even need to align things in the first place. Is doing each argmax independently an option, so that you end up with 5,000 argmaxes? And then you can take the argmax of that array of argmaxes to get the overall argmax? If you get the alignment figured out and are using Dask to parallelize things, that’s essentially what it would do for you.
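If what you need is the per-pixel argmax across files (argmax along axis 0 of the stack), one way to skip the concat entirely is to keep a running maximum and update it one raster at a time. This is only a rough sketch of that idea, not something xarray does for you: running_argmax is a made-up helper, it uses plain NumPy arrays, and it ignores NaN/nodata handling. At 25000 x 25000 you would probably also need to read each file in windows or chunks rather than whole.

```python
import numpy as np


def running_argmax(arrays):
    """Per pixel, return the index of the array that holds the largest value.

    `arrays` is any iterable of equally shaped arrays. Only the current
    array and two accumulators are held in memory at once.
    NaN/nodata values are NOT handled here (assumption: clean data).
    """
    best_val = None
    best_idx = None
    for i, arr in enumerate(arrays):
        arr = np.asarray(arr)
        if best_val is None:
            # First array initialises the accumulators.
            best_val = arr.copy()
            best_idx = np.zeros(arr.shape, dtype=np.int32)
            continue
        if arr.shape != best_val.shape:
            raise ValueError(f"array {i} has shape {arr.shape}, expected {best_val.shape}")
        # Strict '>' keeps the earliest index on ties, like np.argmax.
        better = arr > best_val
        best_val[better] = arr[better]
        best_idx[better] = i
    if best_idx is None:
        raise ValueError("no arrays given")
    return best_idx


# Small synthetic check against the stack-then-argmax approach.
rng = np.random.default_rng(0)
stack = rng.random((5, 4, 3)).astype(np.float32)
result = running_argmax(iter(stack))
assert np.array_equal(result, stack.argmax(axis=0))
```

With real files, the iterable could be a generator that opens one raster at a time and yields its values, so nothing has to be aligned along x or y.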


Thanks for your kind response. I tried join="override" but got the same error.
I also added compat="no_conflicts", but still got the same error.
All the rasters have the same footprint and use the same x and y dimensions.
The concat was supposed to align over the x and y dimensions and produce an output of shape (5000, 25000, 25000), where 5000 is the number of bands. I would then calculate the argmax across axis 0.
All input files were read as (1, 25000, 25000) using rioxarray.open_rasterio().

But when I use concat, I get the error. Since all of them are read with the same shape of (1, 25000, 25000), no alignment should be necessary; they only need to be concatenated.

Are you able to share one or two of the files?

I will try to isolate the problematic file and share it.
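For isolating it, a check along these lines might help: the error complains about duplicate values in the 'x' index, so each raster can be tested for a non-unique coordinate index before any concat. This is a sketch on small synthetic arrays. dims_with_duplicate_coords and the file names are made up for illustration; with the real data each DataArray would come from rioxarray.open_rasterio(path).

```python
import numpy as np
import xarray as xr


def dims_with_duplicate_coords(da, dims=("x", "y")):
    """Return the names in `dims` whose coordinate index has repeated values."""
    return [d for d in dims if d in da.indexes and not da.indexes[d].is_unique]


# Synthetic stand-ins for two opened rasters (hypothetical file names).
good = xr.DataArray(
    np.zeros((1, 2, 3), dtype=np.float32),
    dims=("band", "y", "x"),
    coords={"band": [1], "y": [10.0, 20.0], "x": [0.0, 1.0, 2.0]},
)
bad = good.assign_coords(x=[0.0, 1.0, 1.0])  # x value 1.0 appears twice

rasters = {"good.tif": good, "bad.tif": bad}

# Keep only the rasters that have at least one dimension with duplicates.
problem = {}
for name, da in rasters.items():
    dup_dims = dims_with_duplicate_coords(da)
    if dup_dims:
        problem[name] = dup_dims

print(problem)
```

Any file that shows up in problem is a candidate for the one triggering the ValueError.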