How does xarray feel about steps in a dimension

I had naively thought that one couldn’t have duplicated or out of order steps in a (for example) time dimension, but it seems you certainly can do that.

Are there any perspectives or writings on this? It seems fair that a rectilinear coordinate should only increase, but I suppose there's a good reason to allow it sometimes?

The case that got me first was this one:

The situation I’m considering is the OISST time series, which has “preliminary” files that end up mixed in with “final” files. A grouped order-and-distinct is enough to get a nice monotonic series out of them, but it seems xarray doesn’t care and will load them all in. Is there a monotonic-strict mode?

There sure are netcdfs out there with duplicated or out of order “x” coordinates, but I always considered that those were properly broken.

I think about files in netcdf like I think about 1D track data: it doesn’t make sense if there are duplicated or out-of-order time steps. But maybe there’s a good reason to allow that in the general case of mixed or grouped dimensions?

This topic suggests to me that it’s “user-beware” at load time, and only enforced in downstream workflows: AODN_zarr.ipynb · GitHub

(I’m certainly going to normalize my file sets as I have elsewhere, but this seems like a gap at the moment.)

Xarray does not care in general. Sortedness is mostly only useful for plotting and indexing.

You can use assert ds.indexes["time"].is_monotonic_increasing to assert properties that you want, for example.
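For example, a minimal sketch of a load-time check (the filename here is hypothetical; any dataset with a "time" coordinate backed by a pandas index works the same way):

import xarray as xr

# hypothetical path, stand-in for one of your OISST files
ds = xr.open_dataset("oisst_subset.nc")

# fail fast if the time coordinate is out of order
assert ds.indexes["time"].is_monotonic_increasing, "time coordinate is not sorted"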


Ok cool, thanks! Same with duplication? Sortedness and avoiding duplication are certainly useful for validation and preventing error propagation (and xarray can’t do everything; I can see it being useful for lining up datasets for downstream use). I don’t want to make Zarrs that mirror the old mess in netcdf, so if anyone has pointers on how they avoid this, I’m interested.

I’ll do normalizing upstream.

is_monotonic_increasing doesn’t help with duplicates (as documented); this is from a different dataset:

ds.time[16044:16046].values
array(['2024-10-20T12:00:00.000000000', '2024-10-20T12:00:00.000000000'],
      dtype='datetime64[ns]')
ds.indexes["time"].is_monotonic_increasing
True

index.is_unique and index.is_monotonic_increasing will give you strict monotonicity.
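For example, a small helper along those lines (a sketch, not an xarray API; the function name is made up):

def assert_strictly_increasing(ds, dim="time"):
    # strictly increasing means sorted with no repeated labels
    idx = ds.indexes[dim]
    if not (idx.is_unique and idx.is_monotonic_increasing):
        raise ValueError(f"{dim} index is not strictly increasing")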

Pandas has (experimental) support for disallowing duplicate labels on indexes attached to a DataFrame or Series: Duplicate Labels — pandas 2.2.3 documentation. I suspect that’s not (yet) exposed for xarray DataArrays and Datasets.
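On the pandas side that looks roughly like this (experimental API, and I believe setting the flag on an object that already has duplicate labels raises straight away):

import pandas as pd

s = pd.Series([1.0, 2.0], index=["2024-10-20", "2024-10-20"])

# raises pandas.errors.DuplicateLabelError because the index has repeated labels
s.set_flags(allows_duplicate_labels=False)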


Cool, thanks! I definitely want something for that upfront.

Also, I see just now that open_mfdataset does de-duplicate and order netcdfs at read time (so it is a reasonable expectation, and I need to make sure my .concat input gets cleaned up first).

And I see .concat() probably has handlers for this … thanks! I needed a push.

I think the behaviour you’re referring to is specific to the combine='by_coords' option, FYI, because that internally checks that the indexes are monotonic.
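So something like this (hypothetical file pattern) should raise if the combined time index ends up non-monotonic:

import xarray as xr

# combine="by_coords" checks that the combined indexes are monotonic
# and raises if the preliminary/final files leave time out of order
ds = xr.open_mfdataset("oisst_*.nc", combine="by_coords")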

Xarray does have xarray.DataArray.drop_duplicates.
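A minimal usage sketch, assuming a recent xarray:

import numpy as np
import xarray as xr

times = np.array(["2024-10-20T12:00", "2024-10-20T12:00", "2024-10-21T12:00"],
                 dtype="datetime64[ns]")
da = xr.DataArray([1.0, 1.0, 2.0], dims="time", coords={"time": times})

# keep the first occurrence of each repeated time label
da_clean = da.drop_duplicates(dim="time", keep="first")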