Example which highlights the limitations of NetCDF-style coordinates for large geospatial rasters

This is what I think would be helpful in xarray to support GeoBox use-case.

Say you have 4d array with dimensions TYXB, an RGBA datacude of several timestamps. The YX, dimensions 1,2, form a pixel plane for which mapping from pixel coordinate to world coordinate exists, possibly defined with a linear mapping (Affine matrix [1]), or maybe GCPs [2], or RPCs[3], or reduced resolution location arrays.

The only way to handle this data in a generic case (anything but axis aligned imagery) is to construct two 2d coordinate arrays, this adds 16 bytes per pixel, which is a huge overhead for what is typically 2-byte data to begin with.

Instead, if I could tell xarray that:

  1. Dimensions 1,2 are “linked”
  2. Given index into linked dimensions Tuple[int,...] one can compute coordinates Tuple[float, ...] by using provided Callable.

A lot of data providers ensure that mapping from pixel plane to world plane can be done independently for X,Y coordinates, and pixels are square, it’s easier to reason about data that way. But that doesn’t work for data that is distributed without post-processing, data that just returns raw image as seen by the sensor with a bunch of pixel locations with known spatial coordinated (GCPs).

But even for data processing tasks, picking pixel plane that is not aligned with the world plane can be helpful, see [4] for an example where rendering into a rotated plane significantly reduces memory requirements for the result. Too bad these are too hard to work with using current assumptions within the ecosystem of tools.

References:

[1] GeoBox Model — odc-geo 0.4.3 documentation
[2] Raster Data Model — GDAL documentation
[3] RFC 22: RPC Georeferencing — GDAL documentation
[4] Generating Rotated Images to Save Space · opendatacube/odc-stac Wiki · GitHub