It’s been a while, but I’ve spent time exploring the issues and now have a clear opinion on the best way to move forward.
In the GDAL data model, the GeoTransform defines an affine transformation—typically representing regular grids with optional rotation—that provides implicit coordinates for raster pixels.
GDAL can converts coordinates defined as an affine transformations into explicit coordinate arrays (NetCDF-style). However the transformation can lead to inaccuracy issues due to the finite precision of floating-point types. Moreover, it appears sometimes optimal to encode coordinates as an affine transformation (aka implicit coordinates), allowing libraries to compute indices using this function rather than iterating over a large array of values.
CF (and thus NetCDF) supports several grid types, always encoded as an array of the coordinates values (explicit coordinates):
- Regular (rectilinear) grid: coordinates have constant spacing along each axis.
- Rectilinear grid: axes are straight and aligned with coordinate axes but may have non-uniform spacing.
- Curvilinear grid: coordinates vary along both axes and form curved, structured grids defined by 2D coordinate arrays.
- Unstructured grid (more advanced case): uses 1D or 2D coordinate arrays with an explicit connectivity (see mesh topology) to describe irregular spatial relationships.
“Affine" transformations (including rotation, skew, etc.) are not explicitly part of the CF model; coordinate systems can imply affine transformations, but CF does not support general affine transforms in the grid definition. Note that CF conventions enable the implicit coordinates encoding through the Subsampling, but only for regular grids.
While it’s clear that the GeoZarr data model should include affine transformation encoding, there are multiple ways of encoding these implicit coordinates (derived from affine transformation) in Zarr:
- GeoTransform : affine transform (six coefficients) from GDAL data model
- ModelTiepointTag & ModelPixelScaleTag : defines pixel size and origin from GeoTiff
- ModelTransformationTag: defines a full 4×4 transformation matrix from GeoTiff
- pandas.RangeIndex : start, step, count from xarray/pandas
- Coordinate Subsamplng from CF (with possible extension)
- Using geographic coordinates and resolution from OGC API Coverage (OGC API - Coverages - Part 1: Core)
- GridFunctions from OGC GML
While each approach warrants deeper investigation, the pragmatic and efficient convention used by GDAL when converting GeoTIFF to NetCDF or Zarr is preferred: the CRS is translated into a standard CF grid_mapping variable, and the GeoTransform is included as an attribute of that variable.
To support multidimensional data (3D+), the CF conventions should be extended to allow a zero-dimensional coordinate variable with a grid_mapping and standard_name, (and probably an coordinate type attribute) enabling libraries such as GDAL or xarray to compute coordinates implicitly from the affine transformation, avoiding the need for explicit coordinate variable. Note that the dimension shapes are already provided in the variable model.
Example below:
dimensions:
x = 1000 ;
y = 1000 ;
variables:
float temperature(y, x) ;
temperature:units = "K" ;
temperature:standard_name = "sea_surface_temperature" ;
temperature:grid_mapping = "crs" ;
float x(x) ;
x:standard_name = "projection_x_coordinate" ;
x:units = "m" ;
x:grid_mapping = "crs" ;
x:coordinate_type = "GeoTransform" ;
float y(y) ;
y:standard_name = "projection_y_coordinate" ;
y:units = "m" ;
y:grid_mapping = "crs" ;
y:coordinate_type = "GeoTransform" ;
char crs ;
crs:grid_mapping_name = "transverse_mercator" ;
crs:GeoTransform = "500000 30 0 0 0 -30" ; // origin_x, pixel_width, rotation_x, origin_y, rotation_y, pixel_height
crs_wkt : PROJCS["WGS 84 / UTM zone 30N",GEOGCS["WGS 84",DATUM["World Geodetic System 1984",SPHEROID["WGS 84",6378137,298.257223563]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",-3],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH]]
global attributes:
title = "SeaSat Sea Surface Temperature" ;
Conventions = "CF-1.8" ;
Edit: I removed the CRS expanded attributes. The expanded form allows to describe some projection types (sinusoidal, rotated_latitude_longitude, …) with only a few attributes. But to be discuss to make mandatory the GeoTransform (or equivalent) attribute and WKT in the GeoZarr spec of such implicit coordinates.
Looking forward your feedback.