Example which highlights the limitations of NetCDF-style coordinates for large geospatial rasters

In terms of consolidating the package landscape, Ryan said something similar here:

https://discourse.pangeo.io/t/comparing-odc-stac-load-and-stackstac-for-raster-composite-workflow/

and, I expect many opendatacube folks would describe the stack like

xarray → odc → rasterio → gdal

As far as I understand it, rioxarray didn’t exist when odc was being born, and after initial confusion (as an R user in this space) I realized why they both existed.

The other main thing is that rasterio is not GDAL, and GDAL ships with entirely sufficient bindings for Python of its own in osgeo.gdal/ogr/osr (and others). Similar problematic overlap exists in R, and R does not yet have bindings to the library that are development-ready (each significant project has built its own bindings as needed, and the community is essentially forked at least in two and arguably more camps - though there is a hopeful contender for real API access via the {gdalraster} package). In my Python journey I haven’t found use for rasterio, I get what I need from the osgeo bundle, this is after a journey in R from high level “GIS like” packages down to the actual core library, and that has significant benefits in efficiency and control rather than going through a higher level layer.

To talk about a couple of specifics of this, in terms of generating coordinates from a regular grid specification, rasterio is overkill, and GDAL is overkill, it’s very simple arithmetic to generate these and there’s a whole family of grid-logic tools that I would say belong in this space as well. Affine would be a sensible place to start, and even this has overlap with GeoBox so maybe that’s something to consider as well: GitHub - rasterio/affine: Affine transformation matrices

Also there is the multi-dim model in GDAL which would be a really excellent project for xarray to leverage, afaik rasterio doesn’t do anything in this space (and fair enough, its scope is wide enough). Multi-dimensional support · Issue #1759 · rasterio/rasterio · GitHub

(Also please be assured that I’m definitely not criticizing any packages or choices made here, I know that at every step difficult choices and hard work was done, I’m just riffing on the lovely vibe I see generally in Python to scan the landscape and consolidate where possible. I’m super impressed by so much in the Python and xarray world, and I think it’s very valuable to see what choices have been made in R and Python and other langs). .

Abstract tools for grid logic are really powerful and need to be championed way more IMO (I’m not enough across the space in Python yet, maybe it exists). Cell indexes, treating pixel position abstractly by index or row/column and helper functions for that are simple, functions-of dimension,extent (or dimension,transform for more general cases. I have my own versions in R and it was very valuable to separate that logic out of the geospatial package in R so it’s not bound to a format or data at all). In my dreams … I would separate out this code in GDAL itself, and have it as a nice standalone library that GDAL perhaps included.

1 Like