This is part of a follow-on conversation from the “What’s next for Pangeo” discussion that took place 2023-12-06. It is part of the overarching software topic.
I don’t know this topic well, so I’m mostly copying in things that others have said. I hope that they can carry the conversation.
There is still no standard way to associate geospatial coordinate information (CRS + coordinates) with Xarray data. The bridge between the geospatial raster world (e.g. GeoTIFF, rasterio) and Xarray is fragile.
GeoXarray was supposed to resolve these problems.
Is it being developed at all? Why not?
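To make the CRS association problem concrete, here is a minimal sketch of one convention in use today: the CF "grid mapping" pattern, where a scalar coordinate carries the CRS parameters and the data variable points to it by name. The grid extent, the variable name `temperature`, and the coordinate name `spatial_ref` are assumptions chosen for illustration; nothing in Xarray itself gives these attributes any special meaning, which is part of the fragility described above.

```python
import numpy as np
import xarray as xr

# Hypothetical 3x4 grid in geographic (lon/lat) coordinates on WGS84.
lat = np.array([10.5, 9.5, 8.5])           # pixel centres, north to south
lon = np.array([20.5, 21.5, 22.5, 23.5])   # pixel centres, west to east

da = xr.DataArray(
    np.zeros((lat.size, lon.size), dtype="float32"),
    dims=("lat", "lon"),
    coords={"lat": lat, "lon": lon},
    name="temperature",
)

# CF grid-mapping attributes describing the CRS (WGS84 ellipsoid parameters).
crs_attrs = {
    "grid_mapping_name": "latitude_longitude",
    "semi_major_axis": 6378137.0,
    "inverse_flattening": 298.257223563,
}

# Attach the CRS as a scalar coordinate; the name "spatial_ref" is a
# convention, not something Xarray enforces.
da = da.assign_coords(spatial_ref=xr.Variable((), 0, attrs=crs_attrs))

# The link from data to CRS is only a string attribute on the data variable.
da.attrs["grid_mapping"] = "spatial_ref"

# Scalar coordinates travel with the data through selection...
row = da.isel(lat=0)
print("spatial_ref" in row.coords)

# ...but whether the "grid_mapping" attribute survives a given operation
# depends on Xarray's keep_attrs behaviour, so the link is easy to lose.
```

Because the association lives in plain attributes and naming conventions, every library that reads or writes this data has to agree on them independently.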
I’d love to see a consolidated roadmap in xarray that describes the path forward for CRS flexible indexing and CRS serialization, and how these can be propagated to storage formats. Right now it feels like there are several independent efforts spread across several issues/repos, which makes it difficult to grok the path forward.
There could be a grand synthesis of model data structures and “raster geospatial” ones. Currently the xarray-everything model misses some key ideas, I think, and Zarr doesn’t appeal to me as a way to store data (no overviews). I think the model-netCDF structure vs. GeoTIFF-like structure needs a serious rethink.
- For CRS, make sure to look at what Arrow is doing for vector data. Much of this is already in GDAL, but it is scattered in terms of community understanding and visibility.