I’ve been spinning up a new project and have been trying to use xarray, but I’m remembering why I abandoned it years ago the first time I tried to use it. It’s not intuitive to teach yourself (e.g. a DataArray automatically has a variable included that’s not listed in the output, but a DataSet lists the variables by name, even if there’s only one), and the examples/resources are thin to nonexistent if your data don’t fit into nice, neat consistent grids (whether that’s because they have different spatial resolutions or different extents, projections aside here!). I’ve found a few docs/examples that mention reprojection (including rioxarray), but none actually provide much information on what’s happening/how.
Ultimately, I’m hoping to read in and stack a series of DEMs (digital elevation models). Each one has, a priori, an unknown extent (by me - so the file or metadata would have to be read/created to determine it), and though they are all 2 m resolution, the pixel centerpoints don’t necessarily match. When I bring the geotiffs into xarray (xarray.open_rasterio('path/to/DEM')
), they have x, y and band coordinates/dimensions. Ultimately, I was hoping to have an xarray with time, x, and y dimensions and then a few data variables (for now just elevation, ultimately a few calculation outputs based on those elevations). The clearest workflow I can envision is to bring each geotiff in, manually add the time coordinate/dimension (and value), remove the band coordinate/dimension, and add it to a framework dataset I’ve set up. The outstanding issues are:
- This feels highly inefficient, to have to modify each dataarray so I can combine them
- If I understand xarray correctly, I should be able to have x and y as coordinates/dimensions, but each time would have a unique set of x and y values, so x and y would depend on time. I’d have to impose this structure on each geotiff I import in order to combine multiple dataarrays
- In order to have a single set of x and y values (rather than one for each time step), I will need to resample both my x and y grids AND the raster itself to match the unified grid. Will xarray or rioxarray handle this, and how do I tell it what to do (e.g. how I want the data resampled)?
- What functions should I be using for these operations to make sure that I’m not making a meaningless dataset?
- Should I even be using xarray in this way? It’d be easy enough to bring in each raster by itself, do some computations, and get the results. I’d ultimately like to have a stacked/compiled dataset, but this could be done independently from the processed DEMs later.
I reached out to @dshean and @shashank on Slack, and they suggested that this isn’t the first time they’ve had this conversation. However, I’ve found little public discussion about it and what might be happening at the community/xarray level towards solutions. I’m hoping that the Pangeo community can shed some insight, because I’d really like to be able to leverage xarray!