Pangeo Showcase: "Geospatial reprojection in Python 2024 - what's available and what's next"

Title: “Geospatial reprojection in Python 2024 - what’s available and what’s next”
Speaker: Max Jones (ORCID: 0000-0003-0180-8928 )
When: Wednesday, September 25, 4PM EDT
Where: Launch Meeting - Zoom
Abstract:
This Pangeo showcase will cover a work-in-progress guidebook on existing warp resampling / reprojection methods in Python, along with some memory and statistical wall-time profiling results. Following a short presentation, we will discuss tips and tricks along with pain points for geospatial reprojection. Lastly, we’ll brainstorm what’s required to move the ecosystem forward.

Inspirations:

Agenda:

  • 10 - 15 minutes - Showcase presentation
  • 10 - 30 minutes - Discussion
  • 15 - 30 minutes - Community check-in
14 Likes

Excited to brainstorm paths forward in a community meeting again!

2 Likes

Cool, I would have liked to attend but I’m in the wrong timezone (CET). I assume a recording will be available afterwards?

1 Like

Yes indeed, it will be recorded!

3 Likes

Hi @maxrjones, thanks a lot to you and everyone for this impressive showcase before bed time!

I was wondering if you could share your diagram of the regrinding/reprojection ecosystem?

Thas was awesome @maxrjones , great overview!

I’m gutted to have missed it, I was really looking forward to join the dicussion. Is there any key takeaways from that part of the session?

Also - putting the link to the gh repo here for visibility (although it may be better to add at the top of the post)

In today’s showcase the question came up whether GDAL can provide an index and weights for a warp task, and I had asked this on the mailing list earlier this year. Even Rouault provided a summary of the potential for that and where to look in the source code:

https://lists.osgeo.org/pipermail/gdal-dev/2024-April/058906.html

I’ll be pursuing this, I had similar interest for this in exactextract and did enough to actually get hold of the index there, which I will dig up or illustrate if it’s currently possible. I’ve also worked on an entirely abstract approximate polygon rasterization here - rasterization and extracting values are obviously complementary - something that I haven’t found much audience for elsewhere but I think here will find some interest. :slight_smile:

2 Likes

I also think that some of the geobox->geobox transformation that is in odc.geo could be used for some of this abstraction, it implements quiet a bit of the warper logic for bespoke use. I’ll try to follow up soon with more actual examples (and less just-talking).

1 Like

thanks! here you go - Excalidraw+ Resampling Workflows diagram.

If you’d like to mention corrections or suggestions, please either open an issue in this GitHub repo or join this discussion thread in the Pangeo discord. There’s already been a couple minor corrections since yesterday.

1 Like

there is an improving status of high level documentation about the GDAL warper: Doc: Add high-level description of gdalwarp procedure by dbaston · Pull Request #10864 · OSGeo/gdal · GitHub

published online here: gdalwarp — GDAL documentation

I forgot to mention TempestRemap, which does let you generate weights!: GitHub - ClimateGlobalChange/tempestremap: TempestRemap: Remapping software for climate applications

and PyInterp was mentioned in the Zoom chat.

Thanks for the excellent overview and performance testing. I checked on PyResample and probably left the wrong impression during the meeting. The application does split the resampling task into two parts, but the “weights” are not really included in the data returned in the first - just the mapping (indexing) information. I suspect the intention was to split the time consuming processing into the first step, with calculating weights not really being the processing that takes a lot of time, thus could be included in the second step. There was no intention of allowing any “tweaking” of the weights, and I’m not sure any of the algorithms or interpolation methods really lend themselves to this this adjustment.

I too started an assessment of reprojection options last year, but focusing on the different algorithms and interpolation techniques, and on the different quality factors. One of points not addressed above is the risk of sampling errors - leading to aliasing and moiré patterns in the projected outputs. To minimize these, several of the interpolation techniques include “smoothing” functions - e.g., the Elliptical Weighted Average technique in Pyresample, for swath to grid projection, and I believe GDAL’s lanczos (gdalwarp).

I had summarized the quality factors as:

  • Grid spatial distortion - min/max or average - the stretch of the output projection across the grid.
  • Changes to the “Value-Space”
    • “Categorical”/Integer values - e.g. Nearest Neighbor approach can preserve, while most others will “average” or interpolate values.
    • Preserve Min/Max values - e.g. BiLinear will preserve, while Cubic can lead to values beyond the range of input values
    • 2nd order derivatives - “Conservative” algorithms - e.g., can preserve power, flux, total-energy, regional rainfall.
  • Smoothing, or absence of - introducing sampling errors, aliasing or moiré patterens.