Discrete Global Grid Systems (DGGS) use with Pangeo

Dear all,

there has been increasing interest in working with DGGS, e.g. local to global spatial statistics and analysis, and connecting the DGGS and Pangeo communities. For example, Uber’s H3 is one type of DGGS that has been very popular recently. I’d like to use the chance to start communications around how DGGS are useful and how they connect to and bridge traditional cartographic GIS (flat earth, Cartesian gridding). The core concept of DGGSs is to discretize the Earth into a globally continuous grid of evenly spaced and sized uniquely identifiable cells, with native support of hierarchical and neighborhood grid traversal.

Currently, decently usable DGGS libs in Python:

  • Uber H3 and H3 Python bindings: base C library and Python API, available on PyPI and conda
  • rHEALPix: pure Python, but could eventually use a faster implementation, on PyPI
  • DGGRID: C++ based command-line tool, needs to be self-compiled, ideally packaged on conda; I somehow made it work on the Julia build system, but I struggle with conda recipes, pls help
  • there are some wrappers for DGGRID, incl. dggrid4py (humbly by myself), should be on PyPI, would be nice on conda, ideally with DGGRID as a dependency
  • Google S2 and S2 on GitHub: I saw it might be on conda as S2 but haven’t checked the Python bindings
  • OpenEAGGR DGGS: single-dump project on GitHub, often included in DGGS comparisons, not continued though, has a good triangle index system

I have literally a bunch of things half-ready, like some Jupyter notebooks for workshops, scrambled together libs and scripts, with Dask and without, experiments storing in Zarr, plots with GeoViews etc, which I could rework into some nice demo notebooks, tutorials and for the Pangeo gallery.

I would be very happy if we could find a place where I/we put some more condensed info on DGGS. There is a lot of experience with those different systems - they all have their reasons to exist, by the way, none are fully superior to another. I could also start writing a few blog posts and create gallery materials. I’d just need some directions to get going so people can pick it up and make it better than I am able to do :slight_smile:


By the way, although H3 is immensely popular, H3 (and S2, for that matter) is better understood as a point indexing and aggregation system than as a true “raster-type” alternative. rHEALPix, on the other hand, and ISEA-based DGGS (Snyder equal-area projection) are certainly useful for gridded representation of continuous phenomena (as we use in GeoTIFF, NetCDF etc.).

For example: H3 is an aperture 7 hexagonal grid with gnomonic projection (NOT equal area), has NumPy array support, and has ports in practically all programming languages. That means H3 cell identifiers are understood across platforms: you can e.g. compute data in Python H3 and visualize it in pure JavaScript in Kepler.gl or Deck.gl without copying geometry (the cell ID “contains” location and resolution/cell size). But H3 cells are only “approximately contained within a parent cell” (as in all hexagonal grids), so hierarchical traversal is not fully “congruent” as it would be with squares/rhombuses or triangles.
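To make the “cell ID contains location and resolution” point concrete, here is a minimal sketch that reads the resolution and base cell straight out of an H3 index and derives a parent ID with plain bit operations. It assumes the 64-bit layout described in the H3 documentation (4 mode bits, 4 resolution bits, 7 base-cell bits, then fifteen 3-bit digits). The helper names are made up for illustration; in practice you would call the H3 library instead.

```python
# Sketch only: assumes the H3 index bit layout from the H3 docs.
# Helper names are hypothetical; use the h3 library for real work.

MAX_RES = 15      # finest H3 resolution
DIGIT_BITS = 3    # each resolution digit takes 3 bits
RES_SHIFT = 52    # resolution is stored in bits 52-55
BASE_SHIFT = 45   # base cell is stored in bits 45-51


def h3_resolution(cell: str) -> int:
    """Resolution (0-15) encoded in a hexadecimal H3 cell ID."""
    return (int(cell, 16) >> RES_SHIFT) & 0xF


def h3_base_cell(cell: str) -> int:
    """Number of the resolution-0 base cell that contains this cell."""
    return (int(cell, 16) >> BASE_SHIFT) & 0x7F


def h3_parent(cell: str, parent_res: int) -> str:
    """Parent ID at parent_res, derived from the child ID alone."""
    h = int(cell, 16)
    res = h3_resolution(cell)
    if not 0 <= parent_res <= res:
        raise ValueError("parent_res must be between 0 and the cell's resolution")
    # Overwrite the resolution field.
    h = (h & ~(0xF << RES_SHIFT)) | (parent_res << RES_SHIFT)
    # Digits finer than the parent resolution are set to 7 ("unused").
    for r in range(parent_res + 1, res + 1):
        h |= 0b111 << ((MAX_RES - r) * DIGIT_BITS)
    return format(h, "x")


cell = "8928308280fffff"
print(h3_resolution(cell))   # 9
print(h3_base_cell(cell))
print(h3_parent(cell, 8))    # 8828308281fffff
```

No geometry is touched anywhere: resolution and ancestry come from the ID itself, which is what makes the IDs portable between Python and JavaScript tooling.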

There are a few core missions to suggest:

  • make stuff easily installable
  • make data conversion/binning best practices in code, “classic geo” (vector/raster) to DGGS and DGGS to geo
  • storage of DGGS-indexed data (think table, index is cell ID plus data values at that “coordinate”) in cloud and “range” queries
  • DGGRID has great grid construction functions (including ideal equal-area hexagons, like ISEA7H → Snyder equal-area aperture 7 hexagons), but currently only supports grid creation and binning of data, with no further geo-referenced location, neighborhood or grid traversal
  • Dask DataFrame partitioned compute already easy, but could be more “spatially” aware
  • can DGGS indexing systems be used/integrated with xarray semantics?
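On the storage point, one common pattern is to keep the table sorted by cell ID, so that “all data inside a coarse cell” becomes a contiguous range scan. The sketch below assumes prefix-style hierarchical IDs (a child ID is its parent ID plus one digit, similar in spirit to rHEALPix identifiers). The IDs and values are invented, and a real setup would keep the sorted table in something like Parquet or Zarr rather than a Python list.

```python
from bisect import bisect_left

# Invented rows: (cell_id, value). IDs are prefix-style, so every
# descendant of cell "N1" starts with "N1".
rows = sorted([
    ("N10", 2.5), ("N12", 1.0), ("N125", 4.0),
    ("N2", 7.0), ("N31", 3.5), ("P04", 9.0),
])
ids = [cell_id for cell_id, _ in rows]


def rows_within(parent_id: str):
    """Return all rows whose cell ID is parent_id or one of its descendants."""
    lo = bisect_left(ids, parent_id)
    # parent_id + "\uffff" sorts after every ID that starts with parent_id.
    hi = bisect_left(ids, parent_id + "\uffff")
    return rows[lo:hi]


print(rows_within("N1"))
```

Because the lookup is two binary searches on a sorted key, the same idea maps onto sorted or partitioned columnar storage, where a cell-ID range translates into a small number of chunk reads.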

I am still working my way through the forum. Related works/discussions:


Hi @allixender, thanks for raising this topic! Some concepts similar to DGGS are leveraged by practitioners in the Pangeo community and the broader geosciences; probably the most straightforward example I can think of is the modeling communities building NWP/GCMs on icosahedral grids, such as the DWD ICON or NICAM. In my (limited) experience, the algorithms that generate these grids to a particular discretization level have much in common with the S2/H3 approach, and I’m reasonably certain that some of the same indexing and cell topology functions that you’d use with S2/H3 would apply here.

I think an interesting gap to explore here that would be extremely useful for the community is the use case for these tools in the broader geosciences. I haven’t really seen much overlap in the workflows I’d use S2 for versus those I’d use more traditional geospatial gridding systems, but that doesn’t mean that they don’t exist.

From a software standpoint, general concepts on geospatial indexes and data structures would inevitably be useful. I’m increasingly using S2 in my day-to-day work (not by choice, though!) and it would be really interesting to see, say, a GeoPandas extension that let me do something like take a collection of lat/lon points and aggregate them to a specific S2 level. Can’t imagine that would be more difficult than some fancy groupby calculations, but native support would be great.

Also - visualization of DGGS grids. Major gap in the tooling I’ve seen so far. I’d be happy to share some Jupyter notebooks I use to render choropleth-style plots with data that has been aggregated to S2 levels. Running with the above - would be really cool to natively re-aggregate the data under the hood in the fashion of datashader!


@darothen You might be interested in this discussion Support for spherical geometry (S2) in Python · Issue #10 · geopandas/community · GitHub where we have the project of providing S2 vectorized Python bindings similarly to GEOS / PyGEOS (Shapely 2.0) and eventually use it as another backend geometry engine in GeoPandas for geographic data. This may happen later this year, hopefully (through GSoC and/or NumFOCUS grants).

Once available, those S2 bindings could be also useful in other Python libraries, e.g., for building Xarray compatible geographic indexes. This is already supported by xoak (GitHub - xarray-contrib/xoak: xarray extension that provides tree-based indexes used for selecting irregular, n-dimensional data.) and pys2index (GitHub - benbovy/pys2index: Python/NumPy compatible geographical index based on s2geometry), although with very limited functionality.


can DGGS indexing systems be used/integrated with xarray semantics?

Once the Xarray explicit indexes refactor is complete, it should be pretty straightforward to do so.

See

Really great post @allixender! I personally think that we will all be using these sorts of grids in 10 years. The challenge is adapting our analysis software so that the user experience is as seamless as it is with rectangularly gridded data.

Let me bring in a modeling perspective. Many modern GCMs, such as MPAS, use similar meshes based on an icosahedral tessellation of the sphere. Here is a great visualization from https://github.com/dengwirda/jigsaw-geo-python:

Within the Pangeo-verse, @clyne, @erogluorhan, and others have been developing the uxarray project:

whose goal is to make xarray work better with such meshes. They are currently seeking feedback on their proposed API.


My question is this: how much of the functionality needed for DGGS is similar / the same as required for icosahedral GCM grids?

If there is a lot of overlap, maybe we can leverage many of the same software elements.


Awesome, thanks for the heads-up on this! I’ll definitely keep an eye towards the outcome of this work.

Just a quick followup: UXarray is a component of the Project Raijin effort, whose goals are to produce and support Python tools for the analysis and visualization of unstructured grids, primarily those arising from global weather and climate models. If this work can benefit both the DGGS communities and Raijin’s target Earth System Modeling communities, well that would just be terrific :slight_smile:


Hi all, thanks for your contributions and info. I’d like to share three use cases from the DGGS / GIS world where I am coming from. These are some recent examples in the scientific literature of applying DGGS at larger scale:

@darothen Of course, H3 and S2 are already mature software libraries, which is good, but I’d say their main purpose is to efficiently index and aggregate (bin) point data for subsequent analysis. You can do that in a few lines of code:

import geopandas, h3
gdf = geopandas.read_file(pointdata)
# h3-py v3 API: geo_to_h3(lat, lng, resolution); renamed latlng_to_cell in v4
gdf["h3_id"] = gdf["geometry"].apply(lambda g: h3.geo_to_h3(g.y, g.x, resolution))

And to aggregate you would calculate parent cell IDs, group by them and then apply your aggregation function … and the same in S2. This is really powerful, working only over the IDs, which “know” their geometry/location and resolution.
Having those functionalities and the tooling closer and easier to access would be great (thanks for pointing to that thread @benbovy ). However, in the climate/environmental modeling space we want grid systems that represent continuous phenomena. For those there are also some “unstructured” grid systems available and in use; thanks for the other inputs here already. The whole climate-gridding stuff is a whole new world for me. As I am not so well versed in those, I’d like to point out that DGGSs like H3, S2 and OpenEAGGR have nice indexing systems that make neighborhood functions really practical (cell neighbors in a way like array[i+1], but for hexagons or triangles as well). They also offer hierarchical indexing: you get the parent or the children cells for an index really quickly and easily, so you can scale up and down within the grid system. And, again with emphasis on DGGS semantics, a cell (and thus a cell ID) is always at the same location. Giving that cell ID to someone else allows them to exactly identify location and resolution (provided they know which DGGS it is: H3, S2, rHEALPix, OpenEAGGR)! Unstructured grids or meshes can’t do those things. If those systems could be used with UXarray and the other big climate gridding stuff, that would be amazing.
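The “compute parent IDs, then groupby” step described above can be sketched without any geometry at all. The example below works on S2-style 64-bit cell IDs and assumes the S2 layout (3 face bits, 2 bits per level, then a single trailing 1 bit). The function names and sample values are made up for illustration; with a DataFrame, the final loop would simply be a groupby on a parent-ID column.

```python
from collections import defaultdict

MAX_LEVEL = 30  # finest S2 level


def lsb_for_level(level: int) -> int:
    """Lowest set bit of any S2 cell ID at the given level."""
    return 1 << (2 * (MAX_LEVEL - level))


def make_cell_id(face: int, path: list[int]) -> int:
    """Build a cell ID from a face (0-5) and one child position (0-3) per level."""
    cell = face << 61
    for level, pos in enumerate(path, start=1):
        cell |= pos << (61 - 2 * level)
    return cell | lsb_for_level(len(path))


def cell_level(cell: int) -> int:
    """Level of a cell, read from the position of its trailing 1 bit."""
    lsb = cell & -cell  # isolate the trailing 1 bit
    return MAX_LEVEL - (lsb.bit_length() - 1) // 2


def cell_parent(cell: int, level: int) -> int:
    """Ancestor of the cell at a coarser level, using the ID only."""
    new_lsb = lsb_for_level(level)
    return (cell & -new_lsb) | new_lsb


# Invented observations on level-3 cells: (cell_id, value).
obs = [
    (make_cell_id(2, [1, 0, 3]), 4.0),
    (make_cell_id(2, [1, 0, 1]), 6.0),
    (make_cell_id(2, [1, 2, 0]), 1.0),
    (make_cell_id(4, [0, 0, 0]), 5.0),
]

# Aggregate to level 2 using only the IDs.
sums = defaultdict(float)
for cell, value in obs:
    sums[cell_parent(cell, 2)] += value

for parent, total in sums.items():
    print(f"{parent:016x} level={cell_level(parent)} sum={total}")
```

The first two observations share a level-2 parent and are summed together; the other two end up in their own parents.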

But again I want to stress: I personally don’t want to use S2 or H3 in environmental modelling, because they don’t have equal-area cells. Across the globe, and even within a continental extent, their cell areas vary by a factor of about 0.6 to 1.4 around the mean cell area. If I need to aggregate large-scale land use, forest cover, soils and some climatologies and calculate area-based statistics, we should address that as well. This is where rHEALPix and ISEA-based grids should be developed further to be usable and integrated. S2 is nice for calculating distances and areas on a sphere, but not on an ellipsoid and certainly not on a geoid :wink: But I’d be happy to go step by step along with you guys.
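A tiny numeric illustration of why that area variation matters for area-based statistics: with unequal cells, a plain mean over cells differs from the area-weighted mean. All numbers below are invented; only the 0.6 to 1.4 range of relative cell areas is taken from the paragraph above.

```python
# Invented example: forest-cover fraction per cell, with relative cell
# areas spanning the 0.6-1.4 range mentioned above (1.0 = mean cell area).
cells = [
    # (relative_area, forest_fraction)
    (0.6, 0.90),
    (0.8, 0.70),
    (1.0, 0.50),
    (1.2, 0.30),
    (1.4, 0.10),
]

# Treats every cell as if it covered the same area.
naive_mean = sum(frac for _, frac in cells) / len(cells)

# Weights each cell by the area it actually covers.
total_area = sum(area for area, _ in cells)
weighted_mean = sum(area * frac for area, frac in cells) / total_area

print(f"naive mean over cells: {naive_mean:.3f}")
print(f"area-weighted mean:    {weighted_mean:.3f}")
```

On an equal-area grid the two numbers coincide, so cell counts can be used directly as a proxy for area.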

My question is this: how much of the functionality needed for DGGS is similar / the same as required for icosahedral GCM grids?

Yes, @rabernat that is a good question I’d be happy to explore. How/where can I learn about icosahedral GCM grids?

@darothen visualization of DGGS grids … nice point: about 3D ray shading I don’t know much; I have seen people using Blender, but GeoViews (PyViz/HoloViz) is nicely integrated in Python (see figure below). Eventually, a reliable 3D globe visualisation of the meshes would be nice across platforms, web, plots, QGIS :slight_smile:

Last but not least, having those grid systems work seamlessly from data loading of “classic cartesian” gridded raster and vector data into DGGS indexed grid and Xarray/Zarr format, converge tooling, share those “data” packages by cell id extents, no projection issues … that’d be the vision, and really helpful for smaller users downstream.


@allixender how were those cool global grid visualizations generated? I didn’t think GeoViews had native bindings to libraries that could produce the grid geometries on the fly…

I routinely need to visualize data that has already been aggregated on an S2 grid… a tool which could look at the uint64 cell ID that is already associated with each entry in my dataset and render the associated geometry over a Google Map or some Cartopy background would be sweet!


The Project Raijin team has a standing monthly meeting on Thursdays, the next one being Thursday, April 14th, 10am MT. If anyone from the DGGS community would like to show up and talk about possible collaboration I’d be happy to put that on the agenda.


Hey @darothen, it is sort of manual: retrieve the coordinates of the cell geometries, either centroid or boundary (the APIs usually provide that quite easily), based on the cell ID, and then draw plain polygons. Not very efficient currently, and in some software problematic for visualising cells that cross the date line.
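For the cells that cross the date line, one simple workaround is to detect a longitude spread larger than 180° within a cell boundary and shift the negative longitudes by 360° before plotting. This assumes the problem is the usual one where such polygons get smeared across the whole map. A minimal sketch with a made-up boundary; the function name is hypothetical:

```python
def unwrap_boundary(boundary):
    """Shift longitudes so a cell crossing the antimeridian plots as one polygon.

    boundary: list of (lat, lon) tuples in degrees, lon in [-180, 180].
    Returns a new list; longitudes may exceed 180 after the shift.
    """
    lons = [lon for _, lon in boundary]
    if max(lons) - min(lons) <= 180:
        return list(boundary)  # does not cross the antimeridian
    return [(lat, lon + 360 if lon < 0 else lon) for lat, lon in boundary]


# Made-up hexagon-like cell straddling 180° longitude.
cell = [(10.0, 179.5), (10.4, 179.9), (10.4, -179.7),
        (10.0, -179.3), (9.6, -179.7), (9.6, 179.9)]
print(unwrap_boundary(cell))
```

The heuristic only holds for cells much narrower than 180° in longitude; very coarse cells and cells containing a pole need separate handling.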


Hi @clyne, thanks for that info. I would love to join, but currently I give lectures on Tuesday mornings. But I’ll keep an eye on the project.


Ah, ok - I kinda anticipated as much. I have my own (non open-source, unfortunately - can look into changing that) simple toolkit for building S2-based choropleth-like visualizations that also manually generates all of the cell polygons based on cell ID. A community tool that captures all of that logic would probably be better for everyone in the long run than all of us having our personal code snippets. Since I use this code for my core job, I need to look into the protocol for releasing part of it as open source.


Hi, I have read a bit about Project Raijin, UXarray and UGRID. It would be a great “experiment” to see how to load DGGS geometries into such meshes. Working with Simple Features vector geometries is possible, but not ideal. Could the individual cells in those meshes be indexed and addressed with arbitrary indices, i.e. the unique cell IDs based on the DGGS type?


@tinaok also suggested to consider kerchunk and xgcm

Also, I came across The ICON Earth System Model ICON-ESM V1.0, the first coupled model based on the ICON (ICOsahedral Non-hydrostatic) framework with its unstructured, icosahedral grid concept. I was reading a little into the background. At first glance, while it already converges on an icosahedral, equal-area and congruent hierarchical tessellation (here triangles, ISEA4T I’d say), the foundational modelling is done over the unstructured grid, i.e. navigation and addressing of cells during the modelling phase is based only on the geometries, i.e. edges, not cell IDs, which could potentially be tested with the semantics of a DGGS at some point. It would be interesting to know more about the ins and outs of the actual data handling, presumably plain GIS for the data preparation in this model; after all, those cells are still big, and at their “highest” resolution there are something like 327,720 cells. In DGGRID, ISEA4T resolution 7 has about 320,000 cells with a mean cell area of 1500 km² (corresponding to a pixel resolution of ca. 39 km).
We hope to see those activities converge with the high-level indexing and data integration of a DGGS in the future.
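The ISEA4T numbers quoted above can be reproduced with a few lines of arithmetic. This is a rough sketch under two assumptions that are not from the ICON paper: the grid starts from the 20 triangular faces of the icosahedron and every resolution step splits each triangle into 4, and the Earth’s surface area is taken as roughly 510 million km².

```python
import math

EARTH_AREA_KM2 = 510_065_622  # approximate surface area of the Earth


def isea4t_cells(resolution: int) -> int:
    # Assumption: 20 icosahedron faces, each split into 4 per resolution step.
    return 20 * 4 ** resolution


res = 7
n_cells = isea4t_cells(res)
mean_area = EARTH_AREA_KM2 / n_cells
pixel_km = math.sqrt(mean_area)  # side of a square with the same area

print(n_cells, round(mean_area), round(pixel_km, 1))
```

At resolution 7 this gives 327,680 cells with a mean area of a bit over 1,500 km², in line with the “about 320,000 cells” and “ca. 39 km” figures.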

Dear @clyne I’d just like to catch up on the Project Raijin standing monthly meeting on Thursdays. So April 14th, 10am MT (which time zone is MT - is that Boulder/Mountain Time, now actually MDT?) - TimeZoneMeetingPlanner

I think I could make a short appearance if possible?

Hi @allixender, that would be great. Yes, the timezone is MDT (April 14, 10am MDT). Here is the Google Meet link:

Project Raijin monthly meeting
Thursday, April 14 · 10:00 – 11:00am
Google Meet joining info
Video call link: https://meet.google.com/gbn-vwdo-scb
Or dial: (US) +1 503-908-2441 PIN: 962 727 640#
More phone numbers: https://tel.meet/gbn-vwdo-scb?pin=7536637037644


Hi all, FYI I received the sad news that our EU Horizons DGGS proposal was unfortunately not funded. But as promised we would like to continue innovation and research on DGGS and take the interest and positive vibe to develop better tools and make DGGS more easily usable with current spatial data challenges. In that sense, a short shameless self-promotion, and hopefully of interest for you as well:

Alexander Kmoch, Ivan Vasilyev, Holger Virro & Evelyn Uuemaa (2022) Area and shape distortions in open-source discrete global grid systems, Big Earth Data, doi: https://dx.doi.org/10.1080/20964471.2022.2094926

Short summary here :slight_smile: https://twitter.com/allixender/status/1554360770825261056

Best regards,
Alex


Sorry to hear about the missed funding opportunity!

A question for those of you on this thread: I am trying to devise a way to create geometry-based indexes for the geoboundaries.org dataset. Essentially, today, we generate IDs based on a hash of the vector geometries. We would like to move to generating an ID based on a uniform, global grid, and then capturing what grid cells are overlapped by a given geometry.

The challenge is scale - being in the socioeconomic world, we operate at very small scales relative to the climate modeling community I think most of you represent :). Are there any global grids that can scale down to arbitrarily small levels, and are there well-implemented strategies for doing this based on need? I.e., I know that some mesh approaches have fine-grained meshes in areas that are climatologically relevant for a model, but I have very little insight into what the “best” solutions are to date, or whether there are limitations in the ways things scale. Papers on the topic would be very welcome if you have any recommendations.
