Lat/Lon grid mismatch in variables from the same CMIP6 model

chaithra · August 26, 2024, 7:33am

Hi,

I’m working on some calculations involving precipitation and omega using CMIP6 models. However, I’ve encountered an issue with some models, such as ‘ACCESS-ESM1-5’, where the latitude and longitude grids for precipitation and omega don’t match.

I need both precipitation and omega to be on the same grid. Is it okay to reindex one of the variables onto the other’s grid using the nearest method?
Has anyone faced a similar issue, or can anyone suggest a solution?

Thanks in advance!

huard · August 26, 2024, 1:15pm

I wouldn’t use nearest neighbor for this. Within models, variables are regridded from cell corners or edges to cell centers and vice versa all the time, and from what I’ve seen in another model, this is done using finite differences. I’d first try bilinear interpolation to put omega onto the precip grid, because I expect omega to be smoother and less susceptible to interpolation errors than precip. Others will probably have more informed opinions on this.
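A minimal NumPy sketch of that bilinear step, assuming both variables sit on 1-D, increasing, global lat/lon axes. The grids are hypothetical N96-like examples offset by half a cell (not values read from ACCESS-ESM1-5), the test pattern is synthetic, and `bilinear_regrid` is an illustrative helper rather than a library function. It also does the nearest-neighbour lookup from the original question for comparison. In xarray, `DataArray.interp(lat=..., lon=..., method="linear")` does the same dimension-wise linear interpolation, but as far as I know it does not wrap longitude, hence the explicit padding here.

```python
import numpy as np


def bilinear_regrid(field, src_lat, src_lon, dst_lat, dst_lon):
    """Bilinearly interpolate field(src_lat, src_lon) onto the dst_lat x dst_lon grid.

    Assumes 1-D, strictly increasing coordinates in degrees, a global periodic
    longitude axis, and dst_lat within [src_lat[0], src_lat[-1]].
    """
    src_lat = np.asarray(src_lat, dtype=float)
    src_lon = np.asarray(src_lon, dtype=float)
    field = np.asarray(field, dtype=float)
    # Pad one wrapped column on each side so points near the 0/360 seam are
    # interpolated across it instead of extrapolated.
    lon_ext = np.concatenate(([src_lon[-1] - 360.0], src_lon, [src_lon[0] + 360.0]))
    f_ext = np.concatenate((field[:, -1:], field, field[:, :1]), axis=1)
    # Map destination longitudes into [lon_ext[0], lon_ext[0] + 360).
    dlon = (np.asarray(dst_lon, dtype=float) - lon_ext[0]) % 360.0 + lon_ext[0]

    def bracket(src, dst):
        # Left-neighbour index and fractional distance towards the right neighbour.
        k = np.clip(np.searchsorted(src, dst, side="right") - 1, 0, len(src) - 2)
        return k, (dst - src[k]) / (src[k + 1] - src[k])

    j, wy = bracket(src_lat, np.asarray(dst_lat, dtype=float))
    i, wx = bracket(lon_ext, dlon)
    j, wy = j[:, None], wy[:, None]  # rows: latitude
    i, wx = i[None, :], wx[None, :]  # columns: longitude
    return ((1 - wy) * (1 - wx) * f_ext[j, i] + (1 - wy) * wx * f_ext[j, i + 1]
            + wy * (1 - wx) * f_ext[j + 1, i] + wy * wx * f_ext[j + 1, i + 1])


# Hypothetical grids: omega on one set of points, precip offset by half a cell.
omega_lat = np.linspace(-90.0, 90.0, 145)
omega_lon = np.arange(192) * 1.875
precip_lat = -89.375 + 1.25 * np.arange(144)
precip_lon = omega_lon + 0.9375


def smooth(lat, lon):
    # Smooth, omega-like test pattern (synthetic, not model output).
    return np.cos(np.deg2rad(lat))[:, None] * np.sin(np.deg2rad(lon))[None, :]


omega = smooth(omega_lat, omega_lon)
truth = smooth(precip_lat, precip_lon)
bil = bilinear_regrid(omega, omega_lat, omega_lon, precip_lat, precip_lon)

# Nearest neighbour for comparison (longitude distance taken modulo 360).
jn = np.abs(precip_lat[:, None] - omega_lat[None, :]).argmin(axis=1)
dl = (precip_lon[:, None] - omega_lon[None, :] + 180.0) % 360.0 - 180.0
near = omega[np.ix_(jn, np.abs(dl).argmin(axis=1))]

err_bilinear = np.abs(bil - truth).max()
err_nearest = np.abs(near - truth).max()
print(err_bilinear, err_nearest)
```

With a half-cell offset, nearest neighbour has to pick one of two equidistant points, so its error scales with the grid spacing, whereas the bilinear error scales with its square.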

pochedley · August 26, 2024, 5:11pm

Hi @chaithra – I agree with @huard’s rationale to remap omega to the precipitation grid. For the remapping you could use xESMF or xcdat (xcdat uses either regrid2 or xESMF under the hood); both work with xarray. They offer bilinear interpolation (and can handle circular coordinate systems such as periodic longitude), or you could consider conservative regridding if you wanted to regrid precipitation and make sure that total precipitation is conserved globally. Others might have additional regridder suggestions.
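For the conservative option, a rough sketch of first-order conservative remapping between two rectilinear lat/lon grids, assuming a spherical Earth and known cell edges (for example from `lat_bnds`/`lon_bnds`). This is not how xESMF or xcdat compute their weights (they handle general grids); it only illustrates the property being asked for: each target cell receives the area-weighted mean of the source cells it overlaps, so the global integral is unchanged. Grids and data are made up, and `overlap_1d`, `conservative_regrid` and `global_integral` are hypothetical helpers.

```python
import numpy as np


def overlap_1d(dst_edges, src_edges, period=None):
    """Overlap length between every destination and every source interval.

    Edges are increasing 1-D arrays (n + 1 edges for n cells). With a period
    (360 for longitude) the source is also tried shifted by +/- one period,
    so cells straddling the 0/360 seam are matched.
    """
    dst_edges = np.asarray(dst_edges, dtype=float)
    src_edges = np.asarray(src_edges, dtype=float)
    lo_d, hi_d = dst_edges[:-1, None], dst_edges[1:, None]
    out = np.zeros((len(dst_edges) - 1, len(src_edges) - 1))
    for shift in ((0.0,) if period is None else (-period, 0.0, period)):
        lo_s, hi_s = src_edges[None, :-1] + shift, src_edges[None, 1:] + shift
        out += np.clip(np.minimum(hi_d, hi_s) - np.maximum(lo_d, lo_s), 0.0, None)
    return out


def conservative_regrid(field, src_lat_b, src_lon_b, dst_lat_b, dst_lon_b):
    """First-order conservative remap between two rectilinear lat/lon grids.

    A lat/lon cell's area on the sphere is proportional to d(sin lat) * d(lon),
    so cell overlaps factorise into a latitude part and a longitude part.
    """
    s_src = np.sin(np.deg2rad(src_lat_b))
    s_dst = np.sin(np.deg2rad(dst_lat_b))
    w_lat = overlap_1d(s_dst, s_src) / np.diff(s_dst)[:, None]
    w_lon = overlap_1d(dst_lon_b, src_lon_b, period=360.0) / np.diff(dst_lon_b)[:, None]
    return w_lat @ field @ w_lon.T


def global_integral(field, lat_b, lon_b):
    # Area-weighted sum on the unit sphere.
    area = np.diff(np.sin(np.deg2rad(lat_b)))[:, None] * np.deg2rad(np.diff(lon_b))[None, :]
    return float((field * area).sum())


# Hypothetical grids: precip on 144 x 192 cells; omega-grid cells centred on the
# precip cell edges (so its first and last latitude bands are half-width).
precip_lat_b = -90.0 + 1.25 * np.arange(145)
precip_lon_b = 1.875 * np.arange(193)
omega_lat_b = np.concatenate(([-90.0], -89.375 + 1.25 * np.arange(144), [90.0]))
omega_lon_b = -0.9375 + 1.875 * np.arange(193)

precip = np.random.default_rng(0).gamma(2.0, 1.0, size=(144, 192))  # stand-in data
precip_on_omega = conservative_regrid(precip, precip_lat_b, precip_lon_b, omega_lat_b, omega_lon_b)
print(global_integral(precip, precip_lat_b, precip_lon_b),
      global_integral(precip_on_omega, omega_lat_b, omega_lon_b))
```

The two printed integrals should agree up to rounding, because the overlaps of each source cell with all target cells add up to that source cell's area.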

jbusecke · August 27, 2024, 1:33pm

Hi @chaithra,

I think @huard is on the right track here. These variables seem to be on different grid positions (cell face vs. center). Remapping would probably work, but it is expensive. You could also try xgcm to interpolate between grid positions. To verify the assumption, though, I would suggest checking that the lat/lon values of one variable coincide with the edges given by the lat_bounds/lon_bounds of the other. If so, I would assume that no additional interpolation was carried out and the variables are simply output at different grid positions. A simple linear interpolation on the logically rectangular grid could then be a good way to co-locate the variables.
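The check and the co-location step could look roughly like this in NumPy. `classify_positions` and `midpoint_interp` are hypothetical helpers, and the grids are made-up N96-like examples in which omega's coordinates fall exactly on precip's cell edges. The midpoint average is what linear interpolation reduces to when the target points lie halfway between the source points, which is the staggered case described above. xgcm can do something similar, but it needs a description of the staggering that you would have to set up by hand.

```python
import numpy as np


def classify_positions(coord, other_coord, other_bounds, period=None, atol=1e-6):
    """Where does `coord` sit relative to another variable's grid?

    other_bounds has shape (n, 2), like CMIP6 lat_bnds / lon_bnds. Returns
    "same" if coord equals other_coord, "edges" if every value of coord lies on
    a cell edge of the other grid (a staggered output position), else "other".
    """
    coord = np.asarray(coord, dtype=float)
    other_coord = np.asarray(other_coord, dtype=float)
    if coord.shape == other_coord.shape and np.allclose(coord, other_coord, atol=atol):
        return "same"
    edges = np.unique(np.asarray(other_bounds, dtype=float))
    diff = coord[:, None] - edges[None, :]
    if period is not None:  # e.g. 360 for longitude: compare modulo the period
        diff = (diff + period / 2) % period - period / 2
    return "edges" if np.all(np.abs(diff).min(axis=1) <= atol) else "other"


def midpoint_interp(field, axis, periodic=False):
    """Linear interpolation to the points halfway between neighbours along `axis`.

    n points -> n - 1 midpoints, or n midpoints if periodic (the last one lies
    between the final and the first point). Only valid if the target points
    really are those midpoints.
    """
    f = np.moveaxis(np.asarray(field, dtype=float), axis, -1)
    mid = 0.5 * (f + np.roll(f, -1, axis=-1)) if periodic else 0.5 * (f[..., :-1] + f[..., 1:])
    return np.moveaxis(mid, -1, axis)


# Hypothetical grids laid out like a CMIP6 file (values made up).
omega_lat = np.linspace(-90.0, 90.0, 145)
omega_lon = np.arange(192) * 1.875
precip_lat = -89.375 + 1.25 * np.arange(144)
precip_lat_bnds = np.stack([precip_lat - 0.625, precip_lat + 0.625], axis=1)
precip_lon = 0.9375 + 1.875 * np.arange(192)
precip_lon_bnds = np.stack([precip_lon - 0.9375, precip_lon + 0.9375], axis=1)

print(classify_positions(omega_lat, precip_lat, precip_lat_bnds))  # edges
print(classify_positions(omega_lon, precip_lon, precip_lon_bnds, period=360.0))  # edges

# If both axes report "edges", midpoint averaging co-locates omega with precip.
omega = np.random.default_rng(0).normal(size=(145, 192))  # stand-in data
omega_on_precip = midpoint_interp(midpoint_interp(omega, axis=0), axis=1, periodic=True)
print(omega_on_precip.shape)  # (144, 192)
```

If the check returns "other" instead, the positions are not a simple stagger and a general regridder is the safer choice.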

On a more general note this sort of thing is (to my knowledge) totally undocumented for CMIP data, which is pretty bad, since it leaves the detective work to the user!

chaithra · August 28, 2024, 7:29am

Thanks, @huard @jbusecke @pochedley!
This helps a lot.

rabernat · August 28, 2024, 2:50pm

The CF Conventions have recently been expanded to include the UGRID Conventions for describing unstructured grid meshes in detail: NetCDF Climate and Forecast (CF) Metadata Conventions

Unfortunately they skipped over the more common scenario of staggered quad grids! AFAIK there is still no solution in CF conventions for how to encode this sort of information into the files.

dcherian · August 28, 2024, 8:28pm

Not true! SGRID exists: SGRID Conventions (v0.3) | Staggered Grid data model (SGRID). I don’t know that it’s been elevated to the CF level, though. ROMS has been writing these attributes for quite a while, and cf-xarray can parse them: SGRID / UGRID - cf_xarray documentation

rabernat · August 29, 2024, 12:53am

I absolutely know that SGRID conventions exist. That they have not been elevated to the CF level is exactly what I am saying.

I tried to convince the CF people to incorporate SGRID as well as UGRID, but they didn’t go for it:

github.com/cf-convention/discuss: "mesh variable" instead of "boundary variable" for contiguous grid cells (opened by rabernat, 05:16AM, 23 Nov 19 UTC)

I work every day with ocean models that use orthogonal curvilinear coordinates (…MITgcm, MOM, POP, ROMS, NEMO, etc. etc.). This is an example tripolar grid from CESM:

![image](https://user-images.githubusercontent.com/1197350/209829079-a89cd43c-faaa-44ee-a9e7-5b3f3db6eb86.png)

The grid cells in such models are contiguous quads, with four points specifying the lat / lon vertex locations of each cell. CF conventions tell me ([Section 7.1: Cell Boundaries](http://cfconventions.org/cf-conventions/cf-conventions.html#cell-boundaries)) that I should use a *boundary_variable*.

> A boundary variable will have one more dimension than its associated coordinate or auxiliary coordinate variable.

> In the case where the horizontal grid is described by two-dimensional auxiliary coordinate variables in latitude `lat(n,m)` and longitude `lon(n,m)`, and the associated cells are four-sided, then the boundary variables are given in the form `latbnd(n,m,4)` and `lonbnd(n,m,4)`, where the trailing index runs over the four vertices of the cells.

This convention is general enough to accommodate potentially overlapping or non-contiguous quads, essentially `n x m` totally unrelated four-sided shapes. My main point: **It's inefficient to store structured grid geometry this way.**

> The bounds can be used to decide whether cells are contiguous via the following relationships...

I don't want to have to check this, I want the conventions to tell me.

In our latest global high-resolution ocean models, I have a mesh that is of size `n=12960, m=17280`, 223 million cells. I am interested in streamlining my analysis and visualization workflow as much as possible, which means minimizing the required memory and computational steps.

Instead of specifying a boundary variable, I propose to introduce the concept of a **mesh variable**, with the following conventions:

- A mesh variable will have _the same number of dimensions_ as its associated coordinate or auxiliary coordinate variable, but with _one extra element in each dimension_.
- In the case where the horizontal grid is described by two-dimensional auxiliary coordinate variables in latitude `lat(n,m)` and longitude `lon(n,m)`, and the associated cells are four-sided _and contiguous_, then the mesh variables are given in the form `latmesh(n+1, m+1)` and `lonmesh(n+1, m+1)`.

It would not be hard to generate such data, since this is how most GCMs keep track of their own coordinate grids internally (e.g. [MITgcm](https://mitgcm.readthedocs.io/en/latest/algorithm/horiz-grid.html)). This convention also aligns well with how most visualization software plots such data, e.g. [matplotlib's pcolormesh function](https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.pcolormesh.html). So adding something like this to the CF conventions would streamline the path from model output to plotting, eliminating the potentially error-prone step of encoding, and then decoding, the "boundary variable" type coordinates. For the dataset I described above, the difference is about 3 GB of memory.

I don't feel strongly about what it's called. Maybe "mesh variable" is not the right choice. But I feel something like this is sorely needed.

cc @adcroft & @StephenGriffies, with whom this topic has come up repeatedly.
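To make the proposal concrete, a NumPy sketch of converting between the two encodings. It assumes the four vertices are stored counterclockwise (as CF asks) starting at the (j, i) corner, i.e. lower-left, lower-right, upper-right, upper-left in index space; that starting corner is an assumption, so check a given file's ordering first. `mesh_to_bounds`, `bounds_to_mesh` and `is_contiguous` are hypothetical names, not part of CF or any library, and the contiguity test is the shared-vertex idea quoted above. It compares raw values, so longitudes that jump by 360 at a seam would need unwrapping first.

```python
import numpy as np


def mesh_to_bounds(mesh):
    """Expand a mesh variable of shape (n+1, m+1) into CF-style bounds (n, m, 4)."""
    g = np.asarray(mesh, dtype=float)
    # Assumed vertex order: lower-left, lower-right, upper-right, upper-left.
    return np.stack([g[:-1, :-1], g[:-1, 1:], g[1:, 1:], g[1:, :-1]], axis=-1)


def is_contiguous(bnd, atol=1e-9):
    """True if every cell shares its edge vertices with its index-space neighbours."""
    b = np.asarray(bnd, dtype=float)
    along_m = (np.allclose(b[:, :-1, 1], b[:, 1:, 0], atol=atol)
               and np.allclose(b[:, :-1, 2], b[:, 1:, 3], atol=atol))
    along_n = (np.allclose(b[:-1, :, 3], b[1:, :, 0], atol=atol)
               and np.allclose(b[:-1, :, 2], b[1:, :, 1], atol=atol))
    return along_m and along_n


def bounds_to_mesh(bnd):
    """Collapse contiguous bounds (n, m, 4) into a mesh variable (n+1, m+1)."""
    if not is_contiguous(bnd):
        raise ValueError("cells are not contiguous; a mesh variable cannot represent them")
    b = np.asarray(bnd, dtype=float)
    n, m, _ = b.shape
    mesh = np.empty((n + 1, m + 1))
    mesh[:n, :m] = b[:, :, 0]  # lower-left corner of every cell
    mesh[:n, m] = b[:, -1, 1]  # last column: lower-right corners
    mesh[n, :m] = b[-1, :, 3]  # last row: upper-left corners
    mesh[n, m] = b[-1, -1, 2]  # upper-right corner of the last cell
    return mesh


# Small hypothetical curvilinear grid: a warped (3+1) x (4+1) latitude mesh.
jj, ii = np.meshgrid(np.arange(4), np.arange(5), indexing="ij")
latmesh = 10.0 * jj + 0.5 * np.sin(ii)
latbnd = mesh_to_bounds(latmesh)
print(latbnd.shape, latmesh.size)  # (3, 4, 4) 20
```

Per coordinate, bounds store 4·n·m values while a mesh stores (n+1)·(m+1). For `n=12960, m=17280` that is 895,795,200 vs. 223,979,041 values, i.e. roughly 2.7 GB less per coordinate variable if stored as 4-byte floats, which seems consistent with the "about 3 GB" above.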

aidan · August 31, 2024, 12:30pm

For ACCESS models you’re more than welcome to hop on over to the ACCESS Hive Forum and see if someone else has already discussed that issue:

Or sign up and ask for more information. I think the suggestion about staggered grids is likely correct, but I personally don’t know much about the atmospheric fields.
