Interpolate onto different Arakawa-grid positions

Benoit · October 2, 2024, 7:35am

If I have say variable uo at cell corners on an Arakawa B-grid. What’s the recommended way to interpolate uo to the u-positions of the corresponding C-grid? I’m sure this has been answered elsewhere but I couldn’t find it and so I’m confused what’s the best way to do this and which tool to use between xgcm, xesmf, etc.

rabernat · October 2, 2024, 8:07pm

Xgcm was designed specifically for this purpose.

https://xgcm.readthedocs.io/en/latest/grids.html

Benoit · October 3, 2024, 12:33am

Thanks Ryan! I had read this part of the xgcm docs prior to posting but I find these lacking in terms of code examples for CMIP data specifically (maybe I should have specified that). I posted here hoping for a little more hand holding: I’m sure there is a notebook somewhere doing exactly that (interpolate B-grid uo to the corresponding C-grid). I’ll admit that maybe this is just my lack of Python experience showing.

I could do the interpolation by hand in Julia, but I think we agree that Python (with xarray, xgcm, xmip, and so on) is the best tool for that at the moment. FWIW, I want to do this interpolation to test a new Julia package that I am developing that aims to build ocean transport matrices from CMIP output (OceanTransportMatrixBuilder.jl; specifically issue #3).

rabernat · October 3, 2024, 1:59am

Can you post some sample data? Or at least the Xarray repr (print(ds)) of your dataset?

For xgcm to work, the dataset needs to have sufficient metadata to distinguish the different grid positions.

Shifting from B-grid to C-grid requires two interpolations (grid.interp).

While it doesn’t explicitly address your exact scenario, the MITgcm example is probably the cleanest and easiest to follow.

Benoit · October 3, 2024, 3:22am

I just made this tiny Jupyter notebook to load the uo data I’m looking to interpolate. I assume the interface through nbviewer is better for anyone wanting to inspect the data but here is a copy pasta (truncated by GitHub, sorry, let me know if I should print it in full) just in case:

<xarray.Dataset>
Dimensions:        (time: 1980, lev: 50, y: 300, x: 360, bnds: 2, vertex: 4)
Coordinates:
  * time           (time) datetime64[ns] 1850-01-16T12:00:00 ... 2014-12-16T1...
    time_bounds    (time, bnds) datetime64[ns] dask.array<chunksize=(120, 2), meta=np.ndarray>
  * lev            (lev) float64 5.0 15.0 25.0 ... 5.166e+03 5.499e+03 5.831e+03
    lev_bounds     (lev, bnds) float64 dask.array<chunksize=(50, 2), meta=np.ndarray>
    lat            (y, x) float64 dask.array<chunksize=(60, 60), meta=np.ndarray>
    lon            (y, x) float64 dask.array<chunksize=(60, 60), meta=np.ndarray>
    lat_verticies  (y, x, vertex) float64 dask.array<chunksize=(60, 60, 4), meta=np.ndarray>
    lon_verticies  (y, x, vertex) float64 dask.array<chunksize=(60, 60, 4), meta=np.ndarray>
  * y              (y) int64 0 1 2 3 4 5 6 7 ... 292 293 294 295 296 297 298 299
  * x              (x) int64 0 1 2 3 4 5 6 7 ... 352 353 354 355 356 357 358 359
    lon_bounds     (bnds, y, x) float64 dask.array<chunksize=(1, 60, 60), meta=np.ndarray>
    lat_bounds     (bnds, y, x) float64 dask.array<chunksize=(1, 60, 60), meta=np.ndarray>
Dimensions without coordinates: bnds, vertex
Data variables:
    uo             (time, lev, y, x) float32 dask.array<chunksize=(120, 50, 60, 60), meta=np.ndarray>
Attributes: (12/58)
    Conventions:                      CF-1.7 CMIP-6.2
    activity_id:                      CMIP
    branch_method:                    standard
    branch_time_in_child:             0.0
    branch_time_in_parent:            21915.0
    data_specs_version:               01.00.30
    ...                               ...
    intake_esm_attrs:member_id:       r1i1p1f1
    intake_esm_attrs:variable_id:     uo
    intake_esm_attrs:grid_label:      gn
    intake_esm_attrs:version:         v20191115
    intake_esm_attrs:_data_format_:   netcdf
    intake_esm_dataset_key:           l.CMIP.CSIRO.ACCESS-ESM1-5.historical.r...

Shifting from B-grid to C-grid requires two interpolations (grid.interp).

Is that because xgcm interpolates one axis at a time (so in the case of uo, once horizontally and once vertically)?

I’m not sure if the required metadata is included in these CMIP outputs (I personally don’t see it, but I’m not sure what to look for). My understanding, however, is that xmip + xgcm can generate this metadata automatically, is that right?

rabernat · October 3, 2024, 2:34pm

So this dataset doesn’t [yet] have the coordinates in the form needed by xgcm. Instead, they use the CF-conventions and specify the lon and lat points explicitly as arrays. The problem with this is that the metadata doesn’t make it clear:

That this dataset uses and arakawa grid
Which variables are located at which grid points

For my explanation of why I don’t like this convention, read this thread.

github.com/cf-convention/discuss

"mesh variable" instead of "boundary variable" for contiguous grid cells

opened 05:16AM - 23 Nov 19 UTC

rabernat

I work every day with ocean models that use orthogonal curvilinear coordinates (…MITgcm, MOM, POP, ROMS, NEMO, etc. etc.). This is an example tripolar grid from CESM: ![image](https://user-images.githubusercontent.com/1197350/209829079-a89cd43c-faaa-44ee-a9e7-5b3f3db6eb86.png) The grid cells in such models are contiguous quads, with four points specifying the lat / lon vertex locations of each cell. CF conventions tell me ([Section 7.1: Cell Boundaries](http://cfconventions.org/cf-conventions/cf-conventions.html#cell-boundaries)) that I should use a *boundary_variable*. > A boundary variable will have one more dimension than its associated coordinate or auxiliary coordinate variable. > In the case where the horizontal grid is described by two-dimensional auxiliary coordinate variables in latitude `lat(n,m)` and longitude `lon(n,m)`, and the associated cells are four-sided, then the boundary variables are given in the form `latbnd(n,m,4)` and `lonbnd(n,m,4)`, where the trailing index runs over the four vertices of the cells This convention is general enough to accommodate potentially overlapping or non-contiguous quads, essentially `n x m` totally unrelated four-sided shapes. My main point: **It's inefficient to store structured grid geometry this way.** > The bounds can be used to decide whether cells are contiguous via the following relationships... I don't want to have to check this, I want the conventions to tell me. In our latest global high-resolution ocean models, I have a mesh that is of size `n=12960, m=17280`, 223 million cells. I am interested in streamlining my analysis and visualization workflow as much as possible, which means minimizing the required memory and computational steps. Instead of specifying a boundary variable, I propose to introduce the concept of a **mesh variable**, with the following conventions: - A mesh variable will have _the same number of dimensions_ as its associated coordinate or auxiliary coordinate variable, but with _one extra element in each dimension_. - In the case where the horizontal grid is described by two-dimensional auxiliary coordinate variables in latitude `lat(n,m)` and longitude `lon(n,m)`, and the associated cells are four-sided _and contiguous_, then the mesh variables are given in the form `latmesh(n+1, m+1)` and `lonmesh(n+1, m+1)`. It would not be hard to generate such data, since this is how most GCMs keep track of their own coordinate grids internally (e.g. [MITgcm](https://mitgcm.readthedocs.io/en/latest/algorithm/horiz-grid.html)). This convention also aligns well with how most visualization software plots such data, e.g. [matplotlib's pcolormesh function](https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.pcolormesh.html). So adding something like this to the CF conventions would streamline the path from model output to plotting, eliminating the potentially error-prone step of encoding, and then decoding, the "boundary variable" type coordinates. For the dataset I described above, the difference is about 3 GB of memory. I don't feel strongly about what it's called. Maybe "mesh variable" is not the right choice. But I feel something like this is sorely needed. cc @adcroft & @StephenGriffies, with whom this topic has come up repeatedly.

The use of lon_verticies(y, x, vertex) or lon_bounds(bnds, y, x) is redundant and overly general for Arakawa grids. Instead we need separate dimension coordinates x_g, y_g to distinguish between cell center and cell edge

To explore this, I loaded a comparable dataset from the CMIP6 cloud data on S3

import xarray as xr
from s3fs import S3FileSystem
import xgcm

fs = S3FileSystem()
ds = xr.open_dataset(
    "s3://s3://cmip6-pds/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r1i1p1f1/Omon/uo/gn/v20191115",
    decode_coords="all",
    engine="zarr",
    chunks={}
)
print(ds)
print(ds.info())  # below the fold

<xarray.Dataset> Size: 43GB
Dimensions:             (i: 360, j: 300, lev: 50, bnds: 2, time: 1980,
                         vertices: 4)
Coordinates:
  * i                   (i) int32 1kB 0 1 2 3 4 5 6 ... 354 355 356 357 358 359
  * j                   (j) int32 1kB 0 1 2 3 4 5 6 ... 294 295 296 297 298 299
    latitude            (j, i) float64 864kB dask.array<chunksize=(300, 360), meta=np.ndarray>
  * lev                 (lev) float64 400B 5.0 15.0 25.0 ... 5.499e+03 5.831e+03
    lev_bnds            (lev, bnds) float64 800B dask.array<chunksize=(50, 2), meta=np.ndarray>
    longitude           (j, i) float64 864kB dask.array<chunksize=(300, 360), meta=np.ndarray>
  * time                (time) datetime64[ns] 16kB 1850-01-16T12:00:00 ... 20...
    time_bnds           (time, bnds) datetime64[ns] 32kB dask.array<chunksize=(1980, 2), meta=np.ndarray>
    vertices_latitude   (j, i, vertices) float64 3MB dask.array<chunksize=(300, 360, 4), meta=np.ndarray>
    vertices_longitude  (j, i, vertices) float64 3MB dask.array<chunksize=(300, 360, 4), meta=np.ndarray>
Dimensions without coordinates: bnds, vertices
Data variables:
    uo                  (time, lev, j, i) float32 43GB dask.array<chunksize=(5, 50, 300, 360), meta=np.ndarray>

xarray.Dataset {
dimensions:
	i = 360 ;
	j = 300 ;
	lev = 50 ;
	bnds = 2 ;
	time = 1980 ;
	vertices = 4 ;

variables:
	int32 i(i) ;
		i:long_name = cell index along first dimension ;
		i:units = 1 ;
	int32 j(j) ;
		j:long_name = cell index along second dimension ;
		j:units = 1 ;
	float64 latitude(j, i) ;
		latitude:bounds = vertices_latitude ;
		latitude:long_name = latitude ;
		latitude:standard_name = latitude ;
		latitude:units = degrees_north ;
	float64 lev(lev) ;
		lev:axis = Z ;
		lev:bounds = lev_bnds ;
		lev:long_name = ocean depth coordinate ;
		lev:positive = down ;
		lev:standard_name = depth ;
		lev:units = m ;
	float64 lev_bnds(lev, bnds) ;
	float64 longitude(j, i) ;
		longitude:bounds = vertices_longitude ;
		longitude:long_name = longitude ;
		longitude:standard_name = longitude ;
		longitude:units = degrees_east ;
	datetime64[ns] time(time) ;
		time:axis = T ;
		time:bounds = time_bnds ;
		time:long_name = time ;
		time:standard_name = time ;
	datetime64[ns] time_bnds(time, bnds) ;
	float32 uo(time, lev, j, i) ;
		uo:cell_methods = time: mean ;
		uo:comment = Prognostic x-ward velocity component resolved by the model. ;
		uo:long_name = Sea Water X Velocity ;
		uo:standard_name = sea_water_x_velocity ;
		uo:units = m s-1 ;
	float64 vertices_latitude(j, i, vertices) ;
		vertices_latitude:units = degrees_north ;
	float64 vertices_longitude(j, i, vertices) ;
		vertices_longitude:units = degrees_east ;

// global attributes:
	:Conventions = CF-1.7 CMIP-6.2 ;
	:activity_id = CMIP ;
	:branch_method = standard ;
	:branch_time_in_child = 0.0 ;
	:branch_time_in_parent = 21915.0 ;
	:cmor_version = 3.4.0 ;
	:creation_date = 2019-11-15T15:41:24Z ;
	:data_specs_version = 01.00.30 ;
	:experiment = all-forcing simulation of the recent past ;
	:experiment_id = historical ;
	:forcing_index = 1 ;
	:frequency = mon ;
	:further_info_url = https://furtherinfo.es-doc.org/CMIP6.CSIRO.ACCESS-ESM1-5.historical.none.r1i1p1f1 ;
	:grid = native atmosphere N96 grid (145x192 latxlon) ;
	:grid_label = gn ;
	:history = 2019-11-15T15:41:24Z ; CMOR rewrote data to be consistent with CMIP6, CF-1.7 CMIP-6.2 and CF standards. ;
	:initialization_index = 1 ;
	:institution = Commonwealth Scientific and Industrial Research Organisation, Aspendale, Victoria 3195, Australia ;
	:institution_id = CSIRO ;
	:license = CMIP6 model data produced by CSIRO is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (https://creativecommons.org/licenses/).  Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output, including citation requirements and proper acknowledgment.  Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file).  The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law. ;
	:mip_era = CMIP6 ;
	:nominal_resolution = 250 km ;
	:notes = Exp: ESM-historical; Local ID: HI-05; Variable: uo (['u']) ;
	:parent_activity_id = CMIP ;
	:parent_experiment_id = piControl ;
	:parent_mip_era = CMIP6 ;
	:parent_source_id = ACCESS-ESM1-5 ;
	:parent_time_units = days since 0101-1-1 ;
	:parent_variant_label = r1i1p1f1 ;
	:physics_index = 1 ;
	:product = model-output ;
	:realization_index = 1 ;
	:realm = ocean ;
	:run_variant = forcing: GHG, Oz, SA, Sl, Vl, BC, OC, (GHG = CO2, N2O, CH4, CFC11, CFC12, CFC113, HCFC22, HFC125, HFC134a) ;
	:source = ACCESS-ESM1.5 (2019): 
aerosol: CLASSIC (v1.0)
atmos: HadGAM2 (r1.1, N96; 192 x 145 longitude/latitude; 38 levels; top level 39255 m)
atmosChem: none
land: CABLE2.4
landIce: none
ocean: ACCESS-OM2 (MOM5, tripolar primarily 1deg; 360 x 300 longitude/latitude; 50 levels; top grid cell 0-10 m)
ocnBgchem: WOMBAT (same grid as ocean)
seaIce: CICE4.1 (same grid as ocean) ;
	:source_id = ACCESS-ESM1-5 ;
	:source_type = AOGCM ;
	:status = 2020-02-04;created; by gcs.cmip6.ldeo@gmail.com ;
	:sub_experiment = none ;
	:sub_experiment_id = none ;
	:table_id = Omon ;
	:table_info = Creation Date:(30 April 2019) MD5:e14f55f257cceafb2523e41244962371 ;
	:title = ACCESS-ESM1-5 output prepared for CMIP6 ;
	:tracking_id = hdl:21.14100/c2d89f4c-94e1-48f9-b1e6-050516684527
hdl:21.14100/c95088be-d897-4cbf-99a7-89c62a5349f2
hdl:21.14100/e884b2d6-fefb-457b-b6d5-b16470438ae1
hdl:21.14100/39f1dfc6-d610-4ccd-81a0-081754685c3c
hdl:21.14100/a473cac5-8853-4fb5-a3b1-7eb903da4f7f
hdl:21.14100/949cd5ca-0176-46e8-80e8-806811ed3ffa
hdl:21.14100/8a4394ec-15d3-402c-aabd-3b44bb5746e5
hdl:21.14100/021a5cd4-15f9-490c-8a0e-98237a22a5e3
hdl:21.14100/44f49869-2053-4d30-a58b-3ef777d40363
hdl:21.14100/687cffee-85d2-444b-bf43-9700f2b9bf49
hdl:21.14100/5132fa3f-5642-4049-8e4c-26a2cde3b97a
hdl:21.14100/4dad6097-0779-4031-b014-d9233207f6dc
hdl:21.14100/d13e6d17-b98e-48d3-aaa4-3162770d00ef
hdl:21.14100/c5850670-8cea-46d1-bcb2-83faecdd0161
hdl:21.14100/8df637a7-8ebc-435a-ba52-c217a1e2365c
hdl:21.14100/5506cf97-e19b-4be1-aca1-3748d792275b
hdl:21.14100/225bc0b3-2081-492d-ac6e-da9c4014c5b7 ;
	:variable_id = uo ;
	:variant_label = r1i1p1f1 ;
	:version = v20191115 ;
	:netcdf_tracking_ids = hdl:21.14100/c2d89f4c-94e1-48f9-b1e6-050516684527
hdl:21.14100/c95088be-d897-4cbf-99a7-89c62a5349f2
hdl:21.14100/e884b2d6-fefb-457b-b6d5-b16470438ae1
hdl:21.14100/39f1dfc6-d610-4ccd-81a0-081754685c3c
hdl:21.14100/a473cac5-8853-4fb5-a3b1-7eb903da4f7f
hdl:21.14100/949cd5ca-0176-46e8-80e8-806811ed3ffa
hdl:21.14100/8a4394ec-15d3-402c-aabd-3b44bb5746e5
hdl:21.14100/021a5cd4-15f9-490c-8a0e-98237a22a5e3
hdl:21.14100/44f49869-2053-4d30-a58b-3ef777d40363
hdl:21.14100/687cffee-85d2-444b-bf43-9700f2b9bf49
hdl:21.14100/5132fa3f-5642-4049-8e4c-26a2cde3b97a
hdl:21.14100/4dad6097-0779-4031-b014-d9233207f6dc
hdl:21.14100/d13e6d17-b98e-48d3-aaa4-3162770d00ef
hdl:21.14100/c5850670-8cea-46d1-bcb2-83faecdd0161
hdl:21.14100/8df637a7-8ebc-435a-ba52-c217a1e2365c
hdl:21.14100/5506cf97-e19b-4be1-aca1-3748d792275b
hdl:21.14100/225bc0b3-2081-492d-ac6e-da9c4014c5b7 ;
	:version_id = v20191115 ;
}None

Note that this dataset uses i, j as dimensions rather than x, y, but otherwise has the same structure.

To use xgcm, we need to have actual dimension coordinates corresponding to the other grid positions. Just saying this is a B-grid doesn’t give us enough information. Are the B-grid cells top right or bottom left with respect to the cell centers? I’m going to assume they are bottom left.

If we know this, we can add some dimensions and set up our xgcm Grid.

ds = (
    ds
    .rename({'i': 'i_corner', 'j': 'j_corner'})
    .assign_coords(
        {
            # Note that coordinate values here don't actually matter
            # as they are in logical grid space rather than physical space.
            # But we need some way to keep track of the fact that these
            # dimensions are _different_ from the corner dimensions
            'i_center': ds.i.values + 0.5,
            'j_center': ds.j.values + 0.5
        }
    )
)
print(ds)

<xarray.Dataset> Size: 43GB
Dimensions:             (i_corner: 360, j_corner: 300, lev: 50, bnds: 2,
                         time: 1980, vertices: 4, i_center: 360, j_center: 300)
Coordinates:
  * i_corner            (i_corner) int32 1kB 0 1 2 3 4 5 ... 355 356 357 358 359
  * j_corner            (j_corner) int32 1kB 0 1 2 3 4 5 ... 295 296 297 298 299
    latitude            (j_corner, i_corner) float64 864kB dask.array<chunksize=(300, 360), meta=np.ndarray>
  * lev                 (lev) float64 400B 5.0 15.0 25.0 ... 5.499e+03 5.831e+03
    lev_bnds            (lev, bnds) float64 800B dask.array<chunksize=(50, 2), meta=np.ndarray>
    longitude           (j_corner, i_corner) float64 864kB dask.array<chunksize=(300, 360), meta=np.ndarray>
  * time                (time) datetime64[ns] 16kB 1850-01-16T12:00:00 ... 20...
    time_bnds           (time, bnds) datetime64[ns] 32kB dask.array<chunksize=(1980, 2), meta=np.ndarray>
    vertices_latitude   (j_corner, i_corner, vertices) float64 3MB dask.array<chunksize=(300, 360, 4), meta=np.ndarray>
    vertices_longitude  (j_corner, i_corner, vertices) float64 3MB dask.array<chunksize=(300, 360, 4), meta=np.ndarray>
  * i_center            (i_center) float64 3kB 0.5 1.5 2.5 ... 357.5 358.5 359.5
  * j_center            (j_center) float64 2kB 0.5 1.5 2.5 ... 297.5 298.5 299.5
Dimensions without coordinates: bnds, vertices
Data variables:
    uo                  (time, lev, j_corner, i_corner) float32 43GB dask.array<chunksize=(5, 50, 300, 360), meta=np.ndarray>

Now we see that we have two sets of dimension coordinates (corner and center). From different combinations of these, we can reconstruct all of the different possible positions on an Arakawa grid. Many ocean models (e.g. MITgcm, MOM6, ROMS) include these types of dimension coordinates by default in their output. But CF conventions don’t say anything about this.

With this, we can create the xgcm.Grid

grid = xgcm.Grid(
    ds,
    coords={
        'X': {'center': 'i_center', 'left': 'i_corner'},
        'Y': {'center': 'j_center', 'left': 'j_corner'}
    },
    periodic=['X']
)

<xgcm.Grid>
X Axis (periodic, boundary=None):
  * center   i_center --> left
  * left     i_corner --> center
Y Axis (not periodic, boundary=None):
  * center   j_center --> left
  * left     j_corner --> center

Now we can use this to easily do the desired interpolation.

uo_center = grid.interp(ds.uo, axis=['X', 'Y'])

<xarray.DataArray 'uo' (time: 1980, lev: 50, j_center: 300, i_center: 360)> Size: 43GB
dask.array<transpose, shape=(1980, 50, 300, 360), dtype=float32, chunksize=(5, 50, 300, 360), chunktype=numpy.ndarray>
Coordinates:
  * lev       (lev) float64 400B 5.0 15.0 25.0 ... 5.166e+03 5.499e+03 5.831e+03
  * time      (time) datetime64[ns] 16kB 1850-01-16T12:00:00 ... 2014-12-16T1...
  * i_center  (i_center) float64 3kB 0.5 1.5 2.5 3.5 ... 356.5 357.5 358.5 359.5
  * j_center  (j_center) float64 2kB 0.5 1.5 2.5 3.5 ... 296.5 297.5 298.5 299.5

This is typical of xgcm usage. We spend a lot of time setting up the proper metadata and coordinate system. Then the operation itself is trivial.

If you want to get the coordinates of those points, you could try interpolating latitude and longitude the same way. But will they line up with the vertices variables? Hard to say…

Benoit · October 4, 2024, 1:09am

Wow! Thank you so much for the detailed response and hand-holding!

So you are manually constructing the cell centers in some sense above. Is it possible to automate this process further by using another variable that is already centered? E.g., if I also loaded areacello (or thetao or another variable that I know lives at the grid cell centers), then xgcm should be able to match uo’ s (lon, lat) with one of areacello’s centers, face centers, edge centers, or edge corners, thus determining both the grid type (A, B, C, etc.) and the orientation (B-grid top right, B-grid bottom left, etc.) right?

So ideally, I’m looking for some interface that would look like

xgcm.interp(uo, areacello, grid_type='C', position='u')

where we interpolate uo onto the grid of areacello at the C-grid u position, if that makes sense.

PS1: BTW I hope vertex order will be resolved soon by xmip; see e.g., related xmip issue #368 and PR #370.

PS2: I’m not entirely sure but the difference that you noted about the index names ((i,j) vs (x,y)) may just have been caused by preprocessing with xmip.

rabernat · October 4, 2024, 12:36pm

This sounds like a very difficult and error-prone procedure. You’re asking to examine different variables from different datasets and, using only the coordinate information, somehow reconstruct the fact that they are both part of the same staggered Arakawa grid system. This will be quite sensitive to numerical tolerance issues. For example, you’ll have to answer what does it mean for a coordinate position to be “in the middle” between two other coordinates? We would need to distinguish between actual Arakawa grids and actually different GCMs that have similar but not identical grids.

It would be so much easier if the dataset would just declare this information explicitly, e.g. using the SGRID conventions (which xgcm understands, and if set, can automatically build a grid for.). After all, the model developers know that they are using an Arakawa grid! It’s fundamental to all of the algorithms. This metadata belongs in the dataset itself, but CF doesn’t give a way to do that.

So you are stuck relying on “extra” information, such as your prior knowledge about the grid.

This is an easier ask. It would be possible to create an API like this for xgcm. It would involve automatically generating a Grid object from uo (only possible with sufficient metadata, as described above), defining what you mean by grid_type='C' and position='u', and them wrapping the Grid.interp operation.

We would welcome a PR to xgcm to add that sort of functionality.

jbusecke · October 10, 2024, 8:18pm

Hey folks,

Couple of thoughts from my end:

Is it possible to automate this process further by using another variable that is already centered? E.g., if I also loaded areacello (or thetao or another variable that I know lives at the grid cell centers), then xgcm should be able to match uo’ s (lon, lat) with one of areacello’s centers, face centers, edge centers, or edge corners, thus determining both the grid type (A, B, C, etc.) and the orientation (B-grid top right, B-grid bottom left, etc.) right?

That is essentially what I tried to do in xMIP/xmip/grids.py at main · jbusecke/xMIP · GitHub, but this could definitely use more work.

I agree 10000% with Ryans comments that this information should be included as metadata, it would make the logic above a lot more robust and accurate.

I started collecting the ‘real’ grid data for various models by emailing folks I know Collecting grid-metric files for CMIP6 output for cloud analysis - #11 by emaroon but I never got around to putting this data up. Having the original grid information would solve this problem if we can assume that all the outputs are actually unaltered. But I am not sure that is the case. I have encountered examples where I had the original grid file, e.g. indicating that a B grid with velocities in the lower left corner was used, but diagnosing from the provided lon/lat values told me that the position is in the upper/right corner or on the center. It is hard to tell whether this is an error with the provided coordinates, or if the data has actually interpolated - I think in some cases this is done, but again I know of no way to tell this from the data and I have not found any reliable documentation of this anywhere.

After all, the model developers know that they are using an Arakawa grid! It’s fundamental to all of the algorithms. This metadata belongs in the dataset itself, but CF doesn’t give a way to do that.

If I ever find a minute I would love to retrofit this in some way shape or form, but I think an important thing to do here from the community is to flag this as an issue to be addressed for CMIP7! I encourage you to contact the Data Access Task Team and describe this as a blocking issue.

This is an easier ask. It would be possible to create an API like this for xgcm. It would involve automatically generating a Grid object from uo (only possible with sufficient metadata, as described above), defining what you mean by grid_type='C' and position='u', and them wrapping the Grid.interp operation.

We would welcome a PR to xgcm to add that sort of functionality.

Unless I misunderstand this ask, you can already use grid.interp_like for this?

Topic		Replies	Views
XGCM: Your friend and helper for naughty grids CMIP6 Hackathon	10	2504	February 25, 2020
Interpolating 2D data with periodic boundaries to points using xarray Science	4	1039	September 9, 2022
Interpolating Curvilinear Data Science	3	366	June 4, 2024
Interpolation and regridding Science	15	8523	November 3, 2021
Collecting grid-metric files for CMIP6 output for cloud analysis Cloud	10	1447	June 27, 2022

Interpolate onto different Arakawa-grid positions

Related topics