Interpolate onto different Arakawa-grid positions

If I have say variable uo at cell corners on an Arakawa B-grid. What’s the recommended way to interpolate uo to the u-positions of the corresponding C-grid? I’m sure this has been answered elsewhere but I couldn’t find it and so I’m confused what’s the best way to do this and which tool to use between xgcm, xesmf, etc.

1 Like

Xgcm was designed specifically for this purpose.

https://xgcm.readthedocs.io/en/latest/grids.html

1 Like

Thanks Ryan! I had read this part of the xgcm docs prior to posting but I find these lacking in terms of code examples for CMIP data specifically (maybe I should have specified that). I posted here hoping for a little more hand holding: I’m sure there is a notebook somewhere doing exactly that (interpolate B-grid uo to the corresponding C-grid). I’ll admit that maybe this is just my lack of Python experience showing.

I could do the interpolation by hand in Julia, but I think we agree that Python (with xarray, xgcm, xmip, and so on) is the best tool for that at the moment. FWIW, I want to do this interpolation to test a new Julia package that I am developing that aims to build ocean transport matrices from CMIP output (OceanTransportMatrixBuilder.jl; specifically issue #3).

Can you post some sample data? Or at least the Xarray repr (print(ds)) of your dataset?

For xgcm to work, the dataset needs to have sufficient metadata to distinguish the different grid positions.

Shifting from B-grid to C-grid requires two interpolations (grid.interp).

While it doesn’t explicitly address your exact scenario, the MITgcm example is probably the cleanest and easiest to follow.

1 Like

I just made this tiny Jupyter notebook to load the uo data I’m looking to interpolate. I assume the interface through nbviewer is better for anyone wanting to inspect the data but here is a copy pasta (truncated by GitHub, sorry, let me know if I should print it in full) just in case:

<xarray.Dataset>
Dimensions:        (time: 1980, lev: 50, y: 300, x: 360, bnds: 2, vertex: 4)
Coordinates:
  * time           (time) datetime64[ns] 1850-01-16T12:00:00 ... 2014-12-16T1...
    time_bounds    (time, bnds) datetime64[ns] dask.array<chunksize=(120, 2), meta=np.ndarray>
  * lev            (lev) float64 5.0 15.0 25.0 ... 5.166e+03 5.499e+03 5.831e+03
    lev_bounds     (lev, bnds) float64 dask.array<chunksize=(50, 2), meta=np.ndarray>
    lat            (y, x) float64 dask.array<chunksize=(60, 60), meta=np.ndarray>
    lon            (y, x) float64 dask.array<chunksize=(60, 60), meta=np.ndarray>
    lat_verticies  (y, x, vertex) float64 dask.array<chunksize=(60, 60, 4), meta=np.ndarray>
    lon_verticies  (y, x, vertex) float64 dask.array<chunksize=(60, 60, 4), meta=np.ndarray>
  * y              (y) int64 0 1 2 3 4 5 6 7 ... 292 293 294 295 296 297 298 299
  * x              (x) int64 0 1 2 3 4 5 6 7 ... 352 353 354 355 356 357 358 359
    lon_bounds     (bnds, y, x) float64 dask.array<chunksize=(1, 60, 60), meta=np.ndarray>
    lat_bounds     (bnds, y, x) float64 dask.array<chunksize=(1, 60, 60), meta=np.ndarray>
Dimensions without coordinates: bnds, vertex
Data variables:
    uo             (time, lev, y, x) float32 dask.array<chunksize=(120, 50, 60, 60), meta=np.ndarray>
Attributes: (12/58)
    Conventions:                      CF-1.7 CMIP-6.2
    activity_id:                      CMIP
    branch_method:                    standard
    branch_time_in_child:             0.0
    branch_time_in_parent:            21915.0
    data_specs_version:               01.00.30
    ...                               ...
    intake_esm_attrs:member_id:       r1i1p1f1
    intake_esm_attrs:variable_id:     uo
    intake_esm_attrs:grid_label:      gn
    intake_esm_attrs:version:         v20191115
    intake_esm_attrs:_data_format_:   netcdf
    intake_esm_dataset_key:           l.CMIP.CSIRO.ACCESS-ESM1-5.historical.r...

Shifting from B-grid to C-grid requires two interpolations (grid.interp).

Is that because xgcm interpolates one axis at a time (so in the case of uo, once horizontally and once vertically)?


I’m not sure if the required metadata is included in these CMIP outputs (I personally don’t see it, but I’m not sure what to look for). My understanding, however, is that xmip + xgcm can generate this metadata automatically, is that right?

So this dataset doesn’t [yet] have the coordinates in the form needed by xgcm. Instead, they use the CF-conventions and specify the lon and lat points explicitly as arrays. The problem with this is that the metadata doesn’t make it clear:

  • That this dataset uses and arakawa grid
  • Which variables are located at which grid points

For my explanation of why I don’t like this convention, read this thread.

The use of lon_verticies(y, x, vertex) or lon_bounds(bnds, y, x) is redundant and overly general for Arakawa grids. Instead we need separate dimension coordinates x_g, y_g to distinguish between cell center and cell edge

To explore this, I loaded a comparable dataset from the CMIP6 cloud data on S3

import xarray as xr
from s3fs import S3FileSystem
import xgcm

fs = S3FileSystem()
ds = xr.open_dataset(
    "s3://s3://cmip6-pds/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r1i1p1f1/Omon/uo/gn/v20191115",
    decode_coords="all",
    engine="zarr",
    chunks={}
)
print(ds)
print(ds.info())  # below the fold
<xarray.Dataset> Size: 43GB
Dimensions:             (i: 360, j: 300, lev: 50, bnds: 2, time: 1980,
                         vertices: 4)
Coordinates:
  * i                   (i) int32 1kB 0 1 2 3 4 5 6 ... 354 355 356 357 358 359
  * j                   (j) int32 1kB 0 1 2 3 4 5 6 ... 294 295 296 297 298 299
    latitude            (j, i) float64 864kB dask.array<chunksize=(300, 360), meta=np.ndarray>
  * lev                 (lev) float64 400B 5.0 15.0 25.0 ... 5.499e+03 5.831e+03
    lev_bnds            (lev, bnds) float64 800B dask.array<chunksize=(50, 2), meta=np.ndarray>
    longitude           (j, i) float64 864kB dask.array<chunksize=(300, 360), meta=np.ndarray>
  * time                (time) datetime64[ns] 16kB 1850-01-16T12:00:00 ... 20...
    time_bnds           (time, bnds) datetime64[ns] 32kB dask.array<chunksize=(1980, 2), meta=np.ndarray>
    vertices_latitude   (j, i, vertices) float64 3MB dask.array<chunksize=(300, 360, 4), meta=np.ndarray>
    vertices_longitude  (j, i, vertices) float64 3MB dask.array<chunksize=(300, 360, 4), meta=np.ndarray>
Dimensions without coordinates: bnds, vertices
Data variables:
    uo                  (time, lev, j, i) float32 43GB dask.array<chunksize=(5, 50, 300, 360), meta=np.ndarray>
xarray.Dataset {
dimensions:
	i = 360 ;
	j = 300 ;
	lev = 50 ;
	bnds = 2 ;
	time = 1980 ;
	vertices = 4 ;

variables:
	int32 i(i) ;
		i:long_name = cell index along first dimension ;
		i:units = 1 ;
	int32 j(j) ;
		j:long_name = cell index along second dimension ;
		j:units = 1 ;
	float64 latitude(j, i) ;
		latitude:bounds = vertices_latitude ;
		latitude:long_name = latitude ;
		latitude:standard_name = latitude ;
		latitude:units = degrees_north ;
	float64 lev(lev) ;
		lev:axis = Z ;
		lev:bounds = lev_bnds ;
		lev:long_name = ocean depth coordinate ;
		lev:positive = down ;
		lev:standard_name = depth ;
		lev:units = m ;
	float64 lev_bnds(lev, bnds) ;
	float64 longitude(j, i) ;
		longitude:bounds = vertices_longitude ;
		longitude:long_name = longitude ;
		longitude:standard_name = longitude ;
		longitude:units = degrees_east ;
	datetime64[ns] time(time) ;
		time:axis = T ;
		time:bounds = time_bnds ;
		time:long_name = time ;
		time:standard_name = time ;
	datetime64[ns] time_bnds(time, bnds) ;
	float32 uo(time, lev, j, i) ;
		uo:cell_methods = time: mean ;
		uo:comment = Prognostic x-ward velocity component resolved by the model. ;
		uo:long_name = Sea Water X Velocity ;
		uo:standard_name = sea_water_x_velocity ;
		uo:units = m s-1 ;
	float64 vertices_latitude(j, i, vertices) ;
		vertices_latitude:units = degrees_north ;
	float64 vertices_longitude(j, i, vertices) ;
		vertices_longitude:units = degrees_east ;

// global attributes:
	:Conventions = CF-1.7 CMIP-6.2 ;
	:activity_id = CMIP ;
	:branch_method = standard ;
	:branch_time_in_child = 0.0 ;
	:branch_time_in_parent = 21915.0 ;
	:cmor_version = 3.4.0 ;
	:creation_date = 2019-11-15T15:41:24Z ;
	:data_specs_version = 01.00.30 ;
	:experiment = all-forcing simulation of the recent past ;
	:experiment_id = historical ;
	:forcing_index = 1 ;
	:frequency = mon ;
	:further_info_url = https://furtherinfo.es-doc.org/CMIP6.CSIRO.ACCESS-ESM1-5.historical.none.r1i1p1f1 ;
	:grid = native atmosphere N96 grid (145x192 latxlon) ;
	:grid_label = gn ;
	:history = 2019-11-15T15:41:24Z ; CMOR rewrote data to be consistent with CMIP6, CF-1.7 CMIP-6.2 and CF standards. ;
	:initialization_index = 1 ;
	:institution = Commonwealth Scientific and Industrial Research Organisation, Aspendale, Victoria 3195, Australia ;
	:institution_id = CSIRO ;
	:license = CMIP6 model data produced by CSIRO is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (https://creativecommons.org/licenses/).  Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output, including citation requirements and proper acknowledgment.  Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file).  The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law. ;
	:mip_era = CMIP6 ;
	:nominal_resolution = 250 km ;
	:notes = Exp: ESM-historical; Local ID: HI-05; Variable: uo (['u']) ;
	:parent_activity_id = CMIP ;
	:parent_experiment_id = piControl ;
	:parent_mip_era = CMIP6 ;
	:parent_source_id = ACCESS-ESM1-5 ;
	:parent_time_units = days since 0101-1-1 ;
	:parent_variant_label = r1i1p1f1 ;
	:physics_index = 1 ;
	:product = model-output ;
	:realization_index = 1 ;
	:realm = ocean ;
	:run_variant = forcing: GHG, Oz, SA, Sl, Vl, BC, OC, (GHG = CO2, N2O, CH4, CFC11, CFC12, CFC113, HCFC22, HFC125, HFC134a) ;
	:source = ACCESS-ESM1.5 (2019): 
aerosol: CLASSIC (v1.0)
atmos: HadGAM2 (r1.1, N96; 192 x 145 longitude/latitude; 38 levels; top level 39255 m)
atmosChem: none
land: CABLE2.4
landIce: none
ocean: ACCESS-OM2 (MOM5, tripolar primarily 1deg; 360 x 300 longitude/latitude; 50 levels; top grid cell 0-10 m)
ocnBgchem: WOMBAT (same grid as ocean)
seaIce: CICE4.1 (same grid as ocean) ;
	:source_id = ACCESS-ESM1-5 ;
	:source_type = AOGCM ;
	:status = 2020-02-04;created; by gcs.cmip6.ldeo@gmail.com ;
	:sub_experiment = none ;
	:sub_experiment_id = none ;
	:table_id = Omon ;
	:table_info = Creation Date:(30 April 2019) MD5:e14f55f257cceafb2523e41244962371 ;
	:title = ACCESS-ESM1-5 output prepared for CMIP6 ;
	:tracking_id = hdl:21.14100/c2d89f4c-94e1-48f9-b1e6-050516684527
hdl:21.14100/c95088be-d897-4cbf-99a7-89c62a5349f2
hdl:21.14100/e884b2d6-fefb-457b-b6d5-b16470438ae1
hdl:21.14100/39f1dfc6-d610-4ccd-81a0-081754685c3c
hdl:21.14100/a473cac5-8853-4fb5-a3b1-7eb903da4f7f
hdl:21.14100/949cd5ca-0176-46e8-80e8-806811ed3ffa
hdl:21.14100/8a4394ec-15d3-402c-aabd-3b44bb5746e5
hdl:21.14100/021a5cd4-15f9-490c-8a0e-98237a22a5e3
hdl:21.14100/44f49869-2053-4d30-a58b-3ef777d40363
hdl:21.14100/687cffee-85d2-444b-bf43-9700f2b9bf49
hdl:21.14100/5132fa3f-5642-4049-8e4c-26a2cde3b97a
hdl:21.14100/4dad6097-0779-4031-b014-d9233207f6dc
hdl:21.14100/d13e6d17-b98e-48d3-aaa4-3162770d00ef
hdl:21.14100/c5850670-8cea-46d1-bcb2-83faecdd0161
hdl:21.14100/8df637a7-8ebc-435a-ba52-c217a1e2365c
hdl:21.14100/5506cf97-e19b-4be1-aca1-3748d792275b
hdl:21.14100/225bc0b3-2081-492d-ac6e-da9c4014c5b7 ;
	:variable_id = uo ;
	:variant_label = r1i1p1f1 ;
	:version = v20191115 ;
	:netcdf_tracking_ids = hdl:21.14100/c2d89f4c-94e1-48f9-b1e6-050516684527
hdl:21.14100/c95088be-d897-4cbf-99a7-89c62a5349f2
hdl:21.14100/e884b2d6-fefb-457b-b6d5-b16470438ae1
hdl:21.14100/39f1dfc6-d610-4ccd-81a0-081754685c3c
hdl:21.14100/a473cac5-8853-4fb5-a3b1-7eb903da4f7f
hdl:21.14100/949cd5ca-0176-46e8-80e8-806811ed3ffa
hdl:21.14100/8a4394ec-15d3-402c-aabd-3b44bb5746e5
hdl:21.14100/021a5cd4-15f9-490c-8a0e-98237a22a5e3
hdl:21.14100/44f49869-2053-4d30-a58b-3ef777d40363
hdl:21.14100/687cffee-85d2-444b-bf43-9700f2b9bf49
hdl:21.14100/5132fa3f-5642-4049-8e4c-26a2cde3b97a
hdl:21.14100/4dad6097-0779-4031-b014-d9233207f6dc
hdl:21.14100/d13e6d17-b98e-48d3-aaa4-3162770d00ef
hdl:21.14100/c5850670-8cea-46d1-bcb2-83faecdd0161
hdl:21.14100/8df637a7-8ebc-435a-ba52-c217a1e2365c
hdl:21.14100/5506cf97-e19b-4be1-aca1-3748d792275b
hdl:21.14100/225bc0b3-2081-492d-ac6e-da9c4014c5b7 ;
	:version_id = v20191115 ;
}None

Note that this dataset uses i, j as dimensions rather than x, y, but otherwise has the same structure.

To use xgcm, we need to have actual dimension coordinates corresponding to the other grid positions. Just saying this is a B-grid doesn’t give us enough information. Are the B-grid cells top right or bottom left with respect to the cell centers? :man_shrugging: I’m going to assume they are bottom left.

If we know this, we can add some dimensions and set up our xgcm Grid.

ds = (
    ds
    .rename({'i': 'i_corner', 'j': 'j_corner'})
    .assign_coords(
        {
            # Note that coordinate values here don't actually matter
            # as they are in logical grid space rather than physical space.
            # But we need some way to keep track of the fact that these
            # dimensions are _different_ from the corner dimensions
            'i_center': ds.i.values + 0.5,
            'j_center': ds.j.values + 0.5
        }
    )
)
print(ds)
<xarray.Dataset> Size: 43GB
Dimensions:             (i_corner: 360, j_corner: 300, lev: 50, bnds: 2,
                         time: 1980, vertices: 4, i_center: 360, j_center: 300)
Coordinates:
  * i_corner            (i_corner) int32 1kB 0 1 2 3 4 5 ... 355 356 357 358 359
  * j_corner            (j_corner) int32 1kB 0 1 2 3 4 5 ... 295 296 297 298 299
    latitude            (j_corner, i_corner) float64 864kB dask.array<chunksize=(300, 360), meta=np.ndarray>
  * lev                 (lev) float64 400B 5.0 15.0 25.0 ... 5.499e+03 5.831e+03
    lev_bnds            (lev, bnds) float64 800B dask.array<chunksize=(50, 2), meta=np.ndarray>
    longitude           (j_corner, i_corner) float64 864kB dask.array<chunksize=(300, 360), meta=np.ndarray>
  * time                (time) datetime64[ns] 16kB 1850-01-16T12:00:00 ... 20...
    time_bnds           (time, bnds) datetime64[ns] 32kB dask.array<chunksize=(1980, 2), meta=np.ndarray>
    vertices_latitude   (j_corner, i_corner, vertices) float64 3MB dask.array<chunksize=(300, 360, 4), meta=np.ndarray>
    vertices_longitude  (j_corner, i_corner, vertices) float64 3MB dask.array<chunksize=(300, 360, 4), meta=np.ndarray>
  * i_center            (i_center) float64 3kB 0.5 1.5 2.5 ... 357.5 358.5 359.5
  * j_center            (j_center) float64 2kB 0.5 1.5 2.5 ... 297.5 298.5 299.5
Dimensions without coordinates: bnds, vertices
Data variables:
    uo                  (time, lev, j_corner, i_corner) float32 43GB dask.array<chunksize=(5, 50, 300, 360), meta=np.ndarray>

Now we see that we have two sets of dimension coordinates (corner and center). From different combinations of these, we can reconstruct all of the different possible positions on an Arakawa grid. Many ocean models (e.g. MITgcm, MOM6, ROMS) include these types of dimension coordinates by default in their output. But CF conventions don’t say anything about this.

With this, we can create the xgcm.Grid

grid = xgcm.Grid(
    ds,
    coords={
        'X': {'center': 'i_center', 'left': 'i_corner'},
        'Y': {'center': 'j_center', 'left': 'j_corner'}
    },
    periodic=['X']
)
<xgcm.Grid>
X Axis (periodic, boundary=None):
  * center   i_center --> left
  * left     i_corner --> center
Y Axis (not periodic, boundary=None):
  * center   j_center --> left
  * left     j_corner --> center

Now we can use this to easily do the desired interpolation.

uo_center = grid.interp(ds.uo, axis=['X', 'Y'])
<xarray.DataArray 'uo' (time: 1980, lev: 50, j_center: 300, i_center: 360)> Size: 43GB
dask.array<transpose, shape=(1980, 50, 300, 360), dtype=float32, chunksize=(5, 50, 300, 360), chunktype=numpy.ndarray>
Coordinates:
  * lev       (lev) float64 400B 5.0 15.0 25.0 ... 5.166e+03 5.499e+03 5.831e+03
  * time      (time) datetime64[ns] 16kB 1850-01-16T12:00:00 ... 2014-12-16T1...
  * i_center  (i_center) float64 3kB 0.5 1.5 2.5 3.5 ... 356.5 357.5 358.5 359.5
  * j_center  (j_center) float64 2kB 0.5 1.5 2.5 3.5 ... 296.5 297.5 298.5 299.5

This is typical of xgcm usage. We spend a lot of time setting up the proper metadata and coordinate system. Then the operation itself is trivial.

If you want to get the coordinates of those points, you could try interpolating latitude and longitude the same way. But will they line up with the vertices variables? Hard to say…

2 Likes

Wow! Thank you so much for the detailed response and hand-holding!

So you are manually constructing the cell centers in some sense above. Is it possible to automate this process further by using another variable that is already centered? E.g., if I also loaded areacello (or thetao or another variable that I know lives at the grid cell centers), then xgcm should be able to match uo’ s (lon, lat) with one of areacello’s centers, face centers, edge centers, or edge corners, thus determining both the grid type (A, B, C, etc.) and the orientation (B-grid top right, B-grid bottom left, etc.) right?

So ideally, I’m looking for some interface that would look like

xgcm.interp(uo, areacello, grid_type='C', position='u')

where we interpolate uo onto the grid of areacello at the C-grid u position, if that makes sense.

PS1: BTW I hope vertex order will be resolved soon by xmip; see e.g., related xmip issue #368 and PR #370.

PS2: I’m not entirely sure but the difference that you noted about the index names ((i,j) vs (x,y)) may just have been caused by preprocessing with xmip.

This sounds like a very difficult and error-prone procedure. You’re asking to examine different variables from different datasets and, using only the coordinate information, somehow reconstruct the fact that they are both part of the same staggered Arakawa grid system. This will be quite sensitive to numerical tolerance issues. For example, you’ll have to answer what does it mean for a coordinate position to be “in the middle” between two other coordinates? We would need to distinguish between actual Arakawa grids and actually different GCMs that have similar but not identical grids.

It would be so much easier if the dataset would just declare this information explicitly, e.g. using the SGRID conventions (which xgcm understands, and if set, can automatically build a grid for.). After all, the model developers know that they are using an Arakawa grid! It’s fundamental to all of the algorithms. This metadata belongs in the dataset itself, but CF doesn’t give a way to do that.

So you are stuck relying on “extra” information, such as your prior knowledge about the grid.

This is an easier ask. It would be possible to create an API like this for xgcm. It would involve automatically generating a Grid object from uo (only possible with sufficient metadata, as described above), defining what you mean by grid_type='C' and position='u', and them wrapping the Grid.interp operation.

We would welcome a PR to xgcm to add that sort of functionality.

1 Like

Hey folks,

Couple of thoughts from my end:

Is it possible to automate this process further by using another variable that is already centered? E.g., if I also loaded areacello (or thetao or another variable that I know lives at the grid cell centers), then xgcm should be able to match uo’ s (lon, lat) with one of areacello’s centers, face centers, edge centers, or edge corners, thus determining both the grid type (A, B, C, etc.) and the orientation (B-grid top right, B-grid bottom left, etc.) right?

That is essentially what I tried to do in xMIP/xmip/grids.py at main · jbusecke/xMIP · GitHub, but this could definitely use more work.

I agree 10000% with Ryans comments that this information should be included as metadata, it would make the logic above a lot more robust and accurate.

I started collecting the ‘real’ grid data for various models by emailing folks I know Collecting grid-metric files for CMIP6 output for cloud analysis - #11 by emaroon but I never got around to putting this data up. Having the original grid information would solve this problem if we can assume that all the outputs are actually unaltered. But I am not sure that is the case. I have encountered examples where I had the original grid file, e.g. indicating that a B grid with velocities in the lower left corner was used, but diagnosing from the provided lon/lat values told me that the position is in the upper/right corner or on the center. It is hard to tell whether this is an error with the provided coordinates, or if the data has actually interpolated - I think in some cases this is done, but again I know of no way to tell this from the data and I have not found any reliable documentation of this anywhere.

After all, the model developers know that they are using an Arakawa grid! It’s fundamental to all of the algorithms. This metadata belongs in the dataset itself, but CF doesn’t give a way to do that.

If I ever find a minute I would love to retrofit this in some way shape or form, but I think an important thing to do here from the community is to flag this as an issue to be addressed for CMIP7! I encourage you to contact the Data Access Task Team and describe this as a blocking issue.

This is an easier ask. It would be possible to create an API like this for xgcm. It would involve automatically generating a Grid object from uo (only possible with sufficient metadata, as described above), defining what you mean by grid_type='C' and position='u', and them wrapping the Grid.interp operation.

We would welcome a PR to xgcm to add that sort of functionality.

Unless I misunderstand this ask, you can already use grid.interp_like for this?

1 Like