Lat/Lon grid mismatch in variables from the same CMIP6 model

Hi,

I’m working on some calculations involving precipitation and omega using CMIP6 models. However, I’ve encountered an issue with some models, such as ‘ACCESS-ESM1-5’, where the latitude and longitude grids for precipitation and omega don’t match.

I need both precipitation and omega to be on the same grid. Is it okay to reindex one of the variable grids using the nearest method?
Has anyone faced a similar issue, or can anyone suggest a solution?

Thanks in advance!

I wouldn’t use nearest neighbor for this. Within models, variables are regridded from cell corners or edges to centers and vice versa all the time, and from what I’ve seen in another model, this is done using finite differences. I think I’d first try bilinear interpolation of omega onto the precip grid, because I expect omega to be smoother and less susceptible to interpolation errors than precip. Others will probably have more informed opinions on this.
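To make the bilinear suggestion concrete, here is a minimal NumPy-only sketch. It assumes both variables sit on rectilinear grids with 1-D, strictly increasing lat/lon in degrees and a periodic longitude; the function name, grid spacings, and synthetic "omega" field are all made up for illustration and are not taken from ACCESS-ESM1-5 output. In a real workflow a dedicated regridding library would likely be the better tool.

```python
import numpy as np

def interp_to_grid(field, src_lat, src_lon, dst_lat, dst_lon):
    """Bilinearly interpolate field(lat, lon) between two rectilinear grids.

    Assumes 1-D, strictly increasing coordinates in degrees and periodic longitude.
    np.interp clamps at the ends, so target latitudes poleward of the source
    range simply receive the nearest edge value.
    """
    # Pad one wrapped column on each side so targets near the 0/360 seam
    # are interpolated across it rather than clamped.
    lon_pad = np.concatenate(([src_lon[-1] - 360.0], src_lon, [src_lon[0] + 360.0]))
    field_pad = np.concatenate((field[:, -1:], field, field[:, :1]), axis=1)

    # Linear interpolation along longitude for every source latitude row ...
    tmp = np.stack([np.interp(dst_lon, lon_pad, row) for row in field_pad])
    # ... then along latitude for every target longitude column.
    return np.stack([np.interp(dst_lat, src_lat, col) for col in tmp.T], axis=1)

# Hypothetical grids: "omega" is offset by half a cell from the "pr" grid.
src_lat = np.linspace(-88.75, 88.75, 72)
src_lon = np.arange(0.0, 360.0, 3.75) + 1.875
dst_lat = np.linspace(-87.5, 87.5, 71)
dst_lon = np.arange(0.0, 360.0, 3.75)

# Smooth synthetic field standing in for omega.
omega = np.cos(np.deg2rad(src_lat))[:, None] * np.sin(np.deg2rad(src_lon))[None, :]
omega_on_pr_grid = interp_to_grid(omega, src_lat, src_lon, dst_lat, dst_lon)
print(omega_on_pr_grid.shape)  # (71, 96)
```

For a smooth field like this one the interpolated values stay very close to the analytic field evaluated on the target grid, which is the reason for interpolating omega rather than the noisier precipitation.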

Hi @chaithra – I agree with @huard’s rationale to remap omega to the precipitation grid. You could use xESMF or xcdat (xcdat uses either regrid2 or xesmf under the hood) for remapping (these both work with xarray). They have bilinear interpolation (and can handle circular coordinate systems) or you could also consider conservative regridding if you wanted to regrid precipitation and make sure that you conserved total precipitation globally. Others might have additional regridder suggestions.
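One way to check whether a regridding step conserved total precipitation is to compare area-weighted global means before and after. The sketch below is NumPy only and assumes a regular lat/lon grid with latitude bounds in degrees and equally spaced longitudes; the function names and the synthetic field are illustrative and are not part of xESMF or xcdat.

```python
import numpy as np

def cell_area_weights(lat_bnds):
    """Normalized cell-area weights per latitude row from bounds of shape (nlat, 2)."""
    sin_b = np.sin(np.deg2rad(lat_bnds))
    w = np.abs(sin_b[:, 1] - sin_b[:, 0])  # proportional to the area of each latitude band
    return w / w.sum()

def global_mean(field, lat_bnds):
    """Area-weighted global mean of field(lat, lon), assuming equally spaced longitudes."""
    zonal_mean = field.mean(axis=1)
    return float(np.sum(cell_area_weights(lat_bnds) * zonal_mean))

# Synthetic 1-degree grid and a field that depends on latitude only.
edges = np.linspace(-90.0, 90.0, 181)
lat_bnds = np.column_stack((edges[:-1], edges[1:]))
lat = lat_bnds.mean(axis=1)
pr = (np.cos(np.deg2rad(lat)) ** 2)[:, None] * np.ones((1, 360))

pr_global = global_mean(pr, lat_bnds)
print(pr_global)  # close to the analytic value 2/3 for this synthetic field
```

Computing `global_mean` on the source grid and again on the target grid gives a quick conservation check. For comparison, the unweighted mean of the same synthetic field is 0.5, which shows how much the area weighting matters.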

Hi @chaithra,

I think @huard is on the right track here. These variables seem to be on different grid positions (cell face vs. center). Remapping would probably work, but is expensive. You could also try using xgcm to interpolate between grid positions. To verify the assumption, though, I would suggest checking that the lat/lon values of one variable are captured by the lat_bounds/lon_bounds of the other. If so, I would assume that no additional interpolation was carried out and that the variables are simply output on different grid positions. A simple linear interpolation on the logically rectangular grid could thus be a good way to co-locate the variables.
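The bounds check could be sketched as follows. This reads "captured by" as "coincide with the cell edges", which is an interpretation rather than something stated above; the function name, tolerance, and synthetic 2.5-degree grid are hypothetical.

```python
import numpy as np

def coords_on_cell_edges(coord, bnds, atol=1e-6):
    """True if every value in coord matches some cell edge in bnds (shape (n, 2))."""
    edges = np.unique(np.asarray(bnds, dtype=float))
    coord = np.asarray(coord, dtype=float)
    # Compare every coordinate against every edge with an absolute tolerance.
    hits = np.isclose(coord[:, None], edges[None, :], rtol=0.0, atol=atol)
    return bool(hits.any(axis=1).all())

# Synthetic latitude bounds for a cell-centered variable.
edges = np.linspace(-90.0, 90.0, 73)
lat_bnds = np.column_stack((edges[:-1], edges[1:]))
lat_centers = lat_bnds.mean(axis=1)

# Latitudes of a second variable that sits on the interior cell faces.
lat_staggered = edges[1:-1]

print(coords_on_cell_edges(lat_staggered, lat_bnds))  # True: consistent with a staggered grid
print(coords_on_cell_edges(lat_centers, lat_bnds))    # False: centers do not sit on edges
```

With real files, `coord` would be the lat (or lon) values of one variable and `bnds` the corresponding bounds array of the other.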

On a more general note this sort of thing is (to my knowledge) totally undocumented for CMIP data, which is pretty bad, since it leaves the detective work to the user!

Thanks!! @huard @jbusecke @pochedley
This helps a lot.

The CF Conventions have recently been expanded to include the UGRID Conventions for describing unstructured grid meshes in detail: NetCDF Climate and Forecast (CF) Metadata Conventions

Unfortunately they skipped over the more common scenario of staggered quad grids! :man_facepalming: AFAIK there is still no solution in CF conventions for how to encode this sort of information into the files.

Not true! The SGRID conventions exist (SGRID Conventions (v0.3) | Staggered Grid data model (SGRID)), though I don’t know that they’ve been elevated to the CF level. ROMS has been writing these attributes for quite a while, and cf-xarray can parse them: SGRID / UGRID - cf_xarray documentation

I absolutely know that the SGRID conventions exist. What I am saying is that they are not part of CF: I tried to convince the CF people to incorporate SGRID as well as UGRID, but they didn’t go for it.

For ACCESS models you’re more than welcome to hop on over to the ACCESS Hive Forum and see if someone else has already discussed that issue, or sign up and ask for more information. I think the suggestion about staggered grids is likely correct, but I personally don’t know much about the atmospheric fields.

I’m working on a related issue, but noticing differences at the 14th decimal place in latitude values, which causes some issues in xarray.

Looking at the following files from the AWS Zarr stores, the ps, huss, and tasmax variables can all be merged into an xarray.Dataset object without issue. However, when I try to merge in the orog data, there are very small differences in the latitude values that cause ‘extra’ latitudes to be added. I got around this by using the compat='override', join='override' options, but it seems like all these variables are on the same grid and should just merge together seamlessly.

| | activity_id | institution_id | source_id | experiment_id | member_id | table_id | variable_id | grid_label | zstore | dcpp_init_year | version |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 219550 | CMIP | MPI-M | MPI-ESM1-2-LR | historical | r1i1p1f1 | CFday | ps | gn | s3://cmip6-pds/CMIP6/CMIP/MPI-M/MPI-ESM1-2-LR/historical/r1i1p1f1/CFday/ps/gn/v20190710/ | nan | 20190710 |
| 228470 | CMIP | MPI-M | MPI-ESM1-2-LR | historical | r1i1p1f1 | day | huss | gn | s3://cmip6-pds/CMIP6/CMIP/MPI-M/MPI-ESM1-2-LR/historical/r1i1p1f1/day/huss/gn/v20190710/ | nan | 20190710 |
| 229570 | CMIP | MPI-M | MPI-ESM1-2-LR | historical | r1i1p1f1 | day | tasmax | gn | s3://cmip6-pds/CMIP6/CMIP/MPI-M/MPI-ESM1-2-LR/historical/r1i1p1f1/day/tasmax/gn/v20190710/ | nan | 20190710 |
| 229614 | CMIP | MPI-M | MPI-ESM1-2-LR | historical | r1i1p1f1 | fx | orog | gn | s3://cmip6-pds/CMIP6/CMIP/MPI-M/MPI-ESM1-2-LR/historical/r1i1p1f1/fx/orog/gn/v20190710/ | nan | 20190710 |

Minimal working example

import xarray as xr

paths = [
    's3://cmip6-pds/CMIP6/CMIP/MPI-M/MPI-ESM1-2-LR/historical/r1i1p1f1/CFday/ps/gn/v20190710/',
    's3://cmip6-pds/CMIP6/CMIP/MPI-M/MPI-ESM1-2-LR/historical/r1i1p1f1/day/huss/gn/v20190710/',
    's3://cmip6-pds/CMIP6/CMIP/MPI-M/MPI-ESM1-2-LR/historical/r1i1p1f1/day/tasmax/gn/v20190710/',
    's3://cmip6-pds/CMIP6/CMIP/MPI-M/MPI-ESM1-2-LR/historical/r1i1p1f1/fx/orog/gn/v20190710/',
]

hist_ds = xr.open_mfdataset(
    paths,
    compat='override',
    # join='override',  # Uncomment to 'fix' the issue
    engine='zarr',
    storage_options={'anon': True},
)

hist_ds.isel(time=0)['ps'].plot()

When opening without the join='override' key in xarray.open_mfdataset, the following is produced when plotting due to the ever-so-slightly different latitude values.

@kwodzicki This is unfortunately a common issue encountered with GCM outputs of all sorts. I have seen this sort of thing many times with non-CMIP models. I think your solution is perfectly fine here.
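Before overriding, it can be worth confirming that the two latitude axes really differ only by rounding. The following is a minimal NumPy sketch; the helper name, tolerance, and synthetic latitudes are hypothetical and only mimic the situation described above.

```python
import numpy as np

def describe_coord_mismatch(a, b, atol=1e-8):
    """Summarize how two 1-D coordinate arrays differ."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    same_shape = a.shape == b.shape
    max_diff = float(np.max(np.abs(a - b))) if same_shape else float("nan")
    return {
        "same_shape": same_shape,
        "max_abs_diff": max_diff,
        # Overriding is only reasonable if the axes agree to within the tolerance.
        "safe_to_override": bool(same_shape and max_diff <= atol),
        # Number of labels an exact-match outer join would end up with.
        "n_union": int(np.union1d(a, b).size),
    }

# Synthetic latitudes: the second axis differs from the first in one value
# by the smallest representable amount.
lat_a = np.linspace(-88.57, 88.57, 96)
lat_b = lat_a.copy()
lat_b[10] = np.nextafter(lat_b[10], np.inf)

info = describe_coord_mismatch(lat_a, lat_b)
print(info)
```

Here the union has 97 values even though both axes have 96, which mirrors the ‘extra’ latitudes that appear when the labels are joined by exact match. If the maximum difference is at rounding level, join='override' (or copying one dataset's latitude coordinate onto the other) seems justified.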

Just wanted to point out that I have a bunch of functionality in xMIP (xmip/postprocessing.py in the jbusecke/xMIP repository on GitHub) that helps with combining datasets in different ways, particularly when you want to do this across different models. This might need some additional logic for your workflow, so the easiest solution is probably just sticking with what you have here hehe.