Metadata rich hydrodynamic/atmospheric data following SGRID conventions - tooling and existing datasets

We’re currently developing the next major version (v4) of Parcels, and we want to rework how we ingest structured-grid hydrodynamic input data. Throughout the history of Parcels we’ve struggled with users arriving with data from many different models and getting it into a format that Parcels can understand. Previous versions of Parcels didn’t rely heavily on the metadata provided with datasets.

We’re looking to have v4 of Parcels flip this and only work from CF- and SGRID-compliant datasets[1] (providing helpers to generate these sorts of datasets, alongside tutorials for users on creating such metadata-rich datasets). This simplifies our lives as developers/maintainers, prevents more runtime failures, and also helps users avoid unknowingly making scientific errors.

My questions are as follows:

  1. Is there existing tooling for ingesting output from various models (perhaps scattered across multiple NetCDF files) and creating metadata-rich xarray datasets (following SGRID conventions)?
  2. Are there hydrodynamic datasets that cannot be represented in a single metadata rich xarray dataset? NEMO output, although output in different NetCDF files, can be represented in a single xarray dataset by renaming dimensions - just wondering if there are model outputs I’m not considering.
  3. Does anyone know of online SGRID-compliant, metadata-rich datasets? Preferably something in a bucket that I can just open to inspect the metadata; I really just want good data to test with.
  4. Tangential: When it comes to atmospheric model output, do they also use Arakawa staggering and SGRID conventions?
  5. Any other tips? :slight_smile:

  [1] Thanks to v0.9.0 of xgcm, which helps here!


We wrestled with this problem in xgcm.

SGRID never seems to have caught on like UGRID. Unfortunately, AFAIK there is really no widely used standard to describe typical staggered grids; CF does not provide one. We built on something called “comodo conventions”, which seems to be unmaintained.


creating metadata rich xarray datasets (following SGRID conventions)?

The SGRID conventions simply require adding one variable with attribute cf_role: grid_topology and the SGRID metadata on that variable. Technically you are also supposed to add SGRID vXX to the dataset’s Conventions attribute IIRC. For a given model these attribute values are constant, so what you need is a database.
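As a rough sketch of the "database" idea, the per-model attributes could live in a small lookup table and be attached to a dataset in one step. Everything named here (`SGRID_TOPOLOGY`, `add_conventions`, `attach_sgrid`) is hypothetical and not part of Parcels, xgcm, or xarray. The "roms" entry is written from memory of the ROMS example in the SGRID conventions document, and the `SGRID-0.3` tag is likewise an assumption, so check both against real output before relying on them.

```python
# Hypothetical lookup table: SGRID topology attributes per model.
# The "roms" values are assumed from the ROMS example in the SGRID
# conventions document; verify them against your own files.
SGRID_TOPOLOGY = {
    "roms": {
        "cf_role": "grid_topology",
        "topology_dimension": 2,
        "node_dimensions": "xi_psi eta_psi",
        "face_dimensions": (
            "xi_rho: xi_psi (padding: both) eta_rho: eta_psi (padding: both)"
        ),
        "edge1_dimensions": "xi_u: xi_psi eta_u: eta_psi (padding: both)",
        "edge2_dimensions": "xi_v: xi_psi (padding: both) eta_v: eta_psi",
        "vertical_dimensions": "s_rho: s_w (padding: none)",
    },
}

# Assumed version tag for the Conventions attribute.
SGRID_TAG = "SGRID-0.3"


def add_conventions(existing, tag=SGRID_TAG):
    """Append `tag` to a comma-separated Conventions string if it is missing."""
    parts = [p.strip() for p in (existing or "").split(",") if p.strip()]
    if tag not in parts:
        parts.append(tag)
    return ", ".join(parts)


def attach_sgrid(ds, model, name="grid"):
    """Return a shallow copy of an xarray Dataset with SGRID topology metadata."""
    attrs = dict(SGRID_TOPOLOGY[model])
    out = ds.copy()
    # A scalar integer variable serves only as a container for the attributes.
    out[name] = ((), 0, attrs)
    out.attrs["Conventions"] = add_conventions(out.attrs.get("Conventions"))
    return out
```

Calling `attach_sgrid(ds, "roms")` on an xarray dataset whose dimensions already use the ROMS names would add the `grid` variable and extend `Conventions`. Data variables would still need their own `grid` and `location` attributes, which this sketch leaves out.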

Does anyone know of online SGRID compliant metadata rich datasets?

ROMS has written SGRID-compliant metadata by default for many years now (example metadata). I’m sure one of the NOAA ocean forecast data streams will be a useful example.

Are there hydrodynamic datasets that cannot be represented in a single metadata rich xarray dataset?

Generally speaking, no, because at this point most models have adopted netCDF, and xarray is heavily inspired by netCDF.

Any other tips?

The 2025 CF Workshop has an SGRID hackathon led by @Chris_Barker_NOAA. Sounds like you should attend!
