ERA5, longitude, antimeridians, and the STAC search API

Keywords: ERA5, microsoft planetary computer, antimeridian, xarray, sel, sortby, stac search API

This post is motivated by playing around with the ERA5 data as hosted on Microsoft’s Planetary Computer.
I note that longitude in this data is over the range [0, 360), with 0 being the Greenwich Prime Meridian.
I also note that the STAC API for the bbox query parameter states

The coordinate reference system of the values is WGS 84 longitude/latitude (http://www.opengis.net/def/crs/OGC/1.3/CRS84)

Now, as I understand it, WGS84 is not a CRS per se, but describes the shape of the geoid upon which a particular CRS is based? Anyhoo, the STAC API documentation goes on to say

For WGS 84 longitude/latitude the values are in most cases the sequence
of minimum longitude, minimum latitude, maximum longitude and maximum
latitude. However, in cases where the box spans the antimeridian the
first value (west-most box edge) is larger than the third value
(east-most box edge).

So, here we’re effectively dealing with a dataset that plonks a/the antimeridian smack through the UK, which is kinda inconvenient for those of us who live here as opposed to vast swaths of the Pacific Ocean.

All of which is preamble, and arguably not of major relevance to the ERA5 data, which is organised as one file for the whole Earth’s surface at a moment in time (as opposed to needing to locate which tiles/granules you need). Indeed, using bbox in the STAC search here is pointless. This was my first learning. (Note: it wasn’t chronologically my first learning, but it’s worth ordering things to suit the narrative here.)

So I search for available data for my year of interest and don’t bother with the bbox parameter in the catalog search:

import pystac_client
import xarray as xr

# Planetary Computer STAC endpoint
catalog = pystac_client.Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")

search = catalog.search(collections=['era5-pds'], datetime='2020', query={"era5:kind": {"eq": "fc"}})
datasets = [
    xr.open_dataset(asset.href, **asset.extra_fields["xarray:open_kwargs"])
    for item in search.items() for asset in item.assets.values()
]

ds = xr.combine_by_coords(datasets, join="exact")
ds

and get back an xarray Dataset (repr screenshot not reproduced here).

Now, this appeals to me because I’ve got an xarray Dataset comprising dask.arrays of various interesting variables. But note the lon goes from 0.0 to 359.8. I want to visualize Great Britain (sorry, N. Ireland) and xarray’s .sel() selector won’t give me what I want here. My next step is to transform those western longitudes:

ds['lon'] = ds.lon.where(ds.lon < 180, ds.lon - 360)
ds = ds.sortby('lon')
ds.lon
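
As a sanity check that this shift-and-sort recipe leaves you with monotonic coordinates, here is the same transform on a synthetic 30° grid (toy values, not the ERA5 data itself):

```python
import numpy as np
import xarray as xr

# Toy dataset with ERA5-style longitudes on [0, 360)
lon = np.arange(0.0, 360.0, 30.0)
toy = xr.Dataset({"t2m": ("lon", np.arange(lon.size, dtype=float))}, coords={"lon": lon})

# Shift longitudes >= 180 into the western hemisphere, then re-sort
toy["lon"] = toy.lon.where(toy.lon < 180, toy.lon - 360)
toy = toy.sortby("lon")

assert toy.lon.values[0] == -180.0 and toy.lon.values[-1] == 150.0
assert (np.diff(toy.lon.values) > 0).all()  # monotonic, so .sel(lon=slice(...)) behaves
```

With the coordinate monotonically increasing, `toy.sel(lon=slice(-90, 30))` returns the contiguous run of points you’d expect.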


Okay, so now I have longitude running [-180, 180) with UK running from negative (west) to positive (east) and I can access using .sel() and visualize:

uk_era5 = ds.sel(lat=slice(60, 50), lon=slice(-6, 2))  # lat runs north-to-south in ERA5, hence 60 -> 50
ax = uk_era5.air_temperature_at_2_metres_1hour_Maximum.groupby('time.month').min().sel(month=1).plot()
eng.plot(ax=ax.axes, facecolor='None')  # `eng` is a GeoDataFrame of boundary geometry loaded earlier (not shown)

which looks like I have the land in the right place.

Rounding off this post with my learnings/questions:

  1. Obviously, when accessing someone’s STAC API, you can’t change stuff server side, so when querying over the antimeridian, do you just need to break your query into two?
  2. And you need to know the coordinate system of the dimensions, because relying on the data complying with the STAC API and being ‘WGS84’ isn’t enough, because, like here, you can’t assume longitude runs [-180, 180)?
  3. Is it a good approach to transform the dimension coordinates (here to have negative values for the western hemisphere) and sort to ensure the coordinates monotonically increase?
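
On question 1, a minimal sketch of what splitting such a query might look like — `split_bbox` is a hypothetical helper of my own, not part of any STAC client:

```python
def split_bbox(west, south, east, north):
    """Split a WGS84 bbox into STAC-compliant pieces.

    Per the STAC API spec, an antimeridian-crossing bbox has west > east;
    splitting it yields two ordinary bboxes you can query separately.
    """
    if west <= east:
        return [(west, south, east, north)]
    return [(west, south, 180.0, north), (-180.0, south, east, north)]

# A box over Fiji, spanning the antimeridian, splits in two:
print(split_bbox(177.0, -20.0, -178.0, -15.0))
# [(177.0, -20.0, 180.0, -15.0), (-180.0, -20.0, -178.0, -15.0)]

# A box over the UK (in proper WGS84) needs no splitting:
print(split_bbox(-6.0, 50.0, 2.0, 60.0))
# [(-6.0, 50.0, 2.0, 60.0)]
```

You’d then run one `catalog.search(bbox=...)` per piece and merge the resulting items.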

Gently and politely pondering in the general direction of @TomAugspurger to ask whether they considered doing a coordinate (longitude) transformation of the data when ingesting? Because it strikes me as slightly odd that the STAC API seems silent on the actual CRS used. What’s to stop someone going “Well, our data is referenced to the longitude of Times Square, New York, but it’s still on WGS84”? Is this a pointless question because, in the cases where this matters to what you get back from a STAC API search (i.e. when the Earth’s surface is tessellated by tiles/granules), you’ll always be dealing with a longitude that runs [-180, 180)? Or do you simply need to know the CRS of the data you’re querying, and adjust or transform the bounds of your query parameters before making the call to search the catalog?

Not going to comment on the majority of your questions, but note you may get a tiny speed-up by transforming the longitude using assign_coords instead of where, e.g. ds.assign_coords(lon=(((ds.lon + 180) % 360) - 180)) (xarray/xarray/core/common.py at main · pydata/xarray · GitHub)
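
For what it’s worth, the two formulations agree — a quick check on plain numpy values standing in for the `lon` coordinate:

```python
import numpy as np

lon = np.arange(0.0, 360.0, 0.25)  # an ERA5-style 0.25-degree grid on [0, 360)

via_where = np.where(lon < 180, lon, lon - 360)  # the `where` approach from the post
via_mod = ((lon + 180) % 360) - 180              # the `assign_coords` expression

assert np.allclose(via_where, via_mod)
```

Both map 0–180 to itself and 180–360 to [-180, 0); `assign_coords` just avoids building the boolean mask that `where` needs.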


Yes, per the specification antimeridian-crossing geometries should be split on the antimeridian. E.g. https://github.com/gadomski/antimeridian/blob/main/tests/data/output/split.json.

And you need to know the coordinate system of the dimensions, because relying on the data complying with the STAC API and being ‘WGS84’ isn’t enough, because, like here, you can’t assume longitude runs [-180, 180)?

STAC APIs should expect and accept “proper” WGS84, i.e. -180° to 180°. A quick look at a sample item shows that it’s a global dataset, so a bbox query isn’t going to get you much in this case, anyways.

Note that this only applies to the STAC items themselves – in many cases, the actual data are in a different coordinate system, forcing you to do the dance you showed.

Because it strikes me as slightly odd that the STAC API seems silent on the actual CRS used.

Many STAC items use the projection extension to provide information on the coordinate reference system (and data shape) of the underlying data, but it doesn’t look like the ERA5 collection includes proj information.
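
For collections that do carry the projection extension, the CRS shows up in the item properties. A sketch with an invented item dict (the field names are the extension’s, the values here are made up):

```python
# Minimal item properties illustrating projection-extension fields (invented example)
item_properties = {
    "datetime": "2020-01-01T00:00:00Z",
    "proj:epsg": 4326,          # EPSG code of the data's CRS
    "proj:shape": [721, 1440],  # rows, cols of the underlying grid
}

# Falls back to None for collections (like era5-pds here) that omit proj info
epsg = item_properties.get("proj:epsg")
print(epsg)  # 4326
```

When `proj:epsg` is absent, you’re back to inspecting the data itself, as in the original post.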

considered doing a coordinate (longitude) transformation of the data when ingesting?

This is a possibility, but in general the data on the Planetary Computer isn’t transformed in any way, other than to convert to cloud-optimized formats where appropriate. It’s a bit of a can of worms if you start “correcting” every dataset when you’re hosting tens to hundreds of them.
