Reading ERA5 data on Planetary Computer

I’ve started reading ERA5 data on the Planetary Computer. I first tried stac_load, which I’ve used with items from other collections, but kept getting an empty dataset back. I then followed the example notebook exactly and did reproduce its output. It uses the now-deprecated .get_all_items() method. I prefer to use current API calls where possible, but the replacement returns a generator, so you can’t index it with item = items[0], and I wanted to repeat the notebook as faithfully as possible the first time through. Anyway, I did get it to work.

The example picks the first item with item = items[0] and then iterates through the assets to then combine them into a Dataset of multiple variables. This feels kinda clunky. Is this really the best approach?

Noting that this was just for one month/item, for multiple months would you have to iterate over each item and then each asset and then .combine_by_coords?

The **asset.extra_fields["xarray:open_kwargs"] argument seems crucial to success. Looking at the item, I can see this field on each asset:

{'xarray:open_kwargs': {'chunks': {}, 'engine': 'zarr', 'consolidated': True, 'storage_options': {'account_name': 'cpdataeuwest', 'credential': 'blahblahblah'}}}

I did wonder how it knew to automatically return dask arrays. But this feels like a “not very stac like” undocumented means of access that you have to kinda know about?

Aside from any specific questions above, I’m left with general musings such as:

  • Does the example use xarray’s open_dataset specifically because of the presence of this magic field?
  • Is this magic field there just to support direct open_dataset access?
  • Does stac_load not support zarr?
  • Can the data be loaded using stac_load, or is it “Nope, you’ve got to use xarray’s open_dataset directly”?
  • Any recommended reading to learn more about this open_dataset access pattern that makes use of this extra_fields parameter that’s stored in data?

Pinging @TomAugspurger (sorry :wink: ) for much valued MSPC viewpoint!

Huh, I really thought I had updated all those notebooks, but apparently not. The equivalent method call would be either next(search.items()) or search.item_collection()[0]. I’ll get to that sooner or later.
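To make the replacement for .get_all_items() concrete, here is a minimal sketch of both options. The helper name first_item and the wrapper demo are made up for illustration. The catalog URL, the "era5-pds" collection ID, and the datetime filter are assumptions based on the Planetary Computer example rather than anything stated in this thread.

```python
def first_item(search):
    """Return the first item of a search, or None if nothing matched.

    `search` is any object whose .items() returns an iterator,
    for example a pystac_client ItemSearch.
    """
    return next(iter(search.items()), None)


def demo():
    # Third-party imports live inside the function so the helper above
    # has no dependencies; needs pystac-client and planetary-computer.
    import planetary_computer
    import pystac_client

    # Assumed endpoint and collection ID for the Planetary Computer.
    catalog = pystac_client.Client.open(
        "https://planetarycomputer.microsoft.com/api/stac/v1",
        modifier=planetary_computer.sign_inplace,
    )
    search = catalog.search(collections=["era5-pds"], datetime="1980-01")

    item = first_item(search)         # lazy: stops after the first item
    items = search.item_collection()  # eager: supports items[0] and len(items)
    return item, items
```

Calling demo() needs network access. search.item_collection() is probably the closer drop-in for the notebook, since it keeps item = items[0] working unchanged.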

The example picks the first item with item = items[0] and then iterates through the assets to then combine them into a Dataset of multiple variables. This feels kinda clunky. Is this really the best approach?

For now, yes. But see pystac issue #846, “Convenience methods for converting STAC objects / linked assets to data containers” (stac-utils/pystac on GitHub), and the linked issues and repos.

Noting that this was just for one month/item, for multiple months would you have to iterate over each item and then each asset and then .combine_by_coords?

I think so, yes. These are separate Zarr datasets, and I think that’s the preferred way to combine them.
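As a rough sketch of that loop over items and then assets: the function names iter_open_args and open_items are hypothetical, and the items are assumed to be already signed (for example via planetary_computer.sign) so that the open kwargs include a credential. Whether combine_by_coords lines everything up cleanly will depend on which items and variables you mix.

```python
def iter_open_args(items):
    """Yield (href, open_kwargs) for each asset that has xarray:open_kwargs."""
    for item in items:
        for asset in item.assets.values():
            kwargs = asset.extra_fields.get("xarray:open_kwargs")
            if kwargs is not None:
                yield asset.href, kwargs


def open_items(items, **combine_kwargs):
    """Open every Zarr asset of every item and combine them into one Dataset."""
    # Imported here so iter_open_args can be used without xarray installed.
    import xarray as xr

    datasets = [
        xr.open_dataset(href, **kwargs)
        for href, kwargs in iter_open_args(items)
    ]
    # Extra keyword arguments (e.g. join="exact") go to combine_by_coords.
    return xr.combine_by_coords(datasets, **combine_kwargs)
```

With a signed item collection this would be called as ds = open_items(items, join="exact"). Because the open kwargs include chunks={}, the result should stay lazy until you compute on it.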

But this feels like a “not very stac like” undocumented means of access that you have to kinda know about?

@jsignell’s xpystac library helps with this, I think. It’ll be something like xr.open_dataset(stac_asset), and it’ll take care of inferring the right engine (based on the media type in the STAC metadata). As for the storage_options, it’s kind of unavoidable. adlfs needs an account name, and this is a private storage container so you need a credential. That either goes in the STAC metadata (like account_name), gets injected by another library (like planetary_computer.sign injecting credential), or is specified by the user.

So I wouldn’t really call it “magic”. It’s simply the keyword arguments you would need to manually supply to read the data.
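To illustrate, here is a sketch of building the same keyword arguments by hand, mirroring the dictionary quoted earlier in the thread. The function names manual_open_kwargs and open_manually are made up for this example, and the account name and credential are whatever your container requires.

```python
def manual_open_kwargs(account_name, credential):
    """Build by hand the kwargs the STAC asset stores under xarray:open_kwargs."""
    return {
        # chunks={} asks xarray for lazy, dask-backed arrays using the
        # store's own chunking (requires dask to be installed).
        "chunks": {},
        "engine": "zarr",
        "consolidated": True,
        # Passed through to the filesystem layer (adlfs for Azure storage).
        "storage_options": {
            "account_name": account_name,
            "credential": credential,
        },
    }


def open_manually(href, account_name, credential):
    # Imported here so manual_open_kwargs has no dependencies.
    import xarray as xr

    return xr.open_dataset(href, **manual_open_kwargs(account_name, credential))
```

The chunks={} entry is also why the notebook gets dask arrays back without asking for them explicitly.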

  • Does stac_load not support zarr?

Is this from odc.stac? I don’t think it supports zarr.

  • Any recommended reading to learn more about this open_dataset access pattern that makes use of this extra_fields parameter that’s stored in data?

The STAC extension is stac-extensions/xarray-assets on GitHub. It helps users open STAC assets with xarray, and gives catalog maintainers a place to specify various required or recommended options.


And just a general FYI about Planetary Computer, that dataset is currently not updating. I have a task item to get it moved to a new pipeline, but ran into some issues and haven’t gotten back to it.


Super helpful reply, thanks @TomAugspurger ! I found the stac-extensions link particularly helpful.

I’d noticed the dataset only seems to run up to the end of 2020. I thought I’d seen another source suggest it only ran up to 2020 (despite ERA5 supposedly being up to present day). Is there a timescale for fixing this?

Does anyone know of ERA5 hosted on a STAC API anywhere else?
(Also I’ll take recommendations for such sources of weather/climate data for UK, with emphasis on precipitation, but also insolation, wind, and temperature).