Reading ERA5 data on Planetary Computer

I’ve started reading ERA5 data on the Planetary Computer. I first tried using `stac_load`, as I’ve done with other collection items, but kept getting an empty dataset back. I then followed the example notebook explicitly and did replicate its output. I note it uses the now-deprecated `.get_all_items()` method, and I like to use updated API calls where possible, but the updated method returns a generator, so you can’t access it with `item = items[0]`, and I wanted to repeat the example notebook as faithfully as possible in the first instance. Anyway, I did get it to work.

The example picks the first item with item = items[0] and then iterates through the assets to then combine them into a Dataset of multiple variables. This feels kinda clunky. Is this really the best approach?

Noting that this was just for one month/item, for multiple months would you have to iterate over each item and then each asset and then .combine_by_coords?

The `**asset.extra_fields["xarray:open_kwargs"]` parameter seems crucial to success. Looking at the item, I can see this attribute for each asset thus:

```python
{'xarray:open_kwargs': {'chunks': {},
                        'engine': 'zarr',
                        'consolidated': True,
                        'storage_options': {'account_name': 'cpdataeuwest',
                                            'credential': 'blahblahblah'}}}
```

I did wonder how it knew to automatically return dask arrays. But this feels like a “not very stac like” undocumented means of access that you have to kinda know about?

Aside from any specific questions above, I’m left with general musings such as:

  • Does the example use xarray’s open_dataset specifically because of the presence of this magic field?
  • Is this magic field there just to support direct open_dataset access?
  • Does stac_load not support zarr?
  • Can the data be loaded using stac_load, or is it “Nope, you’ve got to use xarray’s open_dataset directly”?
  • Any recommended reading to learn more about this open_dataset access pattern that makes use of this extra_fields parameter that’s stored in data?

Pinging @TomAugspurger (sorry :wink: ) for much valued MSPC viewpoint!

Huh, I really thought I had updated all those notebooks, but apparently not. The equivalent method call would be either `next(search.items())` or `search.item_collection()[0]`. I’ll get to that sooner or later.
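To illustrate why the switch matters, here’s a minimal stand-in (not real pystac-client objects; the class and item names are made up) showing why positional indexing fails on a generator and what the two replacement patterns look like:

```python
# Hypothetical stand-in for a pystac-client ItemSearch, purely to illustrate
# the two access patterns; this is not the real pystac-client API surface.
class FakeSearch:
    def items(self):
        # items() yields lazily, generator-style
        yield from ["era5-2020-01", "era5-2020-02"]

    def item_collection(self):
        # item_collection() materializes everything up front
        return list(self.items())


search = FakeSearch()
first_via_next = next(search.items())          # works on a generator
first_via_index = search.item_collection()[0]  # works on a materialized collection

try:
    search.items()[0]                          # generators don't support indexing
except TypeError:
    indexing_failed = True
```

Either replacement gives you the first item; `next(search.items())` avoids fetching every page of results just to look at one.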

The example picks the first item with item = items[0] and then iterates through the assets to then combine them into a Dataset of multiple variables. This feels kinda clunky. Is this really the best approach?

For now, yes. But see stac-utils/pystac#846 (“Convenience methods for converting STAC objects / linked assets to data containers”) and the linked issues / repos.
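To make the pattern concrete, here’s a sketch with synthetic single-variable datasets standing in for the per-asset Zarr opens. The `fake_asset_open` helper is hypothetical; in the real notebook that slot is filled by `xr.open_dataset(asset.href, **asset.extra_fields["xarray:open_kwargs"])`:

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for opening one asset: each ERA5 asset is a
# single-variable Zarr dataset sharing the same coordinates.
def fake_asset_open(varname):
    return xr.Dataset(
        {varname: ("time", np.zeros(3))},
        coords={"time": np.arange(3)},
    )

# One dataset per asset, then combine them into one multi-variable Dataset.
datasets = [fake_asset_open(v) for v in ["t2m", "tp"]]
ds = xr.combine_by_coords(datasets, join="exact")
```

`join="exact"` makes the combine fail loudly if the assets’ coordinates don’t actually line up, rather than silently aligning them.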

Noting that this was just for one month/item, for multiple months would you have to iterate over each item and then each asset and then .combine_by_coords?

I think so, yes. These are separate Zarr datasets, and I think that’s the preferred way to combine them.
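A sketch of that combination step with synthetic monthly datasets (in practice each would come from opening one item’s assets; `fake_month` is a made-up helper):

```python
import numpy as np
import xarray as xr

# Synthetic stand-ins for two consecutive monthly items: same variable,
# adjacent, non-overlapping time coordinates.
def fake_month(start_hour):
    time = np.arange(start_hour, start_hour + 3)
    return xr.Dataset(
        {"t2m": ("time", np.full(3, 280.0))},
        coords={"time": time},
    )

monthly = [fake_month(0), fake_month(3)]

# combine_by_coords concatenates along the monotonic "time" coordinate.
combined = xr.combine_by_coords(monthly)
```

Because the time coordinates are monotonic and non-overlapping, `combine_by_coords` infers the concatenation order from the coordinate values themselves.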

But this feels like a “not very stac like” undocumented means of access that you have to kinda know about?

@jsignell’s xpystac library helps with this, I think. It’ll be something like `xr.open_dataset(stac_asset)`, and it’ll take care of inferring the right engine (based on the media type in the STAC metadata). As for the `storage_options`, it’s kind of unavoidable: adlfs needs an account name, and this is a private storage container, so you need a credential. That either goes in the STAC metadata (like `account_name`), gets injected by another library (like `planetary_computer.sign` injecting `credential`), or is specified by the user.

So I wouldn’t really call it “magic”. It’s simply the keyword arguments you would need to manually supply to read the data.
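Concretely, using the sample `extra_fields` value quoted earlier (credential shortened to a placeholder):

```python
# The asset's extra_fields simply carry the keyword arguments you'd
# otherwise write out by hand. The credential value is a placeholder.
extra_fields = {
    "xarray:open_kwargs": {
        "chunks": {},  # an empty chunks dict is what triggers lazy dask arrays
        "engine": "zarr",
        "consolidated": True,
        "storage_options": {
            "account_name": "cpdataeuwest",
            "credential": "blahblahblah",
        },
    }
}

open_kwargs = extra_fields["xarray:open_kwargs"]
# The manual equivalent of the notebook's call would then be:
#   xr.open_dataset(asset.href, **open_kwargs)
```

The `chunks={}` entry also answers the earlier question about dask: passing `chunks` to `xr.open_dataset` is what makes it return dask-backed arrays instead of eagerly loading everything.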

  • Does stac_load not support zarr?

Is this from odc.stac? I don’t think it supports Zarr.

  • Any recommended reading to learn more about this open_dataset access pattern that makes use of this extra_fields parameter that’s stored in data?

The STAC extension is stac-extensions/xarray-assets on GitHub: it helps users open STAC Assets with xarray, and gives catalog maintainers a place to specify various required or recommended options.


And just a general FYI about Planetary Computer, that dataset is currently not updating. I have a task item to get it moved to a new pipeline, but ran into some issues and haven’t gotten back to it.


Super helpful reply, thanks @TomAugspurger ! I found the stac-extensions link particularly helpful.

I’d noticed the dataset only seems to run up to the end of 2020. I thought I’d seen another source suggest it only ran up to 2020 (despite ERA5 supposedly being up to present day). Is there a timescale for fixing this?

Does anyone know of ERA5 hosted on a STAC API anywhere else?
(Also I’ll take recommendations for such sources of weather/climate data for UK, with emphasis on precipitation, but also insolation, wind, and temperature).

Just here to ask the same question: what is it with MPC only going up to 2020? It’s the same with GHRSST.

As I mentioned, I have a task to update the data pipeline but (still) haven’t gotten to it. There were some complications with small floating-point differences from what we already have that I haven’t tracked down.

Gosh, I missed that, apologies. There’s so much celebration about this being the ultimate way to do things; I’m sorry to hear it really comes down to one person with a todo list for years-out-of-date standard products. I hope there is support coming? Is Microsoft actually contributing to this in a real way?

I have a pipeline for reprocessing GHRSST into COG and, with help from a cloud expert, we are exploring pushing that up to source.coop. To me that’s a better solution than Zarr (which doesn’t have overviews, afaict), so I wonder if we should discuss the issues a bit more directly. I was a bit surprised to find ERA5 also not up to date, and I had diverted to accessing it on our HPC system rather than MPC.

(edit: oh, I see you work for Microsoft; OK, I need to get across the issues here more)