FSTimeoutError with various notebooks using NOAA grib2 data

Hi,

I am looking for a kerchunk example with grib2 files, but I keep getting FSTimeoutError when running notebooks that point to NOAA grib2 data on S3. The latest one I tried is the GEFS Kerchunk gist on GitHub. In that particular instance, I get this sequence:

The link shown in the AioReadTimeoutError downloads just fine when I click on it. Any hints?

TIA,

Yves

Apologies if this is not helpful, but I suspect fsspec may be to blame here; I've often hit similar timeouts with it. Since these URLs are public anyway, we have had better luck accessing them directly over HTTP, and/or using the GDAL virtual filesystem interface (either over S3 or HTTP).
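For what it's worth, here is a minimal sketch of both approaches; the bucket and object key are placeholders rather than a specific file I've verified:

```python
import fsspec
from osgeo import gdal

# Placeholder GEFS object on the public NOAA bucket; substitute a real key
url = ("https://noaa-gefs-pds.s3.amazonaws.com/"
       "gefs.20221201/00/atmos/pgrb2ap5/gep01.t00z.pgrb2a.0p50.f000")

# Plain HTTP via fsspec, sidestepping the S3 protocol entirely
with fsspec.open(url, "rb") as f:
    first_bytes = f.read(1024)

# The same file through GDAL's virtual filesystem layer
ds = gdal.Open(f"/vsicurl/{url}")
```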


Thank you! Here's a change that worked this morning:

< fs_s3 = fsspec.filesystem('s3', anon=True)

> from s3fs import S3FileSystem
> fs_s3 = S3FileSystem()

No idea whether this is just down to today being another day, but I could run the GEFS notebook fine.
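For the record, s3fs can also forward timeout settings to botocore, so raising the read timeout might be another workaround; this is an untested sketch rather than something I've confirmed:

```python
from s3fs import S3FileSystem

# Longer read timeout and more retries for a flaky connection;
# config_kwargs is passed through to botocore's Config
fs_s3 = S3FileSystem(
    anon=True,
    config_kwargs={"read_timeout": 120, "retries": {"max_attempts": 10}},
)
```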

There was an intermittent problem with kerchunked grib files that was fixed in kerchunk=0.1.1, released about 10 days ago on conda-forge.

Here's a notebook for GEFS that combines along the forecast dates (as well as along the dimensions in @Peter_Marsh's notebook) and parallelizes the various combine steps: https://nbviewer.org/gist/rsignell-usgs/ce2c9faeeb006bbd189a8818ffadb133
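The heart of the combine step looks roughly like this; the object keys, GRIB filter, and dimension names below are illustrative rather than copied from the notebook:

```python
import xarray as xr
from kerchunk.grib2 import scan_grib
from kerchunk.combine import MultiZarrToZarr

so = {"anon": True}  # anonymous access to the public NOAA bucket

# Illustrative GEFS keys; in practice these come from listing the bucket
urls = [
    "s3://noaa-gefs-pds/gefs.20221201/00/atmos/pgrb2ap5/gep01.t00z.pgrb2a.0p50.f000",
    "s3://noaa-gefs-pds/gefs.20221201/00/atmos/pgrb2ap5/gep01.t00z.pgrb2a.0p50.f003",
]

# Keep only 2 m fields so each file yields a consistent set of messages;
# scan_grib returns one reference set per GRIB message in each file
refs = [
    r
    for u in urls
    for r in scan_grib(
        u,
        storage_options=so,
        filter={"typeOfLevel": "heightAboveGround", "level": 2},
    )
]

# Combine the per-message references along the forecast time axis
mzz = MultiZarrToZarr(
    refs,
    concat_dims=["valid_time"],
    identical_dims=["latitude", "longitude"],
    remote_protocol="s3",
    remote_options=so,
)
combined = mzz.translate()

# Open the combined references as a single lazy dataset
ds = xr.open_dataset(
    "reference://",
    engine="zarr",
    backend_kwargs={
        "consolidated": False,
        "storage_options": {
            "fo": combined,
            "remote_protocol": "s3",
            "remote_options": so,
        },
    },
)
```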

Here is a snapshot of the result (and a graphic just to show it's working).

That's a pretty hefty notebook, @rsignell. Thank you!

I updated my conda environment packages (I already had kerchunk v0.1.1) and your notebook ran, but I still have a couple of issues with Dask (which I'll have to get better acquainted with first). It probably has a lot to do with my laptop being 10 years old with 8 GB RAM. I could see tons of messages about memory use that hint at my machine not being ready for this. Or I have to play with the number of workers.
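For anyone else on modest hardware, explicitly sizing the local cluster seems to be the knob to turn; the numbers here are just a guess for an 8 GB machine, not a recommendation:

```python
from dask.distributed import Client, LocalCluster

# Few workers and small per-worker limits to keep total memory in check
cluster = LocalCluster(n_workers=2, threads_per_worker=1, memory_limit="2GB")
client = Client(cluster)
```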

What I'm really after is to use kerchunk as part of a workflow that requires files to be generated in grib2 format, so I figured generating kerchunk "indexes" at the same time would be an idea worth exploring. We're talking HPC, so tons of cores. Does that sound like a reasonable idea?
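Concretely, I'm picturing something like the sketch below running right after each GRIB2 file is written; the sidecar naming scheme is made up:

```python
import json
from kerchunk.grib2 import scan_grib

def write_kerchunk_sidecar(grib_path):
    """Write one JSON reference file per GRIB message, next to the grib2 file."""
    # scan_grib returns a list of reference dicts, one per message
    for i, refs in enumerate(scan_grib(grib_path)):
        with open(f"{grib_path}.msg{i:03d}.json", "w") as f:
            json.dump(refs, f)
```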

Thanks again. Your notebook is going to be reference material for me.

P.S. I always stumble when installing cartopy in a conda environment. What a PITA 🙂