FSTimeoutError with various notebooks using NOAA grib2 data

Hi,

I am looking for a kerchunk example with grib2 files, but I keep getting FSTimeoutError when running notebooks that point to NOAA grib2 data on S3. The latest one I tried is the GEFS Kerchunk gist on GitHub. In that particular instance, I get this sequence:

The link shown in the AioReadTimeoutError downloads just fine when I click on it. Any hints?

TIA,

Yves

Apologies if this is not helpful, but I suspect fsspec may be to blame here; I've often hit similar timeouts with it. Since these URLs are public anyway, we have had better luck accessing them directly over HTTP, and/or using the GDAL virtual filesystem interface (either over S3 or HTTP).
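For what it's worth, here is a minimal sketch of both approaches; the bucket and object key are placeholders rather than a specific file I've verified:

```python
import fsspec
from osgeo import gdal

# Placeholder GEFS object on the public NOAA bucket; substitute a real key
url = ("https://noaa-gefs-pds.s3.amazonaws.com/"
       "gefs.20221201/00/atmos/pgrb2ap5/gep01.t00z.pgrb2a.0p50.f000")

# Plain HTTP via fsspec, sidestepping the S3 protocol entirely
with fsspec.open(url, "rb") as f:
    first_bytes = f.read(1024)

# The same file through GDAL's virtual filesystem layer
ds = gdal.Open(f"/vsicurl/{url}")
```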


Thank you! Here's a change that worked this morning:

< fs_s3 = fsspec.filesystem('s3', anon=True)

> from s3fs import S3FileSystem
> fs_s3 = S3FileSystem()

No idea whether this is just down to today being another day, but I could run the GEFS notebook fine.
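For the record, s3fs can also forward timeout settings to botocore, so raising the read timeout might be another workaround; this is an untested sketch rather than something I've confirmed:

```python
from s3fs import S3FileSystem

# Longer read timeout and more retries for a flaky connection;
# config_kwargs is passed through to botocore's Config
fs_s3 = S3FileSystem(
    anon=True,
    config_kwargs={"read_timeout": 120, "retries": {"max_attempts": 10}},
)
```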

There was an intermittent problem with kerchunked grib files that was fixed in kerchunk=0.1.1, released about 10 days ago on conda-forge.

Here's a notebook for GEFS that combines along the forecast dates (as well as along the dimensions in @Peter_Marsh's notebook) and parallelizes the various combine steps: https://nbviewer.org/gist/rsignell-usgs/ce2c9faeeb006bbd189a8818ffadb133
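The heart of the combine step looks roughly like this; the object keys, GRIB filter, and dimension names below are illustrative rather than copied from the notebook:

```python
import xarray as xr
from kerchunk.grib2 import scan_grib
from kerchunk.combine import MultiZarrToZarr

so = {"anon": True}  # anonymous access to the public NOAA bucket

# Illustrative GEFS keys; in practice these come from listing the bucket
urls = [
    "s3://noaa-gefs-pds/gefs.20221201/00/atmos/pgrb2ap5/gep01.t00z.pgrb2a.0p50.f000",
    "s3://noaa-gefs-pds/gefs.20221201/00/atmos/pgrb2ap5/gep01.t00z.pgrb2a.0p50.f003",
]

# Keep only 2 m fields so each file yields a consistent set of messages;
# scan_grib returns one reference set per GRIB message in each file
refs = [
    r
    for u in urls
    for r in scan_grib(
        u,
        storage_options=so,
        filter={"typeOfLevel": "heightAboveGround", "level": 2},
    )
]

# Combine the per-message references along the forecast time axis
mzz = MultiZarrToZarr(
    refs,
    concat_dims=["valid_time"],
    identical_dims=["latitude", "longitude"],
    remote_protocol="s3",
    remote_options=so,
)
combined = mzz.translate()

# Open the combined references as a single lazy dataset
ds = xr.open_dataset(
    "reference://",
    engine="zarr",
    backend_kwargs={
        "consolidated": False,
        "storage_options": {
            "fo": combined,
            "remote_protocol": "s3",
            "remote_options": so,
        },
    },
)
```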

Here is a snapshot of the result (and a graphic just to show it's working).

That's a pretty hefty notebook, @rsignell. Thank you!

I updated my conda environment packages (I already had kerchunk v0.1.1) and your notebook ran, but I still have a couple of issues with Dask (which I'll have to get better acquainted with first). It probably has a lot to do with my laptop being 10 years old with 8 GB RAM. I could see tons of messages about memory use that hint at my machine not being ready for this. Or I have to play with the number of workers.
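For anyone else on modest hardware, explicitly sizing the local cluster seems to be the knob to turn; the numbers here are just a guess for an 8 GB machine, not a recommendation:

```python
from dask.distributed import Client, LocalCluster

# Few workers and small per-worker limits to keep total memory in check
cluster = LocalCluster(n_workers=2, threads_per_worker=1, memory_limit="2GB")
client = Client(cluster)
```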

What I'm really after is to use kerchunk as part of a workflow that requires files to be generated in grib2 format, so I figured generating kerchunk "indexes" at the same time would be an idea worth exploring. We're talking HPC, so tons of cores. Does that sound like a reasonable idea?
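Concretely, I'm picturing something like the sketch below running right after each GRIB2 file is written; the sidecar naming scheme is made up:

```python
import json
from kerchunk.grib2 import scan_grib

def write_kerchunk_sidecar(grib_path):
    """Write one JSON reference file per GRIB message, next to the grib2 file."""
    # scan_grib returns a list of reference dicts, one per message
    for i, refs in enumerate(scan_grib(grib_path)):
        with open(f"{grib_path}.msg{i:03d}.json", "w") as f:
            json.dump(refs, f)
```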

Thanks again. Your notebook is going to be reference material for me.

P.S. I always stumble when installing cartopy in a conda environment. What a PITA 🙂