Hi,
One issue with xarray's open_mfdataset function is that it is slow when loading a large number of GRIB files. Files can also contain additional GRIB messages that are not required. I'm aware that I could use kerchunk to scan the GRIB files and create a JSON representation of a Zarr array, allowing one to combine multiple files/messages with filtering. However, I already have a large number of GRIB files scanned with the following information:
- file
- message_start_bytes
- message_length_bytes
along with additional attributes that allow me to filter by variable etc.
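For context, turning such an index into the references dict that ReferenceFileSystem accepts is mechanical; a minimal sketch (the row/field names here are my assumptions based on the list above, and the key naming is arbitrary):

```python
# Sketch: convert a scanned GRIB index into an fsspec-style references dict.
# Each value is [path, offset_bytes, length_bytes], the shape that
# ReferenceFileSystem expects for byte-range references.
index = [
    {"file": "file1.grib2", "message_start_bytes": 0, "message_length_bytes": 1000},
    {"file": "file2.grib2", "message_start_bytes": 1001, "message_length_bytes": 1000},
]

references = {
    f"msg{i}": [row["file"], row["message_start_bytes"], row["message_length_bytes"]]
    for i, row in enumerate(index)
}
# references["msg0"] == ["file1.grib2", 0, 1000]
```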
My thinking is that this information should be sufficient to create an fsspec ReferenceFileSystem, so that I have one logical file with many GRIB messages instead of many files. Here is a simple example for a local file system that fails:
import xarray as xr
from fsspec.implementations.reference import ReferenceFileSystem

# Each reference maps a key to [path, offset_bytes, length_bytes].
fs = ReferenceFileSystem(
    {
        "key1": ["file1.grib2", 0, 1000],
        "key2": ["file2.grib2", 1001, 1000],
    }
)
m = fs.get_mapper("")
ds = xr.open_dataset(m, engine="cfgrib", backend_kwargs={"indexpath": ""})
Any thoughts on what I might be doing wrong here are welcome. Alternatively, please do let me know if you think there’s a better approach to combining the filename and grib message byte range information into a single file.
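For what it's worth, the (file, offset, length) triples can also be consumed without fsspec at all: each message can be sliced out of its source file with a plain seek/read, and the resulting bytes handed to a decoder. A stdlib-only sketch (the dummy file here just stands in for a real GRIB file):

```python
import os
import tempfile

def read_message(path, start, length):
    """Read one GRIB message's raw bytes given (file, offset, length)."""
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(length)

# Demonstration with a dummy file standing in for a GRIB file:
# two 1000-byte "messages" back to back.
with tempfile.NamedTemporaryFile(delete=False, suffix=".grib2") as tmp:
    tmp.write(b"A" * 1000 + b"B" * 1000)
    path = tmp.name

msg = read_message(path, 1000, 1000)  # msg == b"B" * 1000
os.remove(path)
```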