I successfully loaded the one file following your suggestion. Thanks!
I am now using the GCS deployment, which has `fsspec.__version__` `'2021.11.1'`, but it didn't quite work.
TL;DR: the JSON file is not created correctly.
```python
import fsspec
import ujson
import xarray as xr

rpath = 'jsonfiles1/acpcp_sfc_2000010100_c00.json'
s_opts = {'requester_pays': True, 'skip_instance_cache': True}  # options for reading the reference JSON
r_opts = {'anon': True}  # options for the remote GRIB2 files on S3

# load the reference set written by scan_grib
with fsspec.open(rpath) as f:
    references = ujson.loads(f.read())

ds = xr.open_dataset(
    "reference://", engine="zarr",
    backend_kwargs={
        "consolidated": False,
        "storage_options": dict(
            fo=references, ref_storage_args=s_opts, remote_protocol="s3",
            remote_options=r_opts, skip_instance_cache=True,
        ),
    },
)
ds
```
But the dataset is wrong because, I believe, the JSON file is wrong.
Note how `step`, `time`, and `valid_time` have become data variables rather than coordinates. More importantly, only one value was loaded along the `step` dimension, where there should be 80.
This is how the actual file looks if I download it:
```python
# `!` is the IPython/Jupyter shell escape
!wget https://noaa-gefs-retrospective.s3.amazonaws.com/GEFSv12/reforecast/2000/2000010100/c00/Days:1-10/acpcp_sfc_2000010100_c00.grib2
ds1 = xr.open_dataset('acpcp_sfc_2000010100_c00.grib2', engine='cfgrib')
```
When I create the JSON files, I pass `step` and the other coordinate variables in `common_vars`, like this:
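```python
import json
from fsspec_reference_maker.grib2 import scan_grib  # later releases moved this to kerchunk.grib2

so = {"anon": True, "default_cache_type": "readahead"}
# files: list of GRIB2 URLs on S3, defined earlier
out = scan_grib(files[0], common_vars=['time', 'step', 'latitude', 'longitude', 'valid_time'], storage_options=so)
outfname = 'jsonfiles1/' + files[0].split('/')[-1][:-6] + '.json'  # drop ".grib2", add ".json"
with open(outfname, "w") as f:
    f.write(json.dumps(out))
```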
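Since I create one JSON per file, this is roughly how the snippet above generalizes to a loop. It is only a sketch: `json_name_for` and `write_references` are hypothetical helper names, the scanning function is passed in so it can be `scan_grib` (called with the same `common_vars`/`storage_options` arguments as above), and the output name is derived with `posixpath` instead of the `[:-6]` slice so it doesn't depend on the extension length.

```python
import json
import posixpath


def json_name_for(url, outdir="jsonfiles1"):
    """Map a GRIB2 URL to a local JSON path, e.g. .../x.grib2 -> jsonfiles1/x.json."""
    stem, _ext = posixpath.splitext(posixpath.basename(url))
    return posixpath.join(outdir, stem + ".json")


def write_references(files, scan, storage_options=None, outdir="jsonfiles1"):
    """Scan every file with `scan` (e.g. scan_grib) and write one JSON per file."""
    written = []
    for url in files:
        refs = scan(
            url,
            common_vars=["time", "step", "latitude", "longitude", "valid_time"],
            storage_options=storage_options,
        )
        outfname = json_name_for(url, outdir)
        with open(outfname, "w") as f:
            json.dump(refs, f)
        written.append(outfname)
    return written
```

Usage would be something like `write_references(files, scan_grib, storage_options=so)`; `outdir` must already exist.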
However, when I inspect the JSON file, `chunks` and `_ARRAY_DIMENSIONS` under `refs` are indeed empty for `time` and `step` (see the blue squares), unlike for `latitude` (see the green squares).
`step/0` and other relevant fields also look empty.
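To check this without scrolling through the raw JSON, a small sketch like the following collects `shape`, `chunks` and `_ARRAY_DIMENSIONS` for each coordinate. It assumes the version-1 layout (`{"version": 1, "refs": {...}}`) where the `.zarray`/`.zattrs` values are JSON-encoded strings; `describe_coords` is a hypothetical helper. An empty `shape` would mean zarr/xarray sees a 0-d scalar, which would be consistent with only one `step` value being loaded.

```python
import json


def _load_meta(value):
    # metadata entries are usually JSON strings; accept already-parsed dicts too
    if value is None:
        return {}
    return json.loads(value) if isinstance(value, str) else value


def describe_coords(references, names=("time", "step", "valid_time", "latitude", "longitude")):
    """Return shape, chunks and dimension names for each coordinate in a reference set."""
    refs = references.get("refs", references)  # version-1 files nest entries under "refs"
    summary = {}
    for name in names:
        zarray = _load_meta(refs.get(f"{name}/.zarray"))
        zattrs = _load_meta(refs.get(f"{name}/.zattrs"))
        summary[name] = {
            "shape": zarray.get("shape"),
            "chunks": zarray.get("chunks"),
            "dims": zattrs.get("_ARRAY_DIMENSIONS"),
        }
    return summary


# usage, with `references` loaded via ujson as in the first snippet:
# for name, info in describe_coords(references).items():
#     print(name, info)
```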
Is this probably related to how this GRIB file is structured?
In the example linked above (the HRRR files) I didn't notice this, because those files are concatenated along another dimension and each one seems to have a single `step` value, so the problem never shows up.
In fact, I downloaded one HRRR file, and when I loaded it with cfgrib the `step` variable had only one entry, which is very common for real-time GRIB files.
I thought it was worth mentioning, FYI, and I'd like to know whether there is a way to create the JSON correctly.
I tried to hack the JSON by hand, but I am stuck on the `step/0` entry, which looks like this:
```
"step/0": "\u0000\u0000\u0000\u0000\u0000\u0000\b@"
```
whereas for `latitude` it is a reference like:
```
"latitude/0": ["{{u}}", 0, 325317],
```
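As far as I can tell, `latitude/0` is a `[url, offset, length]` reference into the GRIB file, while `step/0` is the chunk's raw bytes inlined into the JSON. A minimal sketch to decode it, assuming that `step` is stored as a little-endian float64 (`<f8`) and that plain-text inline chunks are turned back into bytes with UTF-8 (what I believe fsspec's reference filesystem does, with a `base64:` prefix for binary data); both helper names are hypothetical:

```python
import base64
import struct


def inline_chunk_bytes(value):
    """Turn an inlined chunk string from a reference JSON back into bytes."""
    if value.startswith("base64:"):
        return base64.b64decode(value[len("base64:"):])
    return value.encode()  # plain-text chunk: UTF-8 bytes


def decode_inline_scalar(value, fmt="<d"):
    """Decode a one-element chunk; fmt="<d" ASSUMES a little-endian float64 ("<f8")."""
    (number,) = struct.unpack(fmt, inline_chunk_bytes(value))
    return number


step0 = "\u0000\u0000\u0000\u0000\u0000\u0000\b@"  # the "step/0" value from my JSON
print(decode_inline_scalar(step0))  # 3.0
```

If the dtype assumption holds, `step/0` contains a single value (3.0, presumably hours), so the reference set describes only one step. With 3-hourly output over days 1 to 10, I'd expect 240 / 3 = 80 values.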
I should add that real-time forecast products are usually organized like HRRR, i.e. one step per GRIB file. These are reforecast products (retrospective runs, used to create a reanalysis for calibrating real-time forecasts), which are always a different beast, and their files can be ad hoc. So scan_grib probably doesn't know what to do with the way the `step` variable is laid out in this GRIB file.
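One way to see how this file lays out `step` without cfgrib is to count its GRIB2 messages. This sketch relies on the GRIB2 Section 0 layout ("GRIB" magic, edition number in octet 8, total message length in octets 9 to 16, big-endian) and reads the whole file into memory; `count_grib2_messages` is a hypothetical helper. For a single-variable, single-level file like this one I would expect one message per step (so 80 if it matches what cfgrib shows), whereas a real-time file covers a single forecast time.

```python
def count_grib2_messages(data):
    """Count GRIB2 messages in `data` (bytes) by walking the Section 0 length fields."""
    count = 0
    pos = data.find(b"GRIB")
    while pos != -1 and pos + 16 <= len(data):
        edition = data[pos + 7]  # octet 8: GRIB edition number
        if edition != 2:
            raise ValueError(f"offset {pos}: expected GRIB edition 2, got {edition}")
        length = int.from_bytes(data[pos + 8:pos + 16], "big")  # octets 9-16: total length
        if length < 16:
            raise ValueError(f"offset {pos}: implausible message length {length}")
        count += 1
        pos = data.find(b"GRIB", pos + length)  # jump past this message to the next one
    return count


# usage with the downloaded file:
# with open("acpcp_sfc_2000010100_c00.grib2", "rb") as f:
#     print(count_grib2_messages(f.read()))
```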
Thanks for your time!


