Hi @kdl0013 thanks for the reminder about climpred! I had forgotten about that project and it was interesting to explore what’s happened since I last checked it out.
Climpred has made things work by reshaping forecasts into `init` (`forecast_reference_time`, the time the forecast was ‘made’) and `lead` (integer offset with units encoded as an attribute) dimensions so that they can be aggregated into a single dataset.
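To make that reshaping concrete, here is a rough sketch in plain xarray (not climpred's own code). The helper names `make_run` and `to_init_lead`, the `temp` variable, and the hourly/6-hourly spacing are all made up for illustration; it assumes each model run is a dataset with a valid-time `time` dimension.

```python
import numpy as np
import pandas as pd
import xarray as xr


def make_run(init: pd.Timestamp, n_steps: int = 4) -> xr.Dataset:
    """Build a toy model run with hourly valid times (hypothetical data)."""
    times = pd.date_range(init, periods=n_steps, freq=pd.Timedelta(hours=1))
    return xr.Dataset(
        {"temp": ("time", np.arange(n_steps, dtype=float))},
        coords={"time": times},
    )


def to_init_lead(run: xr.Dataset, init: pd.Timestamp) -> xr.Dataset:
    """Swap the valid-time axis for an integer `lead` and add an `init` dim."""
    # Lead = valid time minus reference time, expressed as whole hours.
    delta = run["time"].values - init.to_datetime64()
    lead_hours = (delta / np.timedelta64(1, "h")).astype(int)
    out = run.assign_coords(time=lead_hours).rename({"time": "lead"})
    # Units live in an attribute, as described above.
    out["lead"].attrs["units"] = "hours"
    return out.expand_dims(init=[init])


# Three runs, six hours apart, stacked along `init`.
inits = pd.date_range("2023-01-01", periods=3, freq=pd.Timedelta(hours=6))
combined = xr.concat([to_init_lead(make_run(i), i) for i in inits], dim="init")
print(combined["temp"].dims)
```

This only lines up neatly because every toy run has the same lead steps; real runs with differing lengths would end up padded with NaN along `lead`.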
So far, I’ve been largely approaching this problem with my NERACOOS data manager hat on. Most of our users (fishermen, Coast Guard, local communities) really just want the best available data and aren’t worried about individual model runs, but I’ve also got a handful of scientists who are interested in those. We are currently running ERDDAP and THREDDS servers, but don’t have full THREDDS FMRC serving set up (instead I’ve got something manually hacked together with separate datasets for each model run).
For us (and I’m guessing a lot of other orgs), it probably doesn’t make much sense to change our model storage to be `init` x `lead` shaped, given our normal usage patterns.
Since there is enough information to get `init` x `lead` (even with some changes I’m pondering below), I do think it would be reasonable to provide a method that could return a tree as a climpred-compatible dataset.
Right now I’m capturing `forecast_reference_time` (or `init`) as a dimension and `forecast_period` (or `lead`) as a non-dimension coordinate (`forecast_offset`) by adding them to each dataset within the tree. `forecast_reference_time` is also captured in the path within the tree (`model_run/{forecast_reference_time}`). I’m also making a dataset at the root with that info.
Looking at how Kerchunk wants to do aggregations, it probably makes the most sense to avoid doing any reshaping of datasets with extra data. `xarray-fmrc` could depend on just the `forecast_reference_time` in the path and derive all the related info as needed for the reshaping it’s doing (well, as long as there is a reasonable time dimension, but I think even that is manageable). That way Kerchunk can refer to existing model data as is and we can still access them in the various FMRC-ish and climpred ways.