Hi all, I was wondering if anyone has advice on passing file-specific arguments to the preprocess function of xarray's xr.open_mfdataset. I know you can use functools.partial() to pass arguments to the preprocess function, but as far as I can tell those arguments are the same for every file rather than specific to each one.
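To make that concrete, here is a minimal sketch of the functools.partial pattern I mean. The function name add_label, its label argument, and the glob pattern are made up for illustration; the point is only that the bound value is identical for every file.

```python
from functools import partial

import xarray as xr


def add_label(ds, label):
    """Hypothetical preprocess step: tag a dataset with a fixed attribute."""
    return ds.assign_attrs(label=label)


# partial() binds label once, so every file passed through
# preprocess receives exactly the same value.
preprocess = partial(add_label, label="my_ensemble")

# Illustrative call only; the glob pattern is a placeholder.
# ds = xr.open_mfdataset("data/*.nc", preprocess=preprocess)
```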
The workflow I often run into with model data is that I want to do ensemble analysis, which usually means creating a new ensemble dimension so I can take advantage of xarray's functions (such as weighted averages). With nice clean CMIP data you can inspect the global NetCDF metadata for CMOR tags like source_id, variant_label, and experiment_id and easily write a preprocess function that does a ds.expand_dims(). However, the data I work with often isn’t clean and doesn’t include global attributes that are useful for building an ensemble dimension.
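For the clean CMIP case, the preprocess function could look roughly like this. It is only a sketch: the dimension name ensemble, the label format, and the file pattern are my own choices, and the commented call assumes one file per ensemble member.

```python
import xarray as xr


def add_ensemble_dim(ds):
    """Build a member label from CMOR global attributes and add it as a new dimension."""
    member = f"{ds.attrs['source_id']}_{ds.attrs['variant_label']}"
    # Passing a list creates a length-1 dimension with a labelled coordinate.
    return ds.expand_dims(ensemble=[member])


# Illustrative call only; the glob pattern is a placeholder and
# one file per member is assumed.
# ds = xr.open_mfdataset("cmip/*.nc", preprocess=add_ensemble_dim,
#                        combine="nested", concat_dim="ensemble")
```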
What I tend to do in these circumstances is write a simple for loop over each model, add the metadata I need (say from a DataFrame), put the resulting xr.Datasets in a list, and combine them with xr.merge or xr.combine_by_coords. It works, but it isn’t very efficient, since the files are opened serially rather than in parallel with open_mfdataset. It seems like a clever use of the preprocess function with arguments specific to each file would be better than a for loop.
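The closest thing I can sketch is a preprocess function that looks up the file it was handed. As far as I understand, xarray records the path in ds.encoding["source"] when a dataset is opened from a file path, so a lookup table keyed on file name could supply the per-file metadata. The file names, member labels, and the ensemble dimension name below are all hypothetical, and I have not checked how this behaves on my messier datasets.

```python
import os
from functools import partial

import xarray as xr


def add_member_from_lookup(ds, lookup):
    """Look up this file's member label by file name and add an ensemble dimension."""
    # Assumption: xarray stored the originating path in ds.encoding["source"].
    fname = os.path.basename(ds.encoding["source"])
    return ds.expand_dims(ensemble=[lookup[fname]])


# Hypothetical mapping from file name to member label
# (in practice this could be built from a DataFrame).
lookup = {"run_a.nc": "model_a", "run_b.nc": "model_b"}
preprocess = partial(add_member_from_lookup, lookup=lookup)

# Illustrative call only; paths are placeholders and one file per member is assumed.
# ds = xr.open_mfdataset(["data/run_a.nc", "data/run_b.nc"], preprocess=preprocess,
#                        combine="nested", concat_dim="ensemble", parallel=True)
```

The partial still binds a single object, but because that object is a lookup table, each file ends up with its own label.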
Thoughts?