Reading xArray datasets in groups

I have created a file with 32 xArray datasets in groups named after the stations they are from. Writing these datasets to netCDF groups is easy with the group parameter in xarray.Dataset.to_netcdf. I was surprised not to find a similar group argument in xarray.open_dataset. But… I read the fine print and discovered that group=groupName can be passed through **kwargs, and all is well… Sometimes the obvious things work!

Thanks for sharing your experience Ted! (And welcome to the forum!) I’m glad you were able to get your data read.

While Xarray can read a single netCDF / HDF group, it cannot represent a nested tree of groups with related variables in a single object with its current data model. However, this feature is currently being discussed and is in fact included as part of a pending CZI EOSS proposal.

Ryan,
Glad to be here! I am working with the Incorporated Research Institutions for Seismology (IRIS) and UNAVCO to design a container for many types of geophysical data, mostly timeseries. We are learning about the xArray data model and tools as a candidate data model for that work. The Pangeo community has been very helpful. Thanks!
Ted

Ted,
as a workaround… one trick I use a lot is to read xarray datasets into a dictionary. It seems like this might be a nice way to handle all these station data groups. Something like this:

import xarray as xr

ds_dict = {}
for name in filelist:
    ds = xr.open_dataset(name)  # read in a group here
    ds_dict[name] = ds  # add dataset to dictionary

chelle

Chelle,
I like that idea.

I wrote the file from a notebook that reads daily positions of a set of GNSS stations in some region from a UNAVCO web service into a set of dataframes. I created a dictionary that includes a metadata dictionary and a dataframe for each station (below), then I wrote the datasets out to the file with the station metadata dictionary as attributes in each group:

allData: {
    stationID 1: {
        positionMetadata {
            position metadata dictionary
        },
        position data dataframe
    },
    stationID 2: {
        positionMetadata {
            position metadata dictionary
        },
        position data dataframe
    },
    ....
}

I was also thinking of trying to merge them all into one xArray dataset with stationID as a string dimension so I could use xArray to select data from each station rather than reading the appropriate group… My next experiment…

Of course, all of the stations have data for different time periods and I am hoping that the xArray merge creates a single time dimension for all stations…
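A rough sketch of how that experiment might look, using xarray.concat rather than merge to stack the stations along a new station dimension. The two tiny datasets, their values and the variable name north are hypothetical stand-ins for the real per-station data.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stations with different, partly overlapping time spans.
ds_a = xr.Dataset(
    {"north": ("time", np.array([0.0, 0.1, 0.2]))},
    coords={"time": pd.date_range("2020-01-01", periods=3, freq="D")},
)
ds_b = xr.Dataset(
    {"north": ("time", np.array([1.0, 1.1, 1.2]))},
    coords={"time": pd.date_range("2020-01-02", periods=3, freq="D")},
)

# Stack along a new "station" dimension. join="outer" takes the union of
# the time coordinates and fills days a station lacks with NaN.
station_index = pd.Index(["P041", "P037"], name="station")
combined = xr.concat([ds_a, ds_b], dim=station_index, join="outer")

# Select one station by ID, then drop the days it has no data for.
p037 = combined["north"].sel(station="P037").dropna("time")
```

With the outer join there is a single shared time axis, at the cost of NaN padding wherever a station has no observation.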

Thanks again for the idea and the snippet.
Ted

Maybe you could get the group names from h5py, then pass each group to xarray?

Here’s an example.

This behavior is customizable and documented in the align function: xarray.align

In general, it is not a trivial problem to align different timeseries. You may also want to consider interpolation: Interpolating data
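To make the join options concrete, here is a small sketch with two made-up daily series that overlap on only two days. The values and dates are hypothetical.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two hypothetical daily series that overlap on only two days.
a = xr.DataArray(
    np.array([0.0, 1.0, 2.0]),
    coords={"time": pd.date_range("2020-01-01", periods=3, freq="D")},
    dims="time",
)
b = xr.DataArray(
    np.array([10.0, 11.0, 12.0]),
    coords={"time": pd.date_range("2020-01-02", periods=3, freq="D")},
    dims="time",
)

# join="inner" keeps only the timestamps present in both series.
a_in, b_in = xr.align(a, b, join="inner")

# join="outer" keeps every timestamp and pads the gaps with NaN.
a_out, b_out = xr.align(a, b, join="outer")
```

The inner join discards data and the outer join introduces NaN, so which one fits depends on what you do next with the combined series.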

Ryan,

Thanks for the pointer to align… I will check it out. In this case the positions are all daily, i.e. low resolution, so I think it should be ok.

Ted

Rich,

I like that general approach…

This file includes a metadata group that is an xArray dataset with a column ID so, in this specific case, I can also get the IDs like:
dataFileName = 'coloradoStations.nc'
metadata_ds = xr.open_dataset(dataFileName, group='metadata')
metadata_df = metadata_ds.to_dataframe()
metadata_df['ID'].unique() =
array(['SA00', 'SG24', 'AMC2', 'P041', 'P037', 'NISU', 'P040', 'P044',
       'P031', 'RG17', 'RG22', 'RG19', 'RG23', 'RG16', 'RG15', 'RG24',
       'RG20', 'RG21', 'RG14', 'P029', 'RG18', 'MFP0', 'MFTN', 'MFTW',
       'MFTS', 'MFTC', 'UNAC', 'P728', 'NIST', 'SA62', 'RG26', 'PRX5'],
      dtype=object)
This also provides 30 other fields of information about each station…

I like the ability of xArray to easily include detailed metadata along with the data in these files…

Ted