Reshape a time series into a 2D matrix in xarray

Hi Pangeo members,

This is my first time posting a question after getting a lot of useful help from this forum. I want to take this chance to thank everyone for the useful discussions, and all the helpful people on this platform.

So my question is: is there a more “xarray way” to reshape a time series into a 2D matrix with dimension 1 = year and dimension 2 = dayofyear? My ultimate goal is to visualize the time series not as a line plot but as a 2D mesh plot that has the year on the x-axis and dayofyear on the y-axis.

The “non-xarray way” that works is to inefficiently loop through all the time labels, find the corresponding year and dayofyear for each one, and put the value into the 2D array below.

da_heatmap = xr.DataArray(coords={'dayofyear':range(1,366+1), 'year':range(syear,fyear+1)}, dims=['dayofyear','year'])

I feel like groupby is doing this kind of operation under the hood. However, I am having a hard time using the groupby method, since it is usually bundled with a reduction operation.

I know that stack and unstack are also a possible way to achieve my goal. However, to unstack my time series I will need to create a multi-index coordinate that has the corresponding year and dayofyear value. So far I have failed to find an efficient way to do so. Instead, I again have to loop through all the time labels to find the year and dayofyear values.

I hope this question is not too trivial; I could not find any discussion on this topic through Google or the forum search (so there is a high chance that it is too trivial, or that no one does this operation…). Or maybe I am just not using the right keywords. Thank you!


You can use coarsen.construct assuming you have the same number of days in a year.
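For readers who have not used it, here is a minimal sketch of the coarsen.construct idea, assuming every year has exactly 365 days (the two non-leap years and the variable names below are made up for illustration):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two non-leap years of daily data (2021 and 2022), so each year has 365 days.
time = pd.date_range("2021-01-01", "2022-12-31", freq="D")
da = xr.DataArray(np.arange(time.size), coords={"time": time}, dims="time")

# Split the time dimension into blocks of 365 steps and expose
# the blocks as two new dimensions: (year, dayofyear).
da_2d = da.coarsen(time=365).construct(time=("year", "dayofyear"))

# The new dimensions have no labels yet, so attach them explicitly.
da_2d = da_2d.assign_coords(year=[2021, 2022], dayofyear=np.arange(1, 366))
```

This only reshapes by position, so it breaks down as soon as the blocks have unequal length (for example when a leap year is included).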

“I will need to create a multi-index coordinate that has the corresponding year and dayofyear value.”

That should be easier with DataArray.dt.dayofyear and DataArray.dt.year.


Ah, wrong search keyword! Thank you for the coarsen.construct link. I cannot use it in my case, since my data is observational and includes leap years, but it is definitely a good method to know.

DataArray.dt.dayofyear and DataArray.dt.year work for my case; they let me set up the multi-index needed to unstack my time series. Thank you @dcherian!

The reproducible working code is below, for people who may have the same question:

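import xarray as xr

# Daily series covering a leap year (2020) and a non-leap year (2021)
ds_ts = xr.Dataset()
da_ts = xr.DataArray(range(1, 365 + 366 + 1), coords={'time': xr.cftime_range(start='2020-01-01', end='2021-12-31', freq='D')}, dims='time')
ds_ts['var1'] = da_ts
# Label every time step with its day of year and its year
ds_ts['dayofyear'] = ds_ts.time.dt.dayofyear
ds_ts['year'] = ds_ts.time.dt.year
# Replace the time index with a (dayofyear, year) multi-index, then unstack it into two dimensions
ds_ts = ds_ts.set_index(time=['dayofyear', 'year']).unstack()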
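To see what this does around the leap day, here is a small sketch of the same multi-index recipe applied to a bare DataArray with a standard pandas calendar instead of cftime (that swap and the variable names are my assumptions, not part of the code above). Day 366 only exists in 2020, so the 2021 column gets a NaN there:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Daily series over a leap year (2020) and a non-leap year (2021).
time = pd.date_range("2020-01-01", "2021-12-31", freq="D")
da = xr.DataArray(np.arange(1, time.size + 1), coords={"time": time}, dims="time")

# Attach year and dayofyear as coordinates along time.
da = da.assign_coords(year=da.time.dt.year, dayofyear=da.time.dt.dayofyear)

# Swap the time index for a (dayofyear, year) multi-index and unstack it.
da_2d = da.set_index(time=["dayofyear", "year"]).unstack("time")

# Combinations that never occur (dayofyear 366 in 2021) are filled with NaN,
# which also turns the integer data into floats.
leap_slot = da_2d.sel(dayofyear=366, year=2021)
```

The resulting array should be usable directly with da_2d.plot(), which draws a 2D mesh for two-dimensional data.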


I always do this:

ds.to_dataframe().reset_index().set_index([dim1,dim2]).to_xarray()

I’m sure there’s a more efficient way to do it within xarray, but this is the one I remember.
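Applied to the year/dayofyear case from this thread, the pandas round trip might look like the sketch below. The dataset and the two derived columns are assumptions for illustration; the one-liner above expects dim1 and dim2 to already exist as columns:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Daily series over a leap year (2020) and a non-leap year (2021).
time = pd.date_range("2020-01-01", "2021-12-31", freq="D")
ds = xr.Dataset({"var1": ("time", np.arange(1, time.size + 1))}, coords={"time": time})

# Go to a flat table and derive the two columns that will become dimensions.
df = ds.to_dataframe().reset_index()
df["year"] = df["time"].dt.year
df["dayofyear"] = df["time"].dt.dayofyear

# A two-level index turns into two dimensions on the way back to xarray;
# missing combinations (dayofyear 366 in 2021) come back as NaN.
ds_2d = df.drop(columns="time").set_index(["dayofyear", "year"]).to_xarray()
```

This materializes the whole dataset as a DataFrame, so it is best suited to data that fits comfortably in memory.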

If your data is huge and you need to downsample before unstacking, you can group by the first dimension with groupby, then apply groupby_bins and a reduction on the second dimension.

E.g. I’m storing sparse orderbook data, and it would be impossible to fully unstack it without running out of memory. So, I groupby time first, then downsample the price dimension with binning.
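A rough sketch of that pattern, using a tiny made-up set of records (the record dimension, the values, and the bin edges are all hypothetical, and every bin is populated at every time step to keep the example simple):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical sparse records: one entry per (time, price) observation.
times = pd.to_datetime(
    ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02", "2024-01-02"]
)
prices = np.array([5.0, 15.0, 3.0, 7.0, 12.0])
sizes = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

da = xr.DataArray(
    sizes,
    dims="record",
    coords={"time": ("record", times), "price": ("record", prices)},
    name="size",
)

price_bins = [0, 10, 20]  # assumed bin edges: (0, 10] and (10, 20]


def bin_prices(group):
    # Within one time step, sum the sizes that fall into each price bin.
    return group.groupby_bins("price", bins=price_bins).sum()


# Group by time first, then bin and reduce along price inside each group.
binned = da.groupby("time").map(bin_prices)
```

The result has one row per time step and one column per price bin, so the full time-by-price grid at the original price resolution never has to be built.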