Reshape a time series into a 2-dimensional matrix in xarray

Hi Pangeo members,

This is my first time posting a question after getting a lot of useful help on the forum. I want to take this chance to thank everyone on this platform for the useful discussions and help.

So my question is: is there a more “xarray way” to reshape a time series into a 2D matrix with dimension 1 = year and dimension 2 = dayofyear? My ultimate goal is to visualize the time series not as a line plot but as a 2D mesh plot that has the year on the x-axis and dayofyear on the y-axis.

The “non-xarray way” that works is to inefficiently loop through all time labels, find the corresponding year and dayofyear, and put each value into the 2D array below.

```
import xarray as xr
da_heatmap = xr.DataArray(coords={'dayofyear': range(1, 366 + 1), 'year': range(syear, fyear + 1)}, dims=['dayofyear', 'year'])
```

I feel like groupby does this kind of operation under the hood. However, I am having a hard time using the groupby method, since it is usually bundled with a reduction operation.

I know that stack and unstack are another possible way to achieve my goal. However, to unstack my time series I would need to create a multi-index coordinate that holds the corresponding year and dayofyear values. So far I have failed to find an efficient way to do so. Instead, I again have to loop through all time labels to find the matching year and dayofyear values.

I hope this question is not too trivial; I could not find any discussion on this topic through Google or by searching the forum (there is a high chance that it is too trivial or that no one is doing this operation…). Or maybe I am just not using the right keywords. Thank you!


You can use `coarsen.construct`, assuming every year has the same number of days.

> I will need to create a multi-index coordinate that has the corresponding year and dayofyear value.

This should be easier with `DataArray.dt.dayofyear` and `DataArray.dt.year`.
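As a rough illustration of the `coarsen.construct` suggestion, here is a minimal sketch on synthetic data. It assumes two years of exactly 365 days each (2021 and 2022, chosen here only so that no leap day is involved); the variable names and the explicit year labels are made up for the example.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily series: two non-leap years, so every year has 365 days (assumption).
time = pd.date_range("2021-01-01", "2022-12-31", freq="D")
da = xr.DataArray(
    np.arange(time.size, dtype=float), coords={"time": time}, dims="time"
)

# Split `time` into windows of 365 days: one row per year, one column per day.
da_2d = da.coarsen(time=365).construct(time=("year", "dayofyear"))

# The new dimensions have no labels yet, so attach them by hand.
da_2d = da_2d.assign_coords(
    year=("year", [2021, 2022]),
    dayofyear=("dayofyear", np.arange(1, 366)),
)
```

With leap years in the record the windows drift by one day per leap year, which is why this only fits fixed-length calendars.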


Ah, wrong search keyword! Thank you for the coarsen.construct link. I cannot use it in my case, since this is observational data with leap years, but it is definitely good to know about this method.

`DataArray.dt.dayofyear` and `DataArray.dt.year` work for my case and helped me set up the multi-index to unstack my time series. Thank you @dcherian!

The reproducible working code is below for people who may have the same question:

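```
import xarray as xr

ds_ts = xr.Dataset()
# Daily series covering a leap year (2020, 366 days) and a regular year (2021, 365 days)
da_ts = xr.DataArray(range(1, 365 + 366 + 1), coords={'time': xr.cftime_range(start='2020-01-01', end='2021-12-31', freq='D')}, dims='time')
ds_ts['var1'] = da_ts
# Derive the two target dimensions from the time coordinate
ds_ts['dayofyear'] = ds_ts.time.dt.dayofyear
ds_ts['year'] = ds_ts.time.dt.year
# Build a (dayofyear, year) multi-index on time, then unstack it into 2D
ds_ts = ds_ts.set_index(time=['dayofyear', 'year']).unstack()
```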

I always do this:

`ds.to_dataframe().reset_index().set_index([dim1,dim2]).to_xarray()`

I’m sure there’s a more efficient way to do it within xarray, but this is just the thing I remember.
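For reference, the pandas round trip above might look like this on the two-year daily series from earlier in the thread. This is only a sketch: `var1` and the date range are taken from the earlier example, and a standard pandas datetime index is assumed instead of a cftime one.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Daily series covering 2020 (leap year) and 2021, as in the earlier example.
time = pd.date_range("2020-01-01", "2021-12-31", freq="D")
ds = xr.Dataset(
    {"var1": ("time", np.arange(1, time.size + 1))}, coords={"time": time}
)

# Add the two target dimensions as ordinary variables.
ds["year"] = ds.time.dt.year
ds["dayofyear"] = ds.time.dt.dayofyear

# Round trip through pandas: flatten, re-index on the new pair, convert back.
ds_2d = (
    ds.to_dataframe()
    .reset_index()
    .set_index(["dayofyear", "year"])
    .to_xarray()
)
```

Combinations that do not exist in the data, such as day 366 of 2021, come back as missing values, so `var1` is converted to floating point along the way.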

If your data is huge and you need to downsample before unstacking, you can group by the first dimension with `groupby`, then apply `groupby_bins` and reduce on the second dimension.

E.g. I’m storing sparse orderbook data, and it would be impossible to fully unstack it without running out of memory. So, I groupby time first, then downsample the price dimension with binning.
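The orderbook data itself is not shown here, so the sketch below applies the same pattern to the daily time series from this thread instead: group by year first, then bin the day of year and reduce within each bin. The bin edges and the helper name `bin_dayofyear` are made up for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Daily series of ones, so the sum in each bin simply counts the days in it.
time = pd.date_range("2020-01-01", "2021-12-31", freq="D")
da = xr.DataArray(np.ones(time.size), coords={"time": time}, dims="time")

# Hypothetical bin edges on day of year; intervals are closed on the right.
doy_edges = [0, 100, 200, 366]


def bin_dayofyear(group):
    """Downsample one year of data by summing within day-of-year bins."""
    return group.groupby_bins(group.time.dt.dayofyear, bins=doy_edges).sum()


# Group by the first dimension (year), then bin and reduce the second one.
binned = da.groupby("time.year").map(bin_dayofyear)
```

The result has one row per year and one column per bin, so the full (year, dayofyear) array is never materialized.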