Hi,
I am working with CMIP6 daily piControl data. My objective is to identify the 5-day period of maximum precipitation (Rx5day) for each year, and then extract the concurrent daily data for other variables (temperature, omega, surface pressure) on those exact same 5 days.
For context on the data shapes, omega and temperature are pressure level data with dimensions ('time', 'plev', 'lat', 'lon'), while precipitation and surface pressure have dimensions ('time', 'lat', 'lon').
Currently, I am looping through each year, computing the index of the precipitation maximum, and then using a nested loop to extract the other variables via .isel().
Here is my current approach:
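for k in range(start_y, end_y + 1):
    year_str = f"{k:04d}"
    # Select one year of daily data for each variable
    prec_data_k = dataset_pr['pr'].sel(time=year_str)
    ps_data_k = ps_daily.sel(time=year_str)
    ta_data_k = ds_temp['ta'].sel(time=year_str)
    wap_data_k = ds_omega['wap'].sel(time=year_str)
    # Keep only ascending motion (omega < 0); everything else becomes NaN
    wap_data_k = wap_data_k.where(wap_data_k < 0)
    # Trailing 5-day sum, labelled at the last day of each window
    prec_rx5d_sum = prec_data_k.rolling(time=5, center=False).sum()
    # Per-grid-cell index of the last day of the wettest 5-day window
    prec_yMax_idx = prec_rx5d_sum.argmax(dim='time').compute()
    # Step through the 5 days that make up that window
    for i in range(5):
        day_idx = prec_yMax_idx - 4 + i
        wap_cond_k = wap_data_k.isel(time=day_idx)
        ta_cond_k = ta_data_k.isel(time=day_idx)
        ps_cond_k = ps_data_k.isel(time=day_idx)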
........................................
This code is taking a lot of time to process even a single year.
If I drop the .compute() call, I get the following error:
ValueError: Vectorized indexing with Dask arrays is not supported. Please pass a numpy array by calling ``.compute``.
My question: Is there a better approach to overcome this? I am looking for an efficient way to extract these exact 5 days across all variables simultaneously without it taking this long to run.
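For reference, here is a minimal sketch of the kind of vectorized selection I am hoping for, written against small synthetic in-memory arrays rather than my CMIP6 files. The helper names (rx5day_end_index, select_window), the extra 'day' dimension, and the random data are assumptions made up for illustration. The idea is to compute the window-end index once per year and then pass a single (lat, lon, day) integer indexer to .isel() instead of looping over the five days. I have only checked this on small numpy-backed arrays, so I do not know whether it performs well on dask-backed data.

```python
import numpy as np
import pandas as pd
import xarray as xr

WINDOW = 5  # length of the Rx5day window in days

# Synthetic stand-ins for one year of daily data (assumption: the real
# arrays come from CMIP6 files; only the dimension layout is mirrored here).
rng = np.random.default_rng(0)
time = pd.date_range("2001-01-01", periods=365, freq="D")
lat = np.arange(3.0)
lon = np.arange(4.0)
plev = np.array([85000.0, 50000.0])

pr = xr.DataArray(
    rng.random((365, 3, 4)),
    dims=("time", "lat", "lon"),
    coords={"time": time, "lat": lat, "lon": lon},
    name="pr",
)
ta = xr.DataArray(
    rng.random((365, 2, 3, 4)),
    dims=("time", "plev", "lat", "lon"),
    coords={"time": time, "plev": plev, "lat": lat, "lon": lon},
    name="ta",
)


def rx5day_end_index(pr_year):
    """Hypothetical helper: index of the LAST day of the wettest window."""
    rolled = pr_year.rolling(time=WINDOW, center=False).sum()
    # The first WINDOW - 1 sums are NaN; replace them so argmax ignores them.
    end_idx = rolled.fillna(-np.inf).argmax(dim="time")
    # Only this small (lat, lon) integer array is computed eagerly.
    return end_idx.compute()


def select_window(da_year, end_idx):
    """Hypothetical helper: pull all WINDOW days per grid cell in one isel."""
    # Offsets -4, -3, ..., 0 along a new 'day' dimension.
    offsets = xr.DataArray(np.arange(-(WINDOW - 1), 1), dims="day")
    idx = end_idx + offsets  # integer indexer with dims (lat, lon, day)
    # Because idx shares 'lat' and 'lon' with the data, xarray indexes
    # pointwise: each grid cell gets its own five time steps.
    return da_year.isel(time=idx)


end_idx = rx5day_end_index(pr)
pr_win = select_window(pr, end_idx)
ta_win = select_window(ta, end_idx).transpose("day", "plev", "lat", "lon")
```

The same end_idx can be reused for every variable in that year, so the precipitation maximum is only located once.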
Any advice or pointers would be helpful!
Thanks
Chaithra