Hi all,
I am trying to get my head around how to use apply_ufunc on each pixel of a dataset and return the modified dataset.
For my specific case, I have an xarray Dataset with vertical profiles of aerosol extinction (time, lat, lon, model_lev). I need to remap it from model-level coordinates to height coordinates (metres) in 500 m bins, keeping only the part of the profile below 12 km.
I created a function that does this for an individual pixel, and it works fine when fed single-pixel data, e.g. data(lat=0, lon=0, time, model_lev). Instead of looping over lat and lon, I would like to apply the function to the whole dataset using apply_ufunc. However, when I use apply_ufunc I get back a dataset that is missing a coordinate and whose variables have no values (AttributeError: 'DataArray' object has no attribute 'values').
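To make the structure concrete, a synthetic stand-in for my data (same dimensions and variables; the values, coordinates and the height profile are just placeholders) would look roughly like this:

import numpy as np
import xarray as xr

nt, nlat, nlon, nlev = 156, 50, 140, 72
heights = np.linspace(80000.0, 20.0, nlev)   # model levels ordered top to bottom, in metres
shape = (nt, nlat, nlon, nlev)
ds = xr.Dataset(
    data_vars={
        "ZL": (("time", "lat", "lon", "lev"),
               np.broadcast_to(heights, shape).copy()),        # heights in m
        "TOTEXTCOEF": (("time", "lat", "lon", "lev"),
               np.random.rand(*shape) * 1e-4),                  # extinction in m-1
    },
    coords={
        "time": np.arange(nt),
        "lat": np.linspace(-25, 25, nlat),
        "lon": np.linspace(-70, 70, nlon),
        "lev": np.arange(1, nlev + 1),
    },
)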
Here is my code:
-------CREATING THE FUNCTION-------
import numpy as np
import xarray as xr

def remap_column_pixel(gc):
    """
    Get the tropospheric (<12 km) extinction coefficient profile and remap it to 500 m bins for a single pixel.
    gc (pixel): dataset with dimensions (time, lev) and variables ZL (heights), TOTEXTCOEF (extinction)
    Returns the rebinned extinction coefficient vertical profile for the pixel.
    """
    # 1) reverse the level order (bottom to top)
    g = gc.reindex(lev=list(reversed(gc.lev)))
    g['ZL'] = g['ZL'] * 0.001                 # heights from m to km
    g['TOTEXTCOEF'] = g['TOTEXTCOEF'] * 1e3   # from m-1 to km-1, as in the CALIOP observations
    # 2) make height the 'lev' coordinate: average height of each level across time
    g["lev"] = ("lev", g.ZL.mean('time').values)
    # 3) cut at 12 km: find the index of the level nearest to 12 km
    i12 = list(g.lev.values).index(g.sel(lev=12, method='nearest').lev)
    gc12 = g.isel(lev=slice(0, i12 + 1))
    # 4) rebin onto 500 m height bins
    hbins = np.arange(0, 12.5, 0.5)
    gc12b = gc12.interp(lev=hbins, method="linear")
    # gc12b = gc12b.rename_dims({'lev': 'levbin'})
    # return the rebinned profile below 12 km height
    return gc12b
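Called on a single pixel (here just the first lat/lon index as an example) it does what I expect:

# works fine: returns a Dataset with dims (time, lev) on the new 0-12 km height bins
single_pixel = remap_column_pixel(ds.isel(lat=0, lon=0))
print(single_pixel)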
-----APPLYING UFUNC------
### ds is an xarray dataset with dimensions (lat: 50, lon: 140, time: 156, lev: 72)
hbins = np.arange(0, 12.5, 0.5)  # same bins as inside the function
output_dataset = xr.apply_ufunc(
    remap_column_pixel,
    ds,
    input_core_dims=[["time", "lev"]],
    output_core_dims=[["time", "lev"]],
    exclude_dims={"lev"},
    vectorize=True,
    dask="parallelized",
    output_dtypes=["float"],
    dask_gufunc_kwargs={"allow_rechunk": True,
                        "output_sizes": {"time": len(ds.time), "lev": len(hbins)}},
)
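Just to be clear about the behaviour I am after (not how I want to run it), the explicit loop over lat and lon would be roughly:

# per-pixel loop I am trying to avoid; the result should end up with dims
# (lat: 50, lon: 140, time: 156, lev: 25), with lev on the 0-12 km bins
rows = []
for la in ds.lat.values:
    cols = []
    for lo in ds.lon.values:
        cols.append(remap_column_pixel(ds.sel(lat=la, lon=lo)))
    rows.append(xr.concat(cols, dim="lon"))
looped = xr.concat(rows, dim="lat")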
Has anyone faced similar issues before? Thank you!