Applying ufunc to each pixel of a dataset

Hi all,

I am trying to get my head around on how to use the apply_ufunc on each pixel of a dataset, and return the modified dataset.
For my specific case, I have an xarray dataset with the vertical profile of aerosol extinction (time, lat, lon, model_lev). I need to remap it to height coordinates (meters) from model levels coordinates in 500m bins keeping the profile only below 12km heights.

I created a function that do this for an individual pixel, which works fine when fed a single pixel data eg data(lat=0,lon=0,time,model_lev). Instead of looping over lat and lon I would like to apply the function to the whole dataset using apply_ufunc. However when I apply ufunc I get returned a dataset which miss a coordinate and with variables which have no values (AttributeError: ‘DataArray’ object has no attribute ‘values’).

Here is my code:

-------CREATING THE FUNCTION-------

def remap_column_pixel(gc):
"""
get extinction coeff troposhepric profile (<12km) and remap to 500m bins for a single pixel"
gc (pixel): dataset with dimension (time,lev) and variables ZL (heights), TOTEXTCOEF(extinction)
return rebinned extinction coeff vertical profile for the pixel.
"""

# 1) reverse variable values (bottom top) 
g=gc.reindex(lev=list(reversed(gc.lev)))
g['ZL']=g['ZL']*0.001 # put height in Km
g['TOTEXTCOEF']=g['TOTEXTCOEF']*1e3  # from m-1 to Km-1 as in observations caliop.
# 2) put height as a dimension. Get average height for each level value across time
g["lev"] = ("lev", g.ZL.mean('time').values)

# 3) cut only at 12 km:  find index where 12km
i12 = list(g.lev.values).index(g.sel(lev=12, method='nearest').lev)
gc12=g.isel(lev=slice(0, i12+1))

# 4) rebin 500m height
hbins=np.arange(0,12.5,0.5)
gc12b= gc12.interp(lev=hbins, method="linear") 
#gc12b=gc12b.rename_dims({'lev':'levbin'})

#return rebinned profile below 12km heigth
return gc12b     
                                                     
-----APPLYING UFUNC------

###ds is a xarray dataset  with dimensions  (lat: 50, lon: 140, time: 156, lev: 72)

output_dataset = xr.apply_ufunc(
remap_column_pixel,
ds,
input_core_dims=[["time",'lev']],
output_core_dims=[["time",'lev']],
exclude_dims={'lev'},
vectorize=True,
dask="parallelized",
output_dtypes=["float"],
dask_gufunc_kwargs={'allow_rechunk':True,
                     "output_sizes": {"time": len(tr.time),'lev':len(hbins)}}

)

has someone faced similar issues before? Thank you!

The code has lost its indentation so I can’t quite tell what exactly is going on, but looking at the docstring and the way you’re calling apply_ufunc it seems like your function expects a Dataset object.

However, apply_ufunc serves a different purpose: it helps with applying functions that work on the underlying arrays, for example functions like numpy.mean or anything that scipy provides.

In this case, I think that’s the wrong tool. Instead, I wonder if it would be possible to write a function that is a composition of xarray operations and works on all pixels at the same time. For example, if I’m not misunderstanding anything, gc.reindex(lev=list(reversed(gc.lev))) could also be written as gc.isel(lev=slice(None, None, -1)).

The only thing I’d expect to be a little bit more complicated would be step 3, but that might be because I can’t really imagine what data it is working on looks like. Could you please post a string representation of the dataset at that point? Bonus points if you could write some example code that allows me to reproduce your results, either by opening a small sample of real data or by mocking up dummy data, because that really speeds up the feedback loop.

Thanks keevis for your reply! The dataset I feed in has this structure:

Thanks for that, I think I understand the issue better now. ZL appears to be a variable varying in time and space, which you’d like to linearly interpolate to common values.

Since this looks like a meteorological dataset (I know very little about climate science / meteorology), I’d probably reach for MetPy instead of writing my own function (though whether that function exists is another matter).

For example, this: Sigma to Pressure Interpolation — MetPy 1.6 looks like it could fit (I didn’t try, though).

cc @dopplershift, in case you have any advice here

1 Like

You might try Transforming Vertical Coordinates — xgcm v0.8 documentation

sigma_to_pressure will only get you to pressure coordinates. To get to height, unless the model output has it, you’ll have to make some additional assumption (e.g. standard atmosphere) or maybe try integrating pressure and temperature together using the hypsometric equation? The linked resources from xgcm don’t seem to address converting sigma to height.

Thank you all for the suggestions! At the end using vertical transform vertical coordinates from xgcm worked fro my problem! It is also based on defining a 1d interpolation and then use apply_ufunc.