Upsample and regrid curvilinear grid of geostationary data

Hello
I have Geostationary Lightning Mapper data that I would like to upsample and regrid. The GLM data is on the GOES-16 5424 x 5424 grid. So many of the cells have no data. I have the longitudes and latitudes for each grid cell in the 5424 x 5424 domain.

I would like to upsample this curvilinear grid so I can compare it to a 1.25 degree longitude by 1 degree latitude global climate model grid. I would only compare the section of the model grid that collocates with the GLM data.

I have tried many python applications that don’t provide quite what I need. These include regridder in xesmf, block_reduce in skimage, remap in remap, nctookit, coarsen in xarray. The issues range a requirement that the grids be rectangular, having to know the size of your target grid, or only being able to reduce the grid by an integer number.

I see some regridding tools being developed with pangeo: Regridding - Pangeo Data Are these tools available? Do you know of any tools that would help me reach this goal?

Thanks
Jonathan

This is a snippet of the code below from where I formed the xarray Dataset:

ds = xr.Dataset(
    data_vars=dict(
        glm_ff=(["x", "y"], grid)
    ),
    coords=dict(
        lon=(["x", "y"], lon),
        lat=(["x", "y"], lat)
    ),
    attrs=dict(description="Flashes per km squared per second."),
)

print('print new file: ', ds)
glm_ff = ds['glm_ff']
print('print variable: ', glm_ff)
####----------- XEMF Regrid

ds_out = xe.util.grid_global(1.25, 1)
print(ds_out)
regridder = xe.Regridder(ds, ds_out, "bilinear", periodic=True)
glm_ff_downscale = regridder(glm_ff)
glm_ff_downscale.to_netcdf('glm_ff_downscale_052019_xesmf.nc', mode='w')

Here is a portion of the error log file below after the code runs overnight for 12+ hours:

Dimensions:  (y: 180, x: 288, y_b: 181, x_b: 289)
Coordinates:
    lon      (y, x) float64 -179.4 -178.1 -176.9 -175.6 ... 176.9 178.1 179.4
    lat      (y, x) float64 -89.5 -89.5 -89.5 -89.5 ... 89.5 89.5 89.5 89.5
    lon_b    (y_b, x_b) float64 -180.0 -178.8 -177.5 ... 177.5 178.8 180.0
    lat_b    (y_b, x_b) int64 -90 -90 -90 -90 -90 -90 -90 ... 90 90 90 90 90 90
Dimensions without coordinates: y, x, y_b, x_b
Data variables:
    *empty*
Traceback (most recent call last):
  File "lat_lon_xesmf2.py", line 60, in <module>
    regridder = xe.Regridder(ds, ds_out, "bilinear")
  File "/app/spack/v0.15/linux-rhel7-x86_64/gcc-4.8.5/python/3.7.7-d6cyi6ophaei6arnmzya2kn6yumye2yl/lib/python3.7/site-packages/xesmf/frontend.py", line 772, in __init__
    super().__init__(grid_in, grid_out, method, input_dims=input_dims, **kwargs)
  File "/app/spack/v0.15/linux-rhel7-x86_64/gcc-4.8.5/python/3.7.7-d6cyi6ophaei6arnmzya2kn6yumye2yl/lib/python3.7/site-packages/xesmf/frontend.py", line 315, in __init__
    weights = self._compute_weights()  # Dictionary of weights
  File "/app/spack/v0.15/linux-rhel7-x86_64/gcc-4.8.5/python/3.7.7-d6cyi6ophaei6arnmzya2kn6yumye2yl/lib/python3.7/site-packages/xesmf/frontend.py", line 375, in _compute_weights
    ignore_degenerate=self.ignore_degenerate,
  File "/app/spack/v0.15/linux-rhel7-x86_64/gcc-4.8.5/python/3.7.7-d6cyi6ophaei6arnmzya2kn6yumye2yl/lib/python3.7/site-packages/xesmf/backend.py", line 481, in esmf_regrid_build
    regrid = ESMF.Regrid(sourcefield, destfield, **kwargs)
  File "/app/spack/v0.15/linux-rhel7-x86_64/gcc-4.8.5/python/3.7.7-d6cyi6ophaei6arnmzya2kn6yumye2yl/lib/python3.7/site-packages/ESMF/util/decorators.py", line 64, in new_func
    return func(*args, **kwargs)
  File "/app/spack/v0.15/linux-rhel7-x86_64/gcc-4.8.5/python/3.7.7-d6cyi6ophaei6arnmzya2kn6yumye2yl/lib/python3.7/site-packages/ESMF/api/regrid.py", line 185, in __init__
    dstFracField=dst_frac_field
  File "/app/spack/v0.15/linux-rhel7-x86_64/gcc-4.8.5/python/3.7.7-d6cyi6ophaei6arnmzya2kn6yumye2yl/lib/python3.7/site-packages/ESMF/interface/cbindings.py", line 2068, in ESMP_FieldRegridStore
    handle_esmf_error(rc, 'ESMC_FieldRegridStore')
  File "/app/spack/v0.15/linux-rhel7-x86_64/gcc-4.8.5/python/3.7.7-d6cyi6ophaei6arnmzya2kn6yumye2yl/lib/python3.7/site-packages/ESMF/interface/cbindings.py", line 26, in handle_esmf_error
    raise ValueError(msg)
ValueError: ESMC_FieldRegridStore failed with rc = 506. Please check the log files (named "*ESMF_LogFile").

You might try pyresample : Pyresample — pyresample 1.22.3+9.g1928289.dirty documentation

1 Like

@jwsmit12 thanks for sharing and welcome to the forum!

Our community has consolidated behind xesmf as our main gridding tool. It should definitely be able handle this scenario. (Those design documents you found are very old and should probably be taken down.)


Can you post a link to one of the sample files you are trying to regrid? It will be hard to debug your problem without a self-contained example that other people can run.

Also noting this:

Did you have a look at this log file? It seems like ESMF is trying to help you understand why it failed.

Deepak, my understanding was that pyresample is not really appropriate to this sort of strong upsampling scenario because it doesn’t do any aggregation (just interpolation).

Also, big shot in the dark, but you may want to try xe.util.grid_global(1.25, 1.) to avoid having the grid bounds defined as integers.

Hi Ryan:

Thank you for your suggestions.

Here is the first of two responses. Below are two file descriptions with Dropbox links:
I attached the netcdf file with the GOES-16 Full disk longitudes and latitudes on the 5424 x 5424 grid: Dropbox - goes16-full-disk-lat-lon-2.0km.nc - Simplify your life

The above grid is irregularly shaped (curvilinear) as shown here: Dropbox - glm_flash_freq_052019.png - Simplify your life

Another message follows this.
Jonathan

Hi Ryan and all:

The last two descriptions and links are below:

I also attached the model grid file which contains the 1.25 longitude by 1 latitude grid: Dropbox - tracer_level.201901-201912.flash_freq.nc - Simplify your life

The fourth attachment is the actual glm flash frequency on q 5424 x 5424 grid that I want to upsample unto the model grid: (Dropbox - glm_flash_freq_052019.csv - Simplify your life)

Thanks for your help
Let me know if you cannot read the files.
Jonathan

Now there’s a “Bucket resampler”: pyresample.bucket package — pyresample 1.27.1+114.g294e6ea.dirty documentation that looks like a coarsen. It doesn’t seem like it does the proper weighting like xesmf though.

cc @djhoese

As a pyresample dev, pyresample is changing a lot and the documentation has not kept up with the development lately. There are probably some features that would help here that aren’t well documented. The Satpy library has a reader for gridded GLM L2 data and can resample it to any PROJ supported projection using various resampling algorithms while using dask underneath. The other nice thing about using this reader is that it understands the geolocation of the data in a way that works well with pyresample. Whether or not it performs how you expect or offers all the algorithms you want is a different matter.

You mentioned you want to upsample the GLM data to a finer resolution. Do you need it to be a bilinear interpolation for the new points? What about nearest neighbor? Pyresample has both, but the current bilinear interpolation may not perform very well. @mraspaud and @pnuu have been working on a gradient_search algorithm that does a bilinear interpolation out of the box, but I’ve lost track of what the current status of that is.

For upsampling, isn’t it all interpolation (not aggregation) between source data points?

from satpy import Scene
scn = Scene(reader='glm_l2', filenames=[glm_l2_filename.nc])
scn.load(["flash_extent_density"])
new_scn = scn.resample(new_area_def, resampler="gradient_search")
new_scn["flash_extent_density"]  # <- resampled DataArray with dask array underneath

Hopefully Martin and Panu can comment on any other suggestions.

3 Likes

Thanks for your reply @djhoese.
My question about what you have above is who do I define the new_area_def?
The new area would be based on 1.25 lon by 1 lat global climate model grid.
So the GLM FD is a subset of that grid.

I used the GLM flash extent density to calculate lightning flash frequency for a one month period.
Its on the 5424 x 5424 curvilinear grid.
I have the lightning flash freq in a .csv file.
I also have a netcdf file with the longitude and latitudes for the above grid.

I would like to upsample to a coarse 1.25 lon by 1 lat global climate model grid.
I think bilinear is the way to go.
With this I would be able to track the lons/lats of the upscaled grid.
Please let me know anymore of your thoughts?

Jonathan

1 Like

Hey @jwsmit12… it might be overkill for what you are doing, but I have developed a small xarray- and xesmf-based regridding tool for regridding hydrometeorological forcing data onto WRF’s curvilinear grid. Essentially, this just “wraps” xesmf with some tools to specify some characteristics of the source/destination grids. This is done via yaml files and in an example grid. xesmf works with both up- and downscaling… xESMF: Universal Regridder for Geospatial Data — xESMF 0.3.0 documentation

This might have some tools to help. Also note that I had to push my source arrays to numpy when using xesmf to compute the regridding weights. But then when actually regridding the data, the code just does the regridding using the xarray source dataset. I’ve tried it for a few source and destination datasets now (curvilinear and rectilinear) and it’s worked well. Sorry, the documentation is pretty minimal too… for now.

1 Like

@jwsmit12 For defining the input and output area definitions you could take a look at this documentation. When oceanographers or others in the Pangeo community say “curvilinear” do you just mean a grid that is not square/uniform in lon/lat space? Or do you mean pixel coordinates where each pixel has 4 bounding coordinates (something I’ve seen talked about in Ryan’s presentations)?

Pyresample only deals with coordinates that represent the center of a pixel. We have AreaDefinitions for uniform grids in any projection that PROJ understands. We have SwathDefinitions for arbitrary spaced lon/lat coordinates that you can provide as two 2D arrays, but they don’t offer quite as many features as the AreaDefinitions. There may be utilities beyond what is mentioned in the above documentation to do what you’re doing easier, but I don’t often create AreaDefinitions from other AreaDefinitions so I don’t know the best way off the top of my head.

Also note that “glm_l2” example above assumed you were using the “standard” gridded GLM L2 products. If they are custom files then they probably won’t work out of the box. There are definitely still ways to process your data with pyresample, it just isn’t as easy.

Hello @djhoese @rabernat

Today, I returned to the xarray.coarsen and it seems to be very close to what I need.

When I execute this function for example:
glm_downscale = ds.coarsen(lon=34, lat=34, boundary=‘pad’).mean()

I get a grid that is very close to what I need.

However, to be exact I would need to set the lon and lat to 33.18. When I do that I get the error below that basically says lon and lat have to be integers. Is there any way to override or ignore this error to get the function coarsen function to calculate with the float datatype? (i.e. try except [did not work for me but maybe I have commands out of order])

Error message here:
glm_downscale = ds.coarsen(lon=33.18, lat=33.18, boundary=‘pad’).mean()
File “/app/spack/v0.15/linux-rhel7-x86_64/gcc-4.8.5/python/3.7.7-d6cyi6ophaei6arnmzya2kn6yumye2yl/lib/python3.7/site-packages/xarray/core/rolling.py”, line 1036, in wrapped_func
**kwargs,
File “/app/spack/v0.15/linux-rhel7-x86_64/gcc-4.8.5/python/3.7.7-d6cyi6ophaei6arnmzya2kn6yumye2yl/lib/python3.7/site-packages/xarray/core/variable.py”, line 2194, in coarsen
reshaped, axes = self.coarsen_reshape(windows, boundary, side)
File “/app/spack/v0.15/linux-rhel7-x86_64/gcc-4.8.5/python/3.7.7-d6cyi6ophaei6arnmzya2kn6yumye2yl/lib/python3.7/site-packages/xarray/core/variable.py”, line 2248, in coarsen_reshape
variable = variable.pad(pad_width, mode=“constant”)
File “/app/spack/v0.15/linux-rhel7-x86_64/gcc-4.8.5/python/3.7.7-d6cyi6ophaei6arnmzya2kn6yumye2yl/lib/python3.7/site-packages/xarray/core/variable.py”, line 1350, in pad
**pad_option_kwargs,
File “/app/spack/v0.15/linux-rhel7-x86_64/gcc-4.8.5/python/3.7.7-d6cyi6ophaei6arnmzya2kn6yumye2yl/lib/python3.7/site-packages/xarray/core/duck_array_ops.py”, line 56, in f
return wrapped(*args, **kwargs)
File “<array_function internals>”, line 6, in pad
File “/home/Jonathan.Smith/.local/lib/python3.7/site-packages/numpy/lib/arraypad.py”, line 740, in pad
raise TypeError(’pad_width must be of integral type.’)
TypeError: pad_width must be of integral type.

Xarray.coarsen works by aggregating discrete points into larger cells. It does not know how to subdivide cells. So it requires integer arguments.

It’s too bad that xesmf is not working here. This is exactly what it is designed for.

@djhoese @rabernat

Thank you guys for your help.

xESMF runs now but extremely slow. Going from 5424 x 5424 to a 162 x 128 grid with conservative regridding is taking 18+ hours and still the run is not finished. This is common according to the developers.

I put in a ticket on the xESMF Issues side and it is recommended that I use the xESMF ESMF_Regrid command with mpirun for a faster result. Working on that now.
Jonathan