Plotting ocean variables in density coordinates (using xhistogram)

I think we should have options in the function:
The simplest case would be binning using xhistogram (pros: fast, would work on non-monotonic target coordinates). If I read the documentation of xhistogram, this would not need much more work except for maybe a wrapper that bins all fields of a model dataset in the same manner according to a target DataArray. @rabernat, am I seeing this correctly or does xhistogram still need anything for this functionality?

The more involved case consists of two steps (I am sourcing these from the MOM6 code:

  1. Interpolation and creation of a new grid based on target criterion(e.g. density).
  2. Remapping of quantities between old grid and new grid.

The grid is defined as an array of depth values z_old (these are the cell boundaries). We can then define a new grid of depth points (which could be defined by any tracer that is monotonic in depth, e.g. density) z_new.

      z_old_(0)              z_old_(1)                         z_old_(n-1)               z_old_(n)
                z_new_(0)   z_new_(1)                       z_new_(n-1)   z_new_(n)
                      <----------- >

Now comes the fun part. The interpolation defines us new cells (consecutive pairs of z_new values a ‘left’ boundary z_L and a ‘right’ boundary z_R.

There are three principal cases listed in the MOM6 code:

      The target cell is entirely contained within a cell of z_old. 
      This situation is represented by the 
      following schematic :
     z_old_(n+m)           z_old_(n+m+1)       
                z_L      z_R
The target cell spans at least two cells of the source grid.
This situation is represented by the following schematic:
   z-old_(n-1) .        z_old_(n)         z_old_(n+m)        z_old_(n+m+1)
    ----|---------------o---|----------- ... ---------|---o-------------------|-------------
                          z_L                                  z_R

They mention a third case where z_L = z_R, but I dont think that is relevant to offline regridding (correct me if I am wrong, please).

Now we need to remap the values between z_L and z_R and construct a new spacing coordinate dz_new. This can be done in a variety of ways, but I think as a first attempt it is sufficient to assume th values in the ‘subcells’ (the ones split by zL or zR) have the same tracer concentration as the source cell.

The challenge will be to code this efficiently, and perhaps there is a nifty way of using xhistogram and weights to do this? We should look at xesmf to get some inspiration. I remember seeing somewhere that this calculation was achieved using a large matrix of weights?

I don’t think GSW-python supports lazy computation with dask. I think it calls down to C immediately, so you’ll need to wrap.

I don’t think GSW-python supports lazy computation with dask. I think it calls down to C immediately, so you’ll need to wrap.

True, but this is trivial for pointwise operations with apply_ufunc:

sigma0 = xr.apply_ufunc(gsw.sigma0, ds.salt, ds.thetaoga,
                        dask='parallelized', output_dtypes=[ds.salt.dtype])

This question is in regards to the model output available from CMIP6.

What is the temporal resolution of the data available in CMIP6, and is it snapshots or average over sometime?

This is important for the two goals of this project, and needs of data for both:
a ) Project single variables in density (arbitrary) coordinates, such as velocity (for measuring the ROC) or tracer or some other tracer.
b) Eddy correlations estimated in density (arbitrary) coordinates. This is important when addressing transport of tracers on an isopycnal, rather than just volume.

Given snapshot data, (b) is simply an extension of (a). If enough snapshot data is available, eddy correlations can be estimated even if the snapshots are spaced relatively far apart in time.

However, errors creep in if the data has been output as an average over sometime period - example monthly means. This is a well known problem for eddy correlations because all the correlations due to variability at scales smaller than the averaging are missed. It is also a problem for projecting single variables, since part of the variability would have been averaged out on Z coordinates.

Here is a list of possible questions we can address about the ROC. However, addressing any of these hinges directly on the success of the ability to directly coordinate transform.


  • Do the CMIP6 models simulate the correct current day MOC?
  • How does the transport associated with the MOC change under a changing climate?
  • Are essential climate metrics, such as the oceanic heat storage, well correlated with changes in the MOC? Are some of the major differences in the climate variability between different models related to differences in the MOC?
  • Are point/latitude based estimates of heat transport, the analysis that is done some present day models, sufficient to capture climate signals?
  • How long does it take before the climate change signals emerge over the noise due to eddy and short term variability?
  • What does the 3D overturning circulation looks like in these models (more complexity than the MOC, which is a zonal average)? And can we produce clever visualizations of the 3D overturning circulation?
  • Is the MOC (advective transport) sufficient to explain the tracer transport? Or does the along isopycnal diffusive transport also play an important role?

Direct evaluation of the ROC.
It seems like some variables, example ‘msftmrho’ might have meridional overturning circulation that has been estimated online. Does anyone have more details on these? Are these variables time averages? or time+zonal averages? or time+basin averages? Anyone who is interested in evaluating the ROC can try to look at these variables directly, instead of waiting for a solution to density transformation to come about.

Correlations in model
Are variables that store eddy correlations present? Example : , , <C*rho>, <C^2> etc? If correlations are available we could apply some TEM like techniques to also estimate the overturning circulation.

1 Like

Here is a list of tasks to give more detail on the Proposed Hacking section.

  • Coordinate transform: Most of the ocean variables in the CMIP6 dataset are in z-coordinates. So an interpolation onto the density variable would be needed. (Would be nice to have a package that makes this task seamlessly easy, and allows conversion to other tracer coordinates too.)
  • Equation of state: Not all CMIP6 models use the same equation of state. I propose that we collate any python packages for finding density from temperature and salinity in CMIP6 models, and write a small package that takes an input of the variables needed to find density and an arbitrary variable that the user wants to transform into density space. The package would pass the required info to xhistogram and return that variable in density space.
  • Time resolution of output: CMIP6 models do not save model output at very high frequency. An assessment of the role of the missing variability due to the limited temporal resolution available for diagnosis would be important.
  • 3D visualizations of transport pathways.
  • Moonshot : close watermass budgets. (but probably not possible in 3day)

Given the similarity in goals and tools, I’m wondering if it would make sense to combine this project with Water masses of the future (proposed by @raphaeldussin ). Over there, they are talking about plotting different variables in T/S space. Here it’s using density space. Do we really want two separate projects? Or can we collaborate?

1 Like

An update on this is that ‘msftmrho’ is only available for one model. So if we want the ROC, we are going to have to derive it ourselves. Basically nothing useful for this is available at daily resolution, so I think we just have to hope that the monthly data will be useful.

Having one ‘ground truth’ could help with the evaluation of the error due to offline calculation I assume. Given that most output will be saved in some sort of average, I suggest we primarily aim to reduce the error introduced by our vertical remapping function as much as possible and then we can say with some confidence that any other errors are mainly due to the time averaging. Or am I missing something here?

No, that sounds right!

Maybe we can collaborate at the beginning, and then splinter off later? I think @dhruvbalwada @jbusecke and I are probably more interested in the ROC, which doesn’t completely fit into the scope of Water masses of the future. However, getting started with xhistogram together might be really useful, and we may even be swayed to look at the thermohaline streamfunction as well!

Here is a little working example how to use xhistogram to rebin a tracer (I took no3 as example) onto potential temperature data notebook EDIT: Fixed binder link.

For an example of a library that calls out to fortran for an operation along columns, check out

This could be a pattern to follow here. Code up the correct interpolation routine in fortran using a fast, established algorithm. Then just wrap in numpy (via f2py), dask, and finally xarray.

1 Like

That seems like a great way forward, Ryan.

How should we organize around this? I feel we need to chisel out a bit more concrete plan before we hit the hackathon?

For me a key question would be: Do we focus all efforts on the first day to figure out the remapping? Or do some people move ahead with the rebinning approach outlined above and another group works on wrapping a more involved algorithm into xarray?

Also I think it is worth discussing, whether the end-product will end up as a separate package or part of xgcm.

Hi Julius, thanks for the questions.

Small teams that communicate with each other when they are stuck is probably the most efficient way to go, so assuming we get the number of people who are signed up for this project, we probably should split up a bit. (also even for I large team, wrapping fortran code to work here may be a multi-day project and not the best use of time for those most interested in exploring the science)

I am thinking we will wrap some adapted portion of layers_fluxcalc.F in the MITgcm code. I will lead/co-lead a group to do this. If we have people who want to move ahead with xhistogram then they can split off and I am happy to spend time with them getting the technicalities working - that group would also be more linked to Water Masses of the Future.

Perhaps overkill, but maybe a good foundation for this here. It seems to me that this more relevant when considering snapshots or short averages?

1 Like

Also regarding the lingo: They define regridding as the step of finding the new depth levels based on another coordinate and remapping as the step to transform the scalar values from the original to target grid in a conservative manner.
I think that helps tightening up some of the discussion.

Though Ryan is right, it probably won’t matter much to use EOS-10 instead of the model specific EOS, it might matter for some cases, especially in investigating deep water masses. I suspect it would be best to use native EOSs where possible. They will also be faster, since they are generally simpler, though this is probably not the primary computational bottleneck…

I would propose that a side project of this overall project could be to create a library of ocean model EOSs that could be used where appropriate. An easy approach would be to take the native FORTRAN or C files and compile them with f2py. A better approach would be to translate into native python that would take numpy (or xarray) arrays. Still probably not that much work.

[edit] I see I wrote too soon, and this was also proposed above by @dhruvbalwada:

Equation of state: Not all CMIP6 models use the same equation of state. I propose that we collate any python packages for finding density from temperature and salinity in CMIP6 models, and write a small package that takes an input of the variables needed to find density and an arbitrary variable that the user wants to transform into density space. The package would pass the required info to xhistogram and return that variable in density space.

I think this is a good idea. I would love it if you or someone else could lead this. Can you clarify where you think this FORTRAN /C code can be found? This was what put me off from doing this. We can talk further in the #vertical-coordinates slack channel.

You can find the EOS in the source files for the various models. E.g., for ROMS, or as a matlab script.

There seem to be quite a few options coded up in MOM6, so this might be one-stop shopping.

There are not that many simplified EOSs so I expect that coding up a few will handle most ocean models.