Plotting ocean variables in density coordinates (using xhistogram)

Plotting ocean variables in density space (using xhistogram)

Scientific Motivation

Most of the ocean variables in the CMIP6 dataset are in z-coordinates (like the overturning in the top panel of the figure below). But the natural coordinate system for the interior ocean is density (see bottom panel of figure). I propose to use xhistogram ( to bin CMIP6 variables into density coordinates, and to plot the residual overturning circulation for various CMIP6 models, alongside other variables of interest.


The goal of this project is not to do budgets in the water mass transformation framework, but simply to do model intercomparison in density space.

Proposed Hacking

Updated, 09/17/19: I propose that we initially use GSW-Python to find density from temperature and salinity in CMIP6 models, and write a script to pass the required info to xhistogram and return that variable in density space. We could then compare the variables in various models, e.g. heat transport, O2, etc. in density space, and explore any subtleties between the models.

(an older version of this project description focussed more on the equation of state)

Anticipated Data Needs

It is clear that we need (at least) monthly (full depth) thetao, so, vmo, basin. I would also like to look at the meridional heat transport, hfy

msftmrho may be useful for comparison

Having other tracers (O2, carbon, nutrients?) may also be used to contrast the volume transport with tracer concentrations and transports

It is not clear to me whether or not the following variables are available, but they might be useful if they can be found: sea_water_equation_of_state, rhozero, cpocean

Anticipated Software Tools

xhistogram, xarray

Desired Collaborators

Anyone interested in the ROC, the water mass transformation framework, or with knowledge of the Equation of State in one or several of the CMIP6 models. Some experience with xarray would be helpful.

Please help me define what this project will look like and what data variables we need. I have worked with transformation to buoyancy coordinates in the MITgcm, but I have little experience working with CMIP6 data.


I wonder how much this matters. The easiest thing to do is just use GSW-Python for the EOS and treat all of the water as real seawater. Yes, that will introduce some error, but is that error large compared to other errors, such as those related to output frequency, numerical discretization, advection scheme, etc.?

My hunch is no.

1 Like

OK, so perhaps I should reframe the project to focus on comparing plots in buoyancy coordinates, and we can worry about the equation of state later.

1 Like

I would suggest to not only focus on density coordinates, but code up an efficient way to convert any vertical coordinate system into arbitrary vertical coordinates. This would enable the conversion to density coordinates as a special case of a more general function. We should be able to map any field from an existing depth vector onto another one (which might be another tracer). The question about the EOS then becomes a seperate topic.

One design consideration would be if this functionality lives as a separate package or should be integrated into xgcm. Here is a related issue at xgcm. I think there is great interest in this sort of functionality.

So you think it’s important to have interpolation rather than just using the binning feature of xhistogram?

I think we should have options in the function:
The simplest case would be binning using xhistogram (pros: fast, would work on non-monotonic target coordinates). If I read the documentation of xhistogram, this would not need much more work except for maybe a wrapper that bins all fields of a model dataset in the same manner according to a target DataArray. @rabernat, am I seeing this correctly or does xhistogram still need anything for this functionality?

The more involved case consists of two steps (I am sourcing these from the MOM6 code:

  1. Interpolation and creation of a new grid based on target criterion(e.g. density).
  2. Remapping of quantities between old grid and new grid.

The grid is defined as an array of depth values z_old (these are the cell boundaries). We can then define a new grid of depth points (which could be defined by any tracer that is monotonic in depth, e.g. density) z_new.

      z_old_(0)              z_old_(1)                         z_old_(n-1)               z_old_(n)
                z_new_(0)   z_new_(1)                       z_new_(n-1)   z_new_(n)
                      <----------- >

Now comes the fun part. The interpolation defines us new cells (consecutive pairs of z_new values a ‘left’ boundary z_L and a ‘right’ boundary z_R.

There are three principal cases listed in the MOM6 code:

      The target cell is entirely contained within a cell of z_old. 
      This situation is represented by the 
      following schematic :
     z_old_(n+m)           z_old_(n+m+1)       
                z_L      z_R
The target cell spans at least two cells of the source grid.
This situation is represented by the following schematic:
   z-old_(n-1) .        z_old_(n)         z_old_(n+m)        z_old_(n+m+1)
    ----|---------------o---|----------- ... ---------|---o-------------------|-------------
                          z_L                                  z_R

They mention a third case where z_L = z_R, but I dont think that is relevant to offline regridding (correct me if I am wrong, please).

Now we need to remap the values between z_L and z_R and construct a new spacing coordinate dz_new. This can be done in a variety of ways, but I think as a first attempt it is sufficient to assume th values in the ‘subcells’ (the ones split by zL or zR) have the same tracer concentration as the source cell.

The challenge will be to code this efficiently, and perhaps there is a nifty way of using xhistogram and weights to do this? We should look at xesmf to get some inspiration. I remember seeing somewhere that this calculation was achieved using a large matrix of weights?

I don’t think GSW-python supports lazy computation with dask. I think it calls down to C immediately, so you’ll need to wrap.

I don’t think GSW-python supports lazy computation with dask. I think it calls down to C immediately, so you’ll need to wrap.

True, but this is trivial for pointwise operations with apply_ufunc:

sigma0 = xr.apply_ufunc(gsw.sigma0, ds.salt, ds.thetaoga,
                        dask='parallelized', output_dtypes=[ds.salt.dtype])

This question is in regards to the model output available from CMIP6.

What is the temporal resolution of the data available in CMIP6, and is it snapshots or average over sometime?

This is important for the two goals of this project, and needs of data for both:
a ) Project single variables in density (arbitrary) coordinates, such as velocity (for measuring the ROC) or tracer or some other tracer.
b) Eddy correlations estimated in density (arbitrary) coordinates. This is important when addressing transport of tracers on an isopycnal, rather than just volume.

Given snapshot data, (b) is simply an extension of (a). If enough snapshot data is available, eddy correlations can be estimated even if the snapshots are spaced relatively far apart in time.

However, errors creep in if the data has been output as an average over sometime period - example monthly means. This is a well known problem for eddy correlations because all the correlations due to variability at scales smaller than the averaging are missed. It is also a problem for projecting single variables, since part of the variability would have been averaged out on Z coordinates.

Here is a list of possible questions we can address about the ROC. However, addressing any of these hinges directly on the success of the ability to directly coordinate transform.


  • Do the CMIP6 models simulate the correct current day MOC?
  • How does the transport associated with the MOC change under a changing climate?
  • Are essential climate metrics, such as the oceanic heat storage, well correlated with changes in the MOC? Are some of the major differences in the climate variability between different models related to differences in the MOC?
  • Are point/latitude based estimates of heat transport, the analysis that is done some present day models, sufficient to capture climate signals?
  • How long does it take before the climate change signals emerge over the noise due to eddy and short term variability?
  • What does the 3D overturning circulation looks like in these models (more complexity than the MOC, which is a zonal average)? And can we produce clever visualizations of the 3D overturning circulation?
  • Is the MOC (advective transport) sufficient to explain the tracer transport? Or does the along isopycnal diffusive transport also play an important role?

Direct evaluation of the ROC.
It seems like some variables, example ‘msftmrho’ might have meridional overturning circulation that has been estimated online. Does anyone have more details on these? Are these variables time averages? or time+zonal averages? or time+basin averages? Anyone who is interested in evaluating the ROC can try to look at these variables directly, instead of waiting for a solution to density transformation to come about.

Correlations in model
Are variables that store eddy correlations present? Example : , , <C*rho>, <C^2> etc? If correlations are available we could apply some TEM like techniques to also estimate the overturning circulation.

1 Like

Here is a list of tasks to give more detail on the Proposed Hacking section.

  • Coordinate transform: Most of the ocean variables in the CMIP6 dataset are in z-coordinates. So an interpolation onto the density variable would be needed. (Would be nice to have a package that makes this task seamlessly easy, and allows conversion to other tracer coordinates too.)
  • Equation of state: Not all CMIP6 models use the same equation of state. I propose that we collate any python packages for finding density from temperature and salinity in CMIP6 models, and write a small package that takes an input of the variables needed to find density and an arbitrary variable that the user wants to transform into density space. The package would pass the required info to xhistogram and return that variable in density space.
  • Time resolution of output: CMIP6 models do not save model output at very high frequency. An assessment of the role of the missing variability due to the limited temporal resolution available for diagnosis would be important.
  • 3D visualizations of transport pathways.
  • Moonshot : close watermass budgets. (but probably not possible in 3day)

Given the similarity in goals and tools, I’m wondering if it would make sense to combine this project with Water masses of the future (proposed by @raphaeldussin ). Over there, they are talking about plotting different variables in T/S space. Here it’s using density space. Do we really want two separate projects? Or can we collaborate?

1 Like

An update on this is that ‘msftmrho’ is only available for one model. So if we want the ROC, we are going to have to derive it ourselves. Basically nothing useful for this is available at daily resolution, so I think we just have to hope that the monthly data will be useful.

Having one ‘ground truth’ could help with the evaluation of the error due to offline calculation I assume. Given that most output will be saved in some sort of average, I suggest we primarily aim to reduce the error introduced by our vertical remapping function as much as possible and then we can say with some confidence that any other errors are mainly due to the time averaging. Or am I missing something here?

No, that sounds right!

Maybe we can collaborate at the beginning, and then splinter off later? I think @dhruvbalwada @jbusecke and I are probably more interested in the ROC, which doesn’t completely fit into the scope of Water masses of the future. However, getting started with xhistogram together might be really useful, and we may even be swayed to look at the thermohaline streamfunction as well!

Here is a little working example how to use xhistogram to rebin a tracer (I took no3 as example) onto potential temperature data notebook EDIT: Fixed binder link.

For an example of a library that calls out to fortran for an operation along columns, check out

This could be a pattern to follow here. Code up the correct interpolation routine in fortran using a fast, established algorithm. Then just wrap in numpy (via f2py), dask, and finally xarray.

1 Like

That seems like a great way forward, Ryan.

How should we organize around this? I feel we need to chisel out a bit more concrete plan before we hit the hackathon?

For me a key question would be: Do we focus all efforts on the first day to figure out the remapping? Or do some people move ahead with the rebinning approach outlined above and another group works on wrapping a more involved algorithm into xarray?

Also I think it is worth discussing, whether the end-product will end up as a separate package or part of xgcm.