I’m probably jumping into the middle of this conversation… but a few thoughts from my experience.
CDM - My take is that this is more of a philosophy… i.e. communities should establish and use a CDM appropriate to their needs, and wherever possible, established standards should be used. In oceanography, the NODC (now NCEI) NetCDF templates are a good place to start. Put your data in this format, and it can easily be integrated into commonly used tools like Panoply and ERDDAP.
There are really two parts to this… getting your data into the right structure (e.g. timeseries, profiles, 4D grids) and including appropriate, required and standardized metadata, e.g. following the CF Conventions.
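To make the two parts concrete, here is a minimal sketch of a timeseries with CF-style metadata attached in xarray. The data values, the title and the choice of CF version are made up for illustration; the attribute names (standard_name, units, Conventions, featureType) come from the CF Conventions, but this is not a complete or validated template.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Part 1: structure. A simple timeseries with one variable (made-up values).
time = pd.date_range("2020-01-01", periods=4, freq="D")
temperature = np.array([10.1, 10.3, 10.2, 10.4])

ds = xr.Dataset(
    data_vars={"temperature": ("time", temperature)},
    coords={"time": time},
)

# Part 2: metadata. Variable-level attributes in the style of CF Conventions.
ds["temperature"].attrs = {
    "standard_name": "sea_water_temperature",
    "long_name": "Sea Water Temperature",
    "units": "degree_Celsius",
}

# Global attributes; the values here are placeholders, not a real dataset.
ds.attrs = {
    "Conventions": "CF-1.6",
    "featureType": "timeSeries",
    "title": "Example timeseries (made-up data)",
}

print(ds)
```

A real file would need more attributes than this; the templates list which ones are required for each feature type.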
As for dimensions and data-types (aka objects in Python), I think I would try to distinguish between the two…
Coming from a MATLAB world, things were a bit easier. First there are dimensions: scalar (single value), vector (a row or column of values) and arrays (2D, 3D… ND). Then there are data types: e.g. integers, floats, strings (and many more annoying ones, like char and datetime). Structure arrays (structs) are great too… and very Python-dictionary-like.
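For anyone making the same switch, here is a rough sketch of how that dimensions-vs-data-types split looks in NumPy, where every array reports both separately. The variable names and the station dictionary are hypothetical, just for illustration.

```python
import numpy as np

# Dimensions: how many axes the data has
scalar = np.float64(3.5)        # 0 dimensions (single value)
vector = np.array([1, 2, 3])    # 1 dimension
grid = np.zeros((2, 3))         # 2 dimensions

for name, value in [("scalar", scalar), ("vector", vector), ("grid", grid)]:
    print(name, "ndim =", value.ndim, "shape =", value.shape, "dtype =", value.dtype)

# Data type: what each element is. Changing it leaves the shape untouched.
as_float = vector.astype("float64")

# A plain dictionary plays roughly the role of a MATLAB struct
station = {"name": "hypothetical_mooring", "depth_m": 25.0, "temps": as_float}
print(station["name"], station["temps"].mean())
```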
Now in the Python world we have objects galore, and it’s often a challenge to mentally switch between them (lists and dictionaries being a good example).
In my training sessions, I’ve generally glossed over most of this, and basically stuck to explaining the differences between, and relative advantages of, pandas DataFrames (basically Excel tables - great for lots of individual measurements with 1…n variables/columns) and xarray Datasets (which are great for multi-dimensional datasets, which of course are our bread-and-butter in geoscience).
In many cases, DataFrames and Datasets are interchangeable. The OOI dataset is a good example, where most instruments provide simple timeseries with multiple variables. So, which type you use really depends on the additional features of the library you wish to use (well, that and performance). But if you have a more complicated dataset, like a profiler dataset with the dimensions Time and Depth, or optical data with a wavelength dimension, then xarray becomes essential.
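Here is a small sketch of both situations, using made-up numbers rather than real OOI data. The first half round-trips a simple timeseries between pandas and xarray; the second half builds a hypothetical profiler-style dataset with time and depth dimensions, which is where xarray's labelled dimensions start to pay off.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Simple timeseries: one row per time, several variables (made-up values)
time = pd.date_range("2020-01-01", periods=3, freq="D", name="time")
df = pd.DataFrame(
    {"temperature": [10.0, 10.5, 11.0], "salinity": [33.1, 33.2, 33.0]},
    index=time,
)

ds = df.to_xarray()          # DataFrame -> Dataset; the index becomes a dimension
df_back = ds.to_dataframe()  # Dataset -> DataFrame

# Profiler-style data: temperature varies with both time and depth
depth = np.array([5.0, 10.0, 20.0, 50.0])
temp_2d = np.arange(12, dtype=float).reshape(3, 4)  # shape (time, depth)
profiles = xr.Dataset(
    {"temperature": (("time", "depth"), temp_2d)},
    coords={"time": time, "depth": depth},
)

# Select by label along one dimension, or reduce along another
at_10m = profiles["temperature"].sel(depth=10.0)
mean_profile = profiles["temperature"].mean(dim="time")
print(at_10m.values, mean_profile.values)
```

You could flatten the profiler data into a DataFrame with a (time, depth) MultiIndex, but selecting a depth or averaging over time is more direct when the dimensions are named.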