I’ve recently discovered that “Raster Attribute Tables” can store some very rich metadata.
As an example, the GLAD LULC dataset has this rich, hierarchical classification scheme (warning: Excel sheet). Each class is associated with an integer code and a color code.
This was not easy to parse (notebook)!
A second example is in this vignette.
I have a few questions:
- How do people work with such classifications in Python today (outside of GDAL bindings)? Is there a data structure that can represent this hierarchy nicely? My first thought was a pandas MultiIndex, but that isn’t exactly right.
- What analytic workloads do you use such classifications for? Zonal stats at each classification level (theme, general class, sub-class) seem likely, but also painful at the moment without some kind of helper.
- Has anyone explored storing this in Zarr? AFAICT in the GeoTIFF world, this stuff is in a sidecar XML file (are there conventions for this?). IIUC the CF conventions can’t really represent this level of detail for classifications.
@mdsumner I’m sure you have thoughts 
I store them as a table with columns, indexed by pixel value. I see it as levels of grouping, so each unique pixel value has a category in one or more columns. (It’s simple nesting, so it’s equivalent to GROUP BY; I don’t know how that’s thought about in Python.)
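A minimal Python sketch of that "table indexed by pixel value" idea, using only the standard library. The class names, the level names (`theme`, `general`, `sub`), and the tiny pixel grid are invented for illustration and are not taken from GLAD or any real dataset.

```python
from collections import Counter

# Hypothetical lookup table: pixel value -> one category per grouping level.
# All names here are made up for illustration.
LOOKUP = {
    1: {"theme": "Land", "general": "Forest", "sub": "Dense forest"},
    2: {"theme": "Land", "general": "Forest", "sub": "Open forest"},
    3: {"theme": "Land", "general": "Cropland", "sub": "Cropland"},
    4: {"theme": "Water", "general": "Open water", "sub": "Open water"},
}

# A tiny stand-in for one classified raster band.
PIXELS = [
    [1, 1, 2],
    [3, 4, 4],
    [1, 2, 4],
]


def count_by_level(pixels, lookup, level):
    """Count pixels per category at one grouping level (like GROUP BY <level>)."""
    counts = Counter()
    for row in pixels:
        for value in row:
            counts[lookup[value][level]] += 1
    return dict(counts)


theme_counts = count_by_level(PIXELS, LOOKUP, "theme")
general_counts = count_by_level(PIXELS, LOOKUP, "general")
print(theme_counts)
print(general_counts)
```

Because every pixel value maps to exactly one category per column, counting at a coarser level is just a regrouping of the same table. With a dataframe library the same thing would presumably be a join on pixel value followed by a group-by on the chosen column.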
I don’t know any other way that doesn’t use GDAL, but the last time I used one I read the RAT out of the .aux.xml directly (because the people I worked with had GeoTIFFs with auxiliary RAT metadata and weren’t GDAL users). Otherwise, in R I do:
```r
library(terra)
r <- rast("thefile.tif")  ## GeoTIFF can't store RAT, but could be in a sidecar .aux.xml
levels(r)                 ## will give (a list with) the RAT table.
```
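For the non-GDAL route mentioned above (reading the RAT straight out of the sidecar), here is a rough Python sketch using only `xml.etree.ElementTree`. The sample XML is invented, and the element names (`PAMRasterBand`, `GDALRasterAttributeTable`, `FieldDefn`, `Row`, `F`) and the type codes are assumptions about the layout GDAL usually writes; check them against a real `.aux.xml` before relying on this.

```python
import xml.etree.ElementTree as ET

# Invented sample in the layout GDAL typically writes for a sidecar file.
# Real files may differ, so treat the element names as assumptions.
SAMPLE_AUX_XML = """<PAMDataset>
  <PAMRasterBand band="1">
    <GDALRasterAttributeTable>
      <FieldDefn index="0"><Name>Value</Name><Type>0</Type><Usage>5</Usage></FieldDefn>
      <FieldDefn index="1"><Name>General</Name><Type>2</Type><Usage>0</Usage></FieldDefn>
      <FieldDefn index="2"><Name>Sub</Name><Type>2</Type><Usage>0</Usage></FieldDefn>
      <Row index="0"><F>1</F><F>Forest</F><F>Dense forest</F></Row>
      <Row index="1"><F>2</F><F>Forest</F><F>Open forest</F></Row>
      <Row index="2"><F>4</F><F>Water</F><F>Open water</F></Row>
    </GDALRasterAttributeTable>
  </PAMRasterBand>
</PAMDataset>"""

# Assumed field type codes: 0 = integer, 1 = real, 2 = string.
CASTS = {"0": int, "1": float, "2": str}


def read_rat(xml_text, band="1"):
    """Return the RAT of one band as a list of dicts, one dict per table row."""
    root = ET.fromstring(xml_text)
    for band_el in root.iter("PAMRasterBand"):
        if band_el.get("band") != band:
            continue
        rat = band_el.find("GDALRasterAttributeTable")
        if rat is None:
            return []
        # Column name and Python type for each field, in declared order.
        fields = [
            (fd.findtext("Name"), CASTS.get(fd.findtext("Type"), str))
            for fd in rat.findall("FieldDefn")
        ]
        rows = []
        for row_el in rat.findall("Row"):
            cells = [f.text or "" for f in row_el.findall("F")]
            rows.append({name: cast(cell) for (name, cast), cell in zip(fields, cells)})
        return rows
    return []  # no such band in the file


rat_rows = read_rat(SAMPLE_AUX_XML)
for row in rat_rows:
    print(row)
```

For a real file, the same function could be fed the text of the sidecar (for example a hypothetical `thefile.tif.aux.xml` next to `thefile.tif`). The result is the same kind of table that `levels(r)` returns: one row per pixel value, one column per grouping level.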
I could share documentation I sent to colleagues recently, but I can’t post that publicly without anonymizing it a bit. (There was a disconnect between the real RAT and a table of values in Excel: I think they had introduced spaces in the class values. That’s exactly like the very untidy Excel definition Deepak shows above, which is way messier than what I had to deal with and is clearly aimed at human eyeballs.)
Happy to hit up RATs with Python if that’s helpful. I’ll check if the dataset we looked at is public yet.
Here’s the map; click on any pixel to see two levels of classification.
These groupings are used to classify the area of habitat (and its change over time).
The practical problem was that the team had an Excel spreadsheet that was out of sync with the more formal RAT stored in XML as a sidecar (in GDAL form), and the GIS team had to reconcile those with how it was uploaded into the corporate software. It’s an interesting road bump for something that really is only a simple lookup table (so, imo, it’s unfortunately complicated far more than necessary by incompatible concepts and software).
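The kind of reconciliation described above can be sketched as a comparison of two lookup tables after whitespace cleanup. This is only an illustration: the function names and both tables are hypothetical, and stray spaces are just one of the ways a spreadsheet can drift from the RAT.

```python
def normalize(name):
    """Trim a class name and collapse internal runs of whitespace."""
    return " ".join(name.split())


def find_mismatches(rat, sheet):
    """Compare two {pixel_value: class_name} tables.

    Returns {pixel_value: (rat_name, sheet_name)} for every value whose names
    still differ after whitespace cleanup; None marks a value missing on one side.
    """
    mismatches = {}
    for value in sorted(set(rat) | set(sheet)):
        rat_name = rat.get(value)
        sheet_name = sheet.get(value)
        clean_rat = normalize(rat_name) if rat_name is not None else None
        clean_sheet = normalize(sheet_name) if sheet_name is not None else None
        if clean_rat != clean_sheet:
            mismatches[value] = (rat_name, sheet_name)
    return mismatches


# Hypothetical tables: one from the formal RAT, one typed into a spreadsheet.
RAT = {1: "Dense forest", 2: "Open forest", 4: "Open water"}
SHEET = {1: "Dense  forest ", 2: "Open woodland", 3: "Cropland"}

mismatches = find_mismatches(RAT, SHEET)
for value, (rat_name, sheet_name) in mismatches.items():
    print(value, repr(rat_name), repr(sheet_name))
```

Value 1 is not reported because the two names agree once the extra spaces are removed; the remaining entries are the ones that would need a human decision.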