*I’ve been talking to Jed Sundwall recently, sometimes about open data. I just wrote him about an idea that we both agreed may benefit from a wider discussion. Here it is, in full.*

Subject: Assess the value of data by measuring the cost of information.

Hey Jed,

Maybe I’ve had too much matcha today, but I feel like writing you about an idea floating around in my head. I’m reading a really great math book (I promise, it’s quite readable) by John C. Baez, called “What is Entropy?”. Reading it has expanded my perspective on how profound information theory is, and how it can be applied to the physical world.

I was once having drinks with a colleague from my old job, Grey Nearing, when he posed a question I’ll never forget. He asked, “What if, in the history of mathematics, you swapped Newton with Shannon, and information theory was discovered before calculus? Would we have airplanes today?” I paused, taking in the scope of the question. “Yes, I think we would,” Grey answered. Reading this book now, I’m finally coming to terms with how that would be possible. “I think information theory is just as explanatory as calculus. We’ve just had 400 years of building on top of calculus, but only 70 years of building on information theory,” Grey posited.

I’m only 20% of the way through the book, but I can’t help but think of your mission to find an economic model for data. I’ll find out more as I read more of the book, but here’s a pitch to you for an experiment towards that goal. I’d like to use available data and tools to try to estimate the true value of (Earth) data.

In short, how much does it cost to produce data per unit of information? Cost, in economic terms, is I think tractable to measure: we can get good estimates (if not hard figures) for the construction and maintenance of sensors (satellites), data transmission, storage, and distribution. I imagine it’s a bit more difficult to estimate, for a given unit of data, the amount of information in it. This is where the power of entropy comes in.

Entropy, to summarize the book, is a measure of the amount of information in something, expressed as the amount of uncertainty, or the amount of stuff we don’t know. If we have a probability distribution for an event (e.g. a coin lands on heads 2/3 of the time and you flip it once), we can calculate the expected amount of information before the event occurs: H = sum_i(-p_i log2(p_i)), or -(2/3) log2(2/3) - (1/3) log2(1/3) ≈ 0.918 bits of information. The book works through a more concrete example, measuring the entropy of a weather report.
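To make the arithmetic concrete, here’s a minimal sketch in Python (the function name is mine, not from the book):

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p)) over the distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Biased coin: heads 2/3 of the time, tails 1/3.
print(round(entropy_bits([2/3, 1/3]), 3))  # 0.918

# A fair coin is maximally uncertain, so each flip carries a full bit.
print(entropy_bits([1/2, 1/2]))  # 1.0
```

Note that the biased coin carries *less* information per flip than a fair one: the more predictable the outcome, the less you learn by observing it.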

Where I’m going with this is: what if we could calculate a probability associated with any (new) piece of Earth data (the chance of occurrence of a given datum)? If we could, we could measure the amount of information in it and then compare it to the cost of its production. I think Geo Foundation Models are the ticket to creating this probability value. I have a few ideas about the algorithm involved, but to cut to the chase, I think something like Clay could be used for this purpose.
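To sketch the shape of the idea (this is not Clay’s actual API; the function and the probabilities below are hypothetical): if a model can assign a probability p to a new observation, Shannon’s surprisal, -log2(p), gives that observation’s information content in bits.

```python
import math

def surprisal_bits(p_observed):
    """Information content of an observation a model assigned probability p: -log2(p)."""
    return -math.log2(p_observed)

# Hypothetical probabilities from a geo foundation model:
# a scene the model predicts well carries little information,
# while a rare event (flood, eruption) carries a lot.
print(round(surprisal_bits(0.9), 3))    # 0.152 bits
print(round(surprisal_bits(0.001), 3))  # 9.966 bits
```

This is the per-datum view of the same entropy formula: entropy is just the expected surprisal over the whole distribution.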

There are a few reasons why I like measuring information in this way. For one, given a single dataset, different subsets would have different amounts of information (who cares about the majority of data around the average; those outliers are where it’s at!). For two, this measures content and not bytes (though costs would scale along with bytes). For three, information is measured in the same unit (bits) regardless of the dataset, which still allows comparison across disparate types of data. I admit that this fails to measure the value of how the data is used. I don’t mind that much, however, not just because it’s hard to estimate, but because ideally, the value of EO data should grow substantially over time.

I think such an index for measuring the cost of information gained could help with planning the material production of data. For example: should we launch this satellite? How much information do we hope to gain from a new sensor, per unit of cost?
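A toy version of that index, with made-up numbers purely for illustration:

```python
# Hypothetical planning index: expected information gained per dollar spent.
# All figures below are invented for the sake of the example, not real mission costs.
def bits_per_dollar(expected_bits, total_cost_usd):
    return expected_bits / total_cost_usd

candidate_satellite = bits_per_dollar(expected_bits=1e12, total_cost_usd=500e6)
ground_sensor_net = bits_per_dollar(expected_bits=2e10, total_cost_usd=5e6)
print(candidate_satellite, ground_sensor_net)  # 2000.0 4000.0
```

The hard part, of course, is the numerator: estimating expected bits gained is exactly what the foundation-model probabilities above would be for.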

I’ll stop here and enjoy my train ride. You’ve really given me a great new brain worm. I see this problem everywhere now. Like the fox, I’ll let it be my wheat.

Hope you’re doing great, have a good night/day,

Alex