Nothing compares to Data

Tim O'Reilly's "Data is the New Sand" only illuminates part of our way.

Mar 31, 2021

How should we understand data? In a fine essay for The Information, technologist and publisher Tim O'Reilly was the latest to grapple with the question. The metaphors we use for data are wrong, he says. And that’s problematic because metaphors are like maps. They can provide answers or lead us astray.

Data is the new oil

O'Reilly argues that our poor grasp of what data is shows in the way we have misread British mathematician Clive Humby's now popular 2006 metaphor that "Data is the new oil". We have come to see the quest for data as a "There Will Be Blood" kind of gold rush: we should all claim our share before an information-age equivalent of Big Oil monopolises everything.

Taking Facebook's profits as a proxy for data-generated wealth, O'Reilly shows that data is actually not that valuable once the money it generates is divided among Facebook’s users. Worldwide, he estimates a profit of $7.50 per Facebook user per year. This rises to $11.48 for European users and $36 for North American ones. Even that highest figure comes to just $3 a month, far less than the modest cost of a monthly Spotify subscription.
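For readers who want to check the arithmetic, here is a minimal sketch that spreads the annual per-user figures quoted above evenly over twelve months. The variable and function names are my own, chosen purely for illustration, and the even monthly split is a simplifying assumption rather than anything O'Reilly specifies.

```python
# Annual profit per Facebook user in US dollars, as quoted in the text.
annual_profit_per_user = {
    "Worldwide": 7.50,
    "Europe": 11.48,
    "North America": 36.00,
}


def monthly_share(annual_usd: float) -> float:
    """Spread an annual per-user figure evenly over twelve months (an assumption)."""
    return annual_usd / 12


# Monthly equivalent for each region, rounded to cents.
monthly = {
    region: round(monthly_share(annual), 2)
    for region, annual in annual_profit_per_user.items()
}

for region, usd in monthly.items():
    print(f"{region}: ${usd:.2f} per month")
```

The North American figure works out to exactly $3 a month; the worldwide and European figures both come to less than a dollar.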

“Data is the new sand”

But, says O'Reilly, Humby was trying to make a more nuanced point that has gotten lost. Namely, that data, like oil, is only valuable after it has been transformed by industrial refining. So, to get rid of the association with riches, O'Reilly offers us a new metaphor in place of oil: “Data is the new sand”.

Sand is mostly silicon dioxide, and silicon is the second most abundant element in the Earth's crust. It is of little use until enormous techno-industrial processes transform it into something useful that sits at the heart of our modern economies: silicon chips. Without our advanced knowledge and cutting-edge manufacturing techniques, made possible by gigantic amounts of capital, it's just, well, sand. The economic value gets added by the layers of hardware, operating systems, technologies, and applications that rely on these chips. Yet nobody asks, is sand valuable?

So it is with data, he argues:

"Take Google: It must crawl the entire web—collecting and indexing trillions of webpages, tweets and other data, much of it in real time—and then search the results using complex algorithms and artificial intelligence to answer user questions 3.8 million times per minute. It must route its Google Maps customers to hundreds of millions, if not billions, of destinations, in real time, every day. Each day, it must store and process 4 billion photos and videos on Google Photos and stream 5 billion videos on YouTube. Facebook routes billions of posts to billions of users in personalized news feeds. These services require massive investments in data mining, refining, operations and management, as well as a constant investment in R&D."

Yet even though he makes a good case, O'Reilly knows this is not the whole story. Unsatisfied, he immediately reaches for another metaphor.

“Data is the new Oxycontin”

There are things that neither the oil nor the sand metaphor captures. And for O'Reilly, it is this: data can either harm or greatly benefit tech companies' users. He therefore likens it to the powerful opioid Oxycontin (also because, as an aside, he says data is addictive). If Google knows what you want, then smart targeted advertising can help you. At the same time, pushing disinformation in an election, no matter how engaging, is wrong. What users want is not always good for them. Similarly, the problem with Big Tech companies hoovering up medical information is not one of privacy. This medical data can be to a user's benefit. Users know this and give it gladly. It can help with treatments and cures. Unless, of course, it is used to discriminate (say, for insurance purposes). So fundamentally, the question O'Reilly poses as the primary one is this: what are the benefits and risks of having control over so much data?

I would argue that his Oxycontin metaphor, data as something powerful that can be used for good or bad, does not provide a comprehensive account of the nature of data either. In the same way, data as sand captures only part of the picture: that industrial processes are needed to make data useful, and that data is not of much value to individuals.

Without wishing to be facetious, these metaphors still miss several other significant peculiarities of data. Not all data is of equal value, or even of equal potential value. Aggregate data is more valuable than the sum of its parts; it exhibits economies of scale and scope. Linked to this is something that is now becoming apparent: massive amounts of data are what deep learning algorithms need to perform better. Data is also a non-rival and intangible good, yet unlike many non-rival goods, it is excludable. On top of that, data is personal. Lots of it is the byproduct of how ordinary people live their information-age lives, and this data can make them transparent and legible to whoever controls it. And, as I have written before, data is often social and spatial: the result of an interaction between people, or a representation of relationships and places.

But the thrust of O'Reilly’s concern, data as a question about power, is broadly in the right direction. It is, in fact, not unlike the question legal scholars have started to ask when grappling with data: is a novel formulation of power, and of who gets to exercise it, needed? This is an area of inquiry on which legal theorists have only just embarked, and so, unsurprisingly, they also lack a metaphor for data. But it is quite possible that data is so unique, so abstract and malleable, and its reach and impact so large, that no single metaphor will do it justice.

Leviathan
