Ira Basen: Big Data's got your number. Should you care?
Retailers, political parties, government spy agencies all collecting personal data
You're strolling through the mall and you get a message on your smartphone.
It tells you that the shoe store you are currently passing has a special deal just for you. If you go in and show them this message, they will knock 20 per cent off those shoes you've been wanting to buy. And yes, they have your size in stock.
You're thinking, "Wow, this is cool. Those shoes never go on sale. It's my lucky day."
But as you're heading into the store, a few more thoughts cross your mind. Yes, I've searched for those shoes on Google. And I've checked out a few online shoe stores. I've tweeted about the fact that I'm looking for a pair. I may even have sent an email or two to some friends.
So who has been reading all that stuff? Who knew I wanted those shoes and that I'm a sucker for a 20 per cent off sale? And who knew I was walking past that store?
Is this cool or creepy? Is this Big Data or Big Brother?
Really big data
The term "big data" has been kicking around Silicon Valley since the mid-1990s, but it has only really become part of the common currency over the past year, especially in recent weeks with the revelations that secretive government spy agencies, like the NSA and CSE, have been stockpiling the metadata of virtually every phone call made on the planet.
Its spectacular rise has been fuelled by three factors.
The first is size. How big is big data? Prepare to have your mind boggled.
From the beginning of recorded time until 2003, all the people in the world created about five exabytes (or 5.4 billion gigabytes) of data, Google chairman Eric Schmidt once estimated.
Today, an exabyte is created every day. That's the equivalent of about 250 million DVDs of information.
According to a recent report by Cisco, global internet traffic will quadruple from 2010 to 2015, when it could reach 966 exabytes per year.
A million exabytes equal a yottabyte, which is the equivalent of 500 quintillion pages of text, or 250 trillion DVDs.
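The size comparisons in the last few paragraphs can be checked with a little arithmetic. The short Python sketch below is a back-of-envelope check, not the method behind any of the quoted estimates. It assumes decimal units (an exabyte is 10^18 bytes, a yottabyte 10^24), a DVD of 4 GB and a page of text of about 2,000 bytes, which are the sizes the article's ratios imply. A real single-layer DVD holds about 4.7 GB, so the DVD counts are rough. The "5.4 billion gigabytes" figure only works out if exabytes and gigabytes are counted in binary (powers of 1,024).

```python
# Back-of-envelope check of the data-size comparisons above.
# Assumed, not from the article: decimal SI units, a 4 GB "DVD" and a
# 2,000-byte "page of text" (the sizes the quoted ratios imply).

GB = 10**9   # bytes in a decimal gigabyte
EB = 10**18  # bytes in a decimal exabyte
YB = 10**24  # bytes in a decimal yottabyte

DVD_BYTES = 4 * GB   # assumed DVD size; a real single-layer disc holds ~4.7 GB
PAGE_BYTES = 2_000   # assumed size of one page of plain text


def count_of(total_bytes: int, unit_bytes: int) -> int:
    """How many whole units of unit_bytes fit in total_bytes."""
    return total_bytes // unit_bytes


exabytes_per_yottabyte = count_of(YB, EB)
dvds_per_exabyte = count_of(EB, DVD_BYTES)
dvds_per_yottabyte = count_of(YB, DVD_BYTES)
pages_per_yottabyte = count_of(YB, PAGE_BYTES)

# Schmidt's "five exabytes (5.4 billion gigabytes)" matches binary units:
# 5 * 2**60 bytes divided by 2**30 bytes per binary gigabyte.
gigabytes_in_five_binary_exabytes = count_of(5 * 2**60, 2**30)

print(f"Exabytes per yottabyte: {exabytes_per_yottabyte:,}")
print(f"DVDs per exabyte: {dvds_per_exabyte:,}")
print(f"DVDs per yottabyte: {dvds_per_yottabyte:,}")
print(f"Pages per yottabyte: {pages_per_yottabyte:,}")
print(f"Gigabytes in five exabytes (binary): {gigabytes_in_five_binary_exabytes:,}")
```

Under those assumptions it reproduces the figures quoted: one million exabytes and 250 trillion DVDs per yottabyte, 250 million DVDs per exabyte, 500 quintillion pages of text, and about 5.37 billion binary gigabytes in five exabytes, which rounds to the 5.4 billion cited.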
The U.S. National Security Agency, the secret security operation at the heart of the current outrage over government snooping, is currently constructing a $2-billion, one-million-square-foot data centre in Utah that is designed to handle a yottabyte of information gleaned from all the phone calls we make, all the texts, tweets and emails we send, and everything we ever do online.
Even dishwashers and microwaves that have embedded computer chips can now be monitored.
The holy grail of analytics
The second factor in the rise of big data is the dramatic reduction in the cost of storing all that information. In 2001 you would have had to pay about $20 for a gigabyte of digital storage. Today, it will cost you about a nickel.
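To see why that price drop matters at big-data scale, here is a rough sketch that prices one exabyte of raw capacity at the two per-gigabyte figures above. It assumes decimal units (one exabyte is a billion gigabytes) and a flat price per gigabyte, ignoring hardware, power, redundancy and staff, so it illustrates the trend rather than what any real data centre pays. Prices are kept in whole cents to avoid floating-point rounding.

```python
# Rough cost of one exabyte of raw storage at the two prices quoted above.
# Assumptions: decimal units and a flat per-gigabyte price with no other costs.

GB_PER_EXABYTE = 10**9    # 1 EB = 10**18 bytes = 10**9 decimal gigabytes

PRICE_2001_CENTS = 2_000  # about $20 per gigabyte in 2001
PRICE_TODAY_CENTS = 5     # about a nickel per gigabyte today


def storage_cost_dollars(gigabytes: int, cents_per_gb: int) -> int:
    """Cost in whole dollars of buying `gigabytes` at a flat per-GB price."""
    return gigabytes * cents_per_gb // 100


drop_factor = PRICE_2001_CENTS // PRICE_TODAY_CENTS
cost_2001 = storage_cost_dollars(GB_PER_EXABYTE, PRICE_2001_CENTS)
cost_today = storage_cost_dollars(GB_PER_EXABYTE, PRICE_TODAY_CENTS)

print(f"Price per gigabyte fell {drop_factor}-fold")
print(f"One exabyte at 2001 prices: ${cost_2001:,}")
print(f"One exabyte at today's prices: ${cost_today:,}")
```

That works out to a 400-fold drop: roughly $20 billion to store an exabyte at 2001 prices versus about $50 million today, under the assumptions above.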
And the third factor goes by the boring name of analytics. You can have all the zettabytes of data you want, but it won't do you much good if you're not able to separate the information you are interested in from the stuff you don't care about.
It's like finding a needle in a very large haystack.
That's why data scientists, who develop the algorithms and other analytical tools that find those needles and use the results to predict behaviour, have become the new masters of the digital universe.
They have helped turn "business analytics" into a $12-billion industry, and a civil rights headache for democracies.
And the tools they have developed to help retailers sell shoes are the same ones that politicians use to figure out what message will be most effective to win your vote, and governments use to determine whether you are up to no good.
It's not hard to understand why governments, businesses and politicians see data mining as a holy grail.
The more you know about a potential voter or customer, the more likely you are to figure out what needs to be done to trigger a desired response.
Until now, scientists have always worked with incomplete sets of data, which is why they develop models and probability regimes to test their theories.
But what if you had access to a full data set? Could you effectively eliminate the possibility of error?
That's the promise of big data according to its most enthusiastic apostles. In a provocative essay written in 2008, Chris Anderson, then the editor of Wired magazine, declared that big data represented "the end of theory."
"This is a world where massive amounts of data and applied mathematics replace every other tool that might be brought to bear …
"Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves," he wrote. But can numbers really speak for themselves? Can human behaviour really be so predictable?
How certain can that shoe store be that I will be seduced by their offer? How certain can the NSA be that someone with a certain pattern of telephone calls will be worthy of more intrusive types of spying?
Mars versus the Mars bar
Not very, according to some data experts, at least as far as consumer behaviour is concerned.
"When an activity is driven by consumer whims, no amount of ingenuity can produce the ability to know what will happen," argued data-mining analyst Gregory Piatetsky-Shapiro in the Harvard Business Review last year. "Predictive analytics can figure out how to land on Mars, but not who will buy a Mars bar."
Another skeptic is Nate Silver, the superstar data cruncher of the New York Times, who aggregated polling data to correctly predict the presidential outcome in all 50 states in last year's U.S. election.
In his book The Signal and the Noise, Silver sounded a cautionary note about falling in love with the promise of big data. "If the quantity of information is increasing by 2.5 quintillion bytes per day," he wrote, "the amount of useful information almost certainly isn't.
"There are so many hypotheses to test, so many data sets to mine — but a relatively constant amount of objective truth."
Numbers don't speak for themselves, Silver argues. They are filtered and mediated by the questions we ask of them, which is probably as true for shoe sellers as it is for spymasters.
"Before we demand more of our data, we need to demand more of ourselves," Silver wrote. "Unless we work actively to become aware of the biases we introduce, the returns to additional information may be minimal — or diminishing."
So where does this leave us?
There appears to be no stopping governments, or companies like Facebook and Google, from accumulating massive amounts of data about us.
Some of it may be employed in the service of stopping a terrorist attack; some of it will be employed to sell shoes. The way that data is used will be both cool and creepy.
But they can't know everything about us. Even a big brother can't know that I've resolved to be more financially responsible and cut back on impulsive purchases of shoes I don't really need, even at 20 per cent off.
They can't know that. At least, not yet.