Mining for data

Stephen Baker spoke to CBC News about the Numerati, a powerful force shaping both our world and us.

Interview: Spying on ourselves for the Numerati

Business writer Stephen Baker writes we are witnessing "the mathematical modeling of humanity." We drop personal data as casually as breadcrumbs, sending electronic updates about our lives minute by minute. (Carolyn Cole)

Did you check this morning, look up the weather for your city, order a political biography online? Did you call your broker at a stoplight from your BlackBerry, Google Prius reviews at the office, use a loyalty card on the way home for heart-healthy groceries and the Globe and Mail?

You are a data miner's dream, blithely shedding priceless data about your life like breadcrumbs behind you.

The people following the crumbs, the people pulling the information together and making sense or nonsense of it, the computer scientists, the mathematicians, the algorithm-makers, are the subject of Stephen Baker's new book The Numerati.

Whether they are using the data to model behaviour that can predict illness or dementia, deliver the love match we've longed for or spell terrorist, Baker says the Numerati are a powerful force shaping both our world and us.

Stephen Baker spoke to CBC News by telephone from his home in Montclair, New Jersey. The following are edited excerpts:

Stephen Baker: They have increasing power, because we have delivered the information of our lives right to their doorstep in ones and zeros. And so they can begin to understand and model us and predict our behaviour as shoppers and voters and potential terrorists and workers, and dramatically change our lives.

How can political parties use this kind of data to manipulate opinions?

Traditionally, politicians have had to look at constituencies in large groups. There are different ethnic groups and there are different regional groups, and different generations. And they, in their clumsy way, try to customize their message for each group … You go to the Jewish neighbourhood and you eat a bagel, you go to the Italian neighbourhood and you eat a pizza, you try to adapt and show that you understand these people and their lives.

And you pick up a baby everywhere.

You pick up a baby everywhere. But the fact is... there is diversity in every one of these groups. So if you are targeting a group, you are mistargeting a certain percentage of that group. You are sending them exactly the wrong message.

But if you go to a state that is very close, and you can find maybe only 4,000 voters that you didn't expect to be able to find, and because of their consumer data you can say these people are likely to be closet Democrats, or closet Republicans, or swing voters who we know are interested in this certain issue, then you can approach them directly, either through the internet with e-mail, or by direct mail, a phone call or a knock on the door. Then you might be able to turn entire states.

We're in the middle of a global financial meltdown. What role have the Numerati played?

All of the investments that are made by these banks are done on the basis of risk algorithms, so all of the decisions that they make (the portfolio management, the asset management, and all the rest) have to do with algorithms that are designed by the Numerati based on their understanding of risks. So somewhere along the line, they miscalculated risk. It's important to understand that you can have the best math in the world, but if you don't understand human behaviour, then you cannot calculate risk when it comes to market behaviour.

The chapter on terrorism was most chilling for me — the difficulty security agencies have in predicting who's going to do something and when and where, and the mistakes that they can make.

Stephen Baker's new book The Numerati reveals both the dangers and benefits of the data mining which allows mathematicians and computer scientists to understand and manipulate behavior. (Look/Getty/Thorsten Rother)

The areas in which the Numerati thrive are those in which the cost of making a mistake is minimal, because by the very nature of their statistical analysis they make mistakes. That is how they do it. They make mistakes, they learn from the mistakes and they use them to sharpen their analysis. So they are perfect for figuring us out as Cheerios eaters, or as somebody who might be interested in a trip to Cancun. And if they screw it up, if we're not interested in Cancun or Cheerios, it's no big deal. But they've got enormous amounts of data that inform those decisions. They know the history of Cheerios eaters and of all other cereals and everything else, and they can draw the correlations, and they can do the same thing with vacations.

But they do not have good data on the day-to-day behaviour of potential terrorists, so they are working blind with that. The real problem is that the urgency of finding these terrorists is so great that they are being called on to come up with something…

So what [chief IBM scientist] Jeff Jonas worries about in that chapter is that they'll just run a ton of data and look for something. They'll just run things through the machine and look for something that pops, some kind of correlation that they're not expecting between somebody's behaviour and somebody else's behaviour. And that is really risky because it is much, much easier to hunt for something when you know something, when you have one lead … If you have a very good idea that somebody is implicated in an act of terror, then you can look at all of their contacts. You are working with really relevant data.

But just to go through your telephone conversations and mine, I mean, we could talk about Al-Jazeera, we could buy hummus, we could talk about bombs, and all of those words could go through the networks and get picked up by data mining programs and little red lights could go off. And it is a real problem for our privacy and our freedom really, because … then they go through and they find other things about us.

So are there things we can do? I am thinking of things like pay only with cash, surf in internet cafés, never buy online, never use a loyalty card?

Society is moving in such a way that it is going to become increasingly inconvenient. Like, try to find a pay phone in Vancouver. There are not too many anymore. Or if you want to throw quarters into the [electronic toll booth], every year there is one less tollbooth for you. So you've got all the losers lining up to throw their quarters in the machine and all the people that are free and easy with their data just zipping past.

I think we talk a lot about privacy, and people complain about it and fret about it, but at least in this country, and I'm not sure about Canada, when people are given a choice between privacy and monetary savings, they go for the savings every time. And given a choice between privacy and convenience, they go for convenience. And given a choice between privacy and the promise of security, they go for the promise of security.

But what if we don't? What if we want to escape from the database, the way people live off the grid?

I think there's going to be a market for data control. And so all kinds of companies are going to sell us software that helps us keep control of our data, furnish our data to those who will use it responsibly, and keep it from those who won't. And that's going to be a very important market in this world because it's crucial to how we run our lives.

Is that what you meant when you wrote we can "evolve from data serfs to data masters"?

We are going to be dealing with machines as we try to find people or be found in the world that we are entering. And we have to basically hone our own algorithms so that we come to the top of lists, whether it is a dating site, or a job site, to get first in line for promotions or for a transfer to Paris or something like that. And so, we have to understand the thinking of the Numerati that goes into the algorithm that determines how the machine looks at everybody. And then we have to satisfy the machine instead of satisfying people.

So what you're talking about is doing what businesses do when they tweak their website so it pops up at the top of a search engine, is that right? So we have to do that to our own data?

Right. If your company's website doesn't come to the first page of a Google search, you don't exist for many people. Your company doesn't exist for many people.

And if you don't pop to the top page of someone looking for dates, then you don't exist. And so you have to figure out what they're looking for and then you have to satisfy them so that you get to the top of these lists. You have to optimize your own algorithm.