Google, Holocaust deniers, and your fear of math

Mathematician and data scientist Cathy O'Neil argues Google's algorithm is being gamed and it's time for human editors to play a role.
Google now processes over 40,000 search queries every second on average, which translates to over 3.5 billion searches per day and 1.2 trillion searches per year worldwide. (Shutterstock)

There is a good chance you've Googled something this week.

And you might not have considered typing into that search bar a political act.

And the idea that the search bar is a battleground for knowledge might feel like hyperbole.

But if you were to Google "did the Holocaust happen?" as the Guardian did recently, you would see the top result is from Stormfront — a white supremacist neo-Nazi forum — which has come up with a top-10 list of reasons the Holocaust did not happen.

Google has said it will not change the results.

But mathematician and data scientist Cathy O'Neil says Google's algorithm is being gamed, and it's time to bring in human editors to help the search engine's algorithm sift truth from fiction.

Cathy O'Neil spoke to The 180's Jim Brown. This interview has been edited for length and clarity.

How do algorithms, like the one Google uses, work?

Algorithms take historical data and look for patterns that are associated with some definition of success. So you and I build algorithms all the time. They don't have to be formalized in computer code. I use an algorithm on a daily basis when I make dinner for my family. The data I use for that is the ingredients I have in my kitchen, my knowledge about which kids like what food, and the time and ambition I have, and then my definition of success is whether my kids ate any vegetables. If my eight-year-old were in charge, his definition of success would be whether there was any Nutella, which is to say that we project our agenda onto our algorithms.

So there is always a value system embedded in an algorithm. And once we have a definition of success, we optimize it, and we keep asking how do we get something that will be successful in the future based on what was successful in the past?
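O'Neil's description boils down to three pieces: historical data, a chosen definition of success, and an optimizer that picks what worked before. A minimal sketch of that idea, with hypothetical names and made-up dinner data, might look like this:

```python
# A minimal sketch of O'Neil's definition of an algorithm: historical data,
# a chosen definition of "success", and optimization for what succeeded before.
# All names and data here are hypothetical illustrations.

from collections import defaultdict

def train(history):
    """Tally how often each option met the builder's definition of success."""
    score = defaultdict(lambda: [0, 0])  # option -> [successes, trials]
    for option, succeeded in history:
        score[option][1] += 1
        if succeeded:
            score[option][0] += 1
    return score

def choose(score):
    """Optimize: pick the option with the best historical success rate."""
    return max(score, key=lambda o: score[o][0] / score[o][1])

# "Success" is whatever the builder says it is - a parent counting vegetables
# eaten would record different outcomes than a child counting Nutella.
history = [("stir_fry", True), ("stir_fry", True),
           ("pasta", False), ("pasta", True)]
print(choose(train(history)))  # -> stir_fry
```

The value system lives entirely in how the `succeeded` column was filled in; the optimizer itself never questions it.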

Mathematician and data scientist Cathy O'Neil (provided)

So in Google terms, if the search engine gives people a variety of options as results, and people keep consistently choosing one of those results, that result climbs up the list?

Absolutely. We, as the users of Google search, are the most important training force for Google search. They use our daily clicks and search terms to train the model.
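The feedback loop O'Neil describes — clicks as training data — can be sketched in a few lines. This is a hypothetical simplification, not Google's actual ranking system; the site names and the flat per-click boost are illustrative assumptions.

```python
# Hypothetical sketch of the click-feedback loop: every user click nudges a
# result's score upward, so consistently chosen results climb the list.
# This is an illustration, not Google's actual ranking algorithm.

def rerank(results, clicks, boost=1.0):
    """Return results ordered by accumulated click score, highest first."""
    scores = {r: 0.0 for r in results}
    for clicked in clicks:
        scores[clicked] += boost
    return sorted(results, key=lambda r: scores[r], reverse=True)

results = ["site_a", "site_b", "site_c"]
clicks = ["site_c", "site_c", "site_a", "site_c"]
print(rerank(results, clicks))  # site_c rises to the top
```

A loop like this has no notion of whether `site_c` is true or false — only that users keep choosing it, which is exactly what makes it gameable.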

So Carole Cadwalladr at the Guardian did some searches and she found that if you start typing "are Jews" in the search bar, it will auto-complete to "are jews evil?" That was her first observation. Her second observation was that if you actually go through the results, I think nine of the top ten were essentially neo-Nazi propaganda about how Jews are evil. The fact that auto-complete happened like that is a direct consequence of a bunch of people consistently asking that very question of Google.

But the other thing Google does is send a bunch of robots to crawl the web and look for content. So as soon as a website is connected to the internet, the Google-bots crawl that content and pick up search terms, and they also pick up proxies of quality. Again, what they consider proxies of quality is what they have learned over time, and it is very much associated with whether people click on that content when it comes up in a search.

It's relatively complicated, but essentially what it means is that alt-right propagandists have built content on the Internet and gamed Google's quality proxies.

Even with the most sophisticated algorithm, and I can only assume that Google's algorithm is the most sophisticated, what is it capable of doing and not doing?

Right now, it is capable of following rules that we teach it. It is not capable of differentiating between falsehood and truth. If you have a dark corner of the Internet that has really bad information, like a bunch of Holocaust deniers, then Google, or for that matter any artificial intelligence that currently exists, would not be able to distinguish between truth and falsehood.

Why is it so hard? Couldn't you just tell the algorithm to ignore the bad information?

Absolutely you could do that. For any given situation, if you had a blacklist of websites, and you told your AI engines 'don't trust these websites' then absolutely they could follow that rule. I'm not suggesting they can't do that.
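The blacklist rule O'Neil describes is easy for a machine to follow once a human supplies the list — the hard part is that someone has to find and name the sites first. A sketch, with hypothetical domain names:

```python
# A sketch of the human-curated blacklist rule described above: the engine
# can obey "don't trust these sites", but it cannot discover bad sites on
# its own. All domain names here are hypothetical.

from urllib.parse import urlparse

BLACKLIST = {"example-denial-forum.org", "example-propaganda.net"}

def filter_results(urls, blacklist=BLACKLIST):
    """Drop any result whose domain appears on the blacklist."""
    return [u for u in urls if urlparse(u).netloc not in blacklist]

results = [
    "https://example-denial-forum.org/top10",
    "https://example-encyclopedia.org/history",
]
print(filter_results(results))  # only the non-blacklisted result remains
```

Note that the judgment of what belongs on `BLACKLIST` comes entirely from people; the code just applies it, which is O'Neil's point.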

I'm suggesting that unless you find those websites and tell [the AI engines] not to look there, they will look there. They don't have any deep understanding or a model of the world. They cannot distinguish between truth and lies because they don't know what truth is. People have this assumption — because I think Google and Facebook told them to think this — that we don't need humans anymore, that we can replace the gate-keeping of information with an algorithm.

That of course is false. And it's been proven false by this proliferation of fake news, by the proliferation of propaganda and these dark corners of the Internet.

You argue that one of the fatal flaws here is that people, us generally, don't push back on algorithmic decision-making because we're afraid of mathematics. What led you to that conclusion?

Not only are we afraid of math, but mathematics has been weaponized. In certain situations, there is a way of wielding math as a weapon against people, because they're simultaneously afraid of it and they trust it, which makes it almost like God.

When you have that situation, you have a way of keeping people from asking questions - it's supposed to be blind faith, don't ask questions, don't look behind the curtain. We have been living in a state of algorithmic bliss where we think algorithms are perfect, inherently fair, and inherently objective.

So we have to reassess what our options are. Our options are the following.

  1. Have human gatekeepers who do inconsistent jobs, and whose policies about what gets through the gate and why might frustrate us, but who are at least accountable.

  2. Have algorithmic gatekeepers that can't tell the difference between truth and lies, and that offer no accountability.

Given that, and the awesome power of Google, how optimistic are you that anyone is going to listen to what you have to say?

Because it's so obvious what's at stake now, I feel like accountability - which is what they've been ducking for years and years - will keep coming up as people continually, successfully game their quality proxies.
