Optical illusions for computers
If you have OK vision, you can probably recognize a cat when it crosses the sidewalk in front of you. If it's dark or the animal is a little farther away, you might think it's a raccoon or maybe a small dog. Our brains do a pretty good job of figuring out what's going on around us. We might mix up a cat and a raccoon now and then, but generally we're not too far off.
Now, some of the artificial intelligences we are creating, particularly the ones based on neural networks, do a pretty good job of learning in a similar way to us. But, in the end, they just think differently than we do.
Anish Athalye is a PhD student at MIT and a member of a student-run AI research group called Lab Six. Anish says that these neural networks have a certain vulnerability. "This AI technique that nowadays is being used for all sorts of stuff," Anish says, "from image recognition to text processing to self-driving cars, it turns out that neural networks are susceptible to what are called 'adversarial examples'."
Adversarial examples or images are a kind of attack or hack of an AI system. It's a way of subtly changing an image so a person can't tell the difference, but the computer now thinks it's something else entirely. "For example," Anish says, "in the context of image classifiers — things that take in a single image and tell you, oh this is a cat or this is a dog, this is a truck — turns out you can take these images and, kind of, tweak them very slightly in a carefully controlled way such that the neural network is confused and predicts the wrong thing."
"Like, for example, you can take a picture of a cat, change a couple pixels very slightly and all of a sudden the neural network thinks, with 99.99% probability, it's looking at guacamole. "
Anish's example of tricking an AI into thinking a cat is guacamole isn't hypothetical. It's something his lab has actually done.
When we see a furry, four-legged object moving across the street in front of us, we might figure it's probably a cat, but there's a chance it's a dog, and a smaller chance it's something else. These AI systems are similar. But while we simply make an assessment and go with it, a neural network actually assigns a probability to each possibility. While there might be an 80% chance it's a cat, there might also be a 0.1% chance it's a bowl of guacamole. An adversarial attack exploits that little bit of uncertainty.
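To make the idea of per-label probabilities concrete, here is a minimal Python sketch. It assumes the network finishes with a "softmax" step, a common way of turning raw scores into probabilities that add up to one. The labels and scores are invented for illustration (picked so the cat lands near 80% and the guacamole near 0.1%) and do not come from any real classifier.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    top = max(scores)  # subtracting the largest score avoids overflow
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up labels and scores, for illustration only.
labels = ["cat", "dog", "raccoon", "guacamole"]
scores = [4.0, 2.5, 1.0, -2.5]

probs = softmax(scores)
for label, p in zip(labels, probs):
    print(f"{label}: {p:.2%}")
```

Even the least likely label keeps a small, non-zero probability. That sliver is what an attack pushes on.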
Anish says his team examines how the AI assesses an image, then sees whether they can manipulate the image in some small way to change the probabilities the AI assigns.
"In what direction do I wiggle the pixels in the input to move the classification probability towards the classification that I want it to be classified as?
"I take very small steps and kind of tweak the pixels, making each pixel lighter or darker to slowly move the classification probability toward the desired target."
The theory behind adversarial attacks began with researchers at Google in 2013. But the earlier examples were images that had to be uploaded directly into the computer, rather than images in the real world captured by a camera. Once the researchers turned the images on an angle or rotated them, the program stopped being confused and saw the original image.
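One rough way to picture that fragility is to show the classifier many slightly altered views of the same image and count how often the fooled label survives. The Python sketch below does this for a toy case. Everything in it is an assumption made for illustration: the two-label linear classifier, its weights, and the random_view function, which uses a brightness change plus a little sensor noise as a stand-in for real changes of angle or rotation.

```python
import math
import random

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(weights, pixels):
    """Label probabilities from a toy linear classifier (one weight row per label)."""
    scores = [sum(w * x for w, x in zip(row, pixels)) for row in weights]
    return softmax(scores)

def random_view(pixels, rng):
    """Toy stand-in for a new viewing condition: random brightness plus noise."""
    gain = rng.uniform(0.8, 1.2)
    return [min(1.0, max(0.0, gain * x + rng.gauss(0.0, 0.03))) for x in pixels]

def survival_rate(weights, pixels, target, trials=1000, seed=0):
    """Fraction of random views still classified as the target label."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    hits = 0
    for _ in range(trials):
        probs = predict(weights, random_view(pixels, rng))
        if probs.index(max(probs)) == target:
            hits += 1
    return hits / trials

# Invented weights: row 0 is "cat", row 1 is "guacamole".
weights = [[10.0, -10.0, 5.0, -5.0],
           [-10.0, 10.0, -5.0, 5.0]]

fragile = [0.5, 0.5, 0.45, 0.55]  # only just classified as guacamole
sturdy = [0.4, 0.6, 0.4, 0.6]     # classified as guacamole by a wide margin

fragile_rate = survival_rate(weights, fragile, target=1)
sturdy_rate = survival_rate(weights, sturdy, target=1)
print("fragile example survives:", fragile_rate)
print("sturdy example survives: ", sturdy_rate)
```

An example that only just tips the classifier loses its effect under some views, while one that fools it by a wide margin holds up more often. Making adversarial examples that hold up across many views is the harder problem.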
"If you have self driving cars that see objects," Anish says, "well an adversary can't control what the actual input is to the classifier, but they can control objects in the real world perhaps. And if those objects are adversarial and robust enough that they survive whatever transformation happens when cars look at it from different angles or rotations or things like that, then that could be a real problem. "
Anish and his Lab Six colleagues want to find out whether adversarial images can trick an AI operating in the real world. "And so what we're working on," Anish says, "is what we call robust adversarial examples. So we want to see if it's possible to produce these things that look like one thing to a human, but look like something completely different to a neural net. We wanted to see if we could make 3D objects that could consistently fool a classifier.
"One example we showed was that we could take a turtle. And when you print that out on a 3D colour printer, but now using our algorithm we can slightly alter the texture so that that's consistently classified as something else. And so we randomly chose to make it into a rifle."
If these attacks became common, they could have major effects on current and near-future technologies that employ neural networks to operate. If someone were able to alter a stop sign, for example, so that it appeared to a self-driving car to be a speed limit sign, it could lead to major accidents.
Despite the risk, Anish says that the attacks haven't reached that level yet. "There's no evidence yet that these systems have been exploited in the real world," he says, "and there's certain challenges that must be overcome in order to make adversarial examples in the real world."
One of those limits is how the current adversarial examples have been created. "Our example shows how to make these in what is called the 'white box case'," Anish says. "That requires access to the source code or the details of the particular neural network you're attacking."
But that limit might soon be crossed. "One thing we're working on is actually attacking real world systems. So we're trying to figure out how to develop 'black box attacks' on these systems, to come up with robust adversarial examples."
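How a black box attack might work is not spelled out here, so the following Python sketch swaps in one generic idea, finite differences: probe the system with slightly lighter and slightly darker versions of each pixel and watch how the reported probability moves. It is not Lab Six's method. The hidden classifier, its weights and every function name are invented for illustration, and the attacker's code only ever calls query.

```python
import math

def make_black_box():
    """Return a query function; the weights inside are hidden from the attacker."""
    hidden_weights = [[10.0, -10.0, 5.0, -5.0],   # label 0: "cat"
                      [-10.0, 10.0, -5.0, 5.0]]   # label 1: "guacamole"

    def query(pixels):
        scores = [sum(w * x for w, x in zip(row, pixels)) for row in hidden_weights]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    return query

def estimate_gradient(query, pixels, target, delta=0.001):
    """Estimate how p(target) responds to each pixel using only queries."""
    grad = []
    for i in range(len(pixels)):
        lighter = list(pixels)
        darker = list(pixels)
        lighter[i] += delta
        darker[i] -= delta
        grad.append((query(lighter)[target] - query(darker)[target]) / (2 * delta))
    return grad

def black_box_attack(query, pixels, target, step=0.01, max_change=0.1, iterations=20):
    """Same small-step nudging as a white box attack, with estimated directions."""
    adv = list(pixels)
    for _ in range(iterations):
        grad = estimate_gradient(query, adv, target)
        for i, g in enumerate(grad):
            direction = (g > 0) - (g < 0)  # +1 lighter, -1 darker, 0 leave alone
            low = max(0.0, pixels[i] - max_change)
            high = min(1.0, pixels[i] + max_change)
            adv[i] = min(high, max(low, adv[i] + step * direction))
    return adv

query = make_black_box()
image = [0.6, 0.4, 0.55, 0.45]
adversarial = black_box_attack(query, image, target=1)
print("before:", query(image))
print("after: ", query(adversarial))
```

The cost is the number of questions asked: two queries per pixel per step. For a real photograph that adds up to millions of queries, which is one reason black box attacks are harder to pull off.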