'I designed the tool that I wanted.' Engineer who is blind creates app to describe the world around him

App aids visually impaired users with a "talking camera".
The Seeing AI talking camera app helps blind users interpret the world around them. (Microsoft)

Saqib Shaikh is a software engineer at Microsoft. He has also been blind since the age of seven. Shaikh wanted to design a tool that would describe their surroundings to visually impaired people.

He's the lead on the Seeing AI project, an iPhone app that combines image recognition with a "talking camera" and a natural interface to help users navigate the visual world around them.

Engineer and project lead for Seeing AI, Saqib Shaikh.

Seeing AI takes advantage of the tremendous gains in image recognition in recent years. It has become much easier for artificial intelligence to examine an image and find other images that match it. In addition, AI programs can be trained to examine an image and identify what's in the picture.

When you open the app, there are nine different "channels" for different functions, such as reading short text, or recognizing products, currency, or people. 

So, for example, a user can take a picture of a person (the app indicates the location of the subject) and it will describe the person. Or the user can take three photos of a person and tag them. After that, the app will "recognize" the person.

Saqib Shaikh explained Seeing AI and its future potential to Spark host Nora Young.

Saqib Shaikh can pull out his Seeing AI app whenever he needs it during the day. (Microsoft)

Nora: As you mentioned, you're blind. So can you walk me through how it would fit into your day, the kinds of things that you would use it for, and how it might assist you? 

Saqib: I think of it as this multi-tool, which I can pull out whenever I need it. So maybe I pick up the mail in the morning and want to just wave the phone over it to see if it's for me or for my wife. Or maybe later in the day I go to get a drink out of the fridge and they all feel the same, so I can use it to find which is which. Or maybe at work, I can identify who my colleagues are. So it's throughout the day, all these little tasks that it can assist with.

Nora: And how does it know, for example, if you're holding it up to look at the mail and see perhaps who it's from, how does it determine if you're holding it in the correct position? 

Saqib: When designing it we had to think about that, and so it actually will read it even if it's upside down. 

The Seeing AI app can be used to read text, such as addresses. (Seeing AI)

Nora: Can you describe a situation where it's been particularly helpful, in your own experience?

Saqib: We love hearing from our users... I just love the little anecdotes. Sometimes they're the things that you totally expect, like someone saying that they've got their confidence back in going shopping. And then, beyond the shopping itself: coming home and putting everything away once it's all mixed up in your bags. But then we have these cool stories, like a teacher who has the phone pointing towards the classroom door. He's taught the app to recognize all the students in his school so he can know who's coming in.

Nora: So what's going on under the hood here? It's bringing together image recognition and what other technologies are coming together for this? 

Saqib: Image recognition is really the key. The technical term is deep neural networks: we are able to show the system many, many photos and teach it about what's in those photos. You teach it over time, and then when it sees a photo it hasn't seen before, it uses the patterns it found during training to give you a description.

Nora: So in practical terms, what does that enable you to do with the technology? I mean, are you able to go into an unfamiliar environment and use it to "look at," so to speak, the objects and the environment around you?

Saqib: That's totally the goal. I think where we are today is much more on very task specific things, whether that is arranging your currency because it recognizes currency. Or maybe you want to find somewhere to sit, so we're looking at whether it could help you find a seat. 

But I think these are the building blocks, and I could totally see that one day we'd have this system which can describe everything to you as you go around. 

Nora: I'm trying to think about how to say this without seeming like I think this is no big deal, because I do think it is a big deal! But is the primary insight here bringing together existing technologies, like image recognition, in a way no one thought of before in order to serve a different user? 

Saqib: Certainly a lot of these technologies have existed in the lab for years and we were able to leverage the great work of many scientists. Definitely standing on the shoulders of giants. 

The other half of it is: how do you build these interesting, natural user experiences? Getting a blind person to operate a camera turned out to be harder than we thought, because getting someone to line up a photo requires some interesting guidance.

I have this idea that disability can actually drive innovation, because there's such a great need among people with visual impairments in this case, and among people with disabilities more generally. That need can drive new innovation (and we are always working with the scientists), and then those things become mainstream in the future. There are so many examples of that in history.

Nora: And so ultimately what's the goal with this technology? If you cast ahead several years and you think about where it could go, can you paint me a picture of how you see it developing?

Saqib: So if I have free rein to think many years into the future, I imagine a world where advanced technologies like AI can make a more inclusive world. And maybe we each have our own little AI agent which helps us out, that observes the world, knows what we're good at, and knows what we need help with. And it helps us all to communicate better and work better together. For someone who can't see, it might provide information about the visual world. For someone who can't hear, it would provide information about the auditory world. And I just feel that as these intelligent machines learn our preferences and our needs, when computers and humans work together, the sum is far greater than either alone.

Seeing AI is available for free for the iPhone.

