Here's why your phone can't understand your accent
This story first aired in September 2015.
Have you ever been frustrated by a machine that just doesn't "get" your voice? If so, you're not alone:
Your accent might be the reason your smartphone can't understand you, according to Marsal Gavaldà, a speech recognition expert and director of machine intelligence at Yik Yak.
"Accuracy varies widely from speaker to speaker," he explains. "There are certain segments – children, seniors, or people with an accent – whose recognition rate is very low, or much lower than the rest of the population."
Gavaldà calls this "the speech divide," and at a recent speech technologies conference, he called for researchers and companies to acknowledge the gap between speech technology haves and have-nots.
"Of course, it's not something that is typically discussed publicly," he says. "My effort here is to make this more of an issue, to encourage these companies to make it more of a priority."
Gavaldà says part of the solution involves more diversity in the data used to train machines.
"It's not the case that one accent is harder than another," he explains. But because voice recognition systems are often trained using broadcast recordings and transcripts, they can have a bias towards certain accents.
"At the end of the day, it really depends what the software has been trained on," says Anne Wootton, co-founder and CEO of Pop Up Archive, a speech-to-text company that turns recorded audio into searchable transcripts.
Wootton says Pop Up Archive's software is trained to understand media and oral history recordings, so "the transcripts we generate for American English and British English, even Indian accents, are far more accurate for the most part than an accent from a South American country in some cases. This is anecdotal, and ad-hoc, but we come up against it all the time."
Beyond accents, Wootton says background noise and recording quality can also impact speech recognition accuracy. "Audio recorded on a tractor in the middle of a field on a farm with tractor noise and a couple of people having a conversation is going to be very different from a recording done in a recording studio or in a quiet room, with a single person speaking pretty directly into a microphone."
In recent years, consumer technologies like Siri, Google Now, Microsoft Cortana, and Amazon Echo have pushed speech recognition into the mainstream. For Gavaldà, the speech divide is an equality and accessibility issue.
"If there are certain parts of the population that are not well understood, they will have less access. They will be less likely to use and benefit from these systems."
Example: Ugandan Kayakers
To illustrate how accents can impact the accuracy of machine-generated transcripts, Pop Up Archive transcribed a recent story from CBC's All In A Day, featuring the Ugandan freestyle kayak team.
Click play below to listen and follow along with the machine-generated transcript.
Can you tell how the accuracy changes from speaker to speaker?