Sunday June 11, 2017

Yeah, you're really going to love this story.

It's hard enough for humans to detect sarcasm, try getting AI to do it!

"Good luck with that." "Great job!" "I'll make a note of that."

Can you tell, just from reading those statements, whether they're meant to be sarcastic or not?

Detecting sarcasm can be difficult even for humans, so imagine how hard it would be to teach artificial intelligence software to recognize it.

Mikhail Khodak is a computer science grad student at Princeton University.

Mikhail (Misha) Khodak

Along with his colleagues, Nikunj Saunshi and Kiran Vodrahalli, he created an enormous sarcasm database.

They've amassed about 1.3 million sarcastic statements and hundreds of millions of non-sarcastic comments for comparison.

The data comes from a 'data dump' of Reddit comments that was released about two years ago and is available online.

Much of AI today is developed by training programs on huge datasets.

But finding a "sarcastic" dataset is tricky.

Mikhail decided to pull data from Reddit because users have a protocol where they put "/s" after sarcastic statements.

This means the users themselves have verified their own statements for the database. Mikhail recognizes that Reddit alone is not the most reliable -- or diverse -- source, but he says, "we have to start somewhere."
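To make the "/s" convention concrete, here is a minimal sketch in Python of how comments might be split into sarcastic and non-sarcastic piles. It is an illustration only, not the researchers' actual pipeline: the function name `label_comment`, the pattern and the sample comments are all assumptions, and the sketch assumes the marker appears at the very end of a comment.

```python
import re

# Assumption: a comment is self-labelled as sarcastic when it ends with "/s",
# either on its own or preceded by whitespace (so "example.com/s" is not counted).
SARCASM_MARKER = re.compile(r"(?:^|\s)/s\s*$", re.IGNORECASE)


def label_comment(text):
    """Return (cleaned_text, is_sarcastic) for a single comment.

    Hypothetical helper for illustration; not taken from the real database code.
    """
    match = SARCASM_MARKER.search(text)
    if match is None:
        return text.strip(), False
    # Drop the marker so a model cannot simply learn to look for "/s".
    return text[:match.start()].strip(), True


# Made-up sample comments, borrowing the phrases quoted earlier in this story.
comments = [
    "Great job! /s",
    "Good luck with that.",
    "I'll make a note of that. /s",
]

labelled = [label_comment(comment) for comment in comments]
for cleaned, is_sarcastic in labelled:
    print(is_sarcastic, cleaned)
```

A rule this simple will miss sarcastic comments whose authors never bothered with the marker, which is one reason self-labelled data is only a starting point.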

Why is Mikhail so interested in teaching machines to understand sarcasm?

"At some point it will likely be the case that we would like machines to understand -- either when we're interacting with other humans or even when we're interacting with machines -- whether what we're saying we actually mean, or if we're actually being bitter or biting and not expressing our true beliefs."

Let's just hope the machines have a good sense of humour!