Sunday June 11, 2017
Yeah, you're really going to love this story.
"Good luck with that." "Great job!" "I'll make a note of that."
Can you tell, just reading those statements, whether they're meant to be sarcastic or not?
Detecting sarcasm can be difficult even for humans, so imagine how hard it would be to teach artificial intelligence software to recognize it.
Mikhail Khodak is a computer science grad student at Princeton University.
Along with his colleagues, Nikunj Saunshi and Kiran Vodrahalli, he created an enormous sarcasm database.
They've amassed about 1.3 million sarcastic statements and hundreds of millions of non-sarcastic comments for comparison.
The comments come from a Reddit 'data dump' that was released about two years ago and is available online.
Much of AI today is developed by training programs on huge datasets.
But finding a "sarcastic" dataset is tricky.
Mikhail decided to pull data from Reddit because users have a protocol where they put "/s" after sarcastic statements.
This means the users themselves have verified their own statements for the database. Mikhail recognizes that Reddit alone is not the most reliable -- or diverse -- source, but he says, "we have to start somewhere."
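To make the "/s" idea concrete, here is a minimal sketch in Python of how self-labelled comments might be separated from the rest. It is an illustration only, not the researchers' actual pipeline: the function name `label_comment`, the sample comments, and the rule that "/s" must appear at the very end of a comment are all assumptions.

```python
import re

# Assumption: a comment counts as self-labelled sarcasm only when it ends
# with a standalone "/s" (preceded by whitespace or the start of the text).
SARCASM_TAG = re.compile(r"(?:^|\s)/s\s*$", re.IGNORECASE)

def label_comment(text):
    """Return (cleaned_text, is_sarcastic) for a single comment."""
    match = SARCASM_TAG.search(text)
    if match is None:
        return text.strip(), False
    # Drop the marker so a model cannot simply learn to look for "/s".
    return text[:match.start()].strip(), True

# Hypothetical sample comments, not taken from the real dataset.
comments = [
    "Great job! /s",
    "Great job!",
    "I'll make a note of that. /s",
    "See https://example.com/s",  # a URL ending in "/s" is not a marker
]
labelled = [label_comment(c) for c in comments]
```

Removing the marker from the stored text is a design choice in this sketch: if the "/s" stayed in, a program could score perfectly by spotting the tag and learn nothing about sarcasm itself.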
Why is Mikhail so interested in teaching machines to understand sarcasm?
"At some point it will likely be the case that we would like machines to understand -- either when we're interacting with other humans or even when we're interacting with machines -- whether what we're saying we actually mean, or if we're actually being bitter or biting and not expressing our true beliefs."
Let's just hope the machines have a good sense of humour!