In a small conference room with flickering lights, Andrew Schwartz fires up the overhead projector and points to two word clouds of various colours and sizes. 

In one cloud, the words in the biggest fonts are "love you," "excited" and "shopping." Those, according to the University of Pennsylvania psychologist, are the words that appear most often on women's Facebook pages.

"I think we were a bit surprised by how much stereotypes did seem to come out," Schwartz says, smiling ruefully. "What came out was a lot of emotional language, also boyfriend-related topics. And then on the male side … we got very different types of words."

These were words most news outlets can't print. But to Schwartz, this is all valuable data. "We study social media, status updates and tweets in order to learn about people," he says.

And what he and others like him are learning is that social media can give a general understanding of the physical and mental well-being of certain communities, and may even be able to home in on things like depression.

For a couple of years now, scientists have been able to flag flu outbreaks by counting how many people are googling the flu.

University of Pennsylvania researcher Johannes Eichstaedt has taken this approach to the next level. He can predict your risk of suffering from heart disease or depression just by analyzing tweets.

What's more, the tweets he's studying aren't necessarily even written by those who are actually depressed or who have had heart attacks.

"What we learned is that the people on Twitter are the canaries of the communities," Eichstaedt says. "They sort of share the psychological states that are shared in the communities. Do they feel angry? Is there road rage?  Do people feel safe outside?


Johannes Eichstaedt is a PhD student at the University of Pennsylvania, where he founded the World Well-Being Project in 2011 and was elected an Emerging Leader in Science & Society by the American Association for the Advancement of Science. (University of Pennsylvania)

"All these are things that affect the risk for heart disease, and to some extent you can measure [that] in the language patterns of the young generation."

Eichstaedt believes one day public health officials will be able to look at these social media maps and come up with interventions designed specifically for each neighbourhood. It's health research on a scale never seen before.

"Even the biggest studies that the government sets up — the Centres for Disease Control, the Census Bureau — only get tens of thousands, perhaps a hundred thousand people," Eichstaedt says. "We can look at tens of millions of people with these new methods."


In 2014, Facebook published the results of a psychological experiment it had conducted in 2012 on close to 700,000 of its users. Many considered it an unprecedented and creepy breach of privacy.

Facebook users were outraged, prompting the company to apologize and promise not to do it again. 

But now computer scientists, psychologists and statisticians are hard at work shovelling billions of tweets and Facebook status updates into the maws of their computers and mulching up all that big data to find out what it says about a group's physical and mental health.

"By looking at your Facebook profile or your Twitter feed, we can very accurately predict very intimate traits that you may not be aware you're revealing," says Stanford University's Michal Kosinski.

Kosinski analyzed the social media profiles of eight million people.

And his computer program can predict your political, religious and sexual orientation, even your IQ, all from your Facebook page.

It's personality research, without talking to any actual people.  

"The results suggest that judgments made by the computer are even more accurate than your own judgments of your own personality," Kosinski says.

What's more, they are fast. By analyzing social media instead of using standard surveys, he can test thousands of people in minutes. 

These new methods are still evolving. Whenever possible, their results are compared with those of traditional gold-standard studies, because many people are still reluctant to draw important conclusions from something as trivial as tweets.

And since these computational models are so huge and so complex, Kosinski says scientists themselves may have to evolve: from researchers to translators. 

"It is actually very hard for human beings to understand why the hell the computer made this decision," he says. "I think that increasingly the role of the scientist is not to provide insights into the environment, but to interpret the insights and decisions made by the computer."

He acknowledges that using people's social media correspondence without their consent is an ethical grey zone.

But he and his colleagues believe most privacy concerns are outweighed by the extremely practical upside of a lab that costs almost nothing to set up, and is stocked with an almost infinite supply of rats.