Every minute of every day, approximately 347,000 tweets, 293,000 Facebook statuses and 400 hours' worth of YouTube videos are uploaded to the web, along with astronomical amounts of information about social media users' lives and habits.

For years, marketers have harvested this data to target ads and develop products. So too have governments, scientists and law enforcement officials, who use it to address other challenges.

From predicting epidemics to stopping cyber-terrorists in their tracks, big data has become an invaluable resource across industries worldwide (regardless of how ethically that data is collected).

Soon, it may even be used to identify and monitor people who show signs of mental illness online.

On Tuesday, Minister of Science Kirsty Duncan announced more than $48 million in federal funding for 76 research teams across Canada through the Natural Sciences and Engineering Research Council of Canada's (NSERC) Strategic Partnership Grants.

Of that sum, $464,100 has been granted to the University of Ottawa's Diana Inkpen for a three-year project called "social web mining and sentiment analysis for mental illness detection."

"Social media is everywhere," reads a news release issued Tuesday by the university. "Internet users are posting, blogging and tweeting about almost everything, including their moods, activities and social interactions."

The release goes on to explain how Inkpen and her team, which includes scientists from the universities of Ottawa, Alberta and Montpellier in France, will explore the use of social media data in screening for individuals at risk of mental health issues. 

"We want to look at what kind of emotions people express, and then we will focus in particular on negative emotions that might show some early signs of possible mental disorders," Inkpen told CBC News on Wednesday. "It could be depression, it could be anorexia, it could be other kinds of early mental illness signs."

Inkpen says her team's goal is to create a set of tools that doctors, psychologists, school counsellors and research groups, among others, can use to flag concerning patterns in posts made by social media users.

But to do this, the team needs massive amounts of existing social media data to sample.

"When we are collecting data, we are doing it from public sites," said Inkpen. "We are collecting Twitter data, data from medical forums that are public, and Facebook groups for different interests."

Ottawa-based data science technology company Advanced Symbolics has partnered with the scientists to collect such data in both English and French. 

Data by the minute

Data Never Sleeps 3.0, compiled in August of 2015 by data visualization and analytics firm Domo, breaks down the amount of data generated online every minute of every day, on average. (Domo/Data Never Sleeps)

Inkpen's team uses text-mining algorithms to pick up different patterns within the public data sets and to predict what these patterns mean.
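As a rough illustration of this kind of pattern detection, the Python sketch below flags runs of posts whose wording stays negative over time. It is not the team's method: it swaps in a fixed word list where their algorithms learn from data, and every name in it (NEGATIVE_WORDS, negativity, sustained_negativity, window, threshold) is a hypothetical chosen for the example.

```python
from collections import deque

# Hypothetical toy lexicon, assumed for illustration only.
# A learned model would derive word weights from labelled data instead.
NEGATIVE_WORDS = {"hopeless", "worthless", "alone", "hate", "tired", "empty", "sad"}


def negativity(post: str) -> float:
    """Return the fraction of words in a post that appear in the lexicon."""
    words = [w.strip(".,!?;:\"'").lower() for w in post.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    return sum(w in NEGATIVE_WORDS for w in words) / len(words)


def sustained_negativity(posts, window=3, threshold=0.2):
    """Return indices of posts where the rolling mean negativity of the
    last `window` posts is at or above `threshold`."""
    recent = deque(maxlen=window)  # keeps only the most recent scores
    flagged = []
    for i, post in enumerate(posts):
        recent.append(negativity(post))
        if len(recent) == window and sum(recent) / window >= threshold:
            flagged.append(i)
    return flagged
```

Averaging over a window rather than scoring single posts means one bad day does not raise a flag on its own, while language that stays negative across several posts does.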

"Expressions of very negative emotions that are very strong, or appear a lot over longer periods of time, the algorithms can pick up," she says. "The algorithm learns from the data."

The programs can also watch for how an individual's online activities change over time. For instance, if someone is very active on social media and then changes his or her behaviour long term, it could raise a flag.
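A long-term change in activity of the kind described here could be checked, in the simplest case, by comparing recent posting volume against an earlier baseline. The sketch below is an assumption-laden example rather than the researchers' approach: the function name activity_dropped, the weekly counts, the window lengths and the 50 per cent ratio are all hypothetical, and it only looks for a drop in activity, not other kinds of change.

```python
from statistics import mean


def activity_dropped(weekly_counts, baseline_weeks=8, recent_weeks=4, ratio=0.5):
    """Return True if the average posts per week over the last `recent_weeks`
    falls below `ratio` times the average of the `baseline_weeks` before them."""
    if len(weekly_counts) < baseline_weeks + recent_weeks:
        return False  # not enough history to judge a long-term change
    baseline = weekly_counts[-(baseline_weeks + recent_weeks):-recent_weeks]
    recent = weekly_counts[-recent_weeks:]
    baseline_mean = mean(baseline)
    if baseline_mean == 0:
        return False  # user was not active to begin with
    return mean(recent) < ratio * baseline_mean


# Hypothetical data: eight busy weeks followed by four quiet ones.
counts = [20, 22, 19, 21, 20, 23, 18, 21, 5, 4, 6, 3]
if activity_dropped(counts):
    print("Sustained drop in posting activity")
```

Requiring several quiet weeks in a row keeps a single silent week, such as a holiday, from triggering the check.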

A doctor whose patient has agreed to be monitored in this way could receive an automatic alert should concerns like this arise.

In the case of cyber-bullying, Inkpen says that her team's tools could also be used to notify parents or school counsellors if a child starts posting very angry or strange messages online.

Asked whether these algorithms could harm users if they fell into the wrong hands, Inkpen cautions that "we must be very careful about that."

Still, she is confident that the tools her team plans to roll out in the summer of 2018 will have many positive uses.

"We started to look into the #BellLetsTalk campaign because there is a lot of Twitter data," she said. "They could use our tools for some further processing of this large amount of data. It's very hard for a human to read all that."

"Having a program to help humans understand more is already a good start."