FBI seeks social media data mining tool
The U.S. government is seeking software that can mine social media to predict everything from future terrorist attacks to foreign uprisings, according to requests posted online by federal law enforcement and intelligence agencies.
Hundreds of intelligence analysts already sift overseas Twitter and Facebook posts to track events such as the Arab Spring. But in a formal "request for information" from potential contractors, the FBI recently outlined its desire for a digital tool to scan the entire universe of social media, more data than humans could ever crunch.
The Department of Defense and the Office of the Director of National Intelligence also have solicited the private sector for ways to automate the process of identifying emerging threats and upheavals using the billions of posts people around the world share every day.
"Social media has emerged to be the first instance of communication about a crisis, trumping traditional first responders that included police, firefighters, EMT, and journalists," the FBI wrote in its request. "Social media is rivaling 911 services in crisis response and reporting."
The proposals already have raised privacy concerns among advocates who worry that such monitoring efforts could have a chilling effect on users. Ginger McCall, director of the open government project at the Washington, D.C.-based Electronic Privacy Information Center, said the FBI has no business monitoring legitimate free speech without a narrow, targeted law enforcement purpose.
"Any time that you have to worry about the federal government following you around peering over your shoulder listening to what you’re saying, it’s going to affect the way you speak and the way that you act," McCall said.
Monitors publicly available info only
The FBI said in a statement to The Associated Press that its proposed system is only meant to monitor publicly available information and would not focus on specific individuals or groups but on words related to criminal activity.
Analyzing public information is nothing new in the world of intelligence. During the Cold War, for example, CIA operatives read Russian newspapers and intercepted television and radio broadcasts in hopes of inferring what Soviet leaders were thinking.
But the rise of social media over the past few years has dramatically changed both the kinds and the amount of freely available information. For example, Twitter CEO Dick Costolo said at a recent conference that users of the micro-blogging service send out an average of one billion tweets every three days.
"It really ought to be the golden age of intelligence collection in that you’ve got people falling all over themselves trying to express who they are," said Ross Stapleton-Gray, a former CIA analyst and now a technology consultant who advises companies on security, surveillance and privacy issues.
The system sought by the research arm of the national intelligence director’s office would fuse together everything from web searches to Wikipedia edits to traffic webcams to "beat the news" by predicting major events ranging from economic turmoil to disease outbreaks.
The Defense Department’s tool would track social media to identify the spread of information that could affect soldiers in the field and also give the military ways to conduct its own "influence operations" on social networks to counteract enemy campaigns.
The intelligence director’s office and the Defense Department said they could not meet the AP’s deadline to answer specific questions about the proposed projects.
The FBI is seeking a web app that would automatically scrape social networks for data that could alert the agency’s operations center to breaking crises as they happen and plot them on interfaces like Google Maps.
For such systems to work well, their developers would have to overcome several technological challenges, the easiest of which is handling the massive amount of data involved.
Developments in so-called "cloud computing" have made processing big data sets easier than ever before by spreading the work broadly across networks of computers.
Major hurdle: Understanding human language
Instead, experts in the field say the major hurdle is in effect teaching computers how to read. To sift the valuable information from the mundane, the software must understand the subtleties of meaning in tweets and blog posts to tell the difference between, for example, a serious statement and a joke.
Solving such problems falls to researchers in fields such as natural language processing and computational linguistics — the same specialties that brought the world the iPhone’s Siri voice-activated assistant and IBM’s Watson, which trounced its human opponents at Jeopardy.
Authenticity also becomes an issue in analyzing social networks. Computer programs known as "bots" already plague services such as Twitter with junk posts similar to email spam. Researcher Tim Hwang has scripted his own bots to see how much influence they could wield over social networks and says the ability to create bots that closely mimic humans will only improve over time.
This matters in intelligence gathering because bots could fool analysts, and their software, into thinking they’re witnessing a genuine shift in social trends that in reality could be a government propaganda campaign driven by, for example, Twitter users who don’t really exist.
"We have all the data. How do we know what’s real and what’s not?" Hwang said.
William McCants, an analyst at the Center for Naval Analyses and a former State Department official, monitors al-Qaeda propaganda online. He said he worries that the systems the FBI and other agencies are seeking could create an overreliance on technology at the expense of carefully trained human analysts who are still better at zeroing in on the facts that matter most.
"The more data you use and the more complicated the software, the more likely it is you will confirm a well-known banality," McCants said a friend likes to joke. "You didn’t need to be on Twitter to know that a revolution was happening in Egypt."