Canada's electronic spy agency collects millions of emails from Canadians and stores them for "days to months" while trying to filter out malware and other attacks on government computer networks, CBC News has learned.

A top-secret document written by Communications Security Establishment (CSE) analysts sheds new light on the scope of the agency's domestic email collection as part of its mandate to protect government computers.

CBC analyzed the document in collaboration with U.S. news site The Intercept, which obtained it from U.S. National Security Agency whistleblower Edward Snowden.

Canada's electronic spy agency watched visits to government websites and collected about 400,000 emails to the government every day, storing some of the data for years, according to the 2010 document. Today's volume is likely much higher given online traffic growth.

Common online activities involving the government include Canadians filing their taxes, writing to members of Parliament and applying for passports.

The program to protect government servers from hackers, criminals and enemy states is raising questions about the breadth of the collection, the length of retention and how the information could be shared with police and spy partners in other countries.

Chris Parsons, an internet security expert who viewed the document, said there are legitimate purposes for the agency to monitor your communications with the government.

"But you should be able to communicate with your government without the fear that what you say … could come back to haunt you in unexpected ways," says Parsons, a postdoctoral fellow at Citizen Lab, a unit at the University of Toronto's Munk School of Global Affairs.

"When we collect huge volumes, it's not just used to track bad guys. It goes into data stores for years or months at a time and then it can be used at any point in the future."

CSE says "specific communications" are examined if they are "suspected to relate to a cyberthreat that could harm government of Canada systems and networks."

Metadata kept 'months to years'

The surveillance service vacuums in about 400,000 emails to and from the government every day and then scans them using a tool called PonyExpress to look for any suspicious links or attachments, according to the top-secret document.


The automated system sifts through the emails and flags about 400 potentially suspect messages each day, or about 146,000 a year. It then sends alerts to CSE analysts, who can take a closer look at each email to see whether it poses a threat.

Only about four emails per day — about 1,460 a year — are serious enough to warrant CSE security analysts contacting the government departments potentially affected.
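The yearly totals follow directly from the daily figures. The short sketch below reproduces that arithmetic and shows what share of collected email reaches each stage. It assumes a 365-day year and that the 2010 daily figures hold steady all year; the variable and function names are illustrative and do not come from the CSE document.

```python
# Daily figures as reported from the 2010 CSE document.
# Assumption: a 365-day year with constant daily volume.
DAYS_PER_YEAR = 365

# Hypothetical labels for the three stages described in the article.
DAILY_COUNTS = {
    "emails collected": 400_000,
    "flagged by automated scan": 400,
    "escalated to departments": 4,
}


def per_year(per_day: int) -> int:
    """Scale a daily count to a yearly count."""
    return per_day * DAYS_PER_YEAR


def share_of_collected(per_day: int) -> float:
    """Fraction of all collected emails that reach a given stage."""
    return per_day / DAILY_COUNTS["emails collected"]


if __name__ == "__main__":
    for stage, per_day in DAILY_COUNTS.items():
        print(
            f"{stage}: {per_day:,} a day, {per_year(per_day):,} a year, "
            f"{share_of_collected(per_day):.3%} of collected"
        )
```

On these figures, roughly one collected email in 1,000 is flagged by the automated scan and about one in 100,000 leads analysts to contact a department.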

"It's pretty clear that's there's a very wide catchment of information coming into [CSE]," said Micheal Vonn, policy director at the B.C. Civil Liberties Association.

CSE holds on to emails for "days to months," while metadata (the details about who sent a message, when and where) is kept for "months to years," according to the document. The agency also records metadata about visits to government websites.

Under the Criminal Code, CSE is barred from targeting the content of Canadians' emails and phone calls, but it gets special ministerial exemptions when protecting government IT infrastructure.

The agency refused to provide specifics about the amount of email and metadata collected, and when they are deleted, insisting such information "could assist those who want to conduct malicious cyberactivity against government networks."

IT security analysts at CSE only use and retain information "necessary and relevant to identify, isolate or prevent harm to government of Canada computer networks or systems," the agency told CBC News in a written statement. Data that poses no threat or is not relevant to that goal "cannot be used or retained, and is deleted."

Civil liberties lawyer Vonn argues that there's "much more" Canadians should be told about the agency's collection of their data, such as how long it's held, without putting national security at risk.

"It's distressing that we have to find [details] out in dribs and drabs as opposed to having the appropriate discussion nationally and democratically.

"If we're going to have trust that our agencies are acting responsibly, we need as much light shone on the architecture, the laws and the rules, as possible," said Vonn.

Length of retention an 'utter mystery'

Cybersecurity experts say storing emails and data helps IT security analysts fix vulnerabilities.


CSE, under its mandate to protect federal government computer networks, vacuums up emails sent to and from the government and monitors website traffic, looking for malware and intrusions. (Canadian Press)

"Sometimes when they discover something they want to go back and check if this was the beginning or first of this particular kind of attack, so the data is actually very useful to them," said Queen's University computing professor David Skillicorn.

Still, documents suggest some of the data can be held as long as decades or even indefinitely.

CSE, under its cyberdefence mandate, is allowed to hold on to personal information — email addresses, IP addresses and other identifiers — for up to 30 years, then transfer it to Library and Archives Canada, according to the agency's own description of its databanks in the federal Info Source publication.

Vonn says it's "an utter mystery" why the government would need to retain personal information of those implicated in a potential cyberthreat for that long.

80 million probes a day

Skillicorn says the documents illustrate the great skill with which CSE is protecting both government websites and email traffic.

"I was impressed by the level of sophistication and cleverness and thoughtfulness," said Skillicorn. "It really does try to do everything that's possible to do in some very, very clever ways."

CSE says it's trying to set up defences because government networks are probed up to 80 million times a day by hackers looking for network vulnerabilities.

Skillicorn says most of those probes involve automated attempts equivalent to harmless mosquito bites, with only a tiny fraction meriting action.

Canada's economic activities, international roles and technology know-how make it an attractive target for cyberattacks by other countries, criminals and hackers.


A single breach in the government's online armour can leave it vulnerable, with the potential that a wealth of sensitive data could end up in the wrong hands.

Soon after the 2010 top-secret presentation, several key federal departments — including the finance department and Treasury Board — suffered major attacks by hackers that forced them offline.

More recently, Canada Revenue Agency shut down in 2014 during income-tax season when a hacker broke into the site via a security bug known as Heartbleed.

Still, Vonn says while government cybersecurity is essential, citizens' level of trust in the post-Snowden era remains low, with many Canadians concerned about CSE activities.

"Accountability is central to understanding how this is keeping us safer and not in fact endangering our cybersecurity, our liberty, our ability to dissent."


CBC is working with U.S. news site The Intercept to shed light on Canada-related files in the cache of documents obtained by U.S. whistleblower Edward Snowden.

The CBC News team of Dave Seglins, Amber Hildebrandt and Michael Pereira collaborated with The Intercept's Glenn Greenwald and Ryan Gallagher to analyze the documents.

For a complete list of the past stories done by CBC on the Snowden revelations, see our topics page.

With files from The Intercept's Ryan Gallagher and Glenn Greenwald