Minor leak balloons into major web outage at StatsCan: documents
A small water leak created a cascading series of IT problems at Statistics Canada, emails show
An incident involving a leaky air conditioner at Statistics Canada's Ottawa data centre in June mushroomed into a major outage that, among other problems, left some exporters' trucks stuck at the American border.
The rapid escalation of a minor spill into a 30-hour crisis was no accident, says the former head of the agency. Instead, it was the result of obsolete equipment that is the responsibility of Shared Services Canada (SSC), the government's troubled IT department.
"The kind of careless error that brought down the data centre on June 9 is hard to excuse," said Wayne Smith, former chief statistician.
"The repeated outages of Statistics Canada's internet site, now its primary method of dissemination, are becoming a national embarrassment."
Through the Access to Information Act, CBC News obtained internal emails and other documents detailing the two-day scramble in June to get key systems back online during Statistics Canada's second major outage this year.
The latest chain of events began Thursday, June 8, when a contractor did routine maintenance on an air conditioner inside the data centre at the agency's Tunney's Pasture complex, about four kilometres west of Parliament Hill.
Faulty work left the unit leaking overnight, and by Friday morning the water caused a small short-circuit that triggered a smoke alarm shortly before 9 a.m. ET. (The monthly Labour Force Survey, a jobs report, had been successfully posted at 8:30 a.m. ET.)
The smoke alarm, in turn, activated a power shutdown to protect the roomful of servers and other IT equipment used to run Statistics Canada's major systems, including its web services and main email.
The contractor was called back to fix the leak, and at 10:15 a.m. ET technicians restored power to the data centre.
Susceptible to damage
Aging IT equipment, however, is susceptible to damage from sudden power fluctuations and abrupt loss of cooling. Flipping the "on" switch blew out several memory units, leaving the data centre completely non-functional again.
There wasn't enough replacement memory on hand for repairs, so new memory units had to be ordered from a supplier in Pennsylvania and trucked into Canada — a 24-hour process.
The units arrived Saturday, and the data centre was finally declared functional again at 4:15 p.m. that day, more than 30 hours after the minor leak shut it down.
During the outage, the agency's heavily used web services were dark, some of its data collection was shut down and the main email system, among other services, was unavailable.
Altogether, six key systems were out of commission, including the Canadian Automated Export Declaration (CAED) system, which Canada's exporters use to file key export documents electronically.
The Canada Border Services Agency (CBSA) reported: "Some trucks can't cross the border."
"When the data centre went down this system would have gone down as well, causing problems for exporters using the system and shipping to the U.S., since the export documents filed would not be accessible to CBSA border officers," said Smith, the former chief statistician.
Officials at Statistics Canada and CBSA did not respond when asked to elaborate on the border troubles.
Smith resigned as head of Statistics Canada last September, citing the agency's eroding independence — partly the result of Shared Services Canada's takeover of its IT infrastructure and failure to upgrade equipment.
"SSC has not been properly maintaining and replacing the infrastructure in the Tunney's Pasture data centre," Smith said this week.
"So even a year on from my resignation, the problems that concerned me then continue to plague Statistics Canada's operations. This despite transfers of millions of additional dollars from Statistics Canada to Shared Services Canada."
Earlier this summer, CBC News reported on another major outage of the Statistics Canada website, one that began March 9 and lasted 26 days, the longest in the agency's history.
A spokeswoman for Shared Services Canada said there have been no outages at StatsCan since the paralyzing air-conditioning incident.
"Shared Services Canada worked closely to ensure business continuity following the emergency data centre shutdown in June 2017," said Monika Mazur.
"Shared Services Canada responded promptly, deploying a team to assess and mitigate the interruption and restore services."
Statistics Canada spokesman Peter Frayne said Canadians needing statistical information during the outage were helped by telephone and another email system that was still operating.
In another blow to its reputation, the statistics agency acknowledged last week that a computer error in the 2016 census had wrongly counted about 61,000 French speakers in Quebec as English speakers.
A $1.35-million report by consultants Gartner Inc., commissioned by the Liberal government to explain the troubles of Shared Services Canada, said earlier this year that the IT agency has been hobbled by lack of money and too much red tape.
Follow @DeanBeeby on Twitter