Long-form census cancellation taking toll on StatsCan data
Questions raised over how data can be used reliably
Prime Minister Stephen Harper's cancellation of the long-form census has started to take a toll on Statistics Canada's data.
The agency released its final tranche of the 2011 census last week, focusing on languages, but it included a prominent warning cautioning data users against comparing key figures with those of past censuses.
"Data users are advised to exercise caution when evaluating trends related to mother tongue and home language that compare 2011 census data to those of previous censuses," Statistics Canada states bluntly in a box included in its census material.
Those are strong words for a statistical agency, since they raise profound questions about how the data can be used reliably to come to conclusions about language trends. Officials have undertaken a thorough investigation, with a report to be published shortly.
"There are a lot of questions and responses that don't seem to add up," said Doug Norris, the chief demographer for Environics Analytics and formerly a census manager at the agency.
Statistics Canada analysts already know enough about the anomalies to suspect the cancellation of the long-form census in 2010, said Jean-Pierre Corbeil, the lead analyst for the languages part of the 2011 census.
"The only answer we have at the moment ... (is) it's very likely that it is related to the changes in the questionnaire," he said in an interview.
Questions over language data
In the past, the language questions were mostly included in the long form, which went to 20 per cent of households. When Harper cancelled the long form, several groups concerned about tracking the vibrancy of French in Canada went to court to make sure information about official language usage was properly collected.
As a result, the Harper government agreed to move the language questions to the short form, which went to 100 per cent of households.
The problem was that the language questions in 2011 were presented in a different context than they were in 2006, explained Corbeil. In 2006, they were preceded by other questions about ethnicity and birthplace. In 2011, they appeared suddenly after basic demographic questions.
The context of the questions has changed dramatically, likely prompting people to answer the questions truthfully, but differently, Corbeil said.
"We reviewed everything. Everything is really OK. The only thing is, we know that the responses we get are really influenced by the context and the placement in our questionnaires."
The main problems arise in how respondents reported their mother tongue and the language they spoke at home. Based on what Statistics Canada knows about immigration, there were far too many people claiming to have two mother tongues -- an official language plus a non-official language -- and to speak an official language plus another language at home.
What first set off alarm bells for Corbeil was the proportion of people reporting English as a mother tongue. The raw data from the 2011 census told him it was 58 per cent. That was the same percentage as in 2006, but in the meantime, Canada had received about 1.1 million new immigrants.
And Citizenship and Immigration data, as well as Statistics Canada's own research, told him that 80 per cent of those immigrants did not have English or French as a mother tongue.
If people had responded to the 2011 census in the same way as they did to the 2006 census, the proportion of English-speakers "would have been lower," Corbeil said.
StatsCan left with a 'serious' problem
He looked further and found more strangeness. Between 2001 and 2006, the census found an increase of 946,000 in the number of people who claimed a non-official language as a mother tongue. But between 2006 and 2011, the increase was only 420,000.
That's less than half the increase noted earlier in the decade, even though immigration levels continued to rise at the same rate.
Normally, when Statistics Canada changes its methodology, it test-runs its new questions first, said Michael Wolfson, a former assistant chief statistician at the agency and now at the University of Ottawa.
That way, they can tell exactly how people will respond to the new methods, and take any differences into account when they compare the new data to old information to look for trends, he said.
But in the case of the census, Harper cancelled the long form with so little time before the next census that Statistics Canada did not have a chance to properly test-drive its new approach for the language questions.
"The time just wasn't there to assess this," Wolfson said. "Here we have a last-minute monkeying around with the language question."
As a result, Wolfson said, Statistics Canada is left with a "serious" problem on a sensitive part of the census.
"Clearly there's a problem and the time-series consistency has been lost," he said.
'We need to be cautious'
In an email statement, Corbeil agreed that there was not enough time to thoroughly test-drive the new placement of the language questions. But he said only parts of the data series are broken, not the whole thing.
"We can't say, 'Don't use the data, they are not reliable.' The data are good. They are not fully comparable.... For the moment, that's all we can say. The data can still be used. We need to be cautious."
The problem for users is that they don't know which data is good and which is not, Norris said.
"Is this data a better picture of true linguistic use, or is the old picture better?" he asked. "Right now, we don't know."
It's a worrisome sign of things to come, he added.
"This is exactly the kind of problem we're going to have next year with the national household survey."
That survey is meant to replace the long-form census, asking Canadians about immigration, citizenship, birthplace, ethnicity and work. But the survey is voluntary, unlike the short and long-form censuses.
So Statistics Canada and data analysts alike have warned that the information it produces will not be completely comparable to information collected in censuses in the past.
That survey doesn't start releasing new information until next year -- and the problems have already begun, Norris said.
"I would be very reluctant to make any comparisons with the last censuses."