Could data mining ever replace the National Household Survey?
The question was prompted by this week's stalled release of data from the 2011 National Household Survey, which Statistics Canada delayed over what it admitted were "serious errors." The National Household Survey (NHS) replaced Canada's mandatory long-form census in 2011.
Those errors, along with rising survey costs ($652 million so far, about 15 per cent more than the 2006 long-form census) and a considerably lower response rate (68 per cent in 2011 versus 93 per cent in 2006), have led critics to question the credibility of the current census and whether it is even necessary in the modern age of data mining.
After all, dozens of credit rating agencies, as well as companies such as Air Miles, Aeroplan and even Facebook, already collect reams of detailed social and financial data on Canadians. Is there still a need for a national survey, or could the government manage its own data better, mining much of the same information for less money?
"We need to move away from the idea that census is the only source of data — to bringing different kinds of data together," said Jan Kestle, president of Environics Analytics. She adds that while the NHS data is still useful and necessary to help businesses compare the data they collect about their customers with the broader population, a more systematic approach is needed to gather data on Canadians.
"When the decision was made to make [the census] shorter, it created the need for businesses and other policymakers to think about what else we could do to make better use of the data … and [consider] what other sources there are."
Kestle suggests the government hold a national review to look at different data options and their potential usage. The purpose of the review, she says, would be to:
- Explore a wider array of data, including data from the private sector and other government departments.
- Explore the idea of also using administrative data, such as birth registry numbers.
- Bring government and business data together in cost-effective ways that aren't invasive.
- Ensure that the methodologies bring all the different data sources together in a way that maintains data integrity.
Survey is 'universal'
The voluntary 2011 National Household Survey was introduced after the Conservative government cancelled Canada's mandatory long-form census in 2010, citing concerns about personal freedoms.
While opposition parties criticized the move, many experts still see great value in even a voluntary, abbreviated census.
Gregory Spencer of the Munk School of Global Affairs in Toronto believes that for all the national survey's flaws, private data mining could never fully replace it.
"The surveys Statistics Canada carry out are universal," he says. "Private data mining will come from consumer-type data, so you’re not asking everybody. Some people will get counted more than others, so it’s not going to be consistent."
"You also wouldn’t be able to make direct patterns or see cause-and-effect relations over five years of how Canadian society is changing," he adds. "In a private survey, you couldn't compare people's educational backgrounds with income or employment level — so you couldn't say how much an average person with a degree in geography would make in an average year, for example."
Scott Bennett, a former government policy consultant now with Carleton University, agrees, and points to existing public concerns about the privacy of information gathered through data mining.
"Data mining is very important, and plays a large and increasing role in market and consumer analysis," he says. "However, it does not provide the same kind of connected information at the micro level as a good survey. We now live in an age when states and corporations know more about citizens than the citizen knows about states and corporations."
'Sense of responsibility'
Even pollsters working in the private sector feel a distinction should be made between the information gathered by the NHS and private data mining sources. Response rates, for one, vary widely between the two.
At Ipsos Reid, Canada's largest market research company, for instance, a response rate of 68 per cent is considered "astounding" compared with private-sector polls and surveys.
John Wright, senior vice-president of Ipsos Reid, says response rates in the private-sector industry, by contrast, are 10 per cent or less.
"The current census is not mandatory, but people feel a sense of responsibility when it comes to the census data. It's a remarkable testimony to what it's all about," says Wright, who is also the co-author of What Canadians Think: About Almost Everything.
It may be expensive, he adds, but the NHS is necessary as a matter of public record. "There are some who would argue and say there are less expensive ways to do it - but it's the obligation of the government to carry out the census. And I believe taxpayers should pay for it."
Statistics Canada, for its part, told CBC News that the government already uses alternative sources for statistical research.
"If we could use the information available from Revenue Canada on income levels, or the Justice department [i.e. courts or police], then we do to some degree," said Marc Hamel, manager of the census program at Statistics Canada.
Private data miners are also used, he added, citing Canada Post as one of them. "We follow a very meticulous process when working with them and do it very openly."
The final data from the 2011 National Household Survey is scheduled to be released on Sept. 11 and will include details about income levels, earnings, housing and shelter costs across Canada.