The Many Ways of Improving the Industrial Coding for Statistics Canada’s Business Register Yanick Beaucage ICES III June 2007
Overview • Background • Automatic Coding • Manual Coding • Quality Evaluation of Classification Updates • Quality Assurance Survey • Conclusion
Background • Statistics Canada’s (STC) Business Register redesign • Improve links to administrative data • Improve treatment of births/deaths • Reflect business reality • Give update privileges to a larger set of people • Develop a quality assurance program • Part of the quality assurance program is ensuring good industrial classification
Background • Good industrial classification • Leads to better population identification • Leads to smaller sample size • Leads to reduced collection cost • Leads to better precision • Prevents frustration among respondents (and interviewers)
Background • [Diagram, built up over five slides. Labels: Canada Revenue Agency, Statistics Canada, Business Register, Automatic (coding), Manual (coding), Updates, QE (Quality Evaluation, shown twice), QAS (Quality Assurance Survey)]
Automatic Coding • New businesses apply for a Business Number (BN) at the Canada Revenue Agency (CRA) • In person, over the phone, over the internet, ... • What is the description of the main business activity? • Decision tree tool used by CRA • Prompts for details needed for coding • Returns a robot-phrase to Statistics Canada
Automatic Coding • Assign classification based on robot-phrase • Improving decision tree tool and usage • Redeveloped for microcomputers (originally on mainframe) • Expand use to the Web BN application (currently used for phone or in-person registration) • Develop questions for all sectors • Currently used for 75% of all industrial sectors • Covers 90% of all descriptions to be coded
Automatic Coding • Automated Coding by Text Recognition (ACTR) • If description too general → manual coding • Used to assign classification based on descriptions • Reference file (French and English) • Parsing strategy • Word weighting algorithm • Score derived
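The parsing, word-weighting and scoring steps on this slide can be illustrated with a minimal Python sketch. It is not ACTR's actual algorithm: the reference phrases, the placeholder codes, the stopword list, the inverse-frequency weights, the Dice-style score and the 0.6 threshold are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical reference file: phrase -> industry code (codes are placeholders).
REFERENCE = {
    "retail bakery": "CODE_A",
    "commercial bakery": "CODE_B",
    "residential house painting": "CODE_C",
}

STOPWORDS = {"the", "of", "and", "a", "an", "in"}


def parse(text):
    """Assumed parsing rule: lowercase, keep alphabetic words, drop stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


# Assumed weighting: words found in fewer reference phrases weigh more.
_DOC_FREQ = Counter(w for phrase in REFERENCE for w in parse(phrase))
_N_PHRASES = len(REFERENCE)


def weight(word):
    # Words absent from the reference file are treated as if seen once.
    return math.log(1 + _N_PHRASES / _DOC_FREQ.get(word, 1))


def score(description, phrase):
    """Dice-style overlap of weighted words, between 0 and 1."""
    d, p = parse(description), parse(phrase)
    if not d or not p:
        return 0.0
    shared = sum(weight(w) for w in d & p)
    total = sum(weight(w) for w in d) + sum(weight(w) for w in p)
    return 2 * shared / total


def code_description(description, threshold=0.6):
    """Return (code, score); code is None when the unit should go to manual coding."""
    best_phrase = max(REFERENCE, key=lambda ph: score(description, ph))
    best = score(description, best_phrase)
    if best >= threshold:
        return REFERENCE[best_phrase], best
    return None, best
```

Under these assumed weights, "Retail bakery" matches its reference phrase exactly, while "bakery" alone is too general to reach the threshold and would be routed to manual coding.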
Automatic Coding • Improving use of ACTR • Improve reference file • Each year new phrases are added • Currently 7 000 phrases • Study score needed for match • Opening the weighting algorithm • Improve parsing rules • Revisit the rules • Create an environment for testing purposes • Evaluate impact of changing input/rules/score
Automatic Coding • 40 000 new businesses a month to code • 45% are coded using robot-phrases • 5% are coded using ACTR • Leaves 20 000 new businesses to code • Need manual coding • Done at Statistics Canada
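The monthly workload quoted on this slide can be checked with a few lines of Python. The variable names are illustrative; the figures are the ones on the slide.

```python
new_per_month = 40_000      # new businesses to code each month
pct_robot_phrase = 45       # coded from robot-phrases
pct_actr = 5                # coded by ACTR

coded_robot = new_per_month * pct_robot_phrase // 100
coded_actr = new_per_month * pct_actr // 100
left_for_manual = new_per_month - coded_robot - coded_actr

print(coded_robot, coded_actr, left_for_manual)  # 18000 2000 20000
```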
Manual Coding • Other units to code manually • Survey feedback • New operating entity found when profiling • Tool • Search engine for industrial coding • Improve manual coding • Add on-line ACTR or ACTR results • Add decision tree tool
Manual Coding • New businesses • Goal: code all of them • Reality: do as many as we can • Result: backlog of businesses to code • [Diagram labels: CRA May batch, CRA June batch, Automatic, Manual, Manual Backlog, Business Register]
Manual Coding • Which units should be coded first? • First in, first out? • Economic activity signal? • Economic activity is determined by administrative data • Both! Select a sample from backlog • Take-all (large economic activity) • Take-some 1 (economic activity / older units) • Take-some 2 (economic activity / newer units) • Take-none (no economic activity)
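A minimal Python sketch of the backlog sampling idea on this slide follows. The four strata come from the slide; the `Unit` fields, the activity and age cut-offs, and the sampling fractions are hypothetical values chosen only to make the example run.

```python
import random
from dataclasses import dataclass


@dataclass
class Unit:
    unit_id: str
    activity: float          # assumed economic-activity signal from administrative data
    months_in_backlog: int   # assumed measure of how old the unit is


def stratum(u, large=1_000_000, old=6):
    """Assign a backlog unit to a stratum; both cut-offs are hypothetical."""
    if u.activity <= 0:
        return "take-none"
    if u.activity >= large:
        return "take-all"
    return "take-some-1" if u.months_in_backlog >= old else "take-some-2"


# Hypothetical sampling fractions per stratum.
FRACTIONS = {"take-all": 1.0, "take-some-1": 0.5, "take-some-2": 0.25, "take-none": 0.0}


def select_sample(backlog, seed=0):
    """Draw a simple random sample within each stratum of the backlog."""
    rng = random.Random(seed)
    groups = {}
    for u in backlog:
        groups.setdefault(stratum(u), []).append(u)
    sample = []
    for name, units in groups.items():
        n = round(FRACTIONS[name] * len(units))
        sample.extend(rng.sample(units, n))
    return sample
```

Giving older units a higher fraction than newer ones is one way to combine "first in, first out" with the economic activity signal; the actual fractions used are not stated in the presentation.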
Manual Coding • Prioritize units to code • Can produce under-coverage estimates of the backlog by industrial sector • Ultimate goal • Improve automatic coding • 80% - 90%? • Code all remaining active units
Quality Evaluation of Classification Updates • Update privileges will be expanded • Subject-matter specialists • Collection personnel • Need to evaluate the quality of updates • Prevent systematic errors • Where to focus training
Quality Evaluation of Classification Updates • Two processes • Notification and sample selection • 1- Notification • Specialist determines the set of enterprises to look at • Every update to a targeted enterprise is sent to the specialist • Agree/Disagree/Do nothing • Make use of expertise of specialist • Specialists keep up-to-date with their frame
Quality Evaluation of Classification Updates • 2- Sample selection and evaluation • Based on industry, source of industry, size and complexity of enterprise • Re-code and compare • Minimize respondent input when re-coding • Using notification and sample • Produce error rate for industrial coding • Target specific problems
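The "re-code and compare" step can be sketched in Python as below. This is an unweighted illustration under assumed inputs: the record layout (source of the code, code on the register, code from re-coding) is hypothetical, and a production estimate would presumably apply sampling weights.

```python
from collections import defaultdict


def error_rates_by_source(records):
    """records: iterable of (source, register_code, recoded_code) tuples.

    Returns the share of units, per source, whose re-coded industry
    differs from the code on the register.
    """
    counts = defaultdict(lambda: [0, 0])   # source -> [disagreements, total]
    for source, register_code, recoded_code in records:
        counts[source][0] += register_code != recoded_code
        counts[source][1] += 1
    return {source: bad / total for source, (bad, total) in counts.items()}
```

Breaking the rate down by source of the industry code is one way to see where to focus training, as mentioned on the previous slide.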
Quality Assurance Survey • Goal: assess the quality of classification on the BR on an ongoing basis • Assess dead/alive status as well • Point-in-time surveys done in the past • 1993, 1995, 1997, 2002 • Implement a continuous survey • Produce overall results monthly • Produce detailed results combining 12 months
Quality Assurance Survey • Stratification • Industrial sectors • 2 or 3 size strata • Have higher sampling fraction for larger size • Recently contacted • Considered to have valid classification • Sample allocation • Target 3.5% standard error for annual industrial classification error rate • 550 units a month
Quality Assurance Survey • Currently doing a pilot test • Monthly estimates produced • Yearly estimates based on weighted average of 12 monthly measures • Option 1: equal weights of 1/12 • Option 2: weights equal to each month's share of the population over the year, N_m/(N_1+...+N_12)
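The two weighting options for the yearly estimate can be written as a short Python function. The function and argument names are illustrative; the weights follow the slide (1/12 for each month, or each month's population divided by the sum of the 12 monthly populations).

```python
def yearly_estimate(monthly_rates, monthly_populations=None):
    """Combine 12 monthly estimates into one yearly estimate.

    With no populations given, each month gets weight 1/12.
    Otherwise month m gets weight N_m / (N_1 + ... + N_12).
    """
    if len(monthly_rates) != 12:
        raise ValueError("expected 12 monthly estimates")
    if monthly_populations is None:
        weights = [1 / 12] * 12
    else:
        if len(monthly_populations) != 12:
            raise ValueError("expected 12 monthly population counts")
        total = sum(monthly_populations)
        weights = [n / total for n in monthly_populations]
    return sum(w * r for w, r in zip(weights, monthly_rates))
```

The two options give the same result when the population is stable over the year and diverge when months with larger populations also have different error rates.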
Quality Assurance Survey • Survey will be used to • Clean up the register as an independent source • Evaluate industrial in-scope and out-of-scope rates • Evaluate industrial error rate for non-surveyed portion of the register (e.g. small enterprises) • Evaluate death rate in order to adjust sample sizes • Potential use • Evaluate frame quality for new surveys • Clean up part of the register
Conclusion • Classification is essential to the BR • Redesign provides an opportunity • To improve coding • To standardize tools used for coding • To measure quality of coding adequately • To set up good practices/good reports • Results • Better quality of business survey frames • More efficient surveys
Yanick Beaucage 613-951-4622 yanick.beaucage@statcan.ca