Discover how a census is conducted in Korea: historical background, unique features, challenges, and innovative methods such as Internet surveys. Explore how the Korea National Statistical Office ensures accurate data collection and processing, and learn about the planned direction for the 2010 Census.
How Korea Takes a Census and the Direction for the 2010 Census
2008. 9.
Korea National Statistical Office
Contents
1 Outline of the Korean Census
2 Environment of Census-taking
3 Internet Survey
4 e-Census System
5 Data Capture and Editing
1 Outline of the Korean Census
• Historical Background
◈ The Population Census has been conducted every five years since 1925, and the Housing Census since 1960
- 2010 Population Census : 18th Census
- 2010 Housing Census : 10th Census
• Legal Basis
◈ Statistics Law and its Enforcement Decree
- Designated statistics : Population Census No. 10101, Housing Census No. 10102
◈ Population and Housing Census regulations
• Census Day : As of 0:00, November 1
• Census-taking Period
◈ Preparatory work : October 29-31
◈ Enumeration : November 1-15
• Coverage
- All Koreans and foreigners, and their housing units, within the administrative jurisdiction of Korea as of the Census Day
• Census questions
( ) : No. of questions added by each Province
• Enumeration Methods
◈ Face-to-face interviews
◈ Self-enumeration
◈ Internet Survey (introduced in the 2005 Census)
• Budget
◈ The cost of the Census exceeds the total budget of KNSO
(unit : billion won)
• Release of the Results (based on 2005)
◈ Preliminary results : December 2005
◈ Final results : May 2006 ~ December 2006
• The System of Census-Taking (based on 2005)
[Organization chart. Recoverable labels: National Statistical Office; Technical Advisory Committee; Census Data Users' and Experts' Committee; Metropolitan city, Province (16); City, County (250); Eup, Myeon, Dong (3,573); Supervisor (8,000); Enumerator (90,000)]
2 Environment of Census-taking
• Social Environment
◈ Growing awareness of privacy
◈ Increasing public unwillingness to cooperate with the government
• Financial Environment
◈ Burden of requiring a huge budget and large-scale human resources
• Statistical Environment
◈ Greater scope for using administrative sources
- Computerization of administrative records : building registers, foreigner registration, etc.
◈ Increasing number of households absent during the daytime
◈ Data collection made difficult by the growing numbers of elderly people and one-person households
◈ Possible to use the Internet for data collection
- Internet access rate : 79.8% (2007.12)
3 Internet Survey
Internet Survey of 2005 Census
• Objectives
◈ Decrease in coverage error
- Provide a way to reach hard-to-enumerate households
◈ Low-cost data collection method
◈ Introduced in the 2005 Census for the first time
• Period : 2005.10.29~11.12
• Participation : 141,000 households (HHs), 0.9% of total HHs
• Process of Internet survey
[Flow diagram. Recoverable labels: Internet application; Confirmation of real name and address; Confirmation of real name; Credit rating agency; SMS, Mail; Input period; Ordinary citizens; Input of questionnaire; Data encryption; Confirmation of result; Procedure of proceedings]
Lessons from Internet Survey
◈ Advantages
- Provide a way to reach hard-to-enumerate households
※ Post-enumeration net coverage error : 1.6% (2000 Census) → 0.9% (2005 Census)
- Cost reduction in survey management and enumerator employment
◈ Advantages
- Improved data quality : interactive user guidance, automatic filtering of irrelevant survey items
- Correspondence rates in the post-enumeration survey
◈ Challenges and problems for future censuses
- Automatic address matching rate : 62.1%
• The remaining 37.9% of respondents had to wait to receive their census tract number
- Difficulty in estimating appropriate system capacity
• Difficulty in estimating peak-time user frequencies
Plan for the Internet Survey of 2010 Census
◈ Increase the participation rate of the Internet survey
- 2005 (0.9%) → 2010 (20~30%)
◈ Introduce an Internet survey participation number (ISPN)
- ISPN (12 digits) : □□□ □□□ □□ □□ □□
◈ Intensify the publicity campaign for the Internet survey
◈ Improve the Internet survey system
◈ Provide incentives to Internet survey respondents
※ Internet survey participation rate (1st pre-test in 2007) : 13.3%
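The 12-digit ISPN layout on the slide (groups of 3, 3, 2, 2 and 2 boxes) can be illustrated with a small validation and formatting helper. This is only a sketch: the slides do not say what each group encodes or whether a check digit exists, so the function names `normalize_ispn` and `format_ispn` and the rule "exactly 12 digits, spaces or hyphens ignored" are assumptions made for illustration.

```python
import re

# Group widths as drawn on the slide; they sum to 12 digits.
ISPN_GROUPS = (3, 3, 2, 2, 2)


def normalize_ispn(raw: str) -> str:
    """Drop spaces and hyphens, then require exactly 12 digits (assumed rule)."""
    digits = re.sub(r"[\s-]", "", raw)
    if not re.fullmatch(r"[0-9]{12}", digits):
        raise ValueError(f"ISPN must contain exactly 12 digits: {raw!r}")
    return digits


def format_ispn(raw: str) -> str:
    """Render an ISPN in the 3-3-2-2-2 grouping shown on the slide."""
    digits = normalize_ispn(raw)
    parts, pos = [], 0
    for width in ISPN_GROUPS:
        parts.append(digits[pos:pos + width])
        pos += width
    return " ".join(parts)


print(format_ispn("123456789012"))  # 123 456 78 90 12
```

Normalizing before validating lets a respondent type the number with or without separators, which is one plausible way to reduce entry errors on a web form.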
4 e-Census System
• Objectives
◈ Economical census : low cost, high efficiency
◈ Improvement of data quality
- Provide a way to reach hard-to-enumerate households
- Decrease in coverage error
◈ Shortening of data release time
Flow Chart of the e-Census System
[Flow chart. Recoverable labels: Census Tract Management; Estimate No. of Enumerators; Supply Management; Recruit Management; Payroll Management; Education Management; Cyber Education; Field Survey Management; Internet Survey, Housing DB; Compiling a List of Households; Preliminary Count of Census; Web Based Data Input; Data Editing/Tabulation; Analysis / Publication]
Functions of the e-Census System
◈ Survey Management
- Census tract management
- Supply management
- Education management, Cyber education
- Short messaging system
- Survey result management
◈ Enumerator Management
- Recruitment
- Assignment of census tracts to enumerators
- Payroll
- Contact information for local officers (name, phone number, e-mail, etc.)
◈ Internet Survey
◈ Web Based Data Input
◈ Data Editing and Tabulation
⇒ The e-Census system will be used in the 2010 Census with minor changes
5 Data Capture and Editing
• Punch card system : 1935~1970
• Keyboard data entry : 1975~1985
• Optical mark reading (OMR) : 1990~1995
• Keyboard data entry : 2000
• Web-based data input system / ICR (intelligent character recognition) : 2005
5-1 Data Capture System of 2005 Census
• Web-based data input system
◈ Data capture for short form and long form
◈ Decentralized to 256 cities/counties
• ICR system
◈ For special enumeration areas such as military camps
• Internet system
◈ For households that participated in the Internet survey
5-2 Web-Based Data Input System
• Principle of 2005 data capture
◈ Accurate and rapid data capture using the e-Census system
◈ e-Census : unified system combining input, editing, tabulation, and administration functions
• Data input period
◈ 2005. 11. 28. ~ 12. 22. (19 days)
• No. of persons for data input and editing
◈ 13,372 persons (including 614 managers)
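The stated 19 days is shorter than the 25 calendar days between 28 November and 22 December 2005. One reading, which is an assumption and not stated in the slides, is that only weekdays (Monday to Friday) were counted. The sketch below checks that reading; `weekdays_between` is an illustrative helper and ignores public holidays.

```python
from datetime import date, timedelta


def weekdays_between(start: date, end: date) -> int:
    """Count Monday-Friday dates from start to end, both inclusive."""
    total_days = (end - start).days + 1
    return sum(
        1
        for offset in range(total_days)
        if (start + timedelta(days=offset)).weekday() < 5  # 0-4 = Mon-Fri
    )


input_start, input_end = date(2005, 11, 28), date(2005, 12, 22)
calendar_days = (input_end - input_start).days + 1
input_days = weekdays_between(input_start, input_end)
print(calendar_days, input_days)  # 25 19
```

The same count applied to the ICR scanning period given in the slides (2005.11.21~12.2) yields 10, which also matches the stated "10 days".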
• Process of data input and editing
◈ Data input and editing through an on-line system
◈ Data input and editing by enumeration district (ED)
- Error messages can be generated for each enumeration district once its data have been input
◈ Integrated editing of Internet survey and keyboard entry data
◈ Input error rate : 0.19%
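Editing by enumeration district, with error messages produced after input, can be sketched as a set of record-level checks whose messages are grouped per district. The slides do not list the actual edit rules, so the three rules, the field names (`ed`, `hh`, `person`, `age`, `sex`, `marital_status`) and the sample records below are all hypothetical.

```python
from collections import defaultdict


def check_record(rec: dict) -> list[str]:
    """Return edit-error messages for one person record (illustrative rules only)."""
    errors = []
    if not 0 <= rec["age"] <= 120:
        errors.append(f"age out of range: {rec['age']}")
    if rec["sex"] not in ("M", "F"):
        errors.append(f"invalid sex code: {rec['sex']}")
    if rec["age"] < 15 and rec.get("marital_status") == "married":
        errors.append("married person under 15")
    return errors


def edit_by_district(records: list[dict]) -> dict[str, list[str]]:
    """Group all error messages by enumeration district."""
    report = defaultdict(list)
    for rec in records:
        for msg in check_record(rec):
            report[rec["ed"]].append(f"HH {rec['hh']} person {rec['person']}: {msg}")
    return dict(report)


# Made-up sample data for two districts.
records = [
    {"ed": "ED-001", "hh": 1, "person": 1, "age": 34, "sex": "F", "marital_status": "married"},
    {"ed": "ED-001", "hh": 1, "person": 2, "age": 7, "sex": "M", "marital_status": "married"},
    {"ed": "ED-002", "hh": 5, "person": 1, "age": 130, "sex": "X"},
]
report = edit_by_district(records)
for district, messages in report.items():
    print(district, messages)
```

Grouping the messages per district means a data-entry team can correct one district's batch at a time, which fits the decentralized input described in the slides.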
5-3 ICR System
• Data capture for special EDs
◈ No. of questionnaires : 130 thousand
• ICR system
◈ 3 scanners, 1 storage unit, 12 servers, 40 PCs for correction of errors and editing
◈ 1 scanner per 2 persons
• Scanning period : 2005.11.21~12.2 (10 days)
[Figure: ICR system. Recoverable labels: Scanner; PC for scanner; Server; PC for recognition of images; Correction of recognition errors]
• Recognition rates of numbers (%)
• Recognition rate of characters : 76.73%
◈ Recognition rates were lower than the pre-test rates (86~90%)
- The lower rates resulted from the use of unsuitable pens
5-4 Data Capture for Non-response HHs
• No. of non-response HHs (2005) : 75,000 HHs
◈ 0.47% of total households (HHs)
◈ Collect basic information on HHs
- Ex. : No. of HH members, sex, age, type of housing
• Imputation for non-response HHs
◈ Hierarchical Hot-deck Method
◈ Probability Hot-deck Method
◈ Deductive methods
• Imputation for missing data
◈ Hierarchical Hot-deck Method
- Used data mining techniques to build Hot-deck tables
- Ex. : Hot-deck table for total floor space
* Columns of table : No. of households
* Rows of table : Type of housing, No. of rooms
◈ Deductive methods
- Ex. : Education of children under 5 → No schooling
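The two methods above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure KNSO used: the slides do not describe how the hot-deck tables were built, so the sketch uses a simple sequential hot-deck (the most recently seen donor fills a cell), treats the table's column dimension as household size (`hh_size`), and falls back from the most detailed cell to coarser ones. All field names and floor-space values are made up.

```python
from typing import Optional


def deductive_education(person: dict) -> dict:
    """Deductive rule from the slide: a child under 5 has no schooling."""
    if person.get("education") is None and person["age"] < 5:
        return {**person, "education": "No schooling"}
    return person


def build_hot_deck(donors: list[dict]) -> dict[tuple, float]:
    """Index donor floor space by cells of decreasing detail.

    Later donors overwrite earlier ones, so each cell keeps the most
    recently seen value (sequential hot-deck; an assumption).
    """
    deck: dict[tuple, float] = {}
    for d in donors:
        full_key = (d["housing_type"], d["rooms"], d["hh_size"])
        for depth in (3, 2, 1):
            deck[full_key[:depth]] = d["floor_space"]
    return deck


def impute_floor_space(hh: dict, deck: dict[tuple, float]) -> Optional[float]:
    """Keep a reported value; otherwise take the most detailed matching cell."""
    if hh.get("floor_space") is not None:
        return hh["floor_space"]
    full_key = (hh["housing_type"], hh["rooms"], hh["hh_size"])
    for depth in (3, 2, 1):  # hierarchy: exact cell, then coarser cells
        if full_key[:depth] in deck:
            return deck[full_key[:depth]]
    return None  # no donor at any level


# Made-up donor households with reported floor space (square metres).
donors = [
    {"housing_type": "apartment", "rooms": 3, "hh_size": 4, "floor_space": 85.0},
    {"housing_type": "apartment", "rooms": 2, "hh_size": 1, "floor_space": 49.0},
    {"housing_type": "detached", "rooms": 4, "hh_size": 3, "floor_space": 110.0},
]
deck = build_hot_deck(donors)
missing = {"housing_type": "apartment", "rooms": 3, "hh_size": 2, "floor_space": None}
print(impute_floor_space(missing, deck))  # 85.0, from the (apartment, 3 rooms) cell
```

The fallback order is what makes the method hierarchical: a recipient borrows from the closest available donor class before resorting to a broader one.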
5-5 Auto-coding for Industry/Occupation
• Process of auto-coding
◈ Coding according to the code from the industrial census
◈ Coding by matching with the coding case dictionary
◈ Selection of a code by coders from among 3 codes suggested by the auto-coding system
• Periods of auto-coding (2005)
◈ Coding for industry : 2,030,000 cases
- Auto coding (63.0%) : 2005.12.28~12.30
- Selection of codes (36.7%) : 2006.1.4~2.10
- Unable to code (0.3%)
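The three outcomes listed above (automatic coding, selection among suggested codes, unable to code) can be sketched as a dictionary lookup with a fuzzy-match fallback. This is an illustration only: the slides do not describe the matching algorithm, so the use of Python's `difflib.get_close_matches`, the similarity cutoff of 0.5, and the dictionary entries with placeholder codes such as `IND-001` are all assumptions.

```python
import difflib

# Hypothetical coding case dictionary: description -> placeholder code.
CASE_DICTIONARY = {
    "rice farming": "IND-001",
    "vegetable farming": "IND-002",
    "elementary school": "IND-003",
    "general hospital": "IND-004",
    "restaurant": "IND-005",
}


def auto_code(description: str,
              dictionary: dict[str, str] = CASE_DICTIONARY) -> tuple[str, list[str]]:
    """Return (status, codes) for one written-in description.

    "auto"   : exact dictionary match, one code assigned automatically
    "select" : up to 3 suggested codes for a human coder to choose from
    "unable" : nothing similar enough was found
    """
    text = " ".join(description.lower().split())  # normalize case and spacing
    if text in dictionary:
        return "auto", [dictionary[text]]
    close = difflib.get_close_matches(text, list(dictionary), n=3, cutoff=0.5)
    if close:
        return "select", [dictionary[key] for key in close]
    return "unable", []


for answer in ("Rice  Farming", "rice farm", "zzzz"):
    print(answer, auto_code(answer))
```

Capping the suggestions at three mirrors the slide's description of coders choosing among 3 suggested codes, and keeping the best match first lets a coder confirm the likeliest candidate quickly.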