This presentation outlines the data integration process and the confidentiality measures that IPUMS-International applies to census data from multiple countries. It covers pre-processing, standardization, and the construction of variables needed for analysis. It also describes methods for protecting individual privacy, such as swapping cases between geographic areas, suppressing low-level geographic variables, and recoding geographic units so that small localities cannot be identified. Finally, it shows how codes for demographic variables such as marital status are harmonized across countries.
IPUMS-International Integration Process
Matt Sobek, Minnesota Population Center, sobek@umn.edu
Integration process (flow diagram)
Stages: Input material; Pre-processing; Standardization; Integration
Items shown, in slide order: Data files; Reformat data; Donation; Draw sample; Confidentiality; Code clean-up; Verify data; Harmonize codes; Variable programming; Constructed variables; GIS boundary files; Data dictionary; Questionnaires; Enum instructions; Sample information; Translate to English; Images to editable files; IPUMS data dictionary; Tag enumeration text; Document source variables; Variable descriptions; Sample design
Confidentiality Measures
• Swap a small percentage of cases between geographic areas.
• Suppress low-level geographic variables.
• Recode geographic units to ensure small localities cannot be identified (typically those with fewer than 20,000 persons).
• For recent censuses:
  • Recode cells representing very small numbers of persons in the population (into a residual or combined with a larger category).
  • Top- or bottom-code continuous variables with a thin tail.
  • Suppress specific categories of variables as requested by the NSO.
  • Suppress entire variables as requested by the NSO.
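The measures above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the IPUMS-International implementation: the function names, the record layout (a list of dicts with a `locality` key), the `"combined"` and `"other"` labels, and the swap fraction are all hypothetical. Only the 20,000-person threshold comes from the slide.

```python
import random
from collections import Counter


def top_code(value, cap):
    """Top-code a continuous variable: anything above `cap` is reported as `cap`."""
    return min(value, cap)


def recode_small_cells(values, min_count, residual="other"):
    """Fold categories with fewer than `min_count` cases into a residual category."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else residual for v in values]


def recode_small_localities(records, population, threshold=20_000, combined="combined"):
    """Replace the locality code of every unit whose population is below `threshold`.

    `population` maps locality code -> number of persons (hypothetical input).
    """
    return [
        {**r, "locality": r["locality"] if population[r["locality"]] >= threshold else combined}
        for r in records
    ]


def swap_cases(records, fraction, rng):
    """Swap locality codes between randomly chosen pairs covering ~`fraction` of cases."""
    out = [dict(r) for r in records]
    n_pairs = int(len(out) * fraction / 2)
    chosen = rng.sample(range(len(out)), 2 * n_pairs)
    for a, b in zip(chosen[::2], chosen[1::2]):
        out[a]["locality"], out[b]["locality"] = out[b]["locality"], out[a]["locality"]
    return out
```

Passing a seeded `random.Random` instance as `rng` keeps the swap reproducible. Because the swap only exchanges locality codes, the number of cases in each locality stays the same.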
Harmonize Codes: Translation Matrix for Marital Status
Columns (one per sample): China 1982; Colombia 1973; Kenya 1989; Mexico 1970; U.S.A. 1990
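A translation matrix of this kind can be pictured as one lookup table per sample, each mapping that census's original codes onto a shared harmonized scheme. The sketch below shows the idea only. Every code value and label in it is invented for illustration and does not reproduce the actual source or harmonized codes; the names `HARMONIZED`, `TRANSLATION`, and `harmonize` are hypothetical too.

```python
# Harmonized scheme (hypothetical codes and labels, for illustration only).
HARMONIZED = {
    1: "single / never married",
    2: "married / in union",
    3: "separated / divorced",
    4: "widowed",
    9: "unknown",
}
UNKNOWN = 9

# One column of the matrix per sample: original code -> harmonized code.
# The original codes below are invented; real censuses each use their own.
TRANSLATION = {
    "china1982": {1: 1, 2: 2, 3: 4, 4: 3},
    "mexico1970": {1: 2, 2: 2, 3: 4, 4: 3, 5: 3, 6: 1},
}


def harmonize(sample, source_code):
    """Translate a sample-specific code; unmapped codes fall back to UNKNOWN."""
    return TRANSLATION[sample].get(source_code, UNKNOWN)


def harmonize_column(sample, source_codes):
    """Apply the translation to a whole column of original codes."""
    return [harmonize(sample, c) for c in source_codes]
```

Keeping the mapping as data rather than as branching logic means a new sample adds a column to the matrix without changing any code.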
Constructed “Pointer” Variables (Simple household, Colombia 1985)
Spouse’s: 2 1 0 0 0 0
Mother’s / Father’s: 0 0 0 0 0 0 2 1 2 1 2 1
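A pointer variable records, for each person, the position within the household of that person's spouse, mother, or father, with 0 meaning none is present. The sketch below constructs such pointers for the simplest case only: a head, a spouse, and children of the head. The field names (`relate`, `sex`), the linking rules, and the example household are assumptions made for illustration, not the actual IPUMS-International programming.

```python
def build_pointers(household):
    """Return one (spouse_loc, mother_loc, father_loc) tuple per person.

    `household` is a list of dicts in person order. Positions are 1-based and
    0 means "not present". Only head, spouse and child relationships are linked.
    """
    def position_of(relate):
        # 1-based position of the first person with this relationship, else 0.
        return next((i for i, p in enumerate(household, 1) if p["relate"] == relate), 0)

    head, spouse = position_of("head"), position_of("spouse")

    # Parent positions by sex, taken from the head and spouse when present.
    parent_by_sex = {}
    for pos in (head, spouse):
        if pos:
            parent_by_sex[household[pos - 1]["sex"]] = pos

    rows = []
    for person in household:
        if person["relate"] == "head":
            rows.append((spouse, 0, 0))
        elif person["relate"] == "spouse":
            rows.append((head, 0, 0))
        elif person["relate"] == "child":
            rows.append((0, parent_by_sex.get("female", 0), parent_by_sex.get("male", 0)))
        else:
            rows.append((0, 0, 0))  # other relationships are not linked in this sketch
    return rows


# Hypothetical simple household: male head, female spouse, four children.
example = (
    [{"relate": "head", "sex": "male"}, {"relate": "spouse", "sex": "female"}]
    + [{"relate": "child", "sex": "female"} for _ in range(4)]
)
pointers = build_pointers(example)
```

For this assumed household the spouse pointers come out as 2, 1, 0, 0, 0, 0, the same pattern as the spouse column on the slide. Complex households (multiple couples, grandchildren, unrelated members) need more rules than this sketch contains.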
Census Questionnaire Image (Mexico 2000): Water access
XML-Tagged Census Questionnaire (Mexico 2000): Water access