IPUMS-Europe, 2004-2008: Restricted-access, anonymized microdata for scientific and policy research
* * *
Robert McCaa, University of Minnesota Population Center
Nikolai Botev, UN-ECE Population Activities Unit (Geneva)
www.hist.umn.edu/~rmccaa/ipums-europe
Outline
• PAU 1990s project
• IPUMS-International means: Restricted access, anonymized microdata
• IPUMS-Europe: sister project (Latin America), connections with PAU
• IPUMS-International partners
• Principles: integration, dissemination
Population Activities Unit 1990 census round harmonization project: focused on Aging
• Begun 1992: PAU/UNECE, UNFPA, US-NIA
• Microdata acquired for 15 countries
• Harmonized 26 core person variables plus 13 optional; 10 dwelling/household variables plus 18 optional
• Extensive metadata: questionnaires, nomenclatures, classifications
• Progressive over-sampling with age
Population Activities Unit, 1990 census round harmonization project: focused on Aging
• General release: samples for 8 countries
• Samples for the other 7 countries available under more restrictive conditions
• Dissemination: CDs or other media; no online access
• Sustainability: ICPSR (U. of Michigan)
Problems with PAU effort:
• Sample design too complex
• Need for time series
• Lacked legal authority
• Inadequate funding
• Insufficient computing infrastructure and human resources
• Antiquated distribution system
• Sustainability problematic
Population Activities Unit: samples of older persons based on the 2000-round of censuses
• Tightly integrated with IPUMS-Europe
• Based on the same coding schemes, nomenclatures, and classifications
• Utilize the same anonymization techniques and approaches; same data access modalities
• Ensure sustainability through the integration with IPUMS-Europe: ICPSR & European Data Centers
Population Activities Unit: samples of older persons based on the 2000-round of censuses
• Sample design:
  - sample of households not included in the core IPUMS-Europe sample, where at least one member is over age 60 (recommended sampling density: 5 percent);
  - geography to match that of core samples;
• Advantages:
  - more straightforward than the design used for 1990s;
  - in line with the practice of national statistical offices (e.g. PUMS-A and PUMS-O of the US Census Bureau);
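The supplementary sample design described on this slide could be sketched roughly as follows. This is only an illustration: the record layout (`id`, `ages`), the function name, and the use of independent random draws to reach the 5 percent density are assumptions, not the project's actual procedure.

```python
import random

def sample_older_households(households, core_ids, density=0.05, min_age=60, seed=0):
    """Draw a supplementary sample of households that are NOT in the core
    sample and that have at least one member over `min_age`.

    `households` is a list of dicts with hypothetical keys "id" and "ages".
    """
    rng = random.Random(seed)
    # Eligible: outside the core sample, with at least one member over min_age.
    eligible = [
        h for h in households
        if h["id"] not in core_ids and any(age > min_age for age in h["ages"])
    ]
    # Keep each eligible household with probability `density`.
    return [h for h in eligible if rng.random() < density]
```

Drawing only from households outside the core sample keeps the two samples disjoint, so they can be pooled for analyses of older persons once appropriate weights are attached.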
From IPUMS-USA (1989-) & PAU-Aging (1992-) to IPUMS-International (1999-), Latin America (2003-), Europe (2004?) and beyond
Restricted access, anonymized microdata
IPUMS-International means Restricted access, Anonymized microdata
• Should be “IRAMS” not IPUMS
• Who are IPUMS-International users? Those who:
  • Have a demonstrated need for the data (project abstract)
  • Agree to abide by the restrictions of use
  • Place themselves under the jurisdiction of Institutional Review Boards
IPUMSi ANONYMIZES
Using the most demanding standards, legal & administrative as well as technical:
» Suppress geographical detail (NUTS2/3?)
» Corrupt the data! (just a little…)
» Blur/aggregate sensitive codes
» Convert dates to ages (blur key vars.)
» Swap cases between districts! (just a few…)
» Scramble order of unit records
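Three of the technical measures listed here (dates converted to ages, a few cases swapped between districts, unit records scrambled) can be illustrated with a minimal sketch. The field names `birth_date` and `district`, the swap fraction, and the function names are hypothetical; this is not the IPUMSi production procedure.

```python
import random
from datetime import date

def age_at_census(birth: date, census_day: date) -> int:
    """Completed years of age on census day."""
    years = census_day.year - birth.year
    if (census_day.month, census_day.day) < (birth.month, birth.day):
        years -= 1
    return years

def anonymize(records, census_day, swap_fraction=0.01, seed=0):
    """Return anonymized copies of `records` (the input is not modified)."""
    rng = random.Random(seed)
    out = []
    for r in records:
        r = dict(r)
        # Replace the exact date of birth with age in completed years.
        r["age"] = age_at_census(r.pop("birth_date"), census_day)
        out.append(r)
    # Swap district codes between a few randomly chosen pairs of cases.
    n_pairs = int(len(out) * swap_fraction / 2)
    chosen = rng.sample(range(len(out)), 2 * n_pairs)
    for i, j in zip(chosen[::2], chosen[1::2]):
        out[i]["district"], out[j]["district"] = out[j]["district"], out[i]["district"]
    # Scramble the order of unit records so file position carries no information.
    rng.shuffle(out)
    return out
```

Swapping only the district code leaves the overall distribution of districts unchanged while making any single record's location uncertain.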
Anonymization example: Italy, 1991
First assessment
Note: population uniques are anonymized after integration
1. Suppress geographical variables below commune
2. Convert
  • Dates of birth, marriage, immigration to ages
  • Band small groups
3. Suppress sensitive codes for small groups:
  • Citizenship
  • Year of immigration to Italy
  • Commune of work/study
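Step 3 above, suppressing sensitive codes for small groups, might look something like the sketch below. The threshold, the replacement code, and the variable name used in the example are assumptions chosen for illustration; the slide does not state the actual cut-off.

```python
from collections import Counter

def suppress_small_groups(records, variable, min_count, suppressed_code="suppressed"):
    """Replace values of `variable` held by fewer than `min_count` records
    with a generic code, so that rare categories cannot single anyone out."""
    counts = Counter(r[variable] for r in records)
    return [
        {**r, variable: r[variable] if counts[r[variable]] >= min_count else suppressed_code}
        for r in records
    ]
```

For example, with `min_count=2`, a citizenship held by a single person in the sample would be recoded to `"suppressed"` while common citizenships stay as they are.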
EUROSTAT statistical anonymity standards (Thorogood, 1999)--all accepted by IPUMS-International
1. small sample size
2. limited geographical detail
3. top and bottom coding of unique categories
4. signed non-disclosure agreement
5. prohibit redistribution of datasets to third parties
6. prohibit attempts to identify individuals or the making of any claim to that effect
7. require users to provide copies of publications
EUROSTAT statistical anonymity standards (Thorogood, 1999)--all accepted by IPUMSi and more
8. Age (constructed from birth date, where necessary)
9. Never identify date of birth
10. Never identify place of birth
11. Migration: timing and place not identified in detail
12. Place of residence identified by major civil division (pop>60k, 120k, 250k, 1 million--national rule)
13. Sensitivity analysis of variables by national experts
14. Confidentiality assessment by national experts
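Standards 3 and 12 lend themselves to a small worked sketch. The function names, the `"other"` residual code, and the example limits are hypothetical; each country would set its own limits and population threshold under its national rule.

```python
def top_bottom_code(value, low, high):
    """Clamp a value into [low, high] so that rare extremes are folded
    into the lowest and highest published categories."""
    return max(low, min(value, high))

def residence_code(place, population, threshold):
    """Identify a place of residence only when its population exceeds the
    national threshold; otherwise return a residual code."""
    return place if population > threshold else "other"
```

With a top code of 90, for instance, `top_bottom_code(103, 0, 90)` folds an age of 103 into the open-ended 90 category.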
Funded! Sister-project: IPUMS-Latin America
17 countries, ~500 million pop., 5 census rounds, 80+ samples, 100+ million person records
• Scope: Latin American census microdata, 1960-present
• Work Plan (funded by National Institutes of Health)
• 2001: Sign licensing agreements with official agencies
• 2002: Obtain funding from U.S. NIH
• 2003: Develop/translate microdata & metadata
• 2004: Country expert teams design national integrations
• 2005: MPC/expert teams design regional integration
• 2006: MPC anonymizes/integrates microdata and metadata
• 2007: MPC disseminates to bona fide researchers who sign non-disclosure license. National census/data/research institutes may distribute national versions via CDs/web.
IPUMS-Europe Partnership: More…
• Censuses: 1960s – 2000, where microdata exist
• Countries: 16 inclined at present, with >350 million population (* = signed): Austria, Bulgaria, Czech Republic*, France*, Germany, Greece, Ireland, Israel, Hungary*, Poland, Portugal, Romania, Slovenia*, Spain*, Switzerland, Turkey
• Research: more knowledge, more users
IPUMS-Europe Partnership: More uniformity…
• Legal: signed memorandum of understanding
• Administrative: restricted to approved users; strong enforcement procedures
• Sample design: every nth household
• Anonymization: includes corrupting data
• Integration: more variables, composite coding
• Dissemination: extract custom-tailored datasets, never entire samples
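The "every nth household" design in the list above is a systematic sample. A minimal sketch, assuming the households are already in census file order and that a random starting point is drawn (the slide does not say how the start is chosen):

```python
import random

def systematic_sample(households, n, seed=None):
    """Take every nth household, starting from a random offset in [0, n)."""
    start = random.Random(seed).randrange(n)
    return households[start::n]
```

With `n=10` this yields a 10 percent sample spread evenly through the file, which is one reason systematic designs tend to be simple to document and to weight.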
Advantages…proven record of accomplishments:
• Uniform legal protocols
• Substantial institutional infrastructure
• Experienced census microdata integrators
• Cost-effective academic environment
• Sustained funding from National Science Foundation, National Institutes of Health
• Successful web-based distribution system: users!
Advantages of IPUMS-International
• Comparability: data are rigorously integrated; documentation is extensive, both primary (from NSIs) and integrated (from MPC)
• Accountability: reports on users, usage and publications; advisory board of statisticians and scientists
• Sustainability: MPC, ICPSR
IPUMS-Europe, 2004-2008: coverage ~20 countries, representing ~400m. people
• Scope: European census microdata, 1950-present
• Work Plan (contingent upon funding)
• 2003: Sign licensing agreements with census agencies; obtain funding from US NIH
• 2004: Develop/translate microdata & metadata
• 2005: Country expert teams design national integrations
• 2006: MPC/expert teams design regional integration
• 2007: MPC integrates microdata and metadata
• 2008: MPC disseminates to bona fide researchers who sign non-disclosure license. National census/data/research institutes may distribute national versions via CDs/web.
IPUMS INTERNATIONAL
Imagine a new statistical product: scientifically anonymized, integrated census microdata samples made up of unidentifiable individuals...
» Easy-to-use web-interface
» Highest scientific standards
» Proven, powerful integration
» A quantum leap in usage
» 1998: 1 country signed
» 1999: 3 countries
» 2000: 9
» 2001: 15
» 2002: 32; first release, 6 countries
IPUMSi RESCUES
UN Demographic Center for Latin America (CELADE, Santiago, Chile): ~3000 microdata tapes and metadata (documentation) recovered
IPUMSi PAYS
National experts in each country are contracted to assist with:
• Assembling microdata and documentation
• Developing samples
  • to minimize confidentiality risks
  • and to maximize robustness
• Designing national integration plan
  • census-by-census
  • concept-by-concept
  • code-by-code
• Writing integrated documentation
IPUMSi PARTNERSHIP
Census documentation compiled for Colombian microdata
Standard: UN/Eurostat Principles & Recs...
Photos from Colombia integration project, February-March, 2000: 4 experts from DANE (census office) + 7 academics (3 universities)
IPUMSi integration principles
1. Respect absolute anonymity and confidentiality
2. Preserve all original data, except adjustments to ensure privacy (top codes, blurrings, masking, re-ordering, etc.)
3. Harmonize codes using international standards
  occupation: ISCO-88 (detailed, general)
  education: ISCED (detailed, general)
  family: IPUMS, etc. (detailed, general)
4. Enhance with constructed variables
Composite coding scheme example: marital status
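The table from this slide did not survive, so the sketch below only illustrates the general idea of a composite code: a leading digit for a broad category that every country can supply, and a trailing digit for detail kept where a census provides it. All code values and labels here are invented for illustration and are not the project's actual marital status codes.

```python
# Hypothetical two-digit composite codes.
# First digit: general category comparable across all censuses.
# Second digit: extra detail, 0 when a census does not distinguish further.
COMPOSITE_MARITAL_STATUS = {
    "never married": 10,
    "married": 20,
    "married, civil": 21,
    "married, religious": 22,
    "consensual union": 23,
    "separated": 31,
    "divorced": 32,
    "widowed": 40,
}

def general_code(composite_code: int) -> int:
    """Collapse a composite code to its general (first-digit) category."""
    return composite_code // 10
```

A researcher comparing many countries can analyze `general_code(...)` alone, while one studying a single detailed census keeps the full two-digit value, so no original detail is lost in harmonization.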
Occupation: the ISCO standard
preliminary release: “1” digit
final: 2-3 or 4 digit, depending upon country
Variable availability, preliminary release
IPUMSi DISSEMINATES
Web-based extraction system
Legally-binding license agreement
• protects privacy and confidentiality
• assures proper use
• new sanction: loss of employment
Researcher selects
• countries
• censuses
• cases/sub-populations
• variables
• sample densities
Facilitates comparative research
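The researcher's choices listed on this slide amount to a filter over the integrated records. The sketch below shows one way such a custom extract could be expressed; the record layout, parameter names, and function are all hypothetical and do not describe the actual web extraction system.

```python
import random

def make_extract(records, countries, years, variables, keep=None, density=1.0, seed=0):
    """Build a custom-tailored extract.

    countries, years : sets selecting which samples to draw from
    variables        : list of variable names to include in the output
    keep             : optional predicate selecting cases/sub-populations
    density          : fraction of matching cases to retain (sample density)
    """
    rng = random.Random(seed)
    extract = []
    for r in records:
        if r["country"] not in countries or r["year"] not in years:
            continue
        if keep is not None and not keep(r):
            continue
        if rng.random() >= density:
            continue
        # Only the requested variables leave the system, never the full record.
        extract.append({v: r[v] for v in variables})
    return extract
```

A call such as `make_extract(records, {"ES"}, {1991}, ["age", "sex"], keep=lambda r: r["age"] >= 60)` would return only age and sex for persons aged 60 or over in one sample, which mirrors the principle of delivering tailored datasets rather than entire samples.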
Can we do it?? Yes we can!!!
additional information at: www.hist.umn.edu/~rmccaa/ipums-europe
contact: rmccaa@umn.edu
* * * * *
Thank you