ESSNET on Statistical Disclosure Control. Daniela Ichim
Outline: ESSNET SDC; Record linkage and SDC; Statistical matching and SDC
ESSNET SDC: Pilot ESSnet, 2008-2009. 12 participants: CBS (coordinator), Istat, Destatis, ONS, Statistics Sweden, Statistics Austria, Statistics Norway, INE Portugal, …. 3 sub-contractors: Universitat Rovira i Virgili, University of Naples, IAB Germany. Website: http://neon.vb.cbs.nl/casc/
Before ESSNET SDC: 4th Framework SDC project (1996-1998); 5th Framework CASC project (2000-2003); CENEX project (2006). Aim: enhance development in the field of statistical confidentiality on three fronts: 1. methodology, 2. software, 3. practice, practice, practice, …
Before ESSNET SDC (outputs): Argus software; Handbook on SDC; conferences (PSD, Privacy in Statistical Databases); methodological papers; website; international journals.
ESSNET SDC: Main goals: raise the level of knowledge and skills; promote the results achieved so far; make SDC tools easier to apply; involve "new" NSIs; coordinate at ESS level. Main outputs: improved versions of Argus and of the handbook; dissemination; training courses; reports and case studies.
Record linkage and SDC: the link is microdata SDC, in two respects: (I) measuring the disclosure risk and (II) releasing microdata files (PUF, MFR).
I. Standard disclosure scenario: Assumptions: the intruder has access to an external register E; E covers the whole population; E and the released file D share a set of key variables, measured without error; the intruder uses record linkage on the key variables only, to match a unit in the sample to a unit in the population; … Risk measures: the number of "linked" units; the probability of correct identification, which equals the probability of correct linkage.
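Under these assumptions both risk measures can be computed directly. A minimal sketch, assuming exact matching on the key variables and an intruder who picks uniformly among the F_k register units sharing the key k of a sample record, so that the probability of correct linkage is 1/F_k; here "linked" units are read as records matching exactly one register unit. The function name `linkage_risk` and the toy data are hypothetical.

```python
from collections import Counter

def linkage_risk(sample_keys, population_keys):
    """Risk measures under the standard scenario (hypothetical helper):
    exact matching on key variables, register covering the whole population,
    no measurement error."""
    pop_freq = Counter(population_keys)  # F_k: register frequency of each key combination
    # Every sample unit belongs to the population, so F_k >= 1 for each sample key.
    # If the intruder picks at random among the F_k candidates, P(correct link) = 1 / F_k.
    per_record = [1.0 / pop_freq[k] for k in sample_keys]
    # Records whose key combination matches a single register unit.
    n_unique_links = sum(1 for k in sample_keys if pop_freq[k] == 1)
    expected_correct = sum(per_record)  # expected number of correct identifications
    return per_record, n_unique_links, expected_correct

# Hypothetical register E with key variables (sex, age class, city).
population = [("F", "30-39", "Roma"), ("F", "30-39", "Roma"),
              ("M", "40-49", "Roma"), ("M", "40-49", "Milano"),
              ("F", "50-59", "Napoli"), ("F", "50-59", "Napoli"), ("F", "50-59", "Napoli")]
sample = [population[0], population[2], population[3], population[4]]

per_record, n_linked, expected = linkage_risk(sample, population)
```

Here two sample records are population uniques on the keys, so they are linked with certainty; the other two are protected by 2 and 3 look-alikes respectively.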
I. RL used in SDC: Distance-based RL (Domingo-Ferrer): link each record d in file D to its nearest record e in file E; mainly for continuous variables (business data). Probabilistic RL (Skinner): classical (Fellegi-Sunter) framework; mainly for categorical variables (social data).
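The two linkage approaches can be sketched side by side. This is an illustration under stated assumptions, not the implementation used by the cited authors: distance-based RL uses Euclidean distance on variables standardized with E's means and standard deviations, and the probabilistic part only computes the Fellegi-Sunter log2 composite weight of a record pair from m-probabilities (P(agree | true match)) and u-probabilities (P(agree | non-match)) that are hypothetical here, as are all names and data.

```python
import math

# --- Distance-based record linkage (continuous key variables) ---

def column_stats(rows):
    """Means and (population) standard deviations of each column."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0  # guard constant columns
           for c, m in zip(cols, means)]
    return means, sds

def distance_based_linkage(d_file, e_file):
    """Link each record of D to its nearest record of E (Euclidean distance on
    variables standardized with E's means and standard deviations)."""
    means, sds = column_stats(e_file)
    def z(r):
        return [(v - m) / s for v, m, s in zip(r, means, sds)]
    e_std = [z(r) for r in e_file]
    links = []
    for d in d_file:
        dz = z(d)
        links.append(min(range(len(e_std)), key=lambda j: math.dist(dz, e_std[j])))
    return links

# --- Fellegi-Sunter composite weight (categorical key variables) ---

def fs_weight(d_rec, e_rec, m, u):
    """Sum over key variables of log2(m/u) on agreement, log2((1-m)/(1-u)) on disagreement."""
    w = 0.0
    for a, b, mi, ui in zip(d_rec, e_rec, m, u):
        w += math.log2(mi / ui) if a == b else math.log2((1 - mi) / (1 - ui))
    return w

# Hypothetical business data: E = [turnover, employees]; D = masked copy, same record order.
E = [[100, 5], [250, 12], [400, 30], [1200, 80]]
D = [[110, 6], [240, 11], [380, 33], [1150, 78]]
links = distance_based_linkage(D, E)
share_correct = sum(j == i for i, j in enumerate(links)) / len(D)  # share of correct links

# Hypothetical m- and u-probabilities for two key variables (sex, city).
m, u = [0.9, 0.9], [0.5, 0.1]
w_agree = fs_weight(("F", "Roma"), ("F", "Roma"), m, u)
w_partial = fs_weight(("F", "Roma"), ("F", "Napoli"), m, u)
```

With this light masking every record of D is relinked to its original, so the share of correct links (a common distance-based risk measure) is 1; heavier perturbation would lower it. In the probabilistic case, pairs are classified as links or non-links by comparing their weight with thresholds.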
I. RL and risk: Quality of the external register: coverage; misclassification errors; which variables? which registers? ... Quality of the disseminated microdata file: misclassification errors (known pattern, known protection parameters, etc.). Use in RL of publicly available information: sampling design (stratification, survey weights); known population characteristics (M/F); hierarchical file structure (household, enterprise/local unit). Ideal (worst) case: the true whole population is available, so a (unique) correct link exists ….
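Publicly available design information can sharpen the linkage. As a hedged illustration, assume the stratum of each released record can be recovered from the sampling design (for example, from stratum-specific survey weights) and is also known in the register; the intruder can then use it as a blocking variable, so only register units in the same stratum remain candidates. The helper `candidate_links` and the data are hypothetical.

```python
from collections import defaultdict

def candidate_links(sample, register, keys, block=None):
    """For each sample record, indices of register records agreeing on the key
    variables and, if `block` is given, on the blocking variable too."""
    fields = list(keys) + ([block] if block else [])
    index = defaultdict(list)
    for j, rec in enumerate(register):
        index[tuple(rec[f] for f in fields)].append(j)
    return [index[tuple(rec[f] for f in fields)] for rec in sample]

# Hypothetical register; "stratum" is assumed recoverable from the sampling design.
register = [
    {"sex": "F", "age": "30-39", "stratum": "North"},
    {"sex": "F", "age": "30-39", "stratum": "South"},
    {"sex": "F", "age": "30-39", "stratum": "South"},
    {"sex": "M", "age": "40-49", "stratum": "North"},
    {"sex": "M", "age": "40-49", "stratum": "South"},
]
sample = [register[0], register[4]]

no_block = candidate_links(sample, register, ["sex", "age"])
with_block = candidate_links(sample, register, ["sex", "age"], block="stratum")
# Probability of a correct link when choosing at random among the candidates.
p_no_block = [1 / len(c) for c in no_block]
p_with_block = [1 / len(c) for c in with_block]
```

Blocking on the stratum turns both sample records into certain links, which is why design information belongs in the risk assessment.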
II. RL and release: Integrate THEN disseminate: grant access to composite microdata covering a wider range of variables. This calls for more careful management of disclosure risk (the issues on the previous slide, plus the increased confidentiality/sensitivity of integrated data sets) and has an impact on analyses.
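One reason integrated files need more careful risk management: every variable brought in from another source is a potential key variable, and adding a key variable can only keep or increase the number of sample-unique key combinations (it refines the partition of records). A small sketch with hypothetical data, where "occupation" stands in for a variable obtained through integration:

```python
from collections import Counter

def sample_uniques(records, keys):
    """Number of records whose combination of key variables is unique in the file."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return sum(1 for c in counts.values() if c == 1)

# Hypothetical survey file; "occupation" is assumed to come from an integrated source.
records = [
    {"sex": "F", "age": "30-39", "occupation": "teacher"},
    {"sex": "F", "age": "30-39", "occupation": "nurse"},
    {"sex": "M", "age": "40-49", "occupation": "clerk"},
    {"sex": "M", "age": "40-49", "occupation": "clerk"},
    {"sex": "F", "age": "50-59", "occupation": "nurse"},
]
before = sample_uniques(records, ["sex", "age"])
after = sample_uniques(records, ["sex", "age", "occupation"])
```

Here the count of sample uniques rises from 1 to 3 once the integrated variable is usable as a key.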
Statistical matching and SDC: one file contains (Y,X), another contains (X,Z); the common variables X are used to obtain a complete (Y,X,Z) data set.
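A common micro approach to this setting is nearest-neighbour distance hot deck. The sketch below is an assumption-laden illustration, not a method stated on the slide: for each record of the (Y,X) file (the recipient), Z is imputed from the record of the (X,Z) file (the donor) closest on X, using Manhattan distance on numeric X. Function name and data are hypothetical.

```python
def nn_hot_deck(recipient, donor, x_keys, z_key):
    """For each recipient record (Y, X), impute Z from the donor record (X, Z)
    closest on the common variables X (Manhattan distance)."""
    fused = []
    for r in recipient:
        best = min(donor, key=lambda d: sum(abs(r[k] - d[k]) for k in x_keys))
        fused.append({**r, z_key: best[z_key]})  # recipient's Y and X, donor's Z
    return fused

# Hypothetical files: common X = (age, household size).
file_yx = [{"age": 34, "hh": 3, "y": "employed"},
           {"age": 61, "hh": 2, "y": "retired"}]
file_xz = [{"age": 30, "hh": 3, "z": 120},
           {"age": 65, "hh": 1, "z": 40},
           {"age": 45, "hh": 4, "z": 200}]
fused = nn_hot_deck(file_yx, file_xz, ["age", "hh"], "z")
```

Matching on X alone implicitly assumes that Y and Z are conditionally independent given X, an assumption the two files cannot verify.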
Statistical matching and SDC: Fréchet bounds
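For categorical variables, the Fréchet bounds give the range of joint probabilities compatible with what the two files identify, without assuming conditional independence: P(Y=j, Z=k) lies between Σ_i P(X=i)·max{0, P(Y=j|X=i) + P(Z=k|X=i) − 1} and Σ_i P(X=i)·min{P(Y=j|X=i), P(Z=k|X=i)}. A minimal sketch, assuming P(X), P(Y|X) and P(Z|X) have already been estimated from the two files; the numbers are hypothetical.

```python
def frechet_bounds(p_x, p_y_given_x, p_z_given_x, j, k):
    """Lower/upper Fréchet bounds for P(Y=j, Z=k) given P(X), P(Y|X), P(Z|X),
    plus the point value implied by conditional independence (CIA)."""
    lower = upper = cia = 0.0
    for i, px in p_x.items():
        py, pz = p_y_given_x[i][j], p_z_given_x[i][k]
        lower += px * max(0.0, py + pz - 1.0)
        upper += px * min(py, pz)
        cia += px * py * pz
    return lower, upper, cia

# Hypothetical estimates: X has categories a, b.
p_x = {"a": 0.5, "b": 0.5}
p_y_given_x = {"a": {"y1": 0.8, "y2": 0.2}, "b": {"y1": 0.3, "y2": 0.7}}
p_z_given_x = {"a": {"z1": 0.6, "z2": 0.4}, "b": {"z1": 0.5, "z2": 0.5}}
lower, upper, cia = frechet_bounds(p_x, p_y_given_x, p_z_given_x, "y1", "z1")
```

Here P(Y=y1, Z=z1) can be anywhere in [0.2, 0.45]; the conditional-independence value 0.315 is just one point in that interval.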
Statistical matching and release: How can a released microdata file be used in a statistical matching procedure? Issues: using the protection/perturbation information to improve statistical matching performance; impact on statistical analyses.
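As an illustration of using published protection parameters (an example not taken from the slides), assume a matching variable X was released with independent additive noise of known variance σ². The OLS slope of Y on the released X is then attenuated by the factor λ = (Var(X_released) − σ²) / Var(X_released), and dividing by λ removes the bias. The simulation settings below are hypothetical.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def sample_var(v):
    n = len(v)
    m = sum(v) / n
    return sum((a - m) ** 2 for a in v) / (n - 1)

def corrected_slope(naive_slope, var_released_x, noise_var):
    """Undo attenuation using the known noise variance (regression dilution correction)."""
    reliability = (var_released_x - noise_var) / var_released_x
    return naive_slope / reliability

rng = random.Random(2024)
n, noise_var = 20000, 0.25                             # hypothetical settings
x = [rng.gauss(0, 1) for _ in range(n)]                # true X
y = [2.0 * a + rng.gauss(0, 1) for a in x]             # true slope is 2
x_released = [a + rng.gauss(0, noise_var ** 0.5) for a in x]  # protected X

naive = ols_slope(x_released, y)                       # about 2 / 1.25 = 1.6
fixed = corrected_slope(naive, sample_var(x_released), noise_var)  # about 2
```

Without the published noise variance the analyst would only see the attenuated slope; with it, the analysis on the protected file recovers the original relationship up to sampling error.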
Conclusions: • Change (improve/adapt) the data integration (DI) process to account for microdata files with (some) known properties • Change (improve/adapt) the SDC process to account for the latest methodological and technological DI developments • PRACTICE: a step-by-step approach!!!