
ICES III Montreal, June 18-21, 2007

A New Approach for Disclosure Control in the IAB Establishment Panel: Multiple Imputation for Better Data Access. Jörg Drechsler, Institute for Employment Research (IAB).


Presentation Transcript


  1. A New Approach for Disclosure Control in the IAB Establishment Panel: Multiple Imputation for Better Data Access • ICES III, Montreal, June 18-21, 2007 • Jörg Drechsler, Institute for Employment Research (IAB)

  2. Overview • Background • Statistical disclosure control with fully synthetic data sets • Application to the IAB Establishment Panel • First results • Next steps/open questions

  3. The IAB Establishment Panel • Annual establishment survey • Conducted in West Germany since 1993 and in East Germany since 1996 • Population: all establishments with at least one employee covered by social security • Source: Official Employment Statistics • Response rate of repeatedly interviewed establishments: more than 80% • Sample of more than 16,000 establishments in the last wave • Contents: employment structure, changes in employment, business policies, investment, training, remuneration, working hours, collective wage agreements, works councils

  5. Generating Synthetic Data Sets (Rubin 1993) • Advantages: - Data are fully synthetic - No re-identification of single units is possible - All variables remain fully available • [Figure: frame variables X; survey variables Y, either observed (Y_observed) or not observed (Y_not observed); several synthetic copies Y_synthetic]
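A minimal sketch of Rubin's idea for one survey variable Y and one frame variable X that is known for every unit of the population: fit a model for Y given X on the observed sample, draw the model parameters from their posterior, draw a new sample from the frame, and impute Y for it; repeating this m times yields m synthetic data sets. The normal linear model, the flat prior, simple random sampling from the frame, and the function names are illustrative assumptions, not the models used for the Establishment Panel.

```python
import random
import statistics

# Hypothetical helpers; one predictor, a normal linear model and a flat prior
# are simplifying assumptions chosen for illustration.


def draw_regression_params(xs, ys, rng):
    """Draw (xbar, a, b, sigma2) from the posterior of the model
    y = a + b * (x - xbar) + e, e ~ N(0, sigma2), under a flat prior.
    Centring x makes the draws of a and b independent given sigma2."""
    n = len(xs)
    xbar, ybar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    b_hat = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    ss_res = sum((y - ybar - b_hat * (x - xbar)) ** 2 for x, y in zip(xs, ys))
    # sigma2 | data ~ SS_res / chi2_{n-2}; chi2_k is Gamma(shape k/2, scale 2)
    sigma2 = ss_res / rng.gammavariate((n - 2) / 2, 2)
    a = rng.gauss(ybar, (sigma2 / n) ** 0.5)
    b = rng.gauss(b_hat, (sigma2 / sxx) ** 0.5)
    return xbar, a, b, sigma2


def synthesize(frame_x, obs_x, obs_y, n_new, m, seed=0):
    """Return m synthetic data sets, each a list of (x, y_synthetic) pairs.
    Every data set uses fresh parameter draws and a fresh simple random
    sample of n_new units from the frame."""
    rng = random.Random(seed)
    datasets = []
    for _ in range(m):
        xbar, a, b, sigma2 = draw_regression_params(obs_x, obs_y, rng)
        new_x = rng.sample(frame_x, n_new)
        sd = sigma2 ** 0.5
        datasets.append([(x, a + b * (x - xbar) + rng.gauss(0.0, sd)) for x in new_x])
    return datasets
```

Drawing the parameters anew for each data set, rather than reusing the point estimates, is what lets the spread between synthetic data sets reflect estimation uncertainty.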

  7. Generating synthetic data sets for the IAB Establishment Panel • Create a synthetic data set for selected variables from the 1997 wave of the Establishment Panel • Imputation for the whole population is not feasible • Draw a new sample from the Official Employment Statistics using the same sampling design as for the Establishment Panel (stratified by economic branch, size, and region) • Each stratum cell contains the same number of observations as in the 1997 wave of the Establishment Panel • Additional information from the German Social Security Data (GSSD) is used for the imputation
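The sampling step can be sketched as follows: group the frame by stratum and draw, without replacement, as many units from each cell as the 1997 panel wave has there. The record layout (a dict with a `stratum` key holding, say, a (branch, size class, region) tuple) and the function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def cell_counts(units):
    """Number of units per stratum, e.g. in the observed panel wave."""
    return Counter(u["stratum"] for u in units)


def draw_stratified_sample(frame, panel_counts, seed=0):
    """Draw exactly panel_counts[stratum] units, without replacement, from each
    stratum of the frame. Hypothetical layout: each unit is a dict whose
    'stratum' key holds e.g. a (branch, size_class, region) tuple."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for unit in frame:
        by_stratum[unit["stratum"]].append(unit)
    sample = []
    for stratum, n_h in panel_counts.items():
        cell = by_stratum.get(stratum, [])
        if n_h > len(cell):
            raise ValueError(f"stratum {stratum!r}: need {n_h} units, frame has {len(cell)}")
        sample.extend(rng.sample(cell, n_h))
    return sample
```

Calling `draw_stratified_sample(frame, cell_counts(panel_wave))` with several different seeds would give several independent samples with the panel's cell sizes (`panel_wave` being a hypothetical list of panel records).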

  8. The German Social Security Data (GSSD) • Contains information on all employees covered by social security • Since 1973, all employers have been required to notify the social security agencies about all employees covered by social security • The GSSD covers about 80% of the German workforce • Information from the GSSD is aggregated at the establishment level and matched to the IAB Establishment Panel via the establishment identification number • Information on: number of employees by gender and schooling, mean employee age, mean employee wage…
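Aggregating employee-level records to the establishment level and matching them to the panel might look like this. The field names (`estab_id`, `female`, `age`, `wage`) are hypothetical, only a few of the GSSD aggregates are shown, and dropping panel records without a GSSD match is an assumption about how non-matches are handled.

```python
from collections import defaultdict
from statistics import fmean


def aggregate_gssd(employees):
    """Collapse employee-level records (hypothetical field names) to one
    record per establishment ID."""
    by_estab = defaultdict(list)
    for e in employees:
        by_estab[e["estab_id"]].append(e)
    return {
        estab_id: {
            "n_employees": len(rows),
            "n_female": sum(1 for r in rows if r["female"]),
            "mean_age": fmean(r["age"] for r in rows),
            "mean_wage": fmean(r["wage"] for r in rows),
        }
        for estab_id, rows in by_estab.items()
    }


def match_to_panel(panel, gssd):
    """Attach the aggregated GSSD variables to each panel record with the same
    establishment ID; records without a match are dropped (an assumption)."""
    return [{**rec, **gssd[rec["estab_id"]]} for rec in panel if rec["estab_id"] in gssd]
```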

  9. Synthetic Establishment Panels [Figure: the IAB Establishment Panel and the GSSD; several synthetic panels EP_synthetic with imputed values Y_synthetic]

  10. Imputation Procedure • For simplicity, newly founded establishments are excluded from the sampling frame and from the panel • 10 new samples are drawn • Each sample has as many observations as the panel: n_s = n_p = 7,332 • Every sample is imputed ten times using chained equations • Number of variables from the GSSD: 24 • Number of variables from the Establishment Panel: 48 • Imputations are generated using IVEware by Raghunathan, Solenberger and Van Hoewyk (2001)
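Chained equations (also known as sequential regression imputation) fill in each incompletely observed variable in turn from a regression on all other variables and cycle until the imputations settle. The sketch below is a simplified stand-in, not IVEware's algorithm: it handles only continuous variables with normal linear models and, unlike proper multiple imputation, does not draw the regression coefficients from their posterior. Assuming the setup on slide 7, one would stack the panel (all variables observed) on a new sample (GSSD variables observed, panel variables missing) and run it once per imputation with a different seed. Function names are hypothetical.

```python
import random


def ols(X, y):
    """Least-squares coefficients via Gauss-Jordan elimination on X'X b = X'y."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def chained_equations(rows, n_iter=10, seed=0):
    """Impute None entries in numeric rows by cycling through the variables and
    regressing each incompletely observed one on all the others.
    Simplification: coefficients are point estimates, not posterior draws."""
    rng = random.Random(seed)
    k = len(rows[0])
    missing = [[v is None for v in r] for r in rows]
    data = [list(r) for r in rows]
    for j in range(k):  # start values: observed column means
        obs = [r[j] for r in rows if r[j] is not None]
        mean_j = sum(obs) / len(obs)
        for r in data:
            if r[j] is None:
                r[j] = mean_j
    for _ in range(n_iter):
        for j in range(k):
            mis = [i for i, flags in enumerate(missing) if flags[j]]
            if not mis:
                continue  # fully observed column, e.g. a GSSD variable
            obs = [i for i, flags in enumerate(missing) if not flags[j]]

            def design(row):
                return [1.0] + [row[c] for c in range(k) if c != j]

            X = [design(data[i]) for i in obs]
            y = [data[i][j] for i in obs]
            beta = ols(X, y)
            resid = [t - sum(b * x for b, x in zip(beta, xr)) for xr, t in zip(X, y)]
            sd = (sum(e * e for e in resid) / (len(y) - len(beta))) ** 0.5
            for i in mis:
                pred = sum(b * x for b, x in zip(beta, design(data[i])))
                data[i][j] = pred + rng.gauss(0.0, sd)
    return data
```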

  12. First Results • Compare regression results from the original data with results from the synthetic data • Zwick (2005) analyses the productivity effects of different forms of continuing vocational training in Germany • Result: vocational training is one of the most important measures for gaining and maintaining productivity • Probit regression explaining why firms offer vocational training • 13 explanatory variables, including: share of qualified employees, establishment size, region, collective wage agreement, expected high qualification needs… • 2 variables based on the 1998 wave of the panel are dropped for the evaluation
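Zwick's model is a probit regression of whether an establishment offers vocational training on a set of explanatory variables; the comparison fits it on the original data and on each synthetic data set. A minimal maximum-likelihood probit fit by Fisher scoring is sketched below in plain Python. The function names are hypothetical, and this is not the estimation code used by Zwick (2005) or by the authors; in practice a statistics package would do this.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def solve(A, g):
    """Solve A x = g for a small dense system by Gauss-Jordan elimination."""
    p = len(g)
    M = [list(A[i]) + [g[i]] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]


def fit_probit(X, y, n_iter=25, tol=1e-8):
    """Maximum-likelihood probit coefficients by Fisher scoring.
    X: rows of covariates with a leading 1.0 for the intercept; y: 0/1 outcomes."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        score = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for row, yi in zip(X, y):
            eta = sum(b * x for b, x in zip(beta, row))
            cdf = min(max(STD_NORMAL.cdf(eta), 1e-12), 1 - 1e-12)  # avoid 0/1
            pdf = STD_NORMAL.pdf(eta)
            w = pdf / (cdf * (1 - cdf))
            for i in range(p):
                score[i] += row[i] * (yi - cdf) * w
                for j in range(p):
                    # expected information: x x' * pdf^2 / (cdf * (1 - cdf))
                    info[i][j] += row[i] * row[j] * pdf * w
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta
```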

  13. Descriptive comparison of the original and the synthetic data sets

  14. Results from the regression *** significant at the 0.1% level, ** significant at the 1% level, * significant at the 5% level
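Estimates from several synthetic data sets have to be combined before coefficients and significance levels can be reported. One standard way for fully synthetic data is the combining rules of Raghunathan, Reiter and Rubin (2003): average the m estimates (q_bar), compute their between-data-set variance b_m and the average within-data-set variance u_bar, and estimate the variance as T_f = (1 + 1/m) b_m - u_bar. The slides do not say which rules produced these results, or whether these exact rules fit the design on slide 10 (10 samples, each imputed ten times). The sketch below uses a normal approximation for the p-values as a simplification (the published rules use a t reference distribution), and since T_f can be negative, it simply rejects that case. Function names are hypothetical.

```python
from statistics import NormalDist, fmean, variance


def combine_fully_synthetic(estimates, variances):
    """Combine one coefficient across m synthetic data sets:
    q_bar = mean estimate, b_m = between-data-set variance (divisor m - 1),
    u_bar = mean within variance, T_f = (1 + 1/m) * b_m - u_bar."""
    m = len(estimates)
    q_bar = fmean(estimates)
    b_m = variance(estimates)
    u_bar = fmean(variances)
    t_f = (1 + 1 / m) * b_m - u_bar
    return q_bar, t_f


def significance_stars(q_bar, t_f):
    """Stars at the 0.1%, 1% and 5% levels, using a normal approximation for
    the two-sided p-value (a simplifying assumption)."""
    if t_f <= 0:
        raise ValueError("T_f is not positive; this variance estimate is unusable")
    z = abs(q_bar) / t_f ** 0.5
    p = 2 * (1 - NormalDist().cdf(z))
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""
```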

  16. Next steps/open questions • More detailed evaluation • Replace only selected variables • Generate weights for the synthetic sample • Impute more than one wave while maintaining the panel structure. References • Drechsler, J., Dundler, A., Bender, S., Rässler, S., Zwick, T. (2007). A New Approach for Disclosure Control in the IAB Establishment Panel: Multiple Imputation for a Better Data Access. IAB Discussion Paper No. 11/2007. • Reiter, J., and Drechsler, J. (2007). Releasing Multiply-Imputed, Synthetic Data Generated in Two Stages to Protect Confidentiality. Submitted.

  17. Thank you for your attention

  18. Information from the two data sets

  19. Disclosure is possible if • an establishment is included both in the original data set and in at least one of the newly drawn samples, and • the original and imputed values for this establishment are nearly the same

  20. How often are establishments from the IAB Establishment Panel drawn into the new samples?
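The first disclosure condition on slide 19 can be checked by counting, for every panel establishment, in how many of the newly drawn samples it appears, and tabulating those counts. The sketch assumes establishment IDs are comparable between the panel and the sampling frame; the input format and function name are hypothetical.

```python
from collections import Counter


def overlap_counts(panel_ids, new_samples):
    """Tabulate {times_drawn: number_of_panel_establishments}, where times_drawn
    is how many of the new samples contain a given panel establishment."""
    panel = set(panel_ids)
    times = Counter()
    for sample in new_samples:
        for estab_id in panel.intersection(sample):
            times[estab_id] += 1
    # Counter returns 0 for establishments never drawn
    return Counter(times[e] for e in panel)
```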

  21. Comparing original and imputed values • Binary variables: probability of identical values: 60-90% • Multiple-response questions: - with four categories: 57% - with 13 categories: 6% • Numerical variables: - average relative difference: 21% - outliers
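The two kinds of comparison on this slide can be computed as sketched below: the share of identical values for categorical variables (answers to a multiple-response question compared as a whole) and an average relative difference for numerical ones. Defining the relative difference as |imputed - original| / |original| and skipping zero originals are assumptions; the slide does not give the exact definition. Because an average of ratios is sensitive to a few large deviations, reporting a median alongside it may be useful. Function names are hypothetical.

```python
from statistics import fmean


def share_identical(original, imputed):
    """Share of units whose imputed value equals the original one; works for
    binary values and for multiple-response answers stored as tuples."""
    pairs = list(zip(original, imputed))
    return sum(o == i for o, i in pairs) / len(pairs)


def mean_relative_difference(original, imputed):
    """Average of |imputed - original| / |original|, skipping zero originals.
    This definition of 'relative difference' is an assumption."""
    diffs = [abs(i - o) / abs(o) for o, i in zip(original, imputed) if o != 0]
    return fmean(diffs)
```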
