
IAOS 2014 Conference – Meeting the Demands of a Changing World

Da Nang, Vietnam, 8-10 October 2014. Diagnosing the Imputation of Missing Values in Official Economic Statistics via Multiple Imputation: Unveiling the Invisible Missing Values. National Statistics Center (Japan)


Presentation Transcript


  1. IAOS 2014 Conference – Meeting the Demands of a Changing World, Da Nang, Vietnam, 8-10 October 2014 • Diagnosing the Imputation of Missing Values in Official Economic Statistics via Multiple Imputation: Unveiling the Invisible Missing Values • National Statistics Center (Japan) • Masayoshi Takahashi • Note: The views and opinions expressed in this presentation are the author's own, not necessarily those of the institution.

  2. Outline • Problems of Missing Values and Imputation • Theory of MI and the EMB Algorithm • Mechanism Behind the Diagnostic Algorithm • Data and Missing Mechanism • Assessment of the Diagnostic Algorithm • Conclusions and Future Work

  3. Problems of Missing Values 1. Problems of Missing Values and Imputation • Prevalence of missing values • Effects of missing values • Reduction in efficiency • Introduction of bias • Assumptions and solution • Missing At Random (MAR) • Imputation

  4. Problematic Nature of Single Imputation (SI) 1. Problems of Missing Values and Imputation • Deterministic SI • $\hat{\beta}$ = OLS estimate • There is only one set of regression coefficients. • Stochastic SI • Random noise
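
The contrast between deterministic and stochastic SI can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the slides: a single explanatory variable, simulated data, and values deleted completely at random for brevity. The function names ols_fit and impute_single are hypothetical.

```python
import random

def ols_fit(x, y):
    """Return (intercept, slope, residual_sd) of a simple OLS regression."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, (rss / (n - 2)) ** 0.5

def impute_single(x, y, stochastic, rng):
    """Fill None entries of y from a regression on x fitted to complete cases."""
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    b0, b1, sd = ols_fit([p[0] for p in obs], [p[1] for p in obs])
    filled = []
    for xi, yi in zip(x, y):
        if yi is None:
            yi = b0 + b1 * xi              # deterministic part: on the regression line
            if stochastic:
                yi += rng.gauss(0.0, sd)   # stochastic SI adds random noise
        filled.append(yi)
    return filled

# Simulated data (assumption): 200 units, every fifth value of y deleted.
rng = random.Random(0)
x = [rng.gauss(10.0, 2.0) for _ in range(200)]
y = [1.0 + 0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]
y_miss = [None if i % 5 == 0 else yi for i, yi in enumerate(y)]

det = impute_single(x, y_miss, stochastic=False, rng=rng)
sto = impute_single(x, y_miss, stochastic=True, rng=rng)
```

In both cases only one set of regression coefficients is used, so neither variant reflects the uncertainty in the coefficients themselves.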

  5. Multiple Imputation (MI) Comes to the Rescue 2. Theory of Multiple Imputation and the EMB Algorithm • $\tilde{\beta}$ = random sampling from a posterior distribution • Multiple sets of regression coefficients • Need multiple values of $\tilde{\beta}$

  6. Likelihood of Observed Data 2. Theory of Multiple Imputation and the EMB Algorithm • Random sampling from the observed-data likelihood • Not easy!! • Solution • Various computational algorithms

  7. Computational Algorithms 2. Theory of Multiple Imputation and the EMB Algorithm • EMB algorithm • Expectation-Maximization • Bootstrapping • Most computationally efficient • Other MI algorithms • MCMC • FCS
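
A rough sketch of the EMB idea (bootstrap the incomplete data, run EM on each resample, then impute from the resulting parameters) is given below. It is not the Amelia II implementation: it assumes a bivariate normal model in which x is fully observed and only y has gaps, and the names em_bivariate and emb_impute are hypothetical.

```python
import random

def em_bivariate(x, y, n_iter=500, tol=1e-10):
    """EM for a bivariate normal where x is complete and y has None entries.

    Returns (b0, b1, s2): intercept, slope and residual variance of y given x.
    """
    n = len(x)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x) / n
    obs = [yi for yi in y if yi is not None]
    my = sum(obs) / len(obs)
    b0, b1 = my, 0.0
    s2 = sum((yi - my) ** 2 for yi in obs) / len(obs)
    for _ in range(n_iter):
        # E-step: expected y and y^2 for missing cells under current parameters
        ey = [yi if yi is not None else b0 + b1 * xi for xi, yi in zip(x, y)]
        ey2 = [yi * yi if yi is not None else (b0 + b1 * xi) ** 2 + s2
               for xi, yi in zip(x, y)]
        # M-step: update moments from the expected sufficient statistics
        my = sum(ey) / n
        sxy = sum(xi * e for xi, e in zip(x, ey)) / n - mx * my
        syy = sum(ey2) / n - my * my
        new_b1 = sxy / sxx
        new_b0 = my - new_b1 * mx
        new_s2 = syy - sxy * sxy / sxx
        done = (abs(new_b0 - b0) < tol and abs(new_b1 - b1) < tol
                and abs(new_s2 - s2) < tol)
        b0, b1, s2 = new_b0, new_b1, new_s2
        if done:
            break
    return b0, b1, s2

def emb_impute(x, y, m, rng):
    """Return m completed copies of y via bootstrap + EM."""
    n = len(x)
    completed = []
    for _ in range(m):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap resample of rows
        b0, b1, s2 = em_bivariate([x[i] for i in idx], [y[i] for i in idx])
        sd = max(s2, 0.0) ** 0.5
        completed.append([yi if yi is not None else b0 + b1 * xi + rng.gauss(0.0, sd)
                          for xi, yi in zip(x, y)])
    return completed

# Simulated data (assumption): 200 units, every fifth value of y missing.
rng = random.Random(1)
x = [rng.gauss(10.0, 2.0) for _ in range(200)]
y_full = [1.0 + 0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]
y = [None if i % 5 == 0 else yi for i, yi in enumerate(y_full)]
imputed_sets = emb_impute(x, y, m=5, rng=rng)
```

Because each bootstrap resample yields a different set of coefficients, the M completed datasets differ in the imputed cells while the observed cells stay fixed.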

  8. Graphical Presentation of the EMB Algorithm 2. Theory of Multiple Imputation and the EMB Algorithm

  9. Paradox in Imputation 3. Mechanism Behind the Diagnostic Algorithm • Imputed values • Estimates, not true values • Diagnosis • True values • Always missing • Cannot compare the imputed values with the truth • How do we go about imputation diagnostics?

  10. Solution to the Paradox 3. Mechanism Behind the Diagnostic Algorithm • Indirect diagnostics of imputation • Abayomi, Gelman, and Levy (2008) • Honaker and King (2010) • MI • Within-imputation variance • Between-imputation variance
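
For reference, the within- and between-imputation variances are conventionally combined with Rubin's (1987) rules. The notation below is the usual textbook one, not taken from the slides: $\hat{\theta}_m$ is the estimate from the m-th imputed dataset, M is the number of imputed datasets, and T is the total variance.

```latex
\bar{\theta} = \frac{1}{M}\sum_{m=1}^{M}\hat{\theta}_m, \qquad
\bar{W} = \frac{1}{M}\sum_{m=1}^{M}\widehat{\mathrm{Var}}(\hat{\theta}_m), \qquad
B = \frac{1}{M-1}\sum_{m=1}^{M}\left(\hat{\theta}_m - \bar{\theta}\right)^2, \qquad
T = \bar{W} + \left(1 + \frac{1}{M}\right) B
```

Here $\bar{W}$ is the within-imputation variance and B is the between-imputation variance; B is the component that captures how much the imputations disagree with one another.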

  11. Disadvantage of Multiple Imputation 3. Mechanism Behind the Diagnostic Algorithm • Dozens of imputed datasets • Computational burden • Multiple values for one cell • Unrealistic to directly use in official statistics

  12. Proposal in this Research 3. Mechanism Behind the Diagnostic Algorithm • Two-step procedure • Imputation step: Stochastic SI • Diagnostic step: MI • Advantage • Can have only one imputed value • Advantage of SI • Can know the confidence about each imputed value • Advantage of MI New!!

  13. Multiple Imputation as a Diagnostic Tool 3. Mechanism Behind the Diagnostic Algorithm • Variation among M imputed datasets • Estimation uncertainty in imputation • Our diagnostic algorithm • Utilizes this variability • Can examine the stability & confidence of imputation models • What does this mean? • See the next slide for illustration
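
One way to read "variation among M imputed datasets" is as the spread of the M imputed values within each missing cell. The sketch below is an assumption about how that could be summarised, not the author's algorithm; the cell identifiers and numbers are made up for illustration.

```python
import statistics

def cell_spread(imputations):
    """Map each cell id to (mean, standard deviation) of its M imputed values."""
    return {cell: (statistics.mean(vals), statistics.stdev(vals))
            for cell, vals in imputations.items()}

# Made-up example: M = 5 imputed values for two hypothetical missing cells.
example = {
    "unit_17": [102.0, 99.5, 101.2, 100.3, 98.9],  # imputations agree closely
    "unit_42": [60.0, 140.0, 95.0, 180.0, 20.0],   # imputations disagree widely
}
spread = cell_spread(example)
for cell, (mean, sd) in spread.items():
    print(cell, round(mean, 2), round(sd, 2))
```

A cell whose imputations are tightly clustered suggests a stable imputation model for that unit, while a widely dispersed cell signals low confidence in any single imputed value.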

  14. Illustration: Two Cases of Variation in Imputations 3. Mechanism Behind the Diagnostic Algorithm

  15. Mathematical Representation 3. Mechanism Behind the Diagnostic Algorithm Imputation Step: Stochastic SI If , then no uncertainties Diagnostic Step: MI What we actually check is whether

  16. Data 4. Data and Missing Mechanism • Multivariate log-normal distribution • Mean vector & variance-covariance matrix • Simulated dataset • Manufacturing Sector • 2012 Japanese Economic Census • Number of observations • 1,000 • Variables • turnover, capital, worker
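
A dataset of this shape could be simulated roughly as follows. The log-scale mean vector and variance-covariance matrix used in the presentation are not given in the transcript, so MEAN and COV below are placeholder assumptions, and simulate_lognormal is a hypothetical helper.

```python
import math
import random

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def simulate_lognormal(n, mean, cov, rng):
    """Draw n rows from a multivariate log-normal with log-scale mean and cov."""
    low = cholesky(cov)
    rows = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in mean]
        logs = [mean[i] + sum(low[i][k] * z[k] for k in range(i + 1))
                for i in range(len(mean))]
        rows.append([math.exp(v) for v in logs])
    return rows

# Placeholder parameters (assumptions), ordered as turnover, capital, worker.
MEAN = [6.0, 5.0, 3.0]
COV = [[1.0, 0.7, 0.6],
       [0.7, 1.0, 0.5],
       [0.6, 0.5, 1.0]]
data = simulate_lognormal(1000, MEAN, COV, random.Random(2014))
```

Each row holds one simulated establishment; all values are strictly positive and right-skewed, as is typical of economic magnitudes.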

  17. Missing Mechanism 4. Data and Missing Mechanism • Target variable • turnover • Missing rate • 20% • Missing mechanism • MAR • A logistic regression to estimate the probability of missingness according to the values of explanatory variables (capital and worker)
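
A MAR mechanism of this kind can be sketched by letting the probability that turnover is missing depend only on the observed capital and worker values. The coefficients, the use of logs, and the calibration of the intercept to hit a 20% average rate are assumptions made for illustration; the transcript does not report the actual logistic model.

```python
import math
import random

def missing_prob(capital, worker, a0, a1, a2):
    """Logistic probability that turnover is missing, given capital and worker."""
    lin = a0 + a1 * math.log(capital) + a2 * math.log(worker)
    return 1.0 / (1.0 + math.exp(-lin))

def calibrate_intercept(capital, worker, a1, a2, target):
    """Bisection on the intercept so the average missing probability hits target."""
    lo, hi = -50.0, 50.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        mean_p = sum(missing_prob(c, w, mid, a1, a2)
                     for c, w in zip(capital, worker)) / len(capital)
        if mean_p < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Simulated explanatory variables (assumption): independent log-normal draws.
rng = random.Random(17)
capital = [rng.lognormvariate(5.0, 1.0) for _ in range(1000)]
worker = [rng.lognormvariate(3.0, 1.0) for _ in range(1000)]

A1, A2 = -0.5, -0.5   # assumed: smaller units are more likely to be missing
a0 = calibrate_intercept(capital, worker, A1, A2, target=0.20)
probs = [missing_prob(c, w, a0, A1, A2) for c, w in zip(capital, worker)]
is_missing = [rng.random() < p for p in probs]   # True where turnover is deleted
```

Because the missingness probability is a function of capital and worker only, and never of turnover itself, the resulting pattern is MAR by construction.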

  18. R-Function diagimpute 5. Assessment of the Diagnostic Algorithm • New function developed in R • Graphical detection of problematic imputations as outliers • Graphical presentation of the stability of imputation via control chart • Not yet publicly available • A work in progress • Once finalized, planning to make it publicly available
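
Since diagimpute is not publicly available, the following is only a generic illustration of the control-chart idea, not a reconstruction of that function. It assumes a simplified Shewhart-style rule with limits at the mean plus or minus three sample standard deviations (packages such as qcc estimate sigma differently), and the input values are made up.

```python
import statistics

def control_limits(values, k=3.0):
    """Return (lower limit, center line, upper limit) for a k-sigma chart."""
    center = statistics.mean(values)
    sigma = statistics.stdev(values)
    return center - k * sigma, center, center + k * sigma

def out_of_control(values, k=3.0):
    """Indices of points falling outside the k-sigma control limits."""
    lcl, _, ucl = control_limits(values, k)
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]

# Made-up per-cell spread statistics for 20 imputed cells; the last is unstable.
spreads = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 1.02, 0.98, 1.1, 0.9,
           1.0, 1.03, 0.97, 1.0, 1.05, 0.95, 1.0, 1.1, 0.9, 10.0]
flagged = out_of_control(spreads)
print(flagged)
```

Points outside the limits would be the candidates for "problematic imputations" that merit a closer look.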

  19. Preliminary Result 1 5. Assessment of the Diagnostic Algorithm

  20. Preliminary Result 2 5. Assessment of the Diagnostic Algorithm

  21. Conclusions 6. Conclusions and Future Work • MI as a diagnostic tool • A novel way • Diagnostic algorithm • Still a work in progress • A preliminary assessment given • Useful to detect problematic imputations • Helps us strengthen the validity of official economic statistics.

  22. Future Work 6. Conclusions and Future Work • Intend to further refine the algorithm • Test it against a variety of real datasets • Use several imputation models

  23. References 1 • Abayomi, Kobi, Andrew Gelman, and Marc Levy. (2008). “Diagnostics for Multivariate Imputations,” Applied Statistics vol.57, no.3, pp.273-291. • Allison, Paul D. (2002). Missing Data. CA: Sage Publications. • Congdon, Peter. (2006). Bayesian Statistical Modelling, Second Edition. West Sussex: John Wiley & Sons Ltd. • de Waal, Ton, Jeroen Pannekoek, and Sander Scholtus. (2011). Handbook of Statistical Data Editing and Imputation. Hoboken, NJ: John Wiley & Sons. • Honaker, James and Gary King. (2010). “What to do About Missing Values in Time Series Cross-Section Data,” American Journal of Political Science vol.54, no.2, pp.561-581. • Honaker, James, Gary King, and Matthew Blackwell. (2011). “Amelia II: A Program for Missing Data,” Journal of Statistical Software vol.45, no.7. • King, Gary, James Honaker, Anne Joseph, and Kenneth Scheve. (2001). “Analyzing Incomplete Political Science Data: An Alternative Algorithm for Multiple Imputation,” American Political Science Review vol.95, no.1, pp.49-69. • Little, Roderick J. A. and Donald B. Rubin. (2002). Statistical Analysis with Missing Data, Second Edition. New Jersey: John Wiley & Sons.

  24. References 2 • Oakland, John S. and Roy F. Followell. (1990). Statistical Process Control: A Practical Guide. Oxford: Heinemann Newnes. • Rubin, Donald B. (1978). “Multiple Imputations in Sample Surveys — A Phenomenological Bayesian Approach to Nonresponse,” Proceedings of the Survey Research Methods Section, American Statistical Association, pp.20-34. • Rubin, Donald B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: John Wiley & Sons. • Schafer, Joseph L. (1997). Analysis of Incomplete Multivariate Data. London: Chapman & Hall/CRC. • Scrucca, Luca. (2014). “Package qcc: Quality Control Charts,” http://cran.r-project.org/web/packages/qcc/qcc.pdf. • Statistics Bureau of Japan. (2012). “Economic Census for Business Activity,” http://www.stat.go.jp/english/data/e-census/2012/index.htm. • Takahashi, Masayoshi and Takayuki Ito. (2012). “Multiple Imputation of Turnover in EDINET Data: Toward the Improvement of Imputation for the Economic Census,” Work Session on Statistical Data Editing, UNECE, Oslo, Norway, September 24-26, 2012.

  25. References 3 • Takahashi, Masayoshi and Takayuki Ito. (2013). “Multiple Imputation of Missing Values in Economic Surveys: Comparison of Competing Algorithms,” Proceedings of the 59th World Statistics Congress of the International Statistical Institute, Hong Kong, China, August 25-30, 2013, pp.3240-3245. • Takahashi, Masayoshi. (2014a). “An Assessment of Automatic Editing via the Contamination Model and Multiple Imputation,” Work Session on Statistical Data Editing, United Nations Economic Commission for Europe, Paris, France, April 28-30, 2014. • Takahashi, Masayoshi. (2014b). “Keiryouchi Data no Kanrizu (Control Chart for Continuous Data),” Excel de Hajimeru Keizai Toukei Data no Bunseki (Statistical Data Analysis for Economists Using Excel), 3rd edition. Tokyo: Zaidan Houjin Nihon Toukei Kyoukai. • van Buuren, Stef. (2012). Flexible Imputation of Missing Data. London: Chapman & Hall/CRC.

  26. Thank you
