An automated comparison of statistics

Presentation Transcript


  1. An automated comparison of statistics
     • Introduction
     • Result of the automated comparison
     • Overview of the methodology
     • Realisation and practical experiences
     • Conclusions

  2. Introduction
     • Signals / soft checks
       • Suspicious values may be erroneously accepted
       • Due to scarce resources more and more suspicious values are accepted -> biased statistics
     • Manual check of actual statistics with respective previous ones outside of the data editing process
       • Requires …
         • data transfer and tabulation
         • extensive experience and subject matter knowledge
       • No information about inducing records available
       • Strenuous work that consumes a lot of resources
       • No guarantee to discover all inconsistencies

  3. Result of the automated comparison
     [Diagram not reproduced in the transcript. Labels, in source order:]
     • Analysis of respective records
     • Comparison of statistics
     • Categorisation of the statistics
     • Euclidean distances of the same checked variables
     • Checked variables per statistic
     • Record identifier
     • Normed weights
     • Relevance indicator
     • Error indicator per statistic
     • Error indicator
     • Flag (shown twice)

  4. Overview of the methodology
     • Basic idea: aggregate method
     • Remove structural effects and the influence of the business cycle
     • Create a robust dataset for the actual (current) reporting period
     • Principal component analysis (PCA) of the robust actual dataset and of the dataset of the previous reporting period
     • Sum of the principal components (PC) on the basis of the actual dataset and the robust loadings, and sum of the PC on the basis of the previous data
     • Statistics: (weight and) sum up the PC, compute the differences and flag them on the basis of the Nalimov test
     • Records: Euclidean distances of the sum of PC (error indicator) (and weighting -> relevance indicator); the error indicator is flagged on the basis of the Nalimov test
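
The record-level step on this slide can be illustrated with a minimal Python sketch. It is not the SAS implementation and makes several assumptions: the principal component scores are taken as already computed for both periods (the PCA itself and the robust dataset are not shown), the function names `euclidean_distances` and `nalimov_flags` are hypothetical, and the Nalimov statistic is used in its commonly stated form q = |x - mean| / s * sqrt(n / (n - 1)). The default critical value 1.96 is only a placeholder; in practice the tabulated value for f = n - 2 degrees of freedom at the chosen significance level would be looked up.

```python
import math
import statistics


def euclidean_distances(current_scores, previous_scores):
    """Error indicator per record: Euclidean distance between the PC score
    vector of the current period and that of the previous period.
    Both arguments map a record identifier to a sequence of PC scores
    (assumed to be precomputed elsewhere)."""
    return {
        rec_id: math.dist(current_scores[rec_id], previous_scores[rec_id])
        for rec_id in current_scores
        if rec_id in previous_scores  # only records present in both periods
    }


def nalimov_flags(values, critical_value=1.96):
    """Flag values whose Nalimov-type statistic exceeds critical_value.
    critical_value is an assumed placeholder, not a tabulated value."""
    n = len(values)
    if n < 3:
        raise ValueError("at least three values are needed")
    mean = statistics.fmean(values.values())
    sd = statistics.stdev(values.values())  # sample standard deviation
    if sd == 0:
        return {rec_id: False for rec_id in values}
    factor = math.sqrt(n / (n - 1))
    return {
        rec_id: abs(x - mean) / sd * factor > critical_value
        for rec_id, x in values.items()
    }


# Illustrative data only: record "F" changed strongly between the periods.
previous = {k: (1.0, 2.0) for k in "ABCDEF"}
current = dict(previous, F=(4.0, 6.0))

error_indicator = euclidean_distances(current, previous)
flags = nalimov_flags(error_indicator)
```

In this toy example only record "F" has a non-zero error indicator, so it is the only candidate for a flag. The relevance indicator mentioned on the slide would additionally weight the error indicator; the weighting scheme is not specified in the slides and is therefore left out.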

  5. Realisation and practical experience
     • Realisation
       • Pascal Avieny, pascal.avieny@destatis.de
       • SAS macros with English comments, SAS 9.2, STAT
       • Requirements: two datasets, identical variables to be compared, ID variable
     • Practical experience
       • Wholesale trade, SBS 2010 versus SBS 2009
       • Variables: turnover, gross profit, costs for personnel, value added minus gross profit, working places
       • Final check at the end of the data editing process
     • Results
       • 945 of nearly 12,500 records flagged
       • Relation between employees and turnover too weak
       • Around 70 of the 122 records with the highest error scores were corrected
       • Review of the relationship between turnover and expansion weight required (now included)
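
The input requirements listed under "Realisation" (two datasets, identical variables, an ID variable) can be checked before any comparison is run. The following Python sketch is an illustration only and is unrelated to the SAS macros; the function name `match_records`, the field name `id` and the sample values are hypothetical.

```python
def match_records(current, previous, variables, id_field="id"):
    """Pair records from two reporting periods by identifier.

    current, previous: lists of dicts, one dict per record.
    variables: names of the variables to be compared in both datasets.
    Returns {record id: (current values, previous values)} for records
    that occur in both periods.
    """
    # Every record must carry the ID field and all compared variables.
    for name, rows in (("current", current), ("previous", previous)):
        for row in rows:
            missing = [v for v in (id_field, *variables) if v not in row]
            if missing:
                raise KeyError(f"{name} record lacks fields: {missing}")

    prev_by_id = {row[id_field]: row for row in previous}
    pairs = {}
    for row in current:
        rec_id = row[id_field]
        if rec_id in prev_by_id:
            pairs[rec_id] = (
                [row[v] for v in variables],
                [prev_by_id[rec_id][v] for v in variables],
            )
    return pairs


# Made-up example values, not taken from the survey.
current = [
    {"id": 1, "turnover": 10.0, "employees": 2},
    {"id": 2, "turnover": 50.0, "employees": 7},
]
previous = [{"id": 1, "turnover": 9.0, "employees": 2}]

pairs = match_records(current, previous, ["turnover", "employees"])
```

Records without a counterpart in the other period (here the record with ID 2) are simply left out of the pairing; how such records should be treated is not covered by the slides.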

  6. Conclusions
     • High error indicators corresponded in many cases to signals / soft checks that had been erroneously confirmed
     • Decisions on correcting signals / soft checks should be made on the basis of the comparison
     • Saving resources – even in the case of complex surveys, no automated data editing on one hand and the requirement to disseminate micro data on the other
     • Methodology
       • Verification of the high number of flags
       • Higher hit rate by computing the principal components on the basis of robust covariances?
       • Use of the dimensional reduction functionality based on PCA

  7. YOU ARE WELCOME! Elmar Wein Telephone: +49/(0) 611 / 75 3128 elmar.wein@destatis.de www.destatis.de
