This presentation outlines the development and implementation of selective data editing at Statistics Sweden (SCB), based on nine case studies. Set against SCB's decentralized production and the Lotta standardization project, it notes that editing is costly, accounting for 33% of budgets. The purpose of the case-study project was to assess the potential gain from selective data editing and the feasibility of a common tool. The method scores each reported value by how suspicious it is and by its potential impact on the estimates, so that only units scoring above a threshold are followed up.
Selective data editing: Development & implementation • Q2010, Helsinki • Jörgen Svensson, Process Owner, Statistics Sweden (SCB)
Standardization at SCB • Decentralized production • Development of CBMs • Editing costly, 33% of budgets • Data collection departments, 2006 • Standardization – the Lotta project, in 2006
Nine case studies • Purpose of the project: • Try using selective data editing • What is the potential gain using the method? • Would it be possible to develop and use a common tool?
SUSPICION • SUSP(j, k) = Suspicion of variable j for unit k • SUSP(j, k) = 0 if variable value falls within acceptance interval • SUSP(j, k) → 1 as value deviates from acceptance limit • 0 ≤ SUSP(j,k) ≤ 1
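The suspicion measure above can be sketched in a few lines. The slide states only the properties (0 inside the acceptance interval, approaching 1 as the value moves away from the limit), so the ratio `distance / (scale + distance)` and the `scale` parameter below are assumptions chosen to satisfy those properties, not necessarily the form used in SELEKT.

```python
def suspicion(value: float, lower: float, upper: float, scale: float) -> float:
    """Return a suspicion score in [0, 1) for one variable value.

    The ratio distance / (scale + distance) is an assumed functional form,
    chosen only because it is 0 at the acceptance limit and approaches 1
    as the value moves further away from it.
    """
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    if scale <= 0:
        raise ValueError("scale must be positive")
    if lower <= value <= upper:
        return 0.0  # inside the acceptance interval: not suspicious
    # Distance from the nearest acceptance limit.
    distance = lower - value if value < lower else value - upper
    return distance / (scale + distance)


# Hypothetical acceptance interval [0, 10] with scale 2.
inside = suspicion(5.0, 0.0, 10.0, 2.0)    # within the interval
outside = suspicion(12.0, 0.0, 10.0, 2.0)  # 2 units above the upper limit
```

A larger `scale` makes the score rise more slowly, which is one way to tolerate variables that are naturally volatile.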
POTENTIAL IMPACT • POTIMP = Potential impact • POTIMP is the weighted absolute difference between observed and predicted value: • POTIMP(j,k,d) = w_k * |y(j,k) - ŷ(j,k)| * I_k(d) for variable j, unit k in domain d • y(j,k) is the observed value, ŷ(j,k) is the predicted value, w_k is the sampling weight, I_k(d) is the domain indicator • SELEKT supports several ways to establish the predicted value: from time series data and from cross-sectional analysis within homogeneous groups of units
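As a minimal sketch of the potential impact described above: the function and parameter names are hypothetical, and using the group median as the cross-sectional prediction is an assumption made for illustration (the slide does not say which statistic SELEKT uses).

```python
from statistics import median
from typing import Sequence


def predicted_from_group(group_values: Sequence[float]) -> float:
    """Cross-sectional prediction: median of a homogeneous group (assumed choice)."""
    return median(group_values)


def potential_impact(observed: float, predicted: float,
                     weight: float, in_domain: bool) -> float:
    """Weighted absolute difference between observed and predicted value.

    The domain indicator is modelled as a boolean: a unit outside the
    domain contributes nothing to that domain's estimate.
    """
    if not in_domain:
        return 0.0
    return weight * abs(observed - predicted)


# Hypothetical data: three comparable units reported 90, 100 and 110.
predicted = predicted_from_group([90.0, 100.0, 110.0])
impact = potential_impact(observed=120.0, predicted=predicted,
                          weight=5.0, in_domain=True)
```

For a time-series prediction, the same `potential_impact` call could be given the unit's own value from a previous period as `predicted`.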
Flagging suspected errors • [Figure: plot with axes log(Potential impact) and log(Suspicion); the region of flagged values is labelled "Flagged"]
LOCAL SCORE • Local (item) score LScore(j,k,d): • LScore(j,k,d) = SUSP(j,k) * |POTIMP(j,k,d)| * Cello(j,d) • Cello(j,d) is inversely proportional to the standard error based on previous data
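The local score formula above can be illustrated as follows. The proportionality constant in `cello_weight` is an assumption (the slide says only "inversely proportional"), and all names are hypothetical.

```python
def cello_weight(standard_error: float, constant: float = 1.0) -> float:
    """Weight inversely proportional to the standard error from previous data.

    `constant` is an assumed proportionality factor, not taken from the source.
    """
    if standard_error <= 0:
        raise ValueError("standard_error must be positive")
    return constant / standard_error


def local_score(susp: float, potimp: float, cello: float) -> float:
    """LScore = SUSP * |POTIMP| * Cello for one variable, unit and domain."""
    return susp * abs(potimp) * cello


# Hypothetical inputs: suspicion 0.5, potential impact 100, standard error 50.
score = local_score(0.5, 100.0, cello_weight(50.0))
```

Dividing by the standard error puts the impact on a scale relative to the precision of the estimate, so a given deviation counts for more in a domain that is estimated precisely.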
GLOBAL SCORE • Global (unit) score GScore(k) is obtained by aggregation of local scores • LScore(j,k,d) → LScore(j,k) → GScore(k) • Each → denotes aggregation by summation, Euclidean summation or maximum • Only those units with GScore larger than a pre-decided threshold are followed up
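The two aggregation steps and the threshold rule can be sketched as below. The nested-dictionary layout (variable, then domain) and the default choice of aggregators are assumptions for illustration; "Euclidean summation" is taken to mean the square root of the sum of squares.

```python
import math
from typing import Dict, Iterable, List

# The three aggregation rules named on the slide.
AGGREGATORS = {
    "sum": sum,
    "euclidean": lambda xs: math.sqrt(sum(x * x for x in xs)),
    "max": max,
}

# Local scores of one unit: variable name -> domain name -> score.
UnitScores = Dict[str, Dict[str, float]]


def aggregate(scores: Iterable[float], method: str) -> float:
    values = list(scores)
    if not values:
        return 0.0  # nothing to aggregate
    return AGGREGATORS[method](values)


def global_score(local_scores: UnitScores,
                 over_domains: str = "sum",
                 over_variables: str = "max") -> float:
    """Aggregate over domains first, then over variables."""
    per_variable = [aggregate(by_domain.values(), over_domains)
                    for by_domain in local_scores.values()]
    return aggregate(per_variable, over_variables)


def units_to_follow_up(scores_by_unit: Dict[str, UnitScores],
                       threshold: float, **methods: str) -> List[str]:
    """Keep only units whose global score is strictly above the threshold."""
    return [unit for unit, scores in scores_by_unit.items()
            if global_score(scores, **methods) > threshold]


# Hypothetical units and scores.
units = {
    "A": {"turnover": {"d1": 3.0, "d2": 4.0}, "employees": {"d1": 1.0}},
    "B": {"turnover": {"d1": 0.5}},
}
flagged = units_to_follow_up(units, threshold=2.0)
```

Maximum lets a single large local score trigger follow-up, whereas summation also flags units with many moderate scores; which suits a survey depends on how its errors tend to occur.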
Implementation of SELEKT • So far three surveys: • Business activity indicators • Wage & salary structures in the private sector • Commodity flow survey
Documentation • A General Methodology for Selective Data Editing • jorgen.svensson@scb.se • anders.norberg@scb.se