A handbook on validation methodology
Marco Di Zio, Istat
Workshop ValiDat Foundation – Wiesbaden, 10-11 November 2015
Underlying idea of the handbook (HB)
• Why a handbook on methodology for data validation (DV)?
  • To standardize language and elements, and to provide common measures for evaluation…
  • To establish a common reference framework and to develop metrics for evaluating DV
• The HB is composed of two main parts:
  • A generic framework for data validation
  • A discussion of metrics to evaluate a validation procedure (tuning, evaluating the procedure, …)
ValiDat foundation workshop - Wiesbaden 10-11 November 2015
Generic framework for data validation
The objective of this first section is to clarify:
• What data validation is
• Why it is performed
• How it is performed, and …
Generic framework for data validation
• Clearly establish the relation with other phases of the statistical production process and with international standards such as:
  • GSBPM (Generic Statistical Business Process Model)
  • GSDEMs (Generic Statistical Data Editing Models)
  • GSIM (Generic Statistical Information Model)
• Describe the data validation life cycle, which is useful for managing the data validation process
What is data validation…
Definition
• Data validation is an activity verifying whether or not a combination of values is a member of a set of acceptable combinations.
• This is not far from the UNECE definition: "An activity aimed at verifying whether the value of a data item comes from the given (finite or infinite) set of acceptable values"
• … but essentially different: the HB definition refers to combinations of values, whereas the UNECE definition refers to the value of a single data item.
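The difference between the two definitions can be made precise in set notation. This is a minimal formalization under stated assumptions: a record is taken to be a tuple of n values with domains D_1 to D_n, and the symbols A, A_j, v and v_j are introduced here for illustration rather than taken from the HB.

```latex
% HB definition: validation of a combination of values
x = (x_1, \dots, x_n) \in D_1 \times \dots \times D_n, \qquad
A \subseteq D_1 \times \dots \times D_n \quad \text{(acceptable combinations)}

v(x) =
\begin{cases}
  \text{TRUE}  & \text{if } x \in A, \\
  \text{FALSE} & \text{otherwise.}
\end{cases}

% UNECE-style definition: each data item is checked on its own
v_j(x_j) = \text{TRUE} \iff x_j \in A_j \subseteq D_j, \qquad j = 1, \dots, n
```

Item-wise checks can only describe acceptable sets of the product form A_1 × ⋯ × A_n. A rule such as total = male + female (hypothetical variables) defines a set A that is not of that form: 40, 55 and 100 may each be acceptable values on their own, while the combination (40, 55, 100) is not.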
What…
• It is a decisional procedure ending with the acceptance or rejection of the data.
• The decisional procedure is generally based on rules expressing the acceptable combinations of values.
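A decisional procedure based on rules can be sketched as follows. This is a minimal illustration, assuming each rule is a Boolean predicate on a record and that a record is accepted only when all rules hold; the function `decide` and the three rules (including the working-age threshold of 15) are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
# A rule is a name plus a predicate returning True when the record satisfies it.
Rule = Tuple[str, Callable[[Record], bool]]


def decide(record: Record, rules: List[Rule]) -> Tuple[bool, List[str]]:
    """Accept the record only if every rule holds; also report the failed rules."""
    failed = [name for name, check in rules if not check(record)]
    return (len(failed) == 0, failed)


# Hypothetical rules describing acceptable combinations of values.
rules: List[Rule] = [
    ("age_non_negative", lambda r: r["age"] >= 0),
    ("age_plausible", lambda r: r["age"] <= 120),
    ("employed_needs_working_age", lambda r: not r["employed"] or r["age"] >= 15),
]

accepted, failed = decide({"age": 10, "employed": True}, rules)
print(accepted, failed)  # False ['employed_needs_working_age']
```

Returning the names of the failed rules alongside the accept/reject decision keeps the outcome binary while still showing why a record was rejected.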
Why do we perform data validation…
• The purpose of data validation is to ensure a certain level of quality of the final data
• But quality has several aspects. We clarified which of them are related to DV:
  • Essentially those related to the ‘structure of the data’, that is, accuracy, comparability and coherence
  • Others are connected as well; e.g., timeliness can be seen as a constraining factor, since it limits the time available for validation
How to perform DV…
Two main elements:
• Validation levels
  • To what extent a data set has been validated
• Validation rules
  • Rules are applied to the data; the failure of a rule implies that the corresponding validation level is not attained by the data at hand (decisional process: accept/not accept)
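The link between rules and validation levels can be sketched in Python. Beyond what the slide states, the sketch assumes that levels are ordered and cumulative, so that data attain level k only if every rule tagged with level k or lower holds; the function `attained_level`, the turnover variables and the rules are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
# (level, rule name, predicate); True means the record satisfies the rule.
LeveledRule = Tuple[int, str, Callable[[Record], bool]]


def attained_level(record: Record, rules: List[LeveledRule]) -> int:
    """Highest level k such that all rules at levels <= k hold (-1 if none).

    Assumes levels are ordered and cumulative: once a level fails,
    higher levels are not evaluated.
    """
    attained = -1
    for level in sorted({lvl for lvl, _, _ in rules}):
        if all(check(record) for lvl, _, check in rules if lvl == level):
            attained = level
        else:
            break
    return attained


# Hypothetical rules on a company record.
rules: List[LeveledRule] = [
    (0, "turnover_is_number", lambda r: isinstance(r["turnover"], (int, float))),
    (1, "turnover_non_negative", lambda r: r["turnover"] >= 0),
    (2, "turnover_vs_previous", lambda r: r["turnover"] <= 2 * r["turnover_prev"]),
]

print(attained_level({"turnover": 150.0, "turnover_prev": 100.0}, rules))  # 2
print(attained_level({"turnover": 500.0, "turnover_prev": 100.0}, rules))  # 1
print(attained_level({"turnover": "n/a", "turnover_prev": 100.0}, rules))  # -1
```

Because evaluation stops at the first failing level, a higher-level rule is never applied to data that failed a more basic check; here a non-numeric turnover never reaches the numeric comparisons.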
Validation levels
They are related to the perspective of the ‘validator’… In the HB:
• Business perspective
  • Starting from the elements that usually characterize the DV process (increasing information)
• A formal approach
  • Looking at the elements characterizing a data point in a statistical setting
Validation levels: business perspective
Validation levels: formal approach
Metadata aspects that are necessary to identify a data point:
• The universe U from which a statistical object originates (e.g., household, company)
• The time t of selecting an element u from the current population p(t)
• The selected element u, which determines the values of the variables X that may be observed over time
• The variable selected for measurement
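The four metadata aspects can be made explicit as a composite key. This is a minimal sketch, assuming that a data set is represented as a mapping from fully identified data points to values; the class `DataPointKey`, its field names and the example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DataPointKey:
    """The four metadata aspects identifying a data point (illustrative field names)."""
    universe: str  # U: type of statistical object, e.g. "household" or "company"
    time: str      # t: time at which element u is selected from population p(t)
    unit: str      # u: identifier of the selected element
    variable: str  # the variable selected for measurement


# A data set seen as a mapping from fully identified data points to values.
data: Dict[DataPointKey, float] = {
    DataPointKey("company", "2015Q3", "C001", "turnover"): 1250.0,
    DataPointKey("company", "2015Q3", "C001", "employees"): 12.0,
}

key = DataPointKey("company", "2015Q3", "C001", "turnover")
print(data[key])  # 1250.0
```

Because the key is a frozen dataclass it is hashable and can index a dictionary directly; two values refer to the same data point only if all four aspects coincide.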
Data validation - GSDEMs
• GSDEMs: Generic Statistical Data Editing Models
• Statistical data editing is composed of three different function types: Review, Selection and Amendment
• The review functions are defined as: "Functions that examine the data to identify potential problems. This may be by evaluating formally specified quality measures or edit rules or by assessing the plausibility of the data in a less formal sense, for instance by using graphical displays"
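The three GSDEMs function types can be illustrated as a small pipeline. This is a sketch under stated assumptions, not the GSDEMs specification: the rule "total equals male plus female", the function names and the amendment strategy (recomputing the total from its parts) are all hypothetical choices made for the example.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def review(record: Record) -> Dict[str, bool]:
    """Review: examine a record and flag potential problems (True = rule holds)."""
    return {"total_is_sum": record["total"] == record["male"] + record["female"]}


def select(records: List[Record]) -> List[int]:
    """Selection: pick the indices of records with at least one failed check."""
    return [i for i, r in enumerate(records) if not all(review(r).values())]


def amend(record: Record) -> Record:
    """Amendment: return a corrected copy (hypothetical strategy: recompute total)."""
    fixed = dict(record)
    fixed["total"] = fixed["male"] + fixed["female"]
    return fixed


records = [
    {"male": 40, "female": 55, "total": 95},
    {"male": 40, "female": 55, "total": 100},
]
to_fix = select(records)
print(to_fix)  # [1]
for i in to_fix:
    records[i] = amend(records[i])
print(select(records))  # []
```

Keeping the three steps as separate functions reflects the distinction on the slide: review only produces flags, selection decides which records to treat, and only amendment changes values.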
Data validation - GSDEMs
• Among the different GSDEMs function categories there is ‘Review of data validity’, defined as: "Functions that check the validity of data values against a specified range or a set of values and also the validity of specified combinations of values. Each check leads to a binary value (TRUE, FALSE)"
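The three kinds of validity checks named in this definition (against a range, against a set of values, and on a combination of values) can each be written as a function returning a binary value. A minimal sketch; the function names, the code list and the rule linking employment status to hours worked are hypothetical.

```python
from typing import Any, Dict, Iterable

Record = Dict[str, Any]


def in_range(value: float, low: float, high: float) -> bool:
    """Validity of a value against a specified range."""
    return low <= value <= high


def in_value_set(value: Any, allowed: Iterable[Any]) -> bool:
    """Validity of a value against a specified set of values."""
    return value in set(allowed)


def valid_combination(record: Record) -> bool:
    """Validity of a specified combination of values (hypothetical rule)."""
    # Hypothetical: a respondent coded as 'not employed' reports zero hours worked.
    return record["status"] != "not employed" or record["hours"] == 0


record = {"age": 34, "status": "not employed", "hours": 20}
checks = {
    "age_range": in_range(record["age"], 0, 120),
    "status_codes": in_value_set(record["status"], ["employed", "not employed"]),
    "status_hours": valid_combination(record),
}
print(checks)
# {'age_range': True, 'status_codes': True, 'status_hours': False}
```

Each value passes its own check, yet the combination fails, which is exactly the case that item-wise checks alone cannot detect.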
Data Validation - GSBPM
Data validation life cycle
Second part of the document: metrics
• Evaluating a validation procedure
• …next presentation…
Thanks for your attention