Measuring and Communicating Data Quality. UNECE Training Workshop on Dissemination of MDG Indicators and Statistical Information Astana, Kazakhstan 23 – 25 November 2009 Steven Vale, UNECE. Contents. What is quality? How can we measure quality? How should we report and communicate quality?
Measuring and Communicating Data Quality UNECE Training Workshop on Dissemination of MDG Indicators and Statistical Information Astana, Kazakhstan 23 – 25 November 2009 Steven Vale, UNECE
Contents • What is quality? • How can we measure quality? • How should we report and communicate quality?
Definition of Quality • International Standard ISO 9000:2005 defines quality as: “The degree to which a set of inherent characteristics fulfils requirements.”
What Does This Mean? • Whose requirements? • The user of the goods or services • A set of inherent characteristics? • Users judge quality against a set of criteria reflecting the different characteristics of the goods or services • So quality is all about providing goods and services that meet the needs of users (customers)
Quality Criteria for Statistics • Different statistical organisations use different criteria, but the lists of criteria are quite similar • UNECE list: Relevance, Accuracy, Timeliness, Punctuality, Comparability, Accessibility, Clarity
Relevance • Are the statistics that are produced needed? • Are the statistics that are needed produced? • Do the concepts, definitions and classifications meet user needs?
Accuracy • The closeness of statistical estimates to true values • In the past: Quality = Accuracy • Now accuracy is just one part of quality
Timeliness • The length of time between data being made available and the event or phenomenon they describe Punctuality • The time lag between the actual delivery date and the promised delivery date
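Both definitions above are simple date differences. A minimal sketch, assuming hypothetical function names and made-up dates for a single release (none of these come from the slides):

```python
from datetime import date


def timeliness_days(reference_period_end: date, release_date: date) -> int:
    """Days between the end of the period described and the data release."""
    return (release_date - reference_period_end).days


def punctuality_days(promised_date: date, actual_date: date) -> int:
    """Days between promised and actual delivery (positive = late, 0 = on time)."""
    return (actual_date - promised_date).days


# Hypothetical release: data for the quarter ending 30 June 2009,
# promised for 15 September and actually published on 18 September.
reference_end = date(2009, 6, 30)
promised = date(2009, 9, 15)
actual = date(2009, 9, 18)

print("Timeliness (days):", timeliness_days(reference_end, actual))
print("Punctuality (days):", punctuality_days(promised, actual))
```

Keeping the two measures separate makes the distinction visible: a release can be punctual (delivered on the promised date) and still have poor timeliness if the promised date was far from the reference period.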
Comparability • The extent to which differences are real, or due to methodological or measurement differences • Comparability over time • Comparability through space (e.g. between countries / regions) • Comparability between statistical domains (sometimes referred to as coherence)
Accessibility • The ways in which users can obtain or benefit from statistical services (pricing, format, location, language etc.) Clarity • The availability of additional material (e.g. metadata, charts etc.) to allow users to understand outputs better
Importance of Accessibility • Not just about making data available on the Internet or in a book • Passive accessibility • Accessibility is about bringing data to users in an understandable way, opening a dialogue with those users, and ensuring that their information needs are met • Active accessibility
Accessibility Should Include: • Communicating • Marketing • Interpreting • “Story-telling” • Informing • Educating
Accessibility and Visualization • Good visualizations make data accessible to many more users • Bad visualizations are unhelpful / misleading • “Self-service” visualization needs to be simple, with guidance to help users get meaningful results • “Ready-made” visualizations can be more complex, tailored to specific data sets
Accessibility and Visualization • Is it more cost-effective to: • develop “ready-made” graphics, or • offer users more “self-service” functionality? • Many users don’t have the time or knowledge to produce good visualizations • Advanced users have access to their own visualization and analysis tools
Importance of Clarity • Clarity is all about explaining data • Do current explanatory notes help? • Often written by specialists for specialists • Full of jargon • Too long • Too boring! • Simplified, plain-text versions needed
Other Considerations • Cost / efficiency • Integrity / trust • Reputation of the organization • Professionalism • Adherence to international standards (e.g. UN Fundamental Principles of Official Statistics)
Quality is not just about outputs • To have good outputs we need to have good inputs and processes, so we need to think about the quality of these as well • Input → Process → Output
Quality of Inputs • Timeliness • Completeness – are there any missing units or variables? • Comparability with other sources • Quality check survey? • Knowledge of the source is vital!
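The completeness check above (missing units or variables) can be sketched as a per-variable missing rate. This is an illustration only: the record layout, the variable names and the function `missing_rates` are assumptions, not part of the workshop material.

```python
def missing_rates(records, variables):
    """Return, per variable, the share of records where the value is missing.

    A value counts as missing if the key is absent or the value is None.
    """
    if not records:
        return {var: 0.0 for var in variables}
    total = len(records)
    return {
        var: sum(1 for rec in records if rec.get(var) is None) / total
        for var in variables
    }


# Hypothetical input file with four units and two variables of interest.
records = [
    {"unit_id": 1, "turnover": 120.0, "employees": 4},
    {"unit_id": 2, "turnover": None, "employees": 10},
    {"unit_id": 3, "turnover": 75.5},  # "employees" not supplied at all
    {"unit_id": 4, "turnover": 310.0, "employees": None},
]

rates = missing_rates(records, ["turnover", "employees"])
for variable, rate in rates.items():
    print(f"{variable}: {rate:.0%} missing")
```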
Quality of Processing • Quality of matching / linking • Outlier detection and treatment • Quality of data editing • Quality of imputation • Keep raw data / metadata to refer back to if necessary
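The slides do not say which outlier detection method to use. As one common option, the sketch below applies the interquartile range (IQR) rule; the function name `flag_outliers`, the multiplier `k=1.5` and the sample values are assumptions made for illustration.

```python
import statistics


def flag_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # three quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical reported values; one looks like a keying error.
reported = [10, 12, 11, 13, 12, 11, 95]
print("Flagged for review:", flag_outliers(reported))
```

Flagged values are candidates for review rather than automatic deletion, which fits the advice above to keep the raw data to refer back to if necessary.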
Quality of Outputs • Are the users satisfied? • Are the outputs comparable with data from other sources? • What is the impact on time series? • Are the outputs cost-effective? • Quality reports to measure and communicate differences?
Measuring Quality • Quantitative methods • E.g. confidence intervals • User surveys • Self evaluation • Benchmarking
Quantitative Measures • [Figure: bar chart with confidence intervals] • The tops of the bars indicate estimated values and the red lines represent the confidence intervals surrounding them.
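To show what such an interval involves, here is a minimal sketch of a 95% confidence interval for a sample mean. It assumes a simple random sample and the normal approximation (z = 1.96); real survey estimates usually need design-based variance estimation, and for small samples a t-distribution gives a wider interval. The function name and sample values are hypothetical.

```python
import math
import statistics


def mean_confidence_interval(values, z=1.96):
    """Return (mean, lower, upper) using the normal approximation."""
    n = len(values)
    mean = statistics.mean(values)
    # Standard error of the mean from the sample standard deviation.
    standard_error = statistics.stdev(values) / math.sqrt(n)
    margin = z * standard_error
    return mean, mean - margin, mean + margin


# Hypothetical sample observations.
sample = [10, 12, 14, 16, 18]
estimate, lower, upper = mean_confidence_interval(sample)
print(f"Estimate {estimate:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```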
UNECE Database User Survey • Launched each autumn on database web site • 10 questions • 150 responses (target 100)
Exercise • Design a user survey with up to 10 questions for users of your web site • 20 minutes
UNECE User Survey Questions 1. Type of user 2. Frequency of use 3. Location (country) 4. Type of data 5. Database relevance 6. Timeliness
Continued... 7. Clarity (metadata) 8. Overall data quality 9. User interface 10. Other comments and questions
Improving Our Services • Better timeliness of data • New “Country Overview” data cube to give quick access to key indicators • More content in Russian • Improved user interface • More and better metadata • Statistical literacy
Self-evaluation • Relatively quick and cheap • Is it sufficiently objective? • Needs a standard framework to ensure comparability of quality assessments • Eurostat DESAP check list: http://epp.eurostat.ec.europa.eu/portal/page/portal/quality/documents/desap%20G0-LEG-20031010-EN.pdf
Benchmarking • Comparing data values or data production processes between two sources • Differences can be studied to try to find ways to improve quality
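Comparing data values between two sources can be sketched as relative differences over the periods both sources cover. The source names, figures and the function `relative_differences` below are hypothetical.

```python
def relative_differences(source_a, source_b):
    """Relative difference (a - b) / b for each key present in both sources.

    Keys where the benchmark value b is zero are skipped.
    """
    common_keys = sorted(source_a.keys() & source_b.keys())
    return {
        key: (source_a[key] - source_b[key]) / source_b[key]
        for key in common_keys
        if source_b[key] != 0
    }


# Hypothetical annual values from a national source and a benchmark source.
national = {"2007": 105.0, "2008": 110.0}
benchmark = {"2007": 100.0, "2008": 110.0, "2009": 120.0}

differences = relative_differences(national, benchmark)
for year, diff in differences.items():
    print(f"{year}: {diff:+.1%}")
```

Large or systematic differences are the ones worth studying, since they may point to differences in concepts, coverage or methods.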
Benchmarking Between Countries • Fairly cheap and easy way to get ideas on how to improve statistical processes • Mutual benefit: “win-win” • Helps to improve international cooperation • May lead to joint development projects
Communicating Quality • Quality Reports • Summary – “traffic light” indicator • Red – Serious quality issues, read the quality report before using • Orange – Caution, do not use for important decisions without reading the quality report • Green – Good quality • Intermediate – short quality report (1000 words maximum) • Detailed – full quality report
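The summary indicator can be represented as a small lookup from an assessment to a colour and its user guidance. The guidance texts are taken from the slide. The decision rule (counts of serious and minor issues) is purely an assumption for illustration, since the slides do not state how a colour is assigned.

```python
from enum import Enum


class TrafficLight(Enum):
    RED = "Serious quality issues, read the quality report before using"
    ORANGE = (
        "Caution, do not use for important decisions "
        "without reading the quality report"
    )
    GREEN = "Good quality"


def summary_indicator(serious_issues: int, minor_issues: int) -> TrafficLight:
    """Hypothetical rule: any serious issue is red, any minor issue is orange."""
    if serious_issues > 0:
        return TrafficLight.RED
    if minor_issues > 0:
        return TrafficLight.ORANGE
    return TrafficLight.GREEN


indicator = summary_indicator(serious_issues=0, minor_issues=2)
print(indicator.name, "-", indicator.value)
```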
Detailed Quality Reports • Should cover all components of quality • Should be written for the user • Should be easily accessible • Should follow a standard template
Exercise • What should be covered in a detailed quality report? • List the topics that should be included • 10 minutes
ESQR Contents (1) • Introduction to the statistical process and its outputs • Relevance • Accuracy • Timeliness • Punctuality • Accessibility • Clarity
ESQR Contents (2) • Comparability • Trade-offs between quality components • Assessment of User Needs and Perceptions • Performance, Cost and Respondent Burden • Confidentiality, Transparency and Security • Conclusion
Summary • Quality is all about meeting user needs • There are many different aspects to quality, some of which may be in conflict • E.g. Timeliness versus Accuracy • There are various ways of measuring quality; user views are important • Quality should be communicated to users in a way they can understand
Which is the Best Quality? • It depends on what the user needs!