On the Authoritative Data Sources: One Data Element at a Time. DAMA National Capital Region Chapter Meeting March 9, 2010 Washington, DC. Richard Wang, Ph.D. Deputy Chief Data Officer Chief Data Quality Officer Office of the U.S. Army CIO/G-6
On the Authoritative Data Sources: One Data Element at a Time DAMA National Capital Region Chapter Meeting March 9, 2010 Washington, DC Richard Wang, Ph.D. Deputy Chief Data Officer Chief Data Quality Officer Office of the U.S. Army CIO/G-6 Director, MIT Information Quality Program (on leave) Massachusetts Institute of Technology Vesting University Professor of Information Quality University of Arkansas at Little Rock
Data Quality Books by MIT Information Quality Program • http://mitiq.mit.edu/Publications.htm (publication years of the books shown: 2006, 2005, 2000, 1999, 2006)
MIT’s role in the foundations for IQ education (2007, Madnick) Lots of time & energy UALR: MSIQ and IQ PhD Degree Programs Journals - 2007 ACM Journal on Data and Information Quality (JDIQ) Conferences and Certification Programs • 1996 International Conference on Information Quality (ICIQ) • 2002 MIT-IQ program for Executives • 2003 IQ-1: Principles and Foundations • 2007 IQ Industry Symposium IQ Rich Wang (our Harry Potter) Books • Journey to Data Quality (2006) • … and many others Articles • 1990 Polygen Data Quality Model (VLDB + ICIS) • 1996 Beyond Accuracy • 1998 Managing Information as a Product IQ Research Projects • 1988 Total Data Quality Management Program (TDQM) • 2002 MIT Information Quality (MITIQ) Program * Not complete list
One Data Element at a Time: Federal Agency Case • Stakeholders Meeting • Data Element Identification • $1M+ impact per data element • 90-day progress
Private Sector Case Data Element Selection Criteria • Critical to Business • Recognized Pain Point • $1M+ impact • Practical to model • Practical to Implement • Owner identified • Commitment by the Stakeholders: 3 C’s + Management
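The selection criteria above can be read as a checklist applied to each candidate data element. The following is a minimal sketch, assuming each criterion can be recorded as a yes/no answer plus an estimated dollar impact; the `Candidate` class, its field names, and the sample values are hypothetical and do not come from the case itself.

```python
from dataclasses import dataclass
from typing import List

IMPACT_THRESHOLD_USD = 1_000_000  # the "$1M+ impact" criterion from the slide


@dataclass
class Candidate:
    """Hypothetical record of how one data element scores on the criteria."""
    name: str
    critical_to_business: bool
    recognized_pain_point: bool
    estimated_impact_usd: float
    practical_to_model: bool
    practical_to_implement: bool
    owner_identified: bool
    stakeholder_commitment: bool


def unmet_criteria(c: Candidate) -> List[str]:
    """Return the slide's criteria that the candidate does not yet satisfy."""
    checks = {
        "critical to business": c.critical_to_business,
        "recognized pain point": c.recognized_pain_point,
        "$1M+ impact": c.estimated_impact_usd >= IMPACT_THRESHOLD_USD,
        "practical to model": c.practical_to_model,
        "practical to implement": c.practical_to_implement,
        "owner identified": c.owner_identified,
        "commitment by the stakeholders": c.stakeholder_commitment,
    }
    return [label for label, ok in checks.items() if not ok]


# Illustrative candidates with made-up values.
strong = Candidate("customer_id", True, True, 2_500_000, True, True, True, True)
weak = Candidate("fax_number", True, True, 400_000, True, True, False, True)
```

An empty list from `unmet_criteria` means the element meets every listed criterion; a non-empty list names what still has to be resolved before the element is selected.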
Army Chief Data Quality Officer FY10 Priorities • 300-500 critical Army Data Elements in FY10, 5000 by FY13 • Army staffing of Data Elements from Bronze to Silver and Gold • Vertical integration up with semantics, business logic, objects (U-Core, C2-Core ontology) • Authoritative Data Sources, Designated Data Sources, Authoritative Data Elements
Single Element Approach Challenge: Establish a Total Data Quality Management (TDQM) Program in the Army while utilizing limited resources. TDQM Cycle Solution: • Address one data element at a time, using priority data elements within priority projects. • Take the first few data elements through the entire TDQM cycle to educate and illustrate value. • Establish and populate a catalog of data element quality specifications (the “Define” of TDQM) containing priority data elements for broad use.
Early Success Project: Suicide Mitigation - NIMH Study feed Elements: UIC, SSN • Developed Data Quality Specification to define data quality rules. • Constructed Information Product Map (IP-Map) that shows the flow of the data element and its quality checks from data providers to NIMH Study consumer. • ADCF implemented quality checks and reported results. • Captured DQ Process metrics and DQ element metrics in a Dashboard. • Preparing DQ element metric details to feed back to data providers.
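The "DQ element metrics" captured in the dashboard can be pictured as pass/fail counts per quality rule over the values of one data element. The sketch below is illustrative only: the function name `element_metrics`, the two sample rules, and the assumed UIC format (six uppercase alphanumeric characters) are assumptions, not the rules or tooling actually used in the project.

```python
import re
from typing import Callable, Dict, Iterable, Optional


def element_metrics(
    values: Iterable[str],
    rules: Dict[str, Callable[[str], bool]],
) -> Dict[str, Dict[str, Optional[float]]]:
    """Count passes and failures of each rule over one element's values."""
    values = list(values)
    total = len(values)
    metrics = {}
    for rule_name, check in rules.items():
        passed = sum(1 for v in values if check(v))
        metrics[rule_name] = {
            "passed": passed,
            "failed": total - passed,
            # No rate is reported when there is nothing to measure.
            "pass_rate": passed / total if total else None,
        }
    return metrics


# Hypothetical rules for a UIC-like element (format is an assumption).
uic_rules = {
    "not_empty": lambda v: v != "",
    "six_upper_alnum": lambda v: re.fullmatch(r"[A-Z0-9]{6}", v) is not None,
}

sample_values = ["WABCD0", "wabcd0", "W12", ""]  # made-up test data
report = element_metrics(sample_values, uic_rules)
```

Per-rule counts of this kind are also what would be fed back to data providers, since they show which rule fails and how often.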
(Army) Data Element Yellow Pages (http://architecture.army.mil/data/DEYP; IP = Information Product) A. Army Data Element specifications are developed through the Data Element Quality Definition Process and entered in the Data Element Yellow Pages. B. IP Producers use the Data Element Yellow Pages to discover Data Element specifications and integrate them into their Information Products. C. IP Consumers access the Data Element Yellow Pages to find Data Element specifications for understanding and correctly using the data.
Data Element Yellow Pages Content Data Element Quality Specification: • Element Name • Definition • Data Quality Rules • Approval Level • Examples • Data Element Owner (Steward?) • Authoritative References • Usage Notes • more… Data Quality Rules: Supports “fit for use” Segmented into Three Levels • Container (conceptual format) • Content (correct in itself) • Context (correct in context) Approval Level: • Gold – ADB Approved • Silver – ADC Approved • Bronze – CDQO Approved
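One way to picture a Yellow Pages entry is as a record holding the listed fields, with its data quality rules tagged by level (Container, Content, Context). This is a minimal sketch under stated assumptions: the class names, the `evaluate` method, and the sample SSN rules (including the `known_ssns` context lookup) are hypothetical illustrations, not the Army's actual specification or rule set.

```python
import re
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class ApprovalLevel(Enum):
    BRONZE = "CDQO Approved"
    SILVER = "ADC Approved"
    GOLD = "ADB Approved"


class RuleLevel(Enum):
    CONTAINER = "container"  # conceptual format
    CONTENT = "content"      # correct in itself
    CONTEXT = "context"      # correct in context


@dataclass
class QualityRule:
    level: RuleLevel
    description: str
    check: Callable[[str, dict], bool]  # (value, surrounding record) -> passes?


@dataclass
class DataElementSpec:
    name: str
    definition: str
    approval_level: ApprovalLevel
    owner: str = ""
    examples: List[str] = field(default_factory=list)
    rules: List[QualityRule] = field(default_factory=list)

    def evaluate(self, value: str, record: dict) -> Dict[str, bool]:
        """Report, per level, whether every rule at that level passes."""
        return {
            level.value: all(
                r.check(value, record) for r in self.rules if r.level is level
            )
            for level in RuleLevel
        }


# Hypothetical SSN entry; the rules are illustrative, not the approved ones.
ssn_spec = DataElementSpec(
    name="SSN",
    definition="Social Security Number of the individual",
    approval_level=ApprovalLevel.BRONZE,
    rules=[
        QualityRule(RuleLevel.CONTAINER, "nine digits, no separators",
                    lambda v, rec: re.fullmatch(r"\d{9}", v) is not None),
        QualityRule(RuleLevel.CONTENT, "area number is not 000, 666, or 9xx",
                    lambda v, rec: v[:3] not in {"000", "666"}
                    and not v.startswith("9")),
        QualityRule(RuleLevel.CONTEXT, "value is known to the consuming study",
                    lambda v, rec: v in rec.get("known_ssns", set())),
    ],
)
```

Keeping the level on each rule lets a consumer see whether a value fails on format, on its own validity, or only in the context of a particular use, which is the "fit for use" distinction the slide draws.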
ADC Review and Comment Process (proposed) 1. Review DE Specifications with your SMEs. Note: you will find some documents cover the entire project; others have only the definition and quality sections completed. Review the definition and quality sections. 2. Gather and submit your comments to CDQO. All comments welcomed (positive, corrections, content, format, unaddressed). No comment [silence] is concurrence. Send your comments to CDQO Office. 3. Suspense Date: Week before next ADC Meeting, for readout at the monthly ADC meeting.
ADS Defined Authoritative Data Source: A recognized or official data production source with a designated mission statement or source/product to publish reliable and accurate data for subsequent use by customers. An authoritative data source may be the functional combination of multiple, separate data sources.
To assure data quality… • A data source is a mechanism through which the publication, storage, or retrieval of data is possible. Within the scope of the Information Technology domain, a data source consists of digitized data, such as a database, a machine-readable file, or a data stream. Data sources contain or provide information and fulfill specific data needs within an identified mission context. • A data element is an attribute in a database, a field in a machine-readable file, or a basic unit in a data stream. • The association of a data need and a given mission characterizes a data source’s intended use. • A data source is referred to as a Designated Data Source if the mission and the needed data elements from the data source for this mission are clearly specified. • An authoritative body that has the responsibility of fulfilling a particular data need attributes a data source as a designated data source. • A designated data source is referred to as an Authoritative Data Source if the underlying data of the data elements needed in the specified mission is certified as accurate, timely, and fit for subsequent use by data consumers.
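The definitions above form a progression: data source, then Designated Data Source, then Authoritative Data Source. A minimal sketch of that progression follows, assuming "clearly specified" can be modeled as a recorded mission plus a non-empty set of needed elements, and "certified" as a set of element names; the `DataSource` class, its fields, and the sample names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class DataSource:
    name: str
    mission: Optional[str] = None                       # mission the data serves
    needed_elements: Set[str] = field(default_factory=set)
    designated_by: Optional[str] = None                 # authoritative body
    certified_elements: Set[str] = field(default_factory=set)


def classify(source: DataSource) -> str:
    """Classify a source per the slide's definitions (modeling assumptions above)."""
    is_designated = bool(
        source.mission and source.needed_elements and source.designated_by
    )
    if not is_designated:
        return "data source"
    # Authoritative only if every element needed for the mission is certified.
    if source.needed_elements <= source.certified_elements:
        return "authoritative data source"
    return "designated data source"


# Made-up sources for illustration.
plain = DataSource("personnel_db")
designated = DataSource("personnel_db", "study feed", {"UIC", "SSN"},
                        designated_by="data board", certified_elements={"SSN"})
authoritative = DataSource("personnel_db", "study feed", {"UIC", "SSN"},
                           designated_by="data board",
                           certified_elements={"UIC", "SSN"})
```

Note that the classification is tied to a mission and its needed elements: in this model the same source could be authoritative for one mission and only designated for another.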