Query-driven Data Completeness Management

Query-driven Data Completeness Management. Simon Razniewski, supervised by Werner Nutt. Area: Data Quality/Decision Support. Data Quality research investigates how good data is. Dimensions of Data Quality are: Correctness, Timeliness, Completeness.


Presentation Transcript


  1. Query-driven Data Completeness Management • Simon Razniewski • Supervised by Werner Nutt

  2. Area: Data Quality/Decision Support • Data Quality research investigates how good data is • Dimensions of Data Quality are: • Correctness • Timeliness • Completeness

  3. Example Scenario: School Data Management in South Tyrol • Central school database: notoriously incomplete • ?? • Statistical reports: completeness important

  4. Example: Final Grades • Vocational schools enter final grades, many others don't • Query: How many pupils in class 12 have 'A' in Math? • Answer: 3400 • Can we trust this? No! • Pupils from high schools could be missing in the result

  5. Example: Final Grades (2) • Vocational schools enter final grades, many others don't • Query: How many pupils at vocational schools in class 12 have 'A' in Math? • Answer: 1700 • Can we trust this? Yes! • All grades from vocational schools are in the database
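
The contrast between slides 4 and 5 can be replayed on a toy instance. The following is a minimal sketch, not taken from the presentation: the pupils, the grades, and the helper `count_a_in_math` are all hypothetical, and the "ideal" set stands for the grades that exist in the real world while the "available" set stands for what was actually entered.

```python
# Hypothetical toy data (assumption, not from the slides).
# pupil(name, class, school_name, school_type)
PUPILS = [
    ("anna", 12, "s1", "vocational"),
    ("ben", 12, "s1", "vocational"),
    ("carl", 12, "s2", "high"),
]

# Ideal state: every final grade that exists in the real world.
IDEAL_GRADES = {
    ("anna", "Math", "A"),
    ("ben", "Math", "B"),
    ("carl", "Math", "A"),
}

# Available state: only the vocational school entered its grades.
AVAILABLE_GRADES = {
    ("anna", "Math", "A"),
    ("ben", "Math", "B"),
}


def count_a_in_math(grades, school_type=None):
    """Count class-12 pupils with 'A' in Math, optionally for one school type."""
    return sum(
        1
        for (name, cls, _school, stype) in PUPILS
        if cls == 12
        and (school_type is None or stype == school_type)
        and (name, "Math", "A") in grades
    )


# Query of slide 4: all school types. The available answer undercounts.
all_ideal = count_a_in_math(IDEAL_GRADES)
all_available = count_a_in_math(AVAILABLE_GRADES)

# Query of slide 5: vocational schools only. Both states agree.
voc_ideal = count_a_in_math(IDEAL_GRADES, "vocational")
voc_available = count_a_in_math(AVAILABLE_GRADES, "vocational")

print(all_ideal, all_available)  # 2 1
print(voc_ideal, voc_available)  # 1 1
```

In this toy instance the unrestricted count differs between the two states, while the count restricted to vocational schools does not, which mirrors the "No!" and "Yes!" answers on the slides.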

  6. Research Questions • How can completeness information be stored in a database? • How can one find out whether query answers are complete (and correct)? • Where can completeness information come from?

  7. Where Else is Incompleteness a Problem? • … when many users contribute to a database: OpenStreetMap, Wikipedia (?) • … when data submission is optional: data from surveys • … when data from different sources is integrated: biological databases • … when the real world changes without informing the database: address data

  8. Abstract Problem • Database, where in some parts data may be missing (grey) • Query to the database • Can we trust the query answer? • Does the query only touch the complete (green) parts of the database? • If not, where does it touch grey parts? • How could we modify the query to touch only green parts?

  9. Approach • Database: "This data is complete … and this … And this!" • "I query this part of the database. Can I trust the answer?" • "You cannot, because… But you could.." • 1 Formalism for statements about completeness • 2 Reasoning procedures • 3 Implementation techniques • 4 Derivation of statements from business process analysis

  10. Approach: Describe Complete Parts by Queries • Schema: pupil(name, class, school_name, school_type), grade(name, subject, value) • Complete: All grades of pupils from vocational schools: QCgrades(n,s,v) :- grade(n,s,v), pupil(n,c,sn,'vocational') • Complete: All pupils: QCpupils(n,c,sn,st) :- pupil(n,c,sn,st)
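
One way to read a statement such as QCgrades is as a containment check between an ideal and an available database state: every tuple that the statement's query returns over the ideal state must be present in the available table. The sketch below assumes exactly that reading; the slide does not spell out the semantics, and the function names and sample data are hypothetical.

```python
def qc_grades(grades, pupils):
    """QCgrades(n,s,v) :- grade(n,s,v), pupil(n,c,sn,'vocational')."""
    vocational_names = {
        name for (name, _cls, _school, stype) in pupils if stype == "vocational"
    }
    return {(n, s, v) for (n, s, v) in grades if n in vocational_names}


def statement_holds(ideal_grades, ideal_pupils, available_grades):
    """Assumed semantics: the statement holds if every tuple selected by
    QCgrades over the ideal state is present in the available grade table."""
    return qc_grades(ideal_grades, ideal_pupils) <= available_grades


# Hypothetical sample data (assumption, not from the slides).
pupils = {
    ("anna", 12, "s1", "vocational"),
    ("carl", 12, "s2", "high"),
}
ideal = {("anna", "Math", "A"), ("carl", "Math", "A")}

# carl's grade is missing, but carl is not at a vocational school.
available_ok = {("anna", "Math", "A")}
# anna's grade is missing, which violates the statement.
available_bad = {("carl", "Math", "A")}

holds_ok = statement_holds(ideal, pupils, available_ok)
holds_bad = statement_holds(ideal, pupils, available_bad)
print(holds_ok, holds_bad)  # True False
```

The check is only meaningful as a specification, since the ideal state is not available in practice; the statements are assertions about it that reasoning can then build on.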

  11. Approach: Compare Queries with Complete Queries • Query: All pupils in class 12 with 'A' in Math: Qgoodmath(n) :- grade(n,'Math','A'), pupil(n,12,sn,st) • QCgoodmath(n) :- grade(n,'Math','A'), pupil(n,12,sn,st), pupil(n,c,sn,'vocational') • Reasoning: Are Qgoodmath and QCgoodmath equivalent?
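
The two queries can be evaluated side by side on a small instance. This is a sketch under assumptions: the body atoms of QCgoodmath are read as a conjunction (with the shared variable sn as written on the slide), the data is hypothetical, and a single instance can only witness non-equivalence. Deciding equivalence in general is a reasoning task over all instances, which this code does not attempt.

```python
def q_goodmath(grades, pupils):
    """Qgoodmath(n) :- grade(n,'Math','A'), pupil(n,12,sn,st)."""
    return {
        n
        for (n, s, v) in grades
        if s == "Math" and v == "A"
        for (pn, c, _sn, _st) in pupils
        if pn == n and c == 12
    }


def qc_goodmath(grades, pupils):
    """QCgoodmath(n) :- grade(n,'Math','A'), pupil(n,12,sn,st),
    pupil(n,c,sn,'vocational')."""
    return {
        n
        for (n, s, v) in grades
        if s == "Math" and v == "A"
        for (pn, c, sn, _st) in pupils
        if pn == n and c == 12
        for (vn, _vc, vsn, vst) in pupils
        if vn == n and vsn == sn and vst == "vocational"
    }


# Hypothetical instance (assumption, not from the slides).
pupils = {
    ("anna", 12, "s1", "vocational"),
    ("carl", 12, "s2", "high"),
}
grades = {("anna", "Math", "A"), ("carl", "Math", "A")}

answer_q = q_goodmath(grades, pupils)
answer_qc = qc_goodmath(grades, pupils)
print(sorted(answer_q))   # ['anna', 'carl']
print(sorted(answer_qc))  # ['anna']
```

On this instance the answers differ because carl attends a high school, so the two queries are not equivalent here. That matches the earlier example in which the unrestricted query could not be trusted.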

  12. Done so Far • Problem 1: Statements about database completeness and query completeness [LID2011] • Problem 2: Reasoning tasks + algorithms + complexity [VLDB2011] • Problem 3: Map reasoning tasks to satisfiability modulo theories (SMT)

  13. Challenge • Identify core problems and doable steps • Schema constraints • Nulls • Business Processes • Probabilistic Reasoning • XML-databases

  14. Publications • Checking Query Completeness over Incomplete Databases, Simon Razniewski and Werner Nutt, Workshop on Logic in Databases, 2011 • Completeness of Queries over Incomplete Databases, Simon Razniewski and Werner Nutt, International Conference on Very Large Databases, 2011 • Submitted: Incomplete Databases: Missing Records and Missing Values, Werner Nutt, Simon Razniewski and Gil Vegliach, Workshop on Data Quality in Data Integration Systems, 2012

  15. Thank you.

  16. Related Work • Motro 1989: Introduced important concepts for describing database completeness • Levy 1996: Introduced a central reasoning problem about data completeness • Fan, Geerts 2009: Worked on a similar problem of data completeness for master data