
Data Confidence Policies: Query Processing Techniques

This paper presents a systematic approach for data use based on confidence values, introducing confidence policies and algorithms to adjust data confidence levels efficiently. It addresses challenges in improving data quality while minimizing costs. The system framework includes associating confidence values with data tuples and computing results' confidence based on lineage. Performance studies show the effectiveness of the proposed system.


Presentation Transcript


  1. Query Processing Techniques for Compliance with Data Confidence Policies • Chenyun Dai (1), Dan Lin (2), Murat Kantarcioglu (3), Elisa Bertino (1), Ebru Celikel (3), and Bhavani Thuraisingham (3) • (1) Department of Computer Science, Purdue University • (2) Department of Computer Science, Missouri University of Science and Technology • (3) Department of Computer Science, The University of Texas at Dallas

  2. Outline • Motivation • Policy Compliant Query Evaluation • Related Work • Algorithms • Performance Study • Conclusion and Future Work

  3. Motivation • Improving data quality incurs costs • Verify a customer address • Verify the financial status • Different types of medical data • Obtaining accurate data is expensive • Data quality depends on the purpose • Not critical: statistical summary • Critical: investment, evaluating effectiveness of treatment

  4. Challenges • How to specify which tasks require high-confidence data? • How can we improve the confidence of the data to the desired level with minimum cost? • Which portion of the data should be selected for quality improvement?

  5. System Framework • Four components • (1) Associate confidence values with data tuples [SDM’08] • (2) Compute the confidence of query results based on lineage [VLDB’04] • (3) Confidence policy* • (4) Find the optimal strategy for increasing the confidence level* • * proposed in this paper

  6. Contributions • Propose the first systematic approach to data use based on confidence values of data items • Introduce the notion of confidence policy and confidence policy compliant query evaluation • Propose three algorithms to minimize the cost of adjusting confidence values of data • Carry out performance studies that demonstrate the system is efficient

  7. Related Work • Access Control Policies • RBAC • Lineage calculation • Trio [VLDB’06] • Provenance in e-science [SIGMOD Record’05] • Probabilistic data [TKDE’92] • Quality view [VLDB’06] • Specify users’ quality requirements using views • Does not include a quality increment component

  8. Policy Compliant Query Evaluation
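
The transcript gives only the title of this slide. As a rough illustration of how components (1) to (3) of the system framework could fit together, here is a minimal sketch. It rests on assumptions that are not in the presentation: a policy is modeled as a hypothetical (role, purpose, minimum confidence) triple, base tuples are treated as independent, a result that needs all of its source tuples takes the product of their confidences, and a result derivable from any one of several sources takes 1 minus the product of their complements. All names are hypothetical.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class ConfidencePolicy:
    # Hypothetical policy shape: who asks, for what purpose, and the
    # minimum confidence a result needs before it may be used.
    role: str
    purpose: str
    min_confidence: float


def and_confidence(confs):
    # Result needs ALL source tuples (e.g. a join).
    # Assumes the sources are independent.
    return prod(confs)


def or_confidence(confs):
    # Result is derivable from ANY source tuple (e.g. duplicate
    # elimination). Assumes the sources are independent.
    return 1.0 - prod(1.0 - c for c in confs)


def evaluate(results, policy):
    # Split (value, confidence) pairs into those that satisfy the
    # policy threshold and those that must be withheld.
    compliant = [r for r in results if r[1] >= policy.min_confidence]
    withheld = [r for r in results if r[1] < policy.min_confidence]
    return compliant, withheld


# Assumed confidence values for three base tuples.
base = {"t1": 0.9, "t2": 0.8, "t3": 0.5}

results = [
    ("r1", and_confidence([base["t1"], base["t2"]])),  # t1 AND t2
    ("r2", or_confidence([base["t2"], base["t3"]])),   # t2 OR t3
]

policy = ConfidencePolicy(role="analyst", purpose="investment",
                          min_confidence=0.8)
compliant, withheld = evaluate(results, policy)
print("compliant:", [name for name, _ in compliant])
print("withheld:", [name for name, _ in withheld])
```

In this sketch the withheld results are the natural candidates for component (4): raising the confidence of some of their source tuples until the threshold is met. How to pick those tuples at minimum cost is what the presentation's three algorithms address, and it is not modeled here.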
