DQ Assessment and Measurement (and not a trace of trust (nor trust in traces) and no truth either)


Presentation Transcript


  1. DQ Assessment and Measurement (and not a trace of trust (nor trust in traces) and no truth either)
  Gunter, Cinzia, Vipul, Felix, and one other who did not come, and two others from the Scientific DB who chose to ignore us

  2. Assessment vs. Measurement
  • Measurement
    • More objective
    • From inside the data
    • Uses a metric
  • Assessment
    • May have more subjective parts
    • From the outside
    • The entire process, of which measurement is one part
    • Measurement is part of the assessment output
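To make the distinction concrete, here is a minimal Python sketch (all names are hypothetical, not from the slides): a measurement is an objective metric computed from inside the data, while an assessment is the wider process that combines such measurements with more subjective, outside-the-data inputs.

```python
# Hypothetical illustration of slide 2's distinction.

def measure_accuracy(values, is_correct):
    """Measurement: an objective metric computed from inside the data."""
    return sum(1 for v in values if is_correct(v)) / len(values)

def assess(values, is_correct, reputation_score):
    """Assessment: the wider process, of which the measurement is one
    part, combined here with a subjective, outside-the-data input."""
    return {
        "accuracy": measure_accuracy(values, is_correct),  # objective
        "reputation": reputation_score,                    # subjective
    }

print(assess([1, 2, -3, 4], lambda v: v > 0, reputation_score=0.8))
# {'accuracy': 0.75, 'reputation': 0.8}
```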

  3. IQ assessment is difficult
  • IQ criteria are often subjective in nature
  • Sources do not publish useful IQ metadata
  • Sources take measures to hinder IQ assessment
  • Enormous amount of data, so sampling is needed (see the sketch below)
  • Subject to changes in content and quality
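The "enormous amount of data" point implies estimating criteria from samples rather than exhaustive checks. A minimal sketch, assuming a criterion that can be checked per record (the helper names are hypothetical):

```python
import random

def estimate_accuracy(records, is_correct, sample_size=1000, seed=42):
    """Estimate accuracy from a uniform random sample instead of
    checking the enormous full data set exhaustively."""
    rng = random.Random(seed)
    sample = rng.sample(records, min(sample_size, len(records)))
    return sum(1 for r in sample if is_correct(r)) / len(sample)

# Re-run periodically: sources are subject to changes in content and quality.
```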

  4. Architectural levels relevant to assessment
  • Sources
  • Wrapper
  • Mediated schema
  • Mappings
  • Query decomposition
  • Result composition (process)
  • Integrated result at user/app
  Assumption: soundness and completeness

  5. The big picture
  [Architecture diagram: Sources 1-3 feed Wrappers 1-3, where objective DQ assessment takes place; a mediator with DQ vectors drives data acquisition / workflow; DQ interpretation (also subjective, based on user profiles and DQ requirements) sits between the mediator and Application 1 / User 2, with private DQ interpretation and a DQ feedback loop.]

  6. Assessment independence
  • In analogy to data independence
  • Different applications have different interpretations of DQ
  • Separation of
    • application-independent assessment
    • user/app-dependent interpretation

  7. DQ Assessment
  • Data source
    • Past results (traces)
    • Includes data & metadata
    • The data itself
  • Granularity: subsets / partitions
  • Integration result: mapping, transformation, aggregation
  • Offline
  • Data-oriented (objective in nature?)
  • Defaults
    • We need some ground estimations. Which criteria need them? E.g., the size of the world for completeness
  • Refinement

  8. DQ Interpretation
  • User or application
    • Feedback
    • User profiles
    • Quality requirements
      • Hard: company specification; ISO
      • Soft: user requirements
  • Query / usage
    • Quality requirements
    • Does it select certain subsets of sources?
  • Online
  • User/app-oriented (subjective in nature?)
  • Defaults
    • We definitely need them as a guide and for initialization (bootstrapping)
  • Refinement
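A minimal sketch of how slides 6 and 8 could fit together, assuming a DQ vector keyed by criterion and a user profile of weights (all names and numbers are illustrative): assessment produces one application-independent vector, interpretation weights it per user/app, defaults bootstrap the process, and refinement adjusts the weights from feedback.

```python
# Hypothetical sketch: one app-independent DQ vector, many interpretations.
DEFAULT_PROFILE = {"accuracy": 0.4, "completeness": 0.4, "timeliness": 0.2}

def interpret(dq_vector, profile=None):
    """User/app-dependent interpretation of an app-independent DQ vector.
    Defaults serve as a guide and for initialization (bootstrapping)."""
    weights = profile or DEFAULT_PROFILE
    return sum(w * dq_vector.get(criterion, 0.0)
               for criterion, w in weights.items())

def refine(profile, criterion, delta):
    """Refinement: adjust one weight from user feedback, then renormalize."""
    updated = dict(profile)
    updated[criterion] = max(0.0, updated[criterion] + delta)
    total = sum(updated.values())
    return {c: w / total for c, w in updated.items()}

dq = {"accuracy": 0.9, "completeness": 0.6, "timeliness": 0.5}
print(interpret(dq))  # default profile, ~0.70
print(interpret(dq, refine(DEFAULT_PROFILE, "timeliness", 0.3)))
```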

  9. Private DQ Interpretation
  • Performed by the user
  • Results in certain actions:
    • Exclude a source
    • Invest more money/time
    • Rewrite a query
    • Give up
    • Change parameters
    • Search for new sources
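A hedged sketch of how a private interpretation might trigger the actions listed above; the threshold and the rule itself are invented for illustration.

```python
def choose_action(score, threshold=0.7, budget_left=True):
    """Map an interpreted DQ score to one of slide 9's actions.
    The decision rule is hypothetical."""
    if score >= threshold:
        return "accept result"
    if budget_left:
        return "invest more money/time"  # or: rewrite the query, change parameters
    return "exclude the source"          # or: search for new sources, give up
```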

  10. OLD SLIDE: An assessment-oriented classification
  • Subject criteria: believability, concise representation, interpretability, relevancy, reputation, understandability, value-added
  • Object criteria: accuracy, completeness, customer support, documentation, objectivity, price, reliability, security, timeliness, verifiability
  • Process criteria: amount, availability, consistent representation, latency, response time

  11. OLD SLIDE: How does it fit? Not really…
  • Subject criteria (online; user-centric; subjective): believability, concise representation, interpretability, relevancy, reputation, understandability, value-added
  • Object criteria (offline; data-centric; objective): accuracy, completeness, customer support, documentation, objectivity, price, reliability, security, timeliness, verifiability
  • Process criteria (offline; data-centric; objective): amount, availability, consistent representation, latency, response time

  12. Output of Assessment and Interpretation
  • Numbers (DQ values)
    • In vectors
    • Single values
    • Rankings
    • Units / precision
  • Categories
    • Good/bad, etc.
  • Explanations
  • Trace
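A small sketch deriving the listed output forms, assuming one DQ vector per source; the source names and the good/bad cutoff are invented.

```python
def to_outputs(dq_vectors):
    """Derive slide 12's output forms from per-source DQ vectors:
    single values, a ranking, and coarse categories."""
    singles = {src: sum(vec.values()) / len(vec)        # single DQ value
               for src, vec in dq_vectors.items()}
    ranking = sorted(singles, key=singles.get, reverse=True)
    categories = {src: "good" if val >= 0.8 else "bad"  # coarse category
                  for src, val in singles.items()}
    return singles, ranking, categories

vectors = {"Source1": {"accuracy": 0.9, "completeness": 0.8},
           "Source2": {"accuracy": 0.6, "completeness": 0.7}}
print(to_outputs(vectors))
# singles ~ {'Source1': 0.85, 'Source2': 0.65}, ranking ['Source1', 'Source2'],
# categories {'Source1': 'good', 'Source2': 'bad'}
```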

  13. Doubts
  • Can everything app-specific be done during interpretation?
    • In other words: is a single assessment enough?
    • If not: is a simple parameterization enough?
  • Is MS really doing a good job in automatically naming files?

  14. Comparison: GeneDB (not the same as GeneDB.org) vs. DBLP
  • Dimensions of comparison
    • Input to DQ assessment
    • DQ criteria
    • DQ interpretation
    • Requirements
      • User
      • App

  15. Input to DQ Assessment (Comparison)
  • Willingness and ability to give input
    • GeneDB: ability is often lacking, e.g., schema evolvability
  • Accuracy: noisy data is intrinsic to GeneDB
  • Up-to-dateness
    • Scientific DB: announcements
    • DBLP: unknown (less in summer)
  • Trust and reputation: already here?
  • Completeness
    • Mostly willing
    • Identification of the domain hinders assessment
  • Duplicates
    • GeneDB: unable to assess (or define)
    • DBLP: mentions scripts
  • Obvious: ontology, availability, response time

  16. DQ criteria
  • Criteria for GeneDB
    • Reputation/trust/believability
    • Schema evolvability
    • Ontologies
    • Up-to-dateness (1 week)
    • Lineage
  • Criteria for DBLP
    • Response time
    • Understandability
    • Completeness
    • Schema and data stability
  • Criteria for neither
    • Up-to-dateness (1 day, 1 sec)
    • Availability
  • Criteria for both
    • Completeness
    • Accuracy
    • Duplicates

  17. DQ Interpretation (Comparison)
  • GeneDB
    • Trust is important (Oops)
    • Usage as a source
    • Usage within a workflow
    • Hard requirements
      • Costs money
      • Costs lives
    • Default DQ requirements
  • DBLP
    • Usage as a tool, not as a source
    • Thus, hardly any DQ requirements
    • Costs rejection of a paper
    • Default DQ requirements / assumptions

  18. GeneFlow.com™: High-Quality Integrated Genomic Data, into your face!

  19. IQ Criteria
  • Availability: Percentage of time an information source is "up".
  • Accuracy: Quotient of the number of correct values in the source and the overall number of values in the source.
  • Amount of data: Size of the result.
  • Believability: Degree to which the information is accepted as correct.
  • Completeness: Quotient of the number of response items and the number of real-world items.
  • Concise representation: Degree to which the structure of the information matches the information itself.
  • Consistent representation: Degree to which the structure of the information conforms to that of other sources.
  • Customer support: Amount and usefulness of online support through text, email, phone, etc.
  • Documentation: Amount and usefulness of documents with meta-information.
  • Interpretability: Degree to which the information conforms to the technical ability of the consumer.
  • Latency: Amount of time until the first information reaches the user.

  20. IQ Criteria
  • Objectivity: Degree to which information is unbiased and impartial.
  • Price: Monetary charge per query.
  • Relevancy: Degree to which information satisfies the user's need.
  • Reliability: Degree to which the user can trust the information.
  • Reputation: Degree to which the information or its source is in high standing.
  • Response time: Amount of time until the complete response reaches the user.
  • Security: Degree to which information is passed privately from user to information source and back.
  • Timeliness: Age of the information.
  • Understandability: Degree to which the information can be comprehended by the user.
  • Value-added: Amount of benefit the use of the information provides.
  • Verifiability: Degree and ease with which the information can be checked for correctness.
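The quotient-style criteria defined on slides 19 and 20 translate directly into arithmetic; a worked example with invented numbers:

```python
# Invented example numbers, applying the definitions from slides 19-20.
correct_values, total_values = 940, 1000
accuracy = correct_values / total_values            # 0.94

response_items, real_world_items = 450, 600
completeness = response_items / real_world_items    # 0.75

uptime_hours, total_hours = 719, 720
availability = 100 * uptime_hours / total_hours     # ~99.86 % of time "up"
```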
