DQ Assessment and Measurement (and not a trace of trust (nor trust in traces) and no truth either)


Presentation Transcript


  1. DQ Assessment and Measurement (and not a trace of trust (nor trust in traces) and no truth either)
  Gunter, Cinzia, Vipul, Felix, and one other who did not come, and two others from the Scientific DB who chose to ignore us

  2. Assessment vs. Measurement
  • Measurement
    • More objective
    • From inside the data
    • Uses a metric
  • Assessment
    • May have more subjective parts
    • From the outside
    • The entire process, of which measurement is one part
    • Measurement is part of the assessment output
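To make the distinction concrete, here is a minimal Python sketch (all names are hypothetical, not from the slides): a measurement is an objective metric computed from inside the data, while an assessment is the wider process that combines such measurements with more subjective, outside-the-data inputs.

```python
# Hypothetical illustration of slide 2's distinction.

def measure_accuracy(values, is_correct):
    """Measurement: an objective metric computed from inside the data."""
    return sum(1 for v in values if is_correct(v)) / len(values)

def assess(values, is_correct, reputation_score):
    """Assessment: the wider process, of which the measurement is one
    part, combined here with a subjective, outside-the-data input."""
    return {
        "accuracy": measure_accuracy(values, is_correct),  # objective
        "reputation": reputation_score,                    # subjective
    }

print(assess([1, 2, -3, 4], lambda v: v > 0, reputation_score=0.8))
# {'accuracy': 0.75, 'reputation': 0.8}
```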

  3. IQ assessment is difficult
  • IQ criteria are often subjective in nature
  • Sources do not publish useful IQ metadata
  • Sources take measures to hinder IQ assessment
  • Enormous amount of data, so sampling is needed (see the sketch below)
  • Subject to changes in content and quality
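The "enormous amount of data" point implies estimating criteria from samples rather than exhaustive checks. A minimal sketch, assuming a criterion that can be checked per record (the helper names are hypothetical):

```python
import random

def estimate_accuracy(records, is_correct, sample_size=1000, seed=42):
    """Estimate accuracy from a uniform random sample instead of
    checking the enormous full data set exhaustively."""
    rng = random.Random(seed)
    sample = rng.sample(records, min(sample_size, len(records)))
    return sum(1 for r in sample if is_correct(r)) / len(sample)

# Re-run periodically: sources are subject to changes in content and quality.
```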

  4. Architectural levels relevant to assessment
  • Sources
  • Wrapper
  • Mediated schema
  • Mappings
  • Query decomposition
  • Result composition (process)
  • Integrated result at user/app
  Assumption: soundness and completeness

  5. The big picture
  [Architecture diagram: Sources 1-3 feed Wrappers 1-3, where objective DQ assessment takes place; a mediator with DQ vectors drives data acquisition / workflow; DQ interpretation (also subjective, based on user profiles and DQ requirements) sits between the mediator and Application 1 / User 2, with private DQ interpretation and a DQ feedback loop.]

  6. Assessment independence
  • In analogy to data independence
  • Different applications have different interpretations of DQ
  • Separation of
    • application-independent assessment
    • user/app-dependent interpretation

  7. DQ Assessment
  • Data source
    • Past results (traces)
    • Includes data & metadata
    • The data itself
  • Granularity: subsets / partitions
  • Integration result: mapping, transformation, aggregation
  • Offline
  • Data-oriented (objective in nature?)
  • Defaults
    • We need some ground estimations. Which criteria need them? E.g., the size of the world for completeness
  • Refinement

  8. DQ Interpretation
  • User or application
    • Feedback
    • User profiles
    • Quality requirements
      • Hard: company specification; ISO
      • Soft: user requirements
  • Query / usage
    • Quality requirements
    • Does it select certain subsets of sources?
  • Online
  • User/app-oriented (subjective in nature?)
  • Defaults
    • We definitely need them as a guide and for initialization (bootstrapping)
  • Refinement
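A minimal sketch of how slides 6 and 8 could fit together, assuming a DQ vector keyed by criterion and a user profile of weights (all names and numbers are illustrative): assessment produces one application-independent vector, interpretation weights it per user/app, defaults bootstrap the process, and refinement adjusts the weights from feedback.

```python
# Hypothetical sketch: one app-independent DQ vector, many interpretations.
DEFAULT_PROFILE = {"accuracy": 0.4, "completeness": 0.4, "timeliness": 0.2}

def interpret(dq_vector, profile=None):
    """User/app-dependent interpretation of an app-independent DQ vector.
    Defaults serve as a guide and for initialization (bootstrapping)."""
    weights = profile or DEFAULT_PROFILE
    return sum(w * dq_vector.get(criterion, 0.0)
               for criterion, w in weights.items())

def refine(profile, criterion, delta):
    """Refinement: adjust one weight from user feedback, then renormalize."""
    updated = dict(profile)
    updated[criterion] = max(0.0, updated[criterion] + delta)
    total = sum(updated.values())
    return {c: w / total for c, w in updated.items()}

dq = {"accuracy": 0.9, "completeness": 0.6, "timeliness": 0.5}
print(interpret(dq))  # default profile, ~0.70
print(interpret(dq, refine(DEFAULT_PROFILE, "timeliness", 0.3)))
```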

  9. Private DQ Interpretation
  • Performed by the user
  • Results in certain actions:
    • Exclude a source
    • Invest more money/time
    • Rewrite a query
    • Give up
    • Change parameters
    • Search for new sources
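A hedged sketch of how a private interpretation might trigger the actions listed above; the threshold and the rule itself are invented for illustration.

```python
def choose_action(score, threshold=0.7, budget_left=True):
    """Map an interpreted DQ score to one of slide 9's actions.
    The decision rule is hypothetical."""
    if score >= threshold:
        return "accept result"
    if budget_left:
        return "invest more money/time"  # or: rewrite the query, change parameters
    return "exclude the source"          # or: search for new sources, give up
```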

  10. OLD SLIDE: An assessment-oriented classification
  • Subject criteria: believability, concise representation, interpretability, relevancy, reputation, understandability, value-added
  • Object criteria: accuracy, completeness, customer support, documentation, objectivity, price, reliability, security, timeliness, verifiability
  • Process criteria: amount, availability, consistent representation, latency, response time

  11. OLD SLIDE: How does it fit? Not really…
  • Subject criteria (online; user-centric; subjective): believability, concise representation, interpretability, relevancy, reputation, understandability, value-added
  • Object criteria (offline; data-centric; objective): accuracy, completeness, customer support, documentation, objectivity, price, reliability, security, timeliness, verifiability
  • Process criteria (offline; data-centric; objective): amount, availability, consistent representation, latency, response time

  12. Output of Assessment and Interpretation
  • Numbers (DQ values)
    • In vectors
    • Single values
    • Rankings
    • Units / precision
  • Categories
    • Good/bad, etc.
  • Explanations
  • Trace
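A small sketch deriving the listed output forms, assuming one DQ vector per source; the source names and the good/bad cutoff are invented.

```python
def to_outputs(dq_vectors):
    """Derive slide 12's output forms from per-source DQ vectors:
    single values, a ranking, and coarse categories."""
    singles = {src: sum(vec.values()) / len(vec)        # single DQ value
               for src, vec in dq_vectors.items()}
    ranking = sorted(singles, key=singles.get, reverse=True)
    categories = {src: "good" if val >= 0.8 else "bad"  # coarse category
                  for src, val in singles.items()}
    return singles, ranking, categories

vectors = {"Source1": {"accuracy": 0.9, "completeness": 0.8},
           "Source2": {"accuracy": 0.6, "completeness": 0.7}}
print(to_outputs(vectors))
# singles ~ {'Source1': 0.85, 'Source2': 0.65}, ranking ['Source1', 'Source2'],
# categories {'Source1': 'good', 'Source2': 'bad'}
```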

  13. Doubts
  • Can everything app-specific be done during interpretation?
    • In other words: is a single assessment enough?
    • If not: is a simple parameterization enough?
  • Is MS really doing a good job in automatically naming files?

  14. Comparison: GeneDB (not the same as GeneDB.org) vs. DBLP
  • Dimensions of comparison
    • Input to DQ assessment
    • DQ criteria
    • DQ interpretation
    • Requirements
      • User
      • App

  15. Input to DQ Assessment (Comparison)
  • Willingness and ability to give input
    • GeneDB: ability is often lacking, e.g., schema evolvability
  • Accuracy: noisy data is intrinsic to GeneDB
  • Up-to-dateness
    • Scientific DB: announcements
    • DBLP: unknown (less in summer)
  • Trust and reputation: already here?
  • Completeness
    • Mostly willing
    • Identification of the domain hinders assessment
  • Duplicates
    • GeneDB: unable to assess (or define)
    • DBLP: mentions scripts
  • Obvious: ontology, availability, response time

  16. DQ criteria
  • Criteria for GeneDB
    • Reputation/trust/believability
    • Schema evolvability
    • Ontologies
    • Up-to-dateness (1 week)
    • Lineage
  • Criteria for DBLP
    • Response time
    • Understandability
    • Completeness
    • Schema and data stability
  • Criteria for neither
    • Up-to-dateness (1 day, 1 sec)
    • Availability
  • Criteria for both
    • Completeness
    • Accuracy
    • Duplicates

  17. DQ Interpretation (Comparison)
  • GeneDB
    • Trust is important (Oops)
    • Usage as a source
    • Usage within a workflow
    • Hard requirements
      • Costs money
      • Costs lives
    • Default DQ requirements
  • DBLP
    • Usage as a tool, not as a source
    • Thus, hardly any DQ requirements
    • Costs rejection of a paper
    • Default DQ requirements / assumptions

  18. GeneFlow.com™: High-Quality Integrated Genomic Data, into your face!

  19. IQ Criteria
  • Availability: Percentage of time an information source is "up".
  • Accuracy: Quotient of the number of correct values in the source and the overall number of values in the source.
  • Amount of data: Size of the result.
  • Believability: Degree to which the information is accepted as correct.
  • Completeness: Quotient of the number of response items and the number of real-world items.
  • Concise representation: Degree to which the structure of the information matches the information itself.
  • Consistent representation: Degree to which the structure of the information conforms to that of other sources.
  • Customer support: Amount and usefulness of online support through text, email, phone, etc.
  • Documentation: Amount and usefulness of documents with meta-information.
  • Interpretability: Degree to which the information conforms to the technical ability of the consumer.
  • Latency: Amount of time until the first information reaches the user.

  20. IQ Criteria
  • Objectivity: Degree to which information is unbiased and impartial.
  • Price: Monetary charge per query.
  • Relevancy: Degree to which information satisfies the user's need.
  • Reliability: Degree to which the user can trust the information.
  • Reputation: Degree to which the information or its source is in high standing.
  • Response time: Amount of time until the complete response reaches the user.
  • Security: Degree to which information is passed privately from user to information source and back.
  • Timeliness: Age of the information.
  • Understandability: Degree to which the information can be comprehended by the user.
  • Value-added: Amount of benefit the use of the information provides.
  • Verifiability: Degree and ease with which the information can be checked for correctness.
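The quotient-style criteria defined on slides 19 and 20 translate directly into arithmetic; a worked example with invented numbers:

```python
# Invented example numbers, applying the definitions from slides 19-20.
correct_values, total_values = 940, 1000
accuracy = correct_values / total_values            # 0.94

response_items, real_world_items = 450, 600
completeness = response_items / real_world_items    # 0.75

uptime_hours, total_hours = 719, 720
availability = 100 * uptime_hours / total_hours     # ~99.86 % of time "up"
```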
