
Managing Information Quality in e-Science using Semantic Web technology






Presentation Transcript


  1. www.qurator.org
  Describing the Quality of Curated e-Science Information Resources
  Managing Information Quality in e-Science using Semantic Web technology
  Alun Preece, Binling Jin, Edoardo Pignotti (Department of Computing Science, University of Aberdeen)
  Paolo Missier, Suzanne Embury, Mark Greenwood (School of Computer Science, University of Manchester)
  David Stead, Al Brown (Molecular and Cell Biology, University of Aberdeen)

  2. Information and quality in e-science
  [Diagram: wet-lab and in silico (e.g. workflow-based) e-science experiments exchange data through public biological databases.]
  "How can I decide whether I can trust this data?"
  • Variations in the quality of the data
  • No control over the quality of public data
  • Quality is difficult to measure and assess: there are no standards
  • Scientists are required to place their data in the public domain
  • Scientists use other scientists' experimental results as part of their own work

  3. A concrete scenario
  Qualitative proteomics: identification of proteins in a cell sample.
  [Pipeline: wet-lab steps 1..n produce candidate data for matching (peptide peak lists); an information service ("dry lab") runs a match algorithm against reference DBs (MSDB, NCBI, SwissProt/UniProt) and returns a hit list {ID, Hit Ratio, Mass Coverage, …}.]
  • False negatives: incompleteness of the reference DBs, pessimistic matching
  • False positives: optimistic matching

  4. Quality is personal
  Scientists tend to express their quality requirements for data by giving acceptability criteria. These are personal and vary with the expected use of the data: "What is the right trade-off between false positives and false negatives?"

  5. Requirements for an IQ ontology
  • Establish a common vocabulary: let scientists express quality concepts and criteria in a controlled way, within homogeneous scientific communities
  • Enable navigation and discovery of existing IQ concepts
  • Support sharing and reuse: let users contribute to the ontology while ensuring consistency
  • Achieve cost reduction by making IQ computable in practice: automatically apply acceptability criteria to the data

  6. Quality Indicators
  Quality indicators are measurable quantities that can be used to define acceptability criteria:
  • examples: "Hit Ratio", "Mass Coverage", "ELDP"
  • provided by the matching algorithm alongside each hit-list entry {proteinID, Hit Ratio, Mass Coverage, …}
  There is an experimentally established correlation between these indicators and the probability of a mismatch.
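The hit-list entries with their attached indicators can be pictured as simple records. A minimal Python sketch follows; the field names come from the slide, while the values and the comments glossing each indicator are illustrative assumptions, not data from the deck:

```python
from dataclasses import dataclass

@dataclass
class ProteinHit:
    """One entry of the matcher's hit list, carrying its quality indicators."""
    protein_id: str
    hit_ratio: float      # the "Hit Ratio" indicator from the matcher
    mass_coverage: float  # the "Mass Coverage" indicator from the matcher
    eldp: int             # the "ELDP" indicator from the matcher

# A hypothetical hit list, as the matching algorithm might return it.
hits = [
    ProteinHit("P12345", hit_ratio=0.62, mass_coverage=0.41, eldp=5),
    ProteinHit("Q99999", hit_ratio=0.18, mass_coverage=0.09, eldp=1),
]
```

Each record keeps the indicator values next to the data item they describe, which is exactly what the acceptability criteria on the next slides consume.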

  7. Data acceptability criteria
  • Indicators are used as indirect "clues" to assess quality
  • Quality Assertions (QAs) formally capture these clues as functions of the indicators
  • QAs are data classification or ranking functions, e.g. PIClassifier defined as f(proteinID, Hit Ratio, Mass Coverage, ELDP) → {(proteinID, rank)}
  • This provides a custom ranking of the match results
  • Formalized acceptability criteria are conditions on QAs: accept(proteinID) if PIClassifier(proteinID, …) > X OR …
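A quality assertion of the PIClassifier kind, and an acceptability criterion expressed as a condition on it, might be sketched as follows. The weights and the threshold are invented for illustration; the real classifier was established experimentally and will differ:

```python
def pi_classifier(hit):
    """Toy quality-assertion score: a weighted combination of the indicators.
    `hit` is a dict with keys 'protein_id', 'hit_ratio', 'mass_coverage', 'eldp'."""
    return (0.5 * hit["hit_ratio"]
            + 0.3 * hit["mass_coverage"]
            + 0.2 * min(hit["eldp"] / 10, 1.0))

def accept(hit, threshold=0.4):
    """Acceptability criterion: a condition on the quality assertion."""
    return pi_classifier(hit) > threshold

def rank(hits):
    """Custom ranking of the match results by the assertion score."""
    return sorted(hits, key=pi_classifier, reverse=True)

hits = [
    {"protein_id": "P12345", "hit_ratio": 0.62, "mass_coverage": 0.41, "eldp": 5},
    {"protein_id": "Q99999", "hit_ratio": 0.18, "mass_coverage": 0.09, "eldp": 1},
]
print([h["protein_id"] for h in rank(hits)])  # → ['P12345', 'Q99999']
```

Two scientists with different tolerance for false positives would simply plug in different thresholds (or different classifiers), which is the "quality is personal" point of slide 4.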

  8. IQ ontology backbone
  Property signatures:
  • assertion-based-on-evidence: QualityAssertion → QualityEvidence
  • is-evidence-for: QualityEvidence → DataEntity
  Class restrictions:
  • MassCoverage ⊑ ∃ is-evidence-for . ImprintHitEntry
  • PIScoreClassifier ⊑ ∃ assertion-based-on-evidence . HitScore
  • PIScoreClassifier ⊑ ∃ assertion-based-on-evidence . MassCoverage

  9. Quality properties
  Part of the backbone is a collection of generic quality properties: Consistency, Conformity, Completeness, Conciseness, Timeliness, Currency, Accuracy. Users may add to this collection with user-defined quality properties such as PI-acceptability. How do we ensure consistent specialization?

  10. Specializations of base ontology concepts
  Abstract assertion (informal): "a Quality Property is based upon one or more Quality Indicators for a Data Entity."
  Concrete assertion (informal): "the property Accuracy of Protein Identification is based upon the Hit Ratio indicator for Protein Hit data."
  [Diagram: the proteomics concepts Protein Hit, Hit Ratio, and Accuracy of Protein Identification specialize the base concepts Data Entity, Quality Indicator, and Quality Property.]

  11. Maintaining consistency by reasoning
  • Axiomatic definition for Accuracy: Accuracy ≡ ∃ QtyProperty-from-QtyAssertion . (∃ QA-based-on-evidence . ConfidenceEvidence)
  • From this definition the reasoner infers that the user-defined property PI-acceptability is a (subclass of) Accuracy
  [Diagram: PI-acceptability derives, via QtyProperty-from-QtyAssertion, from the PI-TopK confidence characterization, which is based on evidence (Mass Coverage, Hit Ratio) output by the PIMatch / PMF-Match Ranking functions.]

  12. Computing quality in practice
  Goal: make the quality assertions defined in the ontology computable in practice.
  • Annotation model: representation of indicator values as semantic annotations (model: an RDF schema; annotation instances: RDF metadata)
  • Binding model: representation of the mappings from data ontology classes → data resources and from function ontology classes → service resources
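The annotation model attaches indicator values to data items as RDF-style metadata. A stdlib-only sketch using (subject, predicate, object) triples follows; the namespace URIs and item names are invented placeholders, not the actual Qurator RDF schema:

```python
# Hypothetical namespace prefixes; the real Qurator schema will differ.
QU = "http://qurator.example/schema#"
DATA = "http://qurator.example/data#"

def annotate(graph, data_item, indicator, value):
    """Record one quality annotation as an RDF-style triple."""
    graph.add((DATA + data_item, QU + indicator, value))

graph = set()
annotate(graph, "hit/P12345", "hitRatio", 0.62)
annotate(graph, "hit/P12345", "massCoverage", 0.41)

# Look up all indicator values annotated onto one data item.
values = {p.split("#")[1]: o for (s, p, o) in graph
          if s == DATA + "hit/P12345"}
```

Because annotations are ordinary triples, a tool that understands the RDF schema can retrieve the indicator values for any data item and feed them to the quality-assertion functions, without knowing anything about proteomics.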

  13. Data resource annotations
  Resources are data items at various granularities; annotations map each data item → its indicator values.

  14. Data resource bindings
  • Data class → data resource
  • Accounts for different granularities and data types

  15. Service resource bindings
  • Function class → (Web) service implementation
  • E.g. annotation functions, QA functions
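The two binding models above can be pictured as lookup tables: one from data-ontology classes to concrete data resources (with their granularity), and one from function-ontology classes to service endpoints. A sketch follows; all resource names and URLs are invented for illustration:

```python
# Hypothetical bindings; class names, resources, and endpoints are illustrative.
data_bindings = {
    "ImprintHitEntry": {"resource": "proteomics_db.hits", "granularity": "row"},
    "PeakList":        {"resource": "proteomics_db.peaks", "granularity": "file"},
}
service_bindings = {
    "PIScoreClassifier":  "http://services.example/qa/pi-classifier",
    "AnnotationFunction": "http://services.example/annotate",
}

def resolve(kind, cls):
    """Map an ontology class to the (data or service) resource bound to it."""
    table = data_bindings if kind == "data" else service_bindings
    return table[cls]
```

With such tables in place, a quality assertion named in the ontology can be executed by resolving its class to a service, and its inputs by resolving the data classes it mentions, which is what makes the ontology "computable in practice".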

  16. The complete quality model

  17. IQ Service Example

  18. Summary
  • An extensible OWL DL ontology for Information Quality; consistency is maintained using DL reasoning
  • Used by e-scientists to share and reuse quality indicators and metrics, and formal criteria for data acceptability
  • Annotation model: a generic schema for associating quality metadata with data resources
  • Binding model: a generic schema for mapping ontology concepts to (data, service) resources
  • The model was tested on data from proteomics experiments
