This presentation by Peter Mork from the University of Washington delves into the infrastructure needed for peer-based knowledge sharing in bioinformatics. It explores the integration of systems, from data to knowledge, with a focus on metadata management and the transition from local to peer sharing. Key topics include declarative versus descriptive mappings, evaluation of configurations, and the necessity of comprehensive metadata. The presentation also outlines experimental setups, successful query results, and insights on effective mapping strategies to enhance collaboration and knowledge retrieval in the field.
Infrastructure for Peer-Based Knowledge Sharing Peter Mork, University of Washington, Seattle 21-Sep-14
Motivating Example Microarray Experiment Information from public databases ?? ICAT Experiment
Outline • Integration Systems • From Data to Knowledge (Metadata) • Metadata Management • From Local to Peer • Evaluation • Declarative vs. Descriptive Mappings • Complete vs. Minimal Configurations • Conclusions
Overview of Integration Systems + Schema + Mappings + Annotations Source API
Sources: OMIM, HUGO, Swiss-Prot, GO, GeneClinics, LocusLink, Entrez, GEO Mediated Schema: Entity, Sequenceable Entity, Structured Vocabulary, Experiment, Phenotype, Gene, Nucleotide Sequence, Microarray Experiment, Protein
BioMediator • Phenotype • Maintenance: Push, Limited Journal Pull; Validation: Internal; Creation: Human • Maintenance: Push, Yearly Expert Review; Validation: External; Creation: Human • Maintenance: Push; Validation: None; Creation: Human, Algorithm • Sources: OMIM, GeneClinics, Entrez
Demo • Start with 6 Proteins and 6 Sequences • Find simple correspondences • Find biologically relevant clusters
Necessary Metadata • Class Hierarchy • Concepts (e.g., Protein, Gene) • Property Hierarchy • Relationships (e.g., codes-for, causes) • Mappings • Source schema → Mediated schema • Mapping Annotations • Information about maintenance and authority
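The four kinds of metadata listed above can be sketched as a small data structure. This is only an illustration, not BioMediator's actual representation: the class names (`Metadata`, `Mapping`, `MappingAnnotation`), the parent/child links in the class hierarchy, the `related-to` parent property, and the annotation values are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class MappingAnnotation:
    """Maintenance and authority information attached to one mapping."""
    maintenance: str
    validation: str
    creation: str


@dataclass
class Mapping:
    """Links one source-schema element to one mediated-schema class."""
    source: str
    source_element: str
    mediated_class: str
    annotation: MappingAnnotation


@dataclass
class Metadata:
    class_parent: dict = field(default_factory=dict)     # class -> superclass
    property_parent: dict = field(default_factory=dict)  # property -> superproperty
    mappings: list = field(default_factory=list)

    def ancestors(self, cls):
        """Return cls followed by its superclasses, most specific first."""
        chain = [cls]
        while chain[-1] in self.class_parent:
            chain.append(self.class_parent[chain[-1]])
        return chain

    def sources_for(self, cls):
        """Sources that map to cls or to any of its subclasses."""
        return sorted({m.source for m in self.mappings
                       if cls in self.ancestors(m.mediated_class)})


# Illustrative values only; the slides do not assign these to any mapping.
note = MappingAnnotation(maintenance="Push", validation="None", creation="Human")

md = Metadata(
    # Assumed hierarchy: the slides name these classes but not their links.
    class_parent={
        "Sequenceable Entity": "Entity",
        "Phenotype": "Entity",
        "Gene": "Sequenceable Entity",
        "Protein": "Sequenceable Entity",
    },
    # "related-to" is a hypothetical parent property.
    property_parent={"codes-for": "related-to", "causes": "related-to"},
    mappings=[
        Mapping("OMIM", "Record", "Phenotype", note),
        Mapping("Entrez", "Protein", "Protein", note),
    ],
)

print(md.sources_for("Entity"))
print(md.sources_for("Sequenceable Entity"))
```

Asking for a general class such as Entity returns every source mapped anywhere beneath it, which is the practical reason to keep the class hierarchy alongside the mappings.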
Schema 3 Entity Schema 1 Schema 2 Sequenceable Entity Structured Vocabulary Experiment Phenotype Gene Nucleotide Sequence Microarray Experiment Protein OMIM HUGO Swiss-Prot GO GeneClinics LocusLink Entrez GEO
Centralized Metadata Mgmt Entity GeneClinics Sequenceable Entity Phenotype Gene OMIM Nucleotide Sequence Entrez Protein LocusLink
Declarative Peer Metadata Mgmt GeneClinics: Phenotype Gene Protein OMIM: Record Q3 Q2 Gene Record Entrez: Protein Nucleotide Seq. LocusLink: Phenotype Gene Protein Equivalent Q1
Superset Descriptive Peer Metadata Mgmt • OMIM_Record = Phenotype ⊔ Gene • Domain(AssociatedWith) = NucleotideSequence ⊔ Gene ⊓
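The two descriptive axioms on this slide can be read extensionally, with ⊔ as the union of instance sets. The sketch below assumes that reading and uses invented instance identifiers. It also applies the domain axiom as a closed-world check, which is a simplification: an OWL reasoner would instead infer that the subject of every AssociatedWith assertion belongs to the domain class.

```python
# Hypothetical instance sets; the identifiers are invented for illustration.
phenotype = {"p1", "p2"}
gene = {"g1", "g2"}
nucleotide_sequence = {"n1"}

# OMIM_Record = Phenotype ⊔ Gene, read as a union of instance sets.
omim_record = phenotype | gene

# Domain(AssociatedWith) = NucleotideSequence ⊔ Gene
associated_with_domain = nucleotide_sequence | gene


def outside_domain(assertions, domain):
    """Return the subjects of (subject, object) pairs that are not in domain."""
    return sorted({subject for subject, _ in assertions if subject not in domain})


# Two made-up AssociatedWith assertions; the second has a Phenotype subject.
assertions = [("n1", "p1"), ("p2", "g1")]
bad = outside_domain(assertions, associated_with_domain)
print(bad)
```

Here `p2` is flagged because it is a Phenotype, which the domain axiom does not cover.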
Experimental Setup • Centralized BioMediator = Gold Standard • Mapping Languages • PPL: Declarative • OWL: Descriptive • Peer Architectures • Complete • Minimal
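The slides do not define the two peer architectures. The sketch below assumes that "complete" means a mapping between every pair of peers and "minimal" means just enough mappings to keep all peers connected (here, a simple chain). It also assumes mappings can be followed in both directions. Under those assumptions it compares how many mappings each configuration needs and which peers a query can reach.

```python
from collections import deque
from itertools import combinations

# Peer names taken from the slides; the topology below is assumed.
peers = ["GeneClinics", "OMIM", "Entrez", "LocusLink"]


def complete_config(peers):
    """One mapping between every pair of peers."""
    return set(combinations(peers, 2))


def minimal_config(peers):
    """A chain: each peer is mapped only to the next one in the list."""
    return set(zip(peers, peers[1:]))


def reachable(start, mappings):
    """Peers reachable from start by following mappings in either direction."""
    neighbours = {}
    for a, b in mappings:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in neighbours.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


for name, config in [("complete", complete_config(peers)),
                     ("minimal", minimal_config(peers))]:
    print(name, len(config), sorted(reachable("GeneClinics", config)))
```

With n peers the complete configuration needs n(n-1)/2 mappings and the chain needs n-1, yet both reach every peer, so the extra mappings add no sources in this simplified model.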
Conclusions • More sources accessible • More power per mapping • Additional ‘redundant’ mappings provide little benefit • Less work maintaining mappings • Hidden cost: Logical mappings harder to write correctly • May interact in unforeseen ways
Acknowledgements • Funding • NLM training grant T15LM07442 • NHGRI grant R01HG02288 • BioMediator Team • Advisors • Alon Halevy • Peter Tarczy-Hornoch • Wendy Kramer (grant administrator)