Biomedical Informatics Group (UPM)


Presentation Transcript


  1. Biomedical Informatics Group (UPM). WP4: Data Interoperability and Management. Presentation: UPM

  2. List of Contents • Deliverable D25: “First report on Data Interoperability and Management” • Recent results • Plans for the future, until the end of the NoE

  3. Contents of Deliverable 25 • Introduction • Integrating Approaches • OntoFusion Plus • OntoDataClean • HIV database integration (Informa) • PML • Ethics and Confidentiality Issues • References

  4. Original OntoFusion • Ontology-based data integration system • Schema-level integration • Each information source is associated with a “Virtual Schema” (an ontology that represents its conceptual structure) • “Virtual Schemas” are obtained through a mapping process between physical structures and a domain ontology • An automatic unification process merges several “Virtual Schemas” • Ontologies are represented in DAML+OIL
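
The mapping and unification steps on slide 4 can be illustrated with a minimal sketch. It is not the OntoFusion implementation: the domain concepts, source names, and field names are hypothetical, and a virtual schema is reduced to a dictionary from domain concept to the physical fields that provide it.

```python
from collections import defaultdict

# Hypothetical domain ontology, reduced to concept names (not from the slides).
DOMAIN_CONCEPTS = {"Gene", "Protein", "Disease"}

def build_virtual_schema(source_name, field_to_concept):
    """Map a source's physical fields to domain concepts, rejecting unknown concepts."""
    unknown = set(field_to_concept.values()) - DOMAIN_CONCEPTS
    if unknown:
        raise ValueError(f"{source_name}: not in domain ontology: {sorted(unknown)}")
    schema = defaultdict(list)
    for field, concept in field_to_concept.items():
        schema[concept].append((source_name, field))
    return dict(schema)

def unify(*virtual_schemas):
    """Merge virtual schemas: each concept lists every (source, field) that provides it."""
    unified = defaultdict(list)
    for schema in virtual_schemas:
        for concept, locations in schema.items():
            unified[concept].extend(locations)
    return {concept: sorted(locs) for concept, locs in unified.items()}

# Two hypothetical sources mapped against the same domain ontology.
vs_a = build_virtual_schema("db_a", {"gene_sym": "Gene", "dis_name": "Disease"})
vs_b = build_virtual_schema("db_b", {"symbol": "Gene", "prot_id": "Protein"})
unified_schema = unify(vs_a, vs_b)
```

Because both sources are mapped to the same domain ontology, unification in this sketch is a merge keyed on concept name; a query for `Gene` can then be routed to both `db_a.gene_sym` and `db_b.symbol`.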

  5. OntoFusion Plus • New version developed for INFOBIOMED • Agent-based architecture has been migrated to a Web Service oriented architecture • The knowledge representation language has been updated to OWL (Web Ontology Language)

  6. Original OntoFusion approach: agent-based architecture [Figure: architecture diagram. Labels: Web Client, Application Server, JADE Agent Platform, Registry, Search, Request, Results, BDV, JDBC, BD]

  7. OntoFusion Plus architecture

  8. OntoDataClean • Instance-level integration • Support for KDD processes, focused on automatic preprocessing of data before data mining algorithms are applied • Use of an ontology to eliminate or resolve data inconsistencies • Terminology • Scale • Range • Format • Missing values • …
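
As a rough illustration of the scale, range, and missing-value cases on slide 8, the sketch below applies a declarative cleaning rule to one column. It does not use OntoDataClean or an ontology: the rule dictionary, the column name, the Fahrenheit assumption, and the sample values are all hypothetical.

```python
from statistics import mean

# Hypothetical cleaning model for a single column (illustrative values only).
RULES = {
    "temperature_c": {
        "missing_markers": {"", "NA", "?"},   # raw values treated as missing
        "scale": lambda f: (f - 32) * 5 / 9,  # source assumed to store Fahrenheit
        "valid_range": (30.0, 45.0),          # plausible range after rescaling
    }
}

def clean_column(name, raw_values):
    """Rescale values, flag out-of-range ones as missing, then fill with the average."""
    rule = RULES[name]
    low, high = rule["valid_range"]
    converted = []
    for raw in raw_values:
        if raw in rule["missing_markers"]:
            converted.append(None)
            continue
        value = rule["scale"](float(raw))
        converted.append(value if low <= value <= high else None)
    known = [v for v in converted if v is not None]
    fill = mean(known) if known else None  # average replacement for missing values
    result = []
    for v in converted:
        v = fill if v is None else v
        result.append(None if v is None else round(v, 1))
    return result

cleaned = clean_column("temperature_c", ["98.6", "NA", "100.4", "500"])
```

Keeping the rule as data rather than code is the point of the sketch: the same `clean_column` function can serve any column whose rule is declared, which loosely mirrors driving preprocessing from a cleaning model.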

  9. KDD and Ontologies [Figure: KDD process diagram. Stages: Selection, Preprocessing, Transformation, Data Mining, Interpretation. Data: Data Warehouse, Target data, Preprocessed data, Transformed data, Patterns, Knowledge. Supporting resources: Biomedical ontologies, Methodological ontologies]

  10. OntoDataClean: OntoFusion & OntoDataClean Web Services Platform [Figure: architecture diagram. Labels: Web Client, HTTP, Web Server, User Service, VS Service (three instances), Results]

  11. OntoDataClean [Figure: Fig. 2. OntoDataClean Preprocessing Ontology. Concept labels include, among others: Data Source, Cleaning Model, Pattern, Missing Values, Duplicate, Format, Scale, Rule, Synonym, Data Type, Regular Expression, Condition, Detection, Transformation, Replacement, String Replacement, Preferred Name, Representative Values, Value Ranges, Row Removal]

  12. OntoDataClean • Experiments carried out with three different public databases, selected because their contents can be downloaded to a local machine: • Reactome • Gepas – Fibroblast • BioMérieux

  13. OntoDataClean Experiments (I) • BIOMERIEUX: Biochemical characterization of bacteriological agents • Pattern transformation http://www.biomerieux.com

  14. Preprocessing ontology for BioMérieux (biochemical profiles) experiments, implemented using Protégé

  15. An example of BioMérieux data transformation using OntoDataClean • Id – Test identifier • Results – Biochemical profiles using binary codification • Id’ – Test identifier • Results’ – Biochemical profiles using decimal codification
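
The slide does not state how the binary codification maps to the decimal one. The sketch below assumes the simplest reading, in which each profile is a string of 0/1 test outcomes interpreted as a base-2 number; the row identifiers and profiles are hypothetical.

```python
def binary_profile_to_decimal(profile: str) -> int:
    """Read a string of 0/1 test outcomes as a base-2 number (assumed encoding)."""
    if not profile or set(profile) - {"0", "1"}:
        raise ValueError(f"not a binary profile: {profile!r}")
    return int(profile, 2)

# Hypothetical rows: (Id, Results) becomes (Id', Results').
rows = [("T1", "0101"), ("T2", "1111"), ("T3", "0000")]
transformed = [(test_id, binary_profile_to_decimal(results)) for test_id, results in rows]
```

If the real encoding groups outcomes into fixed-size blocks before conversion, only `binary_profile_to_decimal` would need to change.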

  16. OntoDataClean Experiments (II) • REACTOME: A knowledge base of biological pathways • Issues addressed: terminological inconsistencies (resolved using the UMLS), complex pattern modifications on string data containing URLs, removal of duplicate values, synonym substitutions, and missing-value transformations http://www.reactome.org
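
Two of the operations listed for Reactome, synonym substitution and duplicate removal, can be sketched as follows. The synonym table is a hypothetical stand-in for a terminology lookup; the slides name the UMLS as the source, but no UMLS service is called here, and the sample terms are illustrative.

```python
# Hypothetical synonym table mapping variants to a preferred name.
PREFERRED_NAME = {
    "tumour": "neoplasm",
    "tumor": "neoplasm",
}

def normalize_term(term: str) -> str:
    """Trim and lowercase a term, then replace it with its preferred name if known."""
    key = term.strip().lower()
    return PREFERRED_NAME.get(key, key)

def clean_terms(terms):
    """Apply synonym substitution, then drop duplicates while keeping first-seen order."""
    seen = set()
    result = []
    for term in terms:
        normalized = normalize_term(term)
        if normalized not in seen:
            seen.add(normalized)
            result.append(normalized)
    return result

cleaned_terms = clean_terms(["Tumor", "apoptosis", "tumour ", "Apoptosis"])
```

Substituting synonyms before removing duplicates matters: "Tumor" and "tumour" only collapse into one entry once both have been mapped to the same preferred name.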

  17. OntoDataClean Experiments (III) • GEPAS: The Gene Expression Pattern Analysis Suite • Integrated web-based pipeline for the analysis of gene expression patterns • Scale transformations http://www.gepas.org

  18. OntoDataClean • Paper accepted and presented at the VII International Symposium on Biological and Medical Data Analysis (ISBMDA 2006) in Thessaloniki (Greece), December 7th-8th, 2006

  19. 1 paper presented • A plenary session dedicated to INFOBIOMED, with emphasis on technological tools (WP4 and WP5)

  20. Just Published: Alonso-Calvo R, Maojo V, Billhardt H, Martin-Sanchez F, Garcia-Remesal M, Perez-Rey D. An agent- and ontology-based system for integrating public gene, protein, and disease databases. J Biomed Inform. 2007 Feb;40(1):17-29.

  21. Current efforts at UPM • To automate the mapping process • To integrate OntoFusion Plus and OntoDataClean • A new interface for OntoFusion. 2 approaches: • Search based on free text • To integrate the Mobility Brokerage Service with OntoFusion • Additional work, based on Grid Services, carried out for the ACGT project (Grid services orchestration, implementation of a new Cancer ontology, ontology-based information retrieval, a semantic mediator to design workflows using different bioinformatics tools)

  22. Currently pending: HIV distributed databases • Proposal for a demo on integrating viral genomics with clinical data in HIV infections • Ontology-based integration of databases

  23. Sent on November 2nd, 2006

  24. HIV proposal • Use OntoFusion to integrate the databases • Establishing mappings among the databases involved, using the ARCA database as reference • Pending feedback about: • Final structures to be integrated, if changes are made according to the suggestions proposed • Type of access to databases • Kind of results expected from the integrated databases

  25. Original Planning for HIV Mini-Pilot • June 26th: Meeting in Madrid • 07/07/2006 – 01/10/2006: Analysis of DBs • 01/10/2006 – 01/11/2006: Discussion • 01/11/2006 – 01/03/2007: Implementation • 01/03/2007 – 30/06/2007: Testing

  26. PML (Anthony Brookes) • Polymorphism Markup Language • Definition of a data model for phenotype data, based on ‘Entity-Attribute-Value’ triplet concept
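
The Entity-Attribute-Value idea mentioned on slide 26 can be shown with a small generic sketch. It is not the PML data model itself: the `Triplet` type, the subject identifiers, and the attribute names are hypothetical.

```python
from typing import NamedTuple

class Triplet(NamedTuple):
    """One observation: which entity, which attribute, and what value."""
    entity: str
    attribute: str
    value: str

# Hypothetical phenotype observations (identifiers and values are illustrative).
triplets = [
    Triplet("subject_1", "height_cm", "172"),
    Triplet("subject_1", "smoker", "no"),
    Triplet("subject_2", "height_cm", "165"),
]

def pivot(rows):
    """Group triplets into one attribute dictionary per entity."""
    records = {}
    for row in rows:
        records.setdefault(row.entity, {})[row.attribute] = row.value
    return records

records = pivot(triplets)
```

The appeal of the triplet layout for phenotype data is that a new attribute needs no schema change and attributes not recorded for a subject simply have no row, as with `smoker` for `subject_2` here.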

  27. PML (Anthony Brookes) • Work on two other database projects: • GenoScore. A self-contained database application for storing genotype data and clinical material details relevant to the activities of typical laboratories involved in genetic association studies. • Human Genome Variation Database – Genotype-to-Phenotype (HGVbase-G2P). A public ‘central database’ for genetic association data (summary-level information) generated by the community.

  28. Ethics and Confidentiality Issues (Custodix) • Overview and Analysis of Existing Techniques • Encrypted Storage on Untrusted Servers • Related research topics • Commercial encrypted storage solutions • Privacy enhanced searching • The Privacy Enhanced Storage (PES) Framework • Scope • PES design considerations • PES framework • Implementation • Known limitations and future work

  29. Planning for Deliverable 25 [Figure: timeline. Labels: 24/01/07; 1st Schema Draft sent; Today; End February - 1st week March; D25 Draft for Internal Review; M39; Deadline for D25]
