
Proteome data integration characteristics and challenges


Presentation Transcript


  1. Proteome data integration: characteristics and challenges
     K. Belhajjame (1), R. Cote (4), S.M. Embury (1), H. Fan (2), C. Goble (1), H. Hermjakob, S.J. Hubbard (1), D. Jones (3), P. Jones (4), N. Martin (2), S. Oliver (1), C. Orengo (3), N.W. Paton (1), M. Pentony (3), A. Poulovassilis (2), J. Siepen, R.D. Stevens (1), C. Taylor (4), L. Zamboulis (2), and W. Zhu (4)
     (1) University of Manchester; (2) Birkbeck College; (3) University College London; (4) European Bioinformatics Institute

  2. Outline
     • Experimental proteomics
     • ISPIDER architecture
     • Example use cases
     • Conclusion
     All Hands Meetings, 2005

  3. Experimental proteomics
     • An essential component for elucidating the biological functions of proteins
     • The study of the set of proteins produced by an organism, with the aim of understanding their behaviour under varying conditions
     [Figure: Separation (2D gel electrophoresis) → Protein digestion (enzymatic digestion) → Mass spectrometry (MALDI-TOF) → Identification (protein DB) → Protein ID]
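The digestion step in the pipeline above can be mimicked in silico. The following is a minimal sketch, assuming trypsin as the enzyme (the slide only says "enzymatic digestion") and the commonly used rule that trypsin cleaves after K or R unless the next residue is P. The function name and the example sequence are illustrative only.

```python
import re


def tryptic_digest(sequence: str) -> list[str]:
    """Split a protein sequence into peptides using a simplified trypsin rule.

    Assumption: cleave after K or R, except when the following residue is P.
    Missed cleavages and modifications are ignored in this sketch.
    """
    # Zero-width split: the lookbehind requires K/R, the lookahead forbids P.
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    # A cleavage site at the very end yields an empty trailing piece; drop it.
    return [p for p in pieces if p]


# Illustrative sequence, not taken from any real database entry.
peptides = tryptic_digest("MKTAYIAKQRPLK")
print(peptides)
```

The resulting peptides are what a mass spectrometry search would match against peptides derived from a protein sequence database.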

  4. Experimental proteomics
     • Development of new technologies for:
        • protein separation (2D-SDS-PAGE, HPLC, capillary electrophoresis)
        • mass spectrometry (multi-dimensional protein identification)
     • Availability of publicly accessible protein sequence databases
     • Proteomics databases (PedroDB, gpmDB, PepSeeker, PRIDE, …)
     Building experiments therefore involves orchestrating analysis services and processing and integrating data

  5. Objectives of ISPIDER
     A Grid dedicated to the creation of bioinformatics experiments for proteomics
     • Develop proteome databases and services, or make existing ones Grid-enabled
     • Develop middleware support for developing and executing new proteome analyses, based on distributed query processing and workflow technologies
     • Undertake proteomic studies that demonstrate the effectiveness of the resulting infrastructure

  6. Outline
     • Experimental proteomics
     • ISPIDER architecture
     • Example use cases
     • Conclusion and future directions

  7. [Figure: ISPIDER architecture, layered diagram]
     • ISPIDER Proteomics Clients: Vanilla Query Client; 2D Gel Visualisation Client; PPI Validation + Analysis Client; Protein ID Client; "+ Phosph. Extensions"; "+ Aspergil. Extensions"
     • ISPIDER Proteomics Grid Infrastructure: Web services; Source Selection Services; Data Cleaning Services; Proteome Request Handler; Proteomic Ontologies/Vocabularies; Instance Ident/Mapping Services
     • Existing E-Science Infrastructure: myGrid Ontology Services; myGrid DQP; myGrid Workflows; AutoMed
     • Public Proteomics Resources, each behind a WS: PRIDE, PF, GS, TR, PS, FA, PPI, Phos, PID, PEDRo
     • Legend: ISPIDER Resources; Existing Resources
     KEY: WS = Web services, GS = genome sequence, TR = transcriptomic data, PS = protein structure, PF = protein family, FA = functional annotation, PPI = protein-protein interaction data, WP = Work Package

  8. Outline
     • Experimental proteomics
     • ISPIDER architecture
     • Example use cases
     • Conclusion and future directions

  9. Value-added protein datasets
     • Motivation: protein identification experiments are usually used as input into further analysis processes
        • Gathering evidence for a biological hypothesis
        • Suggesting new hypotheses
     • Objective: augment the identification results with additional information on the identified protein
     • Implementation: Taverna workflow system

  10. Value-added protein datasets
      [Figure: workflow combining the PepMapper Web Service, Auxiliary Services and GO Services]
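The augmentation idea on this slide can be sketched outside a workflow system. The example below is a minimal illustration, assuming identification results are available as accession/score pairs and that GO annotations can be looked up by accession. The `Identification` class, the `annotate` function and all data values are hypothetical; they do not reproduce the PepMapper or GO service interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Identification:
    """One identified protein (hypothetical record layout)."""
    accession: str
    score: float
    go_terms: list[str] = field(default_factory=list)


def annotate(identifications, go_lookup):
    """Return new records enriched with GO terms; unknown accessions get none."""
    return [
        Identification(i.accession, i.score, sorted(go_lookup.get(i.accession, [])))
        for i in identifications
    ]


# Illustrative data only: accessions, scores and annotations are made up.
hits = [Identification("P12345", 87.5), Identification("Q99999", 42.0)]
go_lookup = {"P12345": {"GO:0005515", "GO:0003824"}}

annotated = annotate(hits, go_lookup)
for record in annotated:
    print(record.accession, record.score, record.go_terms)
```

In a real workflow the lookup table would be replaced by calls to annotation services, with the join on accession left unchanged.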

  11. Genome-focused protein identification
      • Motivation: protein identification searches are currently performed over large data sets. This means fewer false negatives, but false positives are also more likely.
      • Objective: more focused, and thus more efficient, protein identification
      • Implementation:
         • Taverna workflow system
         • DQP, a service-based query processor

  12. Genome-focused protein identification
      Query: select p.Name, p.Seq from p in db_proteinSequences where p.OS='HomoSapiens';
      [Figure: workflow combining the PepMapper web service, DQP Web Service, GOA Web Service and IPI]
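The effect of the query on this slide is to restrict the search space to one organism before identification. A minimal Python sketch of the same selection follows, assuming the sequence collection is an in-memory list of records with `Name`, `Seq` and `OS` fields as in the query. The records and the function name are invented for illustration; this is not how DQP evaluates the query.

```python
def select_by_organism(db_protein_sequences, organism):
    """Mirror of: select p.Name, p.Seq from p in db where p.OS = organism."""
    return [
        (p["Name"], p["Seq"])
        for p in db_protein_sequences
        if p["OS"] == organism
    ]


# Illustrative records only; names and sequences are made up.
db_protein_sequences = [
    {"Name": "protA", "Seq": "MKTAYIAK", "OS": "HomoSapiens"},
    {"Name": "protB", "Seq": "MSLLNK", "OS": "MusMusculus"},
    {"Name": "protC", "Seq": "MQRPLK", "OS": "HomoSapiens"},
]

human = select_by_organism(db_protein_sequences, "HomoSapiens")
print(human)
```

Searching only the filtered subset is what makes the identification "genome-focused": fewer candidate sequences means fewer chances of a spurious match.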

  13. Integrated access to proteome databases
      • Motivation: the ability to analyse existing proteomics results en masse is limited by the heterogeneity of the schemas of the different databases
      • Objective: provide integrated access to proteome databases through a common schema
      • Implementation:
         • AutoMed, a framework for mapping heterogeneous schemata
         • DQP, a service-based query processor

  14. Integrated access to proteome databases
      [Figure: a user query goes to the AutoMed Query Processor, which uses the AutoMed Repository and AutoMed Wrappers; the AutoMed DQP Wrapper sends an OQL query to the OGSA Distributed Query Processor and receives an OQL result; the processor reaches gpmDB, PedroDB and PRIDE through one OGSA-DAI Activity each; the result is returned to the user]
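The common-schema idea can be illustrated with a small sketch. It assumes each source exposes records under its own field names and that a per-source mapping to shared field names is known. The source names, field names and records below are hypothetical placeholders; they are not the real schemas of gpmDB, PedroDB or PRIDE, and the code does not use the AutoMed or OGSA-DAI APIs.

```python
# Hypothetical per-source mappings: local field name -> common field name.
FIELD_MAPS = {
    "source_a": {"acc": "accession", "pep_seq": "peptide"},
    "source_b": {"protein_id": "accession", "sequence": "peptide"},
}


def to_common(source, record):
    """Rename a source record's fields into the common schema."""
    mapping = FIELD_MAPS[source]
    return {common: record[local] for local, common in mapping.items()}


def query_all(sources, predicate):
    """Evaluate one predicate, written against the common schema, over all sources."""
    results = []
    for name, records in sources.items():
        for record in records:
            unified = to_common(name, record)
            if predicate(unified):
                results.append({**unified, "source": name})
    return results


# Illustrative data only.
sources = {
    "source_a": [{"acc": "P12345", "pep_seq": "TAYIAK"}],
    "source_b": [
        {"protein_id": "P12345", "sequence": "QRPLK"},
        {"protein_id": "Q99999", "sequence": "MSLLNK"},
    ],
}

rows = query_all(sources, lambda r: r["accession"] == "P12345")
print(rows)
```

The user writes the predicate once against the common field names; the per-source mappings absorb the schema differences.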

  15. Conclusions
      • Available e-science technologies provide rapid prototyping facilities for bioinformatics analyses
      • Combining such technologies is possible and opens up more possibilities
         • Taverna + DQP
         • AutoMed + DQP
      • Writing custom code is usually required
         • Processing service output to extract inputs for the following services
         • Transforming results between data formats
         • Dealing with mismatches between identifiers
      • Developing a user-guided environment for the detection and resolution of mismatches
      • Development of proteomics client applications (PepMapper, PepSeeker and PRIDE)
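The kind of custom glue code mentioned above for identifier mismatches might look like the following sketch. It assumes that mismatches come from whitespace, letter case and version suffixes, and that a cross-reference table between identifier schemes is available. The `resolve` function, the table and every identifier in it are made up for illustration.

```python
def resolve(raw_ids, xref):
    """Map raw identifiers to target accessions via a cross-reference table.

    Returns (resolved, unresolved): resolved maps each raw identifier to its
    target accession; unresolved lists identifiers left for manual review.
    """
    resolved, unresolved = {}, []
    for raw in raw_ids:
        # Normalise: trim whitespace, upper-case, drop a ".N" version suffix.
        key = raw.strip().upper().split(".")[0]
        if key in xref:
            resolved[raw] = xref[key]
        else:
            unresolved.append(raw)
    return resolved, unresolved


# Illustrative cross-references only; these pairings are not real.
xref = {"IPI00000001": "P12345", "IPI00000002": "Q99999"}

resolved, unresolved = resolve(
    ["IPI00000001.1", " ipi00000002 ", "IPI00000003.2"], xref
)
print(resolved)
print(unresolved)
```

Returning the unresolved identifiers separately, rather than dropping them, is what would let a user-guided environment present them for manual resolution.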
