
European School of Bioinformatics

European School of Bioinformatics. Data integration with EnCORE. Florian Reisinger (florian@ebi.ac.uk). Topics: background; problems working with web resources; infrastructure; existing EnCORE services; communication model (sync / async); enXml and example data; service combination (workflows).


Presentation Transcript


  1. European School of Bioinformatics. Data integration with EnCORE. Florian Reisinger, florian@ebi.ac.uk

  2. Overview: background; problems working with web resources; infrastructure; existing EnCORE services; communication model (sync / async); enXml + example data; service combination (workflows); how to use the system; user interfaces

  3. Enfin

  4. Infrastructure. Shallow integration: easy addition of resources; independent resources; minimal centralisation; easier to maintain; very flexible. Common service interface: established standards; well-defined schema.

  5. Common problems. Different ways to access data: human interfaces (web-page-based forms, clients, queries) and programmatic interfaces (SOAP, REST, APIs, XML, text, …). Different programming languages: Perl, Java, C#/C++, … Different data models: e.g. a sequence as FASTA or as a plain string; various ways to model proteins, genes, … Multiple identifiers for one biological entity (UniProt, IPI, …).
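As a small illustration of the "different data models" problem, the sketch below reduces a sequence delivered either as FASTA or as a plain string to one canonical form. It is only a minimal sketch: the function name `normalise_sequence` and the toy sequence are made up for this example and are not part of EnCORE.

```python
def normalise_sequence(text):
    """Return the bare residue string from FASTA-formatted or plain sequence text."""
    lines = [line.strip() for line in text.strip().splitlines()]
    # Drop FASTA header lines (they start with '>') and any blank lines.
    residues = [line for line in lines if line and not line.startswith(">")]
    return "".join(residues).upper()


# Made-up toy sequence, delivered in two different shapes by two imaginary resources.
as_fasta = ">toy_protein hypothetical example\nMKTAY\nIAKQR\n"
as_plain = "mktayiakqr"

print(normalise_sequence(as_fasta) == normalise_sequence(as_plain))
```

A client that talks to several resources directly needs one such adapter per format, which is the overhead a common exchange format is meant to remove.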

  6. Diverse web service world. [Diagram: database access and analysis tools offered by several external services, reached via SOAP, XML, REST, CSV, plain text, Perl API and Java API.] • Multiple manual connections, possibly with multiple technologies • Multiple result files that have to be combined manually • Difficult to keep an audit trail • Much work to reproduce

  7. EnCORE. [Diagram: external services in the heterogeneous external world are accessed through EnCORE services in the standardised EnCORE world, which exchange Enfin XML; EnVISION provides the user interface and representation.] • Single entry point • One technology • No manual combination of results • Audit trail built in • Visualisation built in • Easy to reproduce

  8. Web Service Interface: WSDL (Web Services Description Language). Similar to a Java interface, it represents a contract between the service requestor and the service provider, but unlike a Java interface it is designed to be language and platform independent. Mostly used for SOAP services. A WSDL document defines: interface information for all publicly available functions; data types for all requests and responses; binding information about the transport protocol to be used; address information for locating the specified service. It therefore contains all the information a client needs to use the service, which allows clients to be auto-generated.

  9. Service technologies. Why use SOAP, XML and XML Schema? • platform and language independent • well-established technologies • already used in various standards • very good support in nearly all languages • well-defined structure • can be validated: syntactically against the schema, semantically using a validator tool

  10. Existing EnCORE web services. Affymetrix: probe set ID to protein ID mapping; ArrayExpress: microarray data; BioModels: search for biological models; CellMINT: protein localization information; g:GOSt: protein grouping, functional profiling; IntAct: protein interactions; KEGG pathway: pathway search; PICR: Protein Identifier Cross Reference; PRIDE: protein identification; Reactome: pathway search; UniProt: protein information retrieval; Utility: generation of ENFIN XML from protein IDs.

  11. Synchronous communication. The client calls the service with ENFIN XML and receives ENFIN XML in return. Operations: doService performs the service with standard parameters; doServiceAdv performs the service with custom parameters; doServiceTest only echoes the input.

  12. Domaination: protein domain prediction tool (http://www.ibi.vu.nl/programs/domainationwww/). An analysis tool, not only a data retrieval service; run times can be long, so synchronous communication is inadequate; it was the initiator for the asynchronous communication model.

  13. Asynchronous web services. Submit: the client sends ENFIN XML; doServiceAsync submits the service with standard parameters and returns a job ticket. Loop: getStatus reports the status of the job with the specified ticket number. Retrieve: if the status is OK, retrieveResult returns the result (ENFIN XML) of the job with the specified ticket number.
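The submit / poll / retrieve cycle of the asynchronous model can be sketched as follows. This is a minimal sketch under stated assumptions, not the real EnCORE client: `FakeAsyncService` is an in-memory stand-in invented for this example, the operation names `doServiceAsync`, `getStatus` and `retrieveResult` come from the slide, while the "RUNNING" status label, the polling interval and the `<entries/>` placeholder document are assumptions.

```python
import itertools
import time


class FakeAsyncService:
    """In-memory stand-in for an asynchronous service; NOT the real EnCORE API."""

    def __init__(self, polls_until_done=2):
        self._polls_until_done = polls_until_done
        self._tickets = itertools.count(1)
        self._jobs = {}  # ticket -> {"remaining": polls left, "payload": input}

    def doServiceAsync(self, enxml):
        # Register the job and hand back a ticket number immediately.
        ticket = next(self._tickets)
        self._jobs[ticket] = {"remaining": self._polls_until_done, "payload": enxml}
        return ticket

    def getStatus(self, ticket):
        job = self._jobs[ticket]
        if job["remaining"] > 0:
            job["remaining"] -= 1
            return "RUNNING"  # assumed label; the slide only names "OK"
        return "OK"

    def retrieveResult(self, ticket):
        job = self._jobs[ticket]
        if job["remaining"] > 0:
            raise RuntimeError("job not finished yet")
        return job["payload"]  # the stand-in simply echoes its input


def run_async(service, enxml, poll_interval=0.01, max_polls=50):
    """Submit a job, poll its status, and retrieve the result once it is OK."""
    ticket = service.doServiceAsync(enxml)
    for _ in range(max_polls):
        if service.getStatus(ticket) == "OK":
            return service.retrieveResult(ticket)
        time.sleep(poll_interval)
    raise TimeoutError(f"job {ticket} did not finish after {max_polls} polls")


result = run_async(FakeAsyncService(), "<entries/>")
print(result)
```

Bounding the number of polls keeps a client from waiting forever on a job that never completes; a real client would probably use a much longer interval for long-running analysis tools.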

  14. FuncNet: protein function comparison (http://www.funcnet.eu) • Distributed protein function comparison pipeline • Given a set of proteins with some shared function, which of these other proteins also share that function? • Aggregation of pairwise functional similarity predictions between query and reference proteins • Example test case: predicting mitotic spindle proteins • Many other uses, for example finding proteins related to the LKB1 tumor suppressor

  15. Use case FuncNet. [Diagram: a front-end service takes protein lists IN and returns protein pairs OUT, connecting to CODA, GECO, hiPPI, JACOP, etc. All communication uses SOAP.]

  16. EnCORE. [Diagram: external services in the heterogeneous external world (e.g. NCBI, g:GOSt, PICR, Reactome, UniProt) are accessed through EnCORE services in the standardised EnCORE world, which exchange Enfin XML; EnVISION provides the user interface and representation.]

  17. enXml, the EnCORE data exchange format (ENFIN XML). XML defined by an XML schema; standard interface to services; simple, easy-to-understand structure; generic, to allow various data types; stores service results and keeps an audit trail. Minimal restrictions on data representation: high degree of freedom in modelling user data, hence a need for modelling guidelines to ensure service interoperability.

  18. Examples of data modelled in enXml:

      <molecule id="ID1">
        <names>
          <fullName>Breast cancer type 1 susceptibility protein</fullName>
        </names>
        <xrefs>
          <primaryRef refTypeAc="MI:0358" refType="primary-reference" id="P38398" dbAc="MI:0486" db="UniProt"/>
        </xrefs>
        <moleculeType termAc="MI:0326" term="protein"/>
        <attribute name="UniProt keywords">
          Zinc-finger;Zinc;Repeat;Polymorphism;Phosphorylation;Nuclear protein;
          Metal-binding;DNA-binding;DNA repair;DNA damage;Disease mutation;
          Cell cycle;Anti-oncogene;3D-structure
        </attribute>
      </molecule>
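A client could read such a molecule entry with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` on the slide's fragment (with straight quotes); it assumes the fragment carries no XML namespace, as shown on the slide, which may not hold for real enXml documents. The helper name `summarise_molecule` is made up for this example.

```python
import xml.etree.ElementTree as ET

MOLECULE_XML = """
<molecule id="ID1">
  <names>
    <fullName>Breast cancer type 1 susceptibility protein</fullName>
  </names>
  <xrefs>
    <primaryRef refTypeAc="MI:0358" refType="primary-reference"
                id="P38398" dbAc="MI:0486" db="UniProt"/>
  </xrefs>
  <moleculeType termAc="MI:0326" term="protein"/>
  <attribute name="UniProt keywords">
    Zinc-finger;Zinc;Repeat;Polymorphism;Phosphorylation;Nuclear protein;
    Metal-binding;DNA-binding;DNA repair;DNA damage;Disease mutation;
    Cell cycle;Anti-oncogene;3D-structure
  </attribute>
</molecule>
"""


def summarise_molecule(xml_text):
    """Extract name, primary cross-reference and keywords from one molecule."""
    mol = ET.fromstring(xml_text.strip())
    ref = mol.find("xrefs/primaryRef")
    keywords = []
    for attr in mol.findall("attribute"):
        if attr.get("name") == "UniProt keywords":
            # Keywords are a ';'-separated list that may wrap across lines.
            keywords = [k.strip() for k in (attr.text or "").split(";") if k.strip()]
    return {
        "id": mol.get("id"),
        "name": mol.findtext("names/fullName"),
        "db": ref.get("db"),
        "accession": ref.get("id"),
        "type": mol.find("moleculeType").get("term"),
        "keywords": keywords,
    }


summary = summarise_molecule(MOLECULE_XML)
print(summary["accession"], summary["type"], len(summary["keywords"]))
```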

  19. Examples of data modelled in enXml:

      <set id="ID12">
        <participant moleculeRef="ID1"/>
        <participant moleculeRef="ID2"/>
      </set>
      <set id="ID33">
        <names>
          <fullName>IntAct interaction</fullName>
        </names>
        <xrefs>
          <primaryRef id="EBI-1263051" db="intact" dbAc="MI:0469"/>
        </xrefs>
        <setType id="SO:0001093" db="SO" term="protein_protein_interaction"/>
        <participant moleculeRef="ID1"/>
        <participant moleculeRef="ID7"/>
        <attribute nameAc="MI:0001" name="interaction detection method">MI:0006</attribute>
      </set>
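Sets group molecules by reference, so a client typically resolves each `moleculeRef` against the molecules it already holds. The following minimal sketch collects set membership from the two sets on the slide; the `<wrapper>` root element is a placeholder added only so the fragment parses as one document, and `sets_by_id` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

SETS_XML = """
<wrapper>
  <set id="ID12">
    <participant moleculeRef="ID1"/>
    <participant moleculeRef="ID2"/>
  </set>
  <set id="ID33">
    <names><fullName>IntAct interaction</fullName></names>
    <xrefs><primaryRef id="EBI-1263051" db="intact" dbAc="MI:0469"/></xrefs>
    <setType id="SO:0001093" db="SO" term="protein_protein_interaction"/>
    <participant moleculeRef="ID1"/>
    <participant moleculeRef="ID7"/>
    <attribute nameAc="MI:0001" name="interaction detection method">MI:0006</attribute>
  </set>
</wrapper>
"""


def sets_by_id(xml_text):
    """Map each set id to its type term (if any) and its member molecule ids."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for s in root.iter("set"):
        set_type = s.find("setType")
        result[s.get("id")] = {
            "type": set_type.get("term") if set_type is not None else None,
            "members": [p.get("moleculeRef") for p in s.findall("participant")],
        }
    return result


sets = sets_by_id(SETS_XML)
print(sets["ID33"]["type"], sets["ID33"]["members"])
```

The first set is an untyped list of molecules, while the second carries a type and a cross-reference, which shows how the same element serves both plain lists and interactions.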

  20. Examples of data modelled in enXml:

      <experiment id="ID57">
        <names>
          <fullName>Enfin IntAct service: find interaction partners</fullName>
          <shortLabel>enfin-intact</shortLabel>
        </names>
        <input>ID2</input>
        <result>ID56</result>
      </experiment>
      <experiment id="ID15">
        <names>
          <fullName>Enfin Reactome service: find pathways from protein list</fullName>
          <shortLabel>enfin-reactome</shortLabel>
        </names>
        <input>ID8</input>
        <result>ID13</result>
        <result>ID14</result>
        <parameter factor="3" term="enfin-reactome-max-pathways"/>
        <parameter factor="2" term="enfin-reactome-min-proteins-per-pathway"/>
        <attribute name="enfin-reactome-add-coverage">true</attribute>
      </experiment>
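Because every service call is recorded as an experiment element with its inputs, results and parameters, the audit trail can be read back from the document itself. The sketch below lists those records for the two experiments on the slide; as before, the `<wrapper>` root is a placeholder and `audit_trail` is a hypothetical helper, so treat this as an illustration rather than the project's own tooling.

```python
import xml.etree.ElementTree as ET

EXPERIMENTS_XML = """
<wrapper>
  <experiment id="ID57">
    <names>
      <fullName>Enfin IntAct service: find interaction partners</fullName>
      <shortLabel>enfin-intact</shortLabel>
    </names>
    <input>ID2</input>
    <result>ID56</result>
  </experiment>
  <experiment id="ID15">
    <names>
      <fullName>Enfin Reactome service: find pathways from protein list</fullName>
      <shortLabel>enfin-reactome</shortLabel>
    </names>
    <input>ID8</input>
    <result>ID13</result>
    <result>ID14</result>
    <parameter factor="3" term="enfin-reactome-max-pathways"/>
    <parameter factor="2" term="enfin-reactome-min-proteins-per-pathway"/>
    <attribute name="enfin-reactome-add-coverage">true</attribute>
  </experiment>
</wrapper>
"""


def audit_trail(xml_text):
    """Return one record per experiment: service label, inputs, results, parameters."""
    root = ET.fromstring(xml_text.strip())
    steps = []
    for exp in root.iter("experiment"):
        steps.append({
            "id": exp.get("id"),
            "service": exp.findtext("names/shortLabel"),
            "inputs": [e.text for e in exp.findall("input")],
            "results": [e.text for e in exp.findall("result")],
            "parameters": {p.get("term"): p.get("factor")
                           for p in exp.findall("parameter")},
        })
    return steps


for step in audit_trail(EXPERIMENTS_XML):
    print(step["service"], step["inputs"], "->", step["results"], step["parameters"])
```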

  21. How to use the system. Primarily designed as a framework for bioinformaticians: write your own client to access one or multiple services (example clients are available in different programming languages); very flexible access that can be tailored to your specific needs; full control over the client and its functionality; create your own services to extend the functionality of EnCORE. Additional "instant" usage and end-user access: working example clients; user interfaces: EnVision / EnVision2, web interfaces to EnCORE services, and Taverna, a workflow management tool.

  22. User interfaces. EnVISION: simple one-page web interface; flexible mechanism of service connection; possibility to specify options for service calls; simple XSL transform of the resulting Enfin XML. EnVISION 2 (prototype): web application for end users; structured, user-friendly representation of the data; supports multiple datasets; links to source databases for more detailed information. Taverna (external project): powerful workflow design and management tool; easy to use with EnCORE services.

  23. EnVision

  24. EnVision

  25. EnVision

  26. EnVision 2: detail views for human-readable presentation of service results; simple start page; implicit EnCORE web service calls; supports multiple datasets.

  27. EnVision 2

  28. EnVision 2

  29. EnVision 2

  30. EnVision 2

  31. EnVision 2

  32. EnVision 2: detail views for human-readable presentation of service results, modular for custom visualisation; simple start page; implicit EnCORE web service calls plus externally run workflows (Taverna); supports multiple datasets.

  33. Taverna

  34. EnCORE vision of the future

  35. Thank You!
