This presentation, delivered by Stian Soiland-Reyes at the University of Manchester on January 23, 2009, explores the integration of Taverna with caGrid for analytical services. Key project goals include identifying publicly available analytical web services and demonstrating their integration within workflows. The discussion covers service selection, architecture, and the specifics of using NCBI Blast and InterProScan services, both hosted by EBI, to create biologically meaningful workflows. Emphasis is placed on service reliability, usability, and the implications for the scientific community.
Wrapping analytical services for caBIG • Taverna-caGrid technical review meeting • Stian Soiland-Reyes, myGrid, University of Manchester, UK • 2009-01-23 • http://www.mygrid.org.uk/dev/wiki/display/caGrid
Agenda • Project overview • Primary goals • Service selection • Services identified • Architecture • Service outputs • UML model • Template workflow • Work so far • Implementation plan
Project overview • Taverna caGrid cooperation • Taverna workbench enhancements for caGrid • Grid-enabling analytical services • caGrid security support for Taverna • This presentation deals with the analytical services
Primary goals • Identify two publicly available analytical web services currently accessible through Taverna • caGrid-enable the services and describe them semantically using caBIG’s infrastructure • Demonstrate building workflows that combine the new services with existing caBIG services
Service selection • Selected services in collaboration with the caGrid Workflow working group, led by Juli • Winners: • NCBI Blast hosted by EBI • InterProScan hosted by EBI
Why these services? • Freely available • Highly reliable, hosted by EBI • Widely used by the scientific community • Can be combined with existing caBIG tools in biologically meaningful workflows • caBIO, GridPIR, etc.
Services identified • NCBI Blast • A popular similarity search tool using local sequence alignment • Supports sequences of proteins, DNA, RNA • Searches sequences in a whole range of databases • SWISSPROT, UNIPROT, NCBI, EMBL, etc. • SOAP web service hosted by EMBL-EBI
Services identified • InterProScan • Integrates various databases of protein domains and functional sites • Searches using protein signature recognition methods • SOAP web service hosted by EMBL-EBI
Architecture as pseudo code

    class CaGridClient:
        def main():
            endpointReference = wrappedService.invoke(inputs)
            endpointReference.subscribe()
        def resourcePropertyChanged():
            outputs = endpointReference.getResourceProperty()
            print "Result", outputs

    class WrappedService:
        def invoke(inputs):
            convertedInputs = dataConverter.convertFromCaGrid(inputs)
            jobId = serviceInvoker.invoke(convertedInputs)
            endpointReference = new EndpointReference(jobId)
            return endpointReference
        def outputReturned(jobId, outputs):
            convertedOutputs = dataConverter.convertToCaGrid(outputs)
            endpointReference.setResourceProperty(convertedOutputs)

    class ServiceInvoker:
        def invoke(convertedInputs):
            jobId = originalService.invoke(convertedInputs)
            return jobId
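The pseudocode above describes an asynchronous pattern: the client gets an endpoint reference back immediately, subscribes to it, and is notified once the wrapped service has stored the converted output as a resource property. A minimal runnable Python sketch of that control flow follows. It is an in-memory illustration only, not the caGrid or WS-Resource API: `fake_original_service`, `run_pending`, `run_client`, the job id format, and the dictionary keys `Sequence`, `sequence` and `Result` are all assumptions made for this example.

```python
import itertools


class EndpointReference:
    """In-memory stand-in for a reference to a WS-Resource holding one job's output."""

    def __init__(self, job_id):
        self.job_id = job_id
        self._outputs = None
        self._listeners = []

    def subscribe(self, listener):
        # The listener is called whenever the resource property is set.
        self._listeners.append(listener)

    def set_resource_property(self, outputs):
        self._outputs = outputs
        for listener in self._listeners:
            listener(self)

    def get_resource_property(self):
        return self._outputs


class DataConverter:
    """Hypothetical conversion between caGrid-style and service-style data."""

    def convert_from_cagrid(self, inputs):
        return {"sequence": inputs["Sequence"]}

    def convert_to_cagrid(self, outputs):
        return {"Result": outputs}


class ServiceInvoker:
    """Submits jobs to the original service and reports outputs later."""

    def __init__(self, original_service):
        self._original_service = original_service
        self._job_ids = itertools.count(1)
        self._pending = {}

    def invoke(self, converted_inputs):
        job_id = f"job-{next(self._job_ids)}"
        self._pending[job_id] = converted_inputs
        return job_id

    def run_pending(self, on_output):
        # Assumption: simulates the remote jobs finishing some time after submission.
        while self._pending:
            job_id, inputs = self._pending.popitem()
            on_output(job_id, self._original_service(inputs))


class WrappedService:
    def __init__(self, invoker, converter):
        self._invoker = invoker
        self._converter = converter
        self._references = {}

    def invoke(self, inputs):
        converted_inputs = self._converter.convert_from_cagrid(inputs)
        job_id = self._invoker.invoke(converted_inputs)
        reference = EndpointReference(job_id)
        self._references[job_id] = reference
        return reference

    def output_returned(self, job_id, outputs):
        converted_outputs = self._converter.convert_to_cagrid(outputs)
        self._references[job_id].set_resource_property(converted_outputs)


def fake_original_service(inputs):
    # Placeholder for the EBI-hosted service; returns a dummy string.
    return "hits for " + inputs["sequence"]


def run_client(sequence):
    invoker = ServiceInvoker(fake_original_service)
    service = WrappedService(invoker, DataConverter())
    results = []
    reference = service.invoke({"Sequence": sequence})
    reference.subscribe(lambda ref: results.append(ref.get_resource_property()))
    invoker.run_pending(service.output_returned)
    return results


if __name__ == "__main__":
    print("Result", run_client("MKV"))
```

Keeping one endpoint reference per job id, rather than a single shared reference, lets several invocations be outstanding at once; whether the real wrapper does this is not stated on the slide.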
Output InterProScan (Untranslated)

    <EBIInterProScanResults xmlns="http://www.ebi.ac.uk/schema"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="http://www.ebi.ac.uk/schema/InterProScanResult.xsd">
      <Header>..</Header>
      <interpro_matches>
        <protein id="uniprot|P01174|WAP_RAT" length="137" crc64="1C2E8ADA9FD97949" >
          <interpro id="IPR008197" name="Whey acidic protein, 4-disulphide core" type="Domain" parent_id="IPR015874">
            <child_list><rel_ref ipr_ref="IPR008198"/></child_list>
            <contains><rel_ref ipr_ref="IPR002098"/></contains>
            <classification id="GO:0030414" class_type="GO">
              <category>Molecular Function</category>
              <description>protease inhibitor activity</description>
            </classification>
            <match id="G3DSA:4.10.75.10" name="Whey_acidic_protein_4-diS_core" dbname="GENE3D">
              <location start="77" end="128" score="9.899996308397199E-5" status="T" evidence="Gene3D" />
            </match>
            <match id="PF00095" name="WAP" dbname="PFAM">
              <location start="30" end="72" score="6.30000254573025E-5" status="T" evidence="HMMPfam" />
              <location start="79" end="126" score="1.59999889349247E-14" status="T" evidence="HMMPfam" />
            </match>
          </interpro>
          <interpro id="IPR008198" name="Proteinase inhibitor I17" type="Domain" parent_id="IPR008197"> ...</interpro>
        </protein>
      </interpro_matches>
    </EBIInterProScanResults>
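To show what a data converter would have to read from this untranslated output, here is a small sketch that lists the match locations using Python's standard `xml.etree.ElementTree`. `SAMPLE` is a trimmed copy of the slide's XML (header, classification and the second InterPro entry left out), and the function name `list_matches` and the tuple layout are assumptions for illustration, not part of the project's converter.

```python
import xml.etree.ElementTree as ET

# The result elements live in the EBI default namespace shown on the slide.
NS = {"ebi": "http://www.ebi.ac.uk/schema"}

# Trimmed copy of the slide's example output.
SAMPLE = """<EBIInterProScanResults xmlns="http://www.ebi.ac.uk/schema">
  <interpro_matches>
    <protein id="uniprot|P01174|WAP_RAT" length="137" crc64="1C2E8ADA9FD97949">
      <interpro id="IPR008197" name="Whey acidic protein, 4-disulphide core"
                type="Domain" parent_id="IPR015874">
        <match id="G3DSA:4.10.75.10" name="Whey_acidic_protein_4-diS_core" dbname="GENE3D">
          <location start="77" end="128" score="9.899996308397199E-5" status="T" evidence="Gene3D"/>
        </match>
        <match id="PF00095" name="WAP" dbname="PFAM">
          <location start="30" end="72" score="6.30000254573025E-5" status="T" evidence="HMMPfam"/>
          <location start="79" end="126" score="1.59999889349247E-14" status="T" evidence="HMMPfam"/>
        </match>
      </interpro>
    </protein>
  </interpro_matches>
</EBIInterProScanResults>"""


def list_matches(xml_text):
    """Return (InterPro id, member database, match id, start, end) per location."""
    root = ET.fromstring(xml_text)
    rows = []
    for protein in root.iterfind("ebi:interpro_matches/ebi:protein", NS):
        for entry in protein.iterfind("ebi:interpro", NS):
            for match in entry.iterfind("ebi:match", NS):
                for location in match.iterfind("ebi:location", NS):
                    rows.append((
                        entry.get("id"),
                        match.get("dbname"),
                        match.get("id"),
                        int(location.get("start")),
                        int(location.get("end")),
                    ))
    return rows


if __name__ == "__main__":
    for row in list_matches(SAMPLE):
        print(row)
```

Every element inherits the default namespace, so each path step needs the `ebi:` prefix; the attributes are not namespaced and are read by their plain names.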
Template workflow • EBI_dbfetch_fetchBatch will be replaced with the caBIG service caBIO • This workflow uses both NCBIBlast and InterProScan, which will be replaced with the wrapped services • http://www.myexperiment.org/workflows/230
Work so far • Identified services and example workflow • Described services (Deliverable 3.2) • Modelled service inputs and outputs in UML according to caGrid guidelines • Still a few tweaks needed for WS-Resource usage • Architecture and implementation plan for wrapping services (Deliverable 3.3) • JavaDoc needs updating for WS-Resource
Implementation plan • Generate Common Data Elements for inputs and outputs and verify Silver compatibility • Generate semantically annotated XMIs • Submit Silver compatibility review package • Implement and deploy wrapped services • Using Introduce and possibly gRavi • Implement, test, deploy • We’ll start with this before submitting CDEs • Build caGrid-based workflow using services