Wrapping analytical services for caBIG
Taverna-caGrid technical review meeting
Stian Soiland-Reyes, myGrid, University of Manchester, UK
2009-01-23
http://www.mygrid.org.uk/dev/wiki/display/caGrid
Agenda
• Project overview
• Primary goals
• Service selection
• Services identified
• Architecture
• Service outputs
• UML model
• Template workflow
• Work so far
• Implementation plan
Project overview
• Taverna-caGrid cooperation
  • Taverna workbench enhancements for caGrid
  • Grid-enabling analytical services
  • caGrid security support for Taverna
• This presentation deals with the analytical services
Primary goals
• Identify two publicly available analytical web services currently accessible through Taverna
• caGrid-enable the services and describe them semantically using caBIG’s infrastructure
• Demonstrate building workflows that combine the new services with existing caBIG services
Service selection
• Services selected in collaboration with the caGrid Workflow working group, led by Juli
• Winners:
  • NCBI Blast hosted by EBI
  • InterProScan hosted by EBI
Why these services?
• Freely available
• Highly reliable, hosted by EBI
• Widely used by the scientific community
• Can be combined with existing caBIG tools in biologically meaningful workflows
  • caBIO, GridPIR, etc.
Services identified
• NCBI Blast
  • A popular similarity search tool using local sequence alignment
  • Supports protein, DNA and RNA sequences
  • Searches sequences in a whole range of databases
    • SWISSPROT, UNIPROT, NCBI, EMBL, etc.
  • SOAP web service hosted by EMBL-EBI
Services identified
• InterProScan
  • Integrates various databases of protein domains and functional sites
  • Searches using protein signature recognition methods
  • SOAP web service hosted by EMBL-EBI
Architecture as pseudo code

class CaGridClient:
    def main():
        endpointReference = wrappedService.invoke(inputs)
        # Ask to be notified when the result resource property changes
        endpointReference.subscribe()

    def resourcePropertyChanged():
        outputs = endpointReference.getResourceProperty()
        print("Result", outputs)

class WrappedService:
    def invoke(inputs):
        convertedInputs = dataConverter.convertFromCaGrid(inputs)
        jobId = serviceInvoker.invoke(convertedInputs)
        endpointReference = EndpointReference(jobId)
        return endpointReference

    def outputReturned(jobId, outputs):
        convertedOutputs = dataConverter.convertToCaGrid(outputs)
        # endpointReference is the one created in invoke() for this jobId
        endpointReference.setResourceProperty(convertedOutputs)

class ServiceInvoker:
    def invoke(convertedInputs):
        jobId = originalService.invoke(convertedInputs)
        return jobId
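The pseudo code above can be turned into a small runnable sketch. This is only an illustration of the call flow, not the planned implementation: `FakeOriginalService`, its `finish()` method, the in-memory `EndpointReference` and the trivial conversions in `DataConverter` are all hypothetical stand-ins, and no caGrid, Introduce or EBI API is used.

```python
class EndpointReference:
    """In-memory stand-in for a WS-Resource handle holding one result property."""

    def __init__(self, job_id):
        self.job_id = job_id
        self._value = None
        self._listeners = []

    def subscribe(self, listener):
        # listener is called with this reference whenever the property changes
        self._listeners.append(listener)

    def get_resource_property(self):
        return self._value

    def set_resource_property(self, value):
        self._value = value
        for listener in self._listeners:
            listener(self)


class DataConverter:
    """Hypothetical conversions; real ones would map caGrid types to EBI types."""

    def convert_from_cagrid(self, inputs):
        return {"sequence": inputs["sequence"].strip()}

    def convert_to_cagrid(self, outputs):
        return {"result": outputs}


class FakeOriginalService:
    """Stand-in for the asynchronous original service (assumption, not EBI's API)."""

    def __init__(self):
        self._jobs = {}
        self._counter = 0

    def invoke(self, inputs):
        self._counter += 1
        job_id = f"job-{self._counter}"
        self._jobs[job_id] = inputs
        return job_id

    def finish(self, job_id, on_output):
        # Simulates the job completing and the output being delivered
        inputs = self._jobs.pop(job_id)
        on_output(job_id, f"hits for {inputs['sequence']}")


class ServiceInvoker:
    def __init__(self, original_service):
        self.original_service = original_service

    def invoke(self, converted_inputs):
        return self.original_service.invoke(converted_inputs)


class WrappedService:
    def __init__(self, invoker, converter):
        self.invoker = invoker
        self.converter = converter
        self._references = {}  # job id -> EndpointReference

    def invoke(self, inputs):
        converted = self.converter.convert_from_cagrid(inputs)
        job_id = self.invoker.invoke(converted)
        reference = EndpointReference(job_id)
        self._references[job_id] = reference
        return reference

    def output_returned(self, job_id, outputs):
        converted = self.converter.convert_to_cagrid(outputs)
        self._references.pop(job_id).set_resource_property(converted)


# Client side: invoke, subscribe, then receive the result by notification
original = FakeOriginalService()
wrapped = WrappedService(ServiceInvoker(original), DataConverter())
received = []
reference = wrapped.invoke({"sequence": " MKLV "})
reference.subscribe(lambda ref: received.append(ref.get_resource_property()))
original.finish(reference.job_id, wrapped.output_returned)
print("Result", received[0])
```

Keeping a job id to reference map inside the wrapper is one way for `output_returned` to find the resource that belongs to a finished job; the pseudo code leaves that lookup implicit.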
Output InterProScan (Untranslated)

<EBIInterProScanResults xmlns="http://www.ebi.ac.uk/schema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="http://www.ebi.ac.uk/schema/InterProScanResult.xsd">
  <Header>..</Header>
  <interpro_matches>
    <protein id="uniprot|P01174|WAP_RAT" length="137" crc64="1C2E8ADA9FD97949">
      <interpro id="IPR008197" name="Whey acidic protein, 4-disulphide core" type="Domain" parent_id="IPR015874">
        <child_list><rel_ref ipr_ref="IPR008198"/></child_list>
        <contains><rel_ref ipr_ref="IPR002098"/></contains>
        <classification id="GO:0030414" class_type="GO">
          <category>Molecular Function</category>
          <description>protease inhibitor activity</description>
        </classification>
        <match id="G3DSA:4.10.75.10" name="Whey_acidic_protein_4-diS_core" dbname="GENE3D">
          <location start="77" end="128" score="9.899996308397199E-5" status="T" evidence="Gene3D"/>
        </match>
        <match id="PF00095" name="WAP" dbname="PFAM">
          <location start="30" end="72" score="6.30000254573025E-5" status="T" evidence="HMMPfam"/>
          <location start="79" end="126" score="1.59999889349247E-14" status="T" evidence="HMMPfam"/>
        </match>
      </interpro>
      <interpro id="IPR008198" name="Proteinase inhibitor I17" type="Domain" parent_id="IPR008197">
        ...
      </interpro>
    </protein>
  </interpro_matches>
</EBIInterProScanResults>
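To give a feel for what a data converter has to read before it can translate this output, the following sketch pulls the match locations out of the XML with Python's standard library. It is an assumption-laden illustration: `extract_matches` and the row field names are hypothetical, the sample is a trimmed copy of the output shown above, and the target caGrid data types are not modelled here.

```python
import xml.etree.ElementTree as ET

# Default namespace declared on the root element of the InterProScan output
NS = {"ebi": "http://www.ebi.ac.uk/schema"}

# Trimmed copy of the slide's example output (header and second entry left out)
SAMPLE = """<EBIInterProScanResults xmlns="http://www.ebi.ac.uk/schema">
  <interpro_matches>
    <protein id="uniprot|P01174|WAP_RAT" length="137" crc64="1C2E8ADA9FD97949">
      <interpro id="IPR008197" name="Whey acidic protein, 4-disulphide core"
                type="Domain" parent_id="IPR015874">
        <match id="G3DSA:4.10.75.10" name="Whey_acidic_protein_4-diS_core" dbname="GENE3D">
          <location start="77" end="128" score="9.899996308397199E-5"
                    status="T" evidence="Gene3D"/>
        </match>
        <match id="PF00095" name="WAP" dbname="PFAM">
          <location start="30" end="72" score="6.30000254573025E-5"
                    status="T" evidence="HMMPfam"/>
          <location start="79" end="126" score="1.59999889349247E-14"
                    status="T" evidence="HMMPfam"/>
        </match>
      </interpro>
    </protein>
  </interpro_matches>
</EBIInterProScanResults>"""


def extract_matches(xml_text):
    """Return one flat row per <location>, carrying its parent identifiers."""
    root = ET.fromstring(xml_text)
    rows = []
    for protein in root.iterfind(".//ebi:protein", NS):
        for entry in protein.iterfind("ebi:interpro", NS):
            for match in entry.iterfind("ebi:match", NS):
                for location in match.iterfind("ebi:location", NS):
                    rows.append({
                        "protein_id": protein.get("id"),
                        "interpro_id": entry.get("id"),
                        "match_id": match.get("id"),
                        "database": match.get("dbname"),
                        "start": int(location.get("start")),
                        "end": int(location.get("end")),
                        "score": float(location.get("score")),
                    })
    return rows


rows = extract_matches(SAMPLE)
for row in rows:
    print(row["database"], row["match_id"], row["start"], row["end"])
```

Because the output declares a default namespace, every element has to be looked up with a namespace prefix mapping; plain tag names such as `protein` would find nothing.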
Template workflow
• EBI_dbfetch_fetchBatch will be replaced with the caBIG service caBIO
• This workflow uses both NCBIBlast and InterproScan, which will be replaced with the wrapped services
• http://www.myexperiment.org/workflows/230
Work so far
• Identified services and example workflow
• Described services (Deliverable 3.2)
• Modelled service inputs and outputs in UML according to caGrid guidelines
  • Still a few tweaks needed for WS-Resource usage
• Architecture and implementation plan for wrapping services (Deliverable 3.3)
  • JavaDoc needs updating for WS-Resource
Implementation plan
• Generate Common Data Elements for inputs and outputs and verify Silver compatibility
• Generate semantically annotated XMIs
• Submit Silver compatibility review package
• Implement and deploy wrapped services
  • Using Introduce and possibly gRavi
  • Implement, test, deploy
  • We’ll start with this before submitting CDEs
• Build caGrid-based workflow using services