Enhancing Taverna Workflows with caGrid Integration
This presentation discusses the integration of Taverna workflows with caGrid services, outlining the benefits of using a data-oriented approach for scientific research. It provides an overview of Taverna as a workflow management system, exemplified with a caGrid workflow to predict lymphoma types utilizing gene-expression patterns. The session highlights the caGrid plugin for Taverna, current developments, and future directions, including tighter integration with caDSR and enhancing support for complex XML types.
Presentation Transcript
Taverna workflows in caGrid • caGrid Architecture Face-to-face meeting • Stian Soiland-Reyes & Aleksandra Nenadic, myGrid, University of Manchester, UK • Boston, 2009-05-11 • http://www.mygrid.org.uk/dev/wiki/display/caGrid
Agenda • What is a Taverna Workflow? • Abstract caGrid workflow example • Actual Taverna workflow • caGrid plugin for Taverna • Current work • Where do we go next?
What is a Taverna workflow? • Set of services (web services, RESTful services, local scripts, other workflows, etc.) • Set of data links between services: “put output X from service A as input Y to service B” • If needed: list handling, control links • This can be called a data-oriented workflow (dataflow) • Say where you want the data to flow instead of what you want to do • Compare with more procedural workflow languages like BPEL • A beneficial way of thinking for much data-driven scientific research
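The dataflow idea above can be illustrated with a minimal Python sketch. It is not how Taverna is implemented: the two services, their port names, and the `run_dataflow` helper are hypothetical stand-ins. The point is only that execution order is derived from the data links rather than written out as a procedure.

```python
from graphlib import TopologicalSorter

# Hypothetical stand-ins for remote services: each takes a dict of named
# inputs and returns a dict of named outputs.
def service_a(inputs):
    return {"X": inputs["seed"].upper()}

def service_b(inputs):
    return {"result": "processed " + inputs["Y"]}

SERVICES = {"A": service_a, "B": service_b}

# Data links: "put output X from service A as input Y to service B".
LINKS = [(("A", "X"), ("B", "Y"))]

def run_dataflow(services, links, workflow_inputs):
    """Run each service once every service it depends on has produced output."""
    # Dependencies come from the links, not from the order services are listed.
    deps = {name: set() for name in services}
    for (src, _), (dst, _) in links:
        deps[dst].add(src)

    outputs = {}
    for name in TopologicalSorter(deps).static_order():
        inputs = dict(workflow_inputs.get(name, {}))
        # Copy each linked upstream output into the matching input port.
        for (src, out_port), (dst, in_port) in links:
            if dst == name:
                inputs[in_port] = outputs[src][out_port]
        outputs[name] = services[name](inputs)
    return outputs

results = run_dataflow(SERVICES, LINKS, {"A": {"seed": "abc"}})
print(results["B"]["result"])  # prints "processed ABC"
```

Adding a third service only requires a new entry in `SERVICES` and a new link; no control flow has to be rewritten, which is the contrast with procedural languages such as BPEL.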
Abstract caGrid workflow • Query the CPAS data service to find protein sequence • Use (parts of) result to query GridPIR and caBIO data services for matching sequences
Actual Taverna workflow • Looks very similar to abstract workflow • Introduces shim services to build and parse data elements • Blue: Constant CQL query • Purple: Build/parse complex type for web service input/output • Orange: Local scripts to parse the description string and build CQL queries • Green: caGrid WSDL services • http://www.myexperiment.org/workflows/752
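A shim that builds a CQL query might look roughly like the sketch below. It is written in Python for illustration only and is not taken from the workflow itself. The namespace URI, the target class, the attribute name, and the `LIKE` predicate are assumptions about CQL 1 that should be checked against the caGrid documentation.

```python
import xml.etree.ElementTree as ET

# Assumed CQL 1 namespace; verify against the caGrid documentation.
CQL_NS = "http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery"

def build_cql_query(target_class, attribute, value, predicate="EQUAL_TO"):
    """Build a single-attribute CQL query and return it as an XML string."""
    ET.register_namespace("", CQL_NS)  # serialize without a namespace prefix
    query = ET.Element(f"{{{CQL_NS}}}CQLQuery")
    target = ET.SubElement(query, f"{{{CQL_NS}}}Target", name=target_class)
    ET.SubElement(
        target,
        f"{{{CQL_NS}}}Attribute",
        name=attribute,
        value=value,
        predicate=predicate,
    )
    return ET.tostring(query, encoding="unicode")

# Hypothetical target class and attribute, chosen only for illustration.
cql = build_cql_query("gov.nih.nci.cabio.domain.Protein", "name", "BRCA%", "LIKE")
print(cql)
```

In the workflow, a step like this sits between two services: it takes a plain string from an upstream result and produces the query document that the downstream data service expects.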
caGrid plugin for Taverna (1) • Listing all services: • Discover/browse services registered in the caGrid Index Service • Easy to install into Taverna:
caGrid plugin for Taverna (2) • …or by semantic search:
Current work by myGrid & caGrid • Develop Taverna support for GAARDS-secured caGrid services • Wrap existing 3rd party services (that are used by existing Taverna users) for caGrid and annotate them to match Silver-level compatibility guidelines • Taverna workflow as a caGrid service • Service discovery improvements • Documentation, building example workflows
Real example: Lymphoma type prediction • Scientific value • Using gene-expression patterns associated with DLBCL (diffuse large B-cell lymphoma) and FL (follicular lymphoma) to predict the lymphoma type of an unknown sample • Using an SVM (Support Vector Machine) to classify data and predict the tumor types of unknown examples • Main steps • Query training data from experiments stored in caArray • Preprocess (normalize) the microarray data • Add training and testing data into the SVM service to get classification results • *Fig. from MA Shipp. Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nature Medicine, 2002
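The three main steps can be sketched end to end in plain Python. This is a toy illustration under stated assumptions, not the workflow's actual services: the caArray query is replaced by a handful of invented two-gene expression values, the normalization is assumed to be a per-gene z-score, and the SVM service is replaced by a small linear SVM trained with sub-gradient descent on the hinge loss.

```python
import statistics

def zscore_params(rows):
    """Per-gene (column) mean and standard deviation from the training data."""
    return [(statistics.mean(c), statistics.pstdev(c) or 1.0) for c in zip(*rows)]

def normalize(rows, params):
    """Apply z-score normalization using the training-set parameters."""
    return [[(v - m) / s for v, (m, s) in zip(row, params)] for row in rows]

def score(w, b, x):
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimize lam/2 * ||w||^2 + hinge loss with per-sample sub-gradient steps."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * score(w, b, xi) < 1:
                # Margin violated: regularization plus hinge sub-gradient.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Margin satisfied: only the regularization term contributes.
                w = [wj - lr * lam * wj for wj in w]
    return w, b

def predict(w, b, X):
    return [1 if score(w, b, x) >= 0 else -1 for x in X]

# Step 1 (stand-in for the caArray query): invented toy values, two genes
# per sample. Label +1 stands for DLBCL and -1 for FL.
train_X = [[8.1, 2.0], [7.9, 2.2], [8.3, 1.9], [2.1, 7.8], [1.9, 8.2], [2.2, 8.0]]
train_y = [1, 1, 1, -1, -1, -1]
test_X = [[8.0, 2.1], [2.0, 8.1]]

# Step 2: preprocess (normalize) with statistics from the training set only.
params = zscore_params(train_X)
train_Xn = normalize(train_X, params)
test_Xn = normalize(test_X, params)

# Step 3: train the classifier and predict the unknown samples.
w, b = train_linear_svm(train_Xn, train_y)
predictions = predict(w, b, test_Xn)
print(predictions)
```

Normalizing the test samples with the training-set parameters keeps the two sets on the same scale, which is why `zscore_params` is computed once and reused.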
Lymphoma type prediction workflow • Wei Tan • http://www.myexperiment.org/workflows/746 • Query • Preprocess • Classify & predict
Lymphoma type prediction results • The (few) classification errors are highlighted • Acknowledgements: Juli Klemm, Xiaopeng Bian, Rashmi Srinivasa (NCI), Jared Nedzel (MIT), Wei Tan
Where do we go next? • Just some ideas: • Tighter integration with caDSR • Partial rerun of workflows • Improve Taverna’s support for complex XML types • Workflow sharing • Workflows in caGrid portal • Guided workflow building using caGrid metadata • Easily build CQL queries from Taverna • Google Summer of Code 2009