Comparison of Scientific WfMS (Workflow Management Systems). B. Guillerminet, IM Design Team, CEA IRFM, 8 June 2011.
Outline • Introduction • Summary • References • Types of WfMS • Models of Computation • Business WfMS • Scientific WfMS • Comparison criteria • Patterns: Control, Data, Scientific workflow • Additional requirements • Introduction to Kepler • Introduction to Taverna • Introduction to Triana • Results • Workflow control patterns • Workflow data patterns • Scientific workflow patterns • Conclusions
Introduction • Summary • We report an evaluation (ref [1], Feb 2011) of three well-known open source WfMS (Kepler, Taverna and Triana) based on scientific workflow patterns. Experience and comments also come from refs [2]-[7]. • References • [1] “Pattern based evaluation of scientific workflow management systems”, Sara Migliorini, Mauro Gambini, Marcello La Rosa, Arthur H.M. ter Hofstede, Feb 2011 • [2] “Workflow control-flow patterns: A Revised View”, Nick Russell, Arthur H.M. ter Hofstede, Wil M.P. van der Aalst, Nataliya Mulyar (http://www.workflowpatterns.com/evaluations/index.php) • [3] “Scientific workflow system – can one size fit all?”, V. Curcin, M. Ghanem, IEEE 2008, CIBEC’08 • [4] “Scaling up workflow-based applications”, S. Callaghan et al., Journal of Computer and System Sciences 76 (2010), 428-446 • [5] “Scientific workflow design for mere mortals”, T. McPhillips et al., Future Generation Computer Systems 25 (2009), 541-551 • [6] “Scientific Workflows: Business as Usual?”, B. Ludascher et al. • [7] “Heterogeneous Composition of Models of Computation”, A. Goderis et al., Nov 2007
Types of WfMS • WfMS: • WfMS are not yet standardized => research activity: business, scientific, control • Meaning of this simple workflow? • Different models of computation (MoC): • Control flow: “Automated data processing” use case. Usual “Business” WfMS, DAG, arrow = execution order • Data flow: “Plasma simulation”. Scientific WfMS, loops, parallel (//) execution, arrow = data • Time flow: “Equation solver”. Control WfMS, differential equations, arrow = time • State flow: #phases (init, time step …), fault recovery… Command & control, machine operation, arrow = event/transition
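To make the difference between the first two MoCs concrete, here is a minimal sketch in Python. It is not taken from any of the evaluated systems: the runner functions, the task names (`read`, `scale`, `report`) and the data are all hypothetical. In the control-flow runner the arrows are an explicit execution order over shared data; in the data-flow runner a node fires as soon as all its inputs exist.

```python
def run_control_flow(order, tasks, shared):
    """Control flow: arrows give the execution order; tasks share one data store."""
    for name in order:
        tasks[name](shared)
    return shared


def run_data_flow(nodes, inputs):
    """Data flow: arrows carry data; a node fires once all its inputs are available."""
    values = dict(inputs)
    pending = dict(nodes)  # node name -> (function, list of input names)
    while pending:
        ready = [n for n, (_, deps) in pending.items()
                 if all(d in values for d in deps)]
        if not ready:
            raise ValueError("deadlock: some inputs are never produced")
        for n in ready:
            func, deps = pending.pop(n)
            values[n] = func(*[values[d] for d in deps])
    return values


# Hypothetical three-step workflow: read -> scale -> report.
tasks = {
    "read": lambda s: s.update(raw=[1, 2, 3]),
    "scale": lambda s: s.update(scaled=[2 * x for x in s["raw"]]),
    "report": lambda s: s.update(total=sum(s["scaled"])),
}
control_result = run_control_flow(["read", "scale", "report"], tasks, {})

# The same computation as a data-flow graph; the declaration order does not matter.
nodes = {
    "report": (sum, ["scaled"]),
    "scaled": (lambda raw: [2 * x for x in raw], ["raw"]),
}
data_result = run_data_flow(nodes, {"raw": [1, 2, 3]})
```

In the data-flow version, nodes that become ready in the same pass are independent of each other, which is what makes parallel execution natural in that MoC.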
Types of WfMS • Business WfMS • Control flow and shared data, sequential execution, DAG MoC • Staffware, WebSphere MQ, COSA, iPlanet, SAP Workflow, FileNet, FLOWer, BPMN, UML 2.0 Activity Diagrams, EPCs, BPEL4WS, WebSphere BPEL, Oracle BPEL and XPDL • Scientific WfMS • Data flow, data copied, parallel computation, support for GRID/HPC • Major Open Source Scientific WfMS: Kepler, Taverna, Triana
Comparison criteria • Workflow Control patterns • Basic Control Flow patterns • Execution: sequential, parallel split, synchronization • Exclusive choice: “if … then … else …” • Simple merge • Advanced branching & synchronization patterns • Multi merge, Multi choice • Discriminators: Structured, Blocking, Cancelling • Partial Join: Structured, Blocking, Cancelling • Multiple instances: join, … • Use case: launching several different ways of solving the problem and using the fastest path • State-based patterns • Deferred choice (list) • Interleaved parallel routing: each task is executed once and no two tasks can be executed at the same time • Milestone: a task is enabled when the process is in a particular state • Use case: checkpoint • Critical section: only one critical process can be active at any given time (parallel branches, but only one runs at a time)
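The "fastest path" use case can be sketched as a parallel split followed by a discriminator-style join. The sketch below uses Python's standard `concurrent.futures` module rather than any WfMS; the two solver functions and their delays are hypothetical stand-ins for alternative algorithms.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def solve_slow(x):
    # Hypothetical slow method.
    time.sleep(0.5)
    return ("slow", x * x)


def solve_fast(x):
    # Hypothetical fast method.
    time.sleep(0.05)
    return ("fast", x * x)


def first_result(solvers, x):
    """Parallel split over all solvers, then continue with the first one to finish."""
    pool = ThreadPoolExecutor(max_workers=len(solvers))
    futures = [pool.submit(solver, x) for solver in solvers]
    done, not_done = wait(futures, return_when=FIRST_COMPLETED)
    for future in not_done:
        # cancel() only stops tasks that have not started yet;
        # a branch that is already running keeps running to completion.
        future.cancel()
    pool.shutdown(wait=False)
    return next(iter(done)).result()


winner = first_result([solve_slow, solve_fast], 4)
```

Note that even here the losing branch is not interrupted, only ignored: a true cancelling discriminator needs a mechanism for stopping activities that are already running.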
Comparison criteria • Workflow Control patterns (cont’d) • Cancellation and Force Completion patterns • Withdraw an activity • Iteration patterns • Arbitrary cycles: (while …) • Structured loop: (do/for 1..n) • Recursion • Termination patterns • Implicit or explicit termination • Trigger patterns • Transient or persistent trigger: an external signal activates a task
Comparison criteria • Workflow Data Patterns • Data visibility patterns: private or shared data? • Task data: only accessible by the task • Block data, scope data: accessible by several tasks • Multiple instance data: new data for each execution instance • Case data, folder data, workflow data: shared data • Environment data: • Examples: database connector, file identifier • Data interaction patterns: • Task to task, task to sub-workflow • To/from multiple instance task • Environment to task and task to environment • Data transfer patterns: • By value, by copying, by reference • Data transformation: apply a transformation to the data before or after it is passed to the process • Data-based routing patterns: • Task pre & post condition: execution if data are … • Event-based task trigger: able to trigger a task (external environment) • Data-based task trigger: able to trigger a task (within the workflow) • Data-based routing: associated with a split
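The difference between passing data by value (copy) and by reference can be illustrated with a minimal Python sketch. The helper names and the `normalise` task are hypothetical; the point is only who sees the modification made by the receiving task.

```python
import copy


def pass_by_value(data, task):
    """The task receives its own deep copy; the sender's data cannot change."""
    return task(copy.deepcopy(data))


def pass_by_reference(data, task):
    """The task receives the same object; its changes are visible to the sender."""
    return task(data)


def normalise(signal):
    # Hypothetical task that modifies its input in place.
    peak = max(signal["values"])
    signal["values"] = [v / peak for v in signal["values"]]
    return signal


original = {"values": [1.0, 2.0, 4.0]}

copied_out = pass_by_value(original, normalise)
unchanged_after_value = original["values"] == [1.0, 2.0, 4.0]

shared_out = pass_by_reference(original, normalise)
changed_after_reference = original["values"] == [0.25, 0.5, 1.0]
```

Copying is the safer default for parallel branches, at the cost of memory and transfer time for large data sets.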
Comparison criteria • Scientific Workflow Patterns • Dynamic input size: number of input tokens is specified at run time • Use case: the number of input data varies from one set to another • Dynamic Token Replication: number of output tokens is specified at run time • Dynamic Balancing of Input Tokens: • Use case: different input rates (example: temperature T is acquired every hour and pressure p every 2 hours => the task is executed with the new value of T and the old value of p) • Cartesian product of input tokens: build all the possible combinations of inputs • Example: [1,2,3] & [9,8,7] => [(1,9),(1,8),(1,7),(2,9) …] • Use case: cracking your password • Criteria not addressed • Catalogue of components: • Mathematical, Visualization, Database … • GRID, HPC support … • Different MoC and mixing them
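Two of these patterns are easy to sketch in plain Python. The function names and the sample readings below are hypothetical; `cartesian_tokens` reproduces the [1,2,3] & [9,8,7] example, and `balance_inputs` is one possible reading of the temperature/pressure use case, firing once per temperature token with the most recent pressure.

```python
from itertools import product


def cartesian_tokens(*ports):
    """Cartesian product of input tokens: one firing per combination of inputs."""
    return list(product(*ports))


def balance_inputs(temperature, pressure):
    """Dynamic balancing: fire once per temperature token, reusing the latest pressure.

    Both arguments are lists of (hour, value) pairs sorted by hour.
    """
    fired = []
    latest_p = None
    next_p = 0
    for hour, t in temperature:
        # Consume every pressure token that has arrived up to this hour.
        while next_p < len(pressure) and pressure[next_p][0] <= hour:
            latest_p = pressure[next_p][1]
            next_p += 1
        if latest_p is not None:
            fired.append((hour, t, latest_p))
    return fired


pairs = cartesian_tokens([1, 2, 3], [9, 8, 7])
# pairs starts with (1, 9), (1, 8), (1, 7), (2, 9), as in the example above.

readings = balance_inputs(
    temperature=[(0, 20.0), (1, 20.5), (2, 21.0), (3, 21.5)],  # every hour
    pressure=[(0, 1000), (2, 1005)],                            # every 2 hours
)
```

At hours 1 and 3 the task runs with a new T and the old p, which is the behaviour described in the use case.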
Introduction to Kepler • Summary of Kepler characteristics: • Developers: NSF-funded Kepler/CORE (UC Davis, UC Santa Barbara, and UC San Diego) • Parent project: Ptolemy II • Evaluated release: 1.0.0 • Platforms: Windows, Linux, Mac OS X • Development language: Java • Workflow language: MoML (XML-based) • License: BSD License • Website: http://kepler-project.org/ • Domains of application: Physics, Ecosystems, Bioinformatics, Fusion (CPES, ITM) • Component = actor • Stateful = {init, (pre-fire, fire, post-fire), terminate} • I/O data = ports • Ontology: type checking at pre-init phase • External Models of Computation = Directors {DDF, PN, CT, FSM} • MoCs can be mixed, but not every combination is allowed
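The stateful actor life cycle {init, (pre-fire, fire, post-fire), terminate} and the separation between actor and director can be sketched as follows. This is an illustration in Python, not Kepler's Java API: the `Accumulator` class, the `simple_director` function and the method names are hypothetical and only mirror the phases listed above.

```python
class Accumulator:
    """Hypothetical stateful actor: keeps a running sum of the tokens it receives."""

    def init(self):
        self.total = 0

    def prefire(self, token):
        # Decide whether the actor is ready to fire for this token.
        return token is not None

    def fire(self, token):
        # Compute the output without committing the new state.
        return self.total + token

    def postfire(self, output):
        # Commit the state change; returning False asks not to be scheduled again.
        self.total = output
        return True

    def terminate(self):
        self.total = None


def simple_director(actor, tokens):
    """Hypothetical director: drives one actor through its phases for each token."""
    outputs = []
    actor.init()
    for token in tokens:
        if not actor.prefire(token):
            continue
        out = actor.fire(token)
        outputs.append(out)
        if not actor.postfire(out):
            break
    actor.terminate()
    return outputs


running_sums = simple_director(Accumulator(), [1, None, 2, 3])
```

Because the scheduling lives in the director, the same actor could in principle be driven by a different model of computation, which is why writing an actor that behaves correctly under several directors takes more care.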
Introduction to Kepler • Example of Kepler workflow: • Adding tuples
Introduction to Taverna • Summary of Taverna characteristics • Developers: myGrid Team (University of Manchester, UK) • Parent project: myGrid • Evaluated release: 2.1 • Platforms: Windows, Linux, Mac OS X • Development language: Java • Workflow language: Scufl • License: LGPL • Website: http://www.taverna.org.uk • Application domains: Biology, Bioinformatics, Cheminformatics, Astronomy, Social Sciences and Music • Component = processor • I/O data = data link • Coordination link for “Control flow” link without data • Internal fault management = {number of retries, time-out, alternative service} • One MoC: DAG
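The three fault-management settings (number of retries, time-out, alternative service) combine roughly as in the sketch below. It is written in Python with the standard `concurrent.futures` module and does not use Taverna's own configuration or API; the function `call_with_fault_handling` and the two services are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def call_with_fault_handling(services, arg, retries=2, timeout=1.0):
    """Try each service in order; each one gets 1 + retries attempts, each bounded by timeout."""
    errors = []
    for service in services:
        for attempt in range(1 + retries):
            pool = ThreadPoolExecutor(max_workers=1)
            try:
                return pool.submit(service, arg).result(timeout=timeout)
            except FutureTimeout:
                errors.append((service.__name__, attempt, "timeout"))
            except Exception as exc:
                errors.append((service.__name__, attempt, repr(exc)))
            finally:
                # Do not block on a call that may still be running after a time-out.
                pool.shutdown(wait=False)
    raise RuntimeError(f"all services failed: {errors}")


def primary(sequence):
    # Hypothetical remote service that is currently down.
    raise ConnectionError("primary service unavailable")


def backup(sequence):
    # Hypothetical alternative service.
    return sequence.upper()


result = call_with_fault_handling([primary, backup], "acgt", retries=1, timeout=1.0)
```

The primary service is tried twice (one attempt plus one retry) before the call falls back to the alternative.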
Introduction to Taverna • Example of Taverna workflow • Concatenate 3 strings • Using a “coordination link” to force sequential execution • Black arrows are data-flow links
Introduction to Triana • Summary of Triana characteristics • Developers: Cardiff University • Parent project: - • Evaluated release: 4.0 • Platforms: Windows, Linux, Mac OS X • Development language: Java • License: Apache open source license version 2 • Website: http://www.trianacode.org/ • Application domains: Bioinformatics • Component = XML description (WSDL), Java code (local), Interface (remote) • One MoC = data flow, but with a trigger message for a “control flow” link without data
Introduction to Triana • Example of Triana workflow • Display the SQRT of a random number • Data flow
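The same three-unit pipeline (random source, square root, display) can be mimicked with Python generators, where each value flowing between units plays the role of a data token. The unit names are hypothetical and this is not Triana code.

```python
import math
import random


def random_source(count, seed=0):
    """Source unit: emits `count` random numbers in [0, 1)."""
    rng = random.Random(seed)
    for _ in range(count):
        yield rng.random()


def sqrt_unit(tokens):
    """Processing unit: emits the square root of each incoming token."""
    for x in tokens:
        yield math.sqrt(x)


def display_unit(tokens):
    """Sink unit: prints each token and returns what was shown."""
    shown = []
    for y in tokens:
        print(f"{y:.4f}")
        shown.append(y)
    return shown


# Units are connected by passing one unit's output stream to the next.
shown = display_unit(sqrt_unit(random_source(3)))
```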
Workflow control patterns • Results • Basic Control Flow patterns • Ok for all • Advanced branching & synchronization patterns • Severe limitations due to the absence of a mechanism for cancelling running activities • State-based patterns: Kepler supports WCP 17 but … • Cancellation and Force Completion patterns: none • Iteration patterns • Triana and Kepler: ok except for recursion • Not supported by Taverna • Termination patterns • Supported only by Kepler • Trigger patterns • None. Use case: external signal • Summary for control patterns • Kepler is the most powerful • Triana is close to Kepler • Several control patterns are missing in Taverna
Workflow data patterns • Results • Data visibility patterns: identical • Data interaction patterns: identical • Data transfer patterns: identical • Data-based routing patterns: • Taverna does not support this functionality due to the absence of “exclusive choice” (see WCP) • Summary for data patterns • Kepler & Triana are identical • Taverna is very close
Scientific Workflow patterns • Results • Dynamic input size: only Kepler • Dynamic Token Replication: only Kepler • Dynamic Balancing of Input Tokens: not supported by Kepler; partially supported by Triana and Taverna • Cartesian product of input tokens: only Taverna • Summary for Scientific workflow patterns • Triana is the least powerful • Kepler & Taverna have different specificities
Summary • “Kepler provides more functionalities than the other two systems” • “Taverna is compensated by the ease one can define a new processor” • “definition of a new component in Kepler requires a sophisticated programming skills (state + polymorphic behaviour to adapt to the chosen director)” • Real limitation of WfMS: running parallel (//) activities and waiting for only one of them to complete