Comparison of Scientific WfMS (Workflow Management Systems)

B. Guillerminet, IM Design Team, CEA IRFM, 8 June 2011.

Presentation Transcript


  1. Comparison of Scientific WfMS (Workflow Management Systems) • B. Guillerminet, IM Design Team, CEA IRFM • 8 June 2011

  2. Outline • Introduction • Summary • References • Types of WfMS • Models of Computation • Business WfMS • Scientific WfMS • Comparison criteria • Patterns: Control, Data, Scientific workflow • Additional requirements • Introduction to Kepler • Introduction to Taverna • Introduction to Triana • Results • Workflow control patterns • Workflow data patterns • Scientific workflow patterns • Conclusions

  3. Introduction • Summary • We report an evaluation (ref. [1], Feb 2011) of three well-known open-source WfMS (Kepler, Taverna and Triana) based on scientific workflow patterns. Additional experience and comments come from refs. [2]-[7]. • References • [1] “Pattern based evaluation of scientific workflow management systems”, Sara Migliorini, Mauro Gambini, Marcello La Rosa, Arthur H.M. ter Hofstede, Feb 2011 • [2] “Workflow control-flow patterns: A Revised View”, Nick Russell, Arthur H.M. ter Hofstede, Wil M.P. van der Aalst, Nataliya Mulyar (http://www.workflowpatterns.com/evaluations/index.php) • [3] “Scientific workflow system – can one size fit all?”, V. Curcin, M. Ghanem, IEEE 2008, CIBEC’08 • [4] “Scaling up workflow-based applications”, S. Callaghan et al., Journal of Computer and System Sciences 76 (2010), 428-446 • [5] “Scientific workflow design for mere mortals”, T. McPhillips et al., Future Generation Computer Systems 25 (2009), 541-551 • [6] “Scientific Workflows: Business as Usual?”, B. Ludascher et al. • [7] “Heterogeneous Composition of Models of Computation”, A. Goderis et al., Nov 2007

  4. Types of WfMS • WfMS: • WfMS are not yet standardized => still a research activity: business, scientific, control • Meaning of this simple workflow? • Different models of computation (MoC): • Control flow: use case “Automated data processing”; usual “Business” WfMS, DAG, arrow = execution order • Data flow: use case “Plasma simulation”; scientific WfMS, loops, parallel execution, arrow = data • Time flow: use case “Equation solver”; control WfMS, differential equations, arrow = time • State flow: #phases (init, time step …), fault recovery …; command & control, machine operation, arrow = event/transition
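
The difference between the first two models of computation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-task workflow (`load`, `simulate`, `plot`) and both helper functions are hypothetical and do not come from any of the evaluated systems. In the control-flow reading an arrow only fixes the execution order; in the data-flow reading the same arrow carries the token produced upstream.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each task maps to the set of tasks it depends on.
DEPENDENCIES = {
    "load": set(),
    "simulate": {"load"},
    "plot": {"simulate"},
}

# Hypothetical task bodies, used only by the data-flow reading.
TASKS = {
    "load": lambda x: x,
    "simulate": lambda x: x * 2,
    "plot": lambda x: f"plot({x})",
}


def control_flow_order(deps):
    """Control flow: an arrow only says which task runs before which."""
    return list(TopologicalSorter(deps).static_order())


def data_flow_run(deps, tasks, source_value):
    """Data flow: an arrow carries the token produced by the upstream task."""
    tokens = {}
    for name in TopologicalSorter(deps).static_order():
        inputs = [tokens[d] for d in sorted(deps[name])]
        # Source tasks (no incoming arrow) receive the external input value.
        tokens[name] = tasks[name](*inputs) if inputs else tasks[name](source_value)
    return tokens


if __name__ == "__main__":
    print(control_flow_order(DEPENDENCIES))
    print(data_flow_run(DEPENDENCIES, TASKS, 3))
```

Both readings use the same graph; only the meaning attached to the arrows changes.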

  5. Types of WfMS • Business WfMS • Control flow and shared data, sequential execution, DAG MoC • Staffware, WebSphere MQ, COSA, iPlanet, SAP Workflow, FileNet, FLOWer, BPMN, UML 2.0 Activity Diagrams, EPCs, BPEL4WS, WebSphere BPEL, Oracle BPEL and XPDL • Scientific WfMS • Data flow, data copied, parallel computation, support for GRID/HPC • Major Open Source Scientific WfMS: Kepler, Taverna, Triana

  6. Comparison criteria • Workflow Control patterns • Basic Control Flow patterns • Execution: sequential, parallel split, synchronization • Exclusive choice: “if … then … else …” • Simple merge • Advanced branching & synchronization patterns • Multi merge, Multi choice • Discriminators: Structured, Blocking, Cancelling • Partial Join: Structured, Blocking, Cancelling • Multiple instances: join, … • Use case: launching several different ways of solving the problem and using the fastest path • State-based patterns • Deferred choice (list) • Interleaved parallel routing: each task is executed once and no two tasks can be executed at the same time • Milestone: a task is enabled when the process is in a particular state • Use case: checkpoint • Critical section: only one critical process can be active at any given time (parallel branches, but only one in use at a time)
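
The discriminator use case above (start several solvers, keep the fastest) can be sketched with Python's standard `concurrent.futures` module. This is a minimal sketch, not how Kepler, Taverna or Triana implement the pattern; the two solver functions and `first_result` are hypothetical names introduced for illustration.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def solve_slow(x):
    """Hypothetical slow solver."""
    time.sleep(0.5)
    return ("slow", x * x)


def solve_fast(x):
    """Hypothetical fast solver for the same problem."""
    time.sleep(0.01)
    return ("fast", x * x)


def first_result(solvers, x):
    """Start every solver in parallel and return the first result to arrive."""
    pool = ThreadPoolExecutor(max_workers=len(solvers))
    futures = [pool.submit(solver, x) for solver in solvers]
    done, pending = wait(futures, return_when=FIRST_COMPLETED)
    for future in pending:
        # cancel() only stops tasks that have not started yet;
        # a solver that is already running keeps running to completion.
        future.cancel()
    pool.shutdown(wait=False)
    return next(iter(done)).result()


if __name__ == "__main__":
    print(first_result([solve_slow, solve_fast], 4))
```

Note that the losing branch cannot be interrupted once it is running; the sketch simply ignores its result. A true cancelling discriminator needs a way to stop running activities.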

  7. Comparison criteria • Workflow Control patterns (cont’d) • Cancellation and Force Completion patterns • Withdraw an activity • Iteration patterns • Arbitrary cycles: (while …) • Structured loop: (do/for 1..n) • Recursion • Termination patterns • Implicit or explicit termination • Trigger patterns • Transient or persistent trigger: an external signal activates a task

  8. Comparison criteria • Workflow Data Patterns • Data visibility patterns: private or shared data? • Task data: only accessible by the task • Block data, scope data: accessible by several tasks • Multiple instance data: new data for each execution instance • Case data, folder data, workflow data: shared data • Environment data: • Examples: database connector, file identifier • Data interaction patterns: • Task to task, task to sub-workflow • To/from multiple instance task • Environment to task and task to environment • Data transfer patterns: • By value, by copying, by reference • Data transformation: apply a transformation to the data before or after it is passed to the process • Data-based routing patterns: • Task pre- & post-condition: execution if data are … • Event-based task trigger: able to trigger a task (from the external environment) • Data-based task trigger: able to trigger a task (from within the workflow) • Data-based routing: associated with a split
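
The data transfer patterns listed above differ in whether a downstream task can modify the data held by the upstream one. The following Python sketch shows the contrast between transfer by copying and transfer by reference; the function names are hypothetical and chosen only for this illustration.

```python
import copy


def append_marker(data):
    """Hypothetical task that modifies its input in place."""
    data.append(99)
    return data


def pass_by_reference(task, data):
    """The task receives the caller's object: changes are shared."""
    return task(data)


def pass_by_copy(task, data):
    """The task receives a private deep copy: the caller's data is untouched."""
    return task(copy.deepcopy(data))


if __name__ == "__main__":
    original = [1, 2]
    print(pass_by_copy(append_marker, original), original)
    print(pass_by_reference(append_marker, original), original)
```

Copying isolates tasks from each other, which suits parallel execution, at the cost of duplicating large data sets.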

  9. Comparison criteria • Scientific Workflow Patterns • Dynamic input size: the number of input tokens is specified at run time • Use case: the number of input data varies from one set to another • Dynamic Token Replication: the number of output tokens is specified at run time • Dynamic Balancing of Input Tokens: • Use case: different input rates (example: temperature T (every hour) and pressure p (every 2 hours) are acquired at different rates => the task is executed with the new value of T and the old value of p) • Cartesian product of input tokens: build all the possible combinations of inputs • Example: [1,2,3] & [9,8,7] => [(1,9),(1,8),(1,7),(2,9) …] • Use case: cracking your password • Criteria not addressed • Catalogue of components: • Mathematical, Visualization, Database … • GRID, HPC support … • Different MoC and mixing them
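
Two of the scientific workflow patterns above can be illustrated in plain Python. This is a sketch under assumptions: `cartesian_tokens` and `balance_inputs` are hypothetical helpers, and the balancing policy shown (fire at every step once each port has delivered at least one value, reusing the latest value of a slower port) is one plausible reading of the temperature/pressure use case, not the definition used by any of the evaluated systems.

```python
from itertools import product


def cartesian_tokens(*ports):
    """Cartesian product of input tokens: every combination of the inputs."""
    return list(product(*ports))


def balance_inputs(steps, ports):
    """Dynamic balancing of input tokens.

    `steps` is a list of dicts, one per time step, holding the values that
    arrived on each port at that step. A slower port keeps its last value.
    """
    latest = {}
    firings = []
    for arrivals in steps:
        latest.update(arrivals)
        if all(port in latest for port in ports):
            firings.append(tuple(latest[port] for port in ports))
    return firings


if __name__ == "__main__":
    print(cartesian_tokens([1, 2, 3], [9, 8, 7])[:4])
    # T arrives every hour, p every two hours (values are made up).
    readings = [{"T": 20, "p": 1.0}, {"T": 21}, {"T": 22, "p": 1.1}, {"T": 23}]
    print(balance_inputs(readings, ["T", "p"]))
```

In the second call, the task fires at hours 1 and 3 with a new T and the previous p, as in the use case above.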

  10. Introduction to Kepler • Summary of Kepler characteristics: • Developers: NSF-funded Kepler/CORE (UC Davis, UC Santa Barbara and UC San Diego) • Parent project: Ptolemy II • Evaluated release: 1.0.0 • Platforms: Windows, Linux, Mac OS X • Development language: Java • Workflow language: MoML (XML-based) • License: BSD License • Website: http://kepler-project.org/ • Domains of application: Physics, Ecosystems, Bioinformatics, Fusion (CPES, ITM) • Component = actor • Stateful = {init, (pre-fire, fire, post-fire), terminate} • I/O data = ports • Ontology: type checking at pre-init phase • External Models of Computation = Directors {DDF, PN, CT, FSM} • MoCs can be mixed, but not every combination is allowed
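
The stateful actor life cycle listed above, {init, (pre-fire, fire, post-fire), terminate}, can be sketched as follows. This is not Kepler's Java API: the `RunningSum` actor and the `run` function, which stands in for a very simple director, are hypothetical and only mirror the phase names from the slide.

```python
class RunningSum:
    """Hypothetical stateful actor: adds each input token to a running total."""

    def __init__(self):
        self.total = None
        self.phases = []

    def initialize(self):
        self.total = 0
        self.phases.append("init")

    def prefire(self, inbox):
        # The actor is ready only if a token is waiting on its input port.
        return len(inbox) > 0

    def fire(self, inbox):
        # Consume one token and compute the output; state is not changed yet.
        return self.total + inbox.pop(0)

    def postfire(self, output):
        # Commit the new state; returning False would stop the iteration.
        self.total = output
        return True

    def terminate(self):
        self.phases.append("terminate")


def run(actor, tokens):
    """Minimal stand-in for a director driving a single actor."""
    inbox = list(tokens)
    outputs = []
    actor.initialize()
    while actor.prefire(inbox):
        output = actor.fire(inbox)
        outputs.append(output)
        if not actor.postfire(output):
            break
    actor.terminate()
    return outputs


if __name__ == "__main__":
    print(run(RunningSum(), [1, 2, 3]))
```

Separating `fire` from `postfire` lets a scheduler choose when state is committed; the sketch only shows the ordering of the phases.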

  11. Introduction to Kepler • Example of Kepler workflow: • Adding tuples

  12. Introduction to Taverna • Summary of Taverna characteristics • Developers: myGrid Team (University of Manchester, UK) • Parent project: myGrid • Evaluated release: 2.1 • Platforms: Windows, Linux, Mac OS X • Development language: Java • Workflow language: Scufl • License: LGPL • Website: http://www.taverna.org.uk • Application domains: Biology, Bioinformatics, Cheminformatics, Astronomy, Social Sciences and Music • Component = processor • I/O data = data link • Coordination link: a “control flow” link that carries no data • Internal fault management = {number of retries, time-out, alternative service} • One MoC: DAG
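
The three fault-management settings listed above (number of retries, time-out, alternative service) can be combined as in the following Python sketch. It is not Taverna code: `call_with_recovery` and the two demo services are hypothetical, and the policy (retry each service, then fall back to the next one) is an assumption made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def call_with_recovery(services, arg, retries=2, timeout=1.0):
    """Try each service in order; each one gets 1 + retries attempts."""
    errors = []
    for service in services:
        for attempt in range(retries + 1):
            pool = ThreadPoolExecutor(max_workers=1)
            future = pool.submit(service, arg)
            try:
                return future.result(timeout=timeout)
            except FutureTimeout:
                errors.append((service.__name__, attempt, "time-out"))
            except Exception as exc:  # any failure reported by the service
                errors.append((service.__name__, attempt, repr(exc)))
            finally:
                pool.shutdown(wait=False)
    raise RuntimeError(f"all services failed: {errors}")


CALLS = []  # records which demo service was invoked, in order


def primary_service(text):
    """Hypothetical primary service that is always down."""
    CALLS.append("primary")
    raise ValueError("service unavailable")


def backup_service(text):
    """Hypothetical alternative service."""
    CALLS.append("backup")
    return text.upper()


if __name__ == "__main__":
    print(call_with_recovery([primary_service, backup_service], "abc", retries=1))
    print(CALLS)
```

A call that times out is abandoned, not interrupted: the worker thread keeps running in the background until the service returns.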

  13. Introduction to Taverna • Example of Taverna workflow • Concatenate 3 strings • Using a “coordination link” to force sequential execution • Black arrows are data flow links

  14. Introduction to Triana • Summary of Triana characteristics • Developers: Cardiff University • Parent project: - • Evaluated release: 4.0 • Platforms: Windows, Linux, Mac OS X • Development language: Java • License: Apache open source license version 2 • Website: http://www.trianacode.org/ • Application domains: Bioinformatics • Component = XML description (WSDL), Java code (local), Interface (remote) • One MoC = Data flow, but with a trigger message acting as a “control flow” link that carries no data

  15. Introduction to Triana • Example of Triana workflow • Display the SQRT of a random number • Data flow

  16. Workflow control patterns • Results • Basic Control Flow patterns • OK for all • Advanced branching & synchronization patterns • Severe limitations due to the absence of a mechanism for cancelling running activities • State-based patterns: Kepler supports WCP 17 but … • Cancellation and Force Completion patterns: none • Iteration patterns • Triana and Kepler: OK except for recursion • Not supported by Taverna • Termination patterns • Supported only by Kepler • Trigger patterns • None. Use case: external signal • Summary for control patterns • Kepler is the most powerful • Triana is close to Kepler • Several control patterns are missing in Taverna

  17. Workflow data patterns • Results • Data visibility patterns: identical • Data interaction patterns: identical • Data transfer patterns: identical • Data-based routing patterns: • Taverna does not support this functionality due to the absence of “exclusive choice” (see WCP) • Summary for data patterns • Kepler & Triana are identical • Taverna is very close

  18. Scientific Workflow patterns • Results • Dynamic input size: only Kepler • Dynamic Token Replication: only Kepler • Dynamic Balancing of Input Tokens: not supported by Kepler, partially supported by Triana and Taverna • Cartesian product of input tokens: only Taverna • Summary for Scientific workflow patterns • Triana is the least powerful • Kepler & Taverna have different strengths

  19. Summary • “Kepler provides more functionalities than the other two systems” • “Taverna is compensated by the ease one can define a new processor” • “definition of a new component in Kepler requires a sophisticated programming skills (state + polymorphic behaviour to adapt to the chosen director)” • Real limitation of WfMS: running parallel activities and waiting for only one of them to complete
