
Workflow discovery in e-science

Antoon Goderis, Peter Li, Carole Goble. University of Manchester, UK. www.cs.man.ac.uk/~goderisa. Agenda: Web services in science; workflow re-use; workflow discovery; is workflow discovery a new problem?; how do people match up workflows?


Presentation Transcript


  1. Workflow discovery in e-science • Antoon Goderis, Peter Li, Carole Goble • University of Manchester, UK • www.cs.man.ac.uk/~goderisa

  2. Agenda • Web services in science • Workflow re-use • Workflow discovery • Is workflow discovery a new problem? • How do people match up workflows? • Can we replicate the behaviour with tools? • Conclusions

  3. Workflows Web services

  4. Science is highly distributed and connected

  5. The Web has revolutionised science

  6. Web services about to do the same?

  7. Scientific workflows • e-science = supporting scientists to encode, enact, explain and share experimental procedures featuring lots of specialised data • Case study: bioinformatics • Understanding the DNA to behaviour link • 3000 bio-services via the Taverna workflow editor http://mygrid.org.uk/taverna • Re-use and repurposing of workflows • +/- 200 Taverna workflows shared at www.myExperiment.org

  8. Scientific workflows • e-science = supporting scientists to encode, enact, explain and share experimental procedures • Case study: bioinformatics • Understanding the DNA to life link • 3000 bio-services via the Taverna workflow editor http://mygrid.org.uk/taverna • Re-use and repurposing of workflow fragments • +/- 200 Taverna workflows shared at www.myExperiment.org

  9. Manchester, CS dept • Manchester, Biology dept • Newcastle, CS dept

  10. Scientific workflows • e-science = supporting scientists to encode, enact, explain and share experimental procedures • Case study: bioinformatics • Understanding the DNA to life link • 3000 bio-services via the Taverna workflow editor http://mygrid.org.uk/taverna • Re-use and repurposing of workflow fragments • +/- 200 Taverna workflows shared at www.myExperiment.org

  11. One + Three questions • Can’t we just do it with ? • Keyword search doesn’t seem to cut it • Is workflow discovery a new problem? • How do people match up workflows? • Can we replicate the behaviour with tools?

  12. myExperiment.org my current workflow

  13. ? myExperiment.org my current workflow

  14. 1. Is workflow discovery a new problem? Source: survey of 21 myGrid/Taverna users

  15. 1. Is workflow discovery a new problem? • Yes: workflow discovery subsumes service discovery

  16. 2. How do people match up workflows?

  17. 3. Can we replicate the behaviour with tools?

  18. A user experiment with bioinformatics workflows

  19. Workflow discovery task • Can I sensibly adapt an existing experimental procedure (workflow) with another one? • Extend • Replace

  20. Workflow corpus • 66 similar workflows for Graves’ disease, all by a single author • 1 + 5 workflows • Workflow diagram • No documentation • No annotation

  21. By the experts, for the experts • 9 bioinformaticians and 4 developers at a Taverna training day

  22. Matching strategies • Matching input workflow with 5 others

  23. Human on-line matching strategies! • Traits • Scores of attraction • Yes or no

  24. Matching strategy: traits • From an analysis of 30 000 profiles

  25. Matching strategy: scoring • Score • Percentile • Confidence level • www.AmIHotOrNot.com

  26. Matching strategy: yes or no

  27. Traits • Predicted trait

  28. Traits and score • Predicted trait • Score of similarity, usefulness and confidence, e.g. [1 Identical – 9 Not similar]

  29. The gold standard • The collection of workflow similarity assessments • Predictive traits, possibly interacting • 1 + 5 workflows • Traits/score

  30. 2. How do people match up workflows? • Difficulty of task • Biological relationship very difficult for 6 out of 9 • Shape similarity difficult for 4 out of 13 • Medium confidence • Consistency • Inter-participant disagreement on how to order biological similarity and shape similarity [Spearman rank order test] • Predictive traits • No single trait dominant between or within participants [Levene homogeneity of variance test]
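The Spearman rank order test mentioned on this slide compares how two participants order the same set of candidate workflows. The sketch below is illustrative only: the two rankings are hypothetical, not data from the study, and the formula used is the standard one for rankings without ties.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman rank correlation for two rankings of the same items (no ties)."""
    if len(ranks_a) != len(ranks_b) or len(ranks_a) < 2:
        raise ValueError("need two rankings of equal length, at least 2 items")
    n = len(ranks_a)
    # Sum of squared rank differences, item by item.
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Hypothetical rankings of five candidate workflows by two participants
# (1 = most similar to the input workflow).
participant_1 = [1, 2, 3, 4, 5]
participant_2 = [2, 1, 3, 5, 4]

rho = spearman_rho(participant_1, participant_2)
```

A value near 1 means the two participants order the workflows almost identically, a value near -1 means they order them in reverse, and values near 0 indicate the kind of disagreement the slide reports.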

  31. Can we do better? • Simpler tasks and workflows • Taverna experienced users • Workflow documentation and annotation • Other factors in use, e.g. size difference • Fix allowed factors • Adopt black box approach: yes/no matching

  32. Automated discovery technique • Unattributed graph matcher implementation by Messmer and Bunke • Sub-isomorphism detection; exponential time complexity • DAGs and optimization for repository of graphs • Workflows parsed as graphs • Workflow input, workflow output and intermediate services as nodes • Data links as edges • Example nodes: probeSetid, databaseid, AffyMapper_seq, Blastx, Results_Blastx
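To make the graph view concrete, here is a minimal sketch of a workflow parsed as a directed graph, with a brute-force structural match. It is not the Messmer and Bunke algorithm: it simply tries every injective node mapping, which shows why the cost of sub-isomorphism detection explodes. The node names come from the slide; the edges between them, the function name `has_subgraph` and the two query patterns are assumptions made for illustration, and the check is the non-induced variant (extra edges in the larger graph are allowed).

```python
from itertools import permutations

# Workflow from the slide, parsed as a directed graph.
# Node names come from the slide; the edges (data links) are assumed.
WORKFLOW_NODES = ["probeSetid", "databaseid", "AffyMapper_seq", "Blastx", "Results_Blastx"]
WORKFLOW_EDGES = [
    ("probeSetid", "AffyMapper_seq"),
    ("databaseid", "AffyMapper_seq"),
    ("AffyMapper_seq", "Blastx"),
    ("Blastx", "Results_Blastx"),
]

def has_subgraph(pattern_nodes, pattern_edges, graph_nodes, graph_edges):
    """Return True if the pattern can be embedded in the graph.

    Node names are ignored (unattributed matching): only structure counts.
    Every injective node mapping is tried, so the cost grows very quickly
    with graph size.
    """
    graph_edge_set = set(graph_edges)
    pattern_nodes = list(pattern_nodes)
    for image in permutations(graph_nodes, len(pattern_nodes)):
        mapping = dict(zip(pattern_nodes, image))
        if all((mapping[a], mapping[b]) in graph_edge_set for a, b in pattern_edges):
            return True
    return False

# A three-step chain (hypothetical fragment) occurs in the workflow.
chain_found = has_subgraph(
    ["a", "b", "c"], [("a", "b"), ("b", "c")], WORKFLOW_NODES, WORKFLOW_EDGES
)

# A node fed by three inputs does not: no workflow node has three incoming links.
fan_in_found = has_subgraph(
    ["x", "y", "w", "z"], [("x", "z"), ("y", "z"), ("w", "z")],
    WORKFLOW_NODES, WORKFLOW_EDGES,
)
```

With the assumed edges, the chain pattern is found and the three-input pattern is not. A repository-scale matcher would need the kind of optimization the slide alludes to rather than this exhaustive search.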

  33. Automated discovery technique • Ranking based on • shared nodes • difference in size between input graph and repository graphs
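The slide names the two ranking criteria but not how they are combined. The sketch below assumes one plausible combination: more shared nodes ranks first, and a smaller size difference breaks ties. Nodes are compared by name here for simplicity, which is also an assumption, and the repository contents and the function name `rank_workflows` are hypothetical.

```python
def rank_workflows(query_nodes, repository):
    """Rank repository workflows against a query workflow.

    Assumed ordering: more shared nodes first, then smaller size difference,
    then workflow name (to keep the order deterministic).
    """
    query = set(query_nodes)

    def sort_key(item):
        name, nodes = item
        candidate = set(nodes)
        shared = len(query & candidate)
        size_gap = abs(len(query) - len(candidate))
        return (-shared, size_gap, name)

    return [name for name, _ in sorted(repository.items(), key=sort_key)]

# Hypothetical repository: workflow name -> node names.
repository = {
    "wf1": ["A", "B", "C", "D"],
    "wf2": ["A", "B"],
    "wf3": ["A", "B", "C", "D", "E", "F"],
    "wf4": ["X"],
}

ranking = rank_workflows(["A", "B", "C"], repository)
```

Here wf1 and wf3 both share three nodes with the query, and wf1 ranks higher because it is closer in size to the query graph.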

  34. 3. Can we replicate the behaviour with tools? Kind of... • Average similarity assessments across participants • 1 + 66 • Traits/score

  35. Current work • OWL workflow ontology • Precision / recall • Graph matching • Text clustering • Yes/no

  36. Take home • Scientists compose Web services for real – and share their results • Workflow discovery is a real problem, which subsumes service discovery • A range of matching strategies and techniques apply • Evaluation is a challenge - gold standards hard to build • Come and play at myExperiment.org • References at www.cs.man.ac.uk/~goderisa
