Provenance Challenges and Technologies for Grids

Luc Moreau, University of Southampton, L.Moreau@ecs.soton.ac.uk. Contents: provenance: problem definition; use cases of provenance in grids; architectural vision for provenance; first experimentation, current work; research agenda.

Presentation Transcript


  1. Provenance Challenges and Technologies for Grids • Luc Moreau, University of Southampton, L.Moreau@ecs.soton.ac.uk

  2. Contents • Provenance: problem definition • Use cases of provenance in grids • Architectural vision for provenance • First experimentation, current work • Research agenda • Provenance projects (EU, UK) • Conclusion

  3. Provenance: definition • Main entry: prov·e·nance • Pronunciation: ˈpräv-nən(t)s, ˈprä-və-ˌnän(t)s • Function: noun • Etymology: French, from provenir to come forth, originate, from Latin provenire, from pro- forth + venire to come (more at PRO-, COME) • Date: 1785 • 1: ORIGIN, SOURCE • 2: the history of ownership of a valued object or work of art or literature (Merriam-Webster Online)

  4. The Grid and Virtual Organisations • The Grid problem is defined as coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organisations [FKT01]. • Effort is required to allow users to place their trust in the data produced by such virtual organisations

  5. Provenance and Virtual Organisations • Given a set of services in an open grid environment that decide to form a virtual organisation with the aim of producing a given result, how can we determine the process that generated the result, especially after the virtual organisation has been disbanded?

  6. Provenance and Workflows • Workflow enactment has become popular in the Grid and Web Services communities • Workflow enactment can be seen as a scripted form of virtual organisation • The problem is similar: how can we determine the origin of enactment results?

  7. Use cases • Bioinformatics • Aerospace Engineering • Organ transplant management • Chemistry • Physics

  8. Provenance in Bioinformatics • Provenance in the drug discovery process • Drug companies are required to keep a record of the provenance of a drug's discovery for as long as the drug is in use (sometimes up to 50 years) • www.mygrid.org.uk

  9. Provenance in Aerospace Engineering • Provenance requirement: to maintain a historical record of the inputs and outputs of each sub-system involved in simulations • Aircraft provenance data must be kept for up to 99 years when aircraft are sold to some countries • Currently, little direct support is available for this.

  10. Provenance in Organ Transplant Management • Decision support systems for organ and tissue transplant rely on a wide range of data sources, patient data, and doctors’ and surgeons’ knowledge • Heavily regulated domain: European, national, regional and site-specific rules govern how decisions are made • The application of these rules must be ensured and auditable, and the rules may change over time • Provenance allows previous decisions to be tracked, which is crucial to maximising matching efficiency and patient recovery rates

  11. Provenance in Chemistry • A PhD student’s supervisor may check the student’s experiment • Automatically generate papers describing how an experiment was carried out • Intellectual property rights • www.combechem.org

  12. Physics • CMS • ATLAS

  13. What is the problem? • Provenance recording should be part of the infrastructure, so that users can elect to enable it when they execute their complex tasks over the Grid or in Web Services environments. • Currently, the Web Services protocol stack and the Open Grid Services Architecture do not provide any support for recording provenance. • Methods are generally ad hoc and do not interoperate.

  14. Architectural Vision Typical workflow enactment in a service-oriented architecture …

  15. Architectural Vision … with provenance support

  16. A First Prototype

  17. Sequence Diagram/Data Model • Must support recording of all information necessary to replay execution • Must support all complex forms of workflows (recursion, iterations, parallel execution).
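As a rough illustration of the data-model requirement, the sketch below captures nesting through a parent link, so that recursion, iteration and parallel execution all reduce to the same tree shape. The record fields and the `build_call_tree` helper are hypothetical and are not the prototype's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InvocationRecord:
    """One recorded service invocation (hypothetical schema)."""
    interaction_id: str
    service: str
    inputs: dict[str, str]    # parameter name -> data identifier
    outputs: dict[str, str]   # result name -> data identifier
    parent_id: Optional[str] = None  # enclosing invocation, if any
    children: list[InvocationRecord] = field(default_factory=list)


def build_call_tree(records: list[InvocationRecord]) -> list[InvocationRecord]:
    """Attach each record to its parent and return the top-level records.

    Nesting is carried by parent_id rather than by position, so recursive
    calls, repeated calls in a loop and parallel branches are all simply
    children of some enclosing invocation. Call once per list of records.
    """
    by_id = {r.interaction_id: r for r in records}
    roots: list[InvocationRecord] = []
    for r in records:
        if r.parent_id is None:
            roots.append(r)
        else:
            by_id[r.parent_id].children.append(r)
    return roots


# Example: a workflow run whose body calls "align" twice in parallel.
records = [
    InvocationRecord("i1", "workflow", {"seq": "d0"}, {"report": "d3"}),
    InvocationRecord("i2", "align", {"seq": "d0"}, {"hits": "d1"}, parent_id="i1"),
    InvocationRecord("i3", "align", {"seq": "d0"}, {"hits": "d2"}, parent_id="i1"),
]
roots = build_call_tree(records)
```

Keeping input and output data identifiers on every record is what would make a later replay possible; the tree itself only recovers the shape of the execution.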

  18. PReP: Provenance Recording Protocol [Diagram: client, service and Provenance Service; messages labelled "negotiate configuration", "invocation", "result", and two "invocation and result notify" messages sent to the Provenance Service]
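A single-process sketch of the PReP exchange on slide 18 may help fix ideas. The class and function names are hypothetical and the negotiation step is only a placeholder. The sketch shows the client and the service each asserting the same invocation and result; one plausible benefit, assumed here rather than stated on the slide, is that the provenance service can cross-check the two views.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Notification:
    """An 'invocation and result' assertion sent to the provenance service."""
    interaction_id: str
    asserter: str     # "client" or "service"
    invocation: str
    result: str


class ProvenanceService:
    """Hypothetical in-memory stand-in for a provenance store."""

    def __init__(self) -> None:
        self.notifications: list[Notification] = []

    def negotiate(self, interaction_id: str) -> bool:
        # A real negotiation would agree on what is recorded and where;
        # this sketch accepts every request.
        return True

    def notify(self, n: Notification) -> None:
        self.notifications.append(n)

    def views_agree(self, interaction_id: str) -> bool:
        """True if both parties reported the interaction, identically."""
        ns = [n for n in self.notifications if n.interaction_id == interaction_id]
        asserters = {n.asserter for n in ns}
        views = {(n.invocation, n.result) for n in ns}
        return asserters == {"client", "service"} and len(views) == 1


def double(x: int) -> int:
    """Stand-in for the invoked service."""
    return 2 * x


def call_with_provenance(store: ProvenanceService, interaction_id: str, x: int) -> int:
    recording = store.negotiate(interaction_id)  # negotiate configuration
    result = double(x)                           # invocation, then result
    if recording:
        # In a deployment the service would send its own notification;
        # both are sent from here only to keep the sketch single-process.
        for party in ("service", "client"):
            store.notify(Notification(interaction_id, party, repr(x), repr(result)))
    return result
```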

  19. [Diagram: several clients and services exchanging invocations and results, with each party sending "invocation and result notify" messages to one or more Provenance Services] • Provenance services may be shared or different • Threesomes: a good idea on the Grid
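When the records for one result are spread across several provenance services (slide 19), the process that produced the result can, in principle, still be reconstructed after the virtual organisation has disbanded: merge the stores, then walk back from an output to the invocations that produced it. The store layout and the helper names below are assumptions made for illustration.

```python
# Hypothetical store layout: interaction id -> (service, input ids, output ids).
Record = tuple[str, tuple[str, ...], tuple[str, ...]]


def merge_stores(*stores: dict[str, Record]) -> dict[str, Record]:
    """Union of several provenance stores.

    Assumes that an interaction recorded in more than one store carries
    identical records, so later stores may simply overwrite earlier ones.
    """
    merged: dict[str, Record] = {}
    for store in stores:
        merged.update(store)
    return merged


def lineage(data_id: str, records: dict[str, Record]) -> list[str]:
    """Interactions that directly or transitively produced data_id.

    Assumes each data id has a single producing interaction.
    """
    producer = {out: iid for iid, (_, _, outs) in records.items() for out in outs}
    found: list[str] = []
    pending = [data_id]
    while pending:
        iid = producer.get(pending.pop())
        if iid is None or iid in found:
            continue  # raw input, or interaction already visited
        found.append(iid)
        pending.extend(records[iid][1])  # follow this interaction's inputs
    return found


store_a = {"i1": ("fetch", ("q",), ("d1",)), "i2": ("align", ("d1",), ("d2",))}
store_b = {"i3": ("report", ("d1", "d2"), ("r",))}
records = merge_stores(store_a, store_b)
```

Here `lineage("r", records)` visits the report, alignment and fetch interactions, even though they were recorded by two different provenance services.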

  20. PReP Formalisation • Abstract machines • Properties • Termination • Liveness • Safety • Foundation for adding necessary cryptographic techniques

  21. Research Agenda (1) • In order for provenance data to be useful, we expect such a protocol to support some “classical” properties of distributed algorithms. • Using mutual authentication, an invoked service can ensure that it submits data to a specific provenance server, and vice-versa, a provenance server can ensure that it receives data from a given service. • With non-repudiation, we can retain evidence of the fact that a service has committed to executing a particular invocation and has produced a given result. • We anticipate that cryptographic techniques will be useful to ensure such properties
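The slide anticipates cryptographic techniques without naming one. As a minimal sketch under that assumption, a service could attach an HMAC-SHA256 tag (Python standard library `hmac`) to each record it submits, so that the provenance server can check that the record came from a holder of the shared key and was not altered. This covers only the server's side of authentication, and a shared-key MAC does not give non-repudiation, since either key holder could have produced the tag; that property needs public-key signatures, which are not shown. The record fields and the key are hypothetical.

```python
import hashlib
import hmac
import json


def tag_record(record: dict, key: bytes) -> str:
    """HMAC-SHA256 over a canonical JSON encoding of the record."""
    # sort_keys and fixed separators make both sides hash identical bytes.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def accept_record(record: dict, tag: str, key: bytes) -> bool:
    """Provenance-server side: accept only records whose tag verifies."""
    return hmac.compare_digest(tag_record(record, key), tag)


# Hypothetical key shared by one service and the provenance server.
KEY = b"example-shared-key"
record = {"interaction_id": "i1", "service": "double", "invocation": "21", "result": "42"}
tag = tag_record(record, KEY)
```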

  22. Research Agenda (2) • Access control • Medical applications: organ transplant, IXI, e-Diamond • Scalability • DC2: 10^7 files; CERN envisions 10^12 files • How can domain-level provenance be inferred from execution-level provenance?

  23. Research Agenda (3) Using the provenance of data, trust metrics for the data can be derived from: • Trust the user places in invoked services • Trust the user places in the input data • Trust the user places in the enacted workflow • Trust the user places in the enactor • Trust the user places in the provenance service.
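The slide lists the sources of trust but not how they combine. One simple option, offered purely as an assumption, treats each source as a value in [0, 1] and multiplies them, which presumes the sources are independent; a more conservative alternative takes the minimum ("weakest link"). The function names and the example values are hypothetical.

```python
from math import prod


def _checked(factors: tuple[float, ...]) -> tuple[float, ...]:
    """Reject empty input and values outside [0, 1]."""
    if not factors:
        raise ValueError("at least one trust factor is required")
    for f in factors:
        if not 0.0 <= f <= 1.0:
            raise ValueError(f"trust factor {f} is outside [0, 1]")
    return factors


def trust_product(*factors: float) -> float:
    """Multiply the factors, assuming they are independent."""
    return prod(_checked(factors))


def trust_weakest_link(*factors: float) -> float:
    """Conservative alternative: data are only as trusted as the weakest part."""
    return min(_checked(factors))


# Illustrative values for: invoked services, input data, workflow,
# enactor, provenance service.
factors = (0.9, 0.8, 1.0, 1.0, 0.5)
```

With these values the product is about 0.36 and the weakest link gives 0.5; the product penalises every imperfect source, while the minimum reflects only the least trusted one.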

  24. The purpose of the PASOA project is to investigate provenance in Grid architectures • Funded by EPSRC under the “fundamental computer science for e-Science” call • In collaboration with Cardiff • www.pasoa.org

  25. EU Provenance STREP: Enabling and Supporting Provenance in Grids for Complex Problems • Partners • IBM United Kingdom Ltd • University of Southampton • German Aerospace Centre • University of Wales, Cardiff • Universitat Politecnica de Catalunya • MTA SZTAKI • To design, conceive and implement an industrial-strength open provenance architecture for Grid computing, and to deploy and evaluate it in complex grid applications (aerospace engineering and organ transplant management) • www.gridprovenance.org

  26. Provenance Workplan [Diagram: requirements; architecture 1 and architecture 2; pre-prototype, functional prototype and final prototype; strawman proposal and standardisation (interfaces); scalability specification; security specification; tools; domain-specific specifications 1 and 2 with applications 1 and 2]

  27. Key Deliverables

  28. Conclusion • Provenance is a rather unexplored domain • Strategic for bringing trust to open environments • Need to design a secure, scalable and configurable architecture capable of supporting multiple requirements from very different application domains • Need to further investigate the algorithmic foundations of provenance, which will lead to scalable and secure industrial solutions • Deployment in real applications

  29. Acknowledgements • myGrid • Simon Miles, Juri Papay, Ananth Krishna, Michael Luck, David De Roure, Terry Payne, Mark Greenwood, Carole Goble, Martin Szomszor • Combechem • Gareth Hughes, Hugo Mills, monica schraeffel • PASOA • Omer Rana, Paul Groth, Simon Miles, Ben Caroll • EU-Provenance • Syd Chapman, John Ibbotson, Laszlo Varga, Steve Willmott, Ulises Cortes, Andreas Schreiber, Rolf Hempel