
Provenance Challenge


Presentation Transcript


  1. Provenance Challenge Simon Miles, Mike Wilde, Ian Foster and Luc Moreau

  2. Provenance • In the study of fine art, provenance refers to the documented history of some art object. • If the provenance of data produced by computer systems could be determined like it can for some works of art, then users would be able to interpret and judge the quality of data better.

  3. The Provenance of the Challenge • Back in May: IPAW’06 (International Provenance and Annotation Workshop) • www.ipaw.info • Proceedings to appear in LNCS 4145

  4. Standardisation discussion at IPAW’06 • How can (workflow-based or other) systems inter-operate? • Individual systems may be able to track provenance of data • How can we track provenance of data across systems? • Would a standard be useful? • At the time, it was felt premature to standardise; we first needed to understand systems’ capabilities

  5. The Challenge Aims • The provenance challenge aims to establish an understanding of the capabilities of available provenance-related systems • The representations that systems use to document details of processes that have occurred • The capabilities of each system in answering provenance-related queries • What each system considers to be within scope of the topic of provenance (regardless of whether the system can yet address all problems in that scope) twiki.ipaw.info

  6. The Challenge Process Each participant in the challenge will have their own page on this TWiki, following the ChallengeTemplate, where they can inform the rest of the participants of their efforts in meeting the challenge:
  • Representations of the workflow in their system
  • Representations of provenance for the example workflow
  • Representations of the result of the core (and other) queries
  • Contributions to a matrix of queries vs systems, indicating for each that: (1) the query can be answered by the system, (2) the system cannot answer the query now but considers it relevant, or (3) the query is not relevant to the project.
  Optionally, the participants may also contribute the following:
  • Additional queries that illustrate the scope of their system
  • Extensions to the example workflow to best illustrate the unique aspects of their system
  • Any categorisation of queries that the project considers to have practical value
  twiki.ipaw.info

  7. twiki.ipaw.info

  8. The Queries
  • Find the process that led to Atlas X Graphic / everything that caused Atlas X Graphic to be as it is. This should tell us the new brain images from which the averaged atlas was generated, the warping performed, etc.
  • Find the process that led to Atlas X Graphic, excluding everything prior to the averaging of images with softmean.
  • Find the Stage 3, 4 and 5 details of the process that led to Atlas X Graphic.
  • Find all invocations of procedure align_warp using a twelfth order nonlinear 1365 parameter model (see model menu describing possible values of parameter "-m 12" of align_warp) that ran on a Monday.
  • Find all Atlas Graphic images output from workflows where at least one of the input Anatomy Headers had an entry global maximum=4095. The contents of a header file can be extracted as text using the scanheader AIR utility.
  • Find all output averaged images of softmean (average) procedures, where the warped images taken as input were align_warped using a twelfth order nonlinear 1365 parameter model, i.e. "where softmean was preceded in the workflow, directly or indirectly, by an align_warp procedure with argument -m 12."
  • A user has run the workflow twice, in the second instance replacing each procedure (convert) in the final stage with two procedures: pgmtoppm, then pnmtojpeg. Find the differences between the two workflow runs. The exact level of detail in the difference that is detected by a system is up to each participant.
  • A user has annotated some anatomy images with a key-value pair center=UChicago. Find the outputs of align_warp where the inputs are annotated with center=UChicago.
  • A user has annotated some atlas graphics with a key-value pair whose key is studyModality. Find all the graphical atlas sets that have a metadata annotation studyModality with values speech, visual or audio, and return all other annotations on these files.
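The first two queries amount to walking a provenance graph backwards from an output artifact, collecting every process invocation and intermediate artifact that contributed to it, optionally stopping at a given stage. The sketch below is a minimal illustration of that traversal, not any participating system's implementation; the edge records, artifact names, and invocation names (align_warp_1, softmean_1, etc.) are hypothetical stand-ins loosely mirroring the challenge's fMRI workflow stages.

```python
from collections import defaultdict, deque

# Hypothetical provenance records: each triple says "output was derived
# from input via this process invocation". Names are illustrative only.
edges = [
    ("anatomy1.img",  "align_warp_1", "warp1.warp"),
    ("warp1.warp",    "reslice_1",    "resliced1.img"),
    ("resliced1.img", "softmean_1",   "atlas.img"),
    ("atlas.img",     "slicer_x",     "atlas_x.pgm"),
    ("atlas_x.pgm",   "convert_x",    "atlas_x.jpg"),
]

# Reverse index: output artifact -> list of (process, input artifact)
derived_from = defaultdict(list)
for inp, proc, out in edges:
    derived_from[out].append((proc, inp))

def lineage(artifact, stop=None):
    """Breadth-first walk backwards from `artifact`, collecting every
    process and artifact that led to it (Query 1). If `stop` names a
    process, its inputs are not traversed, approximating Query 2's
    'excluding everything prior to softmean'."""
    seen_procs, seen_artifacts = [], []
    visited = {artifact}
    queue = deque([artifact])
    while queue:
        node = queue.popleft()
        for proc, inp in derived_from.get(node, []):
            if proc not in seen_procs:
                seen_procs.append(proc)
            if proc == stop:
                continue  # cut off history before this process
            if inp not in visited:
                visited.add(inp)
                seen_artifacts.append(inp)
                queue.append(inp)
    return seen_procs, seen_artifacts

procs, artifacts = lineage("atlas_x.jpg")               # full lineage
procs_cut, _ = lineage("atlas_x.jpg", stop="softmean_1")  # truncated at softmean
```

Real entries in the challenge recorded far richer detail (parameters such as "-m 12", timestamps for the "ran on a Monday" query, and annotations for the center=UChicago query), but the backward-reachability core shown here underlies most of the lineage queries.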

  9. 17 Participating Teams • REDUX, Database Research Group, MSR • MINDSWAP, Semantic Web Research Group, University of Maryland, College Park • Karma, Computer Science Department, Indiana University • CESNET, GRID research group, CESNET z.s.p.o. Prague, Czech Republic • myGrid, University of Manchester • VisTrails, University of Utah • Gridprovenance, Cardiff University • ES3, University of California, Santa Barbara • UPenn, University of Pennsylvania, Database Group • RWS, UC Davis and SDSC, California • DAKS, Genome Center, UC Davis, California • PASS, Harvard • SDG, Pacific Northwest National Lab • NcsaD2k and NcsaCi, National Center for Supercomputing Applications • UChicago, University of Chicago Computation Institute • Southampton, University of Southampton, PASOA and Provenance projects • USC/ISI, University Of Southern California/Information Sciences Institute twiki.ipaw.info

  10. Schedule • Session 1: Wednesday 10.00-11.30 team presentations • Session 2: Wednesday 13.00-15.00 team presentations • Session 3: Wednesday 16.00-17.30 • Session 4: Thursday 9.30-11.00 analysing commonalities and differences • Session 5: Thursday 11.30-13.00 what next? Sessions 3-5 are open; contribute ideas on the TWiki http://twiki.ipaw.info/bin/view/Challenge/WorkshopAgenda

  11. 10.00-10.10: Introduction • 10.10-10.20: PNL • 10.20-10.30: UPenn, University of Pennsylvania, Database Group • 10.30-10.40: UChicago • 10.40-10.50: myGrid, University of Manchester • 10.50-11.00: Kepler (SDSC) • 11.00-11.10: Kepler (UCDavis) • 11.10-11.20: VisTrails, University of Utah • 13.00-13.10: REDUX, Database Research Group, MSR • 13.10-13.20: CESNET, GRID research group, CESNET z.s.p.o. Prague, Czech Republic • 13.20-13.30: Karma, Computer Science Department, Indiana University • 13.30-13.40: MINDSWAP, Semantic Web Research Group, University of Maryland, College Park • 13.40-13.50: PASS, Harvard • 13.50-14.00: Southampton, PASOA/EU Provenance • 14.00-14.10: Gridprovenance, Cardiff University • 14.10-14.20: ISI • 14.20-14.30: NCSA • 14.30-14.40: ES3, University of California, Santa Barbara
