This presentation by Tommy Ellkvist and Juliana Freire discusses workflow evolution provenance and how it can be expressed in the Open Provenance Model (OPM). Workflow evolution is represented as a directed graph (a version tree) in which nodes are workflows and edges are actions, that is, transformations that users perform on workflows. The slides show an example in the OPM XML schema, a Vistrails XML model, and the translation of that model into OPM agents, artifacts, and processes, followed by observations on what the translation leaves open.
Workflow evolution provenance and OPM
Tommy Ellkvist and Juliana Freire
Workflow Evolution
[Figure labels: Workflows, Data Products, Version Tree]
Action-based representation of workflows
• Nodes represent workflows
• Edges represent actions
• Actions are transformations on workflows
• Actions are performed by users
[Figure: version chain 0 → Add Module(0) → 1 → Add Module(1) → 2 → Add Connection(0,1) → 3]
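The action-based representation above can be sketched in Python as follows. This is a minimal illustration, not the Vistrails implementation: the `Workflow` class, the `add_module` and `add_connection` helpers, and the `VERSION_TREE` dictionary are hypothetical names introduced here. The sketch assumes that a workflow version is obtained by walking the parent links back to the empty root version 0 and replaying the actions in order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workflow:
    """Hypothetical minimal workflow: a set of module ids and connections."""
    modules: frozenset = frozenset()
    connections: frozenset = frozenset()


def add_module(module_id):
    """Return an action that adds a module to a workflow."""
    def apply(wf):
        return Workflow(wf.modules | {module_id}, wf.connections)
    return apply


def add_connection(src, dst):
    """Return an action that connects two existing modules."""
    def apply(wf):
        if src not in wf.modules or dst not in wf.modules:
            raise ValueError("both modules must exist before connecting them")
        return Workflow(wf.modules, wf.connections | {(src, dst)})
    return apply


# Version tree from the slide: version id -> (parent version id, action).
# Nodes are workflow versions, edges are the actions between them.
VERSION_TREE = {
    1: (0, add_module(0)),
    2: (1, add_module(1)),
    3: (2, add_connection(0, 1)),
}


def materialize(version, tree=VERSION_TREE):
    """Rebuild a workflow by replaying the actions on the path from the root."""
    actions = []
    while version != 0:  # version 0 is the empty root workflow
        parent, action = tree[version]
        actions.append(action)
        version = parent
    wf = Workflow()
    for action in reversed(actions):
        wf = action(wf)
    return wf
```

Calling `materialize(3)` replays all three actions and yields a workflow with modules 0 and 1 joined by one connection, while `materialize(2)` stops before the connection is added.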
OPM XML schema: Example of OPM (The OPM, 2007)
OPM XML schema: Translated OPM Example
<OPMGraph ...>
  <Artifact>
    <ArtifactId>1</ArtifactId>
    <Account>G</Account>
    <Account>O</Account>
  </Artifact>
  <Artifact>
    <ArtifactId>1</ArtifactId>
    <Account>G</Account>
    <Account>O</Account>
  </Artifact>
  …
  <Process>
    <ProcessId>1</ProcessId>
    <Account>G</Account>
  </Process>
  <Process>
    <ProcessId>2</ProcessId>
    <Account>O</Account>
  </Process>
  <Process>
    <ProcessId>3</ProcessId>
    <Account>O</Account>
  </Process>
  …
  <Used ProcessId="1" Role="in" ArtifactId="1">
    <Account>G</Account>
  </Used>
  <Used ProcessId="2" Role="pair" ArtifactId="1">
    <Account>O</Account>
  </Used>
  <Used ProcessId="3" Role="in" ArtifactId="3">
    <Account>O</Account>
  </Used>
  <Used ProcessId="4" Role="in" ArtifactId="4">
    <Account>O</Account>
  </Used>
  <Used ProcessId="5" Role="left" ArtifactId="5">
    <Account>O</Account>
  </Used>
  <Used ProcessId="5" Role="right" ArtifactId="6">
    <Account>O</Account>
  </Used>
  <WasGeneratedBy ArtifactId="2" Role="out" ProcessId="1">
    <Account>G</Account>
  </WasGeneratedBy>
  …
  <Alternate Account1="O" Account2="G"/>
</OPMGraph>
Vistrails XML Model
<vistrail dbHost="" dbName="" dbPort="" id="" name="" version="0.9.0" xmlns:xsi="http://www.w3.org/...">
  <action date="2008-05-27 17:35:39" id="1" prevId="0" prune="" session="" user="g-tomel">
    <add id="0" objectId="0" parentObjId="" parentObjType="" what="module">
      <module cache="1" id="0" name="String" package="edu.utah.sci.vistrails.basic" tag="" version="" />
    </add>
    <add id="1" objectId="0" parentObjId="0" parentObjType="module" what="location">
      <location id="0" x="-89.0" y="62.0" />
    </add>
  </action>
  <action date="2008-05-27 17:35:43" id="2" prevId="1" prune="" session="" user="g-tomel">
    <add id="2" objectId="1" parentObjId="" parentObjType="" what="module">
      <module cache="1" id="1" name="ConcatenateString" package="edu.utah.sci.vistrails.basic" tag="" version="" />
    </add>
    <add id="3" objectId="1" parentObjId="1" parentObjType="module" what="location">
      <location id="1" x="-20.0" y="-67.0" />
    </add>
  </action>
  <action date="2008-05-27 17:35:46" id="3" prevId="2" prune="" session="" user="g-tomel">
    <add id="4" objectId="0" parentObjId="" parentObjType="" what="connection">
      <connection id="0" />
    </add>
    <add id="5" objectId="1" parentObjId="0" parentObjType="connection" what="port">
      <port id="1" moduleId="1" moduleName="ConcatenateString" name="str1" spec="(edu.utah.sci.vistrails.basic:String)" type="destination" />
    </add>
    <add id="6" objectId="0" parentObjId="0" parentObjType="connection" what="port">
      <port id="0" moduleId="0" moduleName="String" name="value" spec="(edu.utah.sci.vistrails.basic:String)" type="source" />
    </add>
  </action>
</vistrail>
Vistrails XML Model: Translated to OPM
<Agent>
  <AgentId>concat.xml</AgentId>
  <Account>G</Account>
</Agent>
<Artifact>
  <ArtifactId>0</ArtifactId>
  <Account>G</Account>
</Artifact>
<Artifact>
  <ArtifactId>1</ArtifactId>
  <Account>G</Account>
</Artifact>
<Artifact>
  <ArtifactId>2</ArtifactId>
  <Account>G</Account>
</Artifact>
<Artifact>
  <ArtifactId>3</ArtifactId>
  <Account>G</Account>
</Artifact>
<Process>
  <ProcessId>1</ProcessId>
  <Account>G</Account>
</Process>
<Process>
  <ProcessId>2</ProcessId>
  <Account>G</Account>
</Process>
<Process>
  <ProcessId>3</ProcessId>
  <Account>G</Account>
</Process>
<Used ProcessId="1" Role="in" ArtifactId="0" stopTimeBegin="2008-05-27 17:35:39" stopTimeEnd="2008-05-27 17:35:39">
  <Account>G</Account>
</Used>
<Used ProcessId="2" Role="in" ArtifactId="1" stopTimeBegin="2008-05-27 17:35:43" stopTimeEnd="2008-05-27 17:35:43">
  <Account>G</Account>
</Used>
<Used ProcessId="3" Role="in" ArtifactId="2" stopTimeBegin="2008-05-27 17:35:46" stopTimeEnd="2008-05-27 17:35:46">
  <Account>G</Account>
</Used>
<WasGeneratedBy ArtifactId="1" Role="out" ProcessId="1" stopTimeBegin="2008-05-27 17:35:39" stopTimeEnd="2008-05-27 17:35:39">
  <Account>G</Account>
</WasGeneratedBy>
<WasGeneratedBy ArtifactId="2" Role="out" ProcessId="2" stopTimeBegin="2008-05-27 17:35:43" stopTimeEnd="2008-05-27 17:35:43">
  <Account>G</Account>
</WasGeneratedBy>
<WasGeneratedBy ArtifactId="3" Role="out" ProcessId="3" stopTimeBegin="2008-05-27 17:35:46" stopTimeEnd="2008-05-27 17:35:46">
  <Account>G</Account>
</WasGeneratedBy>
<WasControlledBy ProcessId="1" AgentId="concat.xml" startTimeBegin="2008-05-27 17:35:39" startTimeEnd="2008-05-27 17:35:39" stopTimeBegin="2008-05-27 17:35:39" stopTimeEnd="2008-05-27 17:35:39">
  <Account>G</Account>
</WasControlledBy>
<WasControlledBy ProcessId="2" AgentId="concat.xml" startTimeBegin="2008-05-27 17:35:43" startTimeEnd="2008-05-27 17:35:43" stopTimeBegin="2008-05-27 17:35:43" stopTimeEnd="2008-05-27 17:35:43">
  <Account>G</Account>
</WasControlledBy>
<WasControlledBy ProcessId="3" AgentId="concat.xml" startTimeBegin="2008-05-27 17:35:46" startTimeEnd="2008-05-27 17:35:46" stopTimeBegin="2008-05-27 17:35:46" stopTimeEnd="2008-05-27 17:35:46">
  <Account>G</Account>
</WasControlledBy>
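The mapping used in this translation can be sketched in Python with the standard `xml.etree.ElementTree` module. This is an illustration under stated assumptions, not the authors' translator: the function name `vistrail_to_opm`, its default arguments, and the trimmed `VISTRAIL_XML` input are hypothetical. The sketch assumes that each Vistrails action becomes an OPM process, each version id (`prevId` or `id`) becomes an artifact, and the action date is used for every time attribute, as in the slide.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the Vistrails file shown earlier: only the
# attributes this sketch reads are kept.
VISTRAIL_XML = """
<vistrail version="0.9.0">
  <action date="2008-05-27 17:35:39" id="1" prevId="0" user="g-tomel"/>
  <action date="2008-05-27 17:35:43" id="2" prevId="1" user="g-tomel"/>
  <action date="2008-05-27 17:35:46" id="3" prevId="2" user="g-tomel"/>
</vistrail>
"""


def _add(parent, tag, account, attrib=None, id_tag=None, id_value=None):
    """Append an element with an optional id child and an Account child."""
    elem = ET.SubElement(parent, tag, attrib or {})
    if id_tag is not None:
        ET.SubElement(elem, id_tag).text = id_value
    ET.SubElement(elem, "Account").text = account
    return elem


def vistrail_to_opm(vistrail_xml, account="G", agent_id="concat.xml"):
    """Translate Vistrails actions into an OPM-style element tree."""
    actions = ET.fromstring(vistrail_xml).findall("action")
    graph = ET.Element("OPMGraph")

    _add(graph, "Agent", account, id_tag="AgentId", id_value=agent_id)

    # Every workflow version mentioned by an action becomes an artifact.
    versions = {a.get("prevId") for a in actions} | {a.get("id") for a in actions}
    for version in sorted(versions, key=int):
        _add(graph, "Artifact", account, id_tag="ArtifactId", id_value=version)

    # Every action becomes a process.
    for a in actions:
        _add(graph, "Process", account, id_tag="ProcessId", id_value=a.get("id"))

    # The process uses the parent version ...
    for a in actions:
        stop = {"stopTimeBegin": a.get("date"), "stopTimeEnd": a.get("date")}
        _add(graph, "Used", account,
             {"ProcessId": a.get("id"), "Role": "in",
              "ArtifactId": a.get("prevId"), **stop})
    # ... and generates the new version.
    for a in actions:
        stop = {"stopTimeBegin": a.get("date"), "stopTimeEnd": a.get("date")}
        _add(graph, "WasGeneratedBy", account,
             {"ArtifactId": a.get("id"), "Role": "out",
              "ProcessId": a.get("id"), **stop})
    # Each process is controlled by the single agent.
    for a in actions:
        times = {key: a.get("date") for key in
                 ("startTimeBegin", "startTimeEnd", "stopTimeBegin", "stopTimeEnd")}
        _add(graph, "WasControlledBy", account,
             {"ProcessId": a.get("id"), "AgentId": agent_id, **times})
    return graph
```

The result can be serialized with `ET.tostring(vistrail_to_opm(VISTRAIL_XML), encoding="unicode")`. Because the action date is the only timestamp available, begin and end of every interval collapse to the same instant.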
Observations
• General model
  • Only contains enough information to traverse the provenance graph
  • No additional information stored
• Different ways of representing workflow design provenance
  • Edges as actions
  • Edges as version differences
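The second alternative, edges as version differences, can be sketched as a set difference between two workflow snapshots. This is only one possible reading of the bullet: the `version_diff` function and the string encoding of workflow elements (`"module:0"`, `"connection:0->1"`) are hypothetical choices made for this illustration.

```python
def version_diff(old, new):
    """Describe an edge by what changed between two workflow snapshots.

    Each snapshot is a set of element labels; the edge records which
    elements were added and which were removed, not the actions taken.
    """
    return {"added": sorted(new - old), "removed": sorted(old - new)}


# Snapshots matching versions 1, 2 and 3 of the example version chain.
v1 = {"module:0"}
v2 = {"module:0", "module:1"}
v3 = {"module:0", "module:1", "connection:0->1"}

edge_2_to_3 = version_diff(v2, v3)
edge_1_to_3 = version_diff(v1, v3)
```

Unlike an action edge, a difference edge can link any two versions directly, for example version 1 to version 3, but it does not record the order in which the individual changes were made.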
Observations
• What is the time?
  • How to interpret a time T of a process?
  • Does interpretation affect querying?
  • Semantics of intervals
• Who is the Agent?
  • Users
  • Workflow system
  • The session
  • Workflow specification
• "OPM Level 2"?
  • Are there workflow specifics we want to express?