Process/data API

Presentation Transcript


  1. Process/data API

  2. Process API - intro • The workflow engine runs applications • Executable code in different languages • API – methods • Web services • Applications require setup to run • Where are they? • Where will they run (farm, local machine, specific machine)? • Data I/O • Version, etc.

  3. Process API - Intro • We can do this in two ways • As a single object process • We have defined a data object to hold things • We can use the same idea for the process API • Set up the object and “doIt” • As setup calls and an application call • Define setups for a process • Use a single call to run the process

  4. ProcessAPI • The following are the fields within the WFE process object (ignoring WFE-specific fields) • Name & human-readable name : not important • Type • File : where, could be a URL • Data : see later • Runtime/fail time : does the API monitor these? • Parameters

  5. Process Object fields • Type • i.e. is this an exec, a URL, and so on • Process • The actual mapped process name. A site-specific mapping will define the actual meaning of the process name • Location • Where the application is to run (client/server/farm), or other things like a URL. • Is it more useful to have this in the WFE XML file, or as a separate process API XML setup? I would think the latter.

  6. Process-API • Data • The WFE data object defines input and output at run time – only mutability is class (static) • We have to pass data to a process, so it might be sensible to put it in the process object • See the data API definition for the object. • Some object containers are data in and some are data out – they need to have the same structure though.

  7. Process-API • Runtime and failtime • These are WFE exception manager properties • It might not be a good idea to reproduce the exception outside the WFE, as the WFE needs to handle any failure. Process failure must not be hidden from the WFE
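     A minimal sketch of how a runTime/failTime pair might be checked, assuming the "HH:MM:SS" format that appears in the XML examples on later slides. The names `hms_to_seconds`, `check_elapsed` and `ProcessTimeout` are hypothetical, and treating runTime as the expected duration and failTime as the hard limit is an assumption. The failure is raised rather than swallowed, so that the engine sees it.

```python
class ProcessTimeout(Exception):
    """Raised so the workflow engine, not the process, handles the failure."""


def hms_to_seconds(text):
    """Convert an 'HH:MM:SS' string such as '00:00:10' to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def check_elapsed(elapsed, run_time, fail_time):
    """Classify an elapsed time (seconds) against the two limits."""
    if elapsed >= hms_to_seconds(fail_time):
        # Assumed meaning: failTime is the point at which the task has failed.
        raise ProcessTimeout(f"exceeded failTime {fail_time}")
    if elapsed > hms_to_seconds(run_time):
        # Assumed meaning: runTime is the expected duration.
        return "slow"
    return "ok"
```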

  8. Process API • Parameters • A Python dictionary is probably best here. • Needs to be exposed to the WFE, since different parts of the workflow may need different parameters (consider MAXIT)
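     As a sketch of the dictionary idea, the snippet below merges per-process defaults with overrides supplied by a particular workflow task and flattens the result into an argument list. `DEFAULTS`, `build_args` and the flag values are illustrative placeholders, not part of any defined API.

```python
# Hypothetical site defaults, keyed by mapped process name.
DEFAULTS = {"maxit": {"-P": "33"}}


def build_args(process, overrides=None):
    """Merge defaults with task-specific overrides into an argument list."""
    params = dict(DEFAULTS.get(process, {}))
    params.update(overrides or {})
    args = []
    for flag, value in params.items():
        args.append(flag)
        if value is not None:  # None marks a flag that takes no value
            args.append(str(value))
    return args


# Two parts of a workflow asking for different parameters.
first_call = build_args("maxit", {"-x": "ddd"})
second_call = build_args("maxit", {"-P": "40"})
```

     Because the dictionary lives with the process object, the engine can inspect or change it per task instead of parsing an opaque command-line string.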

  9. Process API • The problem I have is defining which data object is which. The data object needs a definition so the program knows what the data is – see process API. • Using a Python class object. These will of course be defined in the workflow engine variables. Note the adding of multiple data objects:
        ProcOb = ApiProcess()
        ProcOb.set('name', 'myAlignProg')
        ProcOb.set('parameters', '-P 33 -x ddd')
        ProcOb.set('type', 'exec')
        ProcOb.add('input', data.ob['D1'])
        ProcOb.add('input', data.ob['D2'])
        ProcOb.add('output1', data.ob['D3'])

  10. Process API • Program Exec • Executable • Process : use a mapped name for the application – site specific • Location : local/server/farm – mapped names • How do we know which objects are which?
        ProcOb = ApiProcess()
        ProcOb.set('type', 'exec')
        ProcOb.set('process', 'maxit')
        ProcOb.set('location', 'server')
        ProcOb.add('input', data.ob['D1'])
        ProcOb.add('input', data.ob['D2'])
        ProcOb.add('output1', data.ob['D3'])
        processAPI.run(ProcOb)
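     The slides use `ApiProcess` without defining it. The following is one possible minimal shape for it, together with a site-specific name mapping. The internals of the class, `SITE_MAP`, `resolve` and the install path are all assumptions made for illustration.

```python
class ApiProcess:
    """Hypothetical container mirroring the set/add calls on the slides."""

    def __init__(self):
        self.fields = {}  # simple settings such as type, process, location
        self.data = {}    # role name -> list of data objects, in call order

    def set(self, key, value):
        self.fields[key.strip()] = value

    def add(self, role, data_object):
        self.data.setdefault(role, []).append(data_object)


# Hypothetical site-specific mapping from (process, location) to a command.
SITE_MAP = {("maxit", "server"): "/opt/site/bin/maxit"}


def resolve(proc):
    """Return the concrete command for a process object (KeyError if unmapped)."""
    return SITE_MAP[(proc.fields["process"], proc.fields["location"])]


proc = ApiProcess()
proc.set("type", "exec")
proc.set("process", "maxit")
proc.set("location", "server")
proc.add("input", "D1")  # plain strings stand in for data.ob['D1'] etc.
proc.add("input", "D2")
proc.add("output1", "D3")
command = resolve(proc)
```

     Keeping the data objects in an ordered list per role is one way to answer "which objects are which": the role name plus the position identifies each object.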

  11. Process API • DataAPI copy • Copy data • Parameters = new version • Data objects – see later
        ProcOb = ApiProcess()
        ProcOb.set('name', 'copy')
        ProcOb.set('parameters', 'newVersion')
        ProcOb.set('process', 'method')
        ProcOb.set('location', 'dataAPI')
        ProcOb.add('input', data.ob['D1'])
        ProcOb.add('output', data.ob['D3'])
        processAPI.run(ProcOb)

  12. Automated questions in XML
        <wf:task taskID="TD3" name="SequenceOK" nextTask="J1" breakpoint="false">
            <wf:description>Check whether the sequence align was OK</wf:description>
            <wf:decision type="AUTO">
                <wf:dataObjectsLocation>
                    <wf:location dataID="D6" type="input"/>
                </wf:dataObjectsLocation>
                <wf:nextTasks>
                    <wf:nextTask taskID="TW4">
                        <wf:function dataID="D6" gte="20" less="200000000"/>
                    </wf:nextTask>
                    <wf:nextTask taskID="TM5">
                        <wf:function dataID="D6" gte="2" less="20"/>
                    </wf:nextTask>
                    <wf:nextTask taskID="T9">
                        <wf:function dataID="D6" gte="0" less="2"/>
                    </wf:nextTask>
                </wf:nextTasks>
            </wf:decision>
        </wf:task>
     • Decision data object: the wf:location entry (D6) • Decision option: each wf:nextTask entry • More complex functions will require Python methods specific to the question
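     The range test expressed by the `gte` and `less` attributes can be sketched as below. The function name `next_task` and the tuple layout are hypothetical. The ranges are read as half-open (value >= gte and value < less), which is an interpretation of the attribute names.

```python
def next_task(value, rules):
    """Return the taskID of the first rule whose [gte, less) range holds value."""
    for task_id, gte, less in rules:
        if gte <= value < less:
            return task_id
    return None  # no rule matched; the engine would need a fallback


# The three options from the XML above, as (taskID, gte, less).
RULES = [("TW4", 20, 200000000), ("TM5", 2, 20), ("T9", 0, 2)]

chosen = next_task(25, RULES)
```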

  13. Detail description to technology • A data object is pre-declared in the XML • Data place holder • Defines API object detail • A task object can reference data objects • As input, output or both • A process task : • API method • Exec program
        <wf:dataObject dataID="D1" name="dataToCopy" type="Object" mutable="false">
            <wf:description>General object to copy</wf:description>
            <wf:location namespace="__old_object" where="DM"/>
        </wf:dataObject>

        <wf:task taskID="T2" name="copyData" nextTask="T9" breakpoint="false">
            <wf:description>Run API task to copy data object</wf:description>
            <wf:process runTime="00:00:04" failTime="00:00:10">
                <wf:detail name="APIcopy" type="method" where="API"/>
                <wf:dataObjectsLocation>
                    <wf:location dataID="D1" type="input"/>
                    <wf:location dataID="D2" type="output"/>
                </wf:dataObjectsLocation>
            </wf:process>
        </wf:task>
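     To show how a task's data references could be read back out of such XML, here is a small sketch using the standard library's `xml.etree.ElementTree`. The slides do not show the `wf` namespace URI or the enclosing element, so the URI `urn:example:workflow` and the `wf:workflow` wrapper are placeholders.

```python
import xml.etree.ElementTree as ET

NS = {"wf": "urn:example:workflow"}  # assumed URI; not given in the slides

XML = """
<wf:workflow xmlns:wf="urn:example:workflow">
  <wf:dataObject dataID="D1" name="dataToCopy" type="Object" mutable="false">
    <wf:description>General object to copy</wf:description>
    <wf:location namespace="__old_object" where="DM"/>
  </wf:dataObject>
  <wf:task taskID="T2" name="copyData" nextTask="T9" breakpoint="false">
    <wf:process runTime="00:00:04" failTime="00:00:10">
      <wf:detail name="APIcopy" type="method" where="API"/>
      <wf:dataObjectsLocation>
        <wf:location dataID="D1" type="input"/>
        <wf:location dataID="D2" type="output"/>
      </wf:dataObjectsLocation>
    </wf:process>
  </wf:task>
</wf:workflow>
"""

root = ET.fromstring(XML)

# Map each declared data object to where its data lives.
data_objects = {
    node.get("dataID"): node.find("wf:location", NS).get("where")
    for node in root.findall("wf:dataObject", NS)
}

# Map each data object referenced by the task to its role.
task = root.find("wf:task", NS)
roles = {
    loc.get("dataID"): loc.get("type")
    for loc in task.findall("wf:process/wf:dataObjectsLocation/wf:location", NS)
}
```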

  14. Creating data objects in WFE
        # the data object ID
        self.object.set("deposition-dataset-ID", depID)
        self.object.set("workflow-class-ID", classID)
        self.object.set("workflow-instance-ID", instID)
        self.type = data.getAttribute("type")
        self.object.set("return-type", data.getAttribute("type"))
        # access mode is stored as a literal; the XML has no "read-write" attribute
        if data.getAttribute("mutable") == "true":
            self.object.set("access", "read-write")
        else:
            self.object.set("access", "read-only")
        # internal workflow cross reference
        self.name = data.getAttribute("dataID")
        self.nameHumanReadable = data.getAttribute("name")
        for detail in data.childNodes:
            if detail.nodeName == "wf:description":
                self.description = detail.firstChild.data
            elif detail.nodeName == "wf:location":
                self.nameSpace = detail.getAttribute("namespace")
                self.object.set("data-object-name", detail.getAttribute("namespace"))
                self.where = detail.getAttribute("where")
                self.object.set("data-object-location", detail.getAttribute("where"))
     • Each data XML statement is stored as a reference object • This object is a place holder which can be passed to processes • It contains information on where to access the data

  15. The engine data object • May be a real or virtual payload of data • Where, what and type • Payload is passed between tasks • The WF is a data processing pipeline • A real value can be examined to affect the WF • The path is dependent on data values (auto/manual decisions are based on these values) • The data version is WF instance data • Can be domain data (via dataAPI) • Can be WF data (via statusAPI) – scope defined by the object the data is stored in

  16. Engine process manager • This is a thread, running inside the exception manager
        import time

        def run(self):
            self.status = 1
            # Send the request data objects
            for key, value in self.inputObjects.items():
                istat = myApi.do(value)
            # What sort of process is it?
            if self.task.uniqueType == "test":
                # test method - just counts for 5 seconds
                for i in range(5):
                    time.sleep(1.0)
            elif self.task.uniqueType == "method":
                # this is an API process
                if self.task.uniqueWhere == "API":
                    # this is an API method call
                    self.processAPI.runMethod(self.task.uniqueName)
            elif self.task.uniqueType == "exec":
                # this is an exec program found "where"
                self.processAPI.runExec(self.task.uniqueName, self.task.uniqueWhere)
            # Get the response data objects
            for key, value in self.outputObjects.items():
                istat = myApi.do(value)
            self.statusAPI.setStatus("finished")

  17. Workflow granularity • It does not really matter • A process can be as complex as you like • Depends on go-back granularity • Depends on “how much would be lost if it crashed” • Data is the problem! • The workflow is a flow of data, so hiding data from the engine will collapse a workflow to nothing. • The pathway choice is all about data: the less visible the data, the less choice in the workflow. • If a process decides what to do with data, the consequences are: • Lose go-back ability • Lose track of the data and what is going on • Lose plug and play on the process • Lose exception management

  18. Engine design examples (flow diagram)
     • Engine: Start/restart (maybe at go-back point) → Read XML – store objects and tasks → Run tasks – follow path → Exit
     • Interface task: Send data objects to interface → Send actionable events → Wait for interface → Get return action from interface
     • Process task: Send data object requests → Run process → Get response data objects
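     The "run tasks – follow path" loop can be sketched as a walk along next-task links that dispatches each task to an interface or process handler. The dictionary layout, the handler table and the name `run_workflow` are hypothetical and only illustrate the control flow, not the engine's real data structures.

```python
def run_workflow(tasks, start, handlers):
    """Follow next-task links from `start` until a task has no successor."""
    path = []
    current = start
    while current is not None:
        task = tasks[current]
        handlers[task["kind"]](task)  # dispatch to interface or process handler
        path.append(current)
        current = task.get("next")
    return path


log = []
handlers = {
    "interface": lambda task: log.append(("wait for interface", task["name"])),
    "process": lambda task: log.append(("run process", task["name"])),
}
tasks = {
    "T1": {"name": "askUser", "kind": "interface", "next": "T2"},
    "T2": {"name": "copyData", "kind": "process", "next": None},
}
path = run_workflow(tasks, "T1", handlers)
```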

  19. John’s requirement 1 • Identify, copy and archive an object • Object declaration
        <wf:dataObject dataID="D1" name="dataToCopy" type="Object" mutable="false">
            <wf:description>General object to copy</wf:description>
            <wf:location namespace="__old_object" where="DM"/>
        </wf:dataObject>
        <wf:dataObject dataID="D2" name="dataCopy" type="Object" dependence="D1" mutable="true">
            <wf:description>General object - new copy of data</wf:description>
            <wf:location namespace="__new_object" where="DM"/>
        </wf:dataObject>
     • Task declaration
        <wf:task taskID="T2" name="copyData" nextTask="T9" breakpoint="false">
            <wf:description>Run API task to copy data object</wf:description>
            <wf:process runTime="00:00:04" failTime="00:00:10">
                <wf:detail name="APIcopy" type="method" where="API"/>
                <wf:dataObjectsLocation>
                    <wf:location dataID="D1" type="input"/>
                    <wf:location dataID="D2" type="output"/>
                </wf:dataObjectsLocation>
            </wf:process>
        </wf:task>
     • Callouts: the actual data (wf:dataObject) • The process, a method within the API (wf:detail) • Name reference (dataID)

  20. John’s requirement 2: make new data version • Declare data • Input D1 • Output D2 • Declare task • Method in API
        <wf:dataObjects>
            <wf:dataObject dataID="D1" name="dataToAddNewVersion" type="Object" mutable="true">
                <wf:description>General object to copy</wf:description>
                <wf:location namespace="__object" where="DM"/>
            </wf:dataObject>
            <wf:dataObject dataID="D2" name="dataNewVersion" type="Object" dependence="D1" mutable="true">
                <wf:description>New version of data</wf:description>
                <wf:location namespace="__object" where="DM"/>
            </wf:dataObject>
        </wf:dataObjects>
        <wf:task taskID="T2" name="copyData" nextTask="T9" breakpoint="false">
            <wf:description>Run API task to create a new version of an object</wf:description>
            <wf:process runTime="00:00:04" failTime="00:00:10">
                <wf:detail name="APInewVersion" type="method" where="API"/>
                <wf:dataObjectsLocation>
                    <wf:location dataID="D1" type="input"/>
                    <wf:location dataID="D2" type="output"/>
                </wf:dataObjectsLocation>
            </wf:process>
        </wf:task>

  21. John’s requirement 3: get version list and show • Data – 3 objects • D1 – object target • D2 – version list • D3 – which one to use • Some tasks • Get list from API • Interface to choose (not shown)
        <wf:dataObject dataID="D1" name="dataObjectTarget" type="Object" mutable="false">
            <wf:description>target object to query on</wf:description>
            <wf:location namespace="__object_name" where="DM"/>
        </wf:dataObject>
        <wf:dataObject dataID="D2" name="VersionList" type="List" mutable="false">
            <wf:description>Return version list</wf:description>
            <wf:location namespace="versionList" where="local"/>
        </wf:dataObject>
        <wf:dataObject dataID="D3" name="useVersion" type="Integer" mutable="true">
            <wf:description>Version to use</wf:description>
            <wf:location namespace="version" where="WF"/>
        </wf:dataObject>
        <wf:task taskID="T2" name="requestVersionList" nextTask="T3" breakpoint="false">
            <wf:description>Run API to get the version list of an object</wf:description>
            <wf:process runTime="00:00:04" failTime="00:00:10">
                <wf:detail name="APIversionList" type="method" where="API"/>
                <wf:dataObjectsLocation>
                    <wf:location dataID="D1" type="input"/>
                    <wf:location dataID="D2" type="output"/>
                </wf:dataObjectsLocation>
            </wf:process>
        </wf:task>

  22. John’s requirement 4/5: data selector • A data object may need additional qualifiers to say what it is. • Selector value • “selection” • It is likely that the qualifier will: • need to be a WF class (static) variable • need to be a WF instance (dynamic) variable
        <wf:dataObject dataID="D2" name="dataToGetwithQualifier" type="String" mutable="true">
            <wf:description>general object with qualifier</wf:description>
            <wf:location namespace="__object" qualifier="_entity.id=1" where="DM"/>
        </wf:dataObject>

        <wf:dataObject dataID="D2" name="dataToGetwithQualifier" type="String" mutable="true">
            <wf:description>general object with qualifier</wf:description>
            <wf:location namespace="__object" qualifier="set_entity.type='protein' where entity.id=1" where="DM"/>
        </wf:dataObject>

  23. John’s requirement 6: length/size of object • Define the object and a place holder for the size value
        <wf:dataObject dataID="D1" name="dataTarget" type="Object" mutable="false">
            <wf:description>General object to copy</wf:description>
            <wf:location namespace="__object" where="DM"/>
        </wf:dataObject>
        <wf:dataObject dataID="D2" name="dataLength" type="integer" dependence="D1" mutable="true">
            <wf:description>Length of data object</wf:description>
            <wf:location namespace="dataLength" where="WF"/>
        </wf:dataObject>
     • Run a task to input the data to the function and return the length
        <wf:process runTime="00:00:04" failTime="00:00:10">
            <wf:detail name="APIObjectSize" type="method" where="API"/>
            <wf:dataObjectsLocation>
                <wf:location dataID="D1" type="input"/>
                <wf:location dataID="D2" type="output"/>
            </wf:dataObjectsLocation>
        </wf:process>

  24. John’s requirement 7: format conversion • Input and output formats (D1, D2) • Place holder for status (D3) – this might be so intrinsic to all tasks that it should probably be pre-declared and always present
        <wf:dataObjects>
            <wf:dataObject dataID="D1" name="dataObjectPDB" type="Object" mutable="false">
                <wf:description>General object to convert format</wf:description>
                <wf:location namespace="__object" where="DM"/>
            </wf:dataObject>
            <wf:dataObject dataID="D2" name="dataObjectMMCIF" type="Object" dependence="D1" mutable="true">
                <wf:description>New data in different format</wf:description>
                <wf:location namespace="__object" where="DF"/>
            </wf:dataObject>
            <wf:dataObject dataID="D3" name="status" type="string" dependence="D1" mutable="true">
                <wf:description>A status code return</wf:description>
                <wf:location namespace="__object" where="DF"/>
            </wf:dataObject>
        </wf:dataObjects>
     • And the API function to do this
        <wf:task taskID="T2" name="formatChange" nextTask="T9" breakpoint="false">
            <wf:description>Run API task to change the format of data</wf:description>
            <wf:process runTime="00:00:04" failTime="00:00:10">
                <wf:detail name="APIformatChangePDBtoPDBx" type="method" where="API"/>
                <wf:dataObjectsLocation>
                    <wf:location dataID="D1" type="input"/>
                    <wf:location dataID="D2" type="output"/>
                    <wf:location dataID="D3" type="output"/>
                </wf:dataObjectsLocation>
            </wf:process>
        </wf:task>
