REI Recipe Execution Infrastructure

vidor

Presentation Transcript


  1. REI: Recipe Execution Infrastructure

  2. Purpose of REI
     • Main Objectives of REI
       • Provide the services of a parallel Batch Queue System.
       • Make it easy to control and monitor complicated batches with job synchronization.
       • Make it possible to distribute tasks (processing load) over a cluster of CPUs/nodes.
     • Not Provided in the Present Implementation
       • Services for distributing data within the cluster to the nodes doing the processing (data sharing/distribution is done via a common storage area/file server).
       • Services for resource management and advertising.
       • Services for explicit load balancing (optimized job distribution).
       • Special features for GRID appliances.

  3. Main Features
     • Main Features of REI
       • Implemented in C++ (in-house implementation from scratch).
       • Uses an RDBMS for information sharing and task synchronization.
       • Executes shell commands or natively executes CPL Recipes (no generic interfacing to shared object files).
       • Pworker task execution daemon provided; it can take one of three roles:
         • Process Master Commands (Master Pworker).
         • Process Standard Commands (Standard Pworker).
         • Process both Master and Standard Commands.
       • Command line utilities provided to add/remove/monitor commands and to control Pworkers.
       • API provided for implementing Master Command Libraries (also referred to as Recipe Planners) and Standard Command Libraries.
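The three Pworker roles listed above can be pictured as a pair of flags. The following is a minimal sketch only: `PworkerRole` and `accepts` are hypothetical names chosen for illustration and are not part of the REI API, which is written in C++.

```python
from enum import Flag, auto


class PworkerRole(Flag):
    """Hypothetical role flags; the names are illustrative, not REI's own."""
    MASTER = auto()            # processes Master Commands
    STANDARD = auto()          # processes Standard Commands
    BOTH = MASTER | STANDARD   # processes Master and Standard Commands


def accepts(role: PworkerRole, command_kind: PworkerRole) -> bool:
    """Return True if a worker with `role` may execute a command of `command_kind`."""
    return bool(role & command_kind)


if __name__ == "__main__":
    print(accepts(PworkerRole.MASTER, PworkerRole.STANDARD))  # Master-only worker, Standard Command
    print(accepts(PworkerRole.BOTH, PworkerRole.STANDARD))    # combined worker, Standard Command
```

Modelling the combined role as the union of the two basic roles keeps the dispatch check to a single bitwise test.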

  4. Command Line Interface
     • Interaction with REI
       • Command line interface provided:
         • addcmd: Add a Master Command to the Master Command Queue (handles ABs and SOFs, which are not part of the core of REI).
         • cmdstat: Query the status of all commands or of a specific command. A 'tail' feature is provided.
         • rmcmd: Remove the information for one command or for all commands from the Command Queues (clean up).
         • pworker: The Pworker daemon.
         • stopworker: Stop one specific Pworker or all running Pworkers.
         • listworkers: List the Pworkers running in the system.
         • rmworker: Remove a Pworker (make it exit) or all Pworkers.
       • These utilities are not part of the core REI system; they are convenience features built on the REI libraries.
       • Commands can also be added to the DB directly via the REI libraries, i.e., the operation of REI can be controlled and monitored programmatically.

  5. Command Lifecycle
     • Command States
       • Each submitted command is in one of 7 states indicating its current status. [State table not captured in the transcript.]

  6. Command Transitions

  7. Interprocess Synchronization
     • Interprocess Synchronization/Information Sharing
       • Pworkers synchronize themselves via the DB.
       • The DB is also used for exchanging information between processes in the system.
       • Tables:
         • pworker_registry: Information about the Pworkers in the system (ID, node, Master and/or Standard Commands, …).
         • pworker_master_command_queue: Information about the Master Commands waiting to be executed, under execution, and executed.
         • pworker_master_sequencer: Information about the Master Commands that are BLOCKED.
         • pworker_command_queue: Standard Commands waiting to be executed, under execution, and executed.
         • pworker_command_sequencer: Used to sequence Standard Commands.
         • pworker_log: Log messages from the Pworker processes.
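One way to picture synchronization through a shared DB table is a worker that claims the oldest waiting command with a guarded update. The sketch below is an assumption-laden illustration using SQLite: only the table name `pworker_command_queue` comes from the slides, while the columns (`id`, `name`, `state`, `worker`) and the state names `WAITING` and `EXECUTING` are hypothetical, as is every function name.

```python
import sqlite3


def make_queue(conn: sqlite3.Connection) -> None:
    # Column layout is assumed for illustration; the real schema is not given.
    conn.execute(
        "CREATE TABLE pworker_command_queue ("
        " id INTEGER PRIMARY KEY, name TEXT, state TEXT, worker TEXT)"
    )


def add_command(conn: sqlite3.Connection, name: str) -> None:
    # 'WAITING' is a hypothetical state name.
    conn.execute(
        "INSERT INTO pworker_command_queue (name, state) VALUES (?, 'WAITING')",
        (name,),
    )
    conn.commit()


def claim_next(conn: sqlite3.Connection, worker_id: str):
    """Claim the oldest waiting command; return its name, or None if none is free."""
    with conn:  # commits on success, rolls back on error
        row = conn.execute(
            "SELECT id, name FROM pworker_command_queue"
            " WHERE state = 'WAITING' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The state check in the WHERE clause means only one worker can win
        # the claim if two workers selected the same row.
        cur = conn.execute(
            "UPDATE pworker_command_queue SET state = 'EXECUTING', worker = ?"
            " WHERE id = ? AND state = 'WAITING'",
            (worker_id, row[0]),
        )
        return row[1] if cur.rowcount == 1 else None


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    make_queue(db)
    add_command(db, "split_frame_1")
    print(claim_next(db, "pworker-1"))
```

The guarded `UPDATE` is the design point: the database, not the workers, arbitrates who executes a command, so workers on different nodes need no direct communication.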

  8. OmegaCam Demo Science Reduction Cascade/1
     • OmegaCam Science Demo Cascade: Example
       • Used adapted WFI frames (8 extensions).
       • Provided:
         • OCAM REI Recipe Planner Plug-In to schedule the tasks for the recipes (one general Recipe Planner made for all recipes).
         • REI Standard Command Library Plug-Ins to do FITS file splitting and joining.
         • Cascade Scheduler Script to submit the Master Commands and to create the SOFs needed.
       • 6 recipes executed during the cascade (6 Master Commands issued to REI).
       • Total number of commands scheduled within REI for the cascade: ~100.
       • Total number of intermediate/temporary and final data products: ~200.
       • Number of SOFs involved: 10.

  9. OmegaCam Demo Science Reduction Cascade/2
     • Setting up the Cascade, Example:
       $ addcmd -name ocam_reduce_sci_W_2005-02-08T16:29:05 -bg -waitfor ocam_reduce_std_W_2005-02-08T16:29:05 -recipe ocam_reduce_sci /data/ocam/sof/ocam_reduce_sci_W_2005-02-08T16:29:05.sof -out /raid/data/ocam/products/ocam_reduce_sci_W_2005-02-08T16:29:05
       $ addcmd -name ocam_reduce_std_W_2005-02-08T16:29:05 -bg -waitfor ocam_mflat_W_2005-02-08T16:29:05 -trigger ocam_reduce_std_W_2005-02-08T16:29:05 -recipe ocam_reduce_std /raid/data/ocam/sof/ocam_reduce_std_W_2005-02-08T16:29:05.sof -out /raid/data/ocam/products/ocam_reduce_std_W_2005-02-08T16:29:05
       $ addcmd -name ocam_mflat_W_2005-02-08T16:29:05 -bg -waitfor ocam_mtwilight_W_2005-02-08T16:29:05 -trigger ocam_mflat_W_2005-02-08T16:29:05 -recipe ocam_mflat /raid/data/ocam/sof/ocam_mflat_W_2005-02-08T16:29:05.sof -out /raid/data/ocam/products/ocam_mflat_W_2005-02-08T16:29:05
       …
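The way the example commands chain together can be sketched as a small dependency resolver. This assumes, based only on the examples above, that `-trigger` names a token emitted when a command completes and `-waitfor` names a token a command waits for; the slides do not state these semantics explicitly. `Command` and `execution_order` are hypothetical names, and the abstract command names in the usage are placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Command:
    name: str
    waitfor: Optional[str] = None   # token this command waits for (assumed meaning)
    trigger: Optional[str] = None   # token fired on completion (assumed meaning)


def execution_order(commands: List[Command]) -> List[str]:
    """Return command names in an order that respects waitfor/trigger tokens."""
    fired = set()
    pending = list(commands)
    order = []
    while pending:
        # A command is runnable once the token it waits for has been fired.
        ready = [c for c in pending if c.waitfor is None or c.waitfor in fired]
        if not ready:
            raise ValueError("remaining commands wait for tokens that never fire")
        for cmd in ready:
            order.append(cmd.name)
            if cmd.trigger is not None:
                fired.add(cmd.trigger)
            pending.remove(cmd)
    return order


if __name__ == "__main__":
    # Submitted in reverse order, as in the cascade example.
    cascade = [
        Command("C", waitfor="B_done"),
        Command("B", waitfor="A_done", trigger="B_done"),
        Command("A", trigger="A_done"),
    ]
    print(execution_order(cascade))
```

Because each command only names tokens, the submission order does not matter; a command submitted early simply stays blocked until its token fires.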

  10. Task Synchronization [Diagram; recoverable labels: BIAS, DOME, Split, Join, Master BIAS, Master DOME]

  11. Command Scheduling [Diagram; recoverable labels: Frame A, Frame B, Split, Recipe, Join]
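The split, recipe, join pattern named on the Command Scheduling slide can be sketched with a thread pool standing in for the Pworkers. This is an illustration under stated assumptions: a frame is modelled as a list of extensions (each a list of numbers), the "recipe" is a placeholder mean, and none of the function names belong to REI.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split(frame: List[List[float]]) -> List[List[float]]:
    """Split a frame into its extensions (here the frame already is a list of them)."""
    return list(frame)


def recipe(extension: List[float]) -> float:
    """Placeholder for the per-extension processing: the mean pixel value."""
    return sum(extension) / len(extension)


def join(results) -> List[float]:
    """Collect the per-extension results back into one product."""
    return list(results)


def process_frame(frame: List[List[float]], max_workers: int = 4) -> List[float]:
    # pool.map preserves input order, so the joined product lines up
    # with the original extension order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return join(pool.map(recipe, split(frame)))


if __name__ == "__main__":
    frame_a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    print(process_frame(frame_a))
```

In REI the recipe steps would run as Standard Commands on Pworkers across nodes; threads are used here only to keep the sketch self-contained.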

  12. DFO Cascading
     • Controlling REI: DFO Environment
       • Already used in operation by DFO for some time.
       • DFO uses REI to control the scheduling of a UNIX shell script, which itself controls the execution of the recipes (calling esorex internally).
       • DFO uses parallelism at frame level; there is no parallelism within the processing of each frame.
       • REI is used as a queue system: jobs are submitted, and REI carries out their scheduling and execution.
       • Example addcmd in the DFO environment:
         $ addcmd -name SINFO.2004-08-21T20:25:28.895_tpl.ab -bg -trigger mflat_SINFO.2004-08-21T20:25:28.895_tpl.ab -exe processAB -a SINFO.2004-08-21T20:25:28.895_tpl.ab
         $ addcmd -name SINFO.2004-08-21T19:55:07.961_tpl.ab -bg -trigger mwave_SINFO.2004-08-21T19:55:07.961_tpl.ab -waitfor mflat_SINFO.2004-08-21T20:25:28.895_tpl.ab -exe processAB -a SINFO.2004-08-21T19:55:07.961_tpl.ab

  13. Using REI
     • How to Integrate a Pipeline in REI (Simplified …)
       • Decide how to execute the recipes:
         • Natively, in the form of CPL Recipes.
         • By invoking the recipe library methods/functions from within Standard Commands.
         • Via jacket scripts/applications encapsulating the recipe.
       • Define the necessary/desirable level of parallelism.
       • Define execution plans for the various cascades.
       • Implement a Recipe Planner, if necessary, to do the internal coordination of the command scheduling (and to produce data for the Standard Commands).
       • Implement a Standard Command Library with special commands that should execute internally within the REI environment (if required).
       • Implement external control scripts to submit the Master Commands, defining dependencies and providing data for the command execution if necessary.
       • Decide the architecture of the processing cluster (number of Master Pworkers, Pworkers, CPUs, nodes, amount of memory per CPU, …).
       • Start up the Pworkers, defining their proper role and referring to the Command Plug-in Libraries provided (if any) and/or possible CPL Recipe Plug-in Libraries.
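The Recipe Planner step above can be pictured as a function that expands one Master Command into Standard Commands with dependencies. The sketch below is hypothetical throughout: `plan`, the command naming scheme, and the `(name, depends_on)` tuple layout are assumptions for illustration and do not reflect the actual REI planner API.

```python
from typing import List, Tuple

PlannedCommand = Tuple[str, List[str]]  # (command name, names it depends on)


def plan(recipe_name: str, n_extensions: int) -> List[PlannedCommand]:
    """Expand one Master Command into split, per-extension, and join commands."""
    split_cmd = (f"{recipe_name}_split", [])
    # One recipe command per extension, each runnable once the split is done.
    recipe_cmds = [
        (f"{recipe_name}_ext{i}", [split_cmd[0]])
        for i in range(1, n_extensions + 1)
    ]
    # The join waits for every per-extension command.
    join_cmd = (f"{recipe_name}_join", [name for name, _ in recipe_cmds])
    return [split_cmd, *recipe_cmds, join_cmd]


if __name__ == "__main__":
    for name, deps in plan("demo_recipe", 3):
        print(name, "<-", deps)
```

With 8 extensions per frame, as in the OmegaCam demo, this layout yields 10 commands per recipe invocation: one split, eight per-extension commands, and one join.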
