Workflow Management in Condor



Presentation Transcript


  1. Workflow Management in Condor Gökay Gökçay

  2. DAGMan Meta-Scheduler • The Directed Acyclic Graph Manager (DAGMan) is a meta-scheduler for Condor jobs: it submits batch jobs in a predefined order and processes the results. • DAGMan reads the Condor log file generated by each Condor job to find out which jobs are unsubmitted, submitted, or complete. • DAGMan also guarantees that a DAG is recoverable, even if the machine running DAGMan goes down during execution.

  3. DAG File Example
  # Filename: diamond.dag
  Job A A.condor
  Job B B.condor
  Job C C.condor
  Job D D.condor
  PARENT A CHILD B C
  PARENT B C CHILD D

  4. Submitting the DAG to Condor • To guarantee recoverability, the DAGMan program itself is run as a Condor job. • “condor_submit_dag diamond.dag” • This script generates the diamond.dag.condor.sub Condor command file for the DAG and submits it to Condor.
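As a sketch of what this step looks like from the shell (the condor_q call simply confirms that DAGMan itself is now queued; its exact output varies by site):

  $ condor_submit_dag diamond.dag
  $ condor_q    # the condor_dagman process appears in the queue as a regular Condor job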

  5. Essentials • Prepare Jobs: Each Condor command file may submit only one job; multi-job clusters (multiple queue lines) are not supported. The log = entry in every Condor command file must point to the same Condor log file, otherwise DAGMan will not see the log entries for every job in the DAG (see the sketch below). • Write DAG File: Write the DAG file so that its JOB entries refer to the Condor command files prepared in the previous step. • Submit the DAG: Finally, submit the DAG using the condor_submit_dag script.
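For illustration, a node submit file obeying these rules might look as follows; the executable name nodeA and the shared log name diamond.log are assumptions, not taken from the slides:

  # A.condor -- one job per submit file
  universe   = vanilla
  executable = nodeA          # hypothetical program for node A
  output     = A.out
  error      = A.err
  log        = diamond.log    # same log file named in B.condor, C.condor, D.condor
  queue                       # exactly one queue line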

  6. Complications • Setup, cleanup, or interpretation of a node may require extra scripts (e.g. decompression, compression, serialization). • Throttling: a large DAG can flood the queue with too many jobs or scripts at once. • Unreliable applications or subsystems may need automatic retries. (A DAG fragment addressing these cases is sketched below.)
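A sketch of the DAGMan features that address these complications (the script names are hypothetical): SCRIPT PRE and SCRIPT POST run before and after a node's job, RETRY resubmits a failed node, and throttling is applied at submission time via condor_submit_dag -maxjobs:

  # In diamond.dag: wrap node A with setup/cleanup scripts and retry it on failure
  SCRIPT PRE  A decompress_input.sh
  SCRIPT POST A compress_output.sh
  RETRY A 3

  $ condor_submit_dag -maxjobs 10 diamond.dag   # run at most 10 node jobs at once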

  7. Stork • Stork is an emerging Condor technology for managing data placement. • Stork provides a fault-tolerant framework for scheduling data allocation and data transfer jobs. The architecture is modular and extensible, with support for many popular storage systems and data transfer protocols. • Modules: ftp, gsiftp (GridFTP), http, nest (Condor NeST network storage), srb (SDSC Storage Resource Broker), csrm (Castor SRM), srm (dCache SRM), unitree (NCSA UniTree), diskrouter
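Stork jobs can also be handed to a Stork server outside of DAGMan using Stork's own client tools; the command below is an assumption based on the Stork tooling of that era rather than something stated on the slides:

  $ stork_submit transfer.stork   # submit the data-placement job directly to Stork (assumed invocation)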

  8. Condor submit file
  $ cat process.condor
  universe = vanilla
  executable = /bin/sort
  arguments = /tmp/stork/index.html /tmp/stork/classad-talk.ps
  output = /tmp/stork/process.results.out
  error = process.results.err
  log = process.results.log
  should_transfer_files = YES
  when_to_transfer_output = ON_EXIT
  notification = never
  queue

  9. Using Stork with Condor DAGMan
  $ cat transfer.stork
  [
    dap_type = transfer;
    src_url = "file:/tmp/stork/process.results.out";
    dest_url = "nest://turkey.cs.wisc.edu/1.dat";
    alt_protocols = "gsiftp-nest";
    log = "transfer.log";
  ]
  $ cat stork-condor.dag
  DATA INPUT1 alt_protocol.stork
  DATA INPUT2 transfer_ftp-file.stork
  JOB PROCESS process.condor
  DATA OUTPUT transfer.stork
  PARENT INPUT1 INPUT2 CHILD PROCESS
  PARENT PROCESS CHILD OUTPUT
  In this DAG, the DATA nodes are handled by Stork and the JOB node by Condor; DAGMan sequences both, so the input transfers finish before processing and the output transfer runs after it.

  10. Thanks For Listening. Questions?
