
The Condor JobRouter




  1. The Condor JobRouter

  2. aka “schedd on the side” Dan, Condor Week 2008

  3. Status
     • It’s in the current development series: Condor 7.1.0, Unix (Windows soonish)
     • Used heavily by the CMS physics experiment for simulation on the Open Science Grid (millions of jobs routed)

  4. What is “job routing”?
     [Diagram: original (vanilla) job → JobRouter (Routing Table: Site 1 …, Site 2 …) → routed (grid) job; final status flows back to the original job]

     original (vanilla) job:
         Universe = "vanilla"
         Executable = "sim"
         Arguments = "seed=345"
         Output = "stdout.345"
         Error = "stderr.345"
         ShouldTransferFiles = True
         WhenToTransferOutput = "ON_EXIT"

     routed (grid) job:
         Universe = "grid"
         GridType = "gt2"
         GridResource = "cmsgrid01.hep.wisc.edu/jobmanager-condor"
         Executable = "sim"
         Arguments = "seed=345"
         Output = "stdout"
         Error = "stderr"
         ShouldTransferFiles = True
         WhenToTransferOutput = "ON_EXIT"
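
The transformation on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not how the JobRouter is implemented: job ads are modelled as plain dictionaries, and `route_job` and the `overrides` key are hypothetical names introduced here. The attribute values are the ones shown on the slide.

```python
def route_job(job_ad, route):
    """Return a routed copy of job_ad; the original ad is left untouched.

    `route` is a hypothetical dict whose "overrides" entry lists the
    attributes to change or insert in the routed copy.
    """
    routed = dict(job_ad)              # copy, so the original job keeps its ad
    routed.update(route["overrides"])  # change existing values, insert new ones
    return routed


vanilla_job = {
    "Universe": "vanilla",
    "Executable": "sim",
    "Arguments": "seed=345",
    "Output": "stdout.345",
    "Error": "stderr.345",
    "ShouldTransferFiles": True,
    "WhenToTransferOutput": "ON_EXIT",
}

# One routing table entry, using the values from the slide's routed job.
site_route = {
    "overrides": {
        "Universe": "grid",
        "GridType": "gt2",
        "GridResource": "cmsgrid01.hep.wisc.edu/jobmanager-condor",
        "Output": "stdout",
        "Error": "stderr",
    }
}

routed_job = route_job(vanilla_job, site_route)
```

Attributes that the route does not mention (Executable, Arguments, the file transfer settings) carry over unchanged, which matches the two ads on the slide.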

  5. Routing is just site-level matchmaking
     • With feedback from job queue
         • number of jobs currently routed to site X
         • number of idle jobs routed to site X
         • rate of recent success/failure at site X
     • And with power to modify job ad
         • change attribute values (e.g. Universe)
         • insert new attributes (e.g. GridResource)
         • add a “portal” grid proxy if desired
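
As a rough sketch of what matchmaking with queue feedback could look like, the Python below picks a site for the next job from per-site counters. Everything here is an assumption made for illustration: the `Route` and `SiteStats` classes, the `pick_route` function, the reading of the failure threshold as failures per second, and the tie-break on fewest idle jobs are not taken from the JobRouter itself.

```python
from dataclasses import dataclass


@dataclass
class Route:
    name: str
    max_idle_jobs: int             # cap on idle routed jobs at this site
    failure_rate_threshold: float  # assumed unit: failures per second


@dataclass
class SiteStats:
    idle: int = 0                     # idle jobs currently routed to the site
    failures_per_second: float = 0.0  # recent failure rate seen at the site


def pick_route(routes, stats):
    """Return an eligible route, or None if every site is full or failing."""
    eligible = [
        r for r in routes
        if stats[r.name].idle < r.max_idle_jobs
        and stats[r.name].failures_per_second <= r.failure_rate_threshold
    ]
    if not eligible:
        return None
    # Hypothetical tie-break: prefer the site with the fewest idle jobs.
    return min(eligible, key=lambda r: stats[r.name].idle)


routes = [Route("Site 1", 10, 0.01), Route("Site 2", 10, 0.01)]
stats = {
    "Site 1": SiteStats(idle=10),                            # already at its idle cap
    "Site 2": SiteStats(idle=3, failures_per_second=0.005),  # healthy, has room
}
chosen = pick_route(routes, stats)
```

With these numbers Site 1 is skipped because it already holds its maximum of idle jobs, so the job goes to Site 2. If Site 2's failure rate rose above its threshold, no route would be eligible and the job would stay unrouted.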

  6. Configuring the Routing Table
     • JOB_ROUTER_ENTRIES
         • list site ClassAds in configuration file
     • JOB_ROUTER_ENTRIES_FILE
         • read site ClassAds periodically from a file
     • JOB_ROUTER_ENTRIES_CMD
         • read periodically from a script
         • example: query a collector such as Open Science Grid Resource Selection Service

  7. Syntax
     • Read the 7.1 manual.
     • It’s in the chapter on Grid Computing

     [
       Name = "Grid Site 1";
       GridResource = "gt2 gatekeeper…";
       MaxIdleJobs = 10;
       FailureRateThreshold = 0.01;
     ]
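
A script of the kind JOB_ROUTER_ENTRIES_CMD runs could generate entries in this bracketed form. The Python below is a minimal sketch of such a generator, under several assumptions: `format_value` and `format_route` are hypothetical helpers, the gatekeeper host `gatekeeper.example.org` is a placeholder rather than a real site, and string values are assumed to contain no quotes or backslashes so that no escaping is needed.

```python
def format_value(value):
    """Render a Python value as a ClassAd literal (strings, bools, numbers only)."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, str):
        if '"' in value or "\\" in value:
            raise ValueError("escaping is not handled in this sketch")
        return '"' + value + '"'
    return repr(value)


def format_route(route):
    """Render one routing table entry as a single bracketed ClassAd."""
    body = " ".join(f"{key} = {format_value(val)};" for key, val in route.items())
    return f"[ {body} ]"


# Placeholder entry: the host name below is hypothetical.
ROUTES = [
    {
        "Name": "Grid Site 1",
        "GridResource": "gt2 gatekeeper.example.org/jobmanager-condor",
        "MaxIdleJobs": 10,
        "FailureRateThreshold": 0.01,
    },
]

if __name__ == "__main__":
    for route in ROUTES:
        print(format_route(route))
```

In practice the list of routes would come from whatever the script queries (for example a collector), rather than from a hard-coded table.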

  8. What Types of Input Jobs?
     • Vanilla Universe
     • Self Contained (everything needed is in file transfer list)
     • High Throughput (many more jobs than CPUs)

  9. What Target Grid Types?
     • Globus, Condor-C work well
         • others untested, but should be fine
     • Why only target the grid universe?
         • no reason at all
         • 7.1.1 now allows any destination universe

  10. Grid Gotchas
      • Globus gt2
          • no exit status from job (reported as 0)
          • must explicitly list desired output files

  11. JobRouter vs. Glidein
      • Glidein: Condor overlays the grid
          • job never waits in remote queue
          • job runs in its normal universe
          • private networks doable, but add to complexity
          • need something to submit glideins on demand
      • JobRouter
          • some jobs wait in remote queue (MaxIdleJobs)
          • job must be compatible with target grid semantics
          • simple to set up, fully automatic to run
