Part 7: CondorG

Presentation Transcript


  1. Part 7: CondorG

  2. Part 7: CondorG
     • A: Condor-G
     • B: Laboratory: CondorG

  3. A: Condor-G

  4. Condor-G
     • A client-side job management system for the grid
     • General-purpose
     • Can manage large numbers of jobs
     • Handles many failures gracefully
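As a concrete picture of "client-side": with Condor-G a job is described in a submit description file that you hand to `condor_submit` on your own machine, and Condor-G forwards it to a remote Globus gatekeeper. The Python sketch below only writes such a file. It assumes the `grid` universe with a GT2 (pre-web-services GRAM) resource; the gatekeeper host `gatekeeper.example.edu`, the `jobmanager-fork` jobmanager, the output file names, and the helper `grid_submit_description` are hypothetical placeholders.

```python
def grid_submit_description(executable: str, gatekeeper: str,
                            jobmanager: str = "jobmanager-fork",
                            arguments: str = "") -> str:
    """Return the text of a submit description for one GT2 grid job.

    Hypothetical helper: the host and jobmanager names passed in are
    placeholders, not values taken from the slides.
    """
    lines = [
        "universe      = grid",
        # grid_resource names the grid type (gt2) and the remote gatekeeper/jobmanager.
        f"grid_resource = gt2 {gatekeeper}/{jobmanager}",
        f"executable    = {executable}",
    ]
    if arguments:
        lines.append(f"arguments     = {arguments}")
    lines += [
        "output        = job.out",
        "error         = job.err",
        # The user log is where Condor-G records job events (submitted, running, done, ...).
        "log           = job.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(grid_submit_description("/bin/hostname", "gatekeeper.example.edu"), end="")
```

Saved as, say, `job.sub`, the result would be submitted with `condor_submit job.sub`, after which `condor_q` shows it in your local queue even though it runs remotely.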

  5. Condor-G
     • Condor-G can manage a large number of jobs
       • You specify the jobs in a file and submit them to Condor, which runs them all and keeps you notified of their progress
       • Mechanisms help you manage huge numbers of jobs (thousands), all their data, etc.
       • Condor-G can handle inter-job dependencies (DAGMan)
       • You can set job priorities
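DAGMan reads a plain-text file that names each node with a `JOB` line pointing at its submit file and declares dependencies with `PARENT … CHILD …` lines; a child is submitted only after its parents have finished successfully. The Python sketch writes such a file for a hypothetical four-node "diamond" and uses Kahn's topological sort to show one order in which the nodes could become ready. The node names and submit-file names are placeholders, and DAGMan's real scheduling is richer than one sequence (it can run independent nodes such as B and C at the same time).

```python
from collections import deque
from typing import Dict, List, Tuple


def dag_text(nodes: Dict[str, str], edges: List[Tuple[str, str]]) -> str:
    """Render a DAGMan input file: one JOB line per node, one PARENT/CHILD line per edge."""
    lines = [f"JOB {name} {submit_file}" for name, submit_file in nodes.items()]
    lines += [f"PARENT {parent} CHILD {child}" for parent, child in edges]
    return "\n".join(lines) + "\n"


def ready_order(nodes: Dict[str, str], edges: List[Tuple[str, str]]) -> List[str]:
    """Kahn's topological sort: a node becomes ready once all of its parents are done."""
    waiting_on = {name: 0 for name in nodes}  # number of unfinished parents per node
    children: Dict[str, List[str]] = {name: [] for name in nodes}
    for parent, child in edges:
        children[parent].append(child)
        waiting_on[child] += 1
    ready = deque(name for name in nodes if waiting_on[name] == 0)
    order: List[str] = []
    while ready:
        node = ready.popleft()
        order.append(node)  # treat this node as finished
        for child in children[node]:
            waiting_on[child] -= 1
            if waiting_on[child] == 0:
                ready.append(child)
    if len(order) != len(nodes):
        raise ValueError("dependency cycle: a DAG must be acyclic")
    return order


if __name__ == "__main__":
    # Hypothetical diamond: A first, B and C depend on A, D depends on both B and C.
    nodes = {"A": "a.sub", "B": "b.sub", "C": "c.sub", "D": "d.sub"}
    edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
    print(dag_text(nodes, edges), end="")
    print(ready_order(nodes, edges))  # ['A', 'B', 'C', 'D']
```

The generated file would be submitted with `condor_submit_dag`. Job priorities are a separate, per-job setting: the `priority` submit command (for example `priority = 10`) orders a user's own idle jobs, and `condor_prio` changes it after submission.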

  6. Condor-G
     • Condor-G handles many failures gracefully
     • Condor-G does whatever it takes to run your jobs, even if:
       • The gatekeeper is temporarily unavailable
       • The job manager crashes
       • The network goes down
       • Your machine crashes

  7. Condor-G Fault-Tolerance: Lost Contact with Remote Jobmanager
     • Can we contact the gatekeeper?
       • Yes: the jobmanager crashed, so restart the jobmanager
       • No: retry until we can talk to the gatekeeper again…
     • Once the gatekeeper answers, can we reconnect to the jobmanager?
       • Yes: the network was down
       • No: the machine crashed or the job completed, so restart the jobmanager
     • After restarting the jobmanager, has the job completed?
       • Yes: update the queue
       • No: is the job still running?
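One way to read the fault-tolerance flowchart as code: the sketch walks the same questions (gatekeeper, then jobmanager, then job status) and records which branch it took. It assumes an ordering in which an immediately reachable gatekeeper means the jobmanager crashed, and reconnecting to the jobmanager is only attempted after waiting out a gatekeeper outage. The probe callables, their names, the 30-second retry delay, and the step strings are hypothetical stand-ins, not Condor-G's code or API; the last branch stops where the slide does, at asking whether the job is still running.

```python
import time
from typing import Callable, List


def handle_lost_jobmanager(gatekeeper_up: Callable[[], bool],
                           jobmanager_reachable: Callable[[], bool],
                           job_completed: Callable[[], bool],
                           retry_delay: float = 30.0) -> List[str]:
    """Follow the slide's decision steps after contact with a jobmanager is lost.

    The three probes are hypothetical callables; a real client would make
    network calls here. Returns the list of steps taken, in order.
    """
    steps: List[str] = []
    retries = 0
    while not gatekeeper_up():
        # Host or network unreachable: keep retrying until the gatekeeper answers.
        retries += 1
        steps.append("gatekeeper unreachable; retrying")
        time.sleep(retry_delay)
    if retries == 0:
        # Gatekeeper answers but the jobmanager does not: the jobmanager crashed.
        steps.append("gatekeeper up; jobmanager crashed")
    elif jobmanager_reachable():
        # The remote side survived the outage; only the network was down.
        steps.append("network was down; reconnected to jobmanager")
        return steps
    else:
        steps.append("machine crashed or job completed")
    steps.append("restart jobmanager")
    if job_completed():
        steps.append("job completed; update queue")
    else:
        steps.append("job not completed; check whether it is still running")
    return steps
```

Passing the probes in as functions keeps the decision logic separate from the network code, so each branch can be exercised with canned answers (for example an iterator of `False, False, True` for a gatekeeper that comes back on the third try).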

  8. Other Condor-G Features
     • Credential Management
       • Pull refreshed credentials from MyProxy
       • Push refreshed credentials to remote systems
     • Job Scheduling
       • Use Matchmaking to select resources for jobs
     • WS-GRAM
       • Support for GT4
     • GlideIn
       • Allows late binding of resources and job checkpoint/migration
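Matchmaking pairs a job with a resource only when each side's requirements accept the other, then uses the job's rank to choose among the acceptable resources. The toy model below imitates that two-sided idea with Python dicts and lambdas. It is not the ClassAd language, and the attribute names (`Arch`, `Memory`, `Owner`), the site names, and the helper functions are illustrative assumptions.

```python
from typing import Any, Dict, List, Optional

Ad = Dict[str, Any]  # a plain dict standing in for a ClassAd


def mutually_acceptable(job: Ad, resource: Ad) -> bool:
    """Two-sided match: each ad's Requirements predicate must accept the other ad."""
    return job["Requirements"](resource) and resource["Requirements"](job)


def best_match(job: Ad, resources: List[Ad]) -> Optional[Ad]:
    """Among mutually acceptable resources, return the one the job's Rank scores highest."""
    candidates = [r for r in resources if mutually_acceptable(job, r)]
    if not candidates:
        return None
    return max(candidates, key=job["Rank"])


if __name__ == "__main__":
    job = {
        "Owner": "alice",
        "Requirements": lambda r: r["Arch"] == "X86_64" and r["Memory"] >= 1024,
        "Rank": lambda r: r["Memory"],  # prefer resources with more memory
    }
    resources = [
        {"Name": "siteA", "Arch": "X86_64", "Memory": 2048, "Requirements": lambda j: True},
        {"Name": "siteB", "Arch": "X86_64", "Memory": 4096,
         "Requirements": lambda j: j["Owner"] != "alice"},  # siteB refuses this user
        {"Name": "siteC", "Arch": "INTEL", "Memory": 8192, "Requirements": lambda j: True},
    ]
    print(best_match(job, resources)["Name"])  # siteA
```

siteB would rank highest among the X86_64 sites but rejects the job, and siteC fails the job's own requirements, which is why the two-sided check matters.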

  9. Lab 7: CondorG

  10. Lab 7: CondorG
     • In this lab, you’ll:
       • Configure and start Condor
       • Display Condor information
       • Submit:
         • a single job
         • multiple jobs
         • multiple jobs with separate directories
       • Diagnose and release a held job
       • Shut Condor down
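For the "multiple jobs with separate directories" step, one common approach is the `initialdir` submit command combined with the `$(Process)` macro, which numbers the jobs queued by a single `queue N` statement 0, 1, 2, …. The sketch assumes the lab uses the grid universe with a GT2 gatekeeper; the `run_N` directory names, the file `multi.sub`, the gatekeeper address, and the helper function are hypothetical, and the lab handout may set this up differently.

```python
import tempfile
from pathlib import Path


def prepare_separate_dirs(base: Path, executable: str, count: int,
                          grid_resource: str) -> Path:
    """Create run_0 .. run_{count-1} under base and write a submit file whose
    jobs each use their own directory (hypothetical helper, not a Condor tool)."""
    for i in range(count):
        (base / f"run_{i}").mkdir(parents=True, exist_ok=True)
    submit_file = base / "multi.sub"
    submit_file.write_text("\n".join([
        "universe      = grid",
        f"grid_resource = {grid_resource}",
        # An absolute path avoids ambiguity about which directory a relative one resolves against.
        f"executable    = {executable}",
        # $(Process) expands to 0, 1, 2, ... for the jobs queued below,
        # so job N keeps its output, error, and log files inside run_N.
        "initialdir    = run_$(Process)",
        "output        = job.out",
        "error         = job.err",
        "log           = job.log",
        f"queue {count}",
    ]) + "\n")
    return submit_file


if __name__ == "__main__":
    base = Path(tempfile.mkdtemp())
    path = prepare_separate_dirs(base, "/bin/hostname", 3,
                                 "gt2 gatekeeper.example.edu/jobmanager-fork")
    print(path.read_text(), end="")
```

Running `condor_submit multi.sub` from that base directory would then queue the jobs. If one goes on hold, `condor_q -hold` lists held jobs together with their hold reason, and once the cause is fixed, `condor_release` with the job's cluster.process ID lets it run again.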

  11. Credits
     • Portions of this presentation were adapted from the following sources:
       • Jaime Frey, UW-Madison
