
Condor Overview



Presentation Transcript


  1. Condor Overview Bill Hoagland

  2. Condor • Workload management system for compute-intensive jobs • Harnesses collection of dedicated or non-dedicated hardware under distributed ownership

  3. Condor History • Developed by University of Wisconsin-Madison Computer Science Department • First put into production use 15 years ago • Mature and stable

  4. Condor Availability • Freely available under a BSD style license • Not open source, code is not distributed publicly

  5. Supported Systems • Solaris 8, 9, & 10 (SPARC) • Red Hat & Fedora Core (x86) • MS Windows 2000, XP & 2003 Server (x86) • Mac OS X 10.3 & 10.4 (PPC) • Other Unixes (SuSE, AIX, HP-UX, Yellow Dog, Debian)

  6. Condor Design • Originally developed for “cycle stealing” from idle machines • Retains robustness to failures and changing availability from this legacy

  7. Condor Goal • “High throughput” vs. “high performance” • High performance: fast machines (e.g. Cray) • High throughput: many machines, fault-tolerant infrastructure (e.g. SETI@Home)

  8. Condor Components • Job queueing • Scheduling policy • Priority mechanism • Resource monitoring • Resource management

  9. Condor Highlights • Checkpointing • Saves the complete state of a running process, including its I/O state, to disk

  10. Checkpointing • Allows recovery from failures • Roll back to the last saved state • Allows process migration • Move saved state and restart

  11. Checkpointing continued • Can compress checkpoint images • Checkpoint mechanism can be used outside of Condor

  12. Checkpointing continued • Some limitations • Single process space • Single kernel thread • Cannot save the state of a file open for both read and write • Not supported on all platforms

  13. Checkpointing continued • Must have object files • Usually requires no source code changes • Relink code to include the Condor library layer, e.g. $ condor_compile gcc -o foo foo.c
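The recover-and-restart idea from the checkpointing slides can be sketched at application level. This is only an illustration, not Condor's mechanism: Condor saves the whole process image through the relinked library, whereas this sketch assumes the program pickles its own state. The names `run`, `demo`, `fail_at`, and `job.ckpt` are hypothetical.

```python
import os
import pickle
import tempfile


def run(total_steps, ckpt_path, fail_at=None):
    """Sum 0..total_steps-1, checkpointing after every step."""
    # Roll back to the last saved state if a checkpoint exists.
    if os.path.exists(ckpt_path):
        with open(ckpt_path, "rb") as f:
            state = pickle.load(f)
    else:
        state = {"step": 0, "total": 0}

    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated machine failure")
        state["total"] += state["step"]
        state["step"] += 1
        # Write to a temporary file, then rename, so a crash
        # never leaves a half-written checkpoint behind.
        tmp = ckpt_path + ".tmp"
        with open(tmp, "wb") as f:
            pickle.dump(state, f)
        os.replace(tmp, ckpt_path)
    return state["total"]


def demo():
    """Fail part-way through, then restart from the checkpoint."""
    ckpt = os.path.join(tempfile.mkdtemp(), "job.ckpt")
    try:
        run(10, ckpt, fail_at=6)
    except RuntimeError:
        pass  # the "machine" went away; the checkpoint survives
    return run(10, ckpt)
```

Because the checkpoint is just a file, the second `run` call could equally happen on another machine once the file is copied there, which is the migration case from the slides.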

  14. Condor Highlights • Remote system calls • Preserves user environment on remote machine • Users need not make files available or have access to remote machine

  15. Condor Highlights • Pools of machines can be hooked together • Jobs submitted to one pool can migrate to a second • Subject to the policies of each pool's owner

  16. Condor Highlights • Jobs can be ordered • Jobs with dependencies can easily be run in the required order • Dependencies are described in a directed acyclic graph
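How a directed acyclic graph yields a run order can be sketched with Python's standard `graphlib` module (Python 3.9 or later). This is a generic topological sort, not Condor's own tooling, and the job names are invented for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical jobs: each key maps to the set of jobs that must finish first.
deps = {
    "prepare": set(),
    "simulate_a": {"prepare"},
    "simulate_b": {"prepare"},
    "collect": {"simulate_a", "simulate_b"},
}


def run_order(deps):
    """Return the jobs in waves; jobs in one wave could run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as finished
    return waves
```

Grouping into waves shows why the graph must be acyclic: a cycle would leave a set of jobs that never become ready.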

  17. Condor Highlights • Condor enables grid computing • Condor has been designed with grid support hooks • Globus-controlled resources

  18. Condor Highlights • Sensitive to the desires of machine owners • Machine owners may set almost any usage policy

  19. Condor Highlights • Powerful priority policy mechanism • Requirements and preferences are associated with jobs and machines • A negotiation process matches job requirements then ranks on preferences
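The two-stage negotiation described above, filter on requirements and then rank on preferences, can be sketched as follows. This is a simplified illustration and not Condor's actual policy language; the attribute names (`memory_mb`, `mips`, `owner`) and the `match` helper are assumptions.

```python
def match(job, machines):
    """Return the name of the best machine for the job, or None."""
    # Stage 1: both sides must accept each other.
    candidates = [
        m for m in machines
        if job["requirements"](m) and m["requirements"](job)
    ]
    if not candidates:
        return None
    # Stage 2: among acceptable machines, pick the one the job prefers most.
    return max(candidates, key=job["rank"])["name"]


# Hypothetical machines; "b" has an owner policy that refuses guest jobs.
machines = [
    {"name": "a", "memory_mb": 512, "mips": 900,
     "requirements": lambda job: True},
    {"name": "b", "memory_mb": 2048, "mips": 1200,
     "requirements": lambda job: job["owner"] != "guest"},
    {"name": "c", "memory_mb": 4096, "mips": 1000,
     "requirements": lambda job: True},
]


def make_job(owner):
    """A job needing at least 1 GB of memory that prefers faster machines."""
    return {
        "owner": owner,
        "requirements": lambda m: m["memory_mb"] >= 1024,
        "rank": lambda m: m["mips"],
    }
```

With these values, `match(make_job("alice"), machines)` picks `"b"`, the fastest machine that meets the memory requirement, while a job owned by `"guest"` falls back to `"c"` because machine `"b"` rejects it.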

  20. Condor Security • Condor's purpose is to allow users to run arbitrary code on large numbers of machines • Assumes users are trustworthy

  21. Condor Security continued • Cannot protect against users that can elevate their privileges • Does not run user jobs in sandboxes

  22. Condor Security continued • Can prevent unauthorized access to Condor • Optional authentication e.g. Kerberos, Grid Security Infrastructure (GSI), others

  23. Condor Security continued • Can ensure that user data has not been examined or tampered with • Optional encryption and integrity checking of all network traffic

  24. Condor Backfill • When machine completely idle… • Configure default job • Support for BOINC

  25. Condor Configuration • Controlled by hierarchical config files • Well commented • Human readable • In some cases, clearer than the manual
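One common reading of "hierarchical" is that settings are layered, with a more specific file overriding a more general one. The sketch below assumes that reading and a plain `KEY = value` format with `#` comments; the setting names and values are invented, and the parser is far simpler than a real one.

```python
def parse(text):
    """Parse 'KEY = value' lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def layered(*texts):
    """Merge config layers in order; later layers override earlier ones."""
    merged = {}
    for text in texts:
        merged.update(parse(text))
    return merged


# Hypothetical pool-wide defaults and a per-machine override.
GLOBAL_CONFIG = """
# Defaults shared by every machine in the pool
POOL_NAME = demo
MAX_JOBS = 10
"""

LOCAL_CONFIG = """
# This machine is slower, so run fewer jobs
MAX_JOBS = 2
"""
```

Here `layered(GLOBAL_CONFIG, LOCAL_CONFIG)` keeps `POOL_NAME` from the global layer and takes `MAX_JOBS` from the local one.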

  26. Condor Administration • CondorView • Web-based statistics • Machine and user data

  27. Condor Website • http://www.cs.wisc.edu/condor
