Sun Grid Engine

michon

Presentation Transcript


  1. Sun Grid Engine

  2. Grids • Grids are collections of resources made available to customers. • Compute grids make cycles available to customers from an access point, much like plugging into an electrical grid. • Types of grids: cluster grids (resources in one room); campus grids (multiple clusters on one campus); global grids (resources that cross administrative domains)

  3. Grids • Potentially (ideally?) you could completely outsource your HPC needs by buying time on a commercial grid. Running a big data center is tricky and takes expensive people. If you are, say, a small computer animation group working on an animated short, it might not make sense to set up a data center for six months of work. • On the other hand, if you’re Pixar or Lucasfilm, running a data center is a core competency.

  4. Sun Grid Engine • SGE is a piece of software that matches jobs to compute resources • SGE also runs on OS X; this would be another fine project for someone to investigate

  5. SGE • As we’ve seen, Sun Grid Engine can accept a batch job and give it to a compute node. • SGE (base level) is open source; see http://gridengine.sunsource.net/ • There are some other issues: multiple queues; giving jobs only to nodes with the necessary resources; queue manipulation

  6. SGE • Users submit jobs; SGE keeps them in a holding area until resources become available, then sends them to an execution host. The results are reported back. • Types of hosts: master, execution, administration, and submit • The master host runs the master daemon and the scheduling daemon • Execution hosts are where jobs are run; admin hosts can manipulate the queues; submit hosts are where users submit jobs • There are a lot of knobs to twiddle on SGE

  7. SGE • Imagine a bank that has five customers walk in. Four just want to deposit a check, and the fifth wants to set up a home loan. • If the home loan guy happens to be first, and there is only one queue, the four with short transactions wait for a long time. • What’s more, the home loan guy must have manager approval at some point in the process • So: set up two queues, one for long transactions, one an express lane. The home loan queue specifies that the manager must be available. • This reduces the median time spent in queue for the short transaction customers, and reduces the variance of the waiting time
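
The bank analogy can be checked with a few lines of arithmetic. The sketch below is illustrative only: the service times (2 minutes per deposit, 60 minutes for the loan) are assumed, not taken from the slides, and the two-queue case assumes each queue has its own teller.

```python
from statistics import median, pvariance

DEPOSIT_MIN = 2   # assumed service time for a check deposit, in minutes
LOAN_MIN = 60     # assumed service time for a home loan, in minutes


def fifo_waits(service_times):
    """Return how long each customer waits before service starts in one FIFO line."""
    waits, clock = [], 0
    for service in service_times:
        waits.append(clock)
        clock += service
    return waits


# One queue, loan customer first: everyone lines up behind the long transaction.
single = fifo_waits([LOAN_MIN] + [DEPOSIT_MIN] * 4)
single_short = single[1:]  # waits of the four deposit customers

# Two queues, each with its own teller (an assumption): loan line plus express lane.
loan_line = fifo_waits([LOAN_MIN])
express_short = fifo_waits([DEPOSIT_MIN] * 4)
split = loan_line + express_short

print("median short wait, one queue :", median(single_short))
print("median short wait, two queues:", median(express_short))
print("variance of all waits, one queue :", pvariance(single))
print("variance of all waits, two queues:", pvariance(split))
```

With these assumed durations, the median wait for the deposit customers drops from 63 minutes to 3, and the variance of the waiting time across all five customers shrinks as well.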

  8. SGE Queues • There may be more than one queue; jobs are associated with queues • qconf -sql shows the list of defined queues • Why multiple queues? Some types of jobs may be very long or require specific resources, so users may submit jobs to queues optimized for those types of jobs • [Diagram: SGE Master and SGE Scheduler, queues Q1 and Q2, and three execution hosts]

  9. Scheduler • The scheduler (which assigns jobs to execution hosts) looks at several factors: load parameters (how busy the execution hosts are by some measure); consumable resources (memory, disk space, licenses, etc.; SGE keeps track of these and dispatches a job only if resources are available); and attributes (such as 64-bit, G5, etc.; these aren’t necessarily consumed, but may simply be a state) • The scheduler may look at all these factors before assigning a job from the holding pool to an execution host
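
To make the three factors concrete, here is a toy dispatcher. It is a sketch under stated assumptions, not SGE's actual algorithm: the `Host` and `Job` classes, the field names, and the "least loaded eligible host wins" rule are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    load_avg: float      # load parameter: how busy the host is
    attributes: dict     # fixed state, e.g. {"arch": "glinux"}
    consumables: dict    # remaining capacity, e.g. {"licenses": 1}


@dataclass
class Job:
    name: str
    attributes: dict = field(default_factory=dict)
    consumables: dict = field(default_factory=dict)


def eligible(host, job):
    """A host qualifies if it has every attribute and enough of every consumable."""
    attrs_ok = all(host.attributes.get(k) == v for k, v in job.attributes.items())
    cons_ok = all(host.consumables.get(k, 0) >= v for k, v in job.consumables.items())
    return attrs_ok and cons_ok


def dispatch(job, hosts):
    """Send the job to the least loaded eligible host, or return None to keep it held."""
    candidates = [h for h in hosts if eligible(h, job)]
    if not candidates:
        return None
    best = min(candidates, key=lambda h: h.load_avg)
    for k, v in job.consumables.items():
        best.consumables[k] -= v  # consumables are used up; attributes are not
    return best.name


hosts = [
    Host("compute-0-0", 0.2, {"arch": "glinux"}, {"licenses": 0}),
    Host("compute-0-1", 1.5, {"arch": "glinux"}, {"licenses": 1}),
]
print(dispatch(Job("sim", {"arch": "glinux"}, {"licenses": 1}), hosts))
print(dispatch(Job("sim2", {"arch": "glinux"}, {"licenses": 1}), hosts))
```

The second job stays in the holding pool because the only license has already been consumed, even though a less loaded host is idle.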

  10. Consumable Resources • There are some finite resources in the cluster: CPU time, disk space, licenses, bandwidth • Available capacity for these is defined by the administrator; the scheduler examines available consumables when deciding what to run

  11. Requestable Attributes • On job submission you can request attributes or characteristics: at least X amount of memory, a license for software package Y, a 64-bit host, etc. • In a production environment licenses can be a big deal. Circuit design software may cost thousands per node, so not every node in the cluster may have a license. • The attributes can be related to the hosts or the queues • Attributes that are “requestable” can be named in the qsub command, so a job can require that attribute in order to run

  12. SGE • You don’t need to submit a job to a specific queue; instead you can simply ask for certain resources, and SGE will pick a queue based on the requirement profile

  13. Environment Variables • When a job runs on a host some environment variables are set: • ARC • SGE_ROOT • SGE_STDOUT_PATH • HOME
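
A job script can read these variables like any others. The sketch below assumes the job is a Python script; the helper name `describe_job_environment` and the `<unset>` placeholder are made up for the example.

```python
import os

# The four variables named on the slide.
SGE_JOB_VARS = ("ARC", "SGE_ROOT", "SGE_STDOUT_PATH", "HOME")


def describe_job_environment(env=None):
    """Return the slide's variables and their values, marking missing ones."""
    env = os.environ if env is None else env
    return {name: env.get(name, "<unset>") for name in SGE_JOB_VARS}


if __name__ == "__main__":
    for name, value in describe_job_environment().items():
        print(f"{name}={value}")
```

Run outside of SGE, most of these will print as `<unset>`; inside a job they should reflect the host the job landed on.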

  14. Dependencies • Suppose you divide up a task into several subtasks. This can require sequencing: some subtasks may need to finish before other subtasks can run. When you submit a job, you can specify a list of jobs that must finish before it runs
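
One way to picture this is as a dependency graph. The sketch below uses hypothetical subtask names and Python's standard `graphlib` to compute a valid run order, then builds the `-hold_jid` option that qsub uses to list prerequisite jobs. It assumes the installed SGE version accepts job names (not only numeric job IDs) in that list.

```python
from graphlib import TopologicalSorter

# Hypothetical subtasks: each key maps to the jobs that must finish before it.
must_finish_first = {
    "split_input": set(),
    "process_part1": {"split_input"},
    "process_part2": {"split_input"},
    "merge_results": {"process_part1", "process_part2"},
}


def run_order(dependencies):
    """Return one order in which every job comes after its prerequisites."""
    return list(TopologicalSorter(dependencies).static_order())


def qsub_hold_args(job, dependencies):
    """Build the -hold_jid option for `job` (empty list if it has no prerequisites)."""
    prereqs = sorted(dependencies.get(job, ()))
    return ["-hold_jid", ",".join(prereqs)] if prereqs else []


order = run_order(must_finish_first)
print(order)
print(qsub_hold_args("merge_results", must_finish_first))
```

The two `process_part` jobs have no ordering between them, so they can run in parallel on different execution hosts.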

  15. Listing Attributes • qconf -scl lists “complexes” of attributes. Typically this includes a complex for the queues and one for the hosts • qconf -sc host|queue lists the attributes for a complex:

      #name      shortcut  type    value  relop  requestable  consumable  default
      #--------------------------------------------------------------------------
      arch       a         STRING  none   ==     YES          NO          none
      num_proc   p         INT     1      ==     YES          NO          0
      load_avg   la        DOUBLE  99.99  >=     NO           NO          0
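
The listing is easy to process with a script. Here is a minimal parsing sketch, assuming the eight-column layout shown on the slide; the sample text is copied from the slide rather than captured from a live cluster, and other SGE versions may print different columns.

```python
SAMPLE = """\
#name      shortcut  type    value  relop  requestable  consumable  default
#--------------------------------------------------------------------------
arch       a         STRING  none   ==     YES          NO          none
num_proc   p         INT     1      ==     YES          NO          0
load_avg   la        DOUBLE  99.99  >=     NO           NO          0
"""

COLUMNS = ("name", "shortcut", "type", "value", "relop",
           "requestable", "consumable", "default")


def parse_complex(text):
    """Turn whitespace-separated attribute rows into a list of dicts."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the commented header
        fields = line.split()
        if len(fields) != len(COLUMNS):
            continue  # skip lines that do not match the assumed layout
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


def requestable_names(rows):
    """Names of the attributes a user may request at submission time."""
    return [row["name"] for row in rows if row["requestable"] == "YES"]


rows = parse_complex(SAMPLE)
print(requestable_names(rows))
```

For the sample above, only `arch` and `num_proc` are requestable; `load_avg` is reported by the host but cannot be asked for.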

  16. Modifying Attributes • qconf -mc [complex name] opens an editor that allows you to modify the complex settings

  17. Attributes • Note that some attributes are “requestable”. This means that you can specify on the qsub command line that your job requires that attribute. • qsub -l arch=glinux says the job requires a “glinux” host to run • qconf -se compute-0-0 shows the resources for a host
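
When jobs are submitted from a script, the resource request can be assembled programmatically. The sketch below only builds the command line and does not run it; the helper name `qsub_command` and the script name `render.sh` are hypothetical.

```python
import shlex


def qsub_command(script, resources):
    """Return the argument list for `qsub -l name=value,... script`."""
    argv = ["qsub"]
    if resources:
        request = ",".join(f"{name}={value}" for name, value in resources.items())
        argv += ["-l", request]
    argv.append(script)
    return argv


argv = qsub_command("render.sh", {"arch": "glinux"})
print(shlex.join(argv))
```

Keeping the command as a list means it can be handed to `subprocess.run` later without any shell quoting problems.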

  18. Priorities • By default jobs are handled in a FIFO manner. As they come in, the scheduler assigns them to a compatible queue for processing. • qsub -p gives the job a priority that can override the FIFO behavior. • Use qstat to find jobs in the holding area and qdel to delete them
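
The effect of a priority on FIFO order can be modeled with a small priority queue. This is a toy model, not SGE's scheduler: the `HoldingArea` class is invented for illustration, and it assumes that a higher number means the job runs sooner and that the default priority is 0.

```python
import heapq
from itertools import count


class HoldingArea:
    """Pending jobs ordered by priority, then by arrival (FIFO among equals)."""

    def __init__(self):
        self._heap = []
        self._arrival = count()

    def submit(self, name, priority=0):
        # heapq is a min-heap, so the priority is negated to pop the highest first;
        # the arrival counter preserves FIFO order among equal priorities.
        heapq.heappush(self._heap, (-priority, next(self._arrival), name))

    def delete(self, name):
        """Remove a pending job, roughly what qdel does for a waiting job."""
        self._heap = [entry for entry in self._heap if entry[2] != name]
        heapq.heapify(self._heap)

    def next_job(self):
        """Pop the job that would be dispatched next, or None if nothing is pending."""
        return heapq.heappop(self._heap)[2] if self._heap else None


area = HoldingArea()
area.submit("job_a")
area.submit("job_b")
area.submit("job_c", priority=10)
area.delete("job_b")
print(area.next_job(), area.next_job(), area.next_job())
```

Here `job_c` jumps ahead of `job_a` despite arriving later, and the deleted `job_b` never runs.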

  19. Checkpointing • Sometimes on very long jobs it is worthwhile to be able to stop the job and restart it later. • What are the issues involved here? • Why use it? • Starter, suspend, resume, terminate methods

  20. Hard & Soft Requirements • A hard requirement must be satisfied before the job is scheduled • A soft requirement is granted if possible, but the job can still run without it
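
A minimal sketch of how the two kinds of request could interact, assuming hard requests filter the hosts and soft requests (wishes that are granted only when possible) merely rank the survivors. The function name, host names, and the `fast_disk` attribute are hypothetical, and real SGE may weigh soft requests differently.

```python
def pick_host(hosts, hard, soft):
    """hosts maps host name -> attribute dict. Return the best host name, or None."""

    def satisfies(attrs, request):
        key, value = request
        return attrs.get(key) == value

    # Hard requests are mandatory: drop every host that misses any of them.
    eligible = {
        name: attrs for name, attrs in hosts.items()
        if all(satisfies(attrs, req) for req in hard.items())
    }
    if not eligible:
        return None  # the job waits until a suitable host exists

    # Soft requests are preferences: pick the host that satisfies the most of them.
    def soft_score(name):
        return sum(satisfies(eligible[name], req) for req in soft.items())

    return max(eligible, key=soft_score)


hosts = {
    "compute-0-0": {"arch": "glinux", "fast_disk": False},
    "compute-0-1": {"arch": "glinux", "fast_disk": True},
    "compute-0-2": {"arch": "solaris", "fast_disk": True},
}
print(pick_host(hosts, hard={"arch": "glinux"}, soft={"fast_disk": True}))
```

If no host can grant the soft request, the job still runs on a host that meets the hard ones; if no host meets the hard ones, it does not run at all.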
