
Resource Management Systems



Presentation Transcript


  1. Resource Management Systems DCC/FCUP Grid Computing

  2. NQE (Network Queue Environment)

  3. NQE
     [Diagram; surviving labels: snow, ./prog.out]
     FTA: File Transfer Agent
     NQS: Network Queuing System

  4. NQE user commands
     • cevent: Posts, reads, and deletes job-dependency event information
     • cqdel: Deletes or signals a specified batch request
     • cqstatl: Provides a line-mode display of requests and queues on a specified host
     • cqsub: Submits a batch request to NQE
     • ftua: Transfers a file interactively (this command is issued on an NQE server only)
     • ilb: Executes a load-balanced interactive command
     • nqe: Provides a graphical user interface (GUI) to NQE functionality
     Commands issued on an NQE server only:
     • qalter: Alters the attributes of one or more NQS requests
     • qchkpnt: Checkpoints an NQS request on a UNICOS, UNICOS/mk, or IRIX system
     • qdel: Deletes or signals NQS requests
     • qlimit: Displays NQS batch limits for the local host
     • qmsg: Writes messages to stderr, stdout, or the job log file of an NQS batch request
     • qping: Determines whether the local NQS daemon is running and responding to requests
     • qstat: Displays the status of NQS queues, requests, and queue complexes
     • qsub: Submits a batch request to NQS
     • rft: Transfers a file in a batch request
     Source: http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi?coll=0650&db=bks&fname=/SGI_Admin/NQE_AG/apa.html

  5. SGE (Sun Grid Engine)
     A single resource can perform more than one activity

  6. SGE
     • Commands similar to the ones used by NQE
     • Example: g.job
         #!/bin/csh
         gaussian < testDFT.in
     • To run:
         qsub -pe smp 4 -M ines@dcc.fc.up.pt -m ae -r n g.job
     Or...

  7. SGE
     • File g.job
         #!/bin/csh
         #$ -pe smp 4              # parallel environment
         #$ -M ines@dcc.fc.up.pt
         #$ -m ae                  # mail sent at end/abort
         #$ -r n                   # no rerun
         gaussian < testDFT.in
     • To run:
         qsub g.job
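The two submission styles on slides 6 and 7 carry the same options: once on the qsub command line, once as `#$` directive lines inside the script. As a minimal sketch of that equivalence, the hypothetical helper below (`render_job_script` is not part of SGE) turns a list of option pairs into a script with embedded directives. It only builds text and does not call `qsub`.

```python
def render_job_script(command, directives, shell="/bin/csh"):
    """Return job-script text with each option embedded as a '#$' line.

    `directives` is a list of (flag, value) pairs, e.g. ("-pe", "smp 4").
    This helper is illustrative only; it is not an SGE API.
    """
    lines = [f"#!{shell}"]
    for flag, value in directives:
        lines.append(f"#$ {flag} {value}")
    lines.append(command)
    return "\n".join(lines) + "\n"


# The options used on the slides: parallel environment, mail address,
# mail at end/abort, and no rerun.
script = render_job_script(
    "gaussian < testDFT.in",
    [("-pe", "smp 4"), ("-M", "ines@dcc.fc.up.pt"), ("-m", "ae"), ("-r", "n")],
)
print(script)
```

Writing the returned text to g.job and running `qsub g.job` would then correspond to the second style.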

  8. SGE: another example
         #$ -pe openmpi* 32
         #$ -q short*
         #$ -l dedicated=4

  9. SGE: another example

  10. SGE
     • The user can specify requirements (CPU type, disk storage, memory, etc.)
     • SGE registers the task, its requirements, and control information (owner, group, department, date/time of submission, etc.)
     • A planner (scheduler) decides which tasks to execute
     • As soon as a resource queue is available, SGE launches one of the waiting tasks
        • The task with the highest priority or the longest waiting time is chosen, according to the configuration of the planner
     • If several queues are available, the least loaded one is chosen
     • There can be more than one queue per cluster
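The dispatch rule described on this slide can be sketched as follows. This is a simplified model under stated assumptions, not SGE's actual scheduler: the `Task` and `Queue` classes, the numeric `load` field, and the `by` switch between "priority" and "wait" are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int        # larger means more important (assumption)
    submitted_at: float  # submission time in seconds


@dataclass
class Queue:
    name: str
    load: float          # current load; smaller is better (assumption)
    available: bool = True


def pick_task(waiting, now, by="priority"):
    """Choose the waiting task by priority or by longest waiting time."""
    if not waiting:
        return None
    if by == "priority":
        # Ties on priority are broken by the longer waiting time.
        return max(waiting, key=lambda t: (t.priority, now - t.submitted_at))
    return max(waiting, key=lambda t: now - t.submitted_at)


def pick_queue(queues):
    """Among the available queues, choose the least loaded one."""
    free = [q for q in queues if q.available]
    return min(free, key=lambda q: q.load) if free else None


def dispatch(waiting, queues, now, by="priority"):
    """Return (task name, queue name) for the next launch, or None."""
    task = pick_task(waiting, now, by)
    queue = pick_queue(queues)
    if task is None or queue is None:
        return None
    waiting.remove(task)
    return task.name, queue.name
```

The `by` argument stands in for "the configuration of the planner": the same waiting list can yield a different task depending on which criterion the site has configured.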

  11. SGE
     • Planning policies:
        • Based on tickets (users)
           • The more tickets a user has, the higher that user's priority
           • Tickets are assigned statically according to the queueing policy and the priorities assigned to each user
        • Based on urgency (tasks)
           • Time limit to finish the task (can be specified by the user)
           • Time spent in the waiting queue
           • Required resources
        • Customized: allows arbitrary assignment of priorities to tasks (similar to nice)
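One way to picture how the three policies could combine is a weighted sum of normalized contributions. The sketch below is an assumption-laden illustration: the weights, the min-max normalization, and the field names `tickets`, `urgency`, and `custom` are chosen for the example and are not taken from the slides or from SGE's configuration defaults.

```python
def normalize(values):
    """Min-max normalize to [0, 1]; equal values all map to 0.5."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_tasks(tasks, w_ticket=1.0, w_urgency=1.0, w_custom=1.0):
    """Return task names ordered from highest to lowest combined score.

    `tasks` maps a task name to a dict with numeric 'tickets',
    'urgency', and 'custom' entries (hypothetical field names).
    """
    names = list(tasks)
    tickets = normalize([tasks[n]["tickets"] for n in names])
    urgency = normalize([tasks[n]["urgency"] for n in names])
    custom = normalize([tasks[n]["custom"] for n in names])
    score = {
        n: w_ticket * t + w_urgency * u + w_custom * c
        for n, t, u, c in zip(names, tickets, urgency, custom)
    }
    return sorted(names, key=lambda n: score[n], reverse=True)


tasks = {
    "a": {"tickets": 100, "urgency": 1, "custom": 0},
    "b": {"tickets": 10, "urgency": 9, "custom": 0},
}
print(rank_tasks(tasks, w_urgency=0.0))  # ticket policy dominates
print(rank_tasks(tasks, w_ticket=0.0))   # urgency policy dominates
```

Setting a weight to zero switches that policy off, which shows how the same pair of tasks can be ordered differently under a ticket-driven and an urgency-driven configuration.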

  12. SGE
     • Life cycle of a task:
        • Submission
        • The master stores the task and informs the planner
        • The planner inserts the task in the appropriate queue
        • The master sends the task to a host
        • Before executing the task, the execution daemon:
           • Changes to the task's directory
           • Initializes the environment (environment variables)
           • Initializes the set of processors
           • Changes the task's uid to the owner's uid
           • Sets the resource limits for the process
           • Collects accounting information
        • When these steps are finished, the daemon stores the task in its database and waits for it to finish
        • Once the task has finished, the daemon informs the master and deletes the task from its database
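The life cycle above is a strictly linear sequence of stages, which a small state machine makes explicit. The state names below are hypothetical labels for the steps on the slide, not identifiers used by SGE.

```python
# Hypothetical state names, one per stage of the life cycle on the slide.
LIFECYCLE = [
    "submitted",   # user submits the task
    "stored",      # master stores it and informs the planner
    "queued",      # planner inserts it in the appropriate queue
    "dispatched",  # master sends it to a host
    "prepared",    # execution daemon sets up directory, env, uid, limits
    "running",     # daemon records the task and waits for it
    "finished",    # daemon informs the master and deletes its record
]


def advance(state):
    """Return the stage that follows `state` in the life cycle."""
    index = LIFECYCLE.index(state)  # raises ValueError on unknown states
    if index == len(LIFECYCLE) - 1:
        raise ValueError("task already finished")
    return LIFECYCLE[index + 1]


state = "submitted"
while state != "finished":
    state = advance(state)
```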

  13. SGE
     • Some commands:
        • qconf: cluster configuration
        • qsub: task submission
        • qdel: deletes tasks from the queue
        • qacct: accounting information
        • qhost: inspects host status
        • qstat: inspects queue status

  14. SGE
     • GUI

  15. SGE GUI

  16. Condor
     • A specialized job and resource management system. It provides:
        • A job management mechanism
        • Scheduling
        • A priority scheme
        • Resource monitoring
        • Resource management

  17. Condor
     • The user submits a job to an agent.
     • The agent is responsible for remembering jobs in persistent storage while finding resources willing to run them.
     • Agents and resources advertise themselves to a matchmaker, which is responsible for introducing potentially compatible agents and resources.
     • At the agent, a shadow is responsible for providing all the details necessary to execute a job.
     • At the resource, a sandbox is responsible for creating a safe execution environment for the job and protecting the resource from any mischief.
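The matchmaker's role of introducing potentially compatible agents and resources can be sketched as a two-sided check over advertisements. This is a simplified model, assuming each advertisement is a plain dict whose hypothetical `requirements` entry is a predicate over the other side's advertisement; real Condor advertisements use their own expression language rather than Python callables.

```python
def compatible(job_ad, machine_ad):
    """Both sides must accept each other's advertisement."""
    return (job_ad["requirements"](machine_ad)
            and machine_ad["requirements"](job_ad))


def matchmake(job_ads, machine_ads):
    """Pair each job with the first free machine that is compatible."""
    matches = []
    free = list(machine_ads)
    for job in job_ads:
        for machine in free:
            if compatible(job, machine):
                matches.append((job["name"], machine["name"]))
                free.remove(machine)  # a machine is matched at most once
                break
    return matches


jobs = [
    {"name": "j1", "owner": "ines",
     "requirements": lambda m: m["memory"] >= 2048},
    {"name": "j2", "owner": "ines",
     "requirements": lambda m: m["os"] == "linux"},
]
machines = [
    {"name": "m1", "memory": 1024, "os": "linux",
     "requirements": lambda j: j["owner"] != "mallory"},
    {"name": "m2", "memory": 4096, "os": "linux",
     "requirements": lambda j: True},
]
print(matchmake(jobs, machines))
```

In this model the matchmaker only proposes pairs. Claiming the resource and actually running the job are left to the agent and the resource, as the slide describes.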

  18. Condor
     [Architecture diagram; surviving labels: User, Problem Solver, Agent, Matchmaker, Resource, Shadow, Sandbox, ClassAds, plan of jobs, job, claim, details of the job, environment]

  19. Condor: Gateway Flocking
     • The gateway passes information about participants between pools
     • M(A) sends a request to M(B) through the gateways
     • M(B) returns a match
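The request flow of gateway flocking can be sketched as a pool that first tries to match locally and, failing that, forwards the request to a peer pool. The `Pool` class, the `gateway_peers` list, and the memory-only requirement are hypothetical simplifications. The gateways themselves are modeled as nothing more than the link between the two matchmakers.

```python
class Pool:
    """A pool with its own matchmaker and a list of free machines."""

    def __init__(self, name, free_machines):
        self.name = name
        self.free = list(free_machines)
        self.gateway_peers = []  # pools reachable through a gateway

    def local_match(self, min_memory):
        """Return the name of a free local machine, or None."""
        for machine in self.free:
            if machine["memory"] >= min_memory:
                self.free.remove(machine)
                return machine["name"]
        return None

    def match(self, min_memory):
        """Try the local pool first, then ask peers through the gateway."""
        found = self.local_match(min_memory)
        if found is not None:
            return self.name, found
        for peer in self.gateway_peers:
            found = peer.local_match(min_memory)  # M(A) asks M(B)
            if found is not None:
                return peer.name, found           # M(B) returns a match
        return None


pool_a = Pool("A", [{"name": "a1", "memory": 1024}])
pool_b = Pool("B", [{"name": "b1", "memory": 8192}])
pool_a.gateway_peers.append(pool_b)
print(pool_a.match(4096))
```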

  20. Condor: Direct Flocking
     • A also advertises to Condor Pool B

  21. RMS
     • Each has its own interface
     • They do not provide integration
     • They lack interoperability
     • They require specific administrative skills
     • They increase operational costs
     • They generate over-provisioning and global load imbalance
