
Maui


Presentation Transcript


  1. Maui
  HepSysMan, 1st/2nd July 2004

  2. Batch System
  [architecture diagram] The batch server (pbs_server) holds the cluster configuration, the job queue and the state table; users drive it with qsub, qdel and qstat. The scheduler plug-in carries additional cluster configuration, queries the server for job and node status, and tells it when to start and stop jobs. Jobs run on the execution hosts, each of which runs a pbs_mom daemon.

  3. Maui scheduler
  • Seems to originate at the Maui High Performance Computing Centre (MHPCC), http://www.mhpcc.edu
  • But now available from http://www.supercluster.org/maui/ in Covered Bridge Canyon, Utah

  4. Maui/PBS Integration

    [martin@masternode martin]$ qmgr
    Max open servers: 4
    Qmgr: list server
    Server masternode
        server_state = Idle
        scheduling = False
        default_queue = dque
        log_events = 127
        mail_from = adm
        query_other_jobs = True
        resources_default.walltime = 00:01:00
        scheduler_iteration = 60
        node_pack = False
        pbs_version = OpenPBS_2.4

    # maui.cfg 3.2
    #
    # 18/5/04 built by maui with extras added by xCAT and the 12Mar04 version
    #
    SERVERHOST masternode
    # primary admin must be first in list
    ADMIN1 root
    RMCFG[base] TYPE=PBS
    RMPOLLINTERVAL 00:01:00
    SERVERPORT 42559
    SERVERMODE NORMAL

  5. Maui Philosophy (1)
  • Maui is particularly concerned with scheduling multiprocessor jobs
  • How do you arrange for a matching set of processors to be simultaneously available for a single job?
  • Maui tries to plan the execution of such jobs at a particular time when it expects sufficient processors to be available, on the basis of the jobs' maximum walltime parameters
  • It establishes reservations on a set of processors for a job, ensuring all the processors are free at the planned time (see the sketch below)
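  A minimal sketch of that flow from the user's side, assuming a standard PBS/Maui installation with the client commands on the path (the script name parjob.sh and the job id are hypothetical):

    # Submit an 8-processor job with a 4-hour walltime limit; Maui uses
    # the walltime to plan when 8 processors can be free simultaneously
    qsub -l nodes=8,walltime=04:00:00 parjob.sh

    # Ask Maui for its current estimate of when the job will start
    showstart 1234.masternode

    # List the reservations Maui has made to hold processors for planned jobs
    showres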

  6. Reservations
  [diagram: reserved jobs laid out on a chart of cpu against walltime]

  7. Maui Philosophy (2)
  • As the reservations take effect, more and more processors become idle as the planned job time approaches
  • A scheme called backfill tries to exploit these idle processors by running short single- or few-processor jobs out of priority order in the gaps
  • Maximum efficiency is achieved by scheduling big jobs first and running small jobs in the gaps! Perhaps not what the users really want?
  • Maui really cares about walltimes (a sketch of inspecting backfill windows follows)
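  The gaps themselves can be inspected with Maui's showbf command; a minimal sketch (the exact flag spellings should be checked against the installed version's documentation):

    # Show which resources are free right now and for how long,
    # i.e. the windows a backfill job could fit into
    showbf

    # Ask whether a 2-processor job of 1 hour could be backfilled
    showbf -d 1:00:00 -n 2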

  8. Job Priority (1)
  • Jobs are selected for execution in priority order
  • Priority is calculated as a linear combination of factors based on:
  • Credentials: who, class/queue, …
  • Fair share
  • Resources requested
  • Waiting time
  • Target service level, e.g. maximum wait
  • Most sites would have most coefficients set to 0 (as in the maui.cfg sketch below)
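  A minimal maui.cfg sketch of that linear combination (the weight values are invented for illustration; the parameter names follow the Maui 3.2 documentation):

    # Priority weights: only waiting time and fairshare contribute here,
    # all other coefficients are left at 0
    QUEUETIMEWEIGHT 1     # points per minute spent waiting in the queue
    FSWEIGHT        100   # scales the fairshare component
    CREDWEIGHT      0     # ignore credential-based priority
    RESWEIGHT       0     # ignore requested-resource priority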

  9. Sample Priority Component
  (quoted from the Maui documentation)
  • 5.1.2.2 Fairshare (FS) Component. Fairshare components allow a site to favor jobs based on short-term historical usage. The Fairshare Overview describes the configuration and use of fairshare in detail.
  • After the brief reprieve from complexity found in the QOS factor, we come to the Fairshare factor. This factor is used to adjust a job's priority based on the historical percentage system utilization of the job's user, group, account, or QOS. This allows you to 'steer' the workload toward a particular usage mix across user, group, account, and QOS dimensions. The fairshare priority factor calculation is:

    Priority += FSWEIGHT * MIN(FSCAP, (
        FSUSERWEIGHT    * DeltaUserFSUsage +
        FSGROUPWEIGHT   * DeltaGroupFSUsage +
        FSACCOUNTWEIGHT * DeltaAccountFSUsage +
        FSQOSWEIGHT     * DeltaQOSFSUsage +
        FSCLASSWEIGHT   * DeltaClassFSUsage))

  • All '*WEIGHT' parameters above are specified on a per-partition basis in the maui.cfg file. The 'Delta*Usage' components represent the difference between actual fairshare usage and the fairshare usage target. Actual fairshare usage is determined from historical usage over the timeframe specified in the fairshare configuration. The target usage can be either a target, floor, or ceiling value as specified in the fairshare config file. The fairshare documentation covers this in detail, but an example should help obfuscate things completely. Consider the following information associated with calculating the fairshare factor for job X.
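  A worked instance of the formula with invented numbers (assuming Delta*Usage is target minus actual usage, FSCAP is unset, and only the user weight is non-zero): take FSWEIGHT=100, FSUSERWEIGHT=10, a user fairshare target of 20% and actual recent usage of 25%. Then:

    DeltaUserFSUsage = 20 - 25 = -5        # the user is over target
    Priority += 100 * (10 * -5) = -5000    # the job is pushed down the queue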

  10. Job Priority (2)
  • Multiple queues/classes are but one factor in Maui's calculations and decisions
  • Jobs are normally given a whole cpu or even a whole execution host
  • Priorities are recalculated on every Maui iteration, say one per minute
  • Jobs selected for backfill can bypass higher-priority jobs

  11. Fairness
  • Jobs can be given priority increments or decrements according to whether the recent usage of their user/group/… is below or above its fairshare target
  • There is a selection of throttling parameters to prevent various forms of excessive behaviour: max jobs, max submission rate, … (see the sketch below)
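  A minimal maui.cfg sketch of both mechanisms (the targets, limits, and the group name are invented for illustration):

    # Every user gets a 10% fairshare target and at most 20 running jobs
    USERCFG[DEFAULT]  FSTARGET=10.0 MAXJOB=20
    # A particular group can carry its own target and limit
    GROUPCFG[theory]  FSTARGET=30.0 MAXJOB=50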

  12. Reservations
  • The administrator can set manual reservations, handy for shutting a node down at a particular time
  • Standing reservations repeat, e.g. ScotGRID-Glasgow reserves a few nodes for short jobs 08:00-20:00 every day
  • Backfill allows jobs of up to 12 hours on these nodes during the night (both flavours are sketched below)
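  Both flavours in miniature (node names, times, and the class name are illustrative; the standing-reservation syntax follows the Maui 3.2 SRCFG parameter):

    # One-off manual reservation: hold node01 for two hours from 20:00,
    # e.g. so it can be shut down for maintenance
    setres -s 20:00:00 -d 2:00:00 'node01'

    # maui.cfg standing reservation: three nodes serve the 'short' class
    # 08:00-20:00 every day; overnight they are free for backfill
    SRCFG[short] HOSTLIST=node01,node02,node03
    SRCFG[short] PERIOD=DAY STARTTIME=8:00:00 ENDTIME=20:00:00
    SRCFG[short] CLASSLIST=short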

  13. Node selection
  • Some heterogeneity in the cluster may require all processors for a job to come from some subset for best performance, e.g. nodes sharing a Myrinet switch
  • Some constraints on node selection based on ownership may be demanded
  • Maui has additional cluster configuration settings that can define sets of execution hosts as partitions (a simple member list) or as node sets (a set defined by a common node feature); see the sketch below
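  A minimal maui.cfg sketch of the two mechanisms (node names and feature names are invented):

    # Partitions: a job is confined to a single partition
    NODECFG[node01] PARTITION=myrinet
    NODECFG[node02] PARTITION=myrinet
    NODECFG[node03] PARTITION=ethernet

    # Node sets: group hosts by a declared feature and try to keep
    # each job inside one such set
    NODESETPOLICY    ONEOF
    NODESETATTRIBUTE FEATURE
    NODESETLIST      switchA switchB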

  14. Simulation
  • Maui has a scheme for recording a usage profile over some period, e.g. a week
  • The profile can then be played back against a different Maui configuration in simulation mode to test new settings (sketched below)
  • Quite a few "under construction" sections in the manual about this
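  Replay is driven from maui.cfg; a minimal sketch (the trace file names are placeholders; the parameter names follow the Maui 3.2 simulation documentation):

    # Run the scheduler against recorded traces instead of the live cluster
    SERVERMODE           SIMULATION
    SIMRESOURCETRACEFILE traces/resource.trace
    SIMWORKLOADTRACEFILE traces/workload.trace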

  15. Resource Allocation Manager
  • "Payment" for usage
  • Maui can interwork with the QBank resource allocation manager
  • http://www.emsl.pnl.gov/docs/mscf/qbank/
  • Pacific Northwest National Laboratory (PNNL) in Richland, Washington
  • Reserves payment before the job runs (a lien) and takes actual payment for the resources used after the job
  • May be important when a cluster is funded from many sources and value for money needs to be proved

  16. ScotGRID-Glasgow Experience (1)
  • OpenPBS and Maui built and configured by IBM's eXtreme Cluster Administration Toolkit (xCAT)
  • http://www.xcat.org
  • xCAT is not a product, more a kit of parts supplied to IBM customers to operate Linux clusters, some of it Open Source
  • xCAT includes scripts to build OpenPBS and Maui according to the xCAT scheme
  • Fairshares used to balance between user groups
  • Calculated with respect to an average over 7 days, decaying 20% per day (see the sketch below)
  • Most effective with a steady demand across all users/groups; less good when job submission comes in peaks and troughs
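  That decaying 7-day average maps directly onto Maui's fairshare window parameters; a minimal maui.cfg sketch (the FSPOLICY value is illustrative):

    # Seven 24-hour fairshare windows, each weighted 0.80 of the previous
    FSPOLICY   DEDICATEDPS   # account usage as dedicated processor-seconds
    FSDEPTH    7
    FSINTERVAL 24:00:00
    FSDECAY    0.80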

  17. ScotGRID-Glasgow Experience (2)
  • Standing reservation for short jobs during the daytime
  • Currently 3 nodes with a maximum walltime of 1 hour
  • Intended for development/test runs
  • Grid monitoring test jobs
  • No experience yet of multiprocessor jobs, simulation, or resource allocation management
  • The bioinformatics group demonstrated that Maui has a compiled-in limit of 4096 on the maximum number of jobs that can be in the queue!

  18. ScotGRID-Glasgow Experience (3)
  • Maui documentation is extensive but not completely comprehensive
  • Maui is not keen on error messages
  • The priority calculation is hard to get to grips with
  • A misbehaving pbs_mom hangs both OpenPBS and Maui
  • ssh to all nodes and run "service pbs status" (sketched below)
  • Hope to use Ganglia (http://ganglia.sourceforge.net/) to spot cases where a whole execution host is in trouble
  • Ganglia's gmetad (which aggregates the local data) contributes a load average of ~1 on our 1 GHz PIII; looks like gmetad needs its own cpu
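  A trivial sketch of that check (the nodes file, holding one execution-host name per line, is hypothetical):

    # Poll the pbs init script on every execution host
    for n in $(cat nodes); do
        echo "== $n =="
        ssh $n service pbs status
    done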

  19. Grid (1)
  • The EDG (and LCG?) job submission system relies on sites giving an estimate of the time before a job would start to execute: FIFO behaviour
  • Maui does not execute jobs in submission order: non-FIFO behaviour
  • So the Resource Broker (RB) gets an unreliable estimate

  20. Grid (2)
  • GridPP have a batch solution replacing OpenPBS with Torque and Maui; see the words of Steve Traylen at
  • http://www.gridpp.ac.uk/tb-support/faq/torque.html
  • A Google search on "maui lcg rpm" reveals many other sites getting into Maui
