
Workload Management PBS Professional



Presentation Transcript


  1. Workload Management: PBS Professional • Since 1999 • We measure our success in terms of our customers’ business performance.

  2. High Performance Computing • Finite compute resources • Large financial investment • User competition for resources • Types of jobs • Computation too large for desktops • Long-running jobs • Organized by asking: • Are you done yet? • What exactly are you using? • What about my job? [Figure: CPUs vs. time chart showing a running job, Jobs 1 to 3, and unused resources.]

  3. Altair Engineering • A global software and services company focused on information analysis, visualization, product development, and high performance computing • Founded in 1985 • Privately held • ~1,600 employees • 40 offices: N.A., Europe, Asia/Pacific • Headquarters: Troy, MI, USA • Enterprise CAE Software Suite • On-demand Computing Technology • Next-generation Lighting Technology • Business Intelligence Solutions • Product Innovation Consulting

  4. Life with PBS Professional • Focus on science, not computers • Easy to use • Enables business continuity, bulletproof reliability, and runs everywhere • Hard to break • Automatically ensure the right work is done at the right time and eliminate waste • Do more (with less) • Maximize ROI • Keep track and plan [Figure: CPUs vs. time chart showing a running job and Jobs 1, 2, 3, 5, and 6 packed with no unused resources.]

  5. PBS Professional v10.1 • PBS Professional has advanced scheduling functionality and the reliability, availability, and serviceability features to satisfy production enterprise customers with massive computational infrastructures. • Continued product/customer focus on technical HPC: the de facto workload management standard for Linux clusters and the largest, most complex workloads. • Node virtualization • Green computing • Central licensing server • Analytics and other web-based extras • Job arrays • Enhanced network debugging • Integration with most MPI packages* • Restrict user/node lock down* • Individual job “sandboxes”* • PBS Professional server/scheduler failover • Automatic job migration upon failures • Topology-aware scheduling • Scheduler enhancements* • Tunable formula • Job submission hooks • Standing reservations • Backfill around 1-N large jobs • Enhanced fair share • Eligible time • Meta-scheduling, such as to Windows HPC Server • Generic dynamic provisioning, finer control, address “black hole”

  6. Tunable Job Scheduling Formula • Extend the scheduler's tunable formula to include additional mathematical operations; custom resources are currently viewable and requestable/modifiable by all users, operators, and managers. • This formula may be used in conjunction with the standard PBS Professional scheduler. • Example: Company A would like to create per-job coefficients in their formula that are set by system defaults and cannot be requested, modified, or viewed by the user. • For example, A, B, C, and D below would be these coefficient resources: • A*(Queue Priority) + B*(Job Class Priority) + C*(CPUs) + D*(Queue Wait Time)

  7. Tunable Job Scheduling Formula (more) • Customize job scheduling to whatever PERL allows • Define any policy, including on-the-fly “exceptions” • Simple formulas are very simple (big jobs go first): ncpus * (walltime/3600.0) • Complex formulas are pretty simple too (adds priority accrual for smaller jobs, high-priority queue, deferred queue, “run this job next”): (ncpus * (walltime/3600.0)) * Wsize + (eligible_time/3600.0) * Wwait + special_p
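
To make the two formulas on this slide concrete, here is a minimal Python sketch that evaluates them and sorts a queue by the result. It is an illustration only, not how the PBS Professional scheduler is implemented: the function names, the dictionary layout, and the sample weights are assumptions, and `walltime` and `eligible_time` are taken to be in seconds (as the division by 3600.0 suggests).

```python
def simple_priority(ncpus, walltime):
    """Slide formula: ncpus * (walltime/3600.0). Bigger jobs score higher."""
    return ncpus * (walltime / 3600.0)


def weighted_priority(ncpus, walltime, eligible_time,
                      w_size, w_wait, special_p=0.0):
    """Slide formula: size term * Wsize + wait term * Wwait + special_p."""
    size_term = ncpus * (walltime / 3600.0)
    wait_term = eligible_time / 3600.0
    return size_term * w_size + wait_term * w_wait + special_p


def sort_queue(jobs, w_size, w_wait):
    """Return job names ordered from highest to lowest priority."""
    def score(job):
        return weighted_priority(job["ncpus"], job["walltime"],
                                 job["eligible_time"], w_size, w_wait,
                                 job.get("special_p", 0.0))
    return [job["name"] for job in sorted(jobs, key=score, reverse=True)]


# Hypothetical queue: one big job that just arrived, one small job
# that has been eligible to run for ten hours.
jobs = [
    {"name": "big", "ncpus": 64, "walltime": 3600, "eligible_time": 0},
    {"name": "small", "ncpus": 4, "walltime": 3600, "eligible_time": 36000},
]
```

With `w_wait=0` the big job sorts first; raising `w_wait` lets the long-waiting small job overtake it, which is the "priority accrual" effect the slide mentions.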

  8. Standing Reservations • Guarantee resources for recurring needs: pbs_rsub -R 0500 -E 0800 -r "FREQ=DAILY;BYDAY=MO,TU,WE,TH,FR;UNTIL=20091231" -l select=200:ncpus=2 -l place=scatter:excl • Run the weather simulations from 5-8am every weekday morning • Reserve the computing lab for classes on MWF 14:00-16:00 • Block out time for maintenance the first weekend of every month
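
The recurrence rule in the command above can be read as "every day, but only Monday to Friday, until 31 December 2009". The following Python sketch expands such a rule into concrete time slots using only the standard library. It is a reading aid, not PBS code: `daily_occurrences` and `parse_rule` are hypothetical helpers, only `FREQ=DAILY` with `BYDAY` and a required `UNTIL` is handled, and treating `UNTIL` as inclusive is an assumption.

```python
from datetime import date, datetime, time, timedelta

# Two-letter day codes in the same order as date.weekday() (Monday == 0).
DAY_CODES = ["MO", "TU", "WE", "TH", "FR", "SA", "SU"]


def parse_rule(rule):
    """Split 'KEY=VALUE;KEY=VALUE' into a dictionary."""
    return dict(part.split("=", 1) for part in rule.split(";"))


def daily_occurrences(rule, first_day, start=time(5, 0), end=time(8, 0)):
    """List (start, end) datetimes for each day the rule selects."""
    fields = parse_rule(rule)
    if fields.get("FREQ") != "DAILY":
        raise ValueError("this sketch only handles FREQ=DAILY")
    if "BYDAY" in fields:
        by_day = set(fields["BYDAY"].split(","))
    else:
        by_day = set(DAY_CODES)
    until = datetime.strptime(fields["UNTIL"], "%Y%m%d").date()

    slots = []
    day = first_day
    while day <= until:  # assumption: the UNTIL date itself is included
        if DAY_CODES[day.weekday()] in by_day:
            slots.append((datetime.combine(day, start),
                          datetime.combine(day, end)))
        day += timedelta(days=1)
    return slots


RULE = "FREQ=DAILY;BYDAY=MO,TU,WE,TH,FR;UNTIL=20091231"
slots = daily_occurrences(RULE, date(2009, 12, 24))
```

Starting from Thursday 24 December 2009, the rule yields the 05:00 to 08:00 slot on each remaining weekday of the year and skips the weekend of 26 and 27 December.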

  9. Submission Filtering “Hooks” • Change/augment capabilities in the field, on-the-fly, without source • Example: if e.job.Resource_List["walltime"] is None: e.reject("You must specify a walltime") • Admission control: validate requests • Allocation management • On-the-fly tuning • Custom logging, reporting, debugging, and even patches!
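
The one-line hook on this slide can be tried outside PBS with a few stand-in classes. In the sketch below, `FakeEvent`, `FakeJob`, and `RejectError` are hypothetical substitutes for the event object a real hook receives from PBS; only the body of `require_walltime` mirrors the slide. The stand-in `reject` raises an exception and the stand-in `accept` returns `True`, which is an assumption made so the behavior can be checked in plain Python.

```python
from collections import defaultdict


class RejectError(Exception):
    """Raised by the stand-in event when a submission is rejected."""


class FakeJob:
    """Stand-in for the job attached to a submission event."""

    def __init__(self, resources):
        # Unset resources read as None, matching the check on the slide.
        self.Resource_List = defaultdict(lambda: None, resources)


class FakeEvent:
    """Stand-in for the event object passed to a hook."""

    def __init__(self, job):
        self.job = job

    def reject(self, message):
        raise RejectError(message)

    def accept(self):
        return True


def require_walltime(e):
    """Admission control: refuse jobs that do not request a walltime."""
    if e.job.Resource_List["walltime"] is None:
        e.reject("You must specify a walltime")
    return e.accept()


def try_submit(resources):
    """Run the hook against a hypothetical submission; report the outcome."""
    try:
        require_walltime(FakeEvent(FakeJob(resources)))
        return "accepted"
    except RejectError as err:
        return f"rejected: {err}"
```

Calling `try_submit({"ncpus": 4})` reports a rejection, while `try_submit({"ncpus": 4, "walltime": "01:00:00"})` is accepted.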

  10. Scheduling & Policy Enhancements • Strict FIFO • Inefficient scheduling algorithm • Not ideal for large & small job mix • Introduces utilization inefficiencies [Figure: used CPUs vs. time from now, showing a running job, Jobs 1 to 3, wait time, and unused resources.]

  11. Scheduling & Policy Enhancements • Strict Ordering With Backfilling • Jobs Are Sorted & Order Is Maintained • Jobs Small Enough Not To Affect Start Times Are Backfilled • ‘Gaps’ Are Utilized Effectively • Large/Starving Job Starts When Expected [Figure: used CPUs vs. time from now, showing a running job and Jobs 1 to 3, with Jobs 2 and 3 filling previously unused resources.]
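
The difference between strict FIFO and strict ordering with backfilling can be sketched in a few lines of Python. This is a deliberately simplified model, not the PBS Professional scheduler: the `Job` class and both functions are hypothetical, "now" is time 0, each job is assumed to run for exactly its requested walltime, and only the first blocked job gets a protected start time.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    ncpus: int
    walltime: int  # requested run time, in arbitrary time units


def earliest_start(job, free_now, running):
    """Time at which enough CPUs are free for `job`.

    `running` is a list of (end_time, ncpus) pairs for executing jobs.
    """
    free = free_now
    if free >= job.ncpus:
        return 0
    for end_time, ncpus in sorted(running):
        free += ncpus
        if free >= job.ncpus:
            return end_time
    raise ValueError(f"{job.name} needs more CPUs than the system has")


def schedule_now(queue, total_cpus, running, backfill):
    """Return the names of queued jobs that may start at time 0."""
    free = total_cpus - sum(ncpus for _, ncpus in running)
    started = []
    reserved_start = None  # start time promised to the first blocked job
    for job in queue:  # queue is already sorted; order is maintained
        if reserved_start is None:
            if job.ncpus <= free:
                started.append(job.name)
                free -= job.ncpus
                running = running + [(job.walltime, job.ncpus)]
            elif not backfill:
                break  # strict FIFO: nothing may pass a blocked job
            else:
                reserved_start = earliest_start(job, free, running)
        elif job.ncpus <= free and job.walltime <= reserved_start:
            # Fits in the gap and ends before the blocked job is due to start.
            started.append(job.name)
            free -= job.ncpus
    return started


# Hypothetical 8-CPU system with one 6-CPU job running until time 10.
running = [(10, 6)]
queue = [Job("Job 1", 8, 5), Job("Job 2", 2, 8), Job("Job 3", 2, 20)]
```

Under strict FIFO nothing starts, because Job 1 needs all 8 CPUs. With backfilling, Job 2 starts immediately since it finishes before time 10, while Job 3 is held back because it would delay Job 1.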

  12. Tightly Integrated MPI Packages • Intel MPI on Linux • MPICH 1.2.5, 1.2.6 on Linux 2.4 on Itanium 2, x86/AMD64/EM64T • MPICH 1.2.5, 1.2.6 on Linux 2.6 on x86/AMD64/EM64T • MPICH-GM/MX on Linux • MPICH2 on Linux • IBM POE on AIX 5.2 • IBM POE on HPS Switch (enhanced) • IBM HPS via LoadLeveler • HP MPI 1.08.03 on HP-UX 11 on PA-RISC & Itanium 2 • HP MPI 2.0.0 on Linux 2.4 & 2.6 on x86/AMD64/EM64T/Itanium 2 • LAM/MPI 6.5.9/7.0.6/7.1.1 on Linux 2.4/2.6 on x86/AMD64/EM64T/Itanium 2 • SGI MPI on Linux on IA64 (Altix / Itanium 2) • SGI MPI (MPT) over Infiniband • SGI MPI within and across Altix(es) • SCALI MPI-Connect • Open MPI (native) • MVAPICH across Infiniband

  13. Restrict-User Feature • PBS Professional guarantees exclusive access to compute nodes and lets administrators ensure that end users cannot access nodes when their jobs are not running. [Figure: PBS head node.]

  14. PBS Per-Job Execution Directory – “Sandbox” • Before PBS Professional v9.2 [Figure: Jobs 29, 61, 75, 88, 73, 54, and 42.]

  15. PBS Per-Job Execution Directory – “Sandbox” • PBS Professional 9.2+ with sandbox=PRIVATE [Figure: Jobs 21, 42, 97, and 38.]

  16. PBS Per-Job Execution Directory – “Sandbox” • PBS creates a job-specific staging and execution directory per running job (i.e., a job-private sandbox) • Each job runs in that sandbox • Files are staged in/out of the sandbox area • Upon successful stage-out, the job-specific directory is deleted • However, if stage-out fails, the job-specific directory and its contents are not deleted, so job results can still be retrieved • sandbox=PRIVATE • No longer need a user’s home directory on every execution node • Simply set $jobdir_root in the MoM’s config file • Able to run simultaneous instances of the same job • Jobs using common names for input/output files won’t overwrite another job’s files
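
The sandbox lifecycle described on this slide (create a private directory, run the job there, stage out, delete only on success) can be sketched with the Python standard library. `run_in_sandbox` and its callback parameters are hypothetical names used for illustration, and treating an `OSError` as a failed stage-out is an assumption; this is not PBS code.

```python
import shutil
import tempfile
from pathlib import Path


def run_in_sandbox(job_id, jobdir_root, work, stage_out):
    """Run `work` in a private per-job directory under `jobdir_root`.

    Returns None if stage-out succeeded and the sandbox was deleted,
    or the sandbox path if stage-out failed and the files were kept.
    """
    # A unique directory per job: two instances of the same job, or two
    # jobs that both write "out.dat", can never collide.
    sandbox = Path(tempfile.mkdtemp(prefix=f"pbs.{job_id}.", dir=jobdir_root))
    work(sandbox)
    try:
        stage_out(sandbox)
    except OSError:
        return sandbox  # keep the directory so results can be retrieved
    shutil.rmtree(sandbox)
    return None


def write_result(sandbox):
    """Hypothetical job payload: writes one output file."""
    (sandbox / "out.dat").write_text("result")


def failing_stage_out(sandbox):
    """Hypothetical stage-out that cannot reach its destination."""
    raise OSError("stage-out target unavailable")
```

A successful run leaves nothing behind under the root directory, while a run whose stage-out fails returns the sandbox path with `out.dat` still inside it.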

  17. PBS Pro Global resellers and Strategic Partners

  18. Contact Information • Support: Level 1 Support, Troy, MI, 12 hours on weekdays, 8am to 8pm (0800-2000) Eastern Time • Support Phone: (248) 614 2425 • Email: pbssupport@altair.com • Altair Engineering: (248) 614 2400 • Website: http://www.altair.com, http://www.pbspro.com • Devin Jensen, Director, Business Development • Desk: (801) 653-2300 • Cell: (949) 322-6212 • Email: jensen@altair.com • Stephen Gombosi, Senior Analyst • Desk: (303) 415-0327 • Cell: (303) 325-4336 • Email: sgombosi@altair.com

  19. Questions & Answers • Thank You! • Next Steps? • Evaluation licenses • Technical questions • Case studies

  20. Summary of PBS Professional – Part I • Node virtualization: everything is a resource, DMP or SMP • Pull-based grid support: reserves control in peer-to-peer environment • Restrict user/node lock down: code “hammer,” extremely configurable • ACL (Access Control List): limits resource usage on a per user, group, host, or network domain level • Soft limits/hard limits: set per user • Node packing: minimizes the number of hosts used • Resource minimum/maximum: set at server and queue levels • Queue routing: allows redirection of jobs to and from other queues • Prime time/non-prime time scheduling: allows adjusting of set policies • Individual controls: allow users to determine account mapping and advance reservations • Administration: easily configure queues, hosts, access lists, and resources • Enhanced network debugging: set timeouts, frequencies, etc. • Licenses: checked out by execution server, not the submission host; also keeps expensive SW packages in use 24 x 7

  21. Summary of PBS Professional – Part II • Job dependencies • Job preemption and suspend/resume/checkpoint (per OS or application) • Advance reservations/standing reservations • Dynamic resources (per node and per server) • Idle time detection • Job arrays • Fairshare or round robin scheduling • Strict ordering/optimal backfilling • Interactive or batch mode processing • Topology-aware scheduling • Command line interface or graphical user interface (GUI) • Parallel jobs • Dedicated time • Finer granularity control on users/groups • Meta scheduling to Windows HPC • Job execution “sandboxes” • File staging with wildcards • Heterogeneous architecture (UNIX, Linux, and Windows) • Decade + of development • Source-code access

  22. GridWorks Analytics: PBS Professional Accounting Companion Product • Generate reports automatically “out-of-the-box”, then customize... • Understand usage trends for capacity planning • Verify project planning assumptions to meet deadlines • Extract accounting data for chargeback • Track software license use to optimize ROI

  23. GridWorks Analytics Features • Directly visualize PBS Professional accounting data • Numerous graph types • Secure, role-based access: users view their data; managers view group data • Customize, save, and share reports or entire dashboards • Drill down to underlying data • Publish via web and export to Excel

  24. GridWorks Analytics Architecture • Modern drag-and-drop Web 2.0 technology • Turn-key setup from download to display • Aggregates data from multiple PBS Professional servers • Historical data back as far as PBS Pro 5.3.x • Plugs in to all major enterprise infrastructure

  25. What Is Personal PBS? • Desktop tool that turns your multi-core desktop into your own miniature compute farm • Run HPC jobs locally for free • Drag-and-drop jobs to back-end PBS Professional systems [Figure: Personal PBS window with Toolbar, Jobs Pane, Applications Pane, and Profiles Pane.]

  26. Personal PBS Overview • Drag-and-drop input to submit • Recognizes application and fills in default job parameters • Add new applications via wizard • Monitor, manage, prioritize jobs • Run jobs locally on your desktop • Connect to enterprise clusters • Output is automatically returned [Figure: Personal PBS connected to an enterprise PBS Professional cluster, with My Jobs, Job List, and Queues views.]
