SGE Training NASA LaRC ASDC Delivered May 5,6,7 2009 Chris Dwan Bioteam

Presentation Transcript


  1. SGE Training: NASA LaRC ASDC. Delivered May 5, 6, 7 2009. Chris Dwan, Bioteam, cdwan@bioteam.net

  2. Bioteam Inc.
     • Independent consulting shop
     • Vendor/technology agnostic
     • Staffed by:
        • Scientists forced to learn High Performance IT
        • Many years of industry & academic experience
     • Our specialty: Bridging the gap between Science & IT

  3. Session Goals
     • Introduce ASDC systems
     • Detailed introduction to the IBM system
     • Deliver Sun Grid Engine training
     • Encourage follow-up

  4. Interactive / Small Group Goals
     • 1–2 hours
     • 1–5 people
     • Users log into the systems.
     • Users type in examples and run jobs.
     • If you have code, bring it.
     • If you have specific use cases, bring them.

  5. Selected ASDC Systems

  6. Selected ASDC Systems
     • Apple Cluster
        • Online and in use at SCF since 2007
        • ~40 dual-processor OS X systems (80+ CPUs)
        • Access through manila and corregidor
     • Magneto
        • ~28 quad-core Linux servers (100+ CPUs)
        • Online and in production use since 2006
     • New Magneto (Operational Readiness Review May 15)
        • Large, mixed-purpose Linux cluster / file store
        • 176 CPUs dedicated to SCF
        • 576 CPUs dedicated to production
        • Disk-based archive: 1.1 PB

  7. Apple Cluster
     • Access:
        • LDAP account
        • manila or corregidor

  8. NASA LaRC Science Directorate
     • Picture taken 9/2/08
     • 1.2 PB usable space
     • Fibre connected (384+ fibre ports)
     • 2,560 individual disk drives
        • 16 disks per chassis
        • 10 chassis per rack
        • 16 racks of disks
     • IBM Linux servers, mixed P6 and x86 CPUs to support legacy codes
     • Filesystem: IBM GPFS
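The disk counts on this slide multiply out consistently; this is only a check of the slide's own numbers:

```latex
16\ \tfrac{\text{disks}}{\text{chassis}} \times 10\ \tfrac{\text{chassis}}{\text{rack}} \times 16\ \text{racks} = 2{,}560\ \text{disks}
```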

  9. Operational Readiness Review: mid-May 2009. Stay tuned.

  10. (no text transcribed)

  11. (no text transcribed)

  12. (no text transcribed)

  13. Interactive hosts
     • bc201: instrument1-blue
     • bc202: instrument2-blue
     • bc203: erbe-blue
     • bc204: tisa1-blue
     • bc205: srb1-blue
     • bc206: srb2-blue
     • bc207: power1-blue
     • bc208: power2-blue
     • bc209: sarba-blue
     • bc210: consodine-blue
     • bc211: sofa-blue
     • bc212: cloudsa-blue
     • bc213: cloudsb-blue
     • bc214: inversion-blue

  14. Sun Grid Engine Technical Introduction

  15. Most “grids” look like this on paper…
     [Diagram labels: Dedicated File Services, Portal Node(s), Local Area Network, Private Network, Compute Nodes]
     info@bioteam.net

  16. … and in reality: (no further text transcribed)

  17. … and in reality: (no further text transcribed)

  18. … and in reality: (no further text transcribed)

  19. Sun Grid Engine History (http://blogs.sun.com/templedf/entry/a_little_history_lesson)
     • 1996:
        • Codine 4.02
        • Grid Resource Director (GRD) 1.0
     • 2000:
        • SGE 5.2; Sun acquires Gridware Inc.
     • 2001:
        • SGE 5.3; Sun releases the source code
        • Last version called GRD
     • 2004:
        • SGE(EE) vs. SGE
        • N1GE vs. SGE

  20. Sun Grid Engine References
     • http://gridengine.sunsource.net/
        • Generally, the user manuals are awful
     • http://gridengine.info/
        • Very useful blog run by Chris Dagdigian
     • My slides / examples are going to be online in-house.
     • Deep, in-house expertise.

  21. Compute Farm Logical View
     [Diagram labels: User 1 … User N, Distributed Resource Manager, Cluster Network]

  22. Grid Engine does the following:
     • Accepts work requests (jobs) from users
     • Puts jobs in a pending area
     • Sends jobs from the pending area to the best available machine
     • Manages each job while it runs
     • Returns results and logs accounting data when the job is finished
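To make that cycle concrete, here is a toy Python model of it. It is a sketch under simplifying assumptions, not how Grid Engine actually schedules: the real scheduler weighs load values, queue sort order, and policies. The `ToyScheduler`, `Host`, and `Job` names and the "most free slots wins" rule are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Host:
    name: str
    slots: int      # total job slots on the host
    used: int = 0   # slots currently occupied

    @property
    def free(self) -> int:
        return self.slots - self.used


@dataclass
class Job:
    job_id: int
    name: str
    slots: int = 1               # slots the job requests
    host: Optional[str] = None   # set when the job is dispatched


class ToyScheduler:
    """Toy model of the accept -> pending -> dispatch -> finish cycle.

    Hypothetical simplification: "best available" means the host with
    the most free slots that can fit the job.
    """

    def __init__(self, hosts):
        self.hosts = {h.name: h for h in hosts}
        self.pending = []      # jobs waiting for resources, in submit order
        self.running = {}      # job_id -> Job
        self.accounting = []   # one record per finished job
        self._next_id = 1

    def submit(self, name, slots=1):
        """Accept a work request and place it in the pending area."""
        job = Job(self._next_id, name, slots)
        self._next_id += 1
        self.pending.append(job)
        return job.job_id

    def dispatch(self):
        """Send each pending job to the host with the most free slots that fits it."""
        still_pending = []
        for job in self.pending:
            fits = [h for h in self.hosts.values() if h.free >= job.slots]
            if not fits:
                still_pending.append(job)  # no room anywhere yet; keep waiting
                continue
            best = max(fits, key=lambda h: h.free)
            best.used += job.slots
            job.host = best.name
            self.running[job.job_id] = job
        self.pending = still_pending

    def finish(self, job_id):
        """Mark a job finished: free its slots and log an accounting record."""
        job = self.running.pop(job_id)
        self.hosts[job.host].used -= job.slots
        self.accounting.append(
            {"job_id": job.job_id, "name": job.name, "host": job.host, "slots": job.slots}
        )


sched = ToyScheduler([Host("node1", slots=4), Host("node2", slots=2)])
a = sched.submit("align", slots=4)
b = sched.submit("blast", slots=2)
c = sched.submit("report", slots=2)
sched.dispatch()   # align -> node1, blast -> node2, report stays pending
sched.finish(a)    # node1's slots are released and accounted
sched.dispatch()   # report -> node1
```

The point of the model is the separation the slide describes: users only submit; the pending area absorbs demand that cannot run yet; placement is decided later, when resources free up.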

  23. Huh?
     • What you need to know:
        • Don’t worry about queues or specific machines. All you need to do when submitting a job is describe the resources your job will need to run successfully.
        • Grid Engine will take care of the rest.
        • The ‘default’ settings are good enough for most cases.
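"Describe the resources, not the machine" translates into `qsub -l name=value,...` requests rather than naming a queue or host. The sketch below builds those arguments in Python. `h_vmem` (hard memory limit) and `h_rt` (hard run-time limit) are standard Grid Engine resource names, but whether a given ASDC queue lets users request them, and the values shown, are assumptions; `resource_args` and `myjob.sh` are hypothetical names.

```python
def resource_args(resources):
    """Turn a resource description into qsub '-l' arguments.

    Keys are Grid Engine resource (complex) names such as h_vmem or h_rt;
    values are passed through as strings. An empty description adds
    nothing, so the job falls back to the site defaults.
    """
    if not resources:
        return []
    spec = ",".join(f"{name}={value}" for name, value in sorted(resources.items()))
    return ["-l", spec]


# Describe what the job needs (hypothetical values), not where it should run.
cmd = ["qsub", *resource_args({"h_vmem": "2G", "h_rt": "01:00:00"}), "myjob.sh"]
print(" ".join(cmd))  # qsub -l h_rt=01:00:00,h_vmem=2G myjob.sh
```

Note there is no `-q` flag: leaving queue selection to Grid Engine is exactly what the slide recommends.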

  24. Most useful SGE commands
     • qsub / qdel
        • Submit jobs & delete jobs
     • qstat & qhost
        • Status info for queues, hosts and jobs
     • qacct
        • Summary info and reports on completed jobs
     • qrsh
        • Get an interactive shell on a cluster node
        • Quickly run a command on a remote host
     • qmon
        • Launch the X11 GUI
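Most of these commands are tied together by the job ID that `qsub` reports. A minimal Python sketch, assuming qsub's usual confirmation line (`Your job 12345 ("name") has been submitted`, or `Your job-array ...` for task arrays); the helper names are hypothetical, while `qstat -j`, `qdel`, and `qacct -j` are the standard command forms.

```python
import re

# Assumed format of qsub's confirmation line, e.g.
#   Your job 12345 ("test.sh") has been submitted
#   Your job-array 12346.1-10:1 ("array.sh") has been submitted
_SUBMIT_RE = re.compile(r"^Your job(?:-array)? (\d+)")


def job_id_from_qsub(output: str) -> int:
    """Extract the numeric job ID from qsub's confirmation message."""
    for line in output.splitlines():
        match = _SUBMIT_RE.match(line.strip())
        if match:
            return int(match.group(1))
    raise ValueError(f"no job ID found in qsub output: {output!r}")


def follow_up_commands(job_id: int) -> dict:
    """Command lines for inspecting or removing a submitted job."""
    jid = str(job_id)
    return {
        "status": ["qstat", "-j", jid],      # details while pending or running
        "delete": ["qdel", jid],             # remove the job
        "accounting": ["qacct", "-j", jid],  # usage report after it finishes
    }


jid = job_id_from_qsub('Your job 12345 ("test.sh") has been submitted')
print(follow_up_commands(jid)["status"])
```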

  25. Examples

  26. Live Examples
     • Single job
     • Single job with resource requirements
     • Job dependency
     • Task array job
     • Demand a whole compute node
     • Consumable resources
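The live examples map onto a handful of qsub options: `-l` for resource requests, `-hold_jid` for dependencies, `-t` for task arrays (each task sees its index in `$SGE_TASK_ID`), and `-pe` for multi-slot jobs. The Python sketch below only assembles command lines; it does not reproduce the session's actual examples. The script names, the parallel environment `smp`, the 8-cores-per-node figure, and the consumable `matlab_lic` are all hypothetical: requesting a whole node this way assumes an admin-defined PE that keeps all slots on one host, and consumables must likewise be defined by the administrator.

```python
def qsub_cmd(script, *, name=None, resources=None, hold_jid=None, tasks=None, pe=None):
    """Assemble a qsub command line as an argument list (sketch only)."""
    cmd = ["qsub"]
    if name:
        cmd += ["-N", name]                      # job name
    if resources:
        spec = ",".join(f"{k}={v}" for k, v in sorted(resources.items()))
        cmd += ["-l", spec]                      # resource requests
    if hold_jid is not None:
        cmd += ["-hold_jid", str(hold_jid)]      # hold until that job finishes
    if tasks:
        first, last = tasks
        cmd += ["-t", f"{first}-{last}"]         # task array; index in $SGE_TASK_ID
    if pe:
        pe_name, slots = pe
        cmd += ["-pe", pe_name, str(slots)]      # parallel environment + slot count
    cmd.append(script)
    return cmd


single = qsub_cmd("hello.sh")
with_resources = qsub_cmd("big.sh", resources={"h_vmem": "4G"})
dependent = qsub_cmd("post.sh", hold_jid=12345)
array = qsub_cmd("array.sh", tasks=(1, 100))
whole_node = qsub_cmd("wide.sh", pe=("smp", 8))                 # hypothetical PE and core count
consumable = qsub_cmd("lic.sh", resources={"matlab_lic": 1})    # hypothetical consumable
print(" ".join(array))
```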
