
High Performance Computing Basics

April 17, 2007. Dr. David J. Haglin. Outline: What is the HPC? Where did it come from? How can you get an account on hpc.mnsu.edu? How can you use it for your research? Where do you go from here?


Presentation Transcript


  1. High Performance Computing Basics April 17, 2007 Dr. David J. Haglin

  2. Outline • What is the HPC? • Where did it come from? • How can you get an account on hpc.mnsu.edu? • How can you use it for your research? • Where do you go from here?

  3. What is the HPC? • Many AMD Opteron computers (nodes) in a rack • Connected by a high-speed network • In the IT Services secure area (third floor of the library) • All nodes run Linux • http://www.mnsu.edu/hpc

  4. What is the HPC? • Head node has 8GB RAM; 7.4 TB of Disk • Head node is for doing administrative work and starting long jobs • The 34 Worker nodes are for doing long computations • Each worker has 8GB RAM; 80 GB Hard Disk; 2 dual-core AMD Opteron

  5. What is the HPC? • Software Installed: • GNU languages: C/C++ (gcc/g++), Fortran (gfortran) • Message Passing Interface library OpenMPI • Software soon to be installed: • MATLAB • Fluent • Portland Group Fortran and C/C++ • IMSL • Email is “local delivery only”

  6. Where did it come from? • National Science Foundation Grant • MRI Program (Major Research Instrumentation) • $140,000 • Institutional Equipment funds upgraded machine by adding five nodes • PIs: Patrick Tebbe, Rebecca Bates, David Haglin • Proposal focused on a college-wide need for HPC • Vendor: PSSC Labs, Inc.

  7. How can you get an account? • We must submit a final report to NSF after July 31, 2009 • The final report must state how much the machine was used within CSET (and within MSU). • We therefore need to track usage (research projects). • To get an account, send an email to haglin@mnsu.edu with the information described at: • http://www.mnsu.edu/hpc/accounts.html • Your students can get accounts too! • We are very interested in knowing about publications that result from your use of hpc.mnsu.edu.

  8. Your Research • Okay, so you got an account. • Now What?

  9. Your Research • Learning to use HPC. • Learning to use the OpenPBS/Torque job queuing software. • Learning to “design” your usage. • Tutorials will be maintained at www.mnsu.edu/hpc

  10. Your Research • Connect to hpc.mnsu.edu (head node) using ssh • ssh on Unix • PuTTY or SSH Windows Client (IT Services) • The firewall is tight; you may need to request a new opening in the firewall for your location • Line-mode (command-line) interface • Basic Unix commands: • http://www.mnsu.edu/hpc/tutorials/linux_basics.doc

  11. Your Research • Disks on hpc:

  12. Your Research • Using OpenPBS/Torque job queuing software: • qstat -- Inspect the current job queue • qsub -- Add a new job to the queue • qdel -- Delete one of your jobs from the queue • pbsmon.py -- See the state of the entire machine • xpbsmon -- Uses X11 to display machine state • firefox localhost/ganglia • Detailed information available at: • http://www.clusterresources.com/torquedocs21/usersmanual.shtml
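
The three queue commands on this slide can also be driven from a script. The following is a minimal Python sketch, not something shown in the presentation: the helper names (`run_command`, `submit`, `list_jobs`, `cancel`) are hypothetical, and it assumes `qsub`, `qstat`, and `qdel` are on the PATH of the head node. The `runner` parameter exists only so the helpers can be tried out without a real queue.

```python
import subprocess


def run_command(argv):
    """Run a command and return its standard output as text."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


def submit(script_path, runner=run_command):
    """Queue a start script with qsub and return the job identifier it prints."""
    return runner(["qsub", script_path]).strip()


def list_jobs(user, runner=run_command):
    """Return qstat's listing of one user's jobs as raw, unparsed text."""
    return runner(["qstat", "-u", user])


def cancel(job_id, runner=run_command):
    """Remove one of your own jobs from the queue with qdel."""
    runner(["qdel", job_id])
```

On the head node, `job_id = submit("start.sh")` would queue a script and `cancel(job_id)` would remove it again; the script name here is only a placeholder.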

  13. Your Research • Designing your usage. • Assume you have a program you want to run for parameter values 1 through 1000 • Ex: $ myProgram -p1, then $ myProgram -p2, and so on through $ myProgram -p1000

  14. Your Research • Create 1000 “start scripts” to queue 1000 jobs to the master queue. • Start your jobs and monitor their progress. • Combine results when they are all done. • Organize experiments/runs in folders. • Use a scripting language such as Python to generate the start scripts.
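
As a rough illustration of generating start scripts with Python, the sketch below writes one script per parameter value, each in its own folder. It is an assumption-laden example rather than the presenter's code: the folder layout (`run_0001`, ...), the file name `start.sh`, the function name `write_start_scripts`, and the `#PBS` resource request are all hypothetical choices, and it assumes `myProgram` is on the PATH of the worker nodes.

```python
from pathlib import Path

# Hypothetical start-script template. $PBS_O_WORKDIR is set by Torque to the
# directory from which qsub was invoked, so output lands next to the script
# when qsub is run from inside the run folder.
SCRIPT_TEMPLATE = """#!/bin/sh
#PBS -N p{param}
#PBS -l nodes=1:ppn=1
cd "$PBS_O_WORKDIR"
myProgram -p{param}
"""


def write_start_scripts(root, first=1, last=1000):
    """Create one folder and one start script per parameter value."""
    root = Path(root)
    paths = []
    for param in range(first, last + 1):
        run_dir = root / f"run_{param:04d}"
        run_dir.mkdir(parents=True, exist_ok=True)
        script = run_dir / "start.sh"
        script.write_text(SCRIPT_TEMPLATE.format(param=param))
        paths.append(script)
    return paths
```

Calling `write_start_scripts("experiment_01")` would create 1000 folders under `experiment_01`, each holding a script that can be handed to `qsub`. Keeping one folder per run is one way to follow the "organize experiments/runs in folders" advice.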

  15. Your Research • Input and Output for your jobs: • Your script will start on a worker node • You can log in to a worker node to see its filesystem: • ssh n04 • df • Standard Output and Standard Error are separate • Files are written alongside your script when the job completes • No way to monitor progress of your computation
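
Because standard output and standard error arrive as separate files once a job completes, combining results usually means walking the run folders afterwards. The sketch below is one possible approach, not part of the presentation. It relies on Torque's default naming, where a job's files are called `<jobname>.o<number>` and `<jobname>.e<number>`; the function name `collect_outputs` is hypothetical, and a site that overrides the output file names would need a different pattern.

```python
from pathlib import Path


def collect_outputs(root):
    """Gather job output under root.

    Returns a dict mapping each standard-output file (path relative to root)
    to its text, plus a list of standard-error files that are not empty.
    """
    root = Path(root)
    outputs = {}
    errors = []
    # Default Torque names: <jobname>.o<number> and <jobname>.e<number>
    for path in sorted(root.rglob("*.o[0-9]*")):
        outputs[path.relative_to(root).as_posix()] = path.read_text()
    for path in sorted(root.rglob("*.e[0-9]*")):
        if path.stat().st_size > 0:
            errors.append(path.relative_to(root).as_posix())
    return outputs, errors
```

A non-empty error list is a quick signal that some runs need a closer look before their results are merged.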

  16. Your Research • Sample script to run from 501 to 505:
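
The script from the original slide is not reproduced in this transcript. As a stand-in, here is a hypothetical Python sketch that builds the text of a single start script running parameters 501 through 505 one after another inside one job. The function name `range_script`, the job name, and the use of `seq` (available on Linux) are assumptions, and the presenter's script may well have looked different.

```python
def range_script(first, last, job_name="range"):
    """Return the text of a start script that runs myProgram for first..last."""
    lines = [
        "#!/bin/sh",
        f"#PBS -N {job_name}",
        # Torque sets PBS_O_WORKDIR to the directory qsub was run from.
        'cd "$PBS_O_WORKDIR"',
        f"for p in $(seq {first} {last}); do",
        '    myProgram -p"$p"',
        "done",
    ]
    return "\n".join(lines) + "\n"


# Show the generated script; save it to a file and pass that file to qsub.
print(range_script(501, 505), end="")
```

Running several parameter values in one job trades parallelism for fewer queue entries, so it suits runs that are individually short.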

  17. Where do you go from here? • www.mnsu.edu/hpc is a communication portal • Find colleagues who can help • Learn more about the capabilities: • New software • Parallel programming (MPI) • Parallel libraries: e.g., ScaLAPACK. • Keep this machine computing fast • Other ideas?
