High Performance Computing Basics April 17, 2007 Dr. David J. Haglin
Outline • What is the HPC? • Where did it come from? • How can you get an account on hpc.mnsu.edu? • How can you use it for your research? • Where do you go from here?
What is the HPC? • Many AMD Opteron computers (nodes) in a rack • Connected by a high-speed network • In the IT Services secure area (third floor of the library) • All nodes run Linux • http://www.mnsu.edu/hpc
What is the HPC? • Head node has 8 GB RAM and 7.4 TB of disk • Head node is for administrative work and for starting long jobs • The 34 worker nodes are for long computations • Each worker has 8 GB RAM, an 80 GB hard disk, and two dual-core AMD Opteron processors
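For a rough sense of scale, the worker-node figures above multiply out as follows. This small Python sketch only does the arithmetic from the slide; it assumes all 34 workers are identical and leaves the head node out.

```python
# Worker-node figures taken from the slide; the head node is not counted.
WORKER_NODES = 34
SOCKETS_PER_NODE = 2      # two dual-core Opteron processors per worker
CORES_PER_SOCKET = 2
RAM_GB_PER_NODE = 8
DISK_GB_PER_NODE = 80

cores = WORKER_NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET
ram_gb = WORKER_NODES * RAM_GB_PER_NODE
disk_gb = WORKER_NODES * DISK_GB_PER_NODE
# Memory available per core if every core on a node is busy.
ram_per_core_gb = RAM_GB_PER_NODE / (SOCKETS_PER_NODE * CORES_PER_SOCKET)

print(f"worker cores:      {cores}")            # 136
print(f"worker RAM (GB):   {ram_gb}")           # 272
print(f"worker disk (GB):  {disk_gb}")          # 2720
print(f"RAM per core (GB): {ram_per_core_gb}")  # 2.0
```

If the scheduler runs one single-core job per core (an assumption about how jobs are packed), about 136 jobs can run at once, so a 1000-run parameter sweep would proceed in roughly eight waves.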
What is the HPC? • Software Installed: • GNU languages: C/C++ (gcc/g++), Fortran (gfortran) • Message Passing Interface library OpenMPI • Software soon to be installed: • MATLAB • Fluent • Portland Group Fortran and C/C++ • IMSL • Email is “local delivery only”
Where did it come from? • National Science Foundation Grant • MRI Program (Major Research Instrumentation) • $140,000 • Institutional Equipment funds upgraded the machine by adding five nodes • PIs: Patrick Tebbe, Rebecca Bates, David Haglin • Proposal focused on a college-wide need for HPC • Vendor: PSSC Labs, Inc.
How can you get an account? • We must submit a final report to NSF after July 31, 2009 • Part of the final report must describe how much the machine was used within CSET (and within MSU). • We need to track usage by research project. • To get an account, send an email to email@example.com with the information described at: • http://www.mnsu.edu/hpc/accounts.html • Your students can get accounts too! • We are very interested in hearing about publications that result from using hpc.mnsu.edu.
Your Research • Okay, so you got an account. • Now What?
Your Research • Learning to use HPC. • Learning to use the OpenPBS/Torque job queuing software. • Learning to “design” your usage. • Tutorials will be maintained at www.mnsu.edu/hpc
Your Research • Connect to hpc.mnsu.edu (head node) using ssh • ssh on Unix • PuTTY or SSH Windows Client (IT Services) • The firewall is tight; you may need to request a new firewall opening for your location • Line-mode (command-line) interface • Basic Unix commands: • http://www.mnsu.edu/hpc/tutorials/linux_basics.doc
Your Research • Disks on hpc:
Your Research • Using OpenPBS/Torque job queuing software: • qstat -- Inspect the current job queue • qsub -- Add a new job to the queue • qdel -- Delete one of your jobs from the queue • pbsmon.py -- See the state of the entire machine • xpbsmon -- Use X11 to display the machine state • firefox localhost/ganglia -- View the Ganglia monitoring pages • Detailed information available at: • http://www.clusterresources.com/torquedocs21/usersmanual.shtml
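When you have many start scripts, a thin Python wrapper around qsub can save typing. This is a minimal sketch, not part of Torque: the function name submit_all and its dry-run default are hypothetical choices, and it assumes qsub is on your PATH on the head node. It relies only on qsub printing the new job identifier on standard output.

```python
import subprocess


def submit_all(scripts, dry_run=True):
    """Submit each start script with qsub and return what was done.

    With dry_run=True (the default) nothing is submitted: the qsub command
    lines are returned so they can be checked first. Otherwise the job IDs
    that qsub prints are returned.
    """
    results = []
    for script in scripts:
        cmd = ["qsub", str(script)]
        if dry_run:
            results.append(" ".join(cmd))
            continue
        # qsub prints the new job's identifier on standard output;
        # check=True raises CalledProcessError if the submission fails.
        done = subprocess.run(cmd, capture_output=True, text=True, check=True)
        results.append(done.stdout.strip())
    return results
```

For example, with a hypothetical script path, submit_all(["runs/p1/start.pbs"]) returns ["qsub runs/p1/start.pbs"] without queuing anything; pass dry_run=False to submit for real, then watch the jobs with qstat.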
Your Research • Designing your usage. • Assume you have a program you want to run for parameter values 1 through 1000 • Ex: $ myProgram -p1 • $ myProgram -p2 • ... • $ myProgram -p1000
Your Research • Create 1000 “start scripts” to queue 1000 jobs on the master queue. • Start your jobs and monitor their progress. • Combine results when they are all done. • Organize experiments/runs in folders. • Use scripting languages such as Python to generate the start scripts.
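A minimal Python sketch of this approach: one folder per parameter value, each holding its own PBS start script. The folder layout (runs/p<N>/start.pbs), the resource requests (nodes=1:ppn=1, a 24-hour walltime), and the stdout.txt/stderr.txt names are assumptions to adapt; myProgram is the placeholder program from the previous slide and is assumed to be on your PATH.

```python
from pathlib import Path

# Hypothetical PBS/Torque start-script template. "myProgram" is the slides'
# placeholder and is assumed to be on PATH; the resource requests are guesses.
TEMPLATE = """#!/bin/sh
#PBS -N p{p}
#PBS -l nodes=1:ppn=1
#PBS -l walltime=24:00:00
#PBS -o {rundir}/stdout.txt
#PBS -e {rundir}/stderr.txt
cd {rundir}
myProgram -p{p}
"""


def write_start_scripts(first, last, root="runs"):
    """Create root/p<N>/start.pbs for every parameter N from first to last."""
    scripts = []
    for p in range(first, last + 1):
        # One folder per run keeps each job's files separate.
        rundir = Path(root, f"p{p}").resolve()
        rundir.mkdir(parents=True, exist_ok=True)
        script = rundir / "start.pbs"
        script.write_text(TEMPLATE.format(p=p, rundir=rundir))
        scripts.append(script)
    return scripts
```

Calling write_start_scripts(1, 1000) creates the 1000 folders. Because each script changes into its own folder and names its output files with absolute paths, it can be submitted with qsub from any directory.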
Your Research • Input and output for your jobs: • Your script will start on a worker node • You can log in to a worker node to see its filesystem: • ssh n04 • df • Standard output and standard error are written to separate files • These files are written alongside your script when the job completes • There is no direct way to monitor the progress of your computation while it runs
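Once all jobs have finished, combining their output is usually a small scripting task. This sketch assumes a hypothetical layout in which each job's standard output was directed (for example with a #PBS -o directive) to runs/p<N>/stdout.txt, and that the program prints its result as the last line of output; adjust both to your own conventions.

```python
import csv
from pathlib import Path


def collect_results(root="runs", out_csv="results.csv"):
    """Gather the last stdout line of every run folder into one CSV file.

    Assumes a hypothetical layout root/p<N>/stdout.txt and that the program
    prints its result as the final line of standard output.
    """
    rows = []
    for out in Path(root).glob("p*/stdout.txt"):
        suffix = out.parent.name[1:]
        if not suffix.isdigit():
            continue  # skip folders such as "plots" that are not runs
        lines = out.read_text().splitlines()
        rows.append((int(suffix), lines[-1] if lines else ""))
    rows.sort()  # numeric order: p2 before p10
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["parameter", "result"])
        writer.writerows(rows)
    return rows
```

Sorting on the integer parameter rather than the folder name keeps p2 ahead of p10 in the combined file.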
Your Research • Sample script to run from 501 to 505:
Where do you go from here? • www.mnsu.edu/hpc is a communication portal • Find colleagues who can help • Learn more about the capabilities: • New software • Parallel programming (MPI) • Parallel libraries: e.g., ScaLAPACK. • Keep this machine computing fast • Other ideas?