
Introduction to Scientific Computing on Linux Clusters



  1. Introduction to Scientific Computing on Linux Clusters
     Doug Sondak
     Linux Clusters and Tiled Display Walls
     July 30 – August 1, 2002

  2. Outline
     • Why Clusters?
     • Parallelization
       • example - Game of Life
       • performance metrics
     • Ways to Fool the Masses
     • summary

  3. Why Clusters?
     • Scientific computing has traditionally been performed on fast, specialized machines
     • Buzzword - Commodity Computing
       • clustering cheap, off-the-shelf processors
       • can achieve good performance at a low cost if the applications scale well

  4. Clusters (2)
     • 102 clusters in the current Top 500 list http://www.top500.org/list/2001/06/
     • Reasonable parallel efficiency is the key
     • generally use message passing, even if there are shared-memory CPUs in each box

  5. Compilers
     • Linux Fortran compilers (F90/95)
       • available from many vendors, e.g., Absoft, Compaq, Intel, Lahey, NAG, Portland Group, Salford
       • g77 is free, but it is restricted to Fortran 77 and is relatively slow

  6. Compilers (2)
     • Intel offers a free unsupported Fortran compiler for non-commercial purposes
       • full F95
       • OpenMP
     http://www.intel.com/software/products/compilers/f60l/noncom.htm

  7. Compilers (3)
     • compiler comparisons: http://www.polyhedron.com/ (benchmark charts not reproduced here)

  8. Compilers (4)
     • Linux C/C++ compilers
       • gcc/g++ seems to be the standard, usually described as a good compiler
       • also available from vendors, e.g., Compaq, Intel, Portland Group

  9. Parallelization of Scientific Codes

  10. Domain Decomposition
     • Typically perform operations on arrays
       • e.g., setting up and solving a system of equations
     • domain decomposition
       • arrays are broken into chunks, and each chunk is handled by a separate processor (see the sketch below)
       • processors operate simultaneously on their own chunks of the array
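To make the decomposition concrete, here is a minimal C sketch of how each processor might compute which chunk of an array it owns. The function name block_range and the even spreading of the remainder are illustrative assumptions, not code from the slides.

    #include <stdio.h>

    /* Compute the half-open range [*lo, *hi) of an n-element array owned
       by a given rank, spreading any remainder over the first n % nproc
       ranks so chunk sizes differ by at most one element. */
    void block_range(int n, int nproc, int rank, int *lo, int *hi)
    {
        int base = n / nproc;      /* minimum chunk size */
        int rem  = n % nproc;      /* leftover elements  */
        *lo = rank * base + (rank < rem ? rank : rem);
        *hi = *lo + base + (rank < rem ? 1 : 0);
    }

    int main(void)
    {
        /* Example: 10 elements over 4 processors -> chunks of 3,3,2,2. */
        int lo, hi;
        for (int rank = 0; rank < 4; rank++) {
            block_range(10, 4, rank, &lo, &hi);
            printf("rank %d owns elements [%d, %d)\n", rank, lo, hi);
        }
        return 0;
    }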

  11. Other Methods
     • Parallelization is also possible without domain decomposition
       • less common
       • e.g., process one set of inputs while reading another set of inputs from a file

  12. Embarrassingly Parallel
     • if operations are completely independent of one another, this is called embarrassingly parallel
       • e.g., initializing an array
       • some Monte Carlo simulations (see the sketch below)
     • not usually the case
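As a sketch of the Monte Carlo case: each processor could run the loop below independently with its own seed, and the per-processor estimates would only be combined once at the end, so there is no communication during the run. This is a hypothetical illustration, not code from the slides.

    #include <stdio.h>
    #include <stdlib.h>

    /* Estimate pi by sampling random points in the unit square and
       counting how many land inside the quarter circle. */
    double estimate_pi(long samples, unsigned int seed)
    {
        srand(seed);
        long hits = 0;
        for (long i = 0; i < samples; i++) {
            double x = (double)rand() / RAND_MAX;
            double y = (double)rand() / RAND_MAX;
            if (x * x + y * y <= 1.0)   /* inside the quarter circle */
                hits++;
        }
        return 4.0 * (double)hits / (double)samples;
    }

    int main(void)
    {
        /* Each processor would use a different seed here. */
        printf("pi ~ %f\n", estimate_pi(1000000, 42));
        return 0;
    }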

  13. Game of Life
     • an early, simple cellular automaton
     • created by John Conway
     • 2-D grid of cells
       • each has one of 2 states (“alive” or “dead”)
     • cells are initialized with some distribution of alive and dead states

  14. Game of Life (2)
     • at each time step states are modified based on states of adjacent cells (including diagonals)
     • Rules of the game (a sketch of the update rule follows):
       • 3 alive neighbors - alive
       • 2 alive neighbors - no change
       • other - dead
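A minimal C sketch of those rules; the grid layout (row-major, off-grid cells treated as dead) and the function names are assumptions for illustration.

    /* One Game of Life update for a single cell, following the rules on
       the slide: 3 live neighbors -> alive, 2 -> unchanged, else dead. */
    static int next_state(int alive, int live_neighbors)
    {
        if (live_neighbors == 3) return 1;      /* birth or survival */
        if (live_neighbors == 2) return alive;  /* no change         */
        return 0;                               /* otherwise dead    */
    }

    /* Count the 8 neighbors (including diagonals) of cell (i, j) in an
       n x n grid stored row-major, treating off-grid cells as dead. */
    static int count_neighbors(const int *grid, int n, int i, int j)
    {
        int count = 0;
        for (int di = -1; di <= 1; di++)
            for (int dj = -1; dj <= 1; dj++) {
                if (di == 0 && dj == 0) continue;  /* skip the cell itself */
                int ni = i + di, nj = j + dj;
                if (ni >= 0 && ni < n && nj >= 0 && nj < n)
                    count += grid[ni * n + nj];
            }
        return count;
    }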

  15. Game of Life (3) [figure omitted]

  16. Game of Life (4)
     • Parallelize on 2 processors
       • assign a block of columns to each processor
     • Problem - What happens at the split?

  17. Game of Life (5)
     • Solution - Overlap cells
     • Each time step, pass the overlap data from processor to processor (a halo-exchange sketch follows)
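A minimal C sketch of that overlap exchange using MPI_Sendrecv. The column-major storage, the ghost-column indexing, and the function name are assumptions for illustration; at the domain ends, left or right would be MPI_PROC_NULL so the calls become no-ops there.

    #include <mpi.h>

    /* grid holds local_cols + 2 columns of nrows doubles, stored
       column-major; columns 0 and local_cols + 1 are ghost (overlap)
       columns. left/right are the neighbor ranks. */
    void exchange_halo(double *grid, int nrows, int local_cols,
                       int left, int right, MPI_Comm comm)
    {
        double *first  = grid + 1 * nrows;                /* first owned column */
        double *last   = grid + local_cols * nrows;       /* last owned column  */
        double *lghost = grid;                            /* left ghost column  */
        double *rghost = grid + (local_cols + 1) * nrows; /* right ghost column */

        /* Send first owned column left; receive right neighbor's column. */
        MPI_Sendrecv(first,  nrows, MPI_DOUBLE, left,  0,
                     rghost, nrows, MPI_DOUBLE, right, 0,
                     comm, MPI_STATUS_IGNORE);
        /* Send last owned column right; receive left neighbor's column. */
        MPI_Sendrecv(last,   nrows, MPI_DOUBLE, right, 1,
                     lghost, nrows, MPI_DOUBLE, left,  1,
                     comm, MPI_STATUS_IGNORE);
    }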

  18. Message Passing
     • Largest bottleneck to good parallel efficiency is usually message passing
       • much slower than number crunching
     • set up your algorithm to minimize message passing
       • minimize surface-to-volume ratio of subdomains

  19. Domain Decomp.
     • For a given domain, to run on 2 processors, decompose so the interface between the two subdomains is as small as possible
     • (the original slide contrasted a good split with a bad one in figures not reproduced here)

  20. How to Pass Msgs.
     • MPI is the recommended method
     • PVM may also be used
     • MPICH
       • most common
       • free download http://www-unix.mcs.anl.gov/mpi/mpich/
     • others also available, e.g., LAM
     (a minimal MPI program follows)
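For readers new to MPI, here is the smallest complete MPI program in C. The compile and launch commands vary by installation; mpicc and mpirun below are typical but assumed.

    #include <stdio.h>
    #include <mpi.h>

    /* Build with, e.g.: mpicc hello.c
       Run with,  e.g.: mpirun -np 4 ./a.out */
    int main(int argc, char *argv[])
    {
        int rank, size;
        MPI_Init(&argc, &argv);                 /* start up MPI          */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's ID     */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of ranks */
        printf("Hello from process %d of %d\n", rank, size);
        MPI_Finalize();                         /* shut down MPI         */
        return 0;
    }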

  21. How to Pass Msgs. (2)
     • some MPI tutorials
       • Boston University http://scv.bu.edu/Tutorials/MPI/
       • NCSA http://pacont.ncsa.uiuc.edu:8900/public/MPI/

  22. Performance

  23. Code Timing
     • How well has the code been parallelized?
     • CPU time vs. wallclock time
       • both are seen in the literature
       • I prefer wallclock
         • only meaningful on dedicated processors
         • CPU time doesn’t account for load imbalance
     • Unix time command
     • Fortran system_clock subroutine
     • MPI_Wtime (see the sketch below)
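A minimal sketch of wallclock timing with MPI_Wtime, which returns elapsed seconds as a double. The do_work function is a hypothetical stand-in for the computation being timed; the barriers are one common way to line the ranks up so the slowest rank is what gets measured.

    #include <stdio.h>
    #include <mpi.h>

    void do_work(void) { /* computation to be timed */ }

    int main(int argc, char *argv[])
    {
        MPI_Init(&argc, &argv);
        MPI_Barrier(MPI_COMM_WORLD);      /* synchronize for a fair start */
        double t0 = MPI_Wtime();
        do_work();
        MPI_Barrier(MPI_COMM_WORLD);      /* wait for the slowest rank    */
        double t1 = MPI_Wtime();

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0)
            printf("wallclock time: %f s\n", t1 - t0);
        MPI_Finalize();
        return 0;
    }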

  24. Parallel Speedup
     • quantify how well we have parallelized our code
     $S_n = T_1 / T_n$
     where
       $S_n$ = parallel speedup
       $n$ = number of processors
       $T_1$ = time on 1 processor
       $T_n$ = time on $n$ processors

  25. Parallel Speedup (2) [figure omitted]

  26. Parallel Efficiency
     $\eta_n = T_1 / (n \, T_n) = S_n / n$
     where
       $\eta_n$ = parallel efficiency
       $T_1$ = time on 1 processor
       $T_n$ = time on $n$ processors
       $n$ = number of processors

  27. Parallel Efficiency (2) [figure omitted]

  28. Parallel Efficiency (3)
     • What is a “reasonable” level of parallel efficiency?
     • Depends on
       • how much CPU time you have available
       • when the paper is due
     • can think of $(1-\eta_n)$ as “wasted” CPU time
     • my personal rule of thumb: ~60%

  29. Parallel Efficiency (4)
     • Superlinear speedup
       • parallel efficiency > 1.0
       • sometimes quoted in the literature
       • generally attributed to cache issues
         • subdomains fit entirely in cache, entire domain does not
       • this is very problem dependent
       • be suspicious!

  30. Amdahl’s Law
     • There are always some operations that are performed serially
     • want a large fraction of the code to execute in parallel

  31. Amdahl’s Law (2)
     • Let the fraction of code that executes serially be denoted $s$
     • Let the fraction of code that executes in parallel be denoted $p$

  32. Amdahl’s Law (3)
     • Noting that $p = (1-s)$, the parallel speedup is
     $S_n = \frac{1}{s + p/n} = \frac{1}{s + (1-s)/n}$   (Amdahl’s Law)

  33. Amdahl’s Law (4)
     The parallel efficiency is
     $\eta_n = \frac{S_n}{n} = \frac{1}{n s + (1-s)}$
     an alternate version of Amdahl’s Law (a worked example follows)
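To see what the law implies in numbers, the short C program below tabulates speedup and efficiency for an assumed serial fraction of s = 0.05; the value of s is illustrative, not from the slides. Note how the speedup saturates near 1/s = 20 no matter how many processors are added.

    #include <stdio.h>

    int main(void)
    {
        double s = 0.05;                      /* assumed serial fraction */
        for (int n = 1; n <= 1024; n *= 4) {
            double speedup = 1.0 / (s + (1.0 - s) / n);   /* Amdahl's Law */
            printf("n = %4d  S_n = %6.2f  efficiency = %5.1f%%\n",
                   n, speedup, 100.0 * speedup / n);
        }
        return 0;
    }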

  34. Amdahl’s Law (5) [figure omitted]

  35. Amdahl’s Law (6)
     • Should we despair?
     • No!
       • bigger machines solve bigger problems, which tend to have a smaller value of $s$
       • if you want to run on a large number of processors, try to minimize $s$

  36. Ways to Fool the Masses
     • full title: “Twelve Ways to Fool the Masses When Giving Performance Results on Parallel Computers”
     • created by David Bailey of NASA Ames in 1991
     • the following is a selection of the “ways,” some paraphrased

  37. Ways to Fool (2)
     • Scale the problem size with the number of processors
     • Project results linearly
       • 2 proc., 1 hr. → 1800 proc., 1 sec.
     • Present the performance of a kernel and represent it as the performance of the full application

  38. Ways to Fool (3)
     • Compare with old code on an obsolete system
     • Quote MFLOPS based on the parallel implementation, not the best serial implementation
       • increase the number of operations rather than decreasing the time

  39. Ways to Fool (4)
     • Quote parallel speedup, making sure the single-processor version is slow
     • Mutilate the algorithm used in the parallel implementation to match the architecture
       • explicit vs. implicit PDE solvers
     • Measure parallel times on a dedicated system, serial times in a busy environment

  40. Ways to Fool (5)
     • If all else fails, show pretty pictures and animated videos, and don’t talk about performance.

  41. Summary
     • Clusters are viable platforms for relatively low-cost scientific computing
     • parallel considerations are similar to those on other platforms
     • MPI is a free, effective message-passing API
     • be careful with performance timings
