Introduction to Scientific Computing on Linux Clusters Doug Sondak Linux Clusters and Tiled Display Walls July 30 – August 1, 2002
Outline • Why Clusters? • Parallelization • example - Game of Life • performance metrics • Ways to Fool the Masses • summary
Why Clusters? • Scientific computing has traditionally been performed on fast, specialized machines • Buzzword - Commodity Computing • clustering cheap, off-the-shelf processors • can achieve good performance at a low cost if the applications scale well
Clusters (2) • 102 clusters in current Top 500 list http://www.top500.org/list/2001/06/ • Reasonable parallel efficiency is the key • generally use message passing, even if there are shared-memory CPUs in each box
Compilers • Linux Fortran compilers (F90/95) • available from many vendors, e.g., Absoft, Compaq, Intel, Lahey, NAG, Portland Group, Salford • g77 is free, but is restricted to Fortran 77 and is relatively slow
Compilers (2) • Intel offers a free unsupported Fortran compiler for non-commercial purposes • full F95 • OpenMP http://www.intel.com/software/products/compilers/f60l/noncom.htm
Compilers (3) • Fortran compiler benchmark comparisons are available at http://www.polyhedron.com/
Compilers (4) • Linux C/C++ compilers • gcc/g++ seems to be the standard, usually described as a good compiler • also available from vendors, e.g., Compaq, Intel, Portland Group
Parallelization of Scientific Codes
Domain Decomposition • Typically perform operations on arrays • e.g., setting up and solving system of equations • domain decomposition • arrays are broken into chunks, and each chunk is handled by a separate processor • processors operate simultaneously on their own chunks of the array
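As an illustration, here is a minimal sketch (in C with MPI) of how each processor might compute which chunk of a 1-D array it owns; the array length N is an assumed value, and the remainder elements are given to the first few ranks:

```c
/* Sketch: block decomposition of a 1-D array of N elements across
 * "size" MPI ranks. N is an assumed global length; remainder
 * elements go to the lowest-numbered ranks. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    const int N = 1000;          /* assumed global array length */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int base = N / size;         /* minimum chunk size */
    int rem  = N % size;         /* leftover elements  */
    int nloc = base + (rank < rem ? 1 : 0);
    int lo   = rank * base + (rank < rem ? rank : rem);

    printf("rank %d owns elements [%d, %d)\n", rank, lo, lo + nloc);

    MPI_Finalize();
    return 0;
}
```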
Other Methods • Parallelization is also possible without domain decomposition • less common • e.g., process one set of inputs while reading another set of inputs from a file
Embarrassingly Parallel • if operations are completely independent of one another, this is called embarrassingly parallel • e.g., initializing an array • some Monte Carlo simulations • not usually the case
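A hedged sketch of the Monte Carlo case, assuming a simple estimate of pi by random sampling: every rank works independently, and the only communication is a single reduction at the end.

```c
/* Sketch: embarrassingly parallel Monte Carlo estimate of pi.
 * Each rank samples independently; one MPI_Reduce at the end is
 * the only communication. The sample count is an arbitrary choice. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    long hits = 0, total_hits = 0;
    const long samples = 1000000;   /* per-rank samples (assumed) */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    srand(1234 + rank);             /* different seed on each rank */
    for (long i = 0; i < samples; i++) {
        double x = (double)rand() / RAND_MAX;
        double y = (double)rand() / RAND_MAX;
        if (x * x + y * y <= 1.0) hits++;   /* point inside quarter circle */
    }

    MPI_Reduce(&hits, &total_hits, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("pi ~ %f\n", 4.0 * total_hits / (samples * size));

    MPI_Finalize();
    return 0;
}
```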
Game of Life • an early, simple cellular automaton • created by John Conway • 2-D grid of cells • each has one of 2 states (“alive” or “dead”) • cells are initialized with some distribution of alive and dead states
Game of Life (2) • at each time step states are modified based on states of adjacent cells (including diagonals) • Rules of the game: • 3 alive neighbors - alive • 2 alive neighbors - no change • other - dead
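A minimal serial sketch of one time step applying these three rules; treating grid edges as dead is an assumption (the slides don't specify boundary handling), and all names are illustrative:

```c
/* Sketch: one serial Game of Life time step on an n-by-n grid.
 * Cells outside the grid are treated as dead (an assumption). */
void life_step(int n, const int cur[n][n], int next[n][n])
{
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            int alive = 0;
            /* count the 8 neighbors, including diagonals */
            for (int di = -1; di <= 1; di++)
                for (int dj = -1; dj <= 1; dj++) {
                    if (di == 0 && dj == 0) continue;
                    int ii = i + di, jj = j + dj;
                    if (ii >= 0 && ii < n && jj >= 0 && jj < n)
                        alive += cur[ii][jj];
                }
            if      (alive == 3) next[i][j] = 1;          /* alive     */
            else if (alive == 2) next[i][j] = cur[i][j];  /* no change */
            else                 next[i][j] = 0;          /* dead      */
        }
    }
}
```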
Game of Life (3) • (figure: example evolution of a Game of Life grid over several time steps)
Game of Life (4) • Parallelize on 2 processors • assign a block of columns to each processor • Problem - What happens at the split?
Game of Life (5) • Solution - Overlap cells • each time step, pass the overlap data between processors
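A sketch of the overlap ("ghost cell") exchange using MPI_Sendrecv. For simplicity this version splits the grid by rows, so each overlap row is contiguous in C; the column split described on the slide would use MPI_Type_vector instead. The column count and all names are assumptions:

```c
/* Sketch: exchange one ghost row with each neighboring rank after
 * every time step. The grid has nrows interior rows plus ghost
 * rows 0 and nrows+1; NC is an assumed column count. */
#include <mpi.h>

#define NC 64

void exchange_ghosts(int grid[][NC], int nrows, int rank, int size)
{
    int up   = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int down = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    /* send my first interior row up, receive ghost row from below */
    MPI_Sendrecv(grid[1],         NC, MPI_INT, up,   0,
                 grid[nrows + 1], NC, MPI_INT, down, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    /* send my last interior row down, receive ghost row from above */
    MPI_Sendrecv(grid[nrows],     NC, MPI_INT, down, 1,
                 grid[0],         NC, MPI_INT, up,   1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}
```

MPI_PROC_NULL turns the sends and receives at the top and bottom of the domain into no-ops, so the boundary ranks need no special-case code.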
Message Passing • Largest bottleneck to good parallel efficiency is usually message passing • much slower than number crunching • set up your algorithm to minimize message passing • minimize surface-to-volume ratio of subdomains
Domain Decomp. • (figures: for a given domain on 2 processors, prefer the decomposition that minimizes the interface between subdomains over one that maximizes it)
How to Pass Msgs. • MPI is the recommended method • PVM may also be used • MPICH • most common • free download http://www-unix.mcs.anl.gov/mpi/mpich/ • others also available, e.g., LAM
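A minimal MPI program to verify an MPICH (or LAM) installation; a generic sketch, not specific to any one cluster:

```c
/* Sketch: minimal MPI "hello" to check that the installation works. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id    */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total process count  */
    printf("hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
```

Compile with mpicc hello.c -o hello and launch with mpirun -np 4 ./hello; the exact launch command may vary by installation.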
How to Pass Msgs. (2) • some MPI tutorials • Boston University http://scv.bu.edu/Tutorials/MPI/ • NCSA http://pacont.ncsa.uiuc.edu:8900/public/MPI/
Performance
Code Timing • How well has the code been parallelized? • CPU time vs. wallclock time • both are seen in the literature • I prefer wallclock • only meaningful on dedicated processors • CPU time doesn’t account for load imbalance • Unix time command • Fortran system_clock subroutine • MPI_Wtime
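A sketch of timing a parallel region with MPI_Wtime; the barriers are one (assumed) way to synchronize the ranks so the measurement reflects the slowest processor, and compute() is a hypothetical placeholder for the real work:

```c
/* Sketch: wallclock timing of a parallel region with MPI_Wtime. */
#include <mpi.h>
#include <stdio.h>

extern void compute(void);       /* hypothetical work routine */

void timed_compute(int rank)
{
    MPI_Barrier(MPI_COMM_WORLD); /* start everyone together    */
    double t0 = MPI_Wtime();
    compute();
    MPI_Barrier(MPI_COMM_WORLD); /* wait for the slowest rank  */
    double t1 = MPI_Wtime();
    if (rank == 0)
        printf("wallclock: %.3f s\n", t1 - t0);
}
```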
Parallel Speedup • quantify how well we have parallelized our code • $S_n = T_1 / T_n$ • $S_n$ = parallel speedup • $n$ = number of processors • $T_1$ = time on 1 processor • $T_n$ = time on n processors
Parallel Efficiency • $\eta_n = S_n / n = T_1 / (n\,T_n)$ • $\eta_n$ = parallel efficiency • $T_1$ = time on 1 processor • $T_n$ = time on n processors • $n$ = number of processors
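A quick worked example with assumed (illustrative) timings shows how the two definitions combine:

$$T_1 = 100\ \mathrm{s},\quad T_8 = 16\ \mathrm{s} \;\Rightarrow\; S_8 = \frac{T_1}{T_8} = \frac{100}{16} = 6.25, \qquad \eta_8 = \frac{S_8}{8} \approx 0.78$$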
Parallel Efficiency (2) • What is a “reasonable” level of parallel efficiency? • Depends on • how much CPU time you have available • when the paper is due • can think of $(1 - \eta)$ as “wasted” CPU time • my personal rule of thumb ~60%
Parallel Efficiency (3) • Superlinear speedup • parallel efficiency > 1.0 • sometimes quoted in the literature • generally attributed to cache issues • subdomains fit entirely in cache, entire domain does not • this is very problem dependent • be suspicious!
Amdahl’s Law • there are always some operations which are performed serially • want a large fraction of the code to execute in parallel
Amdahl’s Law (2) • let the fraction of the code that executes serially be denoted s • let the fraction of the code that executes in parallel be denoted p
Amdahl’s Law (3) • noting that $p = 1 - s$, the time on n processors is $T_n = \left(s + p/n\right) T_1$, so the parallel speedup is $S_n = \dfrac{T_1}{T_n} = \dfrac{1}{s + (1-s)/n}$ • Amdahl’s Law
Amdahl’s Law (4) • the parallel efficiency is $\eta_n = \dfrac{S_n}{n} = \dfrac{1}{ns + (1-s)}$ • alternate version of Amdahl’s Law
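A worked example with an assumed serial fraction shows how quickly efficiency falls off, and what the best possible speedup is:

$$s = 0.05,\ n = 16:\quad S_{16} = \frac{1}{0.05 + 0.95/16} \approx 9.1, \qquad \eta_{16} = \frac{S_{16}}{16} \approx 0.57, \qquad \lim_{n\to\infty} S_n = \frac{1}{s} = 20$$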
Amdahl’s Law (5) • Should we despair? • No! • bigger machines solve bigger problems, which typically have a smaller value of s • if you want to run on a large number of processors, try to minimize s
Ways to Fool the Masses • full title: “Twelve Ways to Fool the Masses When Giving Performance Results on Parallel Computers” • created by David Bailey of NASA Ames in 1991 • the following is a selection of the “ways,” some paraphrased
Ways to Fool (2) • Scale the problem size with the number of processors • Project results linearly • 2 processors, 1 hour → 1800 processors, 1 second • Present the performance of a kernel and represent it as the performance of the full application
Ways to Fool (3) • Compare with old code on an obsolete system • Quote MFLOPS based on the parallel implementation, not the best serial implementation • increases the number of operations rather than decreasing the time
Ways to Fool (4) • Quote parallel speedup, making sure the single-processor version is slow • Mutilate the algorithm used in the parallel implementation to match the architecture • explicit vs. implicit PDE solvers • Measure parallel times on a dedicated system, serial times in a busy environment
Ways to Fool (5) • If all else fails, show pretty pictures and animated videos, and don’t talk about performance.
Summary • Clusters are viable platforms for relatively low-cost scientific computing • parallel considerations are similar to other platforms • MPI is a free, effective message-passing API • be careful with performance timings