Parallel Architectures & Performance Analysis
Presentation Transcript

  1. Parallel Architectures & Performance Analysis
  Prepared 7/28/2011 by T. O’Neil for 3460:677, Fall 2011, The University of Akron.

  2. Parallel Computers
  • Parallel computer: a multiple-processor system supporting parallel programming.
  • Three principal types of architecture:
    • Vector computers, in particular processor arrays
    • Shared memory multiprocessors
      • Specially designed and manufactured systems
    • Distributed memory multicomputers
      • Message-passing systems readily formed from a cluster of workstations

  3. Type 1: Vector Computers
  • Vector computer: instruction set includes operations on vectors as well as scalars.
  • Two ways to implement vector computers:
    • Pipelined vector processor (e.g. Cray): streams data through pipelined arithmetic units
    • Processor array: many identical, synchronized arithmetic processing elements

  4. Type 2: Shared Memory Multiprocessor Systems
  • A natural way to extend the single-processor model.
  • Multiple processors are connected to multiple memory modules such that each processor can access any memory module: the so-called shared memory configuration.
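
A minimal shared-memory sketch (not from the original slides; it assumes OpenMP, which this deck does not mention) shows the model in code: every thread works on the same array because it lives in the single shared address space.

/* Shared-memory sketch (OpenMP assumed; compile with e.g. gcc -fopenmp). */
#include <omp.h>
#include <stdio.h>

#define N 16

int main(void) {
    int a[N];                       /* one array, visible to every thread */

    /* Iterations are divided among the threads; all of them read and
     * write a[] directly because it is in the shared address space. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = i * i;

    for (int i = 0; i < N; i++)
        printf("a[%d] = %d\n", i, a[i]);
    return 0;
}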

  5. Ex: Quad Pentium Shared Memory Multiprocessor

  6. Fundamental Types of Shared Memory Multiprocessor
  • Type 2: Distributed Multiprocessor
    • Distributes primary memory among processors
    • Increases aggregate memory bandwidth and lowers average memory access time
    • Allows a greater number of processors
    • Also called a non-uniform memory access (NUMA) multiprocessor

  7. Distributed Multiprocessor

  8. Type 3: Message-Passing Multicomputers
  • Complete computers connected through an interconnection network

  9. Multicomputers
  • Distributed memory multiple-CPU computer
  • Same address on different processors refers to different physical memory locations
  • Processors interact through message passing
  • Commercial multicomputers
  • Commodity clusters
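
The slides contain no code, but a minimal message-passing sketch (MPI assumed; MPI is not named on this slide) illustrates the point above: the same variable name on different processors refers to different physical memory, so data moves only through explicit messages.

/* Message-passing sketch (MPI assumed; run with e.g. mpirun -np 2 ./a.out). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, x = 0;                /* every process has its own private x */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        x = 42;                     /* set only in process 0's memory */
        MPI_Send(&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&x, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("process 1 received x = %d\n", x);   /* its copy, filled by the message */
    }

    MPI_Finalize();
    return 0;
}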

  10. Asymmetrical Multicomputer

  11. Symmetrical Multicomputer

  12. ParPar Cluster: A Mixed Model

  13. Alternate System: Flynn’s Taxonomy
  • Michael Flynn (1966) created a classification of computer architectures based on a variety of characteristics, specifically instruction streams and data streams.
  • Also important are the number of processors, the number of programs that can be executed, and the memory structure.

  14. Flynn’s Taxonomy: SISD (cont.)
  [Block diagram: a single control unit sends control signals to one arithmetic processor; a single instruction stream and a single data stream connect to memory, and results flow back.]

  15. Flynn’s Taxonomy: SIMD (cont.)
  [Block diagram: one control unit broadcasts a single control signal to processing elements PE 1 through PE n, each operating on its own data stream (Data Stream 1 through Data Stream n).]

  16. Flynn’s Taxonomy: MISD (cont.)
  [Block diagram: control units 1 through n issue instruction streams 1 through n to processing elements 1 through n, all of which operate on a single shared data stream.]

  17. MISD Architectures (cont.)
  • Serial execution of two processes with 4 stages each: time to execute T = 8t, where t is the time to execute one stage.
  • Pipelined execution of the same two processes: T = 5t.
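
As a worked generalization (not on the original slide): for n processes of k stages each, serial execution takes T = n·k·t while pipelined execution takes T = (k + n - 1)·t; with n = 2 and k = 4 this gives 8t versus 5t, matching the figures above.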

  18. Flynn’s Taxonomy: MIMD (cont.)
  [Block diagram: control units 1 through n issue independent instruction streams to processing elements 1 through n, each operating on its own data stream (Data Stream 1 through Data Stream n).]

  19. Two MIMD Structures: MPMD
  • Multiple Program Multiple Data (MPMD) structure
  • Within the MIMD classification (the one we are concerned with), each processor has its own program to execute.

  20. Two MIMD Structures: SPMD
  • Single Program Multiple Data (SPMD) structure
  • A single source program is written, and each processor executes its own copy of this program, independently and not in synchronism.
  • The source program can be constructed so that parts of the program are executed by certain computers and not others, depending on the identity of the computer.
  • Software equivalent of SIMD; can perform SIMD calculations on MIMD hardware.
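
A minimal SPMD sketch (MPI assumed; hypothetical, not from the slides) shows how one source program can still give different processors different work by branching on the processor's identity (its rank).

/* SPMD sketch: every process runs this same program (MPI assumed). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's identity */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */

    if (rank == 0)
        printf("rank 0 of %d: running the coordinator part\n", size);
    else
        printf("rank %d of %d: running the worker part\n", rank, size);

    MPI_Finalize();
    return 0;
}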

  21. Topic 1 Summary
  • Architectures
    • Vector computers
    • Shared memory multiprocessors: tightly coupled
      • Centralized/symmetrical multiprocessor (SMP): UMA
      • Distributed multiprocessor: NUMA
    • Distributed memory/message-passing multicomputers: loosely coupled
      • Asymmetrical vs. symmetrical
  • Flynn’s Taxonomy
    • SISD, SIMD, MISD, MIMD (MPMD, SPMD)

  22. Topic 2: Performance Measures and Analysis
  • A sequential algorithm can be evaluated in terms of its execution time, which can be expressed as a function of the size of its input.
  • The execution time of a parallel algorithm depends not only on the input size of the problem but also on the architecture of the parallel computer and the number of available processing elements.

  23. Speedup Factor
  • The speedup factor is a measure that captures the relative benefit of solving a computational problem in parallel.
  • The speedup factor of a parallel computation utilizing p processors is defined as the ratio
      S(p) = t_s / t_p,
    where t_s is the execution time using a single processor and t_p is the execution time using p processors.
  • In other words, S(p) is defined as the ratio of the sequential processing time to the parallel processing time.
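
A small sketch (hypothetical timings, not from the slides) of computing S(p) from measured run times:

/* Speedup sketch: S(p) = sequential time / parallel time. */
#include <stdio.h>

double speedup(double t_seq, double t_par) {
    return t_seq / t_par;
}

int main(void) {
    double t_seq = 100.0;   /* assumed sequential run time, in seconds */
    double t_par = 14.0;    /* assumed run time on 8 processors, in seconds */
    printf("S(8) = %.2f\n", speedup(t_seq, t_par));   /* about 7.14: sub-linear */
    return 0;
}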

  24. Speedup Factor (cont.)
  • The speedup factor can also be cast in terms of computational steps:
      S(p) = (number of computational steps using one processor) / (number of parallel computational steps using p processors).
  • Maximum speedup is (usually) p with p processors (linear speedup).

  25. Execution Time Components
  • Given a problem of size n on p processors, let
    • σ(n) = time spent on inherently sequential computations
    • φ(n) = time spent on potentially parallel computations
    • κ(n,p) = time spent on communication operations
  • Then the sequential time is σ(n) + φ(n), the parallel time is at least σ(n) + φ(n)/p + κ(n,p), and
      S(p) ≤ (σ(n) + φ(n)) / (σ(n) + φ(n)/p + κ(n,p)).
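
A sketch (assumed example values, not from the slides) of evaluating the bound above; the growing communication term also produces the "elbowing out" shown on the next slide.

/* Speedup bound from the three components: S <= (sigma+phi)/(sigma+phi/p+kappa). */
#include <stdio.h>

double speedup_bound(double sigma, double phi, double kappa, int p) {
    return (sigma + phi) / (sigma + phi / p + kappa);
}

int main(void) {
    double sigma = 5.0, phi = 95.0;          /* assumed sequential / parallel work */
    for (int p = 2; p <= 64; p *= 2) {
        double kappa = 0.5 * p;              /* assumed communication cost, grows with p */
        printf("p = %2d   S <= %.2f\n", p, speedup_bound(sigma, phi, kappa, p));
    }
    return 0;
}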

  26. Speedup Plot
  [Plot: speedup versus number of processors; the curve rises at first and then "elbows out" (flattens and drops) as the number of processors increases.]

  27. Efficiency
  • The efficiency of a parallel computation is defined as the ratio between the speedup factor and the number of processing elements in a parallel system:
      E(p) = S(p) / p.
  • Efficiency is a measure of the fraction of time for which a processing element is usefully employed in a computation.
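
Continuing the hypothetical numbers from the speedup sketch above (not from the slides):

/* Efficiency sketch: E(p) = S(p) / p. */
#include <stdio.h>

int main(void) {
    int p = 8;
    double s = 7.14;               /* assumed speedup on p = 8 processors */
    printf("E = %.2f\n", s / p);   /* about 0.89, i.e. each processor is ~89% usefully busy */
    return 0;
}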

  28. Analysis of Efficiency
  • Since E = S(p)/p, by what we did earlier
      E ≤ (σ(n) + φ(n)) / (p·σ(n) + φ(n) + p·κ(n,p)).
  • Since all terms are positive, E > 0.
  • Furthermore, since the denominator is larger than the numerator, E < 1.

  29. Maximum Speedup: Amdahl’s Law

  30. Amdahl’s Law (cont.)
  • As before,
      S(p) ≤ (σ(n) + φ(n)) / (σ(n) + φ(n)/p + κ(n,p)) ≤ (σ(n) + φ(n)) / (σ(n) + φ(n)/p),
    since the communication time κ(n,p) is non-negative (dropping it can only raise the bound).
  • Let f = σ(n) / (σ(n) + φ(n)) represent the inherently sequential portion of the computation; then
      S(p) ≤ 1 / (f + (1 - f)/p).
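
A sketch (assumed sequential fraction, not from the slides) of how the Amdahl bound behaves as p grows:

/* Amdahl's Law sketch: S(p) <= 1 / (f + (1 - f)/p). */
#include <stdio.h>

double amdahl(double f, int p) {
    return 1.0 / (f + (1.0 - f) / p);
}

int main(void) {
    double f = 0.05;                               /* assumed: 5% inherently sequential */
    int procs[] = {1, 2, 4, 8, 16, 1024};
    for (int i = 0; i < 6; i++)
        printf("p = %4d   S <= %.2f\n", procs[i], amdahl(f, procs[i]));
    /* As p grows the bound approaches 1/f = 20: the sequential fraction caps the speedup. */
    return 0;
}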

  31. Amdahl’s Law (cont.)
  • Limitations
    • Ignores communication time
    • Overestimates the achievable speedup
  • Amdahl Effect
    • Typically κ(n,p) has lower complexity than φ(n)/p
    • So as n increases, φ(n)/p dominates κ(n,p)
    • Thus as n increases, the speedup increases

  32. Gustafson-Barsis’ Law
  • As before,
      S(p) ≤ (σ(n) + φ(n)) / (σ(n) + φ(n)/p).
  • Let s represent the fraction of time spent in the parallel computation performing inherently sequential operations:
      s = σ(n) / (σ(n) + φ(n)/p),  so  1 - s = (φ(n)/p) / (σ(n) + φ(n)/p).

  33. Gustafson-Barsis’ Law (cont.)
  • Then
      S(p) ≤ (σ(n) + φ(n)) / (σ(n) + φ(n)/p) = s + (1 - s)·p = p + (1 - p)·s.
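
A sketch (assumed value of s, not from the slides) of the scaled-speedup formula:

/* Gustafson-Barsis sketch: scaled speedup S(p) <= p + (1 - p)*s. */
#include <stdio.h>

double scaled_speedup(double s, int p) {
    return p + (1.0 - p) * s;
}

int main(void) {
    double s = 0.05;       /* assumed: 5% of the parallel run time is sequential */
    for (int p = 2; p <= 64; p *= 2)
        printf("p = %2d   scaled speedup <= %.2f\n", p, scaled_speedup(s, p));
    return 0;
}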

  34. Gustafson-Barsis’ Law (cont.)
  • Begin with the parallel execution time instead of the sequential time
  • Estimate the sequential execution time to solve the same problem
  • Problem size is an increasing function of p
  • Predicts scaled speedup

  35. Limitations
  • Both Amdahl’s Law and Gustafson-Barsis’ Law ignore communication time
  • Both overestimate the achievable speedup or scaled speedup
  [Photos: Gene Amdahl and John L. Gustafson]

  36. Topic 2 Summary
  • Performance terms: speedup, efficiency
  • Model of speedup: serial, parallel and communication components
  • What prevents linear speedup?
    • Serial and communication operations
    • Process start-up
    • Imbalanced workloads
    • Architectural limitations
  • Analyzing parallel performance
    • Amdahl’s Law
    • Gustafson-Barsis’ Law

  37. End Credits
  • Based on original material from
    • The University of Akron: Tim O’Neil, Kathy Liszka
    • Hiram College: Irena Lomonosov
    • The University of North Carolina at Charlotte: Barry Wilkinson, Michael Allen
    • Oregon State University: Michael Quinn
  • Revision history: last updated 7/28/2011.