
Parallel Architectures & Performance Analysis




  1. Parallel Architectures & Performance Analysis
  Prepared 7/28/2011 by T. O’Neil for 3460:677, Fall 2011, The University of Akron.

  2. Parallel Computers
  • Parallel computer: a multiple-processor system supporting parallel programming.
  • Three principal types of architecture:
    • Vector computers, in particular processor arrays
    • Shared memory multiprocessors: specially designed and manufactured systems
    • Distributed memory multicomputers: message-passing systems readily formed from a cluster of workstations

  3. Type 1: Vector Computers
  • Vector computer: the instruction set includes operations on vectors as well as scalars.
  • Two ways to implement vector computers:
    • Pipelined vector processor (e.g. Cray): streams data through pipelined arithmetic units
    • Processor array: many identical, synchronized arithmetic processing elements

  4. Type 2: Shared Memory Multiprocessor Systems
  • A natural way to extend the single-processor model.
  • Multiple processors are connected to multiple memory modules such that each processor can access any memory module.
  • This is the so-called shared memory configuration.
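
  To make the shared memory model concrete, here is a minimal sketch (not part of the original slides) of a shared-memory parallel loop, assuming a C compiler with OpenMP support (compile with -fopenmp):

      #include <stdio.h>
      #include <omp.h>

      #define N 1000000

      int main(void) {
          static double a[N];     /* one array, visible to every thread */
          double sum = 0.0;

          /* All threads share the same address space, so each thread can
             read and write any element of a[]; OpenMP splits the iterations
             among them and combines the partial sums. */
          #pragma omp parallel for reduction(+:sum)
          for (int i = 0; i < N; i++) {
              a[i] = (double)i;
              sum += a[i];
          }

          printf("sum = %.0f, threads available = %d\n", sum, omp_get_max_threads());
          return 0;
      }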

  5. Ex: Quad Pentium Shared Memory Multiprocessor

  6. Fundamental Types of Shared Memory Multiprocessor
  • Type 2: Distributed Multiprocessor
    • Distribute primary memory among processors
    • Increase aggregate memory bandwidth and lower average memory access time
    • Allow greater number of processors
    • Also called non-uniform memory access (NUMA) multiprocessor

  7. Distributed Multiprocessor

  8. Type 3: Message-Passing Multicomputers
  • Complete computers connected through an interconnection network

  9. Multicomputers
  • Distributed memory multiple-CPU computer
  • Same address on different processors refers to different physical memory locations
  • Processors interact through message passing
  • Commercial multicomputers
  • Commodity clusters
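
  As an illustration of message passing between separate address spaces, here is a minimal sketch (not part of the original slides) assuming an MPI implementation such as MPICH or Open MPI (compile with mpicc, run with mpirun -np 2):

      #include <stdio.h>
      #include <mpi.h>

      int main(int argc, char *argv[]) {
          int rank, value;

          MPI_Init(&argc, &argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);

          /* Each process has its own private memory: the variable 'value'
             on rank 0 and the variable 'value' on rank 1 are different
             physical storage, even though the source names are identical. */
          if (rank == 0) {
              value = 42;
              /* Data moves between processes only via explicit messages. */
              MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
          } else if (rank == 1) {
              MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
              printf("rank 1 received %d\n", value);
          }

          MPI_Finalize();
          return 0;
      }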

  10. Asymmetrical Multicomputer

  11. Symmetrical Multicomputer

  12. ParPar Cluster: A Mixed Model

  13. Alternate System: Flynn’s Taxonomy
  • Michael Flynn (1966) created a classification for computer architectures based upon a variety of characteristics, specifically instruction streams and data streams.
  • Also important are the number of processors, the number of programs that can be executed, and the memory structure.

  14. Flynn’s Taxonomy: SISD (cont.)
  [Block diagram: a control unit sends control signals to an arithmetic processor; a single instruction stream and a single data stream connect them to memory, which receives the results.]

  15. Flynn’s Taxonomy: SIMD (cont.)
  [Block diagram: one control unit broadcasts a control signal to processing elements PE 1 through PE n, each of which operates on its own data stream (Data Stream 1 through Data Stream n).]

  16. Flynn’s Taxonomy: MISD (cont.)
  [Block diagram: control units 1 through n, each with its own instruction stream, drive processing elements 1 through n, all operating on a single data stream.]

  17. MISD Architectures (cont.)
  • Serial execution of two processes with 4 stages each: time to execute T = 8t, where t is the time to execute one stage.
  • Pipelined execution of the same two processes: T = 5t. (In general, pipelining n processes through k stages takes (k + n - 1)t rather than n·k·t; here that is (4 + 2 - 1)t = 5t.)

  18. Flynn’s Taxonomy: MIMD (cont.)
  [Block diagram: control units 1 through n, each with its own instruction stream, drive processing elements 1 through n, each operating on its own data stream.]

  19. Two MIMD Structures: MPMD
  • Multiple Program Multiple Data (MPMD) structure
  • Within the MIMD classification, with which we are concerned, each processor has its own program to execute.

  20. Two MIMD Structures: SPMD
  • Single Program Multiple Data (SPMD) structure
  • A single source program is written, and each processor executes its own copy of this program, although independently and not in synchronism.
  • The source program can be constructed so that parts of the program are executed by certain computers and not others, depending upon the identity of the computer.
  • Software equivalent of SIMD; can perform SIMD calculations on MIMD hardware.
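
  A minimal SPMD sketch (not part of the original slides), again assuming MPI: every process runs the same source program, and branching on the process rank (its identity) makes different processors execute different parts of it:

      #include <stdio.h>
      #include <mpi.h>

      int main(int argc, char *argv[]) {
          int rank, size;

          MPI_Init(&argc, &argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* identity of this process  */
          MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */

          /* One program, many data: every process executes this same code,
             but the branch below sends different processes down different paths. */
          if (rank == 0) {
              printf("rank 0 of %d: acting as coordinator\n", size);
          } else {
              printf("rank %d of %d: acting as worker\n", rank, size);
          }

          MPI_Finalize();
          return 0;
      }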

  21. Topic 1 Summary
  • Architectures
    • Vector computers
    • Shared memory multiprocessors: tightly coupled
      • Centralized/symmetrical multiprocessor (SMP): UMA
      • Distributed multiprocessor: NUMA
    • Distributed memory/message-passing multicomputers: loosely coupled
      • Asymmetrical vs. symmetrical
  • Flynn’s Taxonomy
    • SISD, SIMD, MISD, MIMD (MPMD, SPMD)

  22. Topic 2: Performance Measures and Analysis
  • A sequential algorithm can be evaluated in terms of its execution time, which can be expressed as a function of the size of its input.
  • The execution time of a parallel algorithm depends not only on the input size of the problem but also on the architecture of the parallel computer and the number of available processing elements.

  23. Speedup Factor
  • The speedup factor is a measure that captures the relative benefit of solving a computational problem in parallel.
  • The speedup factor of a parallel computation utilizing p processors is defined as the ratio
      S(p) = ts / tp
    where ts is the execution time using a single processor and tp is the execution time using p processors.
  • In other words, S(p) is defined as the ratio of the sequential processing time to the parallel processing time.

  24. Speedup Factor (cont.)
  • Speedup factor can also be cast in terms of computational steps:
      S(p) = (number of computational steps using one processor) / (number of parallel computational steps using p processors)
  • Maximum speedup is (usually) p with p processors (linear speedup).

  25. Execution Time Components
  • Given a problem of size n on p processors, let
    • σ(n) = time spent on inherently sequential computations
    • φ(n) = time spent on potentially parallel computations
    • κ(n,p) = time spent on communication operations
  • Then the parallel execution time is σ(n) + φ(n)/p + κ(n,p), and the speedup satisfies
      S(p) ≤ (σ(n) + φ(n)) / (σ(n) + φ(n)/p + κ(n,p))
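
  A small sketch (not part of the original slides) of this execution-time model as code; the function name and the sample component times are hypothetical, chosen only to show how the predicted speedup behaves as p grows:

      #include <stdio.h>

      /* Predicted speedup for p processors from the three model components:
         sigma = inherently sequential time, phi = potentially parallel time,
         kappa = communication time. */
      double predicted_speedup(double sigma, double phi, double kappa, int p) {
          double serial_time   = sigma + phi;
          double parallel_time = sigma + phi / p + kappa;
          return serial_time / parallel_time;
      }

      int main(void) {
          double sigma = 2.0, phi = 98.0;           /* hypothetical times (s) */
          for (int p = 1; p <= 64; p *= 2) {
              double kappa = 0.1 * p;               /* assumed to grow with p */
              printf("p = %2d  speedup <= %.2f\n",
                     p, predicted_speedup(sigma, phi, kappa, p));
          }
          return 0;
      }

  With these illustrative numbers the predicted speedup rises at first and then tails off, which is the "elbowing out" shape shown on the next slide.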

  26. Speedup Plot
  [Plot of speedup versus number of processors, showing the curve “elbowing out” (flattening) as processors are added.]

  27. Efficiency
  • The efficiency of a parallel computation is defined as the ratio between the speedup factor and the number of processing elements in a parallel system:
      E(p) = S(p) / p
  • Efficiency is a measure of the fraction of time for which a processing element is usefully employed in a computation.

  28. Analysis of Efficiency
  • Since E = S(p)/p, by what we did earlier
      E ≤ (σ(n) + φ(n)) / (p·σ(n) + φ(n) + p·κ(n,p))
  • Since all terms are positive, E > 0.
  • Furthermore, since the denominator is larger than the numerator, E < 1.

  29. Maximum Speedup: Amdahl’s Law

  30. Amdahl’s Law (cont.)
  • As before,
      S(p) ≤ (σ(n) + φ(n)) / (σ(n) + φ(n)/p + κ(n,p)) ≤ (σ(n) + φ(n)) / (σ(n) + φ(n)/p)
    since the communication time must be non-trivial.
  • Let f = σ(n) / (σ(n) + φ(n)) represent the inherently sequential portion of the computation; then
      S(p) ≤ 1 / (f + (1 - f)/p)
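
  A small sketch (not part of the original slides) evaluating Amdahl’s bound; the function name and the sample values of f are illustrative only:

      #include <stdio.h>

      /* Amdahl's Law: upper bound on speedup with p processors when a
         fraction f of the computation is inherently sequential. */
      double amdahl_bound(double f, int p) {
          return 1.0 / (f + (1.0 - f) / p);
      }

      int main(void) {
          double fractions[] = {0.05, 0.10, 0.25};   /* illustrative f values */
          for (int i = 0; i < 3; i++) {
              double f = fractions[i];
              /* Even with unlimited processors the speedup cannot exceed 1/f. */
              printf("f = %.2f: S(8) <= %.2f, S(128) <= %.2f, limit = %.1f\n",
                     f, amdahl_bound(f, 8), amdahl_bound(f, 128), 1.0 / f);
          }
          return 0;
      }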

  31. Amdahl’s Law (cont.)
  • Limitations
    • Ignores communication time
    • Overestimates the speedup achievable
  • Amdahl Effect
    • Typically κ(n,p) has lower complexity than φ(n)/p
    • So as n increases, φ(n)/p dominates κ(n,p)
    • Thus as n increases, speedup increases

  32. Gustafson-Barsis’ Law
  • As before (dropping the communication term),
      S(p) ≤ (σ(n) + φ(n)) / (σ(n) + φ(n)/p)
  • Let s represent the fraction of time spent in the parallel computation performing inherently sequential operations, i.e. s = σ(n) / (σ(n) + φ(n)/p); then
      σ(n) = s·(σ(n) + φ(n)/p)  and  φ(n) = (1 - s)·p·(σ(n) + φ(n)/p)

  33. Gustafson-Barsis’ Law (cont.)
  • Then
      S(p) ≤ [s·(σ(n) + φ(n)/p) + (1 - s)·p·(σ(n) + φ(n)/p)] / (σ(n) + φ(n)/p)
           = s + (1 - s)·p
           = p + (1 - p)·s

  34. Gustafson-Barsis’ Law (cont.)
  • Begin with parallel execution time instead of sequential time
  • Estimate the sequential execution time to solve the same problem
  • Problem size is an increasing function of p
  • Predicts scaled speedup
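
  A companion sketch (not part of the original slides) for the Gustafson-Barsis bound; the function name and the sample value of s are illustrative:

      #include <stdio.h>

      /* Gustafson-Barsis' Law: scaled speedup with p processors when a
         fraction s of the parallel execution time is inherently sequential. */
      double scaled_speedup(double s, int p) {
          return p + (1 - p) * s;        /* equivalently: s + (1 - s) * p */
      }

      int main(void) {
          double s = 0.05;   /* illustrative sequential fraction of parallel time */
          for (int p = 1; p <= 64; p *= 2)
              printf("p = %2d  scaled speedup <= %.2f\n", p, scaled_speedup(s, p));
          return 0;
      }

  Unlike Amdahl’s bound, this estimate keeps growing with p because the problem size is assumed to grow with the number of processors.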

  35. Limitations
  • Both Amdahl’s Law and Gustafson-Barsis’ Law ignore communication time
  • Both overestimate the speedup or scaled speedup achievable
  [Photos: Gene Amdahl, John L. Gustafson]

  36. Topic 2 Summary
  • Performance terms: speedup, efficiency
  • Model of speedup: serial, parallel and communication components
  • What prevents linear speedup?
    • Serial and communication operations
    • Process start-up
    • Imbalanced workloads
    • Architectural limitations
  • Analyzing parallel performance
    • Amdahl’s Law
    • Gustafson-Barsis’ Law

  37. End Credits
  • Based on original material from
    • The University of Akron: Tim O’Neil, Kathy Liszka
    • Hiram College: Irena Lomonosov
    • The University of North Carolina at Charlotte: Barry Wilkinson, Michael Allen
    • Oregon State University: Michael Quinn
  • Revision history: last updated 7/28/2011.
