Parallel Computer Architectures – 2nd week

1. PARALLEL COMPUTER ARCHITECTURES – 2ND WEEK
• References
• Flynn’s Taxonomy
• Classification of Parallel Computers Based on Architectures
• Trend in Parallel Computer Architectures
Khoa Công Nghệ Thông Tin – Đại Học Bách Khoa Tp.HCM (Faculty of Information Technology, HCMC University of Technology)

2. REFERENCES
• Scalable Parallel Computing — Kai Hwang and Zhiwei Xu, McGraw-Hill, Chapter 1
• Parallel Computing: Theory and Practice — Michael J. Quinn, McGraw-Hill, Chapter 3
• Parallel Processing Course — Yang-Suk Kee (yskee@iris.snu.ac.kr), School of EECS, Seoul National University

3. FLYNN’S TAXONOMY
• A way to classify parallel computers (Flynn, 1972)
• Based on the notions of instruction streams and data streams
• SISD (Single Instruction stream over a Single Data stream)
• SIMD (Single Instruction stream over Multiple Data streams)
• MISD (Multiple Instruction streams over a Single Data stream)
• MIMD (Multiple Instruction streams over Multiple Data streams)
• Popularity: MIMD > SIMD > MISD

4. SISD (Single Instruction Stream over a Single Data Stream)
• Conventional sequential machines
• IS: Instruction Stream, DS: Data Stream
• CU: Control Unit, PU: Processing Unit, MU: Memory Unit
[Figure: SISD architecture — the CU issues an IS to the PU, which exchanges a DS with the MU and I/O]

5. SIMD (Single Instruction Stream over Multiple Data Streams)
• Vector computers
• Special-purpose computations
• PE: Processing Element, LM: Local Memory
[Figure: SIMD architecture with distributed memory — one CU broadcasts the IS to PE1…PEn, each with its own LM; the program is loaded from the host, and the data sets are loaded from the host]

6. MISD (Multiple Instruction Streams over a Single Data Stream)
• Processor arrays, systolic arrays
• Special-purpose computations
• Is it different from a pipelined computer?
[Figure: MISD architecture (the systolic array) — CU1…CUn each issue their own IS to PU1…PUn, which operate in turn on a single DS from memory (program, data) and I/O]

7. MIMD (Multiple Instruction Streams over Multiple Data Streams)
• General-purpose parallel computers
[Figure: MIMD architecture with shared memory — CU1…CUn each issue an IS to their own PU1…PUn, which access shared memory and I/O over independent DSs]

8. CLASSIFICATION BASED ON ARCHITECTURES
• Pipelined Computers
• Dataflow Architectures
• Data Parallel Systems
• Multiprocessors
• Multicomputers

9. PIPELINED COMPUTERS
• Instructions are divided into a number of steps (segments, stages)
• At the same time, several instructions can be loaded in the machine and be executed in different steps

                   Cycle:  1   2   3   4   5   6   7   8
  Instruction i            IF  ID  EX  WB
  Instruction i+1              IF  ID  EX  WB
  Instruction i+2                  IF  ID  EX  WB
  Instruction i+3                      IF  ID  EX  WB
  Instruction i+4                          IF  ID  EX  WB
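The timing above can be checked with a minimal sketch (assumptions: one cycle per stage, no stalls or hazards; `pipeline_cycles` is an illustrative name, not from the slides): n instructions on a k-stage pipeline finish in k + (n − 1) cycles.

```python
def pipeline_cycles(n_instructions, n_stages):
    """Total cycles to run n instructions through a k-stage pipeline,
    assuming one cycle per stage and no stalls or hazards."""
    if n_instructions == 0:
        return 0
    # The first instruction takes n_stages cycles; each later one
    # completes exactly one cycle after its predecessor.
    return n_stages + (n_instructions - 1)

# The slide's table: 5 instructions, 4 stages (IF, ID, EX, WB)
print(pipeline_cycles(5, 4))  # 8
```

Without pipelining the same 5 instructions would need 5 × 4 = 20 cycles, which is the speedup the overlapped execution buys.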

10. DATAFLOW ARCHITECTURES
• Data-driven model
• A program is represented as a directed acyclic graph in which a node represents an instruction and an edge represents the data-dependency relationship between the connected nodes
• Firing rule: a node can be scheduled for execution if and only if its input data become valid for consumption
• Dataflow languages: Id, SISAL, Silage, ...
• Single assignment, applicative (functional) languages, no side effects
• Explicit parallelism

11. DATAFLOW GRAPH
[Figure: the dataflow representation of the arithmetic expression z = (a + b) * c — the + node consumes tokens a and b, and its output token feeds the * node together with c to produce z]
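The firing rule on this graph can be sketched in a few lines of Python (an illustrative software simulation — `run_dataflow` and the node table are made-up names, not any real dataflow machine's interface): a node fires only when all of its input tokens are present.

```python
import operator

def run_dataflow(graph, tokens):
    """graph: node name -> (operation, input names, output name).
    Repeatedly fire any unfired node whose inputs are all available,
    until no node can fire -- the dataflow firing rule."""
    fired = set()
    progress = True
    while progress:
        progress = False
        for node, (op, ins, out) in graph.items():
            if node not in fired and all(i in tokens for i in ins):
                tokens[out] = op(*(tokens[i] for i in ins))
                fired.add(node)
                progress = True
    return tokens

# The slide's expression z = (a + b) * c as a two-node graph
graph = {
    "add": (operator.add, ("a", "b"), "t"),  # t = a + b
    "mul": (operator.mul, ("t", "c"), "z"),  # z = t * c
}
result = run_dataflow(graph, {"a": 2, "b": 3, "c": 4})
print(result["z"])  # (2 + 3) * 4 = 20
```

Note that no program counter appears anywhere: the `mul` node simply cannot fire until the `add` node has produced token `t`, which is exactly the data-driven ordering the slide describes.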

12. DATAFLOW COMPUTER
• Execution of instructions is driven by data availability
• Basic components:
• Data are directly held inside instructions
• Data availability check unit
• Token matching unit
• Chain reaction of asynchronous instruction executions
• What is the difference between this and normal (control-flow) computers?

13. DATAFLOW COMPUTERS
• Advantages
• Very high potential for parallelism
• High throughput
• Free from side effects
• Disadvantages
• Time lost waiting for unneeded arguments
• High control overhead
• Difficulty in manipulating data structures

14. DATAFLOW MACHINE (Manchester Dataflow Computer)
• First actual hardware implementation
• Token format: <data, tag, dest, marker>; tokens are matched on <tag, dest>
• Components: Token Queue, Matching Unit, Overflow Unit, Instruction Store (func1 … funck), I/O Switch, Network
[Figure: tokens arrive from the host through the I/O Switch into the Token Queue, are paired in the Matching Unit (spilling to the Overflow Unit), fetch their instruction from the Instruction Store, and results circulate through the Network and back to the host]

15. DATAFLOW REPRESENTATION

  input d, e, f
  c0 := 0
  for i from 1 to 4 do
  begin
    ai := di / ei
    bi := ai * fi
    ci := bi + ci-1
  end
  output a, b, c

[Figure: the corresponding dataflow graph — four independent / nodes (di / ei) feed four * nodes (ai * fi), whose outputs b1…b4 feed a serial chain of + nodes accumulating c1…c4 from c0]

16. EXECUTION ON CONTROL FLOW MACHINES
• Assume all the external inputs are available before entering the do loop
• Costs: + takes 1 cycle, * takes 2 cycles, / takes 3 cycles
• Sequential execution on a uniprocessor takes 24 cycles (a1 b1 c1, a2 b2 c2, …, a4 b4 c4)
• How long will it take to execute this program on a dataflow computer with 4 processors?

17. EXECUTION ON A DATAFLOW MACHINE
• Data-driven execution on a 4-processor dataflow computer takes 9 cycles: all ai and bi are computed in parallel, while the chain c1 → c2 → c3 → c4 remains serial
• Can we further reduce the execution time of this program?
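Both cycle counts from the two preceding slides can be reproduced with a short sketch (assuming, as the slides do, / = 3 cycles, * = 2, + = 1, and one processor per iteration on the dataflow machine):

```python
# Costs and iteration count from the slides
DIV, MUL, ADD = 3, 2, 1
N = 4

# Control-flow uniprocessor: every operation runs back to back.
sequential = N * (DIV + MUL + ADD)

# Dataflow machine, one processor per iteration: each processor can
# compute its own a_i then b_i independently, so b_i is ready at cycle
# DIV + MUL for every i; only the c-chain is serialized on c_{i-1}.
finish_c = 0  # time at which c_{i-1} becomes available (c0 is free)
for i in range(N):
    b_ready = DIV + MUL
    finish_c = max(b_ready, finish_c) + ADD

print(sequential, finish_c)  # 24 9
```

The serial c-chain is the critical path, which is why adding more than 4 processors would not help this particular loop.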

18. PROBLEMS OF DATAFLOW COMPUTERS
• Excessive copying of large data structures in dataflow operations
• I-structure: a tagged memory unit for overlapped usage by the producer and consumer
• Retreat from the pure dataflow approach (shared memory)
• Handling complex data structures
• Chain-reaction control is difficult to implement
• Complexity of the matching store and memory units
• Exposes too much parallelism (?)

19. DATA PARALLEL SYSTEMS
• Programming model
• Operations performed in parallel on each element of a data structure
• Logically single thread of control, performing sequential or parallel steps
• Conceptually, a processor is associated with each data element

20. DATA PARALLEL SYSTEMS (cont’d)
• SIMD architectural model
• Array of many simple, cheap processors, each with little memory
• Processors don’t sequence through instructions
• Attached to a control processor that issues instructions
• Specialized and general communication, cheap global synchronization

21. VECTOR PROCESSOR
• Instruction set includes operations on vectors as well as scalars
• Two types of vector computers:
• Processor arrays
• Pipelined vector processors
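The scalar/vector distinction can be illustrated with a small Python emulation (purely a software sketch — `vadd` stands in for what a vector machine would do with a single hardware instruction, and both names are made up for this example):

```python
def vadd(x, y):
    """Emulates one 'vector add' instruction: the whole operand
    vectors are named once and combined element-wise."""
    assert len(x) == len(y), "vector operands must have equal length"
    return [xi + yi for xi, yi in zip(x, y)]

def scalar_add(x, y):
    """Scalar-ISA equivalent: an explicit loop issuing one add per
    element, with per-element loop and addressing overhead."""
    out = []
    for i in range(len(x)):
        out.append(x[i] + y[i])
    return out

print(vadd([1, 2, 3, 4], [10, 20, 30, 40]))  # [11, 22, 33, 44]
```

On real hardware the difference is that the vector version eliminates per-element instruction fetch and decode: a processor array applies the add to all elements at once, while a pipelined vector processor streams the elements through one pipelined adder.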

22. PROCESSOR ARRAY
• A sequential computer connected with a set of identical processing elements simultaneously doing the same operation on different data, e.g., the CM-200
[Figure: a front-end computer (CPU, I/O processor, program and data memory) drives, over instruction and data paths, an array of processing elements, each with its own data memory, linked by an interconnection network with I/O]

23. PIPELINED VECTOR PROCESSOR
• Streams vectors from memory to the CPU
• Uses pipelined arithmetic units to manipulate the data
• E.g., Cray-1, Cyber-205

24. MULTIPROCESSOR
• Consists of many fully programmable processors, each capable of executing its own program
• Shared address space architecture
• Classified into two types:
• Uniform Memory Access (UMA) multiprocessors
• Non-Uniform Memory Access (NUMA) multiprocessors

25. SHARED ADDRESS SPACE ARCHITECTURE
[Figure: virtual address spaces for a collection of processes communicating via shared addresses — the private portions (P0…Pn private) map to distinct physical addresses, while the shared portion of each process's address space maps, via load and store, to common physical addresses in the machine's physical address space]
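The model above can be sketched with threads (a minimal illustration, not tied to any particular machine — `shared`, `worker`, and the counts are made up for the example): all threads load and store the same location in the shared portion, while each keeps its own private state.

```python
import threading

shared = {"counter": 0}      # the "shared portion" of the address space
lock = threading.Lock()      # stores to shared data must be coordinated

def worker(n):
    local = 0                # private per-thread state ("Pi private")
    for _ in range(n):
        local += 1
    with lock:               # store into the common addresses
        shared["counter"] += local

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared["counter"])  # 4000
```

Communication here is implicit — a thread sees another's result simply by loading the shared location — which is exactly what distinguishes this model from the message-passing multicomputers later in the deck.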

26. UMA MULTIPROCESSOR
• Uses a central switching mechanism to reach a centralized shared memory
• All processors have equal access time to global memory
• Tightly coupled system
• Problem: cache consistency
[Figure: processors P1…Pn, each with a cache C1…Cn, reach the memory banks and I/O through the switching mechanism]

27. UMA MULTIPROCESSOR (cont’d)
• Crossbar switching mechanism
[Figure: processors with caches and I/O connected to memory modules through a crossbar switch]

28. UMA MULTIPROCESSOR (cont’d)
• Shared-bus switching mechanism
[Figure: processors with caches and I/O connected to memory modules over a single shared bus]

29. UMA MULTIPROCESSOR (cont’d)
• Packet-switched network
[Figure: processors with caches connected to memory modules through a packet-switched network]

30. NUMA MULTIPROCESSOR
• Distributed shared memory formed by combining the local memories of all processors
• Memory access time depends on whether the location is local to the processor
• Caching shared (particularly nonlocal) data?
[Figure: nodes, each a processor with cache and local memory, joined by a network into one distributed memory]

31. CURRENT TYPES OF MULTIPROCESSORS
• PVP (Parallel Vector Processor): a small number of proprietary vector processors connected by a high-bandwidth crossbar switch
• SMP (Symmetric Multiprocessor): a small number of COTS microprocessors connected by a high-speed bus or crossbar switch
• DSM (Distributed Shared Memory): similar to SMP, but the memory is physically distributed among nodes

32. PVP (Parallel Vector Processor)
• VP: Vector Processor, SM: Shared Memory
[Figure: vector processors (VP) connected to shared-memory modules (SM) through a crossbar switch]

33. SMP (Symmetric Multi-Processor)
• P/C: Microprocessor and Cache
[Figure: P/C nodes connected to shared-memory modules (SM) by a bus or crossbar switch]

34. DSM (Distributed Shared Memory)
• DIR: Cache Directory
[Figure: nodes, each with a microprocessor and cache (P/C) on a memory bus (MB), local memory (LM), a cache directory (DIR), and a network interface (NIC), joined by a custom-designed network]

35. MULTICOMPUTERS
• Consists of many processors, each with its own memory
• No shared memory
• Processors interact via message passing → loosely coupled system
[Figure: processor–memory (P–M) nodes connected by a message-passing interconnection network]
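The message-passing model can be sketched with OS processes, which share no memory by default (a minimal illustration; `node` and the pipe setup are made up for this example and stand in for a real interconnection network):

```python
from multiprocessing import Process, Pipe

def node(conn):
    """One multicomputer node: it owns its memory and interacts with
    the outside world only through explicit messages."""
    data = conn.recv()        # block until a message arrives
    conn.send(sum(data))      # reply with a result message
    conn.close()

if __name__ == "__main__":
    parent, child = Pipe()    # stands in for the interconnection network
    p = Process(target=node, args=(child,))
    p.start()
    parent.send([1, 2, 3, 4])  # explicit send, not a shared-memory store
    print(parent.recv())       # 10
    p.join()
```

Unlike the shared-address-space example earlier, no location is visible to both sides: all communication and synchronization happen through the send/receive pair, which is the defining property of the loosely coupled multicomputer.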

36. CURRENT TYPES OF MULTICOMPUTERS
• MPP (Massively Parallel Processing): total number of processors > 1000
• Cluster: each node in the system has fewer than 16 processors
• Constellation: each node in the system has more than 16 processors

37. MPP (Massively Parallel Processing)
• MB: Memory Bus, NIC: Network Interface Circuitry
[Figure: nodes, each with a microprocessor and cache (P/C) on a memory bus (MB), local memory (LM), and a NIC, joined by a custom-designed network]

38. CLUSTER
• LD: Local Disk, IOB: I/O Bus
[Figure: nodes, each with a P/C, memory (M), a bridge to an I/O bus (IOB) with local disk (LD), and a NIC, joined by a commodity network (Ethernet, ATM, Myrinet, VIA)]

39. CONSTELLATION
• IOC: I/O Controller
[Figure: nodes of ≥ 16 processors (P/C) each, with shared memory (SM) and an I/O controller (IOC) with local disk (LD), connected through a hub and NIC to a custom or commodity network]

40. TREND IN PARALLEL COMPUTER ARCHITECTURES
[Figure: number of HPC systems (0–400) by architecture class — MPPs, Constellations, Clusters, SMPs — over the years 1997–2002]

41. TOWARD ARCHITECTURAL CONVERGENCE
• The evolution and role of software have blurred the boundary:
• Send/receive is supported on shared-address-space (SAS) machines via buffers
• A global address space can be constructed on message-passing (MP) machines
• Hardware organization is converging too:
• Tighter network-interface integration for MP (lower latency, higher bandwidth)
• At the lower level, hardware SAS passes hardware messages
• Nodes connected by a general network and communication assists
• Clusters of workstations/SMPs
• Emergence of fast SANs (System Area Networks)

42. CONVERGED ARCHITECTURE OF CURRENT SUPERCOMPUTERS
[Figure: several multiprocessor nodes, each with four processors sharing one memory, connected by an interconnection network]