CDA-5155 Computer Architecture Principles Fall 2000 PowerPoint Presentation

  1. CDA-5155 Computer Architecture Principles, Fall 2000: Multiprocessor Architectures

  2. Review
    • Protocols: reliable and heterogeneous networking
    • Interconnect technologies/topologies
      • Length, latency, diameter, blocking, deadlock, bisection BW, overheads, routing, congestion, connectionless?
    • CPU interface to memory hierarchy vs. network (SPEC)
    • Standardization is key for LANs and WANs
    • Internetworking protocols are used as LAN protocols
    • ICs are revolutionizing networks and processors
    • A switch is a specialized computer
    • Amdahl's law: high-BW networks with high overheads (the overhead limits the benefit of more bandwidth)
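The last bullet can be made concrete with a simple model, assumed here: each message pays a fixed software overhead plus its transmission time at the link bandwidth, with time of flight and switch delay ignored. The function name `effective_bandwidth` and all numbers are hypothetical.

```python
def effective_bandwidth(message_bytes, overhead_s, link_bw_bytes_per_s):
    """Delivered bandwidth when every message pays a fixed software overhead.

    Simplified model (an assumption): total time = overhead + transmission time;
    time of flight and switch delays are ignored.
    """
    transmission_s = message_bytes / link_bw_bytes_per_s
    return message_bytes / (overhead_s + transmission_s)

# Hypothetical numbers: 1000-byte messages, 100 us overhead per message.
slow = effective_bandwidth(1000, 100e-6, 10e6)    # 10 MB/s link
fast = effective_bandwidth(1000, 100e-6, 1000e6)  # link 100x faster
print(f"{slow / 1e6:.2f} MB/s vs {fast / 1e6:.2f} MB/s")
```

With these numbers the 100x faster link raises delivered bandwidth only from 5 MB/s to about 9.9 MB/s: the overhead is the part that did not get faster, so it dominates, exactly as Amdahl's law predicts.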

  3. Overview
    • High performance computing
    • Parallelism
    • Taxonomy of multiprocessors
    • Programming models
    • Performance
    • ASCI – Accelerated Strategic Computing Initiative

  4. High Performance Computing
    • Hardware and software
    • El Dorado: attack of the killer micros
    • Microprocessor: the most cost-effective processor
    • Dynamic supercomputer market
    • Timesharing workloads
    • Multiprocessor vs. high-performance uniprocessor
    • Performance and application domains
      • Throughput (multiprocessing workloads)
        • Timesharing, file, database, and web servers
      • Response time (parallel applications)
        • Single complex problem
        • Computation/communication = f(#processors, data size)
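To make the last bullet concrete, consider a hypothetical workload: an iterative solver on an n x n grid, split into p square blocks (p assumed to be a perfect square). Each processor updates its n²/p points per iteration and exchanges its four block edges, about 4n/√p points. The function `comp_to_comm_ratio` is an illustrative sketch under these assumptions, not a model of any specific application.

```python
import math

def comp_to_comm_ratio(n, p):
    """Computation-to-communication ratio for an n x n grid on p processors.

    Assumptions: square blocks (p a perfect square); computation ~ points per
    block, communication ~ points on the four block edges.
    """
    side = n / math.sqrt(p)        # points along one edge of a block
    computation = side * side      # points updated per processor per iteration
    communication = 4 * side       # boundary points exchanged per processor
    return computation / communication   # simplifies to n / (4 * sqrt(p))

for p in (4, 16, 64):
    print(p, comp_to_comm_ratio(1024, p))
```

The ratio grows linearly with the data size n and shrinks with √p, so adding processors to a fixed problem makes communication relatively more expensive, while growing the problem with the machine keeps the ratio in check.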

  5. Parallelism
    • Two or more things happening at the same time
    • Granularity: the size of the computations performed concurrently between synchronizations
      • Carry-lookahead adder
      • Pipelined processor
      • Two-way superscalar processor
      • Multiprocessor
      • COW (cluster of workstations)
    • Levels of parallelism
      • Bit level
      • Instruction level
      • Thread level
    • Challenges (Amdahl's law)
      • Limited amount of parallelism in programs
      • High cost of communication
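The first challenge can be quantified with Amdahl's law: if a fraction f of the work runs on N processors and the rest is sequential, speedup = 1 / ((1 - f) + f/N). The sketch below evaluates the law and inverts it to find the parallel fraction a target speedup demands; the function names are hypothetical.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Amdahl's law: speedup when only parallel_fraction of the work uses all processors."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / processors)

def required_parallel_fraction(target_speedup, processors):
    """Invert Amdahl's law: 1/S = (1 - f) + f/N  =>  f = (1 - 1/S) / (1 - 1/N)."""
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / processors)

f = required_parallel_fraction(80, 100)
print(f"parallel fraction needed: {f:.4f}")
print(f"serial fraction allowed:  {1 - f:.4%}")
print(f"check: speedup = {amdahl_speedup(f, 100):.1f}")
```

Reaching a speedup of 80 on 100 processors requires that at most about 0.25% of the original computation be sequential, which shows how little serial work a large machine can tolerate.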

  6. Parallel Computers
    • Parallel computer: a collection of processing elements that cooperate and communicate to solve large problems fast
    • Questions about parallel computers:
      • How large a collection?
      • How powerful are the processing elements?
      • How do they cooperate and communicate?
      • How are data transmitted?
      • What type of interconnection?
      • What are the HW and SW primitives for the programmer?
      • Does it translate into performance?

  7. Taxonomy of Parallel Computers: Flynn's classification by instruction (I) and data (D) streams
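Flynn's four classes follow mechanically from whether the instruction stream and the data stream are single (S) or multiple (M). The hypothetical function `flynn_class` below encodes that rule; describing a machine by two integer stream counts is a simplification made for illustration.

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Return SISD, SIMD, MISD, or MIMD from the number of I and D streams."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("a machine has at least one instruction and one data stream")
    i = "S" if instruction_streams == 1 else "M"   # single or multiple instruction streams
    d = "S" if data_streams == 1 else "M"          # single or multiple data streams
    return f"{i}I{d}D"

print(flynn_class(1, 1))    # conventional uniprocessor
print(flynn_class(1, 64))   # one instruction stream driving many data elements
print(flynn_class(8, 8))    # multiprocessor
```

A conventional uniprocessor is SISD, an array processor like those on the SIMD slides is SIMD, and a multiprocessor is MIMD; MISD has few, if any, commercial examples.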

  8. Shared Memory Model
    • Each processor can name every physical location in the machine via loads and stores
    • Data size: byte, word, ..., or cache block
    • Process: a virtual address space with one or more threads of control
    • Multiple processes can overlap (share part of their address spaces), but ALL threads of a process share its address space
    • Writes to the shared address space by one thread are visible to reads by other threads
    • Usual model: shared code, private stack, some shared heap, some private heap
    • Performance: latency, BW, and scalability when threads communicate?
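A minimal sketch of the shared memory model, using Python threads as stand-ins for processors. This is an assumption for illustration only: CPython's global interpreter lock prevents real parallel speedup, so the example shows the programming model, not performance. All threads load and store the same `counter` in one address space, and a lock makes each read-modify-write atomic.

```python
import threading

counter = 0                   # lives in the address space shared by all threads
lock = threading.Lock()       # guards the read-modify-write on counter

def worker(increments):
    global counter
    for _ in range(increments):
        with lock:            # without the lock, concurrent updates could be lost
            counter += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every worker's writes are visible to the main thread through ordinary reads.
print(counter)
```

No explicit communication call appears anywhere: sharing happens implicitly through loads and stores, and the programmer's burden shifts to synchronization (here, the lock).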

  9. Message Passing Model
    • Nodes: whole computers (CPU, RAM, I/O)
    • Communication: explicit I/O operations
      • Send (local buffer, remote process)
      • Recv (local buffer, remote process)
    • Synchronization
      • When the send completes
      • When the buffer is free
      • When the request is accepted
      • Necessary even for 1 processor
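The message passing model can be imitated inside one Python program, assuming each "node" is a thread that shares no program data with the others and interacts only through its inbox queue, which stands in for the network. The helpers `send`, `recv`, and `node1` are hypothetical names chosen to mirror the slide's Send/Recv primitives; they are not the API of any real message-passing library.

```python
import queue
import threading

# Hypothetical two-node setup: one inbox per node plays the role of the network.
inboxes = {0: queue.Queue(), 1: queue.Queue()}

def send(local_buffer, dest):
    # Copy the buffer into the message, so the sender may reuse it once send returns.
    inboxes[dest].put(list(local_buffer))

def recv(node):
    # Blocks until a message arrives, so a receive also synchronizes the two nodes.
    return inboxes[node].get()

def node1():
    data = recv(1)            # wait for work from node 0
    send([sum(data)], 0)      # reply with the partial result

worker = threading.Thread(target=node1)
worker.start()

buffer = [1, 2, 3, 4]
send(buffer, 1)               # node 0 ships its data to node 1
result = recv(0)              # and blocks until the reply arrives
worker.join()
print(result)
```

This `send` is buffered: it returns once the data are copied, so the local buffer is free immediately. A synchronous send would instead wait until the receiver accepts the request, which is another of the synchronization points listed on the slide.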

  10. Shared Memory [Figure: machine1 and machine2, shown in three versions; each machine is a stack of Application, Language run-time system, Operating system, and Hardware layers]

  11. Shared-Memory SIMD

  12. Vector Addition [Figure: 2 load pipes & 1 store pipe vs. 2 load/store pipes]
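A rough way to compare the two pipe configurations, assuming each element of C = A + B needs two loads and one store, that each pipe handles one memory operation per cycle, and that pipeline start-up, bank conflicts, and the adder itself are ignored. The function `memory_cycles` is hypothetical.

```python
import math

def memory_cycles(n, load_pipes=0, store_pipes=0, load_store_pipes=0):
    """Steady-state cycles to move the operands of C[i] = A[i] + B[i] for i < n.

    Simplified model (assumption): one memory operation per pipe per cycle;
    start-up latency and bank conflicts are ignored.
    """
    loads, stores = 2 * n, n
    if load_store_pipes:
        # Unified pipes: loads and stores compete for the same ports.
        return math.ceil((loads + stores) / load_store_pipes)
    # Dedicated pipes: loads and stores proceed in parallel.
    return max(math.ceil(loads / load_pipes), math.ceil(stores / store_pipes))

n = 64
print(memory_cycles(n, load_pipes=2, store_pipes=1))  # 2 load pipes & 1 store pipe
print(memory_cycles(n, load_store_pipes=2))           # 2 load/store pipes
```

Under these assumptions the dedicated configuration sustains one element per cycle (64 cycles for 64 elements), while two shared load/store pipes need 1.5 cycles per element (96 cycles), because all three memory operations per element compete for the same two ports.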

  13. Distributed Memory SIMD

  14. Shared Memory UMA

  15. Bus-Based SMP

  16. Crossbar-Based SMP: Sun Enterprise 10000

  17. NUMA

  18. Bus-Based NUMA
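Non-uniform access times matter because remote references are far more expensive than local ones. A back-of-envelope CPI model makes this visible; it assumes local references are already absorbed in the base CPI and that each remote reference stalls the processor for the full remote latency. The function `effective_cpi` and the machine parameters are hypothetical.

```python
def effective_cpi(base_cpi, remote_refs_per_instr, remote_latency_cycles):
    """CPI when some instructions make remote (non-local) memory references.

    Simplified model (assumption): each remote reference stalls for the full
    remote latency; local references are included in base_cpi.
    """
    return base_cpi + remote_refs_per_instr * remote_latency_cycles

# Hypothetical NUMA machine: base CPI 0.5, 0.2% of instructions reference
# remote memory, and a remote access costs 400 cycles.
numa = effective_cpi(0.5, 0.002, 400)
all_local = effective_cpi(0.5, 0.0, 400)
print(numa, numa / all_local)
```

Even with only 0.2% of instructions going remote, CPI rises from 0.5 to 1.3, so the all-local case would run 2.6 times faster. This is why data placement matters so much on NUMA machines.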

  19. ASCI Program
    • Accelerated Strategic Computing Initiative
    • A major boost to the HPC industry
    • Architecture: clusters of RISC-based SMP nodes
    • Goals (1995–2004)
      • 1 Teraflops: Intel/Sandia ASCI Red
      • 3 Teraflops: SGI/LANL ASCI Blue Mountain
      • 10 Teraflops: IBM/LLNL ASCI White
      • 30 Teraflops: ?
      • 100 Teraflops: ?

  20. Intel/Sandia ASCI Red
    • Footprint: 160 m²
    • 200-MHz Pentium Pro processors
    • Nodes: service, compute, I/O, and system
    • Six-link router chip (dimension-ordered, wormhole routing)
    • Link BW: 400 MB/s (full duplex)
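The router features on this slide can be illustrated on a small 2-D mesh. The topology, coordinates, 50 ns router delay, and 4 KB message size are assumptions; only the 400 MB/s link bandwidth comes from the slide. Dimension-order routing resolves one coordinate completely before the next, and wormhole routing pipelines a message through the routers instead of storing it whole at every hop. All function names are hypothetical.

```python
def dimension_order_route(src, dst):
    """Nodes visited by dimension-order routing on a 2-D mesh: finish X, then Y."""
    x, y = src
    path = []
    while x != dst[0]:                   # correct the X coordinate first
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:                   # then the Y coordinate
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

def wormhole_latency(hops, msg_bytes, link_bw, router_delay):
    # Flits pipeline through the routers, so only the header pays per-hop delay.
    return hops * router_delay + msg_bytes / link_bw

def store_and_forward_latency(hops, msg_bytes, link_bw, router_delay):
    # The whole message is received at each router before it is forwarded.
    return hops * (router_delay + msg_bytes / link_bw)

path = dimension_order_route((0, 0), (3, 2))
hops = len(path)
LINK_BW = 400e6        # bytes/s, the per-link figure on the slide
ROUTER_DELAY = 50e-9   # hypothetical per-hop router delay
for f in (wormhole_latency, store_and_forward_latency):
    print(f.__name__, f"{f(hops, 4096, LINK_BW, ROUTER_DELAY) * 1e6:.2f} us")
```

For this 5-hop route, wormhole routing gives about 10.49 µs versus about 51.45 µs for store-and-forward: with wormhole routing, distance adds only router delay, not repeated transmission time.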

  21. Top 500 HPC

  22. Architectures

  23. CPU

  24. Processor Type

  25. Customer [Figure: breakdown by customer type, including Government]

  26. Performance

  27. Manufacturers