1 / 36

The Alpha 21364 and 21464 Microprocessors: Continuing the Performance Lead Beyond Y2K

The Alpha 21364 and 21464 Microprocessors: Continuing the Performance Lead Beyond Y2K Shubu Mukherjee, Ph.D. Principal Hardware Engineer VSSAD Labs, Alpha Development Group Compaq Computer Corporation Shrewsbury, Massachusetts

andrew
Télécharger la présentation

The Alpha 21364 and 21464 Microprocessors: Continuing the Performance Lead Beyond Y2K

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. The Alpha 21364 and 21464 Microprocessors: Continuing the Performance Lead Beyond Y2K Shubu Mukherjee, Ph.D. Principal Hardware Engineer VSSAD Labs, Alpha Development Group Compaq Computer Corporation Shrewsbury, Massachusetts Slides: 1998 Microprocessor Forum (Peter Bannon) and 1999 Microprocessor Forum (Joel Emer)

  2. Alpha Microprocessor Roadmap Higher Performance 0.125mm 0.18mm 0.35mm 21464 EV8 21364 EV7 21264 EV6 Lower Cost 0.125mm 0.28mm 21364 EV78 21264EV67 0.18mm 21264EV68 1998 1999 2000 2001 2002 2003 First System Ship

  3. Alpha 21264 Microprocessor • Architectural Features • First “Out-of-Order” Alpha • Four-wide superscalar • … • Performance • World’s Fastest Microprocessor (www.spec.org, 11/17/99) • 39 SPECINT95, 68 SPECFP95 @ 700 Mhz • Intel Pentium III @ 733 Mhz delivers 36 SPECINT95, 30 SPECFP95

  4. Alpha Microprocessor Roadmap Higher Performance 0.125mm 0.18mm 0.35mm 21464 EV8 21364 EV7 21264 EV6 Lower Cost 0.125mm 0.28mm 21364 EV78 21264EV67 0.18mm 21264EV68 1998 1999 2000 2001 2002 2003 First System Ship

  5. Alpha 21364 Goals • Leadership single stream performance • Higher operating frequency • Integrated memory interface • Leadership multiprocessor performance • Integrated system / multiprocessor interface

  6. Alpha 21364 Features • System-on-a-Chip • Alpha 21264 core with enhancements • Integrated L2 Cache • Integrated memory controller • Integrated network interface • Fault-Tolerance • Support for lock-step operation to enable high-availability systems.

  7. Address In Address Out R A M B U S L2 Cache Memory Controller N S E WI/O Network Interface 16 L1 Victim Buf 16 L2 Victim Buf 21364 Chip Block Diagram 21264 Core 16 L1 Miss Buffers 64K Icache 64K Dcache

  8. 21364 Core FETCH MAP QUEUE REG EXEC DCACHE Stage: 0 1 2 3 4 5 6 Int Reg Map Int Issue Queue (20) Branch Predictors Reg File (80) Exec L2 cache1.5MB 6-Set Addr Exec L1 Data Cache 64KB 2-Set Reg File (80) Exec 80 in-flight instructions plus 32 loads and 32 stores Addr Exec Next-Line Address 4 Instructions / cycle L1 Ins. Cache 64KB 2-Set FP ADD Div/Sqrt Reg File (72) FP Issue Queue (15) FP Reg Map Victim Buffer FP MUL Miss Address

  9. Integrated L2 Cache • 1.5 MB • 6-way set associative • 16 GB/s total read/write bandwidth • 16 Victim buffers for L1 -> L2 • 16 Victim buffers for L2 -> Memory • ECC SECDED code • 12ns load to use latency

  10. Integrated Memory Controller • Direct RAMbus • High data capacity per pin • 800 MHz operation • 30ns CAS latency pin to pin • 6 GB/sec read or write bandwidth • 100s of open pages • Directory based cache coherence • ECC SECDED

  11. Integrated Network Interface • Direct processor-to-processor interconnect • 10 GB/second per processor • 15ns processor-to-processor latency • Out-of-order network with adaptive routing • Asynchronous clocking between processors • 3 GB/second I/O interface per processor

  12. IO M IO M M IO M M IO IO IO M IO M IO M IO M IO M IO M IO M 364 364 364 364 364 364 364 364 364 364 364 364 21364 System Block Diagram

  13. Alpha 21364 Technology • 0.18 mm CMOS • 1000+ MHz • 100 Watts @ 1.5 volts • 3.5 cm2 • 6 Layer Metal • 100 million transistors • 8 million logic • 92 million RAM

  14. Alpha 21364 Status • 70 SPECint95 (estimated) • 120 SPECfp95 (estimated) • RTL model running • Tapeout: Summer 2000

  15. 21364 Summary: System on a Chip • Integrated L2 cache and memory controller • outstanding single processor performance • Integrated network interface • high performance multi-processor systems • scales to large number of processors

  16. Alpha Microprocessor Overview Higher Performance 0.125mm 0.18mm 0.35mm 21464 EV8 21364 EV7 21264 EV6 Lower Cost 0.125mm 0.28mm 21364 EV78 21264EV67 0.18mm 21264EV68 1998 1999 2000 2001 2002 2003 First System Ship

  17. Alpha 21464 Goals • Leadership single stream performance • Higher operating frequency / better technology • New microarchitecture • Integrated memory interface (like 21364) • Leadership multiprocessor performance • Simultaneous Multithreading (with minimal change/cost) • Integrated system / multiprocessor interface (like 21364)

  18. Alpha 21464 Technology Overview • Leading edge process technology – 1.2-2.0GHz • 0.125µm CMOS • SOI-compatible • Cu interconnect • low-k dielectrics • Chip characteristics • ~1.2V Vdd • ~250 Million transistors

  19. Alpha 21464 Architecture Overview • Enhanced out-of-order execution • 8-wide superscalar • Large on-chip L2 cache • Direct RAMBUS interface • On-chip router for system interconnect • Glueless, directory-based, ccNUMA • for up to 512-way multiprocessing • 4-way simultaneous multithreading (SMT)

  20. Instruction Issue Time Reduced function unit utilization due to dependencies

  21. Superscalar Issue Time Superscalar leads to more performance, but lower utilization

  22. Predicated Issue Time Adds to function unit utilization, but results are thrown away

  23. Chip Multiprocessor Time Limited utilization when only running one thread

  24. Fine Grained Multithreading Time Intra-thread dependencies still limit performance

  25. Simultaneous Multithreading Time Maximum utilization of function units by independent operations

  26. Decode/Map Queue Fetch Reg Read Execute Dcache/Store Buffer Reg Write Retire PC RegisterMap Regs Regs Dcache Icache Basic Out-of-order Pipeline Thread-blind

  27. Decode/Map Queue Fetch Reg Read Execute Dcache/Store Buffer Reg Write Retire PC RegisterMap Regs Regs SMT Pipeline Dcache Icache

  28. Changes for SMT • Basic pipeline – unchanged • Replicated resources • Program counters • Register maps • Shared resources • Register file (size increased) • Instruction queue • First and second level caches • Translation buffers • Branch predictor

  29. Multiprogrammed workload

  30. Decomposed SPEC95 Applications

  31. Multithreaded Applications

  32. TPU 0 TPU1 TPU2 TPU3 Icache TLB Dcache Scache Architectural Abstraction • 1 Processor with 4 Thread Processing Units (TPUs) • Shared hardware resources

  33. IO M IO IO M IO IO M M M M IO IO M M IO M IO 0 1 2 3 21464 System Block Diagram EV8 EV8 EV8 EV8 EV8 EV8 EV8 EV8 EV8

  34. Alpha 21464 Summary • Leadership single stream performance • Higher operating frequency / better technology • New microarchitecture • Integrated memory interface (like 21364) • Leadership multiprocessor performance • Simultaneous Multithreading (with minimal changes/cost) • Integrated system / multiprocessor interface (like 21364)

  35. Maintain Performance Lead Beyond Y2K • Alpha 21364 • Reuses 21264 microprocessor core • System on a chip • Alpha 21464 • New microarchitecture • System on a chip • Simultaneous Multithreading

  36. My Current Research: Beyond 21464? • The Truth Project (w/ Joel Emer) • Examines different microarchitectural issues • The Multinet Project (w/ Rick Kessler) • Tightly-coupled multiprocessor networks • The Reliant Project (w/ Steve Reinhardt) • Self-Checking Microprocessors using SMT, ISCA submission • Asim (w/ VSSAD Labs) • Performance Model for Alphas beyond 21464

More Related