
TLC: Transmission Line Caches



Presentation Transcript


  1. TLC: Transmission Line Caches • Brad Beckmann, David Wood • Multifacet Project, http://www.cs.wisc.edu/multifacet/ • University of Wisconsin-Madison • 12/3/03

  2. Overview • Problem: Global interconnect • Opportunity: On-chip transmission lines • What are they? • Why now? • Application: Large on-chip caches • Solution: TLC: Transmission Line Caches • Consistent high performance • Simple logical design • Less substrate area • Circuit verification • Wafer manufacturing cost MICRO ’03 - TLC: Transmission Line Caches

  3. Outline • Problem: Global interconnect • Opportunity: On-chip transmission lines • Application: Large on-chip caches • Solution: TLC: Transmission Line Caches • Evaluation • Conclusions

  4. Global Interconnect Problem • Global interconnect latency → Bottleneck • RC delay dominant • Held constant using repeaters • Doesn’t scale with transistors • Large structures particularly hurt • Partitioning mitigates intra-partition delay • Performance dominated by inter-partition delay

  5. Conventional Solution • ↑ wire size → ↓ RC delay • 3x size → 3x reduced delay • ↑ wire segment length • 3x channel area • Doesn’t scale • Intrinsic repeater delay • Inductive effects • A Better Solution?

  6. Outline • Problem: Global interconnect • Opportunity: On-chip transmission lines • Application: Large on-chip caches • Solution: TLC - Transmission Line Caches • Evaluation • Conclusions

  7. RC vs. TL Communication • Conventional Global RC Wire • On-chip Transmission Line [Figure: one panel per wire type, plotting voltage against distance between driver and receiver, with the threshold Vt marked]

  8. RC Wire vs. TL Design • Conventional Global RC Wire: ~0.375 mm, RC delay dominated • On-chip Transmission Line: ~10 mm, LC delay dominated [Figure: driver and receiver for each design]
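To give a feel for what "LC delay dominated" means for the ~10 mm line, the sketch below estimates an idealized time of flight. It is not taken from the talk: it assumes a lossless line in a homogeneous dielectric, where the signal travels at c / sqrt(εr), and it borrows the low-k value (2.1) and the 10 GHz clock from the Methodology slide. The function names are hypothetical.

```python
import math

C_VACUUM_M_PER_S = 299_792_458.0  # speed of light in vacuum


def time_of_flight_ps(length_mm, rel_permittivity):
    """Idealized delay of a lossless line: length / (c / sqrt(eps_r)).

    Assumption: homogeneous dielectric and no resistive loss, so this is
    a lower bound rather than a circuit-level result.
    """
    velocity = C_VACUUM_M_PER_S / math.sqrt(rel_permittivity)
    return (length_mm * 1e-3) / velocity * 1e12


def cycle_time_ps(freq_ghz):
    """Clock period in picoseconds for a frequency given in GHz."""
    return 1e3 / freq_ghz


tof = time_of_flight_ps(10.0, 2.1)   # ~10 mm line, low-k dielectric (2.1)
cycle = cycle_time_ps(10.0)          # 10 GHz operational frequency
print(f"time of flight: {tof:.1f} ps, clock period: {cycle:.1f} ps")
```

Under these assumptions the flight time comes out at about 48 ps, roughly half of a 100 ps clock period. A real line adds driver, receiver, and loss effects on top of this bound.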

  9. On-chip Transmission Lines • Why now? → 2010 technology • Relative RC delay ↑ • Improve latency by 10x or more • What are their limitations? • Require thick wires and dielectric spacing • Increase wafer cost • Presents a different Latency/Bandwidth Tradeoff

  10. Latency Comparison

  11. Bandwidth Comparison • Key observation • Transmission lines – route over large structures • Conventional wires – substrate area & vias for repeaters [Figure labels: 2 transmission line signals; 50 conventional signals]

  12. Outline • Problem: Global interconnect • Opportunity: On-chip transmission lines • Application: Large on-chip caches • Solution: TLC: Transmission Line Caches • Evaluation • Conclusions

  13. Texas Non-uniform Cache Architectures (NUCA) • SNUCA – statically partitions addresses across the banks [Figure labels: Cache Controller; Switch; Bank; Request 0x….3; Request 0x….C]
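A static partitioning of this kind can be sketched as a fixed function from address to bank. The version below is only an illustration: the block size, the bank count, and the choice of low-order block-address bits as the bank index are assumptions, not details given on the slide, and `snuca_bank` is a hypothetical name.

```python
BLOCK_BYTES = 64   # assumed cache block size (not stated in the slides)
NUM_BANKS = 16     # assumed number of banks (not stated in the slides)


def snuca_bank(addr, num_banks=NUM_BANKS, block_bytes=BLOCK_BYTES):
    """Return the bank that statically owns the block containing addr.

    The mapping depends only on the address, so a block never moves and
    the controller can route a request without searching for the block.
    """
    block_number = addr // block_bytes
    return block_number % num_banks


for addr in (0x1000, 0x1040, 0x13C0):
    print(hex(addr), "-> bank", snuca_bank(addr))
```

Because the mapping is fixed, lookup is trivial, but a frequently used block that happens to map to a distant bank always pays that bank's latency.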

  14. Texas DNUCA Solution • Frequently requested blocks migrate towards the cache controller • Issues with DNUCA • Locating cache blocks • Power consumed accessing distant banks • 15% of total area devoted to routing channels [Figure labels: blocks A and B]

  15. Outline • Problem: Global interconnect • Opportunity: On-chip transmission lines • Application: Large on-chip caches • Solution: TLC - Transmission Line Caches • Evaluation • Conclusions

  16. TLC - Transmission Line Cache • High bandwidth, low latency interface between the controller and banks [Figure labels: TL Drivers & Receivers; TL link 2x8 bytes; TLC Cache Controller; 512 KB Bank]

  17. TLC Cache Controller [Figure labels: Transmission Lines; Latches; Multi-cycle delay; Transmission Line Transceivers; Central Cache Controller Logic; Repeaters]

  18. Outline • Problem: Global interconnect • Opportunity: On-chip transmission lines • Application: Large on-chip caches • Solution: TLC - Transmission Line Caches • Evaluation • Conclusions

  19. Methodology • Assumptions • ITRS projection for 2010 • 45 nm technology • Low-k (2.1) intermetal dielectric • 10 GHz operational frequency • Physical Evaluation • Linpar RLC extractor • Hspice W element transmission line • Performance Evaluation • Full system simulation • Simics extended with an Out-of-Order processor and memory system timing models

  20. Cache Characteristics • Exclusive write-back caches • 4 wide, 30 stage pipeline, OoO processor • 300 cycle memory latency

  21. Performance [Figure: results grouped under SpecINT, SpecFP, and Commercial]

  22. Substrate Area • 18% reduction • On-chip transmission lines allow direct routing from the driver to receiver without repeaters • Facilitates compact layout • Devotes less substrate area to the routing channels

  23. Link Utilization

  24. Optimized TLC Designs • Utilize fewer transmission lines • Base design: requires 2k transmission lines • Opt designs: require 1k, 500, & 350 • Reduce manufacturing cost • Increase logic complexity

  25. Link Utilization (TLC Family)

  26. Performance (TLC Family)

  27. Conclusions 1 • Transmission lines offer a different latency/bandwidth tradeoff • Advantages • Lower latency for global links • Direct routing over large structures • Limitations • Large, sparsely populated metal layers • Greater circuit verification effort

  28. Conclusions 2 • Possible application: TLC • Advantages • Consistent high performance • Simpler logical design • 18% less substrate area • Less power in the communication network • Disadvantages • Circuit verification • Wafer cost

  29. Other Applications?

  30. Optimized TLC Designs • TLCopt 1000 • Blocks are partitioned across 2 banks • Each transmission line link is 126 bits wide • 1008 total data TLs • TLCopt 500 • Blocks are partitioned across 4 banks • Each transmission line link is 64 bits wide • 512 total data TLs • TLCopt 350 • Blocks are partitioned across 8 banks • Each transmission line link is 44 bits wide • 352 total data TLs [Figure labels: TL link 2x126 bits; TL link 2x64 bits; TL link 2x44 bits; 1 MB Bank; TLCopt Cache Controller]
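As a quick consistency check on the three optimized designs, the sketch below divides each stated total of data transmission lines by the stated link width. The numbers are the ones on the slide; the dictionary layout and the name `implied_links` are illustrative only, and the slide does not say how the quotient relates to the "2x" link labels in the figure.

```python
# Figures copied from the slide: link width in bits and total data TLs.
designs = {
    "TLCopt 1000": {"banks_per_block": 2, "link_bits": 126, "total_data_tls": 1008},
    "TLCopt 500": {"banks_per_block": 4, "link_bits": 64, "total_data_tls": 512},
    "TLCopt 350": {"banks_per_block": 8, "link_bits": 44, "total_data_tls": 352},
}

# Quotient and remainder of total lines over link width for each design.
implied_links = {}
for name, d in designs.items():
    quotient, remainder = divmod(d["total_data_tls"], d["link_bits"])
    implied_links[name] = (quotient, remainder)
    print(f"{name}: {d['total_data_tls']} / {d['link_bits']} "
          f"= {quotient} remainder {remainder}")
```

Each total divides evenly by its link width and yields the same quotient, so the three designs appear to differ in link width rather than in the number of link-width groups, with narrower links paired with blocks spread over more banks.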

  31. Equake Performance

  32. Additional Transceiver Delay