
Optimizing Off-Chip Memory Interconnects for Many-Core Processors

This presentation explores the challenges posed by the asymmetry between the number of cores and the number of memory access points in many-core systems, such as Tilera’s Tile64 architecture. With 64 cores and only 4 memory controllers, off-chip memory traffic grows with core count, leading to potential latency and power-consumption issues. We investigate mechanisms for reducing latency and power through tapered fat trees and alternative routing networks. The findings demonstrate that the proposed network architecture can significantly enhance performance for memory-intensive applications while reducing power requirements.



Presentation Transcript


  1. L2-to-Off-Chip Memory Interconnects for CMPs • Presented by Allen Lee • CS258 Spring 2008 • May 14, 2008

  2. Motivation • In modern many-core systems, there is significant asymmetry between the number of cores and the number of memory access points • Tilera’s multiprocessor has 64 cores and only 4 memory controllers • PARSEC benchmarks suggest that off-chip memory traffic increases with the number of cores for CMPs • We explore mechanisms to lower latency and power consumption for processor-memory interconnect

  3. Tilera Tile64

  4. Tilera Tile64 • Five physical mesh networks • UDN, IDN, SDN, TDN, MDN • TDN and MDN are used for handling memory traffic • Memory requests transit TDN • Large store requests, small load requests • Memory responses transit MDN • Large load responses, small store responses • Includes cache-to-cache transfers and off-chip transfers

  5. Tapered Fat-Tree • Good for many-to-few connectivity • Fewer hops → shorter latency • Fewer routers → less power, less area • Root nodes directly connect to memory controller • Replace MDN mesh network with two tapered fat-tree networks • One for routing requests up • One for routing responses down
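The hop-count advantage claimed on this slide can be illustrated with a back-of-the-envelope comparison. The parameters below (a corner-placed controller on an 8×8 mesh, a 4-ary tree over 64 leaves) are our own assumptions for illustration, not figures from the talk:

```python
import math

def mesh_hops(n=8, ctrl=(0, 0)):
    """Average Manhattan distance from each tile of an n x n mesh
    to a single memory controller at position `ctrl`."""
    total = sum(abs(x - ctrl[0]) + abs(y - ctrl[1])
                for x in range(n) for y in range(n))
    return total / (n * n)

def tree_hops(leaves=64, radix=4):
    """In a k-ary tree, every leaf is log_k(leaves) hops from the root."""
    return math.log(leaves, radix)

print(mesh_hops())   # average hops on the mesh to a corner controller
print(tree_hops())   # hops up a 4-ary tree with 64 leaves
```

Under these assumptions the tree needs 3 hops from any tile to the root, versus an average of 7 mesh hops, which is the intuition behind "fewer hops → shorter latency."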

  6. Tile64 with Tapered Fat Tree

  7. Memory Model • Directory-based cache coherence • Directory cache at every node • Off-chip directory controller • Tile-to-tile requests and responses transit the TDN • Off-chip memory requests and responses transit the MDN
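The traffic split in this memory model can be stated as a one-line routing rule. This is a paraphrase of the slide, with function and message names of our own choosing:

```python
def pick_network(off_chip):
    """Per the memory model above: tile-to-tile requests and responses
    transit the TDN; off-chip memory requests and responses transit the MDN."""
    return "MDN" if off_chip else "TDN"

# Example: an L2 miss serviced by another tile's cache stays on the TDN,
# while a miss that must go to DRAM crosses the MDN.
print(pick_network(off_chip=False))  # TDN
print(pick_network(off_chip=True))   # MDN
```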

  8. TDN and MDN Traffic for L2 Read Misses

  9. Synthetic Benchmarks • Statistical simulation • Model benchmarks from the PARSEC suite • Based on off-chip traffic for a 64-byte cache line on 64 cores • Benchmarks classified along two axes: working set size (small vs. large) and degree of sharing (more vs. less)
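A statistical simulation in the spirit of this slide draws off-chip requests from a per-core miss probability rather than replaying a real trace. The miss rate and access count below are hypothetical placeholders; the talk calibrates its generator against measured PARSEC off-chip traffic:

```python
import random

def generate_traffic(n_cores=64, line_bytes=64, miss_rate=0.02, accesses=1000):
    """Draw off-chip misses per core with a fixed miss probability and
    return the resulting bytes of off-chip traffic per core."""
    traffic = []
    for _ in range(n_cores):
        misses = sum(1 for _ in range(accesses) if random.random() < miss_rate)
        traffic.append(misses * line_bytes)  # one cache line per miss
    return traffic

random.seed(0)
print(sum(generate_traffic()))  # total synthetic off-chip bytes
```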

  10. Breakdown of Average Latency • Latency of memory-intensive applications is dominated by queuing delay. • Benchmarks with little off-chip traffic save on transit time.
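The claim that queuing delay dominates for memory-intensive workloads follows from basic queuing theory: delay blows up as a shared resource approaches saturation. A simple M/M/1 model (our illustration, not the talk's simulator) makes this concrete:

```python
def mm1_latency(service_time, utilization):
    """Total latency of an M/M/1 queue: service time plus queuing delay,
    where queuing delay = service_time * rho / (1 - rho)."""
    queuing = service_time * utilization / (1.0 - utilization)
    return service_time + queuing

# With a fixed 10-cycle service time, latency is dominated by queuing
# as utilization of the memory controller rises.
for rho in (0.1, 0.5, 0.9):
    print(rho, mm1_latency(service_time=10.0, utilization=rho))
```

At 90% utilization the queuing component (90 cycles) is nine times the service time itself, matching the slide's observation for memory-intensive applications.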

  11. Power Modeling • Orion power simulator for on-chip routers from Princeton University • Models switching power as sum of • Buffer power • Crossbar power • Arbitration power • Specify parameters • Activity factor, number of input and output ports, virtual channels, size of input buffer, etc.
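The Orion-style decomposition on this slide can be sketched as a sum of per-flit component energies scaled by activity and clock rate. The energy values below are made-up placeholders; Orion derives them from circuit-level models of the buffer, crossbar, and arbiter:

```python
def router_power(activity, e_buffer, e_crossbar, e_arbiter, freq_hz):
    """Switching power = activity factor x (sum of per-flit component
    energies) x clock frequency, per the Orion decomposition above."""
    return activity * (e_buffer + e_crossbar + e_arbiter) * freq_hz

# Example at the 750 MHz clock from slide 14, with hypothetical
# per-flit energies in joules:
p = router_power(activity=0.3, e_buffer=1e-12, e_crossbar=2e-12,
                 e_arbiter=0.5e-12, freq_hz=750e6)
print(p)  # switching power in watts
```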

  12. Tilera MDN Routers

  13. Tree Routers

  14. Parameters • 100 nm CMOS process • VDD = 1.0V • Clock Frequency = 750 MHz • 32-bit flit width

  15. Conclusion • Physical design of the tapered fat-tree is more difficult • The TFT topology can reduce memory latency and power dissipation for many-core systems
