

Reducing Power and Area by Interconnecting Memory Controllers to Memory Ranks with RF Coplanar Waveguides on the Same Package. WEED 2011, ISCA. Mario D. Marino, Kevin Skadron, Dept. of Computer Science – UVA, {mdm9u,skadron}@cs.virginia.edu.



  1. Reducing Power and Area by Interconnecting Memory Controllers to Memory Ranks with RF Coplanar Waveguides on the Same Package. WEED 2011, ISCA. Mario D. Marino, Kevin Skadron, Dept. of Computer Science – UVA, {mdm9u,skadron}@cs.virginia.edu

  2. What is the problem? Excessive power usage by the physical memory channel: 2 mW/Gbit/s (Palmer et al., ISSCC'07); 160 W for 10 TB/s (Vantrease et al., ISCA'08). Poor scaling in the physical channel: RC load in the package.
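The two power figures on this slide are consistent with each other if the 2 mW/Gbit/s cost is applied to a 10 TB/s channel. The sketch below checks that arithmetic; the function name is hypothetical, and decimal units (1 TB/s = 8000 Gbit/s) are an assumption, since the slide does not say how the 160 W figure was derived.

```python
def channel_power_watts(bandwidth_tbytes_per_s, mw_per_gbit_per_s):
    """Power of a physical channel given its bandwidth and energy cost.

    Assumes decimal units: 1 TB/s = 8 Tbit/s = 8000 Gbit/s.
    """
    gbits_per_s = bandwidth_tbytes_per_s * 8 * 1000   # TB/s -> Gbit/s
    milliwatts = gbits_per_s * mw_per_gbit_per_s      # mW/(Gbit/s) * Gbit/s
    return milliwatts / 1000                          # mW -> W


# Figures quoted on the slide: 2 mW/Gbit/s and a 10 TB/s channel.
power = channel_power_watts(10, 2.0)
print(f"{power:.0f} W")
```

Under these assumptions the result matches the 160 W quoted on the slide.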

  3. Outline. Hypothesis: wired RF (i.e., coplanar waveguides, CPWs) solves all these problems in a technology that is easier to adopt than optical. Architecture for the CPW memory interface. Evaluation: area, power, and performance. Conclusion. PS: note that this is over wires (CPWs), not wireless!

  4. Hypothesis: why wired RF as a bandwidth solution? Low-latency media and modulation (Chang et al., “Near Speed-of-Light Signaling Over On-Chip Electrical”, 2003). Intel Tera (Polka, ITJ’07): on-package. Quilt packaging (RF coplanar waveguide connecting two dies, > 200 GHz, low insertion loss, built): Liu, Buckhanan et al., Notre Dame. RF: Frank Chang et al. (caches, modulation, high bandwidth, latency and power reduction; MICRO’08, HPCA’08). All electrical (impedance matching), with development costs closer to CMOS. Beckmann et al., “Transmission Line Caches”, MICRO’03. Distances from 1 mm to 30 cm (delays, energy, data rate; “RF for Future Chips”, Tam et al. 2011). Modulation and high speed borrowed from optical.
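To give a feel for the delays over the 1 mm to 30 cm range mentioned on this slide, here is a rough sketch of propagation delay on a coplanar waveguide. It is not taken from the talk: the function name is hypothetical, the substrate is assumed to be silicon (relative permittivity 11.7), and the effective permittivity uses the textbook quasi-static approximation (eps_r + 1) / 2 for a CPW on a thick substrate.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s, in vacuum


def cpw_delay_seconds(length_m, eps_r=11.7):
    """Approximate one-way propagation delay along a CPW.

    Assumption: quasi-static effective permittivity (eps_r + 1) / 2,
    with eps_r = 11.7 for a silicon substrate. Real lines also depend
    on geometry and frequency, which this sketch ignores.
    """
    eps_eff = (eps_r + 1) / 2
    return length_m * math.sqrt(eps_eff) / SPEED_OF_LIGHT


for length_m in (1e-3, 5e-3, 0.3):  # 1 mm, 5 mm, 30 cm
    delay_ps = cpw_delay_seconds(length_m) * 1e12
    print(f"{length_m * 1e3:6.1f} mm -> {delay_ps:8.1f} ps")
```

Under these assumptions the delay is on the order of picoseconds at millimeter distances and a few nanoseconds at 30 cm.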

  5. Why can't we use RF in a traditional fashion? Different impedances: I/O pad, inner and outer wire bonds, PCB pads, PCB [Liu, 2006]

  6. Contributions. Evaluate the power and area gains from replacing power-hungry MC circuitry with on-die RF transceivers + CPW + Quilt packaging. Evaluate the architectural performance gains that result from those power and area gains.

  7. Diagram of the proposed organization. Example with 1 core and 1 RFMC. RF path from a specific core to its rank > 1 mm.

  8. Detailed Organization RFMC: MCs coupled to on-die RF transceivers and on- and inter-die coplanar waveguides (CPW)

  9. Quilt. The use of Quilt (inter-die distance ~40 um) allows: extending on-die CPWs; an interconnect built for RF with low insertion loss (0.1 dB); use of processor die and DRAM dies, RF transceivers, and UCLA RF models, versus traditional power-hungry transceivers (Palmer et al., ISSCC 2007). Coplanar, not flip-chip. See Liu’s PhD dissertation and Buckhanan et al., UGIM’10.

  10. Interfacing on-die CPWs and Quilt

  11. Quilt packaging is a CPW. Extension of the interconnection of two dies facing each other. Designed for frequencies above 200 GHz. Prototype from Notre Dame tested up to 60 GHz. Insertion loss (*): 0.1 dB. So far, no transceivers are needed for Quilt, due to its low insertion loss.
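To put the 0.1 dB insertion loss in linear terms, the sketch below converts a loss in decibels to the fraction of signal power that crosses the interconnect, using the standard power-ratio definition of the decibel. The function name is hypothetical.

```python
def transmitted_fraction(insertion_loss_db):
    """Fraction of input power delivered through a lossy element.

    Uses the power-ratio definition: loss_dB = -10 * log10(P_out / P_in).
    """
    return 10 ** (-insertion_loss_db / 10)


# Insertion loss quoted for Quilt packaging on the slide.
frac = transmitted_fraction(0.1)
print(f"{frac:.4f} of the power is transmitted, {1 - frac:.2%} is lost")
```

A 0.1 dB loss corresponds to roughly 2% of the signal power being lost at the die-to-die crossing, which fits the slide's remark that no dedicated transceivers have been needed for Quilt so far.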

  12. Transceivers: power and area. Extracted from Chang and Tam, with a 10% power reduction on the amplifier to account for savings from Quilt-type packaging.

  13. Area comparison. MC area decreases for all components, but RF essentially eliminates the PHY: 2.4X area savings. (Figure panels: MC, RFMC.)

  14. Energy comparison: PHY. Even with technology improvements, RF is more efficient for distances >= 1 mm and < 10 mm. Net power savings (incl. FE & TE) of 4.6X at 5 mm.

  15. Performance Evaluation • M5 and DRAMsim • 32K L1s, 1MB/core L2 • 8 cores • 1 DRAM rank per MC, DDR2, at 2 GHz • Same FE, TE for both MC, RFMC • No RF latency benefits in the performance evaluation

  16. Performance: Stream. Baseline (current CPUs): 3 or 4 MCs. RFMC is up to 2.4x faster than MC.

  17. Conclusions. RF architecture for on-package CPU-DRAM interconnection. Evolutionary changes to CPU and DRAM design: straightforward manufacturability. Area and power benefits (preliminary; expected to improve with Quilt-dedicated circuits). Performance benefits for more cores (limited by the number of ranks if the same core-to-rank proportion is desired).

  18. Thanks!

  19. Power comparison. FE and TE show a power reduction. The PHY/RF part is evaluated in the next slide (McPAT does not model RF).
