1 / 43

Power Efficient Cache Coherence

Power Efficient Cache Coherence. Craig Saldanha Mikko Lipasti. Motivation. Power consumption becoming a serious design constraint. Market demand for faster and more complex servers. Complex coherence protocol and interconnect. f clock + Complexity = P interconnect. Motivation.

nathan
Télécharger la présentation

Power Efficient Cache Coherence

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Power Efficient Cache Coherence Craig Saldanha Mikko Lipasti

  2. Motivation • Power consumption becoming a serious design constraint. • Market demand for faster and more complex servers. • Complex coherence protocol and interconnect. • fclock + Complexity = Pinterconnect

  3. Motivation • Traditional power saving methodologies ineffective. • Minimize number of transaction packets. • At the end-points: Jetty. • At the source: Serial Snoop.

  4. OUTLINE • Overview of Snoop based protocols and opportunity for power savings • Latency and Power consumption of parallel snooping techniques • Serial Snooping • Results • Conclusions • Future Work

  5. Local Node Tag Lookup Bus Arbitration Snoop Transmission Remote Node Tag Lookup Snoop Response Combination of Responses Data Fetch Data Transmit Snoop Based Coherence P1 P2 P3 P4 Tag Array Memory Data Array

  6. 3 Degrees of Freedom to Speculate SnoopingData FetchData Transmit Degrees of Speculation Parallel Snoop,Spec Dfetch Spec DXmit D-Xmit Parallel Snoop,Spec Dfetch Non-Spec DXmit DFetch X D-Xmit Parallel Snoop,Non-Spec Dfetch Non-Spec DXmit Snoop Serial Snoop,Spec Dfetch Spec DXmit D-Xmit Serial Snoop,Spec Dfetch Non-Spec DXmit DFetch X D-Xmit Serial Snoop,Non-Spec Dfetch Non-Spec DXmit

  7. Latency and Power assumptions • Consider only load misses • Tree of point-point connections. • Latency to traverse a link: 1Bus cycle (7ns) • Tag Look up :1 bus cycle (7ns) • D-Fetch: 2 bus cycles (14ns) • DRAM access: 10 bus cycles (70ns) Backplane MEMORY Root Node Switch2 Switch1 P1 P2 P3 P4 Board1

  8. 0 7 Parallel Snoop Speculative Fetch Speculative Transmit (PS/SF/ST) Snoop Broadcast Latency MEMORY Power Plink

  9. Parallel Snoop Speculative Fetch Speculative Transmit (PS/SF/ST) Snoop Broadcast Latency MEMORY 0 7 14 Power Plink+Pswitch

  10. 0 7 14 Parallel Snoop Speculative Fetch Speculative Transmit (PS/SF/ST) Snoop Broadcast Latency MEMORY 21 Power 2Plink+Pswitch

  11. 0 7 14 Parallel Snoop Speculative Fetch Speculative Transmit (PS/SF/ST) Snoop Broadcast Latency MEMORY 21 28 Power 2Plink+2Pswitch

  12. 0 7 14 Parallel Snoop Speculative Fetch Speculative Transmit (PS/SF/ST) Snoop Broadcast MEMORY Latency 21 28 35 Power 5Plink+2Pswitch

  13. 0 7 14 Parallel Snoop Speculative Fetch Speculative Transmit (PS/SF/ST) Snoop Broadcast Latency MEMORY 21 28 35 42 Power 5Plink+4Pswitch

  14. 0 7 14 Parallel Snoop Speculative Fetch Speculative Transmit (PS/SF/ST) Snoop Broadcast Latency MEMORY 21 28 35 42 49 Power 8Plink+4Pswitch

  15. Parallel Snoop Speculative Fetch Speculative Transmit (PS/SF/ST) Memory Access :Data Fetch MEMORY Latency 35 91 105 Power Pmem

  16. Parallel Snoop Speculative Fetch Speculative Transmit (PS/SF/ST) Memory Access :Data Transmit MEMORY Latency 35 91 105 140 Power Pmem+3Plink+2Pswitch

  17. Parallel Snoop Speculative Fetch Speculative Transmit (PS/SF/ST) Remote Node: Tag Lookup MEMORY Latency 49 56 Power 3Ptag

  18. Parallel Snoop Speculative Fetch Speculative Transmit (PS/SF/ST) Remote Node :Snoop Response MEMORY Latency 49 56 105 Power 3Ptag+3*(Pswitch+2Plink)+Plink+2Pswitch+2Plink

  19. Parallel Snoop Speculative Fetch Speculative Transmit (PS/SF/ST) Remote Node: Data Fetch MEMORY Latency 49 63 Power 3Pcache

  20. Parallel Snoop Speculative Fetch Speculative Transmit (PS/SF/ST) Remote Node :Data Transmit MEMORY Latency 49 63 112 Power 3Ptag+3*(4Plink+3Pswitch)

  21. 49 0 TL TL Snoop BRDCST Local Node 49 49 56 56 105 105 TL RSP + CMB RSP + CMB Remote Node 49 49 63 63 112 DF DF Remote Node Data Xmit 35 35 91 91 105 105 140 140 Memory Access Memory Access Data Xmit Data Xmit Memory Parallel Snoop Speculative Fetch Speculative Transmit (PS/SF/ST) Latency Power Remote Node supplies the data 29Plink+18Pswitch+3Ptag+3Pcache+Pmem Memory supplies the data 20Plink+11Pswitch+3Ptag+3Pcache+Pmem

  22. Parallel Snoop Speculative Fetch Non-Speculative Transmit (PS/SF/NT) Latency 0 49 Local Node TL Snoop BRDCST RSP 49 56 105 TL Remote Node + CMB Remote Node 49 63 105 154 DF Data Xmit 35 91 105 140 Memory Access Data Xmit Memory Power Remote Node supplies the data 21Plink+12Pswitch+3Ptag+Pcache+Pmem Memory supplies the data 20Plink+11Pswitch+3Ptag+3Pcache+Pmem

  23. 0 49 Local Node TL Snoop BRDCST 49 RSP+ 105 TL Remote Node CMB 105 119 168 DF Data Xmit Remote Node 91 161 196 Memory Access Data Xmit Memory Parallel Snoop Non-Speculative Fetch Non-Speculative Transmit (PS/NF/NT) Latency Power Remote Node supplies the data 21Plink+12Pswitch+3Ptag+Pcache Memory supplies the data 20Plink+11Pswitch+3Ptag+Pmem

  24. Avoids Speculative transmission of Snoop packets. Serial Snooping MEMORY

  25. Avoids Speculative transmission of Snoop packets. Check the nearest neighbor Data supplied with minimum latency and power Serial Snooping MEMORY

  26. Forward snoop to next level Serial Snooping MEMORY

  27. Forward snoop to next level Serial Snooping MEMORY

  28. Search other half of tree Serial Snooping MEMORY

  29. Search other half of tree Search leaf nodes serially Serial Snooping MEMORY

  30. Search other half of tree Search leaf nodes serially Serial Snooping MEMORY

  31. Latency to satisfy a request dependent on distance from requestor. Data resident at the nearest neighbor supplied with the lowest latency and power. Requests visible to memory controller only at root node. Latency is adversely affected when requested data present at the farthest node Worst case power consumption is still less than the parallel snooping. Serial Snooping : Features

  32. Serial Snooping : Request satisfied by Nearest Node Latency MEMORY 0 21 SNP TL P1 28 21 49 TL RSP P2 21 35 56 DF Data Xmit P2 Power Xmit Snoop: 2Plink + Pswitch P2 Tag access and snoop response: Ptag + 2Plink + Pswitch P2 Data Fetch and Xmit: Pcache +2Plink + Pswitch Ptotal= 6Plink+3Pswitch+Ptag+Pcache

  33. 0 21 TL SNP P1 21 28 49 63 77 P2 TL RSP 84 77 133 TL Xmit P3 77 91 140 DF Xmit P3 63 133 168 Memory Access Xmit Memory Serial Snooping : Request satisfied by Next-Nearest Neighbor Latency MEMORY Power 16Plink+10Pswitch+2Ptag+Pcache

  34. Serial Snooping : Request satisfied by farthest node Latency MEMORY 0 21 TL Xmit P1 21 28 49 63 77 TL P2 RSP 84 77 10 Xm TL P3 P4 105 112 161 TL RSP P4 105 119 147 168 DF Xmit 168 Memory Access 133 Xmit Spec-Memory 133 147 182 Memory Access Xmit 147 217 252 Memory Access Xmit Non-Spec-Memory Power Remote Node supplies the data 18Plink+11Pswitch+3Ptag+Pcache If Memory supplies the data 17Plink+10Pswitch+3Ptag+Pmem

  35. RESULTS : Load Miss Distributions

  36. RESULTS: Average Latencies to satisfy load misses

  37. RESULTS: Relative Power Savings

  38. CONCLUSIONS • Reducing degree of speculation has potential for significant power savings • Performance degradation is minimal for the set of benchmarks studied. • Serial Snooping with speculative memory fetch provides optimal latency and power consumption.

  39. Future Work • Develop detailed execution-driven Power Model • Explore different interconnect topologies. • Examine the viability of adaptive mechanisms for protocol policy.

  40. 0 7 14 Serial Snooping Nearest Neighbor MEMORY Latency 21 Power 2Plink+Pswitch

  41. Questions

  42. 49 0 TL TL Snoop BRDCST Parallel Snoop Speculative Fetch Speculative Transmit (PS/SF/ST) Local Node MEMORY 35 35 91 Memory Access Memory Access

More Related