George Michelogiannakis, James Balfour, William J. Dally

Elastic-Buffer Flow-Control for On-Chip Networks. George Michelogiannakis, James Balfour, William J. Dally. Computer Systems Laboratory, Stanford University. Edited by: Abhay Bhopat.

Presentation Transcript


  1. Elastic-Buffer Flow-Control for On-Chip Networks • George Michelogiannakis, James Balfour, William J. Dally • Computer Systems Laboratory, Stanford University • Edited by: Abhay Bhopat

  2. Background • Buffer • Elastic Buffer • Elastic Buffer design

  3. Introduction • Elastic-buffer (EB) flow-control uses the channels as distributed FIFOs • Input buffers at routers are not needed • Compared to virtual-channel (VC) routers: • Up to 12% more throughput per unit power • 18% shorter router cycle time

  4. Outline • Building elastic-buffered channels • By using what is already there • Router microarchitecture • Deadlock avoidance • Load-sensing for adaptive routing • Evaluation

  5. The Idea • Use the network channels as distributed FIFOs • Use that storage instead of input buffers at routers • This removes the area and power cost of input buffers • [Figure: pipelined channel vs. channel used as a FIFO]

  6. Building an Elastic Buffer • To build an EB in a pipelined channel with master-slave flip-flops (FFs): • Use the FFs' latches for storage by driving their enables independently • The master and slave latches then act as two storage slots per stage • [Figure: elastic buffer vs. master-slave FF]

  7. Expanded view of EB control logic

  8. How Elastic Buffer Channels Work • Ready/valid handshake between elastic buffers • Ready: at least one free storage slot • Valid: non-empty (driving valid data) • A flit advances from one EB to the next when the upstream EB is valid and the downstream EB is ready • [Figure: example transfers over cycles 1 to 6]
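The handshake can be illustrated with a small cycle-level model. This is a sketch under assumptions the slide does not state: each EB stage is a two-slot FIFO, ready/valid are sampled at the start of each cycle, and `ElasticBuffer` and `simulate` are hypothetical names. It shows that a stalled receiver simply backs flits up into the channel stages, with no flit lost and order preserved.

```python
from collections import deque


class ElasticBuffer:
    """One EB stage, modeled as a two-slot FIFO (assumed: master + slave latch)."""

    def __init__(self, slots=2):
        self.slots = slots
        self.fifo = deque()

    def ready(self):
        # Ready: at least one free storage slot.
        return len(self.fifo) < self.slots

    def valid(self):
        # Valid: non-empty, i.e. driving valid data.
        return len(self.fifo) > 0


def simulate(flits, num_stages, sink_ready, cycles):
    """Cycle-level model of a channel built from num_stages EBs.

    sink_ready(cycle) -> bool says whether the receiver accepts a flit.
    Returns a list of (cycle, flit) pairs in delivery order.
    """
    stages = [ElasticBuffer() for _ in range(num_stages)]
    pending = deque(flits)
    delivered = []
    for cycle in range(cycles):
        # Handshake signals are sampled at the start of the cycle.
        ready = [s.ready() for s in stages]
        valid = [s.valid() for s in stages]
        # Last EB -> receiver.
        if valid[-1] and sink_ready(cycle):
            delivered.append((cycle, stages[-1].fifo.popleft()))
        # EB i -> EB i+1 whenever EB i is valid and EB i+1 is ready.
        for i in range(num_stages - 2, -1, -1):
            if valid[i] and ready[i + 1]:
                stages[i + 1].fifo.append(stages[i].fifo.popleft())
        # Sender -> first EB.
        if pending and ready[0]:
            stages[0].fifo.append(pending.popleft())
    return delivered


# Usage: the receiver stalls during cycles 3 to 8, then drains the channel.
flits = [f"f{n}" for n in range(8)]
stalled = simulate(flits, num_stages=3, sink_ready=lambda c: not 3 <= c <= 8, cycles=40)
free = simulate(flits[:4], num_stages=3, sink_ready=lambda c: True, cycles=20)
```

With an always-ready receiver, one flit arrives per cycle after a three-cycle fill latency; during a stall, the six slots of the three stages absorb the backlog.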

  9. Control Logic Area Overhead • Control logic is implemented as a four-state FSM with 10 gates and 2 FFs • Cost is amortized over channel width • Example: control logic increases area of a 64-bit channel by 5%
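The amortization argument can be made concrete with one line of arithmetic. Assuming (this is not stated on the slide) that the control logic area is fixed and the datapath area grows linearly with channel width, the 5% figure at 64 bits scales inversely with width; `control_overhead` is a hypothetical helper.

```python
def control_overhead(width_bits, ref_width=64, ref_overhead=0.05):
    """Relative area added by the fixed EB control logic.

    Assumes control area is constant and datapath area grows linearly with
    channel width, anchored to the slide's figure of 5% at 64 bits.
    """
    return ref_overhead * ref_width / width_bits


for w in (32, 64, 128, 192):
    print(f"{w:>3}-bit channel: {control_overhead(w):.1%} control-logic overhead")
```

Under this model, halving the channel width doubles the relative overhead, and wider channels make it negligible.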

  10. Outline • Building elastic-buffered channels • Router microarchitecture • Use EB flow-control through the router • Deadlock avoidance • Load-sensing for adaptive routing • Evaluation

  11. Use EB Flow-Control Through the Router • Input buffer replaced by an input EB • VC and switch (SW) allocators removed; per-output arbiters instead • Three-slot output EB to cover for arbitration done one cycle in advance • Look-ahead (LA) routing also applicable to EB networks • [Figure: VC input-buffered router vs. EB router]
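A per-output arbiter can be sketched as follows. The round-robin policy and the class name `OutputArbiter` are assumptions for illustration; holding the grant from head to tail reflects the no-interleaving rule from slide 16. Because an EB router has no VCs, each input presents at most one packet at a time, so a single arbiter per output replaces the VC and switch allocators.

```python
class OutputArbiter:
    """Hypothetical round-robin arbiter for one output port.

    Once an input wins, it keeps the output until its packet's tail flit
    has been forwarded, so packets are never interleaved on the output.
    """

    def __init__(self, num_inputs):
        self.n = num_inputs
        self.last = num_inputs - 1  # so input 0 has priority first
        self.owner = None           # input currently holding the output

    def arbitrate(self, requests):
        """requests: set of input ids whose head flit wants this output."""
        if self.owner is not None:
            return self.owner  # packet in progress keeps the output
        for k in range(1, self.n + 1):
            i = (self.last + k) % self.n
            if i in requests:
                self.owner = self.last = i
                return i
        return None

    def release(self):
        """Call when the owner's tail flit has left through this output."""
        self.owner = None


arb = OutputArbiter(num_inputs=4)
winner = arb.arbitrate({1, 3})
```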

  12. Topology • 2D 4x4 flattened butterfly (FBFly)
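In a 2D flattened butterfly, each router links directly to every other router in its row and in its column. The sketch below (hypothetical helper names; router concentration is not modeled) shows that a 4x4 FBFly router has 2(k-1) = 6 inter-router ports and that any destination is at most two hops away under minimal routing.

```python
def fbfly_neighbors(x, y, k=4):
    """Routers directly linked to router (x, y) in a 2D k x k flattened
    butterfly: every other router in the same row and the same column."""
    row = [(i, y) for i in range(k) if i != x]
    col = [(x, j) for j in range(k) if j != y]
    return row + col


def min_hops(src, dst):
    """Minimal hop count: one hop per dimension in which coordinates differ."""
    return sum(a != b for a, b in zip(src, dst))


print(len(fbfly_neighbors(0, 0)), "inter-router ports per router")
print("worst-case minimal hops:", min_hops((0, 0), (3, 3)))
```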

  13. Separate routers for networks

  14. Outline • Building elastic-buffered channels • Router microarchitecture • Deadlock avoidance • How to provide isolation without VCs • Load-sensing for adaptive routing • Evaluation

  15. Deadlock Avoidance: Duplicate Channels • No input buffers, so no virtual channels • Three types of possible deadlocks: • Protocol deadlock • Cyclic flit dependency in the network • Packet-interleaving deadlock (slide 16) • Solution for the first two: duplicate physical channels
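Why duplicate physical channels can break a cyclic flit dependency can be checked on a channel dependency graph. The sketch below uses a unidirectional ring and a dateline-style rule for choosing between the two channel copies; the topology, the rule, and the function names are illustrative assumptions, not the scheme used in the presentation. With one copy per link the graph has a cycle (a potential deadlock); with two copies it is acyclic.

```python
from collections import defaultdict


def ring_cdg(n, copies):
    """Channel dependency graph of a unidirectional n-node ring.

    Channel (i, c) is physical copy c of the link from node i to node i+1.
    With copies == 2, a dateline-style rule is applied (an assumption):
    packets start on copy 0 and switch to copy 1 after crossing the link
    from node n-1 to node 0.
    """
    edges = defaultdict(set)
    for src in range(n):
        for dst in range(n):
            if src == dst:
                continue
            path, node, copy_id = [], src, 0
            while node != dst:
                path.append((node, copy_id))
                node = (node + 1) % n
                if copies == 2 and node == 0:
                    copy_id = 1  # crossed the dateline
            # Holding one channel while waiting for the next is a dependency.
            for a, b in zip(path, path[1:]):
                edges[a].add(b)
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle (a potential deadlock)."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = defaultdict(int)

    def visit(u):
        color[u] = GRAY
        for v in edges.get(u, ()):
            if color[v] == GRAY or (color[v] == WHITE and visit(v)):
                return True
        color[u] = BLACK
        return False

    return any(color[u] == WHITE and visit(u) for u in list(edges))


print("one copy per link, cycle:", has_cycle(ring_cdg(4, copies=1)))
print("two copies per link, cycle:", has_cycle(ring_cdg(4, copies=2)))
```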

  16. Deadlock Avoidance: No Interleaving • Interleaving deadlock: • New head flits require free destination registers • Occupied destination registers are freed only by their packets' tail flits • Tail flits cannot bypass the new head flit (channels are FIFOs) • Solution: disallow packet interleaving
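The interleaving deadlock can be reproduced with a tiny model of an in-order channel feeding a receiver that has a fixed number of destination registers. The function `deliver` and the flit encoding are hypothetical. Interleaving two packets with only one register blocks forever, while the same flits sent packet-by-packet complete.

```python
from collections import deque


def deliver(flits, num_registers):
    """Drain an in-order (FIFO) channel into a receiver that has a fixed
    number of per-packet destination registers.

    flits: sequence of (packet_id, kind) with kind in {"head", "body", "tail"}.
    Returns True if every packet completes, False on interleaving deadlock.
    """
    channel = deque(flits)
    open_packets = set()
    while channel:
        packet, kind = channel[0]
        if kind == "head":
            if len(open_packets) >= num_registers:
                # The head needs a free register, but registers are freed
                # only by tail flits, which are stuck behind this head.
                return False
            open_packets.add(packet)
        elif kind == "tail":
            open_packets.discard(packet)
        channel.popleft()
    return True


interleaved = [("A", "head"), ("B", "head"), ("A", "tail"), ("B", "tail")]
in_order = [("A", "head"), ("A", "tail"), ("B", "head"), ("B", "tail")]
```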

  17. Duplicating Channels Between Routers • Duplicate channels with neckdown • Small improvement (still one switch port), large cost • Duplicate channels with duplicate switch ports • Excessive cost (switch cost is quadratic in the number of ports)

  18. Dividing Into Sub-Networks Is More Efficient • Divide into sub-networks • Double the bandwidth for double the cost • More beneficial once the datapath is narrowed to normalize for throughput or power • Again, because switch cost is quadratic
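The quadratic-cost argument behind slides 17 and 18 can be worked through numerically. The sketch assumes the common matrix-crossbar model in which area grows with (ports × width)²; the 5-port, 64-bit baseline is a hypothetical example. Duplicating switch ports quadruples crossbar area, while two sub-networks only double it, and at the same total bandwidth (half-width channels) sub-networks use half the baseline crossbar area.

```python
def crossbar_area(ports, width):
    """Relative area of a ports x ports crossbar with width-bit ports.

    Assumes the common matrix-crossbar model in which area grows with
    (ports * width) squared; constants are dropped.
    """
    return (ports * width) ** 2


P, W = 5, 64  # hypothetical baseline: 5-port router, 64-bit datapath
base = crossbar_area(P, W)

relative = {
    # Twice the baseline bandwidth (full-width channels).
    "duplicate switch ports, full width": crossbar_area(2 * P, W) / base,
    "two sub-networks, full width": 2 * crossbar_area(P, W) / base,
    # Bandwidth normalized back to the baseline (half-width channels).
    "duplicate switch ports, half width": crossbar_area(2 * P, W // 2) / base,
    "two sub-networks, half width": 2 * crossbar_area(P, W // 2) / base,
}
for option, area in relative.items():
    print(f"{option}: {area:.2f}x baseline crossbar area")
```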

  19. Outline • Building elastic-buffered channels • Router microarchitecture • Deadlock avoidance • Load-sensing for adaptive routing • Propose a load metric for EB networks • Evaluation

  20. Congestion metrics • Blocked Cycles • Blocked Ratio • Output Occupancy • Channel Occupancy • Channel Delay

  21. Output Channel Occupancy Load Metric • Flit-buffered networks use the credit count • EB networks measure output channel occupancy instead • Measured over a fixed segment of the output channel (highlighted in the slide figure) • Occupancy is decremented when flits leave that segment • Occupancy is incremented by a packet's length when the routing decision is made, so packets routed in the same cycle see each other's decisions
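The counter update rules can be sketched as below, together with a simple adaptive choice of the least-occupied candidate output. The class and function names and the "pick the minimum" policy are assumptions for illustration. Because the whole packet length is added at decision time, a second packet routed in the same cycle avoids the output the first one just chose.

```python
class OccupancyCounter:
    """Hypothetical per-output load counter for an EB router."""

    def __init__(self):
        self.flits = 0

    def on_routing_decision(self, packet_len):
        # The whole packet is counted as soon as it is routed, so later
        # decisions in the same cycle already see it.
        self.flits += packet_len

    def on_flit_left_segment(self):
        # One flit left the monitored segment of the output channel.
        self.flits -= 1


def choose_output(candidates, counters, packet_len):
    """Adaptive choice: route to the least-occupied candidate output."""
    best = min(candidates, key=lambda port: counters[port].flits)
    counters[best].on_routing_decision(packet_len)
    return best


counters = {0: OccupancyCounter(), 1: OccupancyCounter()}
first = choose_output([0, 1], counters, packet_len=4)   # both idle: picks 0
second = choose_output([0, 1], counters, packet_len=4)  # sees port 0's 4 flits: picks 1
```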

  22. Outline • Building elastic-buffered channels • Router microarchitecture • Deadlock avoidance • Load-sensing for adaptive routing • Evaluation • Compare throughput, power, area, latency, cycle time

  23. Evaluation Methodology • Used a modified version • Area/power estimates from a 65nm library • Input buffers modeled as SRAM cells • VC routers use the throughput/power-optimal number of VCs and buffer depth • Two sub-networks: request and reply • Averaged over a set of 6 traffic patterns • Constant packet size (512 bits) • Swept channel width from 28 to 192 bits
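Because the packet size is held at 512 bits, sweeping the channel width changes how many flits each packet occupies. The sketch assumes packets are split into channel-width flits with the last flit padded; `flits_per_packet` is a hypothetical helper.

```python
import math

PACKET_BITS = 512  # constant packet size from the slide


def flits_per_packet(width_bits):
    """Flits needed for one packet, assuming a packet is split into
    channel-width flits and the last flit is padded."""
    return math.ceil(PACKET_BITS / width_bits)


for width in (28, 64, 128, 192):
    print(width, "bits ->", flits_per_packet(width), "flits")
```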

  24. Throughput-Power Gains in 2D Mesh • EB network improvement: • Same power: 10% higher throughput • Same throughput: 12% lower power

  25. Throughput-Area Gains in 2D Mesh • 2% improvement for EB networks

  26. Latency-Throughput in 2D Mesh • Zero-load latency is equal for EB and VC networks

  27. Power Breakdown: No Input Buffer Power

  28. Area Breakdown: No Input Buffer Area

  29. Router RTL Implementation • EB router: no buffers, VCs, allocators, or credits • VC router had look-ahead routing • VC router buffers: FF arrays, 2 VCs with 8 slots each • 45nm LP-CMOS, worst-case • Mesh, 5x5 routers, dimension-order routing (DOR), 64-bit datapath
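Dimension-order routing in a 2D mesh can be sketched in a few lines: a packet first corrects its X coordinate, then its Y coordinate, then exits at the local port. The port names, the coordinate convention (Y grows toward NORTH), and the helpers `xy_route` and `path` are hypothetical; the five outputs correspond to a mesh router with four neighbor ports plus a local port.

```python
STEP = {"EAST": (1, 0), "WEST": (-1, 0), "NORTH": (0, 1), "SOUTH": (0, -1)}


def xy_route(cur, dst):
    """Output port chosen by XY dimension-order routing at router cur."""
    (cx, cy), (dx, dy) = cur, dst
    if dx > cx:
        return "EAST"
    if dx < cx:
        return "WEST"
    if dy > cy:
        return "NORTH"
    if dy < cy:
        return "SOUTH"
    return "LOCAL"  # arrived: eject to the attached terminal


def path(src, dst):
    """Sequence of output ports a packet takes from src to dst."""
    hops, cur = [], src
    while (port := xy_route(cur, dst)) != "LOCAL":
        hops.append(port)
        cur = (cur[0] + STEP[port][0], cur[1] + STEP[port][1])
    return hops


print(path((0, 0), (2, 1)))
```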

  30. Conclusions • EB flow-control uses channels as distributed FIFOs • Removes input buffers from routers • Uses duplicate physical channels instead of VCs • Increases throughput per unit power by up to 12% with low-swing channels • The gain depends on what fraction of the overall cost the input buffers constitute • Reduces router cycle time by 18% • The choice of flow-control depends on design parameters and priorities

  31. Thanks for your attention Questions?
