A Cost Effective Centralized Adaptive Routing for Networks on Chip

Ran Manevich, Israel Cidon, Avinoam Kolodny, Isask’har (Zigi) Walter and Shmuel Wimer, Technion – Israel Institute of Technology, QNoC Research Group

Global traffic information is essential to make the right decision!

Adaptive Routing in NoCs – Local vs. Global Information • [Figure: 2D mesh NoC with links marked as low, medium or high congestion. A packet is routed from the source in the upper-left corner to the destination in the bottom-right corner using local congestion information, and the same packet is routed using global information.]

Route Selection – ATDOR • ATDOR – Adaptive Toggle Dimension Ordered Routing • Keep it simple! Centralized selection between XY and YX routing. • The option whose bottleneck link is less congested is preferred. • Routing tables at the sources: one bit per destination.
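The selection rule can be sketched in a few lines of Python. This is a minimal software model, not the hardware described later: it assumes the load of every directed link is available in a dictionary (a stand-in for the traffic load maps), that ties keep XY, and that the routing bit encodes 0 = XY and 1 = YX. All function names (`xy_path`, `yx_path`, `bottleneck`, `select_bit`, `build_table`) are hypothetical.

```python
from typing import Dict, List, Tuple

Node = Tuple[int, int]      # (x, y) position in the 2D mesh
Link = Tuple[Node, Node]    # directed link (from_node, to_node)

def xy_path(src: Node, dst: Node) -> List[Link]:
    """Directed links used by XY routing: travel along X first, then along Y."""
    links: List[Link] = []
    x, y = src
    step = 1 if dst[0] > x else -1
    while x != dst[0]:
        links.append(((x, y), (x + step, y)))
        x += step
    step = 1 if dst[1] > y else -1
    while y != dst[1]:
        links.append(((x, y), (x, y + step)))
        y += step
    return links

def yx_path(src: Node, dst: Node) -> List[Link]:
    """YX routing is XY routing with the two coordinates swapped."""
    def swap(n: Node) -> Node:
        return (n[1], n[0])
    return [(swap(a), swap(b)) for a, b in xy_path(swap(src), swap(dst))]

def bottleneck(path: List[Link], load: Dict[Link, float]) -> float:
    """Load of the most congested link on the path (0.0 for an empty path)."""
    return max((load.get(link, 0.0) for link in path), default=0.0)

def select_bit(src: Node, dst: Node, load: Dict[Link, float]) -> int:
    """Routing bit: 0 = XY, 1 = YX. Ties keep XY (an assumption)."""
    if bottleneck(yx_path(src, dst), load) < bottleneck(xy_path(src, dst), load):
        return 1
    return 0

def build_table(src: Node, mesh_dim: int, load: Dict[Link, float]) -> Dict[Node, int]:
    """Source routing table: one bit per destination."""
    return {(x, y): select_bit(src, (x, y), load)
            for x in range(mesh_dim) for y in range(mesh_dim) if (x, y) != src}

# Usage: a congested eastbound link on the XY path from (0, 0) to (2, 2).
load = {((1, 0), (2, 0)): 0.9}
table = build_table((0, 0), 3, load)
print(table[(2, 2)])
```

Here the flow to (2, 2) gets bit 1 (YX), because its XY path crosses the congested link while its YX path does not. Destinations in the same row as the source get identical XY and YX paths, so their bit stays 0.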

ATDOR Illustration 1 • Five identical flows, 100 MB/s each. • Initial routing: XY. • Links modeled as M/M/1 queues. Delay of a single link: $D = \frac{1}{C - \lambda}$, where $C$ is the link capacity and $\lambda$ is the load on the link. • Link capacity is 210 MB/s.
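To see why the illustration is sensitive to flow placement, the M/M/1 mean delay $1/(C - \lambda)$ can be evaluated for 0 to 3 flows sharing one 210 MB/s link. The sketch below assumes delay is expressed in relative units (only ratios matter here); the helper name `mm1_delay` is hypothetical.

```python
import math

def mm1_delay(load: float, capacity: float) -> float:
    """Mean M/M/1 sojourn time 1/(C - load); infinite when the link is saturated."""
    if load >= capacity:
        return math.inf
    return 1.0 / (capacity - load)

CAPACITY = 210.0   # MB/s, link capacity from the slide
FLOW = 100.0       # MB/s, rate of each flow from the slide

for n in range(4):
    d = mm1_delay(n * FLOW, CAPACITY)
    print(f"{n} flow(s) on the link: relative delay = {d:.4f}")
```

One flow gives $1/110$ and two flows give $1/10$, an 11-fold increase, while three flows (300 MB/s) exceed the link capacity and the queue is unstable. Moving a single flow off a shared link therefore has a large effect on delay.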

Centralized Routing – How? • Option 1 – Continuous calculation of the optimal routing for the active sessions: • Achievable load balancing • Speed and computation complexity • System complexity

Centralized Routing – How? • Option 2 – Iterative serial selection between XY and YX for all source-destination pairs, based on traffic load measurements: • Achievable load balancing • Speed and computation complexity • System complexity

ATDOR illustration 1

What did we just see? • For each flow we: 1. Calculated the better route. 2. Updated the routing table of the source. 3. Waited for the update to take effect and measured the global traffic load. • Performing steps 1-3 separately for each flow is slow and does not scale. • Steps 2 and 3 are therefore unified for all destinations of a single source: • Achievable load balancing • Speed and computation complexity • Scalability
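A minimal sketch of one serial control pass in which the table update (step 2) and the wait-and-measure (step 3) happen once per source rather than once per flow. The route-selection rule and the load measurement are passed in as hypothetical callbacks `select` and `measure`; `control_pass` is likewise a hypothetical name, and routing bits use 0 = XY, 1 = YX.

```python
from typing import Callable, Dict, Hashable, Iterable

Table = Dict[Hashable, int]   # destination -> routing bit (0 = XY, 1 = YX)

def control_pass(
    sources: Iterable[Hashable],
    destinations: Iterable[Hashable],
    tables: Dict[Hashable, Table],
    select: Callable[[Hashable, Hashable, object, int], int],
    measure: Callable[[], object],
) -> int:
    """One serial pass over all sources; returns the number of re-routed flows."""
    dests = list(destinations)
    rerouted = 0
    loads = measure()                            # initial global load snapshot
    for s in sources:
        updated: Table = {}
        for d in dests:
            if d == s:
                continue
            current = tables[s].get(d, 0)
            new = select(s, d, loads, current)   # step 1: better route for flow (s, d)
            if new != current:
                rerouted += 1
            updated[d] = new
        tables[s] = updated                      # step 2: one table write per source
        loads = measure()                        # step 3: one wait-and-measure per source
    return rerouted

# Usage with stub callbacks: every flow is toggled to YX.
calls = []
def fake_measure():
    calls.append(1)
    return {}

tables = {0: {1: 0}, 1: {0: 0}}
n = control_pass([0, 1], [0, 1], tables, lambda s, d, loads, cur: 1, fake_measure)
```

With N sources the sketch performs N + 1 measurements per pass instead of one per flow, which is the scalability gain the slide refers to.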

Back to illustration 1

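Problem #1 • Changing the routing may increase congestion and cause fluctuations. • Solution: change the routing only if the alternative is better by a margin α, 0 < α < 1, i.e., only if the bottleneck load of the alternative route is below α times the bottleneck load of the current route.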
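A sketch of the margin rule, assuming "better by the margin α" means the alternative's maximally loaded link must be below α times the current route's maximally loaded link. The function name `next_route` and the string route labels are hypothetical; the α values are the two used in the control-iteration experiments.

```python
def next_route(current: str, max_xy: float, max_yx: float, alpha: float = 15 / 16) -> str:
    """Switch routes only if the alternative's bottleneck is below alpha * current bottleneck."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    if current == "XY":
        return "YX" if max_yx < alpha * max_xy else "XY"
    if current == "YX":
        return "XY" if max_xy < alpha * max_yx else "YX"
    raise ValueError(f"unknown route {current!r}")

print(next_route("XY", 0.80, 0.70))               # switches: 0.70 < 0.75
print(next_route("XY", 0.80, 0.70, alpha=3 / 4))  # stays: 0.70 is not below 0.60
print(next_route("YX", 0.78, 0.80))               # stays: marginal gain is ignored
```

The last call shows the intended hysteresis: a route that is only slightly better does not trigger a change, so two near-equal alternatives do not keep swapping. A smaller α (3/4) demands a larger improvement and so re-routes fewer flows.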

ATDOR illustration 2

Problem #2 • Coupling among flows that share the same source. • Solution: re-routing counters $C_{I,J}$ count the routing changes of the flow from source I to destination J ($F_{I,J}$). When $C_{I,J}$ reaches a limit $L_{I,J}$, the routing of $F_{I,J}$ is locked. A possible definition of the limits $L_{I,J}$:
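A sketch of per-flow re-routing counters with locking. The definition of the limits $L_{I,J}$ is not recoverable from this transcript, so the sketch takes it as a caller-supplied function; the class name `RerouteCounters` is hypothetical, and when (or whether) counters are reset is not stated, so they are never reset here.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Tuple

Flow = Tuple[int, int]   # (source I, destination J)

class RerouteCounters:
    """Counters C[I,J] of routing changes; flow F[I,J] is locked once C[I,J] reaches L[I,J]."""

    def __init__(self, limit: Callable[[int, int], int]):
        self.limit = limit   # hypothetical stand-in for the slide's L(I,J) definition
        self.count: DefaultDict[Flow, int] = defaultdict(int)

    def locked(self, i: int, j: int) -> bool:
        return self.count[(i, j)] >= self.limit(i, j)

    def record_reroute(self, i: int, j: int) -> bool:
        """Count a routing change of F[I,J]; return False (change refused) if locked."""
        if self.locked(i, j):
            return False
        self.count[(i, j)] += 1
        return True

# Usage with a uniform limit of 2 changes per flow (an arbitrary example value).
counters = RerouteCounters(lambda i, j: 2)
results = [counters.record_reroute(0, 5) for _ in range(3)]
```

The third change of flow (0, 5) is refused, while other flows from the same source remain free to change; locking individual flows is what breaks the coupling between flows of one source.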

Back to illustration 2

Bringing it all together

Centralized Adaptive Routing for NoCs – Architecture • Local traffic load measurements inside the routers. • Aggregation of the load measurements into Traffic Load Maps. • Routing control.

Load Measurements Aggregation • An illustration of aggregation of load values in a 4X4 2D mesh. • A congestion value is written to each traffic load map every clock cycle.

ATDOR – Route Selection Circuit • Combinational, pipelined implementation. • The maximally loaded links of the two alternatives are compared. Next route: • One result every ATDOR clock cycle.

Hardware Requirements • The whole mechanism was implemented on a Xilinx Virtex-5 xc5vlx50t FPGA. • Area estimated for the 45 nm technology node. • Per-router hardware overhead (in %) for a NoC with typical-size (50 KGates) virtual-channel routers.

Average Packet Delay – Uniform Traffic • Average delay vs. average link load, normalized to link capacity. 8X8 2D Mesh. Uniform traffic pattern.

Average Packet Delay – Transpose Traffic • Average delay vs. average link load, normalized to link capacity. 8X8 2D Mesh. Transpose traffic pattern.

Average Packet Delay – Hotspot Traffic • Average delay vs. average link load, normalized to link capacity. 8X8 2D Mesh. 4-hotspot traffic pattern.

Control Iteration Duration • Number of re-routed flows vs. time. • 8X8 2D Mesh, ATDOR clock of 100 MHz. • Curves for α = 15/16 and α = 3/4.
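A back-of-the-envelope bound on how long the route-selection circuit needs to evaluate every flow once. It assumes one source-destination pair is evaluated per ATDOR clock cycle (reading "one result every ATDOR clock cycle" that way) and ignores the time spent waiting for load measurements to settle, so it is only a lower bound; the function name is hypothetical.

```python
def pass_lower_bound_us(mesh_dim: int, clock_hz: float) -> float:
    """Microseconds to evaluate all ordered source-destination pairs once."""
    nodes = mesh_dim * mesh_dim
    flows = nodes * (nodes - 1)       # ordered (source, destination) pairs
    return flows / clock_hz * 1e6     # assumption: one pair per ATDOR cycle

print(pass_lower_bound_us(8, 100e6))
```

For an 8X8 mesh there are 64 × 63 = 4032 flows; at 100 MHz (10 ns per cycle) that is about 40.3 µs of pure selection time per pass.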

CMP DNUCA – Architecture • 8X8 CMP DNUCA (Dynamic Non-Uniform Cache Architecture) with 8 CPUs and 56 cache banks:

CMP DNUCA – Saturation Throughput • Saturation throughput for SPLASH-2 and PARSEC benchmarks on an 8X8 CMP DNUCA with 8 CPUs and 56 cache banks:

Conclusions • Centralized adaptive routing is feasible for NoCs. • ATDOR: centralized selection between XY and YX for each source-destination pair. • Hardware overhead: <4% of a typical 8X8 NoC. • Average saturation throughput improvement:
