Best of Both Worlds: A Bus-Enhanced Network on-Chip (BENoC)

Ran Manevich, Isask’har (Zigi) Walter, Israel Cidon, and Avinoam Kolodny • Technion – Israel Institute of Technology • May 2009

Network on-Chip: the Good News • Interconnect for SoCs, CMPs and FPGAs • Multi-hop, packet-based communication • Efficient resource sharing • Scalable performance and efficiency in: • Power • Area • Design productivity [Figure label: System Bus]

Network on-Chip: the Bad News • Increased and hard-to-predict latency due to multi-hop routing and sharing • Time-critical signals • Broadcast? Multicast? • No easy solutions • Slow (tens of cycles) • “I wish I had a bus at hand…”

Solution: Bus-Enhanced NoC (BENoC) • Bus re-introduced as a NoC “add-on” • Use bus for short meta-data • Low bandwidth, low latency • Broadcast, multicast • Use NoC for data • Optimized for high bandwidth • Overhead should be justified! [Figure: grid of routers (R) and modules]

Related Work • In-band support of time-critical communication, and in-band multicast/broadcast • Complex router implementation • Suffers from multi-hop latency • Existing bus-NoC hybrids • Form a topological hierarchy • Bus typically used for local communication [Figure: routers (R) and modules]

BENoC Services • Fast unicast and multicast signaling • CMP cache example • Anycast • Find resources that fulfill certain conditions • E.g., “Looking for an idling DSP” or “Where are the 5 closest multipliers?” • Convergecast • Efficient collection of feedback back to the initiator • Barrier synchronization, …
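The anycast and convergecast services above can be sketched in a few lines of Python. This is only an illustrative model, not the BENoC protocol: the `Module` record, the grid coordinates, the Manhattan-distance ranking, and the function names `anycast` and `convergecast_all` are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Module:
    # Hypothetical module descriptor; the fields are assumptions for this sketch.
    name: str
    kind: str
    idle: bool
    x: int
    y: int


def anycast(modules, requester, predicate, k=1):
    """Return the k modules matching `predicate` that are closest to `requester`.

    Models a query such as "looking for an idling DSP" being broadcast on the
    bus; closeness is Manhattan distance on an assumed grid.
    """
    matches = [m for m in modules if m is not requester and predicate(m)]
    matches.sort(key=lambda m: (abs(m.x - requester.x) + abs(m.y - requester.y), m.name))
    return matches[:k]


def convergecast_all(flags):
    """Combine per-module feedback into one answer for the initiator.

    For a barrier, the answer is "has every module arrived?".
    """
    return all(flags)


modules = [
    Module("cpu0", "cpu", False, 0, 0),
    Module("dsp0", "dsp", True, 3, 3),
    Module("dsp1", "dsp", True, 1, 0),
    Module("dsp2", "dsp", False, 0, 1),
]
nearest_idle_dsp = anycast(modules, modules[0], lambda m: m.kind == "dsp" and m.idle)
```

Asking for `k=5` with a predicate on multipliers would correspond to the “5 closest multipliers” query.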

Additional BENoC Applications • NoC control • Router configuration • E.g., routing table configuration • Adapt NoC routing for load balancing • Fault discovery and recovery • System control • Power management • Resource load balancing • Debug

Outline • Introduction • MetaBus architecture • MetaBus latency and energy analysis • CMP cache use case

Conventional System Buses Figure is copied from “Amba Specifications Rev 2.0” - http://www.arm.com/products/solutions/AMBA_Spec.html • Bandwidth optimized • Poor scalability • Not suitable for tasks in BENoC

MetaBus Design Requirements • Low area, low power • Low bandwidth • Low latency • Simple • Versatile • Scalable • Multicast and broadcast support • Acknowledgement [Figure: grid of routers (R) and modules with a “MetaBus”]

MetaBus Architecture • Many possible implementations • Example: tree topology with distributed arbitration [Figure: tree with a Root, BusStations, and Module#1 to Module#9]

Data Path [Figure: tree with a Root, BusStations, and Module#1 to Module#9; legend: data to root, data to receivers]

Example: Broadcast of Two Words • Address word propagates to the root • Data word 1 propagates to the modules • Data word 2 [Figure: tree with a Root, BusStations, and Module#1 to Module#9]
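A minimal sketch of the up-then-down flow in this example follows. The small tree (`S1`, `S2`, `M1` to `M4`) is hypothetical and smaller than the nine-module tree on the slide, and the choice to deliver to every module except the sender is an assumption of the sketch.

```python
# Hypothetical tree, child -> parent. Not the topology from the slide.
PARENT = {
    "S1": "Root", "S2": "Root",
    "M1": "S1", "M2": "S1",
    "M3": "S2", "M4": "S2",
}


def path_to_root(node):
    """List the nodes visited when a word climbs from `node` to the root."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def leaf_modules():
    """Modules are the nodes that are nobody's parent."""
    parents = set(PARENT.values())
    return sorted(n for n in PARENT if n not in parents)


def broadcast(sender, data_words):
    """Return the upward hops of the address word and the words each module receives."""
    up = path_to_root(sender)
    upward_hops = list(zip(up, up[1:]))  # address word climbs to the root
    # Data words then fan out from the root to the modules (assumed: all but the sender).
    delivered = {m: list(data_words) for m in leaf_modules() if m != sender}
    return upward_hops, delivered


hops, received = broadcast("M3", ["word1", "word2"])
```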

Distributed Arbitration Mechanism [Figure: tree with a Root, BusStations, and Module#1 to Module#3; legend: bus request, bus grant]
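One way to picture distributed arbitration on a tree is that each bus station forwards at most one request upward and the grant follows the chosen path back down. The sketch below assumes a fixed-priority policy (the first requesting child wins) and a hypothetical topology; the slides do not state the actual policy.

```python
# Hypothetical tree, parent -> children in priority order.
CHILDREN = {
    "Root": ["S1", "S2"],
    "S1": ["M1", "M2"],
    "S2": ["M3"],
}


def arbitrate(node, requesting):
    """Return the module in `node`'s subtree that wins the bus, or None.

    Each station looks only at its own children, so the decision is made
    locally at every level (assumed policy: fixed priority by child order).
    """
    if node not in CHILDREN:  # leaf: a module
        return node if node in requesting else None
    for child in CHILDREN[node]:
        winner = arbitrate(child, requesting)
        if winner is not None:
            return winner  # this child's request is forwarded upward
    return None


granted = arbitrate("Root", {"M2", "M3"})
```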

Masking Saves Power • Unicast from Module#3 to Module#5 • Address word propagates to the root • Data word 1 propagates to the modules [Figure: tree with a Root, BusStation 1 to BusStation 5, and Module#1 to Module#9; mask bits Mask1 to Mask5 = 1, 0, 1, 0, 1 (10101)]
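The power saving from masking can be illustrated by counting how many downward links are driven. The sketch below assumes a hypothetical four-module tree and one enable bit per child link; it does not reproduce the slide's topology or its exact mask encoding.

```python
# Hypothetical tree, parent -> children. Not the topology from the slide.
CHILDREN = {
    "Root": ["S1", "S2"],
    "S1": ["M1", "M2"],
    "S2": ["M3", "M4"],
}


def subtree_modules(node):
    """Set of modules reachable below `node` (a module is its own subtree)."""
    if node not in CHILDREN:
        return {node}
    found = set()
    for child in CHILDREN[node]:
        found |= subtree_modules(child)
    return found


def masks(destinations):
    """One enable bit per child link: 1 only if that subtree holds a destination."""
    return {
        station: [int(bool(subtree_modules(child) & destinations)) for child in kids]
        for station, kids in CHILDREN.items()
    }


def driven_links(mask_table):
    """Number of downward links that actually toggle for one data word."""
    return sum(sum(bits) for bits in mask_table.values())


unicast_masks = masks({"M4"})
broadcast_masks = masks({"M1", "M2", "M3", "M4"})
```

In this toy tree a unicast drives 2 links while a broadcast drives all 6, which is the effect the slide title refers to.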

(Binary) Bus Station

MetaBus Floorplan – An Example • 64 modules balanced binary MetaBus
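For a balanced binary tree, the size of such a MetaBus follows directly from the module count. The helper below is a sketch with hypothetical names; it counts internal nodes (root plus bus stations) and the worst-case number of links a word crosses going up to the root and back down.

```python
def balanced_binary_tree_stats(num_modules):
    """Depth, internal node count, and worst-case link hops for a balanced binary tree."""
    if num_modules < 2 or num_modules & (num_modules - 1):
        raise ValueError("num_modules must be a power of two, at least 2")
    depth = num_modules.bit_length() - 1  # exact log2 for powers of two
    return {
        "depth": depth,                     # links from a module up to the root
        "internal_nodes": num_modules - 1,  # root plus bus stations
        "max_hops": 2 * depth,              # up to the root, then down to a module
    }


stats = balanced_binary_tree_stats(64)
```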

Outline • Introduction • MetaBus architecture • MetaBus Latency and energy analysis • CMP cache use case

Analysis Highlights 1/4 • NoC Broadcast+Unicast Energy/Transaction:

Analysis Highlights 2/4 • MetaBus Broadcast and Unicast Energy/Transaction:

Analysis Highlights 3/4 • NoC unicast and broadcast latency:

Analysis Highlights 4/4 • MetaBus unicast and broadcast latency:

Results - Energy Consumption • Energy consumption of broadcast and unicast transactions with 3 data words • Setup: 10×10 mm chip, 64-module mesh, 1 GHz NoC clock, speed-optimized bus @ 0.18 µm [Figure: bus and NoC unicast and broadcast energy per transaction]

Results - Latencies • Latencies of broadcast and unicast transactions with 3 data words in a system with a frequency and a speed optimized MetaBus • Setup: 10×10 mm chip, 64-module mesh, 1 GHz NoC clock, speed-optimized bus @ 0.18 µm [Figure 9: Bus and NoC broadcast latencies]

Outline • Introduction • MetaBus architecture • MetaBus Latency and energy analysis • CMP cache use case

Dynamic Non-Uniform Cache Access • Split large cache into independent smaller banks • Non-uniform cache access time (NUCA) • Cache lines are moved to shorten access time • Dynamic NUCA • Before fetching a line into its L1$, a CPU needs to find the L2 cache bank storing the line [Figure: CMP (Chip Multi Processor) with CPUs, L1$ caches, and L2$ banks]
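The search step in dynamic NUCA can be sketched as a query that every L2 bank answers from its own contents, followed by an optional migration of the line. The bank representation (a list of address sets), the function names, and the assumption that a line lives in at most one bank are all illustrative choices, not details from the slides.

```python
def locate_line(banks, address):
    """Return the index of the L2 bank holding `address`, or None on a miss.

    Models a broadcast query: every bank checks only its own contents.
    """
    holders = [i for i, lines in enumerate(banks) if address in lines]
    return holders[0] if holders else None


def migrate_line(banks, address, target_bank):
    """Move a line to `target_bank` (e.g. a bank nearer the CPU); return the old bank."""
    source = locate_line(banks, address)
    if source is None:
        raise KeyError(address)
    banks[source].discard(address)
    banks[target_bank].add(address)
    return source


banks = [set(), {0x40}, {0x80}, set()]  # four hypothetical L2 banks
found_in = locate_line(banks, 0x80)
old_bank = migrate_line(banks, 0x80, 0)
```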

Simulation Setup • 16 processors, 64 L2 cache banks • PARSEC and SPLASH-2 benchmarks • Vanilla wormhole NoC • Simulation accounts for bus latency, arbitration time, etc.

Simulation Results Performance improvement in BENoC compared to a NoC-based CMP (a) average read transaction latency; (b) application speed

Summary • Current NoCs are largely distributed • Borrowing concepts from off-chip networks • On-chip environment provides an opportunity • Enhancing the network with a bus gives the best of both worlds • Advanced services are easily supported • Anycast, management and control • Cost effective • Power and performance • Analysis and simulation

Bus-Enhanced NoC • Thank you! Questions? zigi@tx.technion.ac.il • QNoC Research Group
