NETWORK ON CHIP ROUTER
Final presentation part B
Students: Itzik Ben-shushan, Jonathan Silber
Instructor: Isaschar Walter
Spring 2006
Agenda
• NoC concept
• NoC’s Router design architecture & implementation (short review of part A)
• NoC design
• NoC demonstration and waveforms
NoC Concept
Problem: With a single shared-bus interconnect, power, size and performance become impractical for multi-processor chips
Solution: A Network-on-Chip (NoC) based interconnect: fast, reliable data transfer and low power consumption
Project Goals
• Implement a router for a NoC in VHDL, based on research by faculty members
• Design and implement an interface unit between the NoC routers and the processing units
• Design and implement a NoC-based system connected to several processing units
Network Scheme
[Figure: NoC built from nine routers; each router connects through an interface to a processing unit]
Network problems
The NoC/Router should deal with these issues:
• Transmitting large data (buffer size, latency)
• Loss of data due to a full buffer
• Data priority
• Data from several inputs to one output
• A blocked path prevents data transmission
Data in network
• Packet-based data flow
• Single-flit or multi-flit packets (wormhole)
• Service level support (packet priority)
• Virtual channel support (router feature)
Packets in Wormhole architecture
• Each packet is divided into several smaller segments (flits)
• Each flit is several bits wide
[Figure: a whole packet, made up of Target Address, Command and Packet body]
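The segmentation described on this slide can be sketched in a few lines of Python. This is only an illustration, not the project's VHDL: the function name `split_into_flits` and the `words_per_flit` parameter are hypothetical, and the type labels BODY, EP and FP are borrowed from the flit data types listed later in the deck.

```python
from typing import List, Tuple


def split_into_flits(payload: List[int], words_per_flit: int = 1) -> List[Tuple[str, List[int]]]:
    """Split a packet payload into flits and label each flit with its type.

    Assumption: a single-flit packet is labelled FP, the last flit of a
    multi-flit packet is labelled EP, and every earlier flit is labelled BODY.
    """
    if not payload:
        raise ValueError("a packet needs at least one payload word")
    if words_per_flit < 1:
        raise ValueError("words_per_flit must be positive")

    # Cut the payload into fixed-size chunks; the last chunk may be shorter.
    chunks = [payload[i:i + words_per_flit] for i in range(0, len(payload), words_per_flit)]

    if len(chunks) == 1:
        return [("FP", chunks[0])]
    return [("BODY", chunk) for chunk in chunks[:-1]] + [("EP", chunks[-1])]
```

For example, a three-word payload with one word per flit yields two BODY flits followed by one EP flit, while a one-word payload yields a single FP flit.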
Interface Between Bus & Router
[Figure: process units sit on a local bus; the interface translates the “local bus protocol” into the “network language”, converting bus transactions to packets (wormhole architecture) sent to the router]
Design Architecture
Router 5x5
[Figure: router with North, East, South and West ports plus a module port to the processing unit through its interface; input ports feed the crossbar, and each output port passes data and control to the next router]
Flit structure (one flit, from MSB to LSB)
• Service Level: 2 bits
• Type: 2 bits
• Data: 12 bits, or 8 bits in the first flit of a packet
• CRT: 4 bits (network size factor), first flit of a packet only
The 4 MSBs of every flit
Service Level (SL):
• Block Transfer = “00” (not yet implemented)
• Real-Time = “01” (not yet implemented)
• RD/WR = “10” (implemented as SL2)
• Signaling = “11” (implemented as SL1)
Data Type:
• Idle = “00” – no signal
• Body = “01” – middle/header flit of a packet
• EP = “10” – end of packet
• FP = “11” – full packet (single-flit packet)
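As a hedged illustration of the encoding above, the following Python sketch packs and unpacks a 16-bit flit. It assumes the bit order read off the flit structure slide (Service Level in bits 15:14, Type in bits 13:12, then 12 data bits, or 8 data bits followed by a 4-bit CRT in the first flit). The names `pack_flit` and `unpack_header` are hypothetical and do not come from the project's VHDL.

```python
from typing import Optional, Tuple

# Codes taken from the slide; the dictionary keys are illustrative names.
SERVICE_LEVELS = {"BLOCK_TRANSFER": 0b00, "REAL_TIME": 0b01, "RD_WR": 0b10, "SIGNALING": 0b11}
FLIT_TYPES = {"IDLE": 0b00, "BODY": 0b01, "EP": 0b10, "FP": 0b11}


def pack_flit(service_level: str, flit_type: str, data: int, crt: Optional[int] = None) -> int:
    """Build a 16-bit flit. Pass crt only for the first flit of a packet."""
    if crt is None:
        # Assumed layout for non-first flits: 12 data bits in the low bits.
        if not 0 <= data < (1 << 12):
            raise ValueError("data must fit in 12 bits")
        payload = data
    else:
        # Assumed layout for the first flit: 8 data bits, then 4 CRT bits at the LSB end.
        if not 0 <= data < (1 << 8):
            raise ValueError("first-flit data must fit in 8 bits")
        if not 0 <= crt < (1 << 4):
            raise ValueError("CRT must fit in 4 bits")
        payload = (data << 4) | crt
    return (SERVICE_LEVELS[service_level] << 14) | (FLIT_TYPES[flit_type] << 12) | payload


def unpack_header(flit: int) -> Tuple[int, int]:
    """Return (service level code, type code) from the 4 MSBs of a flit."""
    return (flit >> 14) & 0b11, (flit >> 12) & 0b11
```

Under these assumptions, `pack_flit("RD_WR", "BODY", 0x123)` gives `0x9123`, and `unpack_header` recovers the codes `0b10` and `0b01` from it.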
Router design architecture guidelines
• Input port serves as a register-buffer
• Crossbar designed with full connectivity (all inputs to all outputs)
• Output port recognizes data waiting in an input port buffer to be delivered, and extracts it (a read command from an output port goes to one or more inputs)
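One way to picture the full-connectivity guideline is as a request matrix: every output port sees which input ports hold data destined for it. The Python sketch below is an illustration under assumptions, not the router's actual logic; `build_requests` and its arguments are hypothetical names, and the target output of each input is assumed to be already known.

```python
from typing import List, Optional


def build_requests(head_targets: List[Optional[int]], num_ports: int = 5) -> List[List[bool]]:
    """Return requests[output][input], True where that input waits for that output.

    head_targets[i] is the output port wanted by the flit at the head of
    input port i, or None if that input buffer is empty.
    """
    requests = [[False] * num_ports for _ in range(num_ports)]
    for input_port, target in enumerate(head_targets):
        if target is not None:
            requests[target][input_port] = True
    return requests
```

With `head_targets = [2, None, 2, 0, None]`, output 2 sees requests from inputs 0 and 2, output 0 sees a request from input 3, and the other outputs see none.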
Routing Control & Terminology
• Each Service Level (SL) has an independent path from the input port to the output port buffer
• According to the Current Routing Table (CRT), data is channeled to the appropriate output port via the crossbar
• In each output port, the Currently Serviced Input Port (CSIP) is determined by round-robin logic, which arbitrates between the input ports waiting for transmission
• The Next Buffer State (NBS) determines whether the next router is ready to receive data
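The round-robin choice of the CSIP can be sketched as follows. This is a generic round-robin arbiter written in Python for illustration. The class name `RoundRobinArbiter` is hypothetical, and details such as which input is served first after reset are assumptions rather than facts about the project's VHDL.

```python
from typing import Optional, Sequence


class RoundRobinArbiter:
    """Pick the next requesting input, starting just after the last one served."""

    def __init__(self, num_inputs: int = 5) -> None:
        self.num_inputs = num_inputs
        # Assumption: start so that input 0 is checked first after reset.
        self.last_served = num_inputs - 1

    def grant(self, requests: Sequence[bool]) -> Optional[int]:
        """Return the granted input port (the new CSIP), or None if nobody requests."""
        for offset in range(1, self.num_inputs + 1):
            candidate = (self.last_served + offset) % self.num_inputs
            if requests[candidate]:
                self.last_served = candidate
                return candidate
        return None
```

If inputs 0 and 2 keep requesting, successive calls to `grant` return 0, 2, 0, and so on. The sketch arbitrates on every call; in a wormhole router the grant would typically be held until the end-of-packet flit has passed, and that behavior is left out here.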
Router Input Port (Current Routing Table)
[Figure: flits arrive from the previous router in the path or from the processing unit; SL switching stores the data in a per-service-level FIFO with its CRT; read lines from the output ports and read/write control drive the buffers; data goes to the crossbar and buffer credits are returned]
Crossbar: example for 2 Service Levels
[Figure: for each service level (SL1, SL2), DATA + CRT from the input port is routed according to the CRT; the output port takes the DATA of the input selected by the CSIP]
Router Output Port (Currently Serviced Input Port, Next Buffer State)
[Figure: per-service-level data arrives from the crossbar; round-robin and control logic holds the CSIP and NBS, drives the read lines to the input ports, switches between SLs and sends flits to the next router in the path or to the processing unit; buffer credits are received from it]
Part A – Final Router Design
• Modular 5x5 router, 16-bit data width
• 2 Service Levels (SL1, SL2)
• Receiving/transmitting multi-flit packets (not only FP*)
* FP = Full Packets
NoC’s Router top view
[Figure: top view showing the input port, crossbar and output port]
Simulation conclusions (Part A – Router only)
• Latency = 3 clock cycles for a flit
• Throughput = 1 flit output on each clock cycle (no wasted cycles)
• No added latency at the output when switching between SLs or between inputs
• Router can handle receiving & transmitting of multi-flit packets of different SLs, from and to different inputs & outputs simultaneously
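The two figures above combine into a simple back-of-the-envelope estimate. The sketch below assumes an ideal, contention-free path, ignores link delay, and treats the 3-cycle latency as the time from a flit entering a router to leaving it. These are assumptions for illustration, not measurements from the project, and `packet_latency_cycles` is a hypothetical helper.

```python
def packet_latency_cycles(num_flits: int, num_routers: int = 1, router_latency: int = 3) -> int:
    """Estimate cycles until the last flit of a packet leaves the last router.

    The first flit needs router_latency cycles per router; the remaining
    flits follow back to back at 1 flit per cycle (wormhole pipelining).
    """
    if num_flits < 1 or num_routers < 1:
        raise ValueError("need at least one flit and one router")
    return router_latency * num_routers + (num_flits - 1)
```

Under these assumptions a 4-flit packet crossing one router takes 3 + 3 = 6 cycles, and the same packet crossing two routers takes 9 cycles.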
Part B Goals
• Implement the router on a Virtex-II Pro
• Validate the design and extract its parameters (area, timing) – Done
• Ramp up understanding of the PPC or MicroBlaze architecture for connecting user logic, in order to build a QNoC-based application on the Virtex-II Pro – Done
• Design, implement & simulate the interface – Done
• Implement a QNoC-based application on an FPGA for validation of the NoC – Done
• Add virtual channels to the existing router – Not implemented
NoC Implementation
System Implementation
System components:
• 2x2 NoC; each router on the network is connected to a processing unit in order to validate the NoC’s activity
• PPC connected to the NoC IP through the OPB IPIF module
• ChipScope ILAs for signal checking and analysis
NoC based Multi-core scheme
[Figure: Xilinx XUP Virtex™-II Pro Development System; a NoC of four part-A routers connected to processing modules, with the PPC & peripherals attached as just another module; PC user interface through UART and ChipScope Analyzer]
Embedded system diagram (SOPC)
[Figure: PPC with IPLB and DPLB connections to the PLB; PLB_BRAM on the PLB; PLB2OPB bridge to the OPB; OPB_UART and the custom peripheral, attached through the IPIF (IPIC), on the OPB]
Custom peripheral connection
• The IP core is reset through the IPIC by a software command
• Write and read FIFOs carry data transactions between the IP core and the IPIC
IP CORE – NoC Design
[Figure: 2x2 NoC with modules connected to it; the interface to the PPC appears as another module connected to the NoC]
NoC – synthesis parameters
• Tight timing and over-used resources: the chip runs out of resources
• Let’s try to make the NoC a bit “slim” by reducing the number of SLs
NoC 1SL – synthesis parameters
• Good timing parameters, and the chip has the resources for it
• Timing is tight, but within the PPC limits
IPIF2NoC Interface
Considerations:
• WR/RDFIFO communication protocol
• Buffer credit calculation
• Wormhole architecture
WRFIFO read protocol (implemented in VHDL text)
• WRFIFO write is done by the IPIC
RDFIFO write protocol (implemented in VHDL text)
• WrReq is asserted only for valid (≠0) data
• RDFIFO read is done by the IPIC
IPIF RDFIFO protocol
• WrReq is asserted only for existing valid data (≠0, according to the NoC/Router protocol)
IPIF RDFIFO Buffer state (router initialized to n-1 places left in the RDFIFO)
• Calculates whether the RDFIFO is not full, so the output port can transmit
Router Input Buffer State (calculates the Full state for the WRFIFO)
• Calculates the input-port NBS; if it is not full, the WRFIFO can transmit
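The buffer-state calculations on the last two slides follow the usual credit-counting pattern, which can be sketched in Python as below. The class name `CreditCounter` and its method names are hypothetical. The only detail taken from the slides is that the sender starts with n-1 free places for a FIFO of depth n; the rest is a generic illustration, not the project's VHDL.

```python
class CreditCounter:
    """Track how many free places the sender believes the receiving FIFO has."""

    def __init__(self, fifo_depth: int) -> None:
        if fifo_depth < 2:
            raise ValueError("fifo_depth must be at least 2")
        # Per the slide: initialize to n-1 places left for a FIFO of depth n.
        self.max_credits = fifo_depth - 1
        self.credits = self.max_credits

    def can_send(self) -> bool:
        """True while the receiver is known to have room (buffer not full)."""
        return self.credits > 0

    def on_send(self) -> None:
        """Call when a flit is transmitted; consumes one credit."""
        if not self.can_send():
            raise RuntimeError("no credits left: receiver buffer may be full")
        self.credits -= 1

    def on_credit_return(self) -> None:
        """Call when the receiver reports that one place was freed."""
        if self.credits >= self.max_credits:
            raise RuntimeError("more credits returned than were consumed")
        self.credits += 1
```

With a depth of 4, three flits can be sent before `can_send()` turns false; one returned credit allows one more flit.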
Optional future enhancements
• Debug the router design
• Add virtual channels to the router architecture
• Add wormhole architecture to the interface architecture
NoC Implementation conclusions and summary
The NoC, built on the router developed in part A, was simulated and synthesized, and works well
• Latency and throughput as expected from the router design
• Multi-flit packets are supported
• Unexpected results when switching between ports and between different SLs, as opposed to the excellent behavior in simulations
• The implementation of the part-A router took more logic area than expected
Project conclusions
• Learned a lot from the NoC project about NoC and router concepts and architectures
• Learned a lot about designing a whole system, rather than looking at one object at a time
• Improved our design and VHDL capabilities, and gained experience with “real world” hardware
• Debug tools aren’t very friendly for the first-time user, and design references should be more readily available
Design Demonstration
QNoC based multi-core system
[Figure: Xilinx XUP Virtex™-II Pro Development System; Router 00 (address 0000), Router 01 (address 0100), Router 10 (address 0001) and Router 11 (address 1010) connect to a Calculator, an Inverter (Buffer, XOR with ‘1’), on-board memory and the interface to the PPC through the OPB IPIF (IPIC); the PPC runs pre-defined code over the PLB and OPB with a UART; ChipScope control + ILAs; PC user interface through UART and ChipScope Analyzer]
Calculator waveform – Full