
Networks-on-Chip

Networks-on-Chip Ben Abdallah Abderazek The University of Aizu, Graduate School of Computer Science and Eng. Adaptive Systems Laboratory, E-mail: benab@u-aizu.ac.jp 03/01/2010


Presentation Transcript


  1. Networks-on-Chip Ben Abdallah Abderazek The University of Aizu, Graduate School of Computer Science and Eng. Adaptive Systems Laboratory, E-mail: benab@u-aizu.ac.jp Hong Kong University of Science and Technology, March 2010 03/01/2010

  2. Part I • Application Requirements • Network on Chip: A Paradigm Shift in VLSI • Critical Problems Addressed by NoC • Traffic Abstractions • Data Abstraction • Network Delay Modeling

  3. Application Requirements • Signal processing: hard real time, very regular load, high quality, typically on DSPs • Media processing: hard real time, irregular load, high quality, SoC/media processors • Multimedia: soft real time, irregular load, limited quality, PC/desktop. Very challenging!

  4. What Does the Internet Need? ASICs are large, expensive to develop, and not flexible; general-purpose RISC is not capable enough. SoC, MCSoC? A huge and increasing amount of packets and routing, packet classification, encryption, QoS, new applications and protocols, etc. Requirements: • High processing power • Support for wire speed • Programmable • Scalable • Specialized for network applications

  5. Example: Network Processor (NP), the IBM PowerNP • 16 pico-processors and 1 PowerPC • Each pico-processor: • Supports 2 hardware threads • 3-stage pipeline: fetch/decode/execute • Dyadic Processing Unit: • Two pico-processors • 2 KB shared memory • Tree search engine • Focus is layers 2-4 • PowerPC 405 for control-plane operations • 16 KB I- and D-caches • Target is OC-48

  6. Example: Network Processor (NP) • NPs can be applied at various network layers and in various applications • Traditional apps: forwarding, classification • Advanced apps: transcoding, URL-based switching, security, etc. • New apps

  7. Telecommunication Systems and the NoC Paradigm • The trend nowadays is to integrate telecommunication systems on complex multicore SoCs (MCSoCs): • Network processors, • Multimedia hubs, and • Base-band telecom circuits • These applications have tight time-to-market and performance constraints

  8. Telecommunication Systems and the NoC Paradigm • A telecommunication multicore SoC is composed of 4 kinds of components: • Software tasks, • Processors executing the software, • Specific hardware cores, and • A global on-chip communication network

  9. Telecommunication Systems and the NoC Paradigm • A telecommunication multicore SoC is composed of 4 kinds of components: • Software tasks, • Processors executing the software, • Specific hardware cores, and • A global on-chip communication network. This last component is the most challenging part.

  10. Technology & Architecture Trends • Technology trends: • Vast transistor budgets • Relatively poor interconnect scaling • Need to manage complexity and power • Build flexible designs (multi-/general-purpose) • Architectural trends: • Go parallel! • Keep core complexity constant, or simplify • The result is lots of modules (cores, memories, off-chip interfaces, specialized IP cores, etc.)

  11. Wire Delay vs. Logic Delay The ratio of global on-chip communication delay to operation delay is 2:1, growing to 9:1 in 2010. Ref: W. J. Dally, HPCA panel presentation, 2002

  12. Communication Reliability • Information transfer is inherently unreliable at the electrical level, due to: • Timing errors • Cross-talk • Electromagnetic interference (EMI) • Soft errors • The problem will get increasingly worse as technology scales down

  13. Evolution of on-chip communication

  14. Traditional SoC nightmare [Figure: CPU, DSP, and DMA blocks with control signals on a CPU bus, bridged to a peripheral bus with IO blocks] • Variety of dedicated interfaces • Design and verification complexity • Unpredictable performance • Many underutilized wires

  15. Network on Chip: A Paradigm Shift in VLSI From: dedicated signal wires. To: a shared network of point-to-point links, with computing modules attached to network switches.

  16. NoC essentials [Figure: modules connected through a network of switches] • Communication by packets of bits • Routing of packets through several hops, via switches • Efficient sharing of wires • Parallelism

  17. Characteristics of a paradigm shift • Solves a critical problem • Step-up in abstraction • Design is affected: • Design becomes more restricted • New tools • The changes enable higher complexity and capacity • Jump in design productivity

  18. Characteristics of a paradigm shift We will look at the problem addressed by NoC. • Solves a critical problem • Step-up in abstraction • Design is affected: • Design becomes more restricted • New tools • The changes enable higher complexity and capacity • Jump in design productivity

  19. Origins of the NoC concept The idea was talked about in the 90s, but actual research came in the new millennium. Some well-known early publications: • Guerrier and Greiner (2000), “A generic architecture for on-chip packet-switched interconnections” • Hemani et al. (2000), “Network on chip: An architecture for billion transistor era” • Dally and Towles (2001), “Route packets, not wires: on-chip interconnection networks” • Wingard (2001), “MicroNetwork-based integration of SoCs” • Rijpkema, Goossens and Wielage (2001), “A router architecture for networks on silicon” • Kumar et al. (2002), “A Network on chip architecture and design methodology” • De Micheli and Benini (2002), “Networks on chip: A new paradigm for systems on chip design”

  20. Don't we already know how to design interconnection networks? Many existing network topologies, router designs, and theory have already been developed for high-end supercomputers and telecom switches. Yes, and we'll cover some of this material, but the trade-offs on-chip lead to very different designs!

  21. Critical problems addressed by NoC 1) Global interconnect design problem: delay, power, noise, scalability, reliability 2) System integration productivity problem 3) Chip Multi-Processors (key to power-efficient computing)

  22. 1(a): NoC and global wire delay • Long wire delay is dominated by resistance • Add repeaters • Repeaters become latches (with clock frequency scaling) • Latches evolve to NoC routers
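The progression on this slide can be quantified with the Elmore model: an unrepeated wire's delay grows quadratically with length, while splitting it into repeated segments makes it roughly linear. A minimal sketch; the per-mm resistance, capacitance, and repeater delay are illustrative assumptions, not values from the slides.

```python
# Sketch: repeaters turn quadratic wire delay into near-linear delay.
# r, c, and t_rep are illustrative assumptions, not values from the slides.
r = 200.0      # wire resistance per mm (ohm), assumed
c = 0.2e-12    # wire capacitance per mm (F), assumed

def wire_delay(length_mm, segments=1, t_rep=5e-12):
    """Elmore delay of a wire split into equal, repeated segments."""
    seg = length_mm / segments
    wire = segments * 0.5 * (r * seg) * (c * seg)  # 0.5*R*C per segment
    return wire + (segments - 1) * t_rep           # plus repeater delays

# A 10 mm wire: one long run vs. four repeated segments.
unrepeated = wire_delay(10)
repeated = wire_delay(10, segments=4)
```

With these numbers the repeated wire is roughly four times faster; in a NoC the repeater sites become latches and, eventually, routers.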

  23. 1(b): Wire design for NoC • NoC links: • Regular • Point-to-point (no fanout tree) • Can use transmission-line layout • Well-defined current return path • Can be optimized for noise / speed / power • Low swing, current mode, ….

  24. 1(c): NoC scalability • For the same performance, compare wire area and power: • Simple bus: area O(n^3 √n), power O(n√n) • Segmented bus: area O(n^2 √n), power O(n√n) • Point-to-point: area O(n^2 √n), power O(n√n) • NoC: area O(n), power O(n)
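The asymptotic comparison above can be checked numerically. A sketch that drops all constants and evaluates how each fabric's wire area grows when the module count quadruples:

```python
# Asymptotic wire-area growth from the slide (constants dropped).
area = {
    "simple bus":     lambda n: n**3 * n**0.5,
    "segmented bus":  lambda n: n**2 * n**0.5,
    "point-to-point": lambda n: n**2 * n**0.5,
    "NoC":            lambda n: n,
}

# Relative area increase when going from 16 to 64 modules:
ratios = {name: f(64) / f(16) for name, f in area.items()}
# NoC grows 4x, while a simple bus grows 4**3.5 = 128x.
```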

  25. 1(d): NoC and communication reliability • Fault tolerance & error correction [Figure: routers linked through micro-modem (UMODEM) interfaces providing error correction, synchronization, ISI reduction, parallel-to-serial conversion, and modulation] A. Morgenshtein, E. Bolotin, I. Cidon, A. Kolodny, R. Ginosar, “Micro-modem – reliability solution for NoC communications”, ICECS 2004

  26. 1(e): NoC and GALS • Modules in a NoC system use different clocks • They may use different voltages • The NoC can take care of synchronization • The NoC design may be asynchronous • No waste of power when the links and routers are idle

  27. 2: NoC and engineering productivity • NoC eliminates ad-hoc global wire engineering • NoC separates computation from communication • NoC supports modularity and reuse of cores • NoC is a platform for system integration, debugging and testing

  28. 3: NoC and CMP [Figure: uniprocessor dynamic power split among gate, interconnect, and diffusion (Magen et al., SLIP 2004); uniprocessor performance vs. die area (or power)] • Uniprocessors cannot provide power-efficient performance growth • Interconnect dominates dynamic power • Global wire delay doesn't scale • Instruction-level parallelism is limited • Power-efficiency requires many parallel local computations • Chip Multi-Processors (CMP) • Thread-Level Parallelism (TLP)

  29. 3: NoC and CMP • Uniprocessors cannot provide power-efficient performance growth • Interconnect dominates dynamic power • Global wire delay doesn't scale • Instruction-level parallelism is limited • Power-efficiency requires many parallel local computations • Chip Multi-Processors (CMP) • Thread-Level Parallelism (TLP) • A network is a natural choice for CMP!


  31. Why is now the time for NoC? • Difficulty of DSM wire design • Productivity pressure • CMPs

  32. Traffic abstractions [Figure: mesh of processing elements PE1-PE12] • Traffic models are generally captured from actual traces of functional simulation • A statistical distribution is often assumed for messages
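When no trace is available, the statistical abstraction mentioned above is easy to sketch. Below, a hedged example using Poisson arrivals (exponential inter-arrival times) with uniformly chosen source/destination PEs; the function name, rate, and PE count are hypothetical choices, not part of the slides.

```python
import random

def generate_messages(n_msgs, rate, n_pes=12, seed=0):
    """Synthetic traffic: Poisson arrivals, uniform random src/dst pairs."""
    rng = random.Random(seed)
    t, msgs = 0.0, []
    for _ in range(n_msgs):
        t += rng.expovariate(rate)   # exponential inter-arrival time
        src = rng.randrange(n_pes)
        dst = rng.choice([p for p in range(n_pes) if p != src])
        msgs.append((t, src, dst))   # timestamps are non-decreasing
    return msgs

trace = generate_messages(1000, rate=2.0)  # ~2 messages per time unit
```

A trace-driven simulator would consume real functional-simulation traces in the same (timestamp, src, dst) shape.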

  33. Data abstractions

  34. Layers of abstraction in network modeling • Software layers: application, OS • Network & transport layers: • Network topology, e.g. crossbar, ring, mesh, torus, fat tree, … • Switching: circuit / packet switching (SAF, VCT), wormhole • Addressing: logical/physical, source/destination, flow, transaction • Routing: static/dynamic, distributed/source, deadlock avoidance • Quality of Service, e.g. guaranteed-throughput, best-effort • Congestion control, end-to-end flow control • Data link layer: • Flow control (handshake) • Handling of contention • Correction of transmission errors • Physical layer: • Wires, drivers, receivers, repeaters, signaling, circuits, …
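The switching options listed under the network layer differ in zero-load latency: store-and-forward (SAF) buffers the whole packet at every hop, while wormhole pipelines flits so only the header pays the per-hop cost. A sketch of the standard formulas, with assumed units of bits and bits/s:

```python
def saf_latency(hops, packet_bits, bw):
    """Store-and-forward: the whole packet is received at each hop."""
    return hops * (packet_bits / bw)

def wormhole_latency(hops, packet_bits, flit_bits, bw):
    """Wormhole: header flit pays the per-hop cost, body pipelines behind."""
    return hops * (flit_bits / bw) + packet_bits / bw

# 4 hops, 512-bit packet, 32-bit flits, 1 Gb/s links:
t_saf = saf_latency(4, 512, 1e9)           # 2048 ns
t_wh = wormhole_latency(4, 512, 32, 1e9)   # 640 ns
```

The gap widens with hop count, which is why wormhole dominates in NoCs, at the price of the contention and deadlock issues handled by virtual channels and routing.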

  35. How to select an architecture? [Figure: flexibility (single application → general purpose or embedded systems) vs. reconfiguration rate (at design time, at boot time, during run time) for ASIC, ASSP, FPGA, and CMP/multicore] Architecture choice depends on system needs.

  36. How to select an architecture? A large range of solutions! [Same figure as the previous slide] Architecture choice depends on system needs.

  37. Example: OASIS • ASIC assumed • Traffic requirements are known a priori • Features: • Packet switching (wormhole) • Quality of service • Mesh topology. K. Mori, A. Ben Abdallah, and K. Kuroda, “Design and Evaluation of a Complexity Effective Network-on-Chip Architecture on FPGA”, The 19th Intelligent System Symposium (FAN 2009), pp. 318-321, Sep. 2009. S. Miura, A. Ben Abdallah, and K. Kuroda, “PNoC - Design and Preliminary Evaluation of a Parameterizable NoC for MCSoC Generation and Design Space Exploration”, The 19th Intelligent System Symposium (FAN 2009), pp. 314-317, Sep. 2009.

  38. Perspective 1: NoC vs. Bus • Bus: • Bandwidth is limited, shared • Speed goes down as N grows • No concurrency • Pipelining is tough • Central arbitration • No layers of abstraction (communication and computation are coupled) • However: fairly simple and familiar • NoC: • Aggregate bandwidth grows • Link speed unaffected by N • Concurrent spatial reuse • Pipelining is built-in • Distributed arbitration • Separate abstraction layers • However: no performance guarantee, extra delay in routers, area and power overhead, modules need a network interface (NI), unfamiliar methodology

  39. Perspective 2: NoC vs. Off-chip Networks • NoC: • Sensitive to cost: area, power • Wires are relatively cheap • Latency is critical • Traffic may be known a priori • Design-time specialization • Custom NoCs are possible • Off-chip networks: • Cost is in the links • Latency is tolerable • Traffic/applications unknown • Changes at runtime • Adherence to networking standards

  40. VLSI CAD problems Application mapping Floorplanning / placement Routing Buffer sizing Timing closure Simulation Testing

  41. VLSI CAD problems in NoC Application mapping (map tasks to cores) Floorplanning / placement (within the network) Routing (of messages) Buffer sizing (size of FIFO queues in the routers) Timing closure (link bandwidth capacity allocation) Simulation (network simulation, traffic/delay/power modeling) Other NoC design problems (topology synthesis, switching, virtual channels, arbitration, flow control, ……)
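As an illustration of the first problem, application mapping, here is a hedged sketch of one common greedy heuristic (not the slides' method): place the heaviest-communicating tasks first, each on the free mesh tile that minimizes hop-weighted traffic to tasks already placed. The `comm` matrix and tile model are hypothetical.

```python
import itertools

def greedy_map(comm, n=2):
    """comm[i][j]: traffic between tasks i and j; tiles form an n x n mesh."""
    tiles = list(itertools.product(range(n), range(n)))
    # Heaviest communicators get placed first.
    order = sorted(range(len(comm)), key=lambda t: -sum(comm[t]))
    placement = {}
    for task in order:
        free = [t for t in tiles if t not in placement.values()]
        # Hop-weighted traffic to already-placed tasks (Manhattan distance).
        cost = lambda tile: sum(comm[task][o] * (abs(tile[0] - p[0]) +
                                                 abs(tile[1] - p[1]))
                                for o, p in placement.items())
        placement[task] = min(free, key=cost)
    return placement

# Tasks 0 and 1 exchange the most traffic, so they land on adjacent tiles.
comm = [[0, 10, 1, 1], [10, 0, 1, 1], [1, 1, 0, 2], [1, 1, 2, 0]]
mapping = greedy_map(comm)
```

Real mapping tools solve this as a (NP-hard) quadratic assignment problem with branch-and-bound or metaheuristics; the greedy version only shows the cost structure.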

  42. Typical NoC design flow: place modules, then determine routing and adjust link capacities

  43. Timing closure in NoC [Flow: define inter-module traffic → place modules → increase link capacities → QoS satisfied? If no, repeat; if yes, finish] • Too low capacity results in poor QoS • Too high capacity wastes area • Uniform link capacities are a waste in an ASIP system
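The loop on this slide can be sketched directly. Below, each link's delay is approximated by a simple 1/(capacity − load) queueing term, and all capacities grow by a fixed step until the worst-case delay meets the target; the delay model, step size, and function names are assumptions for illustration only.

```python
def close_timing(capacity, load, target_delay, step=1.1, max_iters=200):
    """capacity, load: dicts keyed by link. Grow capacities until QoS holds."""
    for _ in range(max_iters):
        # Per-link delay ~ 1/(capacity - load); infinite if overloaded.
        worst = max(1.0 / (capacity[l] - load[l]) if capacity[l] > load[l]
                    else float("inf") for l in capacity)
        if worst <= target_delay:
            return capacity                  # QoS satisfied: finish
        capacity = {l: c * step for l, c in capacity.items()}
    raise RuntimeError("QoS target not reachable within max_iters")

links = close_timing({"a": 2.0, "b": 2.0}, {"a": 1.0, "b": 1.5},
                     target_delay=1.0)
```

A real flow would grow only the bottleneck links (uniform growth wastes area, as the slide warns) and re-run placement, but the convergence structure is the same.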

  44. Network delay modeling • Analysis of mean packet delay in a wormhole network • Multiple virtual channels • Different link capacities • Different communication demands

  45. NoC design requirements • High-performance interconnect: throughput, latency, power, area • Complex functionality (performance again): • Support for virtual channels • QoS • Synchronization • Reliability, high throughput, low latency

  46. ISO/OSI network protocol stack model

  47. Part II • NoC topologies • Switching strategies • Routing algorithms • Flow control schemes • Clocking schemes • QoS • Basic building blocks • Status and open problems

  48. NoC Topology The connection map between PEs • Adopted from large-scale networks and parallel computing • Topology classifications: • Direct topologies • Indirect topologies

  49. Direct topologies [Figure: each PE attached to its own switch] • Each switch (SW) is connected to a single PE • As the # of nodes in the system increases, the total bandwidth also increases

  50. Direct topologies: Mesh • The 2D mesh is the most popular • All links have the same length • Eases physical design • Area grows linearly with the # of nodes [Figure: 4x4 mesh]
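A 2D mesh like the 4x4 on this slide is simple to model. A sketch that builds switch adjacency and counts hops under dimension-ordered (XY) routing, where the hop count is simply the Manhattan distance; the helper names are illustrative.

```python
def mesh_neighbors(x, y, n=4):
    """Switches adjacent to switch (x, y) in an n x n 2D mesh."""
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(a, b) for a, b in candidates if 0 <= a < n and 0 <= b < n]

def xy_hops(src, dst):
    """XY (dimension-ordered) routing: hop count = Manhattan distance."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])

# Corner switches have 2 links, edge switches 3, interior switches 4;
# the longest XY route in a 4x4 mesh is 6 hops (corner to corner).
```

The uniform link length is what eases physical design: every `mesh_neighbors` edge maps to the same wire layout tile.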
