The Optimization of Interconnection Networks in FPGAs Dr. Yajun Ha Assistant Professor Department of Electrical & Computer Engineering National University of Singapore Dagstuhl Seminar
Outline • Background and Motivation • Time-multiplexed interconnects in FPGAs • sFPGA2 architecture • Conclusion
FPGA Research Challenges • Research challenges for FPGA architectures and tools are closely linked. • FPGA challenges originate in the underlying semiconductor technologies. • Scaling semiconductor technologies brings the following new challenges: • Leakage power • Process variations • Substantially more transistors • Diagram labels: Technology, Architecture, Tools. Responses shown: dual-Vt, dual-Vdd, or subthreshold architectures; reconfigurability for variability and fault tolerance; scalable, multi-core, secure architectures and SLD.
Motivation • Logic and interconnect are unbalanced in FPGAs. • Qualitatively: • “PLDs are 90% routing and 10% logic.” • Prof. Jonathan Rose, Design of Interconnection Networks for Programmable Logic, Kluwer Academic Publishers, 2004, page xix; • “…(in FPGAs) programmable interconnect comes at a substantial cost in area, performance and power.” • Prof. Jan Rabaey, Digital Integrated Circuits, 2nd Edition, Prentice-Hall, 2003, page 413; • Quantitatively: • Area: logic area vs. routing area; • Delay: logic delay vs. net delay; • Power: dynamic power consumed by logic vs. by interconnect.
Imbalance: Area • Figure: relative weight of routing area and logic area for the 20 largest MCNC benchmark circuits, assuming the PTM 90nm CMOS process. Data produced by VPR v5.0.2.
Imbalance: Delay • Figure: delay breakdown along the critical path for the 20 largest MCNC benchmark circuits, assuming the PTM 90nm CMOS process. Data produced by VPR v5.0.2.
Imbalance: Power • Figure: dynamic power breakdown for a real circuit [1], assuming Xilinx Virtex-II FPGAs. • Legend: Double: length-2 wires; Hex: length-6 wires; Long: long wires spanning the whole chip; IXbar and OXbar: crossbars at the input and output pins of the logic blocks. [1] L. Shang, A. Kaviani and K. Bathala, “Dynamic power consumption in Virtex-II FPGA family,” ACM FPGA, 2002.
Outline • Background and Motivation • Time-multiplexed interconnects in FPGAs • sFPGA2 architecture • Conclusion
Intra-Clock Cycle Idleness • The clock cycle is constrained by the critical path delay, so many wires are idle for a significant part of each clock cycle. • An example: • clma: the largest circuit (~8400 4-input LUTs) in the MCNC benchmark suite; • Implemented with VPR v5.0.2 on an island-style FPGA (ten 4-input LUTs per CLB and 100% length-4 wires), assuming the PTM 90nm CMOS process; • Timing results after P&R: • Critical path delay = 9.50ns; • Most nets (~96.5%) have delays below 1ns; • Expensive wires are often underutilized.
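To make the idleness argument concrete, here is a minimal sketch in Python. It assumes a deliberately crude model that is not taken from the slides: a wire counts as busy only while its own net's signal propagates, and as idle for the rest of the clock period. The function name `idle_fraction` is hypothetical; the 9.50 ns and 1 ns figures are the clma numbers quoted above.

```python
def idle_fraction(net_delay_ns: float, clock_period_ns: float) -> float:
    """Fraction of one clock period during which a wire is idle.

    Assumption (not from the slides): the wire is busy only for the
    duration of its own net delay and idle otherwise.
    """
    if clock_period_ns <= 0:
        raise ValueError("clock period must be positive")
    if not 0 <= net_delay_ns <= clock_period_ns:
        raise ValueError("net delay must lie within the clock period")
    return 1.0 - net_delay_ns / clock_period_ns


# clma example: the clock period is set by the 9.50 ns critical path.
CRITICAL_PATH_NS = 9.50

# A net at the 1 ns bound reported for ~96.5% of the nets.
print(f"{idle_fraction(1.0, CRITICAL_PATH_NS):.1%}")  # prints 89.5%
```

Under this model, a net that meets the 1 ns bound leaves its wire unused for roughly nine tenths of every cycle, which is the slack that time-multiplexing tries to reclaim.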
Time-Multiplexing • Figure: nets N1 and N2, each connecting two CLBs. With conventional switches, the two nets use two routing wires; with multi-context switches, the two nets share one wire. • Switches with multiple contexts time-multiplex the wires and keep them busy; • This can potentially save wire area and achieve better timing performance.
Preliminary Results • We added time-multiplexing enhancements to existing CAD tools; • Preliminary studies show positive results: • For 16 MCNC benchmark circuits, ~11.5% saving in the minimum required number of wires, at the cost of ~1.5% timing overhead; • For 16 MCNC benchmark circuits, ~8.2% reduction in critical path delay when using the same number of wires; • See [1] [2] for details. [1] H. Liu et al., “An Area-Efficient Timing-Driven Routing Algorithm for Scalable FPGAs with Time-Multiplexed Interconnects,” FCCM 2008. [2] H. Liu et al., “An Architecture and Timing-Driven Routing Algorithm for Area-Efficient FPGAs with Time-Multiplexed Interconnects,” FPL 2008.
TM FPGA Challenges and Ongoing Work • The TM rate cannot be too high if the TM clock rate is to stay reasonable; we currently target a TM rate of 2-4. • Few nets qualify for TM, because most nets complete their transfers within the first micro-cycle. • Dual-Vt architectures are proposed to adjust net delays, achieving lower power and more TM opportunities.
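The qualification problem can be sketched as follows. This is only an illustrative model, not the routing algorithm from the FCCM 2008 and FPL 2008 papers: it assumes the clock period is split into `tm_rate` equal micro-cycles, that a net is described by a hypothetical `(start_ns, delay_ns)` pair, and that two nets may share a wire only if each fits entirely inside a different micro-cycle. The names `micro_cycle` and `can_share_wire` are made up for this example.

```python
def micro_cycle(start_ns, delay_ns, clock_period_ns, tm_rate):
    """Return the index of the micro-cycle that fully contains the
    net's transfer, or None if the transfer straddles a boundary.

    Assumption: the clock period is divided into tm_rate equal slots.
    """
    slot_ns = clock_period_ns / tm_rate
    index = int(start_ns // slot_ns)
    end_ns = start_ns + delay_ns
    # The transfer must start inside the clock period and finish
    # before its own micro-cycle ends.
    if index >= tm_rate or end_ns > (index + 1) * slot_ns:
        return None
    return index


def can_share_wire(net_a, net_b, clock_period_ns, tm_rate):
    """Two nets can share one wire if they use different micro-cycles."""
    a = micro_cycle(*net_a, clock_period_ns, tm_rate)
    b = micro_cycle(*net_b, clock_period_ns, tm_rate)
    return a is not None and b is not None and a != b


CLOCK_NS = 9.50   # critical path delay of the clma example
TM_RATE = 2       # lower end of the 2-4 range targeted above

early = (0.0, 0.9)    # hypothetical net finishing in micro-cycle 0
early2 = (0.5, 0.7)   # another hypothetical net in micro-cycle 0
late = (5.0, 0.8)     # hypothetical net living in micro-cycle 1

print(can_share_wire(early, late, CLOCK_NS, TM_RATE))    # True
print(can_share_wire(early, early2, CLOCK_NS, TM_RATE))  # False
```

The second call shows the limitation named above: when most nets finish in the first micro-cycle, few pairs land in different micro-cycles, so few wires can be shared unless net delays are adjusted.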
Outline • Background and Motivation • Time-multiplexed interconnects in FPGAs • sFPGA2 architecture • Conclusion
Motivation • In current FPGAs, the switching requirement grows superlinearly with the number of logic resources; in other words, the current architecture scales poorly. • To address this, FPGA interconnect wires need to be organized hierarchically to achieve scalability. [3] Rizwan Syed et al., “sFPGA2 - A Scalable GALS FPGA Architecture and Design Methodology,” FPL 2009.
How Are Multiple FPGAs Connected? • MGT-based serial switch interconnect • PCI Express • Serial, switch-based interconnects are the future of peripheral interconnect!
sFPGA2 Is an On-Chip Version • sFPGA2 is a scalable FPGA architecture that uses a hierarchical routing network with high-speed serial links and switches to route multiple nets simultaneously [3]. • It consists of two levels: • Base level (e.g., A0…A7, S0) • Higher levels (e.g., X0) • Figure: architecture block diagram. [3] Rizwan Syed et al., “sFPGA2 - A Scalable GALS FPGA Architecture and Design Methodology,” FPL 2009.
sFPGA2 Architecture (Contd) • Figure courtesy of Xilinx (Virtex-II Pro). • A0…A7 are FPGA tiles (similar to current FPGAs). • S0 contains very high-speed transceivers capable of aggregating multiple high-speed serial links into one very high-speed link.
sFPGA2 (Contd) • Routing uses one of the two methods shown in the figure. • Intra-cluster routing uses only the switch blocks and channels in that level. • Inter-cluster routing uses the very high-speed links and switches.
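The choice between the two routing methods depends on whether a net stays inside one tile. A minimal sketch of that classification is given below, under stated assumptions: the placement is represented as a plain mapping from blocks to tiles, the block and net names (`b0`, `n0`, and so on) are hypothetical, and only the tile names A0, A1 and A7 come from the slides. It is not the sFPGA2 tool flow itself.

```python
def classify_nets(nets, tile_of):
    """Split nets into intra-tile and inter-tile groups.

    nets:    mapping net name -> list of blocks on the net
             (driver and sinks together)
    tile_of: mapping block name -> tile the block is placed in
    """
    intra, inter = [], []
    for name, blocks in nets.items():
        tiles = {tile_of[block] for block in blocks}
        if len(tiles) == 1:
            # All terminals in one tile: conventional switch blocks
            # and channels of that level are enough.
            intra.append(name)
        else:
            # Terminals in several tiles: the net must use the
            # high-speed serial links and switches.
            inter.append(name)
    return intra, inter


# Hypothetical placement of four blocks onto base-level tiles.
tile_of = {"b0": "A0", "b1": "A0", "b2": "A1", "b3": "A7"}
nets = {
    "n0": ["b0", "b1"],        # stays inside A0
    "n1": ["b1", "b2"],        # crosses A0 -> A1
    "n2": ["b0", "b2", "b3"],  # spans A0, A1 and A7
}

intra, inter = classify_nets(nets, tile_of)
print(intra)  # ['n0']
print(inter)  # ['n1', 'n2']
```

Only the nets in the second list would need the serial links, so a placement that keeps tightly connected blocks in the same tile reduces the load on the higher-level switches.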
Design Methodology • Figure: an operation graph from source node v0 (NOP) to sink node vn (NOP), with operation nodes v1 to v11 (*, +, <, -); one edge is marked as an inter-tile net. • A new step in the design flow deals with inter-tile nets!
Preliminary Results • A JPEG engine was successfully implemented and used to demonstrate the transport of groups of nets on an emulation platform built from three Xilinx Virtex-II Pro FPGA boards. Serial communication was emulated with MGTs. • Preliminary studies show that transport latency is very high, mainly because of high-latency transceivers, which limits the application domain to GALS designs for now. As transceivers improve, the approach can be extended to purely synchronous designs as well.
Conclusion • The logic/interconnect imbalance in FPGAs makes optimizing the interconnection network important. • Significant intra-clock cycle idleness exists in FPGA routing wires. • Time-multiplexing increases resource utilization, and can potentially save area and achieve better timing. • The current FPGA interconnection network is not scalable. • An on-chip network consisting of switches and serial links can improve scalability. • Promising preliminary results justify our approaches. Future work needs to investigate the impact of the architecture changes thoroughly.
Multi-FPGA or Multi-Core? • Figure: FPGA tiles and uP tiles connected through NoCs. • Building multi-FPGA or multi-core systems will not be difficult as semiconductor technology develops. • We (hardware engineers) know more about programming multi-FPGA systems than about programming multi-core processors. • Should we use VHDL/Verilog as the (intermediate) programming language for both multi-FPGA and multi-core systems?
Thank you!
See also • VPR v5.0.2 (Versatile Place and Route), a placement and routing tool for heterogeneous FPGAs: http://www.eecg.utoronto.ca/vpr/ • Predictive Technology Model (PTM): http://www.eas.asu.edu/~ptm