
Reconfigurable Computing

Reconfigurable Computing. Jongeun Lee. Fall 2013. Part I: FPGA. FPGA. Flexibility + parallelism Spatial computing FPGA mapping flow. Logic Elements of FPGA. Look up table 3-LUT, 4-LUT, .. Logic block (function block). FPGA architecture. 2D array of logic blocks. Interconnect.


Presentation Transcript


  1. Reconfigurable Computing Jongeun Lee Fall 2013

  2. Part I: FPGA

  3. FPGA • Flexibility + parallelism • Spatial computing • FPGA mapping flow

  4. Logic Elements of FPGA • Look up table • 3-LUT, 4-LUT, .. • Logic block (function block)
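
A k-input look-up table can be read as a 2^k-entry truth table addressed by its inputs. The sketch below illustrates that idea for a 3-LUT; the helper names `make_lut` and `lut_eval` are hypothetical and chosen only for this illustration, not taken from the slides or any vendor tool.

```python
from itertools import product

def make_lut(func, k):
    """Build the 2**k-entry truth table of a k-input Boolean function."""
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=k)]

def lut_eval(table, *inputs):
    """Use the input bits as an address into the table (first input is the MSB)."""
    index = 0
    for bit in inputs:
        index = (index << 1) | (bit & 1)
    return table[index]

# "Configuring" a 3-LUT as a majority function means filling in 8 table bits.
majority = make_lut(lambda a, b, c: (a + b + c) >= 2, 3)
```

Changing the function implemented by the LUT only changes the stored table; `lut_eval` stays the same, which is the sense in which the logic is reconfigurable.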

  5. FPGA architecture • 2D array of logic blocks

  6. Interconnect • Nearest neighbor • More complex routing structure • Connect block • Switch box

  7. More efficient interconnects • Longer-length wires • Hierarchical

  8. Extended logic elements • Fast carry chain • E.g., a simple 4-bit adder built from full adders • Multiplier • RAM • Processor blocks
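
To illustrate why a dedicated carry chain matters, here is a minimal behavioral sketch of a 4-bit ripple-carry adder built from 1-bit full adders. The function names are hypothetical; the point is that the carry passes serially through every bit position, which is the path a fast carry chain shortens.

```python
def full_adder(a, b, cin):
    """1-bit full adder: returns (sum bit, carry out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_add4(x, y, cin=0):
    """Add two 4-bit unsigned values; returns (4-bit sum, carry out)."""
    carry = cin
    total = 0
    for i in range(4):
        # Each stage must wait for the carry produced by the previous stage.
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry
```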

  9. Configuration • SRAM • Fast, unlimited reconfiguration • Leakage power, volatile, large cell • Requires extra storage or hardware to program at boot-up • Flash memory • Nonvolatile, smaller cell, lower static power • Limited write-cycle lifetime, slower write speed, requires on-chip charge pumps • Antifuse • Very small, very low propagation delay, no static power, immune to soft errors • One-time programmable

  10. Altera Stratix • Logic architecture • Logic Element (LE) • Logic Array Block (LAB) • 1 LAB = 10 LEs, carry chains, control signals, local interconnect

  11. Altera Stratix • Interconnect • Hierarchical: local (within LAB) + neighboring blocks + general (horiz./vertical channels) • RAM blocks • M512: 32x18-bit • M4K: 128x36-bit • M-RAM: 4Kx144-bit • Configurable as: 1-port, 2-port, shift-register, FIFO, ROM table • May have parity bits, registered inputs/outputs • DSP blocks • One 36x36-bit, four 18x18-bit, or eight 9x9-bit mult. (+accum.)
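
The DSP-block modes listed above (one 36x36 or four 18x18 multipliers) reflect a general arithmetic identity: a wide product can be assembled from four half-width partial products. The sketch below shows that identity for unsigned operands only; it is an illustration with hypothetical function names, not a description of the actual Stratix DSP datapath.

```python
MASK18 = (1 << 18) - 1

def mul18(a, b):
    """Stand-in for one 18x18-bit unsigned multiplier."""
    assert 0 <= a <= MASK18 and 0 <= b <= MASK18
    return a * b

def mul36(x, y):
    """36x36-bit unsigned multiply composed of four 18x18 partial products."""
    xh, xl = x >> 18, x & MASK18
    yh, yl = y >> 18, y & MASK18
    # x*y = xh*yh*2^36 + (xh*yl + xl*yh)*2^18 + xl*yl
    return (mul18(xh, yh) << 36) + ((mul18(xh, yl) + mul18(xl, yh)) << 18) + mul18(xl, yl)
```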

  12. Altera Stratix • Routing architecture • MultiTrack: R4, R8, R24, C4, C8, C16 • Direct connection

  13. Xilinx Virtex-II Pro • Logic architecture • CLB = 4 slices + 2 tri-buffers • Slice = two 4-LUTs, 2 regs, carry logic, wide-function muxes, gates • RAM: block SelectRAM+ • 18Kb, true 2-port • Multipliers • 18x18-bit mult. • CPU: PowerPC 405-D5 • 300 MHz

  14. Xilinx Virtex-II Pro • Routing architecture • Segmented, hierarchical • 24 long lines that span the full height and width of the device • 120 hex lines that route to every third or sixth block away in all four directions • 40 double lines that route to every first or second block away in all four directions • 16 direct connect routes that route to all immediate neighbors • 8 fast-connect lines in each CLB that connect LUT inputs and outputs

  15. Part II: Architectures

  16. Questions • What is the appropriate granularity for the reconfigurable fabric? • Should the reconfigurable fabric be instantiated as a separate coprocessor or integrated as a functional unit?

  17. Terminology • RPF (reconfigurable processing fabric) • Static vs. dynamic • Kernels (= virtual instruction configurations, VICs) • Fine-grained vs. coarse-grained • Tight-coupling vs. loose-coupling

  18. Garp’s nonsymmetrical RPF • One row has • One control PE • Communicates with the outside (interrupts, memory) • 23 logic PEs • 2-bit granularity • Limited wire network • Configuration • 6,144 bytes (for 32 rows) • 384 words on a 128-bit bus • Partial and dynamic • Compiler
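
The configuration figures on this slide can be cross-checked with simple arithmetic. The sketch below uses only the numbers given above; the per-PE figure is derived under the assumption that configuration bits are split evenly across the 24 PEs of a row, which the slide does not state.

```python
ROWS = 32
PES_PER_ROW = 1 + 23      # one control PE plus 23 logic PEs
CONFIG_BYTES = 6144       # full configuration for 32 rows
BUS_BITS = 128

# Number of 128-bit bus transfers needed for a full configuration.
bus_words = CONFIG_BYTES * 8 // BUS_BITS

# Configuration size per row.
bytes_per_row = CONFIG_BYTES // ROWS

# Assumption: bits are divided evenly among the PEs of a row.
bits_per_pe = bytes_per_row * 8 // PES_PER_ROW

print(bus_words, bytes_per_row, bits_per_pe)
```

The first result matches the 384 words quoted on the slide, so the two configuration figures are consistent with each other.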

  19. PipeRench • Configuration • pipelined • partial and dynamic • virtual pipeline stages vs. physical pipeline stages • Architecture • cyclic dependencies allowed within a row (= stage) • full crossbar between stages • well suited for stream processing • Programming • dataflow intermediate language (DIL)

  20. Virtual pipeline stages • virtual pipeline stages of an application • light gray blocks -- configuration of pipeline stage • dark gray blocks -- execution • mapping of virtual pipeline stages to physical pipeline stages • shown left: physical pipeline stages • labeled: virtual pipeline stage number
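
The mapping of virtual onto physical pipeline stages can be sketched with a simplified model. It assumes that exactly one physical stage is configured per cycle in round-robin order, that configuration takes one cycle, and that the virtual stages are loaded in order and wrap around; these are assumptions for illustration, and `stage_activity` is a hypothetical helper.

```python
def stage_activity(t, p, num_virtual, num_physical):
    """What physical stage p is doing at cycle t.

    Returns None if the stage has not been configured yet, otherwise
    ("configure" | "execute", virtual stage number).
    """
    # Most recent cycle at or before t in which stage p was (re)configured.
    last_config = t - ((t - p) % num_physical)
    if last_config < 0:
        return None
    action = "configure" if last_config == t else "execute"
    return action, last_config % num_virtual

def schedule(num_virtual, num_physical, cycles):
    """Table of activities: one row per cycle, one column per physical stage."""
    return [[stage_activity(t, p, num_virtual, num_physical)
             for p in range(num_physical)]
            for t in range(cycles)]

# Five virtual stages time-multiplexed onto three physical stages.
table = schedule(5, 3, 6)
```

In this model each physical stage executes a virtual stage for `num_physical - 1` cycles before it is reconfigured, so a pipeline longer than the hardware still runs, at reduced throughput.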

  21. RPF integration (functional unit) • many ways exist • no agreement on terminology • tight vs. loose coupling is only relative • tightly coupled? • loosely coupled?

  22. Functional unit • RFU • just another FU • extends ISA • e.g. • Chimaera • PRISC

  23. Coprocessor • RaPiD • independent or integrated • more loosely coupled • Chameleon’s RCP • RPF can access DMA, processor, programmable I/O

  24. RaPiD

  25. Hybrid type • ADRES

  26. Implementation results • Processor area breakdown • Power breakdown in acceleration mode

  27. How do they fare? • lack of significant market success to date • reconfigurable computing is still an area of significant ongoing research and commercial interest • For example, Rapport Inc.'s Kilocore design is a commercial derivative of the PipeRench architecture • As of 2007, Rapport was offering 256-PE components organized as 16 stripes, each composed of 16 8-bit PEs, with plans to expand its offerings to components containing thousands of PEs • SRP, a derivative of ADRES, is included and used in Samsung’s application processors (AP)
