
ENG6530 Reconfigurable Computing Systems

ENG6530 Reconfigurable Computing Systems. Reconfigurable Architectures. Topics: Coupling of RCS Systems; Implementation Approaches; Advantages/Disadvantages; Mapping Large Designs; Parallel, Serial, Semi-Serial; Floating Point, Fixed Point; Run Time Reconfigurations

lois-pratt

Presentation Transcript


  1. ENG6530 Reconfigurable Computing Systems Reconfigurable Architectures ENG6530 RCS

  2. Topics • Coupling of RCS Systems • Implementation Approaches • Advantages/Disadvantages • Mapping Large Designs? • Parallel, Serial, Semi-Serial • Floating Point, Fixed Point • Run Time Reconfigurations • Static vs. Dynamic Reconfiguration • Support for RTR ENG6530 RCS

  3. References • “FPGA-Based System Design”, Wayne Wolf. • “Reconfigurable Computing: Accelerating Computation with FPGAs”, Maya Gokhale, 2005. • “Introduction to Reconfigurable Computing: Architectures, Algorithms and Applications”, C. Bobda, 2007. • “Reconfigurable System Design and Verification”, Pao-Ann Hsiung, 2009. • “Reconfigurable Computing: A Survey of Systems and Software”, Scott Hauck, 2002.

  4. Reconfigurable Computing: Definition • Reconfigurable Computing (RC) is a computing paradigm • in which programmable logic devices are used to accelerate computations or applications by exploiting parallelism at different levels (bit level, instruction level, architectural level) • in which algorithms are implemented as a temporally and spatially ordered set of very complex tasks. • What is meant by temporal and spatial implementations?

  5. Spatial vs. Temporal Computing Temporal Spatial ENG6530 RCS

  6. Spatial & Temporal Definitions • There are several perspectives on the meaning of spatial and temporal • Cluster of Microprocessors • Temporal = Function runs within one processor • Spatial = Function spread across many microprocessor nodes • Traditional Embedded Computing Hardware • Temporal = Using a microprocessor • Spatial = Implementing dedicated ASIC accelerators • FPGAs • Temporal = Using the same logic resources for multiple functions • Spatial = Parallelizing and pipelining a function across the FPGA fabric

  7. Spatially Programmed Connections • “Hardware” customized to specifics of problem. • Direct map of problem specific • datapath, • control. • Circuits “adapt” as problem requirements change. ENG6530 RCS

  8. How Does RC Enhance Performance? • Performance enhancement is achieved by hardware execution itself, which overcomes the following limitations: • The overhead of software execution (instruction fetch, loading data into registers, etc.) • The overhead of using fixed-size data. • The overhead of sequencing (i.e., using branches). • However, these benefits are not so large, because embedded CPUs and DSPs are highly optimized. • The key to performance improvement is pipelining and parallel processing.
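
The pipelining claim on this slide can be illustrated with a simple cycle count. This is a minimal sketch, assuming an idealized datapath with no stalls; the function names and the numbers (100 items, 4 stages) are hypothetical and not taken from the slides.

```python
def unpipelined_cycles(items, stages):
    """Each item occupies the whole datapath for `stages` cycles before the next starts."""
    return items * stages


def pipelined_cycles(items, stages):
    """The first result appears after `stages` cycles, then one result per cycle."""
    if items == 0:
        return 0
    return stages + items - 1


# Hypothetical workload: 100 data items through a 4-stage datapath.
items, stages = 100, 4
speedup = unpipelined_cycles(items, stages) / pipelined_cycles(items, stages)
print(unpipelined_cycles(items, stages), pipelined_cycles(items, stages), round(speedup, 2))
```

For long input streams the speedup approaches the number of stages, which is why deep pipelines pay off mainly on streaming workloads.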

  9. Issues in Configurable Design • Reconfigurable Hardware Architecture (FPGA) • Choice and granularity of computational elements • Issues related to performance, area and power consumption • Design Entry Techniques • Low Level (VHDL) • High Level (ESL e.g. Handel-C) • Support of efficient CAD tools • High Level Synthesis, Logic Optimization, • Mapping, Place & Route • Coupling Approaches • Tightly coupled vs. loosely coupled • Area versus Performance • Serial, semi-parallel, parallel • Floating Point, Fixed Point • Reconfiguration time and rate • Static versus dynamic reconfiguration (area, performance, approaches)

  10. Coupling Approaches for Reconfigurable Hardware (RH) RH can be coupled to GP as: • A functional unit (Tight Coupling) • A Co-processor • An Attached processing unit • A Standalone processing unit (Loosely coupled) ENG6530 RCS

  11. Different levels of coupling Loosely Coupled Workstation Attached Processing Unit Coprocessor Standalone Processing Unit Tightly Coupled I/O Interface CPU Memory Caches FU

  12. 1. Functional Unit • Part of the data-path of a host machine • Examples • Chimaera (Hauck97a) • XiRisc Architecture • Tensilica ASIP

  13. Functional Unit: Reconfigurable Instruction Set Processors • Pipeline: Fetch, Decode, Issue, feeding the Integer Unit, FP Unit, Branch Unit, LD/ST Unit and Reconfigurable Unit (RU) • Features: • Customized instructions may change over time • Registers hold input/output • RU is a functional unit

  14. Example of RPU integrated into CPU ENG6530 RCS

  15. Architecture • Duplicated instruction decode logic (2 symmetrical data channels) • Duplicated commonly used function units (ALU and shifter) • All other function units are shared (DSP operations, memory handler) • A tightly coupled pipelined configurable gate array

  16. PiCoGA: a Pipelined Configurable Gate Array • Embedded function unit for dynamic extension of the instruction set • Two-dimensional array of LUT-based reconfigurable logic cells • Each row implements a possible stage of a customized pipeline, independent of and concurrent with the processor • Up to 4x32-bit input data and up to 2x32-bit output data from/to the register file

  17. Tensilica Xtensa Processor • Tensilica’s Xtensa processors are synthesizable processors that are configurable and extensible.

  18. Tensilica Xtensa Architecture ENG6530 RCS

  19. Automated Design Process ENG6530 RCS

  20. XPRES Compiler ENG6530 RCS

  21. 2. Coprocessor • As a Coprocessor: • No sharing of data path of GPP CPU • Without constant supervision of the GPP • Similar to a Floating Point Unit (FPU) • Might share cache/memory • GPP initializes the RH • Independent parallel computation • More communication overhead • Several cycles Coprocessor CPU ENG6530 RCS

  22. Coprocessor: Garp Architecture For general purpose loop acceleration Loop is extracted with a compiler, and converted to hardware ENG6530 RCS

  23. Coprocessor Design: Cray XR1 • A Cray XR1 reconfigurable blade has two nodes, each consisting of a single AMD Opteron processor coupled with two RPUs • This connection is made directly with HyperTransport. • This delivers low latency and high bandwidth communication between the processing elements. • Offers users orders of magnitude speedup on select applications. • Many Xilinx Virtex-4 FPGAs can be integrated into a single system and applied effectively against demanding problems. • Figure captions: Cray XR1 blade: 2x AMD Opteron + 2x Virtex LX200 FPGAs; Cray XT5h Supercomputer

  24. 3. Attached Processing Unit • Behaves as an additional processor • Independent Computation • Higher delay to communicate with CPU • DMA-type overlap • No sharing of Cache Attached Processing Unit CPU Memory Caches ENG6530 RCS

  25. Attached RPU • Similar to a multiprocessor environment • Allow transfer and computation of large amount of data • Communications with CPU => via memory ENG6530 RCS

  26. Attached RPU: Zynq-7000 ENG6530 RCS

  27. Zynq-7000 AP ENG6530 RCS

  28. 4. As a Standalone • The most loosely coupled to GP. • Infrequent Communication with the GP. • Independent computation for long time. • Communication is expensive!! I/O Interface CPU Memory Caches Standalone Processing Unit ENG6530 RCS

  29. System Level View of the Splash 2 Architecture • Interface board: 1. Connects Splash 2 to the host 2. Extends the address and data buses • The Sun host can read/write to memories and memory-mapped control registers of Splash 2 via these buses. • Splash 2 processing board: • Processing Elements (PEs) X1-X16: each PE has 512 KB of memory, which the host can read/write • PE X0: controls the data flow into the processor board

  30. Pros/Cons of Coupling Approaches • Tight integration • Less communication overhead • RH can operate “alone” only for short periods of time • Amount of reconfigurable logic is limited • Loose integration • Greater parallelism • Greater independence • RH can operate “alone” for long periods of time • Higher and more expensive communication overhead

  31. Benefits/Drawbacks of the coupling • [Figure: the four coupling options (Functional Unit, Coprocessing Unit, Attached RPU, Standalone RPU) placed on a spectrum and compared by dependency, reconfiguration speed, communication overhead, amount of logic capacity, size and maintainability]

  32. Summary • The degree of coupling plays an important role in terms of • Performance and cost, • Communication overhead, • Maintenance and reconfiguration speed • Several architectures have been proposed in academia and industry • New tools are required to aid the designer in exploring the design space and choosing among the different coupling approaches for a specific application.
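
The trade-off between communication overhead and independent computation can be sketched with a first-order timing model. This is only an illustration: the model (offloaded time = communication time + software time / speedup) and every number below are assumptions made for the example, not measurements from the slides.

```python
def offload_time(t_software, speedup, t_comm):
    """Time to run a kernel on reconfigurable hardware, including data transfer."""
    return t_comm + t_software / speedup


def worth_offloading(t_software, speedup, t_comm):
    """True when offloading beats running the kernel in software."""
    return offload_time(t_software, speedup, t_comm) < t_software


# Hypothetical parameters (arbitrary time units):
# tight coupling = little logic (small speedup) but cheap communication,
# loose coupling = lots of logic (large speedup) but expensive communication.
TIGHT = {"speedup": 5.0, "t_comm": 0.001}
LOOSE = {"speedup": 50.0, "t_comm": 5.0}

for t_software in (1.0, 1000.0):
    print(t_software,
          offload_time(t_software, **TIGHT),
          offload_time(t_software, **LOOSE))
```

Under these assumed numbers the short kernel only benefits from the tightly coupled unit, while the long kernel runs fastest on the loosely coupled one, which matches the qualitative ranking given on the pros/cons slide.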

  33. Issues with Reconfigurable Computing ENG6530 RCS

  34. How to manage large Designs? • Use the largest FPGA available. • Use multiple FPGAs to accommodate the entire design. • Optimize your design and the synthesis and place & route. • Customize your architecture to fit in the available FPGA: • use serial implementations or • semi-parallel implementations. ENG6530 RCS

  35. FPGAs: Space/Speed Trade-offs • Q = (A x B) + (C x D) + (E x F) + (G x H) can be implemented in parallel • [Figure: inputs A to H feed four multipliers followed by adders that produce Q] • How can we make this more area efficient yet still achieve performance?

  36. Cont … Example: Semi-Parallel • Y = (A * B) + (C * D) + (E * F) + (G * H); Can we make this more area efficient?

  37. Cont … Example: Serial • Y = (A * B) + (C * D) + (E * F) + (G * H);

  38. Customize Architectures to Suit your Ideal Algorithms • FPGAs allow Area (cost) / Performance tradeoffs • [Figure: Parallel, Semi-Parallel and Serial implementations (multipliers, adders and D/Q registers) arranged from speed-optimized to area-optimized] • Optimized for? Speed or Area
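
The three implementation styles can be compared with a small behavioural model of Y = (A * B) + (C * D) + (E * F) + (G * H). This is a sketch under simplifying assumptions: one cycle per batch of products, adder and register latency ignored, and `mac_schedule` is a hypothetical helper name.

```python
def mac_schedule(pairs, multipliers):
    """Accumulate sum(a * b) using at most `multipliers` multipliers per cycle.

    Returns (result, cycles). Fewer multipliers means less area but more cycles.
    """
    acc = 0
    cycles = 0
    for start in range(0, len(pairs), multipliers):
        batch = pairs[start:start + multipliers]
        acc += sum(a * b for a, b in batch)  # products computed side by side
        cycles += 1
    return acc, cycles


# Hypothetical operand values for (A, B), (C, D), (E, F), (G, H).
pairs = [(1, 2), (3, 4), (5, 6), (7, 8)]

parallel = mac_schedule(pairs, multipliers=4)       # four multipliers, one pass
semi_parallel = mac_schedule(pairs, multipliers=2)  # two multipliers, reused twice
serial = mac_schedule(pairs, multipliers=1)         # one multiplier, reused four times
print(parallel, semi_parallel, serial)
```

All three variants produce the same result; only the multiplier count (area) and the cycle count (speed) change.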

  39. Floating Point vs. Fixed Point ENG6530 RCS

  40. Floating Point Representation • Floating-point arithmetic is widespread in scientific computing, DSP applications, machine learning, communication systems, optimization, … • Floating-point arithmetic is widely used because it has many practical advantages: • It provides a familiar approximation to the real numbers, with useful properties like automatic scaling • It is widely available on different computers and is well supported by programming languages • Current workstations have highly optimized native floating-point arithmetic, sometimes faster than native integer arithmetic • Single Precision vs. Double Precision.

  41. FP Number Representation • Form: Mantissa x R^Exponent, e.g. 5.234 x 10^-28 • S: sign of mantissa • Range (roughly): Single: 10^-38 to 10^38; Double: 10^-307 to 10^307 • Precision (roughly): Single: 7 significant decimal digits; Double: 15 significant decimal digits
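
The sign, exponent and mantissa fields can be inspected directly. The sketch below assumes the IEEE 754 single-precision layout (1 sign bit, 8 exponent bits with bias 127, 23 fraction bits), which the slide does not name but which is consistent with its range and precision figures; the helper names are hypothetical.

```python
import struct


def decompose_single(x):
    """Split a value, stored as a 32-bit float, into (sign, biased exponent, fraction)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, exponent, fraction


def recompose_single(sign, exponent, fraction):
    """Rebuild the value of a normal number: (-1)^S x 1.fraction x 2^(exponent - 127)."""
    mantissa = 1 + fraction / 2**23  # implicit leading 1 (normal numbers only)
    return (-1) ** sign * mantissa * 2.0 ** (exponent - 127)


fields = decompose_single(-2.5)  # -2.5 = -1.25 x 2^1
print(fields, recompose_single(*fields))
```

Here the radix R is 2 rather than the 10 used in the slide's decimal example; the structure Mantissa x R^Exponent is the same.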

  42. Fixed-Point Arithmetic • Uses integers to represent fractional numbers: sign bit, integer bits (IB), fractional bits (FB) • Register width: RW = 1 + IB + FB (typically 16 or 32) • Example (RW=9, IB=FB=4): 0 0011 0011_2 = 0011.0011_2 = 3.1875_10 • Operations: • multiplication: (a·b) >> FB • addition: a + b • Dynamic range: • -2^IB ... 2^IB - 1 • much smaller than in floating point, hence a risk of overflow • Problem: for a given application, choose IB (and thus FB) to avoid overflow • Are there tools that automatically choose the application-dependent “best” IB (and thus FB) for linear DSP kernels?
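
The two operations on this slide can be tried out with plain integers. This is a minimal sketch using the slide's example format (IB = FB = 4); the helper names are hypothetical, and overflow is only detected, not saturated or wrapped as real hardware would do.

```python
IB = 4  # integer bits
FB = 4  # fractional bits (the sign bit is carried by Python's int)


def to_fixed(x):
    """Scale a real number by 2^FB and round to the nearest integer."""
    return int(round(x * (1 << FB)))


def to_float(q):
    """Interpret a fixed-point integer as a real number."""
    return q / (1 << FB)


def fx_add(a, b):
    """Addition needs no rescaling: both operands share the same scale."""
    return a + b


def fx_mul(a, b):
    """The raw product carries 2*FB fractional bits, so shift right by FB."""
    return (a * b) >> FB


def in_range(q):
    """Check that q fits in 1 + IB + FB bits (two's complement)."""
    return -(1 << (IB + FB)) <= q <= (1 << (IB + FB)) - 1


a = to_fixed(3.1875)  # 0b00110011 == 51
b = to_fixed(2.0)
print(a, to_float(fx_add(a, b)), to_float(fx_mul(a, b)))
print(in_range(fx_mul(a, to_fixed(6.0))))  # 3.1875 * 6 exceeds the dynamic range
```

The last line shows the overflow risk named on the slide: the product 19.125 does not fit in four integer bits.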

  43. Fixed Point ADVANTAGES: • Addition and multiplication with integers are much quicker than floating-point addition and multiplication. • The hardware is less complex, • The hardware is cheaper, and • The hardware requires less power. DISADVANTAGES: • DSP algorithms require fractional numbers (complex for the developer) • Rounding 0.03 to 0 will cause your filter to fail.
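
The rounding hazard in the last bullet is easy to reproduce. The sketch below quantizes a coefficient of 0.03 with different numbers of fractional bits; round-to-nearest is an assumption (hardware may truncate instead), and the helper names are hypothetical.

```python
def quantize(x, fractional_bits):
    """Round x to the nearest multiple of 2^-fractional_bits."""
    scale = 1 << fractional_bits
    return round(x * scale) / scale


def min_fractional_bits(x, limit=32):
    """Smallest number of fractional bits for which x does not quantize to zero."""
    for fb in range(limit + 1):
        if quantize(x, fb) != 0:
            return fb
    return None


coefficient = 0.03
print(quantize(coefficient, 4))          # step is 1/16, so the coefficient vanishes
print(quantize(coefficient, 8))          # step is 1/256
print(min_fractional_bits(coefficient))
```

With only four fractional bits the filter tap disappears entirely, which is the failure the slide warns about; more fractional bits keep the tap at the cost of a small quantization error.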

  44. Addition Units: Some Trade-offs Floating-point vs. Fixed-point • Area : 7x-15x • Speed: 0.8x-1x • Power: 5x-10x ENG6530 RCS

  45. Fixed Point or Floating Point? • Fixed Point: • Very fast when base 2 • No complicated logic • Radix point not encoded • Fixed accuracy • Can only represent a small number set • Floating Point: • Slower • Accuracy varies • Represents a very large number set • Radix point encoded • Complex logic required

  46. Conversion Programs: Development Procedure • [Flowchart blocks: Floating-Point C Program; Range Estimator; Range Estimation C Program; Execution; Manual specification; IWL information; Floating-Point to Fixed-Point C Program Converter; Fixed-Point C Program]
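
The range-estimation step in this flow can be approximated in a few lines. This is a sketch only: it reads "IWL" as integer word length, assumes a signed two's-complement format with range [-2^IB, 2^IB), and uses made-up sample values standing in for values observed while executing the range estimation program. It is not the tool shown on the slide.

```python
def integer_word_length(samples):
    """Smallest IB such that every sample lies in [-2^IB, 2^IB)."""
    lo, hi = min(samples), max(samples)
    ib = 0
    while not (-(2 ** ib) <= lo and hi < 2 ** ib):
        ib += 1
    return ib


def fractional_bits(samples, register_width):
    """Bits left for the fraction once the sign bit and IB are reserved."""
    return register_width - 1 - integer_word_length(samples)


# Hypothetical values recorded for one variable during a profiling run.
observed = [0.5, -3.2, 7.9]
print(integer_word_length(observed), fractional_bits(observed, register_width=16))
```

Because the estimate depends on the inputs seen during execution, a manual specification (as in the flowchart) can still be needed for signals whose worst case was not exercised.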

  47. Static and Dynamic Reconfiguration ENG6530 RCS

  48. Reconfigurability • Reconfiguration is either static (execution is interrupted), semi-static (also called time-shared) or dynamic (in parallel with execution): • Static reconfiguration involves hardware changes at the slow rate of days/weeks, and is typically used by hardware engineers to: • Evaluate prototype chip implementations, • Implement an architecture on an entire FPGA fabric or on multiple FPGAs. • Semi-static reconfiguration: if an application can be pipelined, it might be possible to implement each phase in sequence on the reconfigurable hardware. • The switch between the phases is on command: a single FPGA performs a series of tasks in rapid succession, reconfiguring itself between each one. • Such designs operate the chip in a time-sharing mode and swap between successive configurations rapidly. • Dynamic reconfiguration: the most powerful form of reconfigurable computing. • The hardware reconfigures itself on the fly as it executes a task. • While some modules are executing, others might be swapped in/out.

  49. Static Implementation • Static or Compile Time Reconfiguration (CTR) • Static implementation strategy • Single system wide configuration • Configuration doesn’t change during computation • Similar to using ASIC for application acceleration CONFIGURE EXECUTE ENG6530 RCS

  50. Compile Time Configuration • Compile-time configuration is an important feature in SRAM-based FPGAs that allows changes in functionality according to need. • Enables benefits such as flexibility, hardware reuse, and reduced power consumption: • Flexibility: designs are loaded when required • Hardware reuse: the currently required design replaces the old one on the same fabric • Power savings • Drawbacks of compile-time reconfiguration: • The entire fabric is reconfigured even for slight design changes • System execution stalls completely • The time to load a design onto the fabric from external memory (reconfiguration time) increases with bitstream size • [Figure: Designs A, B and C are stored in external memory; a configuration controller loads whichever design is required onto the FPGA fabric]
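
The dependence of reconfiguration time on bitstream size can be made concrete with a first-order model. This is a sketch: it assumes the load time is simply bitstream bits divided by configuration-port bandwidth, and the port width, clock and bitstream sizes are hypothetical values, not figures for any particular device.

```python
def reconfiguration_time(bitstream_bytes, port_width_bits, port_clock_hz):
    """Seconds needed to shift a bitstream through the configuration port."""
    return (bitstream_bytes * 8) / (port_width_bits * port_clock_hz)


def time_shared_total(exec_times, bitstream_bytes, port_width_bits, port_clock_hz):
    """Total run time when the fabric is fully reloaded before every phase.

    Execution stalls during each load, so load times add to execution times.
    """
    per_load = reconfiguration_time(bitstream_bytes, port_width_bits, port_clock_hz)
    return sum(exec_times) + len(exec_times) * per_load


# Hypothetical 4 MB full bitstream over a 32-bit port clocked at 100 MHz.
full_load = reconfiguration_time(4_000_000, 32, 100e6)
# Three phases (Designs A, B, C) with assumed execution times in seconds.
total = time_shared_total([0.05, 0.02, 0.03], 4_000_000, 32, 100e6)
print(full_load, total)
```

Under these assumptions each full reload costs about 10 ms, so short phases can spend a large share of their time waiting for configuration.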
