ERSA 2008, Las Vegas, NV, July 14–17, 2008. Design Framework for Partial Run-Time FPGA Reconfiguration. Chris Conger, Ann Gordon-Ross, and Alan D. George. Presented by: Abelardo Jara-Berrocal. HCS Research Laboratory, College of Engineering, University of Florida.
Outline
• Introduction
• Partial Reconfiguration (PR) Overview
• Proposed Design Methodologies
• Framework analysis
• Conclusions
Introduction – Fully reconfigurable systems
[Figure: an external system controller loads full configurations (Config 1, Config 2, Config 3) from bitstream storage into a battery-powered FPGA over the configuration lines, one enabled at a time. The required design (Modules A, B and C) is marked "Doesn't fit". Other labels: general purpose I/O, shared memory, external I/O, design station.]
1. Device too small for complex designs
2. Large full bitstreams (long reconfiguration time)
3. Complete system operation is halted prior to reconfiguration
Introduction – The Virtex-4 PR architecture
• Newer Xilinx FPGA families offer a partial reconfiguration (PR) feature
• A rectangular region of the FPGA can be reconfigured without affecting the remaining FPGA area
• The system can continue operating without interruption
[Figure: FPGA floorplan showing Reconfigurable region 1 and Reconfigurable region 2]
Introduction – A sample PR architecture
[Figure: battery-powered FPGA divided into a static area (MicroBlaze controller, flash controller, ICAP) and a reconfigurable area. Bitstream storage holds the base system configuration and the partial bitstreams for Modules A, B and C; on a "Module A request", Module A is enabled and Modules B and C are disabled. Other labels: JTAG, external I/O.]
1. The system controller does not need to be placed in an external device
2. Access to the fast Internal Configuration Access Port (ICAP: 32 bits, 100 MHz)
3. Smaller partial bitstreams
4. No need to halt the complete system when reconfiguring a module
5. Time multiplexing of FPGA resources: hardware modules are loaded and unloaded on demand
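As a rough illustration of point 2: a 32-bit port clocked at 100 MHz moves at most 4 bytes per cycle, i.e. 400 MB/s. The sketch below turns that figure into an ideal lower bound on the time needed to load a partial bitstream. The function names and the example bitstream sizes are hypothetical, and a real system (flash reads, controller overhead) would be expected to take longer than this bound.

```python
# Ideal ICAP transfer time, using only the port width and clock from the slide.
ICAP_WIDTH_BITS = 32          # from the slide
ICAP_CLOCK_HZ = 100_000_000   # from the slide


def icap_peak_bytes_per_second(width_bits=ICAP_WIDTH_BITS, clock_hz=ICAP_CLOCK_HZ):
    """Peak throughput if one full word is written on every clock cycle."""
    return (width_bits // 8) * clock_hz


def ideal_reconfig_time_ms(bitstream_bytes):
    """Lower bound on load time in milliseconds (ignores all overheads)."""
    return bitstream_bytes / icap_peak_bytes_per_second() * 1e3


if __name__ == "__main__":
    # Hypothetical partial bitstream sizes, for illustration only.
    for size_bytes in (50_000, 200_000, 1_000_000):
        print(f"{size_bytes:>9} bytes -> {ideal_reconfig_time_ms(size_bytes):.3f} ms")
```

Because the bound scales linearly with bitstream size, it also shows why smaller partial bitstreams (point 3) translate directly into shorter reconfiguration times.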
Introduction – Current PR Design Flow
• Steps
  • Partition the system into modules
  • Define static modules and reconfigurable modules
  • Decide the number of PR regions (PRRs)
  • Decide PRR sizes, shapes and locations
  • Map modules to PRRs
  • Define PRR interfaces and instantiate slice macros for them
• Optimization problems
  • Design partitioning
  • Number of PRRs
  • PRR sizes, shapes and locations
  • Mapping PRMs to PRRs
  • Type and placement of PRR interfaces
[Figure: design partitioning splits the system into static modules (MicroBlaze controller, flash controller, ICAP) and reconfigurable modules (PRMs); design floorplanning and budgeting then places a static region and the PRRs on the FPGA, here PRR 1 (Modules A and B) and PRR 2 (Module C). Annotation: "# of PRRs? 1, 2".]
Introduction – Early Access PR Design Flow
• Introduced by Xilinx at FPL'06
• Major improvements:
  • Automatic implementation scripts
  • Rectangular regions (not full-column reconfiguration)
  • Static nets can cross reconfigurable regions
  • Slice macros replace bus macros
• Partitioning and floorplanning steps are still executed manually
  • Design guidelines for these steps are not provided
[Figure: design partitioning and design floorplanning and budgeting (manual) produce the placement and PRR constraints and the reconfigurable design specifications; the Xilinx PR implementation flow (automatic) then generates the full initial bitstream and the PRM bitstreams. The manual steps are marked "Potential for development of automatic CAD tools".]
Introduction – Current PR design tools limitations
• PR design is a very specialized task
  • Only a physical level of support is provided
  • Architectural knowledge of the target device is a must
  • Not very flexible, many design constraints
• Partitioning and floorplanning steps are manually executed
  • No performance-sensitive design guidelines are provided
  • No automatic, heuristics-based design flow is available either
• Lack of abstraction from low-level details discourages designers from using PR
  • Difficult for many end users
In this work, we propose a taxonomy of PR system design flows and an efficient methodology for each type.
PR Overview – Taxonomy of PR systems design flows
[Figure: taxonomy tree with root "PR System Design Flow" and two branches, "Special purpose" and "Multipurpose"]
Special purpose
• Highly specialized systems design
• All PRMs that will exist on the system are known at design time
• Each PRR is independently optimized (size, shape, location, interface) based on the PRMs that will be mapped to it
• Output is:
  • Floorplan defining a static region and a set of optimized PRRs
  • The set of PRMs that can be placed in each PRR (PRMs to PRRs mapping)
Multipurpose
• Not optimized for a specific application
• PRMs required by the application are not known when designing the base system
• Goal is to design a flexible and reusable base design that can be used for several different PR systems
• Base system designer defines a set of PRRs with fixed shapes, sizes, locations and interfaces
• Generated floorplan is used as input template for the PRMs implementation
Proposed Design Methodology: Special-Purpose
• Partition the system into several hardware modules
• Synthesize the hardware modules
• Use a control flow graph (CFG) and a states table to represent:
  • Application states and the transitions between them (execution path coverage)
  • Set of modules required in each application state
Let's see an example
Proposed Design Methodology: Special-Purpose
• Define region partitioning constraints
[Figure: control flow graph with states S1 to S5, annotated with the modules C, F, G, D and E and a legend distinguishing reconfigurable from static modules]
Establishing constraints
1. A and B are present in all states (static modules)
2. C, F, G and D are reconfigurable modules (PRMs)
3. F and G are mutually exclusive with respect to C (they cannot be placed in the same PRR as C)
4. F, G, D and E can be placed in the same PRR
5. C, D and E can be placed in the same PRR
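The constraint-derivation step can be sketched in a few lines. The states table itself is not reproduced in this transcript, so the `STATES` table below is hypothetical: it was chosen only so that the derived results agree with constraints 1 and 3 to 5 above. The rule used (two PRMs required in the same state cannot share a PRR) is an assumption about how the constraints are obtained, not a statement of the authors' exact procedure.

```python
from itertools import combinations

# Hypothetical states table: application state -> modules required in that state.
STATES = {
    "S1": {"A", "B", "C", "F"},
    "S2": {"A", "B", "C", "G"},
    "S3": {"A", "B", "D"},
    "S4": {"A", "B", "E"},
    "S5": {"A", "B", "C"},
}


def static_modules(states):
    """Modules required in every state are candidates for the static region."""
    return set.intersection(*states.values())


def conflicts(states):
    """Pairs of PRMs required in the same state (assumed unable to share a PRR)."""
    static = static_modules(states)
    pairs = set()
    for modules in states.values():
        prms = sorted(modules - static)
        pairs.update(combinations(prms, 2))
    return pairs


if __name__ == "__main__":
    print("static:", sorted(static_modules(STATES)))
    print("conflicts:", sorted(conflicts(STATES)))
```

Any pair of PRMs that never appears in the conflict set is, under this assumption, free to share a PRR.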
Proposed Design Methodology: Special-Purpose
• Define the number of PRRs to be used
  • Optimization variable
  • Number is computed based on the CFG and the states table
[Figure: "# PRRs = ?" with candidate values 1, 2 and 4]
• Define a PRMs to PRRs mapping
  • Optimization problem
  • Combinatorial design space
  • Design space is reduced using design constraints
Possible solution (not necessarily the optimal):
  Static region: A, B
  PRR 1: C, D, E
  PRR 2: F, G
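To make the combinatorial search concrete, the sketch below brute-forces the smallest number of PRRs for which a mapping exists that keeps F and G out of the PRR holding C (constraint 3 of the example). It is a minimal illustration, not the authors' algorithm: the function names are hypothetical, only the exclusion constraint is modeled, and exhaustive enumeration is practical only for a handful of PRMs.

```python
from itertools import product

PRMS = ["C", "D", "E", "F", "G"]
# Pairs that may not share a PRR, taken from constraint 3 of the example.
CONFLICTS = {("C", "F"), ("C", "G")}


def is_valid(mapping, conflicts):
    """A mapping is valid if no conflicting pair lands in the same PRR."""
    return all(mapping[a] != mapping[b] for a, b in conflicts)


def min_prrs(prms, conflicts):
    """Return (number of PRRs, mapping) for the first valid mapping found."""
    for n in range(1, len(prms) + 1):
        for assignment in product(range(n), repeat=len(prms)):
            mapping = dict(zip(prms, assignment))
            if is_valid(mapping, conflicts):
                return n, mapping
    return 0, {}


if __name__ == "__main__":
    count, mapping = min_prrs(PRMS, CONFLICTS)
    print(count, mapping)
```

For this example the search settles on two PRRs, with C, D and E in one and F and G in the other, which matches the possible solution listed on the slide. A real tool would also weigh resource balance, bitstream size and the other cost functions rather than stopping at the first valid mapping.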
Proposed Design Methodology: Special-Purpose
• And when do we size our PRRs?
  • Don't worry, it is our next step
[Figure: modules profile (slices, BRAMs and DSP48s per module) feeding three resource budgets]
• Required static region resources: Modules A and B (resources are added)
• Required PRR 1 resources: Modules C, D and E (maximum of each resource type)
• Required PRR 2 resources: Modules F and G (maximum of each resource type)
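The sizing rule can be written down directly: static modules coexist, so their requirements add up, while the PRMs of one PRR are never loaded together, so the PRR only needs the per-resource maximum. In the sketch below the per-module numbers in `PROFILE` are hypothetical placeholders (the slide's actual module profile is not in this transcript), and the function names are illustrative.

```python
RESOURCE_TYPES = ("slices", "brams", "dsp48s")

# Hypothetical post-synthesis module profile, for illustration only.
PROFILE = {
    "A": {"slices": 800, "brams": 4, "dsp48s": 0},
    "B": {"slices": 1200, "brams": 2, "dsp48s": 8},
    "C": {"slices": 2000, "brams": 10, "dsp48s": 4},
    "D": {"slices": 1500, "brams": 16, "dsp48s": 0},
    "E": {"slices": 900, "brams": 2, "dsp48s": 12},
    "F": {"slices": 600, "brams": 8, "dsp48s": 2},
    "G": {"slices": 1100, "brams": 0, "dsp48s": 6},
}


def static_requirements(modules, profile):
    """Static modules are all present at once: add each resource type."""
    return {r: sum(profile[m][r] for m in modules) for r in RESOURCE_TYPES}


def prr_requirements(modules, profile):
    """Only one PRM occupies a PRR at a time: take the maximum per resource type."""
    return {r: max(profile[m][r] for m in modules) for r in RESOURCE_TYPES}


if __name__ == "__main__":
    print("static:", static_requirements(["A", "B"], PROFILE))
    print("PRR 1: ", prr_requirements(["C", "D", "E"], PROFILE))
    print("PRR 2: ", prr_requirements(["F", "G"], PROFILE))
```

Note that the maximum is taken per resource type, so the slice count of a PRR may come from one PRM and its BRAM or DSP48 count from another.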
Proposed Design Methodology: Special-Purpose
• Define the PRR sizes, shapes and locations inside the FPGA fabric
  • Floorplanning optimization problem
  • Proper metrics for PRR performance analysis are required
  • Design guidelines for efficient PRR floorplanning are also a necessity
• Define PRR interfaces
  • Place slice macros
[Figure: the PRR 1 resource budget is turned into a reconfigurable region with enough resources for PRR 1, and the same is done for PRR 2, giving the final optimized custom base system floorplan on the FPGA: static region, PRR1 and PRR2]
Proposed Design Methodology: Special-Purpose
• Methodology outputs
  • Custom base system
  • PRMs to PRRs mapping
• Both are used as input files for the automatic Xilinx PR design flow
Proposed Design Methodology: Special-Purpose
• Opportunity to automate this flow through design tools
• Optimization variables
  • Number of PRRs
  • PRR sizes, shapes and locations
  • PRMs to PRRs mapping
  • Additional optimization variables can be defined
• Several possible cost functions:
  • Area wastage
  • Power usage
  • Application latency
  • Throughput
  • …
Framework analysis – PRR Geometries
• PR system design flows require:
  • Proper metrics for PRR performance analysis
  • Design guidelines for efficient PRR floorplanning
• Study of the effects of varying PRR shape on:
  • Maximum clock frequency
  • Partial bitstream size
• Five separate test cores:
  • Beamforming (DSP/slice)
  • CFAR (slice/memory)
  • AES (register)
  • ARM7 softcore (hybrid)
  • Sine/Cosine LUT (memory)
• Performed on the V4SX55 thus far
Aspect ratio = PRR height / PRR width
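To show what sweeping the aspect ratio involves, the sketch below enumerates rectangular PRR shapes, measured in CLBs, that hold a given number of slices, and reports height / width for each. It assumes four slices per CLB (as on Virtex-4) and takes the usable CLB array dimensions as parameters; the 128 by 48 array used in the example is an assumption for the V4SX55 that should be checked against the data sheet. The sketch counts slices only and ignores the BRAM and DSP48 columns, so it is not a substitute for a real floorplanner.

```python
import math

SLICES_PER_CLB = 4  # Virtex-4 CLBs contain four slices


def aspect_ratio(height, width):
    """Aspect ratio as defined on the slide: PRR height / PRR width."""
    return height / width


def candidate_shapes(slices_needed, max_height, max_width):
    """List (height, width, aspect ratio) rectangles, in CLBs, that fit the slices."""
    clbs_needed = math.ceil(slices_needed / SLICES_PER_CLB)
    shapes = []
    for height in range(1, max_height + 1):
        width = math.ceil(clbs_needed / height)  # narrowest width for this height
        if width <= max_width:
            shapes.append((height, width, aspect_ratio(height, width)))
    return shapes


if __name__ == "__main__":
    # 5022 slices is the beamforming core; 128 x 48 CLBs is an assumed device size.
    for height, width, ratio in candidate_shapes(5022, 128, 48):
        if 2 <= ratio <= 4:
            print(f"{height:3d} x {width:2d} CLBs, aspect ratio {ratio:.2f}")
```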
Framework analysis – Beamforming (~125 MHz, 40%)
• 5022 slices
• 16 DSP48s
• 17 RAMB16s
• Baseline, non-PR performance = 1614 kB, 127.845 MHz
[Plots: clock frequency (MHz) vs. aspect ratio; bitstream size (kB) vs. aspect ratio]
Framework analysis – CFAR (~100 MHz, 16%)
• 2610 slices
• 2 DSP48s
• 34 RAMB16s
• Baseline, non-PR performance = 1001 kB, 103.616 MHz
[Plots: clock frequency (MHz) vs. aspect ratio; bitstream size (kB) vs. aspect ratio]
Framework analysis – AES (~80 MHz, 13.75%)
• 3634 slices
• 3943 registers
• 4 RAMB16s
• Baseline, non-PR performance = 1393 kB, 80.483 MHz
[Plots: bitstream size (kB) vs. aspect ratio; clock frequency (MHz) vs. aspect ratio]
Framework analysis – ARM7 (~40 MHz, 6.8%)
• 1826 slices
• 16 DSP48s
• 10 RAMB16s
• Baseline, non-PR performance = 872 kB, 40.985 MHz
[Plots: bitstream size (kB) vs. aspect ratio; clock frequency (MHz) vs. aspect ratio]
Framework analysis – Sine/Cosine LUT
• 107 slices
• 27 RAMB16s
• Baseline, non-PR performance = 571 kB, 204.918 MHz
[Plots: bitstream size (kB) vs. aspect ratio; clock frequency (MHz) vs. aspect ratio]
Framework analysis – PRR Geometries
• Slice-intensive designs show the best bitstream size and clock frequency with an aspect ratio of around 2 to 4
  • Roughly equivalent to the aspect ratio of the FPGA as a whole
• Non-slice-intensive designs show the best bitstream size with an aspect ratio >> 4
  • Due to the columnar distribution of RAMB16/DSP48 resources on the chip
• Clock frequency is relatively insensitive to aspect ratio
• Not shown in the graphs: resource wastage also improved
• Results are more pronounced for high-frequency designs
• However, aspect ratio is not the only design consideration
  • Placement on the chip relative to other regions, pins or resources may restrict the choice of PRR shape
Conclusions – Contributions of this work
• Taxonomy for PR systems design flows and a design methodology for efficient development of each type
• Identification of relevant optimization variables and constraints
  • Number of PRRs, optimal mapping of PRMs to PRRs, system floorplanning
  • Propose their incorporation in a future automatic design tool
• Study of the effects of varying PRR shape
  • Maximum clock frequency
  • Partial bitstream size
• Multiple classes of cores/designs
  • Memory-intensive
  • DSP-intensive
  • Combinational logic-intensive
  • Register-intensive
  • Etc.
• Definition and delivery of PRR floorplanning guidelines