
IEEE International Symposium on Rapid System Prototyping – Montreal, Canada – October 4, 2013

Riccardo Cattaneo, Christian Pilato, Gianluca C. Durelli, Marco D. Santambrogio and Donatella Sciuto, Politecnico di Milano, Italy.


Presentation Transcript


  1. SMASH: A Heuristic Methodology for Designing Partially Reconfigurable MPSoCs • Riccardo Cattaneo, Christian Pilato, Gianluca C. Durelli, Marco D. Santambrogio and Donatella Sciuto • Politecnico di Milano, Italy • IEEE International Symposium on Rapid System Prototyping – Montreal, Canada – October 4, 2013

  2. What is an FPGA? • Hardware device that can be customized after fabrication to execute a specific functionality • Distinct hardware blocks are “intrinsically” running in parallel on the device • Heterogeneous grid of interconnected components • look-up tables (LUTs), block RAMs (BRAMs), digital signal processors (DSPs), switch matrices, input/output blocks (IOBs), etc. • Possibility to reuse resources by reconfiguring part of the logic at run time (partial reconfiguration)

  3. Heterogeneous SoCs with FPGAs • Tightly coupled heterogeneous systems • Zynq platform: dual-core ARM Cortex-A9 tightly coupled with a Xilinx Artix-7 FPGA • High-speed, low-latency reconfigurable interconnect • [Figures: Avnet ZedBoard (Zynq-7000-based dev board); coarse-grain overview of the Zynq-7000 All Programmable SoC]

  4. Design Challenges and Motivation • The hardware engineer needs to: • partition the application into blocks (partitioning) • determine which parts are better executed in hardware (mapping and scheduling) • generate the system (architecture refinement) • These steps are strictly interdependent! • Partial reconfiguration allows reusing the same logic across different tasks • More tasks can be ported to hardware • Reconfiguration adds significant overhead that must be taken into account • [Figure labels: INPUT, SMASH]

  5. SMASH: Proposed Methodology • Design Space Exploration • determines the proper mapping and scheduling • Architecture Refinement • customizes the architectural template to derive the corresponding platform

  6. Mapping and Scheduling • Input: • Task graph (DAG) • Architectural template • Identifies resource constraints • Implementations • List of different trade-offs in terms of performance and resources • Output: • Implementation and component for each task • Order of execution

  7. Implementation vs. Component • Each task can have multiple alternative implementations on the same component • Faster tasks usually require more resources • Some tasks can share implementations to execute the same functionality multiple times • Hardware reuse: no reconfiguration is required • Implementation is more related to functionality and resources • Component is more related to where the task is actually executed • Processor or hardware module

  8. SMASH: Execution Overview • SMASH: Simultaneous MApping and Scheduling Heuristic • Flow of one SMASH iteration: generate trace → schedule trace → evaluate metrics → store solution → termination? • No: start another iteration • Yes: return best solution

  9. Exploring Mapping and Scheduling • Exploration based on the Serial Generation Scheme (SGS) • Constructive approach to better handle design constraints • Decision is not taken if it would lead to a constraint violation • Different combinations of mapping and scheduling • Each decision represents a mapping of a task with respect to an implementation and a processing element • The order of selection represents the priority values for resolving scheduling conflicts on the resources

  10. Ant Colony Optimization • Our proposed approach is based on Ant Colony Optimization (ACO) to limit infeasible solutions • Cooperative behavior of the ants while searching • The ant has different possibilities at each step and takes stochastic decisions, composing a trace • Stochastic principles guarantee exploration (a probability is generated for each admissible decision at each step) • Feedback guarantees the exploitation of good parts of the solutions

  11. Algorithm Overview • Pseudo-code of the proposed ACO-based exploration (shown on the slide, not reproduced in this transcript), annotated with: • Exploration: generating the trace • Mapping decision • Exploitation: updating global information
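The slide's pseudo-code is not available in this transcript. As a rough illustration of the exploration/exploitation structure it is annotated with, here is a minimal generic ACO loop in Python. It is a sketch under assumptions, not the authors' algorithm: the function name `aco_explore`, the parameters `n_ants` and `rho` (evaporation rate), the unit reinforcement, and the toy cost function are all hypothetical, and constraint checking is omitted.

```python
import random

def aco_explore(decision_points, cost, n_iterations=50, n_ants=10, rho=0.1, seed=0):
    """Generic ACO loop (illustrative only).

    decision_points: list of option lists, one per decision.
    cost: function mapping a trace (list of chosen options) to a number,
          lower is better.
    """
    rng = random.Random(seed)
    # Global information: one weight per (decision point, option), uniform at start.
    pheromone = [{opt: 1.0 for opt in options} for options in decision_points]
    best_trace, best_cost = None, float("inf")

    for _ in range(n_iterations):
        for _ in range(n_ants):
            # Exploration: each ant composes a trace with stochastic decisions.
            trace = []
            for d, options in enumerate(decision_points):
                weights = [pheromone[d][opt] for opt in options]
                trace.append(rng.choices(options, weights=weights, k=1)[0])
            c = cost(trace)
            if c < best_cost:
                best_trace, best_cost = trace, c

        # Exploitation: evaporate all weights, then reinforce the
        # decisions that compose the best trace found so far.
        for d, options in enumerate(decision_points):
            for opt in options:
                pheromone[d][opt] *= 1.0 - rho
            pheromone[d][best_trace[d]] += 1.0

    return best_trace, best_cost

# Toy usage: three tasks, each mapped to "cpu" or "hw"; cost counts "cpu" choices.
toy_points = [["cpu", "hw"]] * 3
toy_cost = lambda trace: sum(1 for choice in trace if choice == "cpu")
toy_best, toy_best_cost = aco_explore(toy_points, toy_cost)
```

Reinforcing only the best trace is one common ACO variant; other update rules (for example, reinforcing every ant's trace in proportion to its quality) fit the same skeleton.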

  12. Stochastic Selection Process • At each decision point d, a probability is computed for assigning a candidate j (task/communication) to a proper implementation point i (implementation + processing element); the formula on the slide, not reproduced in this transcript, combines a global heuristic term and a local heuristic term • Global information G: feedback information • Probability that the decision leads to a good solution • Local heuristic L: problem-specific hint • “Adjusted” by the global heuristic if wrong • Roulette wheel and extraction of a combination (i, j) • A probability is generated iff the resources required by the resulting PEs can be satisfied by the architecture • There is always the possibility of adding a new PE or reusing an existing one (platform customization)
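The exact probability formula is missing from this transcript. The sketch below assumes the standard ACO form, in which each admissible pair (i, j) gets a weight G^alpha * L^beta that is then normalized, followed by a roulette-wheel draw. The exponents `alpha` and `beta`, the function names, and the example values are assumptions for illustration, not taken from the slides.

```python
import random

def selection_probabilities(candidates, global_info, local_hint, alpha=1.0, beta=1.0):
    """Return {candidate: probability} over the admissible (i, j) pairs.

    Assumed form: weight = G**alpha * L**beta, normalized over all candidates.
    Candidates that would violate resource constraints are expected to be
    filtered out before this call.
    """
    weights = {c: (global_info[c] ** alpha) * (local_hint[c] ** beta) for c in candidates}
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("no admissible candidate with positive weight")
    return {c: w / total for c, w in weights.items()}

def roulette_wheel(probabilities, rng):
    """Draw one candidate; each owns a slice of [0, 1) equal to its probability."""
    r = rng.random()
    cumulative = 0.0
    chosen = None
    for candidate, p in probabilities.items():
        chosen = candidate
        cumulative += p
        if r < cumulative:
            break
    # If rounding left r just above the final cumulative sum, the last candidate is returned.
    return chosen

# Hypothetical example: task "t1" on an existing PE or on a newly added one.
pairs = [("impl_a@pe0", "t1"), ("impl_b@new_pe", "t1")]
G = {pairs[0]: 1.0, pairs[1]: 3.0}   # feedback from earlier iterations
L = {pairs[0]: 1.0, pairs[1]: 1.0}   # problem-specific hint
probs = selection_probabilities(pairs, G, L)
pick = roulette_wheel(probs, random.Random(0))
```

With equal local hints, the pair with three times the global weight is drawn three times as often, which is how feedback can override a misleading local hint over time.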

  13. More about SMASH • Simultaneous MApping and Scheduling Heuristic • Flow of one SMASH iteration (same flowchart as slide 8): generate trace → schedule trace → evaluate metrics → store solution → termination? • No: start another iteration • Yes: return best solution

  14. Trace Generation and Evaluation • Evaluation is performed only on the complete trace • Updated version of the original task graph (TG) augmented with communications and reconfigurations • Reconfiguration is taken into account from the early stages of the design process • Possibility to include different evaluation methods • Analytical estimations vs. TLM simulations • Decisions composing the best solution are reinforced • Over successive iterations, the best trace is identified

  15. Scheduling Definition Input • Task graph (DAG) • Trace: ordered list of mapping decisions (task-component-implementation) Output • Start/end time estimations for each task Goal • Reduce total execution time
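To make the input/output contract above concrete, here is a minimal list-scheduling sketch in Python. It assumes that the trace order is a valid topological order of the task graph and that it serves as the priority for resolving conflicts on a component; communications and reconfigurations are ignored. The names `schedule_trace`, `predecessors`, and `duration`, and the example values, are hypothetical.

```python
def schedule_trace(trace, predecessors, duration):
    """Assign (start, end) times to each task, processing decisions in trace order.

    trace: ordered list of (task, component, implementation) decisions.
    predecessors: {task: [tasks it depends on]} from the task graph (DAG).
    duration: {(task, implementation): estimated execution time}.
    """
    component_free = {}   # time at which each component becomes idle
    times = {}            # task -> (start, end)
    for task, component, impl in trace:
        # A task is ready once all its predecessors have finished.
        ready = max((times[p][1] for p in predecessors.get(task, [])), default=0)
        start = max(ready, component_free.get(component, 0))
        end = start + duration[(task, impl)]
        times[task] = (start, end)
        component_free[component] = end
    return times

# Hypothetical example: B and C both depend on A; A and C share cpu0.
example_trace = [("A", "cpu0", "sw"), ("B", "hw0", "fast"), ("C", "cpu0", "sw")]
example_preds = {"B": ["A"], "C": ["A"]}
example_duration = {("A", "sw"): 2, ("B", "fast"): 1, ("C", "sw"): 3}
example_times = schedule_trace(example_trace, example_preds, example_duration)
makespan = max(end for _, end in example_times.values())
```

The total execution time to be reduced is the makespan, that is, the largest end time over all tasks.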

  16. Scheduling: Methodology Overview • SMASH scheduler flow: task graph and trace → create extended task graph → actual scheduling (assign times) → evaluate metrics • Intermediate and final results: extended task graph, metrics

  17. Extended TG: Communications Adding explicit tasks based on the communication topology

  18. Extended TG: Reconfigurations • A reconfiguration task is introduced iff: • Two processing tasks are mapped on the same component and • Their implementations are different, i.e., module cannot be reused • Insertion of a reconfiguration task: • New edges are introduced from all WRITEs exiting the source processing task to the reconfiguration • New edges are introduced from the reconfiguration to all the READs entering the target processing task
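The insertion rule above can be sketched in Python as follows. This is a simplified illustration: it connects the reconfiguration task directly to the source and target processing tasks instead of to their explicit WRITE and READ communication tasks, it only compares consecutive tasks on the same component, and it assumes that only hardware components are reconfigured. The function name and the `reconf_*` naming scheme are hypothetical.

```python
def insert_reconfigurations(trace, hardware_components):
    """Return the reconfiguration tasks and edges to add to the task graph.

    trace: ordered list of (task, component, implementation) decisions.
    hardware_components: set of component names that can be reconfigured.
    """
    last_on_component = {}   # component -> (previous task, its implementation)
    new_tasks, new_edges = [], []
    for task, component, impl in trace:
        if component in hardware_components and component in last_on_component:
            prev_task, prev_impl = last_on_component[component]
            # Same component, different implementation: the module cannot be reused.
            if prev_impl != impl:
                reconf = f"reconf_{prev_task}_{task}"
                new_tasks.append((reconf, component, impl))
                new_edges.append((prev_task, reconf))   # source task -> reconfiguration
                new_edges.append((reconf, task))        # reconfiguration -> target task
        last_on_component[component] = (task, impl)
    return new_tasks, new_edges

# Hypothetical example: A and B reuse "fir" on rr0, C needs "fft" on the same region.
example = [("A", "rr0", "fir"), ("B", "rr0", "fir"), ("C", "rr0", "fft"),
           ("D", "cpu0", "sw1"), ("E", "cpu0", "sw2")]
reconf_tasks, reconf_edges = insert_reconfigurations(example, {"rr0"})
```

In this example only the B to C transition needs a reconfiguration; A to B is plain hardware reuse, and the processor tasks D and E are left alone.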

  19. Extended TG: Reconfigurations

  20. Trace Evaluation Possibility to integrate different policies to generate the corresponding scheduling

  21. Architecture Refinement • The actual platform instance is derived from the resulting decisions • Hardware modules with only one task assigned are converted into static IP blocks • Hardware modules with more than one task assigned are represented as reconfigurable regions • Integration with the generation of the run-time manager to manage reconfigurations • Still a work in progress and performed manually
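The classification rule in this slide is simple enough to sketch directly. The Python below counts the tasks assigned to each hardware module and labels the module accordingly. The slides describe this step as still manual, so the function name, the label strings, and the example trace are purely illustrative assumptions.

```python
from collections import defaultdict

def refine_architecture(trace, hardware_components):
    """Label each hardware module as a static IP block or a reconfigurable region.

    trace: ordered list of (task, component, implementation) decisions.
    hardware_components: set of component names that are hardware modules.
    """
    tasks_per_module = defaultdict(list)
    for task, component, _impl in trace:
        if component in hardware_components:
            tasks_per_module[component].append(task)
    # One task -> static IP block; more than one -> reconfigurable region.
    return {
        module: "static_ip" if len(tasks) == 1 else "reconfigurable_region"
        for module, tasks in tasks_per_module.items()
    }

# Hypothetical example: hw0 hosts one task, hw1 hosts two, cpu0 is a processor.
decisions = [("A", "hw0", "fir"), ("B", "hw1", "fft"),
             ("C", "hw1", "dct"), ("D", "cpu0", "sw")]
platform = refine_architecture(decisions, {"hw0", "hw1"})
```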

  22. Experimental Evaluation • Synthetic benchmarks (TGFF) • Focus on scalability of the approach • Possibility to evaluate different task graph patterns • Resulting systems (platform instance and extended task graph with mapping/scheduling decisions) converted into virtual platforms • Validation of the different solutions assuming correctness of the execution • Simulations performed with Synopsys Platform Architect • VPU performance annotations extracted from tasks’ implementations

  23. Experimental Setup • Three different classes of experiments: • Static: FPGA area is divided into a set of up to K_S static IP cores (no partial reconfiguration) • Mixed: both IP cores and reconfigurable regions can be used, with an upper bound of K_M IPs and R_M reconfigurable regions • Reconfigurable: architectures with no more than K_R regions • Reconfigurable regions can also be deployed as static cores in the final architecture if only one task is assigned to them

  24. Experimental Results • Small task graphs do not benefit from reconfiguration • Large task graphs are affected by communication overhead

  25. Conclusions and Future Work • SMASH is an automated methodology to design reconfigurable systems • It determines the mapping and scheduling of the different tasks • It allows customizing the architectural template • Future work • Integration of floorplanning procedures to compute and validate physical constraints of the blocks • Automatic generation of the platform specification

  26. End… http://www.fp7-faster.eu/
