
Presentation Transcript


  1. Re-configurable Parallel Stream Processor with self-assembling and self-restorable micro-architecture. Lev Kirischian, Irina Terterian, Pil Woo Chun and Vadim Geurkov. Embedded and Re-configurable Systems Lab, Ryerson University, Canada.

  2. Example of a multi-task data-flow workload where each task can run in different modes. [Timeline figure: over time, Task 1 runs in Mode 1, then Mode 2, then Mode 3; Task 2 runs in Mode 1, then Mode 2; Task 3 runs in a single mode; Task 4 runs in Modes 1, 3, 4 and 7.]
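Such a workload can be written down as an explicit schedule. The sketch below is an illustrative model (not from the slides; the task names, times, and `ModeRun`/`Task` types are invented) of tasks that move through modes over time:

```python
from dataclasses import dataclass

@dataclass
class ModeRun:
    mode: int
    start: float   # activation time
    end: float     # mode-switch or termination time

@dataclass
class Task:
    name: str
    runs: list     # ModeRun entries, ordered by start time

def active_tasks(workload, t):
    """Return (task name, mode) pairs that are active at time t."""
    return [(task.name, run.mode)
            for task in workload
            for run in task.runs
            if run.start <= t < run.end]

workload = [
    Task("Task 1", [ModeRun(1, 0, 4), ModeRun(2, 4, 6), ModeRun(3, 6, 9)]),
    Task("Task 2", [ModeRun(1, 1, 5), ModeRun(2, 5, 8)]),
]
print(active_tasks(workload, 4.5))  # Task 1 in mode 2, Task 2 in mode 1
```

A schedule like this is the input to the architecture discussed in the following slides: each (task, mode) pair needs its own hardware configuration for the interval during which it is active.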

  3. Usual Approach: Conventional Processors with Software-to-Task Optimization (Compilers + OS). Software-to-task optimization allows the use of conventional computing platforms with a fixed architecture (superscalar, VLIW, etc.) coupled with software compilers and an OS. Limitations of conventional processors: • If tasks are executed on a sequential computing system, the processing time often cannot meet specification requirements. • If tasks are executed on a parallel computing system with a fixed architecture, the cost-effectiveness of such parallel computers strongly depends on the task algorithms and data structures.

  4. Alternative Approach: Application Specific Processors (ASP) with Static Hardware-to-Task Optimization. An ASP can reach the required cost-performance parameters because the ASP architecture is optimized for the data-flow graph of the task and the task's data structure. Limitations of Application Specific Processors: • Performance decreases if the task algorithm or data structure changes • Limited possibility for further modernization • High cost for multi-task or multi-mode custom computing systems

  5. Proposed Approach: Reconfigurable Processor with Dynamic Architecture-to-Task Optimization. A high-performance computing system for multi-task data-flow applications should contain two major components: 1. A dynamically re-configurable computing platform based on partially-configurable FPGA devices, providing the maximum possible hardware flexibility. 2. A library of Application Specific Virtual Processors (ASVP): configuration bit-streams that program the on-chip application-specific processor circuitry for the period of time while the application (task) is active.
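The second component, the ASVP library, can be pictured as a lookup from (task, mode) to a configuration bit-stream. A minimal sketch, assuming hypothetical task names and file paths (none of these identifiers come from the slides):

```python
# Illustrative ASVP library: one configuration bit-stream per (task, mode).
asvp_library = {
    ("fft", 1): "bitstreams/fft_mode1.bit",
    ("fft", 2): "bitstreams/fft_mode2.bit",
    ("filter", 1): "bitstreams/filter_mode1.bit",
}

def configure(task, mode):
    """Select the bit-stream that programs the on-chip circuitry for this task/mode."""
    try:
        return asvp_library[(task, mode)]
    except KeyError:
        raise ValueError(f"no ASVP configured for {task} mode {mode}")

print(configure("fft", 2))  # bitstreams/fft_mode2.bit
```

The point of the design is that adding a task or a mode means adding one more bit-stream to the library, not redesigning the platform.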

  6. Architecture of Partially Reconfigurable FPGA Devices (Xilinx "Virtex" Family). [Figure: configuration data files are loaded into the internal configuration SRAM; the device is organized as I/O frames and CLB frames (Frame #1 … Frame #i … Frame #N) with Block RAM, interconnected by an internal (virtual) bus.] CLB (Configurable Logic Block): the uniform logic element of a frame and the smallest individually configurable component in the FPGA.

  7. Concept of the Application Specific Virtual Processor (ASVP). • An ASVP is a group of logic resources dedicated and optimally configured to reflect the algorithm and data structure of a task. • An ASVP is represented as a configuration data file (configuration bit-stream) that is downloaded into the FPGA when the task is to be activated.

  8. Life-cycle of an Application Specific Virtual Processor. 1. The ASVP core is downloaded to the reconfigurable platform before task activation. 2. The ASVP performs the task's data processing as long as necessary, without interruption or time-sharing of its dedicated logic resources with any other task. 3. After task completion, all resources included in the ASVP can be re-configured for any other task.
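The three steps above can be sketched as slot allocation on the reconfigurable platform. This is a hypothetical model (the `Platform` class and slot counts are invented for illustration), but it captures the life-cycle: resources are claimed at activation, held exclusively while the task runs, and released at completion:

```python
class Platform:
    """Reconfigurable platform with a fixed number of slots (CLB columns)."""
    def __init__(self, n_slots):
        self.free = set(range(n_slots))
        self.allocated = {}            # task -> set of slots

    def activate(self, task, slots_needed):
        # Step 1: download the ASVP core before task activation.
        if slots_needed > len(self.free):
            raise RuntimeError("not enough free slots")
        slots = {self.free.pop() for _ in range(slots_needed)}
        self.allocated[task] = slots
        return slots

    def complete(self, task):
        # Step 3: after completion, the resources can serve any other task.
        self.free |= self.allocated.pop(task)

p = Platform(8)
p.activate("Task 1", 3)
p.activate("Task 2", 2)   # Step 2: runs in parallel, no time-sharing of slots
p.complete("Task 1")
print(len(p.free))        # 6 slots free again
```

Note that `activate` never touches slots held by another task, which is exactly the "no interruption, no time-sharing" property of step 2.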

  9. ASVP Architecture-to-Task Optimization in a Partially Reconfigurable FPGA. [Figure: a data-flow graph (two XOR nodes feeding an adder) is mapped onto FPGA slots 1, 2, 3, … as virtual hardware components; data flows from Data In through the components to Data Out over the internal (virtual) bus.]
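The mapping in the figure can be sketched as a simple placement function. This is an illustrative sketch (node names and the one-node-per-slot assumption are mine, not from the slides): data-flow graph nodes become virtual hardware components assigned to consecutive FPGA slots, with the virtual bus carrying data between them:

```python
def place(dataflow_nodes, n_slots):
    """Assign each data-flow graph node to one FPGA slot, in graph order."""
    if len(dataflow_nodes) > n_slots:
        raise ValueError("graph does not fit the device")
    return {node: slot for slot, node in enumerate(dataflow_nodes, start=1)}

# The XOR/XOR/+ graph from the figure on a 4-slot region:
print(place(["XOR1", "XOR2", "ADD"], n_slots=4))
```

A real placer would also account for component widths and bus routing; the one-slot-per-node rule here only illustrates the idea that the ASVP's shape follows the task's data-flow graph.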

  10. Micro-architecture of a Virtual Hardware Component

  11. Virtual Hardware Component & Virtual Bus Interconnection. [Figure: the boundary of a virtual hardware component and its connections to the virtual bus lines.]

  12. Micro-architecture of the Application Specific Virtual Processor (ASVP). The micro-architecture of an ASVP is based on virtual hardware components interconnected via virtual bus lines.

  13. Parallel Task Processing on the Dynamically Re-configurable Stream Processor (DRSP). [Figure: three ASVPs (ASVP 1 for Task 1, ASVP 2, ASVP 3) run in parallel on functional units FU 1–FU 4 with reconfigurable interface modules RIM 1–RIM 4, connected through ports I/O 1–I/O 4 and the virtual bus; independent streams enter (Data in #1, #2) and leave (Data out #1, #2, #3).]

  14. DRSP: System-Level Architecture. [Figure: a host PC holds the task memory (Task 1: {Afix + Amodes} … Task h: {Afix + Amodes}) and connects over the PCI bus, through a PCI interface module, to the PRCP-based reconfigurable functional unit (Afix i + …); a cache memory holds the mode configurations {Amodes i}; the real-time hardware operating system (RT-HOS) drives the configuration & data bus; the data stream source feeds the unit and processed data leaves at Data Out.]

  15. Architecture of the Reconfigurable Computing Module. • Input LVDS ports: SPI, 2 × 3.43 Gbit/s (12 bit × 300 MHz) • Real-Time Hardware Operating System based on an XCV50E Virtex-E FPGA • LVTTL bus: 8.12 Gbit/s (64 bit × 133 MHz) • PCI interface: 800 Mbit/s • Reconfigurable Functional Unit [RFM 0111-002] • Configuration files / data cache (4 × 512 KB) • Output LVDS ports: SPI, 2 × 3.43 Gbit/s (12 bit × 300 MHz)

  16. Reconfigurable Computing Module based on the Xilinx "Virtex-E" family of FPGA devices.

  17. Restoration of an ASVP Using a Spare CLB Column. If a hardware fault occurs, the damaged virtual hardware component can be relocated to the reserved CLB column. [Figure: columns 1, 2, 3, …; the XOR and adder components of ASVP i are re-mapped around the faulty column through the communication field, preserving the input-to-output data path.]
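The restoration step can be sketched as a re-placement: every component mapped to the faulty column is re-downloaded into the spare column, while all healthy components keep their positions. This is a hypothetical sketch (the component names and single-spare-column assumption are mine):

```python
def restore(placement, spare, faulty):
    """placement: component -> CLB column; relocate components off the faulty column."""
    new_placement = dict(placement)
    for comp, col in placement.items():
        if col == faulty:
            new_placement[comp] = spare   # re-download the component's bit-stream
    return new_placement

placement = {"XOR_a": 1, "XOR_b": 2, "ADD": 3}
print(restore(placement, spare=4, faulty=2))  # XOR_b moves to column 4
```

Because only the relocated component's column is rewritten, the other components keep running, which is what makes the restoration compatible with the uninterrupted parallel execution described on slide 18.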

  18. When is the proposed technology most beneficial? • The workload consists of many tasks, and each task can run in different modes. • Each task requires high-speed data-stream processing. • Task algorithms may be modified within the life cycle of the system. • Active tasks must run in parallel and must not be interrupted when one of the tasks switches its mode or terminates. • The system can be remotely restored or self-restored even if a hardware fault occurs.

  19. DRSP Application for Networked Intelligent Manufacturing Systems. High-performance parallel data-stream processing (up to thousands of billions of operations per second) of large volumes of data (up to hundreds of gigabits) for: a) complex image processing and image recognition; b) spectrum analysis and digital signal processing; c) data transmission via LAN with data compression/decompression and encryption/decryption; d) control of high-performance manufacturing equipment and robotic systems.

  20. Acceleration of Task/Mode Switching. [Chart: acceleration (0–25×) versus the number of CLB slots in the virtual component (1–10).] Compared with an entire-FPGA-based system, task or mode switching accelerates as the number of CLB columns in the ASVP decreases, and can be more than 20 times faster.
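The shape of the chart follows from a simple assumption (mine, not stated on the slide as a formula): partial reconfiguration time scales with the number of CLB columns written, so switching speedup relative to full-device reconfiguration is roughly the ratio of total columns to ASVP columns:

```python
def switch_speedup(total_columns, asvp_columns):
    """Approximate task/mode switching speedup vs reconfiguring the entire FPGA,
    assuming reconfiguration time is proportional to the columns written."""
    return total_columns / asvp_columns

# e.g. a hypothetical 48-column device and a 2-column ASVP:
print(switch_speedup(48, 2))  # 24.0, consistent with the >20x on the slide
```

This first-order model ignores fixed per-configuration overheads, so it overestimates the speedup for very small ASVPs, but it explains why the curve on the slide falls off as the virtual component grows.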

  21. Minimization of Hardware Resources. Compared with entire-FPGA-based systems, the DRSP approach minimizes logic resources: as the number of tasks and task modes in a workload increases, the cost-effectiveness of the DRSP increases accordingly.

  22. SUMMARY: DRSP Compared with Conventional CPU, DSP and ASP Platforms.

                 Conv. CPU              DSP                    ASP
  Performance    Much lower than DRSP   Much lower than DRSP   Somewhat higher
  Flexibility    Lower than DRSP        Much lower than DRSP   None, or very little
  Reliability    Much lower than DRSP   Lower than DRSP        Lower than DRSP

  23. Thank you
