
Optimization of Parallel Task Execution on Adaptive Reconfigurable Group Organized Computing System

This presentation explores the application of parallel computing systems for data-flow tasks and the optimization of task execution using a reconfigurable group processor architecture. Topics include digital signal processing, high-performance control and data acquisition, digital communication and broadcasting, cryptography and data security, and process modeling and simulation.

detwiler

Presentation Transcript


  1. Optimization of Parallel Task Execution on the Adaptive Reconfigurable Group Organized Computing System. Presenter: Lev Kirischian, Department of Electrical and Computer Engineering, Ryerson Polytechnic University, Toronto, Ontario, Canada

  2. Application of parallel computing systems for data-flow tasks • Digital signal processing (DSP); • High performance control & Data acquisition; • Digital communication and broadcasting; • Cryptography and data security; • Process modeling and simulation.

  3. Presentation of a data-flow task in the form of a data-flow graph. [Figure: data-flow graph from Data In through MO 1 ... MO n to Data Out.] MO 1 to MO n are macro-operators, e.g. digital filtering, FFT, matrix scaling, etc.

  4. Correspondence between task and computing system architecture • If the data-flow task is processed on a conventional SISD architecture, the processing time often cannot satisfy the specification requirements; • If the task is processed on SIMD or MIMD architectures, the cost-effectiveness of these parallel computers strongly depends on the task algorithm or data structure; • One possible way to reach the required cost-performance is to develop a custom computing system whose architecture "covers" the data-flow graph of the task.

  5. Limitations of custom computing systems with fixed architecture • 1. Performance decreases if the task algorithm or data structure changes • 2. No possibility of further modernization • 3. High cost of multi-task or multi-mode custom computing systems.

  6. One possible solution: reconfigurable parallel computing systems • 1. Ability to custom-configure each processing (functional) unit for a specific macro-operator; • 2. Ability to custom-configure the information links between functional units. • These features allow hardware customization for any data-flow graph, and reconfiguration once task processing is completed.

  7. Example of FPGA-based system with architecture configured for the data-flow task

  8. Concept of Group Processor in the reconfigurable computing system • Group Processor (GP): a group of computing resources dedicated to the task and configured to reflect the task requirements.

  9. Group Processor life cycle • 1. The links and functional units of the GP are configured before task processing • 2. The GP performs the task for as long as necessary, without interruption or time sharing with any other task • 3. After task completion, all resources included in the GP can be reconfigured for any other task.

  10. The concept of the Reconfigurable Group Organized computing system. [Figure labels: Data Stream Input / Output data bus; I/O; Functional Unit (FU); Reconfigurable Interface Module (RIM); Virtual Bus; Configuration Bus; Host PC.]

  11. Parallel processing of different tasks on separate Group Processors. [Figure labels: GP 1 for Task 1, GP 2, GP 3; Data in #1, Data in #2; Data out #1, Data out #2, Data out #3; I/O; FU 1, FU 2, FU 3, FU 4; Virtual Bus.]

  12. Concept of adaptation of the Group Processor architecture to the task • Architecture-to-task adaptation for the GP = selection of the resource configuration which: • satisfies all requirements for task processing (e.g. performance, data throughput, reliability, etc.) • requires minimal hardware (i.e. logic gates). [Figure labels: Data in; Memory; Memory; Multiplier; Adder; Filter; time axis with T0, T1, T2.]

  13. Virtual Hardware Objects: the resource base of the reconfigurable computing system • For FPGA-based systems, all architecture components (resources) can be presented as Virtual Hardware Objects (VHOs) described in one of the hardware description languages (for example VHDL or AHDL) • Each resource can be presented in different variants Ri,j, where i indicates the type of resource (adder, multiplier, interface module, etc.) and j indicates the variant of the resource in the architecture (for example: 8-bit adder, 16-bit adder, etc.).

  14. Concept of Architecture Configuration Graph (ACG). [Figure: ACG tree with node levels Multiplier, Adder and Bus, and leaves numbered 1 to 12.]

  15. Architecture Configuration Graph arrangement. Partial arrangement of the architecture graph requires two procedures: 1. Local arrangement and 2. Hierarchical arrangement. Local arrangement orders the variants of each type of system resource. [Figure: Adder variants, 40 ns and 20 ns, ordered by processing time.]

  16. Hierarchical arrangement of system resources. Arrangement criterion: $K(R_i) = \frac{T_{\max}(R_i) - T_{\min}(R_i)}{m_i - 1}$. [Figure: two ACGs over the same resources, Multiplier variants 80 ns, 40 ns, 20 ns and Adder variants 40 ns, 20 ns, each with leaves 1 to 6. Leaf values in the first graph: 120, 80, 60, 100, 60, 40; in the second graph: 120, 100, 80, 60, 60, 40.] Example: $K(\mathrm{Mult}) = \frac{120 - 60}{3 - 1} = 30 > K(\mathrm{Adder}) = \frac{120 - 100}{2 - 1} = 20$.
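The arrangement criterion can be checked numerically. The sketch below is only an illustration and rests on two assumptions that the slide does not state explicitly: the task time is the sum of the chosen multiplier and adder times (which reproduces the leaf values 120 ... 40), and T max and T min for a resource are measured with every other resource held at its slowest variant (which reproduces 120 - 60 and 120 - 100). The names `VARIANT_TIMES_NS`, `arrangement_criterion` and `hierarchy_order` are hypothetical.

```python
# Variant processing times in ns, as read from the slide.
VARIANT_TIMES_NS = {
    "multiplier": [80, 40, 20],
    "adder": [40, 20],
}


def task_time(choice):
    """Assumed task time: the sum of the chosen variant times."""
    return sum(choice.values())


def arrangement_criterion(resource, times=VARIANT_TIMES_NS):
    """K(R_i) = (T_max(R_i) - T_min(R_i)) / (m_i - 1).

    Assumption: all other resources stay at their slowest variant
    while R_i moves from its slowest to its fastest variant.
    """
    variants = times[resource]
    if len(variants) < 2:
        raise ValueError("K is undefined for a resource with one variant")
    baseline = {name: max(v) for name, v in times.items()}
    t_max = task_time({**baseline, resource: max(variants)})
    t_min = task_time({**baseline, resource: min(variants)})
    return (t_max - t_min) / (len(variants) - 1)


def hierarchy_order(times=VARIANT_TIMES_NS):
    """Resources sorted by decreasing K, i.e. from the top ACG level down."""
    return sorted(times, key=lambda r: arrangement_criterion(r, times),
                  reverse=True)


if __name__ == "__main__":
    for name in hierarchy_order():
        print(name, arrangement_criterion(name))
```

Under these assumptions the multiplier gets K = 30 and the adder K = 20, so the multiplier is placed at the higher level of the graph, matching the slide's inequality.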

  17. Selection of Group Processor architecture based on the arranged ACG. Required processing time for the task Y = A*X + B is T < 80 ns. [Figure: arranged ACG with Multiplier variants 80 ns, 40 ns, 20 ns, Adder variants 40 ns, 20 ns, and leaves 1 to 6 with values 120, 100, 80, 60, 60, 40; the required performance is marked on the graph.] GP-architecture = Multiplier (#2) + Adder (#1).
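One way to read this selection step is as a binary search over the arranged leaves. The sketch below is a guess at the mechanism, not the presenter's implementation, and it makes several assumptions: variants are numbered from slowest (#1) to fastest, the task time is the sum of the multiplier and adder times, a slower configuration is taken to need less hardware, and the limit is treated as inclusive because the configuration named on the slide, Multiplier (#2) + Adder (#1), sums to 80 ns under this numbering. All function names are hypothetical.

```python
from itertools import product

# Variant times in ns, numbered from #1 (slowest) to fastest (assumption).
MULTIPLIER_NS = [80, 40, 20]
ADDER_NS = [40, 20]


def arranged_leaves():
    """Leaves of the arranged ACG as ((multiplier #, adder #), total ns)."""
    return [
        ((mi + 1, ai + 1), m + a)
        for (mi, m), (ai, a) in product(enumerate(MULTIPLIER_NS),
                                        enumerate(ADDER_NS))
    ]


def select_architecture(leaves, limit_ns):
    """Return the first (slowest) leaf whose total time is within the limit.

    Binary search is only valid if the leaf totals never increase from
    left to right, so that property is checked first.
    """
    totals = [t for _, t in leaves]
    if any(a < b for a, b in zip(totals, totals[1:])):
        raise ValueError("leaves are not arranged by decreasing time")
    lo, hi = 0, len(leaves)
    while lo < hi:
        mid = (lo + hi) // 2
        if totals[mid] <= limit_ns:
            hi = mid          # this leaf is fast enough; look further left
        else:
            lo = mid + 1      # too slow; look further right
    return leaves[lo] if lo < len(leaves) else None


if __name__ == "__main__":
    print(select_architecture(arranged_leaves(), 80))
```

With a limit of 80 ns the search stops at leaf 3, multiplier variant #2 with adder variant #1. If the limit were applied strictly, the first qualifying leaf would instead be leaf 4 at 60 ns.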

  18. Number of experiments for GP-architecture selection: $N(GP_{opt}) = (n + 1) + \log_2(m_1 \cdot m_2 \cdots m_n)$, where n is the number of resources (VHOs) included in the architecture of the Group Processor and m_i is the number of variants of each type of resource. Example: if n = 16 and m_1 = m_2 = ... = m_n = 32, the total number of experiments (task runs on an estimated GP-architecture) is N(GP_opt) = 16 + 1 + 16 * 5 = 97.
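The formula is easy to evaluate directly. The short sketch below reproduces the slide's example and, for scale, the size of the full configuration space that an exhaustive search would have to visit. The function names are hypothetical, and rounding up with `math.ceil` is an assumption for cases where the logarithm is not an integer (the slide only shows a power-of-two example).

```python
import math


def experiments_needed(variant_counts):
    """N(GP_opt) = (n + 1) + log2(m_1 * m_2 * ... * m_n).

    The log of the product is computed as a sum of logs; the result is
    rounded up, which is an assumption for non-power-of-two counts.
    """
    n = len(variant_counts)
    return math.ceil((n + 1) + sum(math.log2(m) for m in variant_counts))


def exhaustive_configurations(variant_counts):
    """Number of distinct configurations: m_1 * m_2 * ... * m_n."""
    return math.prod(variant_counts)


if __name__ == "__main__":
    counts = [32] * 16  # n = 16 resources, 32 variants each
    print(experiments_needed(counts))         # 17 + 16 * 5
    print(exhaustive_configurations(counts))  # 32 ** 16
```

For the slide's example this gives 97 experiments, against 32^16 (about 1.2 * 10^24) possible configurations.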

  19. Self-adaptation mechanism for FPGA-based reconfigurable data-flow computing systems. [Figure labels: Host PC; Architecture Generator; Architecture Selector; Performance Analyzer; Library of Virtual Hardware Objects; Configuration Bus; Reconfigurable platform; Data Source.]

  20. First prototype of Adaptive Reconfigurable Group Organized (ARGO) computing platform

  21. Data-flow graph for DVB MPEG 2 processing. [Figure labels: Input data stream (MPEG 2); Synchro-Signal Detect; PCR detection; Null-packet analysis & removal; Reference Frequency; Output frequency adjustment; PCR re-stamping; Output MPEG 2 data stream.]

  22. Architecture selection time for the 6-mode DVB MPEG 2 stream processor • 1. Average time for each architecture configuration: 7.18 ms • 2. Average time for GP-architecture selection (for a specific mode): 175.6 ms • 3. Total time for architecture selection for all modes: 1.054 s
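The three figures on this slide are consistent with each other, as the small check below shows. The ratio of selection time to configuration time is included only as a rough indication of how many configurations are tried per mode; it assumes that configuration time dominates selection time, which the slide does not state. Constant and function names are hypothetical.

```python
# Measured values from the slide.
CONFIG_TIME_MS = 7.18       # average time per architecture configuration
SELECTION_TIME_MS = 175.6   # average selection time per mode
MODES = 6


def total_selection_time_s(selection_ms=SELECTION_TIME_MS, modes=MODES):
    """Total selection time for all modes, in seconds."""
    return selection_ms * modes / 1000.0


def configurations_per_selection(selection_ms=SELECTION_TIME_MS,
                                 config_ms=CONFIG_TIME_MS):
    """Rough count of configurations per selection (see assumption above)."""
    return selection_ms / config_ms


if __name__ == "__main__":
    print(round(total_selection_time_s(), 3))
    print(round(configurations_per_selection(), 1))
```

Six selections at 175.6 ms each give 1.0536 s, which rounds to the 1.054 s quoted on the slide.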

  23. Hardware implementation of the DVB MPEG 2 stream processor for modes 1 and 4. [Figure labels, in order: Input data (MPEG 2 stream); FU #1 (8-bit in-port); FU #1: Synchro-Signal Detect, PCR detection, Null-packet analysis & removal; Virtual bus (16 lines); FU #2: Output frequency adjustment, Reference Frequency, PCR re-stamping; FU #2 out-port; Output MPEG 2 data stream.]

  24. Hardware implementation of the DVB MPEG 2 stream processor for modes 2, 3, 5 and 6. [Figure labels, in order: Input data (MPEG 2 stream); FU #1 (8-bit in-port); FU #1: Synchro-Signal Detect, PCR detection, Null-packet analysis & removal; Virtual bus (16 lines); FU #2: Reference Frequency, Output frequency adjustment; FU #3: PCR re-stamping; FU #3 out-port; Output MPEG 2 data stream.]

  25. Summary • 1. The Adaptive Reconfigurable Group Organized (ARGO) parallel computing system is an FPGA-based configurable system able to adapt to the task algorithm / data structure. • 2. The ARGO system allows parallel processing of different data-flow tasks on dynamically configured Group Processors (GPs), where each GP-architecture configuration corresponds to the algorithm / data specifics of the task assigned to that processor. • 3. These principles allow the development of cost-effective parallel computing systems with programmable performance and reliability, at minimum cost in hardware components and development time.
