The Systolic Ring : A Scalable Dynamically Reconfigurable Core for Embedded Systems

Pascal BENOIT, G. SASSATELLI, M. ROBERT, L. TORRES, G. CAMBON, T. GIL

Presentation Transcript


  1. The Systolic Ring: A Scalable Dynamically Reconfigurable Core for Embedded Systems. Pascal BENOIT, G. SASSATELLI, M. ROBERT, L. TORRES, G. CAMBON, T. GIL

  2. Introduction - SoC architectures. REUSE: Intellectual Property (IP) cores (USB 2.0 controller, PCI, PCI-X, MPEG2, arbitration, CPUs, DSP, RAM1 to RAM3, ROM, Flash). RECONFIGURATION: software and hardware (static / dynamic), with reconfigurable blocks and a reconfigurable interconnect (Network on Chip...). TEST: Built-In Self-Test (BIST).

  3. Introduction. Multimedia: data-flow-oriented applications. RECONFIGURATION: hardware (static / dynamic). SoC blocks shown: USB 2.0 controller, PCI, PCI-X, MPEG2, arbitration, CPUs, DSP, RAM1 to RAM3, reconfigurable blocks, ROM, Flash, TEST (BIST).

  4. Introduction: what kind of base block is suitable for multimedia? Fine grain: bit-level granularity; suited to prototyping and encryption; high reconfiguration overhead; low operating frequencies. Coarse grain: word-level granularity (ALU, MULT, muxes, registers); suited to DSP and data-flow-oriented processing; low reconfiguration overhead; high performance.

  5. Introduction: what kind of base block is suitable for multimedia? Coarse grain: word-level granularity (ALU, MULT, muxes, registers); suited to DSP and data-flow-oriented processing; low reconfiguration overhead; high performance. SYSTOLIC RING: a coarse-grain dynamically reconfigurable architecture.

  6. Outline • The Systolic Ring • Building Block • Operative Layer Topology • System Overview • Features • Application Example • 8*8 2D DCT • Structural Mapping • Performance Comparisons • Conclusion

  7. The Systolic Ring - System Overview. Two-layer reconfigurable architecture (coarse-grain, dynamically reconfigurable). Diagram blocks: operative layer (Dnodes), RAM, configuration layer, configware, configuration sequencer.

  8. The Systolic Ring - Building block: the DNODE (Data Node). Data-processing-oriented block: ALU + multiplier (→ MAC). Programmable component with a local sequencer. Dynamic and autonomous configuration management: one instruction per cycle.

  9. The Systolic Ring - Dnode Clusters. Layer organisation of Dnodes. Macro node (switch, Dnode, Dnode). Direct data injection (I/O).

  10. The Systolic Ring - Switch components. A switch provides full connectivity between two layers of Dnodes (layer n-1 and layer n).

  11. The Systolic Ring - Operative Layer Topology. Ring structure of switches and Dnode layers with I/O ports. Customisable...

  12. The Systolic Ring - Operative Layer Topology: data flows. Forward data flow: unidirectional data transit between successive layers (circular pipeline). Reverse data flow: feedback pipeline network for recursive algorithms.
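
The forward data flow described in slide 12 can be illustrated with a toy model. This is only a sketch under assumptions: the function `step_ring`, the four-layer size, and the per-layer operations are hypothetical, and the reverse (feedback) network is not modelled.

```python
from typing import Callable, List


def step_ring(values: List[int], ops: List[Callable[[int], int]]) -> List[int]:
    """Advance a circular pipeline by one clock cycle.

    values[i] is the value currently held at the output of layer i.
    ops[i] is the operation layer i applies to what it receives from
    layer i - 1. The last layer feeds the first, which closes the ring.
    """
    n = len(values)
    return [ops[i](values[(i - 1) % n]) for i in range(n)]


# Hypothetical 4-layer ring: layer k adds the constant k + 1.
ops = [lambda v, k=k: v + k for k in (1, 2, 3, 4)]

state = [0, 0, 0, 0]
for _ in range(4):
    state = step_ring(state, ops)
# After 4 cycles every value has passed through all four layers once.
```

Each value moves in one direction only and returns to its starting layer after as many cycles as there are layers, which is the circular-pipeline behaviour the slide names.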

  16. The Systolic Ring - System Overview. Not a stand-alone solution: a coprocessor for data-flow-oriented applications, working alongside a general-purpose processor. Diagram blocks: operative layer, data RAM, configuration layer, configware, configuration sequencer, management code.

  17. The Systolic Ring - Features • RING-8 (8 Dnodes) • 0.18 µm technology • 3.3 mm² • 200 MHz • 1600 MIPS • 1600 MMAC/s • Shrinking process geometry → more Dnodes. Operative layer layout: Switch 1 to Switch 4; Dnodes N1,1 N1,2 N2,1 N2,2 N3,1 N3,2 N4,1 N4,2; BN1 to BN4.
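
The RING-8 throughput figures are consistent with 8 Dnodes each issuing one instruction (or one MAC) per cycle at 200 MHz. A minimal sketch of that arithmetic follows; the function name `peak_rate_millions` is hypothetical, and the one-operation-per-cycle default is taken from the Dnode slide.

```python
def peak_rate_millions(n_dnodes: int, clock_mhz: float, ops_per_cycle: int = 1) -> float:
    """Peak rate in millions of operations per second.

    Assumes every Dnode issues ops_per_cycle operations on every clock
    cycle, which is an upper bound rather than a measured figure.
    """
    return n_dnodes * clock_mhz * ops_per_cycle


ring8 = peak_rate_millions(8, 200)  # 8 Dnodes at 200 MHz
```

Under the same assumption the peak rate scales linearly with the number of Dnodes.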

  18. Application example - 8*8 2D DCT. Even-odd frequency decomposition.

  19. Application example - 8*8 2D DCT.

  20. Application example - 8*8 2D DCT. Cycle 0: inputs x0, x7.

  21. Application example - 8*8 2D DCT. Inputs x1, x6; x0+x7 and x0-x7 computed.

  22. Application example - 8*8 2D DCT. Cycle 1: inputs x1, x6; x0+x7 and x0-x7 feed the two MACs (coefficient label: 1/8).

  23. Application example - 8*8 2D DCT. Cycle 2: inputs x2, x5; x1+x6 and x1-x6 feed the two MACs (coefficient label: 1/8).

  24. Application example - 8*8 2D DCT. Cycle 3: inputs x3, x4; x2+x5 and x2-x5 feed the two MACs (coefficient label: 1/8).

  25. Application example - 8*8 2D DCT. Cycle 4: x3+x4 and x3-x4 feed the two MACs (coefficient label: 1/8).

  26. Application example - 8*8 2D DCT. Cycle 5: clear on both MACs; outputs z0 and z1. Two transformed samples are computed every 5 clock cycles; a complete 8*8 2D transform takes 320 clock cycles.
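
The cycle-by-cycle mapping above pairs each input x_k with x_(7-k), forms their sum and difference, and accumulates the products in two MACs. The sketch below shows the underlying even-odd decomposition of an 8-point DCT and checks it against the direct formula. It is an illustration under assumptions: the function names are hypothetical, the transform is left unscaled because the coefficient scaling is not fully legible in the transcript, and `cycles_for_2d_block` is one reading that reproduces the 320-cycle figure (16 one-dimensional transforms of 8 outputs each, two outputs per 5 cycles).

```python
import math


def dct8_direct(x):
    """Unscaled 8-point DCT-II computed directly from the definition."""
    return [
        sum(x[k] * math.cos((2 * k + 1) * u * math.pi / 16) for k in range(8))
        for u in range(8)
    ]


def dct8_even_odd(x):
    """Same transform using the even-odd decomposition.

    Even-indexed outputs depend only on the sums x[k] + x[7-k],
    odd-indexed outputs only on the differences x[k] - x[7-k],
    so each output needs 4 multiply-accumulates instead of 8.
    """
    sums = [x[k] + x[7 - k] for k in range(4)]
    diffs = [x[k] - x[7 - k] for k in range(4)]
    out = []
    for u in range(8):
        src = sums if u % 2 == 0 else diffs
        acc = 0.0  # plays the role of a MAC accumulator
        for k in range(4):
            acc += src[k] * math.cos((2 * k + 1) * u * math.pi / 16)
        out.append(acc)
    return out


def cycles_for_2d_block(cycles_per_pair: int = 5) -> int:
    """Cycle count for one 8*8 2D DCT, assuming 8 row plus 8 column
    1D transforms and one pair of outputs per cycles_per_pair cycles."""
    outputs = (8 + 8) * 8
    return (outputs // 2) * cycles_per_pair
```

For example, `dct8_even_odd([1, 2, 3, 4, 5, 6, 7, 8])` agrees with `dct8_direct` on the same input to within floating-point rounding.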

  27. Application example - DCT 2D 8*8, 64*64 image example. One 8*8 2D DCT is performed by 4 Dnodes. RING-N implementation (N Dnodes).

  28. Application example - DCT 2D 8*8, 64*64 image example - Comparisons (processing time only). Pentium IV (Intel, superscalar): 21248 cycles, 1200 MHz, 17.7 µs. TMS320C62 (TI, VLIW): 10240 cycles, 300 MHz, 34.1 µs. DCT Core (Xilinx, fine-grain reconfigurable): 4171 cycles, 80 MHz (device dependent), 52.1 µs. RING-16 (coarse-grain reconfigurable): 5120 cycles, 200 MHz, 12.8 µs. RING-64 (coarse-grain reconfigurable): 1280 cycles, 200 MHz, 6.4 µs. Comments listed: SSE2; Matrix; Even-Odd decomposition.
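
The RING-N cycle counts can be reproduced from the earlier figures: 320 cycles per 8*8 block, 4 Dnodes per block, and 64 blocks in a 64*64 image. The sketch below assumes the groups of 4 Dnodes work on different blocks fully in parallel; the function names are hypothetical. Processing time is taken as cycles divided by clock frequency.

```python
CYCLES_PER_BLOCK = 320   # one 8*8 2D DCT, from the mapping slides
DNODES_PER_BLOCK = 4     # Dnodes used by one 8*8 2D DCT


def ring_cycles(image_side: int, n_dnodes: int) -> int:
    """Cycles to transform a square image on a RING-N.

    Assumes each group of 4 Dnodes handles one block at a time and
    that all groups run in parallel with no I/O stalls.
    """
    blocks = (image_side // 8) ** 2
    groups = n_dnodes // DNODES_PER_BLOCK
    rounds = -(-blocks // groups)  # ceiling division
    return rounds * CYCLES_PER_BLOCK


def time_us(cycles: int, f_mhz: float) -> float:
    """Processing time in microseconds for a cycle count at f_mhz."""
    return cycles / f_mhz


ring16 = ring_cycles(64, 16)  # 5120 cycles
ring64 = ring_cycles(64, 64)  # 1280 cycles
```

With these helpers, `time_us(1280, 200)` gives 6.4 µs and `time_us(21248, 1200)` rounds to 17.7 µs, matching the slide. `time_us(5120, 200)` gives 25.6 µs, whereas the transcript lists 12.8 µs; the column alignment of the original table is lost, so how that figure pairs with RING-16 cannot be confirmed from the transcript alone.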

  29. Application example - DCT 2D 8*8, 64*64 image example - Comparisons. [Chart: number of cycles and processing time (µs) for Pentium IV, DCT Core, C62, RING 16 and RING 64.]

  30. Conclusion • DESIGN • Reconfigurable IP Core for SoC • Assembling Software • RING-8 prototype • FEATURES • Customisable IP Core • Good performance / area trade-off: RING-8 @ 200 MHz (0.18 µm) • 3.3 mm² • 1600 MIPS • Results for DCT, Wavelet Transform, Motion Estimation

  31. Thank You
