
Design of Adaptive On-Chip Multiprocessor Systems

Design of Adaptive On-Chip Multiprocessor Systems. Prof. Dr. Christophe Bobda, Self-Organizing Embedded Systems Group, Department of Computer Science, Kaiserslautern University of Technology. Agenda: SOES working group research; reconfigurable computing.


Presentation Transcript


  1. Design of Adaptive On-Chip Multiprocessor Systems. Prof. Dr. Christophe Bobda, Self-Organizing Embedded Systems Group, Department of Computer Science, Kaiserslautern University of Technology

  2. Agenda • SOES working group: research • Reconfigurable computing • Self-organization in embedded systems (ES) • Approach to investigating self-organization in ES • Test environment • Challenges and implications for computing systems • Adaptive Multiprocessor on Chip (AMoC) • Motivation • Design • First results

  3. Research - Reconfigurable Computing [Figure: arriving tasks t1 to t4 and reconfigurable hardware holding task modules c1 to c5]
     Design methodology
     • Ali Ahmadinia, Christophe Bobda, Sándor P. Fekete, Jürgen Teich, Jan C. van der Veen: Optimal Free-Space Management and Routing-Conscious Dynamic Placement for Reconfigurable Computers. In IEEE Transactions on Computers, to appear
     • Christophe Bobda: CoreMap: A Rapid Prototyping Environment for Distributed Reconfigurable Systems. In International Journal of Embedded Systems, Special Issue on Hardware-Software Codesign for Systems-on-Chip, Issue 1/2, 2005
     • A. Ahmadinia, C. Bobda, and J. Teich: On-line Placement for Dynamic Reconfigurable Devices. In International Journal of Embedded Systems, Inderscience, Issue 3/4, 2005
     Communication
     • Christophe Bobda, Ali Ahmadinia: Dynamic Interconnection of Reconfigurable Modules on Reconfigurable Devices. In IEEE Design & Test of Computers, Special Issue on Networks on Chip, Sep/Oct 2005
     • C. Bobda, Ali Ahmadinia, Rajesh Kurapati: DyNoC: A Communication Infrastructure for Dynamic Communication on Reconfigurable Devices. In Proceedings of the International Conference on Field-Programmable Logic and Applications (FPL), Tampere, Finland, August 29 - September 1, 2005
     • C. Bobda, Mateusz Majer, Dirk Koch, Ali Ahmadinia, and Jürgen Teich: Packets Routing in Dynamically Changing Networks. In Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), Denver, Colorado, USA, April 4-5, 2005
     Applications
     • Christophe Bobda, Ali Ahmadinia, Mateusz Majer, Ding Ji, Jürgen Teich: Modular Video Streaming on a Reconfigurable Platform. In 13th IFIP International Conference VLSI-SoC 2005
     • Klaus Danne, Christophe Bobda: Dynamic Reconfiguration of Distributed Arithmetic Designs. In International Journal of Embedded Systems, Inderscience, Issue 5/6, 2005
     • C. Bobda, K. Danne, and A. Linarth: Efficient Implementation of the Singular Value Decomposition on a Reconfigurable System. In Proceedings of the International Conference on Field-Programmable Logic and Applications (FPL 2003), pp. 1123-1126, Lisbon, Portugal, September 2003

  4. Research - Reconfigurable Computing: Platform development
     • M. Majer, J. Teich, A. Ahmadinia, and C. Bobda: The Erlangen Slot Machine: A Dynamically Reconfigurable FPGA-Based Computer. Journal of VLSI Signal Processing Systems, Springer, 2006
     • Christophe Bobda, Mateusz Majer, Ali Ahmadinia, Thomas Haller, André Linarth, Jürgen Teich: The Erlangen Slot Machine: Increasing Flexibility in FPGA-Based Reconfigurable Platforms. IEEE 2005 Conference on Field-Programmable Technology (FPT'05), Singapore, December 11-14, 2005
     • Christophe Bobda, Mateusz Majer, Thomas Haller, André Linarth: Increasing the Flexibility of FPGA-based Reconfigurable Systems. In 2005 IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), April 17-20, 2005, Napa, California

  5. Research - Self-organization in ES: investigating and applying self-* properties in embedded systems • Self-configuring • Self-healing • Self-optimizing • Self-protecting [Figure: network of nodes labelled P1 to P6, P21 to P23 and R1 to R6] Adaptive compute nodes to support self-* mechanisms in ES • High flexibility • High performance • Lower energy consumption

  6. SO in ES - Experimental approach • Cooperative surveillance systems • Distributed cameras • low-cost • intelligent • adaptive • Each camera covers one area • Communication • Data exchange • Knowledge-based • Information in boundary regions • Wireless • Filtering out irrelevant data • Increases acceptance. Problem: insufficient performance on low-cost chips. Solution: adaptive multiprocessors to increase performance

  7. Motivation – Limits of the von Neumann computer • Instruction-level parallelism (ILP) • Limited parallelism between instructions • Circuit complexity is becoming ever more costly • Power • Exponential growth • Very high transistor density • High switching frequencies • Cooling does not scale exponentially • 1980: no heat sink • 1990: moderate heat sinks • Today: massive heat sinks • Next generation: ?

  8. Motivation – HPC – The renaissance of FPGAs • FPGA (Field Programmable Gate Array) • Offers the flexibility of processors • combined with the performance of ASICs • Improvements since 1991: • 200x more capacity • 40x faster • 500x cheaper • 50x more power-efficient • Growing interest: e.g. Cray XD1 • Goal: use FPGAs as an Adaptive Multiprocessor on Chip (AMoC) • Exploit the existing processor cores • Integrate flexible hardware accelerators • No tools currently available

  9. AMoC - Architecture [Figure: processor blocks, each with processors (Proc), hardware units (HW) and local memory (Mem), connected by an interconnection network; four memory managers (Mem 1 manager to Mem 4 manager) attached to external memories ExtMem 1 to ExtMem 4, and four IO arbiters (IO Arbiter 1 to IO Arbiter 4) attached to IO ports IO 1 to IO 4]
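The slide does not say how an IO arbiter decides which processor block may use its IO port in a given cycle. A minimal sketch, assuming a round-robin policy (a common way to give every requester a fair share; the actual AMoC policy is not stated), could look as follows. `RoundRobinArbiter` and `grant` are hypothetical names.

```python
from typing import Optional, Sequence


class RoundRobinArbiter:
    """Hypothetical round-robin arbiter: grants at most one requester per cycle,
    searching in circular order starting just after the last granted index."""

    def __init__(self, num_requesters: int) -> None:
        self.n = num_requesters
        self.last = self.n - 1  # so that requester 0 has priority in the first cycle

    def grant(self, requests: Sequence[bool]) -> Optional[int]:
        # Scan all requesters once, beginning after the most recent grant.
        for offset in range(1, self.n + 1):
            idx = (self.last + offset) % self.n
            if requests[idx]:
                self.last = idx
                return idx
        return None  # nobody requested the port this cycle
```

With all four requesters active, successive calls to `grant` return 0, 1, 2, 3 and then 0 again, so no requester is starved; an idle requester is simply skipped.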

  10. AMoC – On-chip interconnect – bus-based

  11. AMoC – On-chip interconnect – DyNoC-based [Figure labels: Proc, HW, Mem, Mem, ExtMem 3]

  12. 3x3 DyNoC FPGA implementation
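Packets in a mesh network-on-chip such as the 3x3 DyNoC prototype need a routing rule. The sketch below shows plain dimension-ordered (XY) routing on a 3x3 grid of routers, purely for illustration: it is not claimed to be DyNoC's algorithm, and a dynamic NoC additionally has to steer packets around the area occupied by placed modules, which this sketch does not model. `xy_route` is a hypothetical helper.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (x, y) position of a router in the mesh


def xy_route(src: Coord, dst: Coord, size: int = 3) -> List[Coord]:
    """Dimension-ordered (XY) routing on a size x size mesh: move along x until
    the column matches, then along y. Returns every router visited, src and dst included."""
    for x, y in (src, dst):
        if not (0 <= x < size and 0 <= y < size):
            raise ValueError(f"router {(x, y)} is outside the {size}x{size} mesh")
    x, y = src
    path = [(x, y)]
    while x != dst[0]:  # first resolve the x offset
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:  # then resolve the y offset
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path
```

For example, `xy_route((0, 0), (2, 1))` visits (0, 0), (1, 0), (2, 0) and (2, 1). Because every packet finishes its x movement before any y movement, XY routing is deadlock-free on a full mesh, which is why it is a common baseline.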

  13. NoC – More efficient implementation – ClusteRing

  14. ClusteRing – Transceiver & Router

  15. ClusteRing – Data transfer protocol [Figure: message sequence between Client 0, Client 1, Client 2, ..., Client n; the messages carry the number of bytes (# of bytes), a status code, and the received data]
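The protocol slide shows clients exchanging a byte count and a status code together with the received data. As a sketch only, a frame carrying those fields could be packed and unpacked as below. The field order, the field widths, the destination byte and the unidirectional ring direction are all assumptions for illustration, and `encode_frame`, `decode_frame` and `ring_hops` are hypothetical helpers, not part of ClusteRing.

```python
import struct
from typing import Tuple

# Hypothetical header layout (NOT taken from the slides):
#   destination client id : 1 byte
#   status code           : 1 byte
#   number of data bytes  : 2 bytes, big-endian
HEADER = struct.Struct(">BBH")


def encode_frame(dest: int, status: int, payload: bytes) -> bytes:
    """Prefix the payload with a header holding its length and a status code."""
    return HEADER.pack(dest, status, len(payload)) + payload


def decode_frame(frame: bytes) -> Tuple[int, int, bytes]:
    """Split a frame into (destination, status code, payload)."""
    dest, status, length = HEADER.unpack_from(frame)
    payload = frame[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated frame: fewer data bytes than announced")
    return dest, status, payload


def ring_hops(src: int, dst: int, num_clients: int) -> int:
    """Hops on a unidirectional ring where data flows from client i to client i+1."""
    return (dst - src) % num_clients
```

Sending the byte count ahead of the data lets the receiver know how much to read before the status code and payload arrive, and lets it detect a truncated transfer.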

  16. Mapping [Figure: application, infrastructure and hardware, connected by mapping step 1 and step 2]

  17. Mapping – Step 1

  18. Mapping – Step 2

  19. Case study • SVD: close-to-hardware implementation • 8x8 matrix • 1 processor: 149 µs • 2 processors: 151 µs • 4 processors: 160 µs • 200x32 matrix • 1 processor: 59707 µs • 2 processors: 36534 µs • 31839 µs computation (88 %) • 4694 µs communication (12 %) • 4 processors: 18150 µs • 12960 µs computation (71 %) • 5190 µs communication (29 %)
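The measured times can be summarized with the usual speedup S(p) = T(1) / T(p) and parallel efficiency E(p) = S(p) / p. The short script below only reuses the numbers from the case-study slide; the metric definitions are the standard ones and are not stated on the slide.

```python
from typing import Dict

# Measured SVD run times in microseconds, copied from the case-study slide.
T_SMALL: Dict[int, float] = {1: 149, 2: 151, 4: 160}        # 8x8 matrix
T_LARGE: Dict[int, float] = {1: 59707, 2: 36534, 4: 18150}  # 200x32 matrix


def speedup(times: Dict[int, float], p: int) -> float:
    """Speedup S(p) = T(1) / T(p) relative to the single-processor run."""
    return times[1] / times[p]


def efficiency(times: Dict[int, float], p: int) -> float:
    """Parallel efficiency E(p) = S(p) / p."""
    return speedup(times, p) / p


for name, times in (("8x8", T_SMALL), ("200x32", T_LARGE)):
    for p in (2, 4):
        print(f"{name}, {p} processors: S = {speedup(times, p):.2f}, "
              f"E = {efficiency(times, p):.0%}")
```

For the 200x32 matrix this gives S(2) ≈ 1.63 and S(4) ≈ 3.29, an efficiency of roughly 82 % in both cases, whereas the 8x8 matrix becomes slower as processors are added (S < 1), because communication outweighs the small amount of computation.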

  20. Case study

  21. Thank you for your attention

  22. The Singular Value Decomposition (SVD) [Figure: processors P1, P2, ..., Pn]
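As background, the SVD of a matrix A ∈ ℝ^{m×n} (m ≥ n) and the textbook one-sided (Hestenes) Jacobi iteration, whose column-pair rotations match the post-multiplication of A(k) by Q(k) mentioned in the parallel-implementation slide, can be written as follows. This is the standard formulation, given as a hedged reference rather than the author's exact derivation:

$$
A = U\,\Sigma\,V^{T}, \qquad U^{T}U = I_n,\quad V^{T}V = I_n,\quad \Sigma = \operatorname{diag}(\sigma_1,\dots,\sigma_n),\ \sigma_1 \ge \dots \ge \sigma_n \ge 0
$$

$$
A^{(0)} = A, \qquad A^{(k+1)} = A^{(k)}\,Q^{(k)}, \qquad
\begin{pmatrix} Q^{(k)}_{ii} & Q^{(k)}_{ij} \\ Q^{(k)}_{ji} & Q^{(k)}_{jj} \end{pmatrix} = \begin{pmatrix} c & s \\ -s & c \end{pmatrix}
$$

Here Q^{(k)} equals the identity outside rows and columns i and j, so only columns a_i and a_j change:

$$
a_i' = c\,a_i - s\,a_j, \qquad a_j' = s\,a_i + c\,a_j .
$$

With α = a_iᵀa_i, β = a_jᵀa_j and γ = a_iᵀa_j, the condition a_i'ᵀa_j' = cs(α − β) + (c² − s²)γ = 0 is satisfied by

$$
\zeta = \frac{\beta - \alpha}{2\gamma}, \qquad t = \frac{\operatorname{sign}(\zeta)}{|\zeta| + \sqrt{1+\zeta^{2}}}, \qquad c = \frac{1}{\sqrt{1+t^{2}}}, \qquad s = c\,t,
$$

taking sign(0) = 1. Once all columns are mutually orthogonal after K rotations,

$$
\sigma_l = \lVert a_l \rVert_2, \qquad u_l = a_l / \sigma_l \ (\sigma_l \neq 0), \qquad V = Q^{(0)}Q^{(1)}\cdots Q^{(K-1)},
$$

so that A V = U Σ, with the σ_l sorted in decreasing order.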

  23. Computation of the SVD
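A minimal sequential sketch of the one-sided Jacobi method (each step post-multiplies the matrix by a plane rotation that changes only two columns) computes the singular values of a small matrix. It uses a cyclic pair ordering and plain Python lists for readability; the function name, tolerance and sweep limit are illustrative choices, not taken from the slides, and a hardware implementation would be organized very differently.

```python
import math
from typing import List

Matrix = List[List[float]]


def one_sided_jacobi_singular_values(a: Matrix, tol: float = 1e-12,
                                     max_sweeps: int = 30) -> List[float]:
    """Singular values of an m x n matrix (m >= n) via cyclic one-sided Jacobi.

    Column pairs are rotated until every pair is numerically orthogonal;
    the singular values are then the Euclidean norms of the columns."""
    m, n = len(a), len(a[0])
    cols = [[a[r][c] for r in range(m)] for c in range(n)]  # work column-wise

    def dot(u: List[float], v: List[float]) -> float:
        return sum(x * y for x, y in zip(u, v))

    for _ in range(max_sweeps):
        rotated = False
        for i in range(n - 1):
            for j in range(i + 1, n):
                alpha = dot(cols[i], cols[i])
                beta = dot(cols[j], cols[j])
                gamma = dot(cols[i], cols[j])
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue  # columns i and j are already orthogonal
                rotated = True
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                # The rotation touches only columns i and j (uses the old values of both).
                ci, cj = cols[i], cols[j]
                cols[i] = [c * x - s * y for x, y in zip(ci, cj)]
                cols[j] = [s * x + c * y for x, y in zip(ci, cj)]
        if not rotated:
            break  # a full sweep without rotations: converged
    return sorted((math.sqrt(dot(v, v)) for v in cols), reverse=True)
```

For example, the matrix [[3, 0], [4, 5]] yields √45 ≈ 6.708 and √5 ≈ 2.236 after a single rotation.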

  24. Mapping of virtual processors to physical ones. Parallel implementation • Because post-multiplying A(k) by Q(k) affects only columns i and j, rotations on disjoint column pairs are independent and can be performed in parallel. • Pairwise column orthogonalization (Brent & Luk)
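Because rotations on disjoint column pairs are independent, a sweep can be split into rounds in which every column appears in exactly one pair. The sketch below uses the classic round-robin tournament (circle method) ordering as a stand-in: like Brent and Luk's ordering it visits every pair once per sweep, but the exact column movement of their systolic array may differ. Each pair in a round plays the role of a virtual processor, and a simple block mapping assigns the n/2 virtual processors to the available physical processors. Both function names are hypothetical.

```python
from typing import List, Tuple

Pair = Tuple[int, int]


def round_robin_rounds(n: int) -> List[List[Pair]]:
    """Split all n*(n-1)/2 column pairs into n-1 rounds of n/2 disjoint pairs
    (circle method). Pairs within one round share no column, so their
    rotations can run in parallel."""
    if n % 2:
        raise ValueError("n must be even (pad with a zero column otherwise)")
    cols = list(range(n))
    rounds: List[List[Pair]] = []
    for _ in range(n - 1):
        pairs = []
        for k in range(n // 2):
            a, b = cols[k], cols[n - 1 - k]
            pairs.append((min(a, b), max(a, b)))
        rounds.append(pairs)
        cols = [cols[0], cols[-1]] + cols[1:-1]  # keep column 0 fixed, rotate the rest
    return rounds


def map_virtual_to_physical(num_virtual: int, num_physical: int) -> List[List[int]]:
    """Block mapping: virtual processor v goes to physical processor
    v * num_physical // num_virtual, giving contiguous, balanced blocks."""
    mapping: List[List[int]] = [[] for _ in range(num_physical)]
    for v in range(num_virtual):
        mapping[v * num_physical // num_virtual].append(v)
    return mapping
```

For n = 4 columns the rounds are [(0, 3), (1, 2)], [(0, 2), (1, 3)] and [(0, 1), (2, 3)]; with 2 physical processors, `map_virtual_to_physical(2, 2)` gives each processor one of the two pairs per round.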

  25. Parallel Implementation • Block Orthogonalization:

  26. Case study • Target platform: • Board: Xilinx ML310 • FPGA: XC2VP30 • Chip resources: • 6042 slices (44 %) • Clock frequencies: • PPC: 300 MHz • OPB: 100 MHz • NoC: 100 MHz • Measured values: • send-recv delay: 10.8 µs • Throughput: 19.1 MB/s
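The two measured values suggest a first-order cost model for a message on the NoC: a fixed start-up delay plus a size-dependent term. The linear model, the constant names and the choice of 1 MB = 10^6 bytes are assumptions for illustration; the slide reports only the two measurements.

```python
LATENCY_US = 10.8       # measured send-recv delay from the slide, in microseconds
THROUGHPUT_MB_S = 19.1  # measured throughput from the slide; assumes 1 MB = 10**6 bytes


def transfer_time_us(num_bytes: int) -> float:
    """Estimated time to move num_bytes between two processors, using a simple
    linear model: fixed latency + size / throughput.
    With MB = 10**6 bytes, 1 MB/s is exactly 1 byte per microsecond."""
    return LATENCY_US + num_bytes / THROUGHPUT_MB_S
```

As a hypothetical example, a full 200x32 matrix of 4-byte elements is 25 600 bytes and would take about 1351 µs under this model, so for large messages the throughput term dominates the 10.8 µs start-up cost.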
