Seven Minute Madness: Special-Purpose Parallel Architectures

Special-purpose architectures implemented on FPGAs can exploit more parallelism than general-purpose microprocessors. The example architecture is a fully parallel matrix multiplier. The limiting factors, I/O capacity and logic resources, are addressed with a system-level architecture that lets multiple FPGAs act as one, with scalability and high interconnect capacity as goals. The talk closes with recent publications and target applications in bioinformatics.

Presentation Transcript


  1. Seven Minute Madness: Special-Purpose Parallel Architectures
     Dr. Jason D. Bakos

  2. Field Programmable Gate Arrays
     • Implement special-purpose architectures for specific computations
     • Exploit more parallelism than general-purpose microprocessors
     • Software => hardware

  3. Example Special-Purpose Architecture
     • Example: matrix multiplication, (a x b) x (b x c)
       • a x c independent dot product computations (in parallel)
       • Each dot product:
         • b independent multiplications (in parallel)
         • log2(b) dependent levels of addition
     • An ideal special-purpose parallel architecture:
       • a x b x c multipliers and a x (b-1) x c adders
       • Would require only 1 + log2(b) "time units" (clock ticks) of latency
       • May be pipelined
       • Would seem "instantaneous" compared to a microprocessor
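The resource and latency figures on this slide can be checked with a short software model. The sketch below is illustrative only and is not the speaker's hardware design: the function names `ideal_matmul_resources` and `tree_dot_product` are hypothetical, and it assumes one clock tick per multiply level and one per adder-tree level, rounding log2(b) up when b is not a power of two.

```python
def ideal_matmul_resources(a: int, b: int, c: int) -> dict:
    """Estimate resources for a fully parallel (a x b) x (b x c) multiply.

    Assumption: every multiply and every add gets its own hardware unit,
    as in the "ideal" architecture described on the slide.
    """
    if min(a, b, c) < 1:
        raise ValueError("matrix dimensions must be positive")
    # (b - 1).bit_length() equals ceil(log2(b)) for b >= 1, without floats.
    adder_levels = (b - 1).bit_length()
    return {
        "dot_products": a * c,
        "multipliers": a * b * c,
        "adders": a * (b - 1) * c,
        "latency_ticks": 1 + adder_levels,
    }


def tree_dot_product(x, y):
    """Software model of one dot product: a multiply level, then an adder tree.

    Returns (value, ticks, adds) so the counts can be compared with the
    closed-form estimate above.
    """
    if len(x) != len(y) or not x:
        raise ValueError("vectors must be non-empty and of equal length")
    level = [xi * yi for xi, yi in zip(x, y)]  # b independent multiplications
    ticks, adds = 1, 0
    while len(level) > 1:
        # Pair up neighbours; each pair is one adder working in parallel.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        adds += len(nxt)
        if len(level) % 2:  # an odd element passes through to the next level
            nxt.append(level[-1])
        level = nxt
        ticks += 1
    return level[0], ticks, adds


if __name__ == "__main__":
    print(ideal_matmul_resources(4, 8, 4))
    print(tree_dot_product([1, 2, 3, 4, 5], [1, 1, 1, 1, 1]))
```

For a dot product of length b, the model uses b - 1 additions spread over ceil(log2(b)) levels, which matches the adder count and the latency term quoted on the slide.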

  4. Limiting Factors
     1. I/O capacity:
        • Receive operands from, and transmit results to, off-chip
        • Addressed by FPGA manufacturers: multi-gigabit serial transceivers
     2. Logic resources:
        • What happens if we need more (have more) FPGA logic?
        • Our work:
          • System-level architecture for multiple FPGAs to act as one
          • Goals:
            • General-purpose
            • Scalable
            • Extremely high internal interconnect capacity (share resources transparently)

  5. Scalable Reconfigurable Distributed Fabric
     [Figure: an array of 16 FPGAs. Within each FPGA's logic fabric, an application logic core connects to a lightweight router with north, east, south, and west links.]

  6. Scalable Reconfigurable Distributed Fabric
     • General-purpose point-to-point network (and network interfaces)
     • Linearly scalable topology (channels, switching logic)
     • High-capacity interconnect
       • Number of multi-hop paths increases exponentially
       • Requires lightweight routing and load balancing
     • Best for applications with extremely high parallelism
     • Recent publications:
       • IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), Apr. 2006
       • Field Programmable Logic and its Applications (FPL), Aug. 2006
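The contrast between a linearly scaling topology and a rapidly growing number of multi-hop paths can be illustrated with a small calculation. The sketch below is not taken from the cited papers: it assumes an n x n 2D mesh with no wraparound links (consistent with the north/east/south/west router ports in the figure) and counts only minimal-length paths; the function names `mesh_links` and `minimal_paths` are hypothetical.

```python
from math import comb


def mesh_links(n: int) -> int:
    """Bidirectional links in an n x n mesh without wraparound.

    Each of the n rows has n - 1 horizontal links, and likewise for the
    columns, so the total grows roughly as two links per FPGA.
    """
    if n < 1:
        raise ValueError("mesh dimension must be positive")
    return 2 * n * (n - 1)


def minimal_paths(dx: int, dy: int) -> int:
    """Count minimal-length paths between routers offset by (dx, dy) hops.

    A minimal path is any ordering of |dx| horizontal and |dy| vertical
    hops, so the count is the binomial coefficient C(|dx| + |dy|, |dx|).
    """
    dx, dy = abs(dx), abs(dy)
    return comb(dx + dy, dx)


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        corner_to_corner = minimal_paths(n - 1, n - 1)
        print(f"{n}x{n} mesh: {n * n} FPGAs, {mesh_links(n)} links, "
              f"{corner_to_corner} minimal corner-to-corner paths")
```

Under these assumptions the link count per FPGA stays below two, while the number of minimal corner-to-corner paths grows from 2 in a 2 x 2 mesh to 3432 in an 8 x 8 mesh, which is why spreading traffic across them calls for routing and load balancing that stay cheap per router.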

  7. Application Work
     • Demonstrate this architecture by accelerating applications in bioinformatics
     • Target applications:
       • Complex but highly parallel
       • Normally need an HPC cluster
     • Collaboration with Dr. Tang: phylogenetic reconstruction
     • For these, primitive operations exhibit a high degree of control dependency
       • Not traditional FPGA applications
     • Use the FPGA array to implement enough primitive operations in parallel to gain a massive speedup over a single processor

  8. System Architecture
     • Goal:
       • Associate a multi-FPGA accelerator with each node in a small cluster
       • Achieve high efficiency
     [Figure: a switch connects the processing nodes; each processing node is paired with a phylogeny-accelerator built from an array of FPGAs.]
