This presentation discusses a hybrid computing architecture that places heterogeneous processors on a single chip. It introduces Hy-C, a retargetable compiler tool that partitions source code across processing units such as CPUs, FPGAs, and ASICs, then compiles or synthesizes each part for its target. It considers the trade-offs among performance, power consumption, and flexibility for processors attached to shared memory, and closes with open issues: how to describe the architecture and the optimization constraints, and how to implement and evaluate the tool.
Hy-C: A Compiler Retargetable for Single-Chip Heterogeneous Multiprocessors
Philip Sweany
8/30/2013
Hybrid Computing
• Heterogeneous processors on single chip
  • “CPU”
  • FPGA
  • ASIC
• N “CPU”s, M FPGAs, K ASICs
• Tradeoffs of performance, power, flexibility
Generic Hybrid Architecture
[Figure: CPU 1, CPU 2, ..., CPU m (Multi-CPU) and FPGA 1, FPGA 2, ..., FPGA n (Multi-FPGA) around a Shared Memory]
Generic Hy-C Tools
[Figure: tool flow with blocks labeled Source Code, Objectives/Constraints, System Specification, Partitioning, CPU Compiler, FPGA Synthesis, CPU Power-Performance Model, FPGA Power-Performance Model, and Optimization Control]
OMAP Resources (old)
[Figure: Veyron, Tesla, and Ducati (Multi-CPU) around a Shared Memory]
OMAP Processor Resources
• Chiron
  • 2 x 600 MHz (2 symmetric processors each at 600 MHz with shared L2)
  • Power 600 µW/MHz
• Tesla
  • DSP Sub-System (C64x derivative); 400 MHz, 8-wide ILP
  • Power 200 µW/MHz
• Ducati
  • 200 MHz (targeted for control, low latency code)
  • Power 100 µW/MHz
“Canonical” Resources
[Figure: StrongArm, C64x, WimpyArm, and FPGA around a Shared Memory]
“Canonical” Processor Resources
• StrongArm
  • 2 x 600 MHz (2 symmetric processors each at 600 MHz with shared L2)
  • Power 600 µW/MHz
• C64x
  • DSP Sub-System (C64x derivative); 400 MHz, 8-wide ILP
  • Power 200 µW/MHz
• WimpyArm
  • 200 MHz (targeted for control, low latency code)
  • Power 100 µW/MHz
• FPGA fabric
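The clock and power figures above can be turned into nominal full-clock power numbers. The following is a minimal sketch, not part of the original slides: the `Processor` class and its method names are hypothetical, and it assumes the µW/MHz figure applies per core and scales linearly with clock frequency. The FPGA fabric is omitted because the slide gives no figures for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Processor:
    """Hypothetical record for one processor type on the canonical chip."""
    name: str
    clock_mhz: float
    uw_per_mhz: float
    cores: int = 1

    def core_power_mw(self) -> float:
        # (uW/MHz) * MHz = uW; divide by 1000 to get mW.
        # ASSUMPTION: the uW/MHz figure is per core and scales linearly.
        return self.uw_per_mhz * self.clock_mhz / 1000.0

    def total_power_mw(self) -> float:
        return self.cores * self.core_power_mw()


# Values taken from the slide; the FPGA fabric has no figures and is left out.
CANONICAL = [
    Processor("StrongArm", clock_mhz=600, uw_per_mhz=600, cores=2),
    Processor("C64x", clock_mhz=400, uw_per_mhz=200),
    Processor("WimpyArm", clock_mhz=200, uw_per_mhz=100),
]

for p in CANONICAL:
    print(f"{p.name}: {p.core_power_mw():.0f} mW per core, "
          f"{p.total_power_mw():.0f} mW total at full clock")
```

Under these assumptions each StrongArm core draws 360 mW at full clock (720 mW for the pair), the C64x 80 mW, and the WimpyArm 20 mW.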
Hy-C for Canonical Chip
[Figure: tool flow with blocks labeled Source Code, Objectives/Constraints, System Specification, Partitioning, C64x, Wimpy, Strong, FPGA, and Optimization Control]
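One way to picture the partitioning step is as a search that assigns each task to a target under the stated objectives and constraints. The sketch below is illustrative only and is not the Hy-C algorithm: the task names, cycle counts, and deadlines are invented, the greedy per-task choice is a stand-in for whatever optimization control Hy-C would use, and the FPGA is excluded because the slides give no clock or power figures for it. It also ignores shared-memory contention and parallel execution.

```python
# Clock (MHz) and power (uW/MHz) from the "Canonical" Processor Resources slide.
TARGETS = {
    "StrongArm": {"clock_mhz": 600, "uw_per_mhz": 600},
    "C64x": {"clock_mhz": 400, "uw_per_mhz": 200},
    "WimpyArm": {"clock_mhz": 200, "uw_per_mhz": 100},
}

# HYPOTHETICAL tasks: estimated cycle counts per target and a deadline (the constraint).
TASKS = {
    "fir_filter": {
        "cycles": {"StrongArm": 4_000_000, "C64x": 600_000, "WimpyArm": 12_000_000},
        "deadline_ms": 5.0,
    },
    "ui_control": {
        "cycles": {"StrongArm": 200_000, "C64x": 400_000, "WimpyArm": 300_000},
        "deadline_ms": 2.0,
    },
}


def time_ms(cycles: int, target: str) -> float:
    # cycles / (MHz * 1e6) seconds, expressed in milliseconds.
    return cycles / (TARGETS[target]["clock_mhz"] * 1000.0)


def energy_uj(cycles: int, target: str) -> float:
    # Power (uW/MHz * MHz) times time (cycles / MHz, in microseconds), in microjoules.
    # The clock cancels, so energy depends only on uW/MHz and the cycle count.
    return TARGETS[target]["uw_per_mhz"] * cycles / 1e6


def partition(tasks: dict) -> dict:
    """Greedy sketch: per task, pick the lowest-energy target that meets the deadline."""
    assignment = {}
    for name, task in tasks.items():
        feasible = [t for t, c in task["cycles"].items()
                    if time_ms(c, t) <= task["deadline_ms"]]
        if not feasible:
            raise ValueError(f"no target meets the deadline for {name}")
        assignment[name] = min(feasible, key=lambda t: energy_uj(task["cycles"][t], t))
    return assignment


print(partition(TASKS))
```

With these made-up numbers the filter lands on the C64x, the only target that meets its deadline, and the control task lands on the WimpyArm, the lowest-energy target that is still fast enough.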
Open Issue(s)
• How should we describe the architecture?
• How should we describe the optimization constraints?
• How/when shall we implement this beast?
• How will we evaluate the “performance” of the generated code?