This presentation reviews the evaluation of the Imagine Stream Processor, which aims to combine the efficiency of an ASIC with the flexibility of a programmable processor. The architecture targets media applications and is meant to lower the cost of special-purpose processor design while improving performance. The key findings are that the processor sustains 16% to 60% of its peak arithmetic performance and uses substantially less energy per FLOP than a conventional DSP or general-purpose processor. Its programming model lets the compiler exploit locality, but effective use requires considerable programming effort. The conclusions argue for its use in media applications with regular control flow.
Evaluating the Imagine Stream Processor
Jung Ho Ahn, William J. Dally, Brucek Khailany, Ujval J. Kapasi, and Abhishek Das
ISCA 2004
Motivation
• Provide the efficiency of an ASIC
• Provide the flexibility of a programmable processor
• Simplify special-purpose processor design
• Lower special-purpose processor design cost
• Provide better applicability
• Target media applications
Development Board
• PowerPC, 150 MHz
• 2 × Imagine, 200 MHz
• FPGA bridge, 66 MHz
• 256 MB of SDRAM per Imagine, 100 MHz
Execution on a Single Stream
[Figure: Kernel 1 reads an input stream from the SRF (stream register file) and writes an output stream back to the SRF over iterations 1 … n.]
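A minimal Python sketch of the single-kernel model in the figure, assuming a kernel is a stateless function applied to every record of a stream and that each iteration handles one record per arithmetic cluster across Imagine's eight SIMD clusters (a simplification; real kernels may process more records per iteration). The names `run_kernel`, `iteration_count`, and `scale_and_bias` are hypothetical and do not reflect Imagine's KernelC/StreamC tools.

```python
from typing import Callable, List

# Assumption: one record per cluster per iteration, eight clusters.
NUM_CLUSTERS = 8


def run_kernel(kernel: Callable[[float], float],
               input_stream: List[float],
               num_clusters: int = NUM_CLUSTERS) -> List[float]:
    """Apply `kernel` to every record, `num_clusters` records per iteration."""
    output_stream: List[float] = []
    for start in range(0, len(input_stream), num_clusters):  # iterations 1 ... n
        batch = input_stream[start:start + num_clusters]     # records read from the SRF
        output_stream.extend(kernel(x) for x in batch)       # results written back to the SRF
    return output_stream


def iteration_count(stream_length: int, num_clusters: int = NUM_CLUSTERS) -> int:
    """Number of iterations n needed to cover the whole stream (ceiling division)."""
    return -(-stream_length // num_clusters)


def scale_and_bias(x: float) -> float:
    """Hypothetical per-record computation; no state is carried between records."""
    return 2.0 * x + 1.0


if __name__ == "__main__":
    srf_input = [float(i) for i in range(20)]
    srf_output = run_kernel(scale_and_bias, srf_input)
    print(iteration_count(len(srf_input)), srf_output[:4])
```

Because the kernel keeps no state between records, the batches could run in any order or in parallel, which is what lets the clusters work in lockstep.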
Execution of Multiple Kernels
[Figure: Kernels 1, 2, and 3 run in sequence through the SRF. Kernel 1 processes Stream 1 into Stream 2, Kernel 2 processes Stream 2 into Stream 3, and Kernel 3 processes Stream 3 into Stream 4.]
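The chained execution in the figure can be sketched as a pipeline in which each kernel consumes the stream produced by the previous one. This is a simplified model under stated assumptions: the kernel bodies are arbitrary stand-ins, and `offchip_words` is a hypothetical helper that only counts stream records, assuming intermediate streams either stay in the SRF or are written to and re-read from off-chip memory.

```python
from typing import Callable, List, Sequence

Kernel = Callable[[float], float]


def run_pipeline(kernels: Sequence[Kernel], stream_1: List[float]) -> List[List[float]]:
    """Kernel k consumes stream k and produces stream k+1; returns all streams."""
    streams = [list(stream_1)]
    for kernel in kernels:
        streams.append([kernel(x) for x in streams[-1]])
    return streams


def offchip_words(streams: List[List[float]], intermediates_in_srf: bool = True) -> int:
    """Records moved to or from off-chip memory under a simplified model."""
    if intermediates_in_srf:
        # Only the first stream is loaded and the last stream is stored.
        return len(streams[0]) + len(streams[-1])
    # Otherwise every intermediate stream is stored and then loaded again.
    return len(streams[0]) + 2 * sum(len(s) for s in streams[1:-1]) + len(streams[-1])


# Hypothetical kernel bodies standing in for Kernels 1-3.
def kernel_1(x: float) -> float:
    return x + 1.0


def kernel_2(x: float) -> float:
    return x * x


def kernel_3(x: float) -> float:
    return x - 0.5


if __name__ == "__main__":
    streams = run_pipeline([kernel_1, kernel_2, kernel_3], [0.0, 1.0, 2.0])
    for i, s in enumerate(streams, start=1):
        print(f"Stream {i}: {s}")
    print(offchip_words(streams), offchip_words(streams, intermediates_in_srf=False))
```

With three records and four streams, keeping Streams 2 and 3 in the SRF cuts the modeled off-chip traffic from 18 records to 6, which is the producer-consumer locality the stream organization is designed to capture.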
Application Performance (sustained, as a percentage of peak)
• GOPS: 18%
• GFLOPS: 60%
Energy Efficiency
Energy consumption per FLOP (normalized to a 0.13 µm, 1.2 V process):
• Imagine @ 200 MHz: 277 pJ/FLOP
• TI C67x DSP @ 225 MHz: 889 pJ/FLOP (3.2x more)
• Intel Pentium M @ 1200 MHz: 3600 pJ/FLOP (13x more)
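The "3.2x" and "13x" figures are ratios of each processor's normalized energy per FLOP to Imagine's. A short Python check using only the numbers on the slide; `relative_energy` and `energy_joules` are illustrative helper names, not from the source.

```python
# Normalized energy per FLOP from the slide (0.13 um, 1.2 V process), in picojoules.
ENERGY_PJ_PER_FLOP = {
    "Imagine": 277.0,
    "TI C67x DSP": 889.0,
    "Intel Pentium M": 3600.0,
}


def relative_energy(baseline: str, table: dict = ENERGY_PJ_PER_FLOP) -> dict:
    """Each processor's energy per FLOP divided by the baseline's energy per FLOP."""
    base = table[baseline]
    return {name: pj / base for name, pj in table.items()}


def energy_joules(flop_count: float, pj_per_flop: float) -> float:
    """Total energy in joules for a given number of FLOPs (1 pJ = 1e-12 J)."""
    return flop_count * pj_per_flop * 1e-12


if __name__ == "__main__":
    for name, ratio in relative_energy("Imagine").items():
        print(f"{name}: {ratio:.1f}x")
    print(f"1e9 FLOPs on Imagine: {energy_joules(1e9, ENERGY_PJ_PER_FLOP['Imagine']):.3f} J")
```

889 / 277 is about 3.21 and 3600 / 277 is about 13.0, matching the rounded factors on the slide.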
Compiler Optimizations
• Loop unrolling and software pipelining
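The slide names two loop transformations without showing them. A minimal Python sketch of both, applied to a hypothetical load/compute/store loop (`3*x + 1` is an arbitrary stand-in); it illustrates the general techniques only and does not reproduce how the Imagine kernel compiler schedules code. Python runs everything sequentially, so the pipelined version shows the prologue, steady-state, and epilogue structure rather than actual overlap.

```python
from typing import List


def rolled_loop(xs: List[float]) -> List[float]:
    """Reference loop: each iteration loads, computes, and stores in sequence."""
    out = []
    for i in range(len(xs)):
        v = xs[i]          # load
        r = 3.0 * v + 1.0  # compute
        out.append(r)      # store
    return out


def unrolled_by_2(xs: List[float]) -> List[float]:
    """Loop unrolling: two elements per trip, with a cleanup step for an odd tail."""
    out = []
    n = len(xs)
    i = 0
    while i + 1 < n:
        a, b = xs[i], xs[i + 1]      # two independent loads per trip
        out.append(3.0 * a + 1.0)
        out.append(3.0 * b + 1.0)
        i += 2
    if i < n:                        # leftover element when n is odd
        out.append(3.0 * xs[i] + 1.0)
    return out


def software_pipelined(xs: List[float]) -> List[float]:
    """Software pipelining: the load for element i is paired with the
    compute/store for element i-1, as a scheduler would issue them together."""
    out = []
    n = len(xs)
    if n == 0:
        return out
    v = xs[0]                        # prologue: first load
    for i in range(1, n):            # steady state
        next_v = xs[i]               # load for element i ...
        out.append(3.0 * v + 1.0)    # ... alongside compute/store for element i-1
        v = next_v
    out.append(3.0 * v + 1.0)        # epilogue: finish the last element
    return out


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(rolled_loop(data), unrolled_by_2(data), software_pipelined(data))
```

Both transformations leave the results unchanged; their benefit on hardware with many functional units is exposing independent operations that can be issued in the same cycle.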
Conclusions
• Provides performance close to that of an ASIC, with flexibility through programmability
• Can sustain between 16% and 60% of peak arithmetic performance
• Exposed two-level register file allows the compiler to exploit locality
• Broader applicability than an ASIC
• Requires considerable programming effort
• Limited to media applications with regular control flow
Collab Questions
• How does the performance compare to other processors? (Dan, Marko, Jason, Prateeksha, Chris)
• What is the compiler efficiency? (Mario, Liang)
• How were the design decisions motivated? (Jing, Marisabel)
• How does the programming model compare to that of GPUs? (Greg)