The Philips TriMedia: A VLIW Architecture
By Jurjen Westra

TM-1 Block Diagram
• SDRAM
• Main Memory Interface
• Image Coprocessor
• Video In
• VLD Coprocessor
• Audio In
• Video Out
• Audio Out
• Timers
• I2C Interface
• Sync Serial Interface
• VLIW CPU, 32K I$, 16K D$
• PCI interface
TM has 128 general-purpose 32-bit registers

VLIW means relying on compiler techniques
Only cache misses are handled at run time
Compiler
• Scheduling / instruction-level parallelism
• Operation guarding
• Speculation
• Profiling for recompiling
• Grafting (loop unrolling)
• Alias analysis

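Loop unrolling, listed among the compiler techniques above, can be illustrated with a small sketch. This is not TriMedia compiler output; the function names `dot_rolled` and `dot_unrolled4` and the unroll factor of 4 are assumptions chosen for illustration. The point is only that the unrolled body contains several multiply-adds with no dependences between them, which a VLIW scheduler could place in parallel issue slots.

```python
def dot_rolled(a, b):
    """One multiply-add per iteration; every iteration depends on the running total."""
    total = 0
    for x, y in zip(a, b):
        total += x * y
    return total


def dot_unrolled4(a, b):
    """Unrolled by 4 (illustrative factor) with independent accumulators.

    The four multiply-adds in the loop body do not depend on each other,
    so a scheduler is free to issue them side by side.
    """
    n = len(a)
    s0 = s1 = s2 = s3 = 0
    i = 0
    while i + 4 <= n:
        s0 += a[i] * b[i]
        s1 += a[i + 1] * b[i + 1]
        s2 += a[i + 2] * b[i + 2]
        s3 += a[i + 3] * b[i + 3]
        i += 4
    total = s0 + s1 + s2 + s3
    # Remainder loop for lengths that are not a multiple of 4.
    while i < n:
        total += a[i] * b[i]
        i += 1
    return total
```

Both functions return the same result for integer inputs; only the amount of exposed parallelism differs.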
Traditional Scheduling vs. VLIW Scheduling
[Figure: operations A, B, C and D arranged in different orders under traditional and VLIW scheduling]

[Figure: the Instruction Cache feeds Issue Slots 1 to 5, which feed Execution Units 1 to 27]
But not all issue slots have access to all (types of) execution units!

Unit        Latency   Issue slots with access (out of 5)
CONST       -         5
ALU         -         5
SHIFTER     -         2
FALU        3         2
DSPALU      2         2
DSPMUL      3         2
BRANCH      3         3
IFMUL       3         2
FCOMP       -         1
DMEM        3         2
DMEMSPEC    3         1
FTOUGH      17/16     1

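The effect of these slot restrictions and latencies on scheduling can be sketched with a small greedy scheduler. This is a simplified model under stated assumptions, not the TriMedia scheduler: it only limits how many operations of each unit type fit in one instruction (the number of slots marked for that unit), it ignores which specific slots those are, and it assumes an ALU latency of 1 cycle, which the table does not state. The names `schedule` and `example_ops` are hypothetical.

```python
# Operations of each unit type allowed per VLIW instruction
# (number of issue slots marked for that unit in the table).
SLOTS_PER_UNIT = {"ALU": 5, "DSPMUL": 2, "DMEM": 2, "FALU": 2}
# Latencies in cycles. DSPMUL, DMEM and FALU come from the table;
# ALU = 1 is an assumption, since the table gives no value for it.
LATENCY = {"ALU": 1, "DSPMUL": 3, "DMEM": 3, "FALU": 3}
ISSUE_WIDTH = 5  # issue slots per instruction


def schedule(ops):
    """Greedy in-order scheduling.

    ops: list of (name, unit, deps) in program order, where deps names
    earlier operations whose results are needed.
    Returns {name: cycle in which the operation is issued}.
    """
    unit_of = {name: unit for name, unit, _ in ops}
    issue_cycle = {}
    used = {}  # cycle -> {unit: number of operations already issued}
    for name, unit, deps in ops:
        # Earliest cycle at which every input has been produced.
        cycle = max((issue_cycle[d] + LATENCY[unit_of[d]] for d in deps), default=0)
        while True:
            row = used.setdefault(cycle, {})
            fits_width = sum(row.values()) < ISSUE_WIDTH
            fits_unit = row.get(unit, 0) < SLOTS_PER_UNIT[unit]
            if fits_width and fits_unit:
                break
            cycle += 1  # instruction is full for this unit, try the next one
        row[unit] = row.get(unit, 0) + 1
        issue_cycle[name] = cycle
    return issue_cycle


# Three loads, a multiply of the first two, then an add with the third.
example_ops = [
    ("ld_a", "DMEM", []),
    ("ld_b", "DMEM", []),
    ("ld_c", "DMEM", []),
    ("mul", "DSPMUL", ["ld_a", "ld_b"]),
    ("add", "ALU", ["mul", "ld_c"]),
]
```

In this model `schedule(example_ops)` issues the first two loads together in cycle 0, pushes the third load to cycle 1 because only two DMEM operations fit in one instruction, and delays the multiply and the add until the 3-cycle latencies of their inputs have elapsed.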
Guarding
C code:
    if (R2 > R3) R4 = R4 + R5;
    else R4 = R4 + R6;
Assembly:
    igtr R7 R2 R3
    add R4 R4 R6
    … …
    IF R7 add R4 R4 R5
    … … … ...

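The idea behind guarding can be sketched in a few lines: instead of branching, both additions are computed and a guard value decides which result is written back. This is an illustrative model only. The helper `guarded_write`, the complementary guard `r8`, and the temporaries are assumptions, and the sketch does not claim to reproduce the elided operations in the assembly above.

```python
def branchy(r2, r3, r4, r5, r6):
    """Reference version with an explicit branch, as in the C code."""
    if r2 > r3:
        r4 = r4 + r5
    else:
        r4 = r4 + r6
    return r4


def guarded_write(guard, old, new):
    """Model of a guarded operation: the result is written only if the guard is set."""
    return new if guard else old


def guarded(r2, r3, r4, r5, r6):
    """Branch-free version: compare once, compute both sums, let guards pick one."""
    r7 = 1 if r2 > r3 else 0      # comparison result used as the guard
    r8 = 1 - r7                   # complementary guard (hypothetical register)
    t_then = r4 + r5              # both sums are computed unconditionally
    t_else = r4 + r6
    r4 = guarded_write(r7, r4, t_then)
    r4 = guarded_write(r8, r4, t_else)
    return r4
```

Both versions return the same value for any inputs; the guarded one has no control-flow dependence, so its operations can be scheduled freely within and across instructions.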
Characteristics (1)
• Custom ops => loss of VLIW character
• Big or little endian
• R0 and R1 have the values 0 and 1 respectively
• No integer status flags, but case-specific bit patterns
• 32 interrupt vectors
• Interrupts are delayed

Characteristics (2)
• 11-cycle read-miss penalty
• 3-cycle write-miss penalty
• Functional units require 1 cycle recovery time
• Byte-addressable; 8-, 16- and 32-bit loads and stores
• Register file supports up to 5 writes per cycle (latency)
• Register file supports up to 15 reads per cycle
• Paging (64 bytes)
• Instruction length: 2-23 bytes; compressed

Example: MPEG-2 decoder
• DVD Batman bitstream (4-9 Mbit/s)
• 7% instruction-cache misses
• 27% data-cache misses
• CPI (clock cycles/VLIW instruction): 1.37
• Total performance: 2.9 ops/clock
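The CPI and ops/clock figures above can be combined into a rough estimate of how full the instructions are. The calculation below is a back-of-the-envelope sketch: the variable names are illustrative, and the stall estimate assumes that a VLIW instruction takes exactly one cycle when nothing stalls, which the slide does not state.

```python
cpi = 1.37            # clock cycles per VLIW instruction (from the slide)
ops_per_clock = 2.9   # total performance (from the slide)
issue_slots = 5       # issue slots per VLIW instruction

# ops/instruction = (ops/clock) * (clocks/instruction)
ops_per_instruction = ops_per_clock * cpi

# Fraction of the 5 issue slots filled in an average instruction.
slot_utilization = ops_per_instruction / issue_slots

# Assumption: 1 cycle per instruction without stalls, so the excess
# over 1.0 in the CPI is stall time.
stall_fraction = (cpi - 1.0) / cpi

print(f"ops per instruction: {ops_per_instruction:.2f}")
print(f"slot utilization:    {slot_utilization:.0%}")
print(f"stall fraction:      {stall_fraction:.0%}")
```

Under these assumptions the decoder issues about 3.97 operations per instruction, fills roughly 79% of the issue slots, and spends about 27% of its cycles stalled.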