
Stamatis Vassiliadis Symposium Sept. 28, 2007 J. E. Smith

Future Superscalar Processors Based on Instruction Compounding. Stamatis Vassiliadis Symposium, Sept. 28, 2007, J. E. Smith. Instruction Compounding (Fusing). Instruction compounding, or “fusing,” has become a key idea in high-performance microprocessors.





Presentation Transcript


  1. Future Superscalar Processors Based on Instruction Compounding
     Stamatis Vassiliadis Symposium, Sept. 28, 2007
     J. E. Smith

  2. Instruction Compounding (Fusing)
     Instruction compounding, or “fusing,” has become a key idea in high-performance microprocessors.
     “A compound instruction reflects the parallel issue of instructions; it comprises some number of independent instructions or interlocked instructions”
     “Instructions composing a compound instruction need not be consecutive.”
     -- S. Vassiliadis et al., IBM Journal of Research and Development, Jan. 1994
     Future Microprocessors

  3. The Future Processor: Three Key Aspects
     • Instruction compounding or fusing
       • Based on S. Vassiliadis's work
       • Employs compounding and a 3-input ALU
     • Co-designed VM for dynamic translation/fusing
       • Concealed from all software
       • Optimized (fused) instructions held in a code cache
     • Dual decoder front-end for fast startup
       • Hardware front-end decoder for fast startup
       • Software translator for sustained high performance

  4. Processor Micro-architecture

  5. Fusible Instruction Set
     • RISC-ops with unique features:
       • A fusible bit per instruction fuses two dependent instructions
       • Dense instruction encoding, 16/32-bit ISA design
     • Special features to support the x86 ISA
       • Condition codes
       • Addressing modes
       • Aware of long immediate & displacement values
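A minimal sketch of how a decoder might use the fusible bit to regroup RISC-ops into macro-ops. The slide does not say where the bit sits or which op of the pair carries it; here it is assumed (hypothetically) that the bit is set on the head and means "fuse with the next op". `RiscOp`, `group_macro_ops`, and `STREAM` are illustrative names, not part of the design, and the stream is the fused code from the slide 8 example.

```python
from dataclasses import dataclass


@dataclass
class RiscOp:
    text: str
    fusible: bool = False  # assumed meaning: fuse this op with the next one


def group_macro_ops(stream):
    """Group a fused code-cache stream into macro-ops (tuples of 1 or 2 ops)."""
    groups = []
    i = 0
    while i < len(stream):
        op = stream[i]
        if op.fusible and i + 1 < len(stream):
            groups.append((op, stream[i + 1]))  # head :: tail
            i += 2
        else:
            groups.append((op,))                # un-fused op travels alone
            i += 1
    return groups


# Fused code as it might sit in the code cache (taken from the slide 8 example).
STREAM = [
    RiscOp("ADD R18, Redi, 1", fusible=True),
    RiscOp("AND Reax, R18, 007f"),
    RiscOp("ST R18, mem[R22]"),
    RiscOp("LD.zx Rebx, mem[Rebp + Recx << 1]"),
    RiscOp("ADD R17, Reax, Resi", fusible=True),
    RiscOp("LD Redx, mem[R17+0x7c]"),
]

for group in group_macro_ops(STREAM):
    print(" :: ".join(op.text for op in group))
```

With one bit per op, the pairing decision made by the translator costs no extra decode logic: the front end only has to look at the bit to know whether two ops occupy one pipeline slot.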

  6. Microarchitecture: Macro-op Execution
     • Enhanced OOO superscalar microarchitecture
     • Process & execute fused macro-ops as single instructions throughout the entire pipeline

  7. Macro-op Fusing Algorithm
     • Objectives:
       • Maximize fused dependent pairs
       • Simple & fast
     • Heuristics:
       • Pipelined scheduler: only single-cycle ALU ops can be a head. Minimize non-fused single-cycle ALU ops.
       • Criticality: fuse instructions that are “close” in the original sequence. ALU-op criticality is easier to estimate.
       • Simplicity: 2 or fewer distinct register operands per fused pair
     • Solution: two-pass fusing algorithm:
       • The 1st pass, a forward scan, prioritizes ALU ops, i.e. for each ALU-op tail candidate, look backward in the scan for its head
       • The 2nd pass considers all kinds of RISC-ops as tail candidates

  8. Fusing Algorithm: Example
     x86 asm:
       1. lea   eax, DS:[edi + 01]
       2. mov   [DS:080b8658], eax
       3. movzx ebx, SS:[ebp + ecx << 1]
       4. and   eax, 0000007f
       5. mov   edx, DS:[eax + esi << 0 + 0x7c]
     RISC-ops:
       1. ADD   Reax, Redi, 1
       2. ST    Reax, mem[R22]
       3. LD.zx Rebx, mem[Rebp + Recx << 1]
       4. AND   Reax, 0000007f
       5. ADD   R17, Reax, Resi
       6. LD    Redx, mem[R17 + 0x7c]
     After fusing: macro-ops
       1. ADD R18, Redi, 1 :: AND Reax, R18, 007f
       2. ST R18, mem[R22]
       3. LD.zx Rebx, mem[Rebp + Recx << 1]
       4. ADD R17, Reax, Resi :: LD Redx, mem[R17+0x7c]

  9. Instruction Fusing Profile
     • 55+% of RISC-ops are fused, which increases effective ILP by a factor of 1.4
     • Only 6% of single-cycle ALU ops are left un-fused
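One way to read the 1.4 figure, offered as a back-of-the-envelope check rather than the author's derivation: if a fraction f of RISC-ops is fused into pairs, every 100 RISC-ops occupy 100(1 - f) + 100f/2 pipeline slots, so each slot carries 1 / (1 - f/2) RISC-ops on average. The function name `effective_width_gain` is hypothetical.

```python
def effective_width_gain(fused_fraction):
    """RISC-ops carried per pipeline slot when fused pairs take one slot."""
    slots_per_op = (1.0 - fused_fraction) + fused_fraction / 2.0
    return 1.0 / slots_per_op


print(round(effective_width_gain(0.55), 2))  # 1.38
```

With f = 0.55 the ratio is about 1.38, which is consistent with the 1.4 quoted on the slide.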

  10. Other DBT Software Profile
      • Of all fused macro-ops:
        • 50% are ALU-ALU pairs
        • 30% are fused condition test & conditional branch pairs
        • Others are mostly ALU-MEM op pairs
      • Of all fused macro-ops:
        • 70+% are inter-x86-instruction fusions
        • 46% access two distinct source registers
        • Only 15% (6% of all instruction entities) write two distinct destination registers
      • Translation overhead profile
        • About 1000 instructions per translated hotspot instruction

  11. Co-designed x86 Processor Performance

  12. Dual Decoder Front-End

  13. Evaluation: Startup Performance

  14. Activity of HW x86 Decoder

  15. Important Research Issues
      • Profiling
        • Probe insertion via software translator not feasible
      • Multi-core
        • Shared code cache
        • SMT designs
      • Memory consistency
        • Stores can be done in-order
        • Re-scheduled loads may be important for performance
      • Precise traps
        • Potential HW assist?
