Instruction Set Issues

Presentation Transcript

  1. Instruction Set Issues • MIPS is easy • Instructions are only committed at the MEM/WB transition • Other architectures are more difficult • Instructions may update state early • FP is more difficult • Memory-updating operations (e.g. string moves)

  2. Instruction Set Issues (cont.) • Difficult architectural features • “Odd” bits of state (e.g. condition codes) • May need saving/restoring on exceptions • Implicitly set condition codes • Complicate branch resolution • Explicit setting helps here (still a RAW hazard) • Multicycle operations • Widely differing execution times, lots of potential data hazards, etc.

  3. Instruction Set Issues • The VAX suffers from many of these problems • Solution: pipeline the microcode • Intel 32-bit 80x86 processors since 1995 use a similar approach (instructions are translated into simple micro-operations, which are then pipelined)

  4. A.5. Handling Multicycle Operations • MIPS: FP operations • Long latency (EX repeated) • Several functional units • Structural hazards • Data hazards

  5. MIPS: FP Design • Four functional units: • Integer ALU • as before • FP multiplier • also used for integer multiplication • FP adder • addition, subtraction and conversion • FP divider • also used for integer division

  6. MIPS Design with FP Units

  7. MIPS Multicycle Operations

  8. Hazards • Divides • Structural hazard (the divider is not pipelined) • Multiple register writes possible in one cycle • Out-of-order completion • WAW hazards • Exception-handling complications • RAW hazards increase (longer latencies mean longer stalls)

  9. Potential RAW Hazards • Example (SPARC syntax):
      ldd   [%fp-8], %f4
      fmuld %f4, %f6, %f0
      faddd %f0, %f8, %f2
      std   %f2, [%fp-16]
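The RAW stalls in this sequence can be counted with a small in-order issue model. This is a minimal Python sketch, not the actual MIPS or SPARC pipeline: it assumes single issue, full forwarding, and that a consumer may issue `latency + 1` cycles after its producer issues. The latencies (load 1, FP multiply 6, FP add 3), the `Instr` type and `schedule()` are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Instr:
    text: str                 # assembly text, for display only
    dest: Optional[str]       # FP register written, or None (e.g. a store)
    srcs: List[str] = field(default_factory=list)  # FP registers read
    latency: int = 0          # assumed cycles a consumer must wait after issue

def schedule(program):
    """Return (issue_cycles, stalls) for an in-order, single-issue pipeline."""
    ready = {}          # register -> earliest cycle a consumer may issue
    issue_cycles = []
    stalls = []
    next_slot = 0       # cycle the next instruction could issue with no stall
    for ins in program:
        t = max([next_slot] + [ready.get(r, 0) for r in ins.srcs])
        stalls.append(t - next_slot)
        issue_cycles.append(t)
        if ins.dest is not None:
            ready[ins.dest] = t + ins.latency + 1
        next_slot = t + 1
    return issue_cycles, stalls

# Assumed latencies for illustration: load 1, FP multiply 6, FP add 3.
PROGRAM = [
    Instr("ldd [%fp-8], %f4", "%f4", [], 1),
    Instr("fmuld %f4, %f6, %f0", "%f0", ["%f4", "%f6"], 6),
    Instr("faddd %f0, %f8, %f2", "%f2", ["%f0", "%f8"], 3),
    Instr("std %f2, [%fp-16]", None, ["%f2"], 0),
]

issue, stalls = schedule(PROGRAM)
for ins, t, s in zip(PROGRAM, issue, stalls):
    print(f"cycle {t:2d}: {ins.text:22s} stalls={s}")
print(stalls)  # → [0, 1, 6, 3]
```

Under these assumptions every instruction waits on the one before it, and the long multiply latency dominates the total.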

  10. Multiple Writes • Up to four instructions may need to write the register file in the same cycle • Solution • Track write-port use in ID • Stall at instruction issue (simpler: all stalls happen at one point) • Alternatively: • Stall at MEM or WB • Stall the instruction with the shorter latency (the longer-latency instruction is more likely to be holding up others on RAW hazards)
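One way to track writes in ID is to reserve, at issue, the write-port cycle each instruction will need and to stall in ID if that cycle is already taken. Hardware commonly does this with a shift register of reservation bits; this Python sketch uses a set of reserved cycles instead. It assumes a single register-file write port and hypothetical `wb_delays` (cycles from issue to WB); `issue_with_write_port()` is a made-up name.

```python
def issue_with_write_port(wb_delays):
    """Return issue cycles such that no two instructions write back in the same cycle."""
    reserved = set()     # future cycles in which the write port is already claimed
    issue_cycles = []
    t = 0
    for d in wb_delays:
        while t + d in reserved:   # port busy in that future cycle: stall in ID
            t += 1
        reserved.add(t + d)        # claim the write port for our WB cycle
        issue_cycles.append(t)
        t += 1                     # next instruction may issue in the following cycle
    return issue_cycles

print(issue_with_write_port([7, 6, 4]))  # → [0, 2, 5]
```

Here the second and third instructions would otherwise reach WB in the same cycle as the first, so each is held in ID until a free write-back cycle exists.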

  11. WAW Hazards • Example:
      faddd %f4, %f6, %f2
      …                      ! integer op
      ldd   [%fp-8], %f2

  12. WAW Hazards (cont.) • Rare • But compiler scheduling may produce unlikely instruction sequences, so they must be caught • Solutions: • Stall the issue of ldd • Cancel the write by faddd
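The two WAW solutions can be sketched as follows. Assumptions: in-order issue, each in-flight instruction is tracked as a (destination register, write-back cycle) pair, and all cycle numbers are hypothetical; the function names are illustrative only. Cancelling the older write is only safe if no intervening instruction reads that register, as in the example above where the intervening instruction is an integer op.

```python
def waw_conflict(in_flight, dest, wb_cycle):
    """Return an older in-flight write to dest that would land at or after wb_cycle."""
    for reg, their_wb in in_flight:
        if reg == dest and their_wb >= wb_cycle:
            return (reg, their_wb)
    return None

def resolve_by_stalling(in_flight, dest, issue, wb_delay):
    """Solution 1: delay issue until our write lands after every older write to dest."""
    while waw_conflict(in_flight, dest, issue + wb_delay):
        issue += 1
    return issue

def resolve_by_cancelling(in_flight, dest):
    """Solution 2: keep issue timing but drop the older pending write to dest.
    Assumes no intervening instruction reads dest."""
    return [(reg, wb) for reg, wb in in_flight if reg != dest]

in_flight = [("%f2", 6)]   # faddd writing %f2, hypothetically in WB at cycle 6
print(resolve_by_stalling(in_flight, "%f2", issue=2, wb_delay=3))  # → 4
print(resolve_by_cancelling(in_flight, "%f2"))                     # → []
```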

  13. Maintaining Precise Exceptions • Out-of-order completion:
      fdivd %f2, %f4, %f0
      faddd %f10, %f8, %f10   ! completes long before fdivd
      fsubd %f12, %f14, %f12  ! completes long before fdivd
    • fsubd may raise an exception after faddd has completed but before fdivd has • The exception is no longer precise
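The loss of precision can be made concrete by computing when each instruction completes. This Python sketch assumes one instruction issues per cycle, that each completes `issue + latency` cycles later, that an exception is raised at completion, and illustrative latencies (divide 24, add/subtract 3); none of these numbers come from the slide, and the function names are hypothetical.

```python
LATENCY = {"fdivd": 24, "faddd": 3, "fsubd": 3}   # assumed, for illustration
program = ["fdivd", "faddd", "fsubd"]

def completion_cycles(program, latency):
    """Instruction i issues in cycle i and completes latency cycles later."""
    return {op: i + latency[op] for i, op in enumerate(program)}

def state_at_exception(program, latency, faulting):
    """Which other instructions have / have not completed when `faulting` traps."""
    done = completion_cycles(program, latency)
    t = done[faulting]
    completed = [op for op in program if op != faulting and done[op] < t]
    pending = [op for op in program if op != faulting and done[op] >= t]
    return completed, pending

def is_precise(program, latency, faulting):
    """Precise: all earlier instructions completed, no later ones."""
    i = program.index(faulting)
    completed, pending = state_at_exception(program, latency, faulting)
    return completed == program[:i] and pending == program[i + 1:]

print(state_at_exception(program, LATENCY, "fsubd"))  # → (['faddd'], ['fdivd'])
print(is_precise(program, LATENCY, "fsubd"))          # → False
```

With equal latencies for all three instructions the same check returns `True`, which is why the problem only appears once completion can happen out of order.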

  14. Maintaining Precise Exceptions • It may be very difficult to handle exceptions precisely • E.g. faddd has already overwritten one of its source operands (%f10)! • Four solutions: • Accept imprecise exceptions • But precise exceptions are needed for virtual memory and IEEE FP • Allow switching between precise and imprecise modes

  15. Maintaining Precise Exceptions • Solutions (cont.) • Buffer results until earlier instructions complete • Buffers may grow very large, and extensive forwarding is required • History files: restore original register values • Future files: store new register values • Software executes the intervening instructions to bring the state “up to date” before returning from the exception
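The history-file and future-file schemes can be contrasted in a few lines of Python. This is a minimal sketch, assuming each instruction writes exactly one register and carries a program-order sequence number, and that WAW hazards are already prevented; the class and method names are hypothetical. A history file lets writes happen out of order and undoes them on an exception; a future file keeps an in-order architectural file that is always precise.

```python
class HistoryFile:
    """Registers are written immediately (possibly out of order); old values are logged."""
    def __init__(self, regs):
        self.regs = dict(regs)
        self.log = []                      # (seq, reg, old_value) in time order

    def write(self, seq, reg, value):
        self.log.append((seq, reg, self.regs[reg]))
        self.regs[reg] = value

    def recover(self, faulting_seq):
        # Undo, newest first, every write by the faulting or any later instruction.
        # Relies on WAW hazards being prevented elsewhere.
        for seq, reg, old in reversed(self.log):
            if seq >= faulting_seq:
                self.regs[reg] = old
        self.log = [e for e in self.log if e[0] < faulting_seq]


class FutureFile:
    """Results go to a future file at once; the architectural file is updated in order."""
    def __init__(self, regs):
        self.arch = dict(regs)
        self.future = dict(regs)
        self.pending = {}                  # seq -> (reg, value), completed, not retired
        self.next_retire = 0

    def write(self, seq, reg, value):
        self.future[reg] = value
        self.pending[seq] = (reg, value)
        while self.next_retire in self.pending:   # retire in program order
            r, v = self.pending.pop(self.next_retire)
            self.arch[r] = v
            self.next_retire += 1

    def recover(self):
        # Discard everything not yet retired; the architectural file is precise.
        self.future = dict(self.arch)
        self.pending.clear()
```

For example, if fdivd (sequence number 0) traps after faddd and fsubd have already written %f10 and %f12, `HistoryFile.recover(0)` restores both registers, while a `FutureFile` never copied them into its architectural state in the first place.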

  16. Maintaining Precise Exceptions • Solutions (cont.) • Hybrid scheme • Instructions are only issued when it is certain that preceding instructions will not cause an exception • May require stalling the pipeline
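The hybrid rule can be sketched as an issue-time check. This Python sketch assumes each instruction becomes known to be exception-free a fixed number of cycles after it issues (for instance after an early check of its operands); the `safe_after` delays and the function name are hypothetical. It shows how the rule introduces stalls when an earlier instruction takes longer to be declared safe.

```python
def issue_cycles(safe_after):
    """Issue in order; an instruction may issue only once every earlier one is known safe.
    safe_after[i]: assumed cycles after issue until instruction i cannot raise an exception."""
    cycles = []
    all_safe_at = 0      # cycle by which every already-issued instruction is known safe
    t = 0
    for delay in safe_after:
        t = max(t, all_safe_at)            # stall until all predecessors are safe
        cycles.append(t)
        all_safe_at = max(all_safe_at, t + delay)
        t += 1
    return cycles

print(issue_cycles([3, 1, 1]))  # → [0, 3, 4]
print(issue_cycles([1, 1, 1]))  # → [0, 1, 2]
```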

  17. Performance of the MIPS FP Pipeline • Structural hazards (divide unit) • Very low: 0–2 cycles per FP operation • RAW hazards • Divide: 12–24 cycles, average 14.2 • Add: 0.7–2.3 cycles, average 1.7 • In general, RAW stalls per operation are about 0.5 × the unit's latency

  18. Overall MIPS FP Performance • Stalls per instruction • 0.65-1.21 cycles • Average: 0.87 • 82% from FP RAW hazards
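As a rough check on these averages, assuming the 82% share applies to the 0.87-cycle average and taking an ideal (stall-free) CPI of 1, which is an assumption not stated on the slide:

```latex
\begin{aligned}
\text{FP RAW stalls per instruction} &\approx 0.82 \times 0.87 \approx 0.71 \\
\text{CPI} &\approx 1 + 0.87 = 1.87
\end{aligned}
```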

  19. A.6. Putting It All Together: MIPS R4000 Pipeline • 64-bit instruction set • Eight-stage pipeline (superpipelining) • IF + IS: instruction fetch • RF: decode/register fetch • EX: execution • DF + DS + TC: data cache access (TC = tag check) • WB: write back

  20. MIPS R4000 Pipeline • Performance • Load delay: two cycles • Branch delay: three cycles • Delayed branch (one delay slot) • Predict-not-taken strategy for the remaining two cycles, with annulling • Increased forwarding requirements • Three stages between EX and WB now
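Both delays follow from where each stage sits in the eight-stage pipeline. This Python sketch assumes full forwarding, that load data is available at the end of DS, that an ALU consumer needs its operand at the start of EX, and that the branch outcome is known at the end of EX; the `delay()` helper is hypothetical.

```python
STAGES = ["IF", "IS", "RF", "EX", "DF", "DS", "TC", "WB"]
pos = {s: i for i, s in enumerate(STAGES)}

def delay(produced_at_end_of, needed_at_start_of):
    """Number of instructions that must separate a producer and its consumer."""
    # A producer issued in cycle 0 has its value at the end of cycle pos[p].
    # A consumer issued k cycles later reaches stage c in cycle k + pos[c],
    # so it needs k + pos[c] >= pos[p] + 1, i.e. k >= pos[p] + 1 - pos[c].
    min_k = pos[produced_at_end_of] + 1 - pos[needed_at_start_of]
    return max(0, min_k - 1)

load_delay = delay("DS", "EX")    # load data feeds an ALU op
branch_delay = delay("EX", "IF")  # branch outcome steers the next fetch
print(load_delay, branch_delay)   # → 2 3
```

Under this model, splitting fetch (IF + IS) and data-cache access (DF + DS) into two stages each is what lengthens the branch and load delays.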

  21. MIPS R4000 Pipeline • Floating point • Three functional units • Divider, multiplier, adder • Shared components (8 sub-units) • Latency: 2–112 cycles • Initiation interval: 1–111 cycles • Complicated stall handling
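The difference between latency and initiation interval matters when several independent operations go to the same unit. This sketch assumes back-to-back independent operations on one unit, with the first issued in cycle 0; pairing the low ends (latency 2, interval 1) and the high ends (latency 112, interval 111) of the ranges above is an assumption made only for illustration, and `finish_time()` is a hypothetical name.

```python
def finish_time(n, latency, interval):
    """Cycles from the first issue until the last of n independent ops has its result.
    Assumes a new op can start every `interval` cycles on a single unit."""
    if n == 0:
        return 0
    return (n - 1) * interval + latency

print(finish_time(4, 2, 1))      # → 5
print(finish_time(4, 112, 111))  # → 445
```

With an initiation interval close to the latency the unit is effectively unpipelined: four operations take roughly four times as long as one.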

  22. MIPS R4000 Pipeline • Performance: • CPI between 1.2 and 2.8 for SPEC92 benchmarks • Average: 2.0 • Integer: 1.54 • FP: 2.48 • Integer apps: mainly branch delays • FP apps: mainly FP data hazard stalls (RAW)