1 / 39

Superscalar Processor

Superscalar Processor. Abdullah A Alasmari & Eid S. Alharbi. Outlines. Pipeline & Hazards Superscalar Instruction issue policy Register renaming MIPS R10000 Advanced Superscalar Summary. Pipeline & Hazards.

makani
Télécharger la présentation

Superscalar Processor

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Superscalar Processor Abdullah A Alasmari & Eid S. Alharbi Superscalar Processor

  2. Outlines • Pipeline & Hazards • Superscalar • Instruction issue policy • Register renaming • MIPS R10000 • Advanced Superscalar • Summary Superscalar Processor

  3. Pipeline & Hazards • Pipeline CPI = Ideal pipeline CPI + Structural Stalls + Data Hazard Stalls + Control Stalls • Ideal pipeline CPI: measure of the maximum performance attainable by the implementation • Structural hazards: HW cannot support this combination of instructions • Data hazards: Instruction depends on result of prior instruction still in the pipeline • Control hazards: Caused by delay between the fetching of instructions and decisions about changes in control flow (branches and jumps) Superscalar Processor

  4. Hazards • Structural Hazards: • Have as many functional units as needed • Data Hazards solutions: • Execute instructions in order. Use score-board to eliminate data hazards by stalling instructions • Execute instructions out or order, as soon as operands are available, but graduate them in order. • Use register renaming to avoid WAR and WAW data hazards • Control Hazards solutions: • Use branch prediction: • Make sure that the branch is resolved before registers are modified Superscalar Processor

  5. Branch prediction • What do we need to predict for a jump/branch? • jump: • the target address, which can be stored in the same instruction or computed from the current PC plus a displacement • Return from subroutine ret: • the return address, which is obtained from the stack (increasing the SP and reading from memory) • conditional branch: • the target address, which is usually computed from the current PC plus a displacement • Is the branch going to branch or continue with next instruction? Superscalar Processor

  6. Outlines • Pipeline & Hazards • Superscalar • Instruction issue policy • Register renaming • MIPS R10000 • Advanced Superscalar • Summary Superscalar Processor

  7. Multiple Instruction Issue Multiple instructions issued each cycle • better performance • increase instruction throughput • decrease in CPI (below 1) • greater hardware complexity. • harder code scheduling job for the compiler Superscalar processors • instructions are scheduled by the hardware • different numbers of instructions may be issued simultaneously VLIW (“very long instruction word”) processors • instructions are scheduled by the compiler • a fixed number of operations are formatted as one big instruction Superscalar Processor

  8. What is Superscalar? • A machine designed to improve the performance execution of scalar instructions; where one instruction per cycle. • Superscalar architecture allows several instructions to be issued and completed per clock cycle • consists of a number of pipeline that are working in parallel • Common instructions (arithmetic, load/store, conditional branch) can be initiated and executed independently in different pipelines • Executed in an order different from the program order • Equally applicable to RISC & CISC, In practice usually RISC Superscalar Processor

  9. Superscalar Execution IM Reg DM Reg ALU IM Reg ALU DM Reg IM Reg ALU DM Reg IM Reg ALU DM Reg IM Reg ALU DM Reg IM Reg ALU DM Reg Superscalar Processor

  10. Superscalar Execution IM Reg DM Reg ALU IM Reg ALU DM Reg IM Reg ALU DM Reg IM Reg ALU DM Reg IM Reg ALU DM Reg IM Reg ALU DM Reg Superscalar Processor

  11. How Does it Work? Require: • instruction fetch • fetching of multiple instructions at once • dynamic branch prediction & fetching beyond conditional branches • instruction issue • methods for determining which instructions can be issued next • the ability to issue multiple instructions in parallel • instruction commit • methods for committing several instructions in fetch order duplicate & more complex hardware Superscalar Processor

  12. Superscalar Execution Example $f2 $f4 $f6 y + + v Critical Path = 9 cycles $f10 $f4 $f8 w * $f10 z + z x + $f10 v $f12 w x (inorder) y z • Assumptions • Single FP adder takes 2 cycles • Single FP multipler takes 5 cycles • Can issue add & multiply together • Must issue in-order Data Flow (Single adder, data dependence) (In order) v: addt $f10, $f2, $f4 w: mult $f10, $f10, $f6 x: addt $f12, $f10, $f8 y: addt $f4, $f4, $f6 z: addt $f10, $f4, $f8 Superscalar Processor

  13. Adding Advanced Features $f2 $f4 $f6 v y + + v w Critical Path = 9 cycles $f10 $f4 x $f8 w * y z $f10 z + z x + $f10 v $f12 w x y z • Out Of Order Issue • Can start y as soon as adder available • Must hold back z until $f10 not busy & adder available • With Register Renaming v: addt $f10, $f2, $f4 w: mult $f10, $f10, $f6 x: addt $f12, $f10, $f8 y: addt $f4, $f4, $f6 z: addt $f10, $f4, $f8 v: addt $f10a, $f2, $f4 w: mult $f10a, $f10a, $f6 x: addt $f12, $f10a, $f8 y: addt $f4, $f4, $f6 z: addt $f10, $f4, $f8 Superscalar Processor

  14. Outlines • Pipeline & Hazards • Superscalar • Instruction issue policy • Register renaming • MIPS R10000 • Advanced Superscalar • Summary Superscalar Processor

  15. The Process of Instruction Issue IPreF IF EX IS1 IS2 • K-issue, dynamically scheduled superscalar processor IPreF: Prefetches instructions for superscalar IF: Conceptually, IF examines each instruction in the Issue Packet for hazards in program order IS1: Decides how many instruction from the packet can be issued simultaneously IS2: Examines the selected instructions in IS1 with already issued instructions for hazards Issue Packets: 0≤I ≤K Superscalar Processor

  16. Instruction Issue Policy • Instruction Issue Policy refers to the protocol used to issue instruction • The three types of ordering are • Order in which instructions are fetched • Order in which instructions are executed • Order in which instructions change registers and memory Superscalar Processor

  17. Instruction Issue Policy • The simplest policy is to execute and complete instruction in their sequential order • To improve parallelism, the processor has to look ahead and try to find independent instructions to execute in parallel • Thus, instructions will be executed in an order different from the strictly sequential one, with the restriction that the result must be correct • Execution policies: i. In-order issue with in-order completion ii. In-order issue with out-order completion iii. Out-of-order issue with out-of-order completion Superscalar Processor

  18. In-Order Issue with In-Order Completion • Instructions are issued in the exact order that would correspond to sequential execution [In-order Issue] and result are written in the same order [In-order Completion] Superscalar Processor

  19. In-Order Issue with Out-of-Order Completion • Result are written in different order • An output dependency exists if two instructions are writing into the same location • Output dependency R3:= R3 + R5; (I1) R4:= R3 + 1; (I2) R3:= R5 + 1; (I3) R7:= R3 + R4; (I4) • If I3 completes before I1, the result from I1 will be wrong. Superscalar Processor

  20. Out-of-Order Issue with Out-of-Order Completion • With in-order issue, no new instruction can be issued when processor has detected a conflict and is stalled, until after the conflict has been resolved • As such, the processor is not allowed to look ahead for further instructions, which could be executed in parallel • Out-of-order issue tries to resolve the above problem by taking a set of decoded instructions into an instruction window (buffer) • When a functional unit becomes available, an instruction from the window may be issued to the execute stage • Any instruction may be issued, provided that: i. it needs a particular functional unit that is available ii. no conflict or dependencies blocking this instruction Superscalar Processor

  21. Outlines • Pipeline & Hazards • Superscalar • Instruction issue policy • Register renaming • MIPS R10000 • Advanced Superscalar • Summary Superscalar Processor

  22. Antidependency • Read-Write dependency DIV.D F0, F1, F2 (I1) ADD.D F3, F0, F4(I2) SUB.D F4, F5, F6 (I3) MUL.D F3, F5, F4 (I4) • I3 can not complete before I2 starts as I2 needs a value in F4 and I3 changes F4 • An antidependency exists if an instruction uses a location as an operand while a following one is writing into that location; • if the first one is still using the location when the second one writes into it, an error occurs: Superscalar Processor

  23. RegisterRenaming • Output dependencies and antidependencies can be treated similarly to true data dependencies as normal conflicts, by delaying the execution of a certain instruction until it can be executed • Parallelism could be improved by eliminating output dependencies and antidependencies, which are not real data dependencies • These artificial dependencies can be eliminated by automatically allocating new registers to values, when such dependencies has been detected • This technique is called register renaming Superscalar Processor

  24. Register renaming • DIV.D F0, F1, F2 DIV.D F0, F1, F2 • ADD.D F3, F0, F4 ADD.D F3, F0, F4 • SUB.D F4, F5, F6 SUB.D T, F5, F6 • MUL.D F3, F5, F4 MUL.D S, F5, T F0 0 Name Op1 Op2 Det F1 20 Div 20 5 F0 F2 5 Add Div 2.6 F3 F3 2.6 Sub 22.6 5.6 F4 F4 4.2 Mul 5.6 Sub F3 F5 22.6 F6 5.6 F31 Register File Reservation Station Superscalar Processor

  25. Execution Example Value Rename 10.0 $f2 $f2 20.0 $f4 $f4 40.0 $f6 $f6 Op1 Op2 Dest Op1 Op2 Dest 80.0 $f8 $f8 -- -- -- -- -- -- 160.0 $f10 $f10 -- -- -- -- -- -- 320.0 $f12 $f12 ADD MULT Value Renames Valid -- -- F B1 -- -- F B2 Result Dest Result Dest -- -- F B3 -- -- -- -- -- -- F B4 • Assumptions • Two-way issue with renaming • Rename registers B1,B2, etc. • 1 cycle ADD.D latency, 2 cycles MUL.D v: ADD.D $f10, $f2, $f4 w: MUL.D $f10’ $f10, $f6 x: ADD.D $f12, $f10, $f8 y: ADD.D $f4, $f4, $f6 Superscalar Processor

  26. Execution Example Cycle 1 Value Rename 10.0 $f2 $f2 20.0 $f4 $f4 40.0 $f6 $f6 Op1 Op2 Dest Op1 Op2 Dest 80.0 $f8 $f8 -- -- -- -- -- -- 160.0 B2 $f10 10.0 20.0 B1 B1 40.0 B2 320.0 $f12 $f12 ADD MULT Value Renames Valid -- $f10 F B1 -- $f10 F B2 Result Dest Result Dest -- -- F B3 -- -- -- -- -- -- F B4 • Actions • Instructions v & w issued • v target set to B1 • w target set to B2 v: ADD.D $f10, $f2, $f4 w: MUL.D $f10’ $f10, $f6 x: ADD.D $f12, $f10, $f8 y: ADD.D $f4, $f4, $f6 v w Superscalar Processor

  27. Execution Example Cycle 2 Value Rename 10.0 $f2 $f2 20.0 B4 $f4 40.0 $f6 $f6 Op1 Op2 Dest Op1 Op2 Dest 80.0 $f8 $f8 20.0 40.0 B4 -- -- -- 160.0 B2 $f10 B2 80.0 B3 30.0 40.0 B2 320.0 B3 $f12 ADD MULT Value Renames Valid 30.0 $f10 T B1 -- $f10 F B2 Result Dest Result Dest -- $f12 F B3 30.0 B1 -- -- -- $f4 F B4 • Actions • Instructions x & y issued • x & y targets set to B3and B4 • Instruction v executed v: ADD.D $f10, $f2, $f4 w: MUL.D $f10’ $f10, $f6 x: ADD.D $f12, $f10, $f8 y: ADD.D $f4, $f4, $f6 y x w v Superscalar Processor

  28. Execution Example Cycle 3 Value Rename 10.0 $f2 $f2 20.0 B4 $f4 40.0 $f6 $f6 Op1 Op2 Dest Op1 Op2 Dest 80.0 $f8 $f8 -- -- -- -- -- -- 160.0 B2 $f10 B2 80.0 B3 -- -- -- 320.0 B3 $f12 ADD MULT Value Renames Valid 30.0 40.0 B2 w -- -- F B1 -- $f10 F B2 Result Dest Result Dest -- $f12 F B3 60.0 B4 -- -- 60.0 $f4 T B4 • Instruction v retired • But doesn’t change $f10 • Instruction w begins execution • Moves through 2 stage pipeline • Instruction y executed v: ADD.D $f10, $f2, $f4 w: MUL.D $f10’ $f10, $f6 x: ADD.D $f12, $f10, $f8 y: ADD.D $f4, $f4, $f6 x y Superscalar Processor

  29. Execution Example Cycle 4 Value Rename 10.0 $f2 $f2 20.0 B4 $f4 40.0 $f6 $f6 Op1 Op2 Dest Op1 Op2 Dest 80.0 $f8 $f8 -- -- -- -- -- -- 160.0 B2 $f10 120.0 80.0 B3 -- -- -- 320.0 B3 $f12 ADD MULT Value Renames Valid -- -- F B1 120.0 $f10 T B2 Result Dest Result Dest -- $f12 F B3 -- -- 120.0 B2 60.0 $f4 T B4 • Instruction w finishes execution • Instruction y cannot be retired yet v: ADD.D $f10, $f2, $f4 w: MUL.D $f10’ $f10, $f6 x: ADD.D $f12, $f10, $f8 y: ADD.D $f4, $f4, $f6 x w Superscalar Processor

  30. Execution Example Cycle 5 Value Rename 10.0 $f2 $f2 20.0 B4 $f4 40.0 $f6 $f6 Op1 Op2 Dest Op1 Op2 Dest 80.0 $f8 $f8 -- -- -- -- -- -- 120.0 $f10 $f10 -- -- -- -- -- -- 320.0 B3 $f12 ADD MULT Value Renames Valid -- -- F B1 -- -- F B2 Result Dest Result Dest 200.0 $f12 T B3 200.0 B3 -- -- 60.0 $f4 T B4 • Instruction w retired • update $f10 • Instruction y cannot be retired yet • Instruction x executed v: ADD.D $f10, $f2, $f4 w: MUL.D $f10’ $f10, $f6 x: ADD.D $f12, $f10, $f8 y: ADD.D $f4, $f4, $f6 x Superscalar Processor

  31. Execution Example Cycle 6 Value Rename 10.0 $f2 $f2 60.0 $f4 $f4 40.0 $f6 $f6 Op1 Op2 Dest Op1 Op2 Dest 80.0 $f8 $f8 -- -- -- -- -- -- 120.0 %f0 $f10 -- -- -- -- -- -- 200.0 $f12 $f12 ADD MULT Value Renames Valid -- -- F B1 -- -- F B2 Result Dest Result Dest 200.0 $f12 T B3 -- -- -- -- 60.0 $f4 T B4 • Instruction x & y retired • Update $f12 and $f4 v: ADD.D $f10, $f2, $f4 w: MUL.D $f10’ $f10, $f6 x: ADD.D $f12, $f10, $f8 y: ADD.D $f4, $f4, $f6 Superscalar Processor

  32. Outlines • Pipeline & Hazards • Superscalar • Instruction issue policy • Register renaming • MIPS R10000 • Advanced Superscalar • Summary Superscalar Processor

  33. Example: MIPS R10000 • Can decode 4 instructions per cycle • Has 5 execution pipelines • Uses dynamic scheduling and out-of-order execution • Does speculative branching • Functional Units • Integer ALU1 • Integer ALU2 • Load/Store Unit • Float Adder • Float Multiply Superscalar Processor

  34. Example: MIPS R10000 7 Pipeline Stages Stage 1 Fetch Stage 2 Decode Stage 3 Issue Stage 4 Execute Stage 5 Execute Stage 6 Execute Stage 7 Store Issue FAdd-1 FAdd-2 FAdd-3 Result RF Issue FMpy-1 FMpy-2 FMpy-3 Result RF 5 Execution Pipelines Issue ALU1 Result RF Issue ALU2 Result RF Issue Add-Calc Result Data Cache RF Queues Instructions cache Decode Branch unit Branch Address (one branch can be handled every cycle) 4 instructions Fetch and Decode Functional Unit (Execute instructions) Superscalar Processor

  35. Outlines • Pipeline & Hazards • Superscalar • Instruction issue policy • Register renaming • MIPS R10000 • Advanced Superscalar • Summary Superscalar Processor

  36. Advanced Superscalar • Future Architecture • Can issue 16 to 32 instructions • Consist of 24 to 48 functional units • Use advance branch prediction • Advantage • Enhancing performance • Disadvantage • Attempting to extract more instruction level parallelism has diminishing returns on performance as the issue width increases • Increasing Microprocessor complexity Superscalar Processor

  37. Outlines • Pipeline & Hazards • Superscalar • Instruction issue policy • Register renaming • MIPS R10000 • Advanced Superscalar • Summary Superscalar Processor

  38. Summary • Superscalar is ILP mechanism to enhance the performance by increasing throughput. • It is limited by • True data dependency • Procedural (Control) dependency • Resource conflicts • Output dependency • Antidependency Superscalar Processor

  39. Summary • Pros • The hardware solves everything: • Hardware detects potential parallelism between instructions; • Hardware tries to issue as many instructions as possible in parallel. • Hardware solves register renaming. • Cons • Very complex • Much hardware is needed for run-time detection. There is a limit in how far we can go with this technique. • Power consumption can be very large! • The window of executions limited Þ this limits the capacity to detect potentially parallel instructions Superscalar Processor

More Related