Introduction to Pipelining: Basic Concepts and Design Principles in Computer Architecture

Computer Architecture Lecture Notes Spring 2005 Dr. Michael P. Frank
(New) Competency Area 6: Introduction to Pipelining

Basic Pipelining Concepts
P&H 3rd ed., Chapter 6; H&P 3rd ed., §A.1

Pipelining - The Basic Concept
• In early CPUs, deep combinational logic networks were used in between state updates.
  • Signal delays may vary widely across different paths.
  • New input cannot be provided to the network until the slowest paths have finished.
  • Slow clock speed, slow overall processing rates.
• In pipelined design, deep logic networks are subdivided into relatively shallow slices (pipeline stages).
  • Delays through the network are made uniform.
  • A new input can be provided to each slice as soon as its quick, shallow network has finished.
  • Multiple inputs are processed simultaneously across stages.
  • Clock cycle is only as long as the slowest pipeline stage.
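As a rough illustration of why a shorter clock cycle pays off, the sketch below compares the total time to push a batch of inputs through one deep logic block and through the same logic cut into five stages. The delay numbers, the function names, and the `latch_overhead` parameter are assumptions made for the example, not figures from the lecture.

```python
def cycle_time(stage_delays, latch_overhead=0.0):
    """Clock period: set by the slowest stage, plus any register overhead."""
    return max(stage_delays) + latch_overhead


def total_time(n_items, stage_delays, latch_overhead=0.0):
    """Time for n_items to pass through the pipeline.

    The first item needs one cycle per stage; every later item
    completes one cycle after the item in front of it.
    """
    n_stages = len(stage_delays)
    cycles = n_stages + n_items - 1
    return cycles * cycle_time(stage_delays, latch_overhead)


# Hypothetical delays in nanoseconds: one 10 ns block vs. five 2 ns slices.
unpipelined = [10.0]
pipelined = [2.0, 2.0, 2.0, 2.0, 2.0]

print(total_time(100, unpipelined))  # 100 cycles of 10 ns
print(total_time(100, pipelined))    # 104 cycles of 2 ns
```

With unbalanced stages such as `[1.0, 5.0, 1.0, 2.0, 1.0]` the cycle time becomes 5 ns, which is why the slide stresses making the delays uniform.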

Generic Pipelining Illustration
• Let [gate symbol] represent any of a variety of logic gates
• Initial, non-pipelined design for some random block of complex logic:
[Figure: multi-layer network of logic gates between two latches]

Pipelining Illustration cont.
• Aggressively pipelined version of same logic:
  • Insert extra “pipeline registers” periodically
  • Here, after every 1-2 logic layers
• This design can process 5x as much data at once!
[Figure: the same logic network between two latches, with pipeline registers inserted]

Another View of Pipelining
• Space-time diagrams:
  • Here, each colored area shows which parts of the logic network are occupied with data computed from a given input item, at which times.
[Figure: two space-time diagrams, “Non-Pipelined” and “Pipelined (depth 6)”; axes: Time vs. Depth in logic network; colored areas labeled Data 1 and Data 2]

Simple Multicycle RISC Datapath
[Figure: datapath divided into stages IF, ID, EX, MEM, WB; visible labels: Next PC, Load fr. Mem. Data, Program Counter, Inst. Reg.]

Basic RISC Execution Pipeline
• Basic idea of instruction-execution pipelining:
  • Each instruction spends 1 clock cycle in each of the execution stages (in our example, there are 5).
  • Thus, during 1 clock cycle, the pipeline can be processing (different stages of) 5 different instructions simultaneously!
[Figure: pipeline diagram; axes: stage vs. time]
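The stage-versus-time picture can be reproduced in a few lines. This is a minimal sketch that assumes an ideal pipeline with no stalls and uses the five stage names from the datapath slide; the function name `pipeline_diagram` is made up for the example.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(n_instructions, stages=STAGES):
    """Return one row per instruction and one column per clock cycle.

    Instruction i enters the first stage in cycle i and advances one
    stage per cycle; empty strings mark cycles where it is not in flight.
    """
    n_cycles = n_instructions + len(stages) - 1
    rows = []
    for i in range(n_instructions):
        row = []
        for cycle in range(n_cycles):
            s = cycle - i  # stage index occupied by instruction i this cycle
            row.append(stages[s] if 0 <= s < len(stages) else "")
        rows.append(row)
    return rows


for i, row in enumerate(pipeline_diagram(5)):
    print(f"i{i}: " + " ".join(f"{cell:>3}" for cell in row))
```

Reading a single column of the output shows the "same time, different places" view: in the fifth cycle all five stages are busy, each with a different instruction.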

Different Visualizations
[Figure: alternative pipeline diagrams related by a skew; visible labels: Same Time, Different Places; Same Time, Different Data Item/Instruction; Same Place, Different Times; Same instruction, different steps; Skew]

More Graphical Detail

Adding Pipeline Registers

Description of Pipe Stages

Dependences (from H&P 3rd ed. §3.1)

Dependences
• A dependence is a way in which one instruction can depend on (be impacted by) another for scheduling purposes.
• Three major dependence types:
  • Data dependence
  • Name dependence
  • Control dependence
• I’ll sometimes use the word dependency for a particular instance of one instruction depending on another.
• Instructions linked by a dependence can’t be effectively (as opposed to just syntactically) fully parallelized, or reordered.

Data Dependence
• Recursive definition:
  • Instruction B is data dependent on instruction A iff:
    • B uses a data result produced by instruction A, or
    • There is another instruction C such that B is data dependent on C, and C is data dependent on A.
• When a data dependence is present, there is a potential RAW hazard.

Loop: LD   F0,0(R1)
      ADDD F4,F0,F2
      SD   0(R1),F4
      SUBI R1,R1,#8
      BNEZ R1,Loop

[Figure: arrows marking the direct data dependencies in a simple example code fragment]
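The recursive definition amounts to taking the transitive closure of the direct "uses a result of" relation. The sketch below applies it to one iteration of the loop above. It assumes a simplified encoding of each instruction as (text, destination register, source registers), tracks register dependences only (not memory or loop-carried ones), and uses made-up function names.

```python
# One loop iteration, encoded as (text, dest register or None, source registers).
LOOP = [
    ("LD F0,0(R1)",   "F0", ("R1",)),
    ("ADDD F4,F0,F2", "F4", ("F0", "F2")),
    ("SD 0(R1),F4",   None, ("R1", "F4")),
    ("SUBI R1,R1,#8", "R1", ("R1",)),
    ("BNEZ R1,Loop",  None, ("R1",)),
]


def direct_dependences(program):
    """Pairs (a, b): instruction b reads a register last written by a."""
    last_writer = {}
    deps = set()
    for idx, (_, dest, srcs) in enumerate(program):
        for reg in srcs:  # read sources before recording this write
            if reg in last_writer:
                deps.add((last_writer[reg], idx))
        if dest is not None:
            last_writer[dest] = idx
    return deps


def all_dependences(program):
    """Close the direct relation under the recursive rule (B on C, C on A)."""
    deps = set(direct_dependences(program))
    changed = True
    while changed:
        changed = False
        for a, c in list(deps):
            for c2, b in list(deps):
                if c == c2 and (a, b) not in deps:
                    deps.add((a, b))
                    changed = True
    return deps


for a, b in sorted(all_dependences(LOOP)):
    print(f"{LOOP[b][0]!r} depends on {LOOP[a][0]!r}")
```

The direct pairs are ADDD on LD, SD on ADDD, and BNEZ on SUBI; the closure adds SD on LD through ADDD.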

Name Dependence
• Occurs when two instructions access the same data storage location, but are not data dependent.
  • Also, at least one of the accesses must be a write.
• Two sub-types (for inst. B after inst. A):
  • Antidependence: A reads, then B writes.
    • Potential for a WAR hazard.
  • Output dependence: A writes, then B writes.
    • Potential for a WAW hazard.
• Note: Name dependencies can be avoided by changing instructions to use different locations
  • (Rather than reusing 1 location for 2 purposes.)
  • This fix is called renaming.
[Figure: timelines showing A followed by B for each sub-type]
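A small classifier makes the distinction concrete. This sketch assumes each instruction is summarized by the set of registers it reads and the set it writes, that A comes before B, and that no instruction in between rewrites the shared register; the example instructions and the register F6 used for renaming are hypothetical.

```python
def classify(a, b):
    """Dependence kinds of later instruction b on earlier instruction a.

    Each instruction is a pair (reads, writes) of register-name sets.
    """
    a_reads, a_writes = a
    b_reads, b_writes = b
    kinds = set()
    if a_writes & b_reads:
        kinds.add("data")    # potential RAW hazard
    if a_reads & b_writes:
        kinds.add("anti")    # potential WAR hazard
    if a_writes & b_writes:
        kinds.add("output")  # potential WAW hazard
    return kinds


addd = ({"F0", "F2"}, {"F4"})        # ADDD F4,F0,F2
ld_reuse = ({"R1"}, {"F0"})          # LD F0,0(R1): overwrites F0 after ADDD read it
ld_renamed = ({"R1"}, {"F6"})        # LD F6,0(R1): same load, renamed destination

print(classify(addd, ld_reuse))      # antidependence on F0
print(classify(addd, ld_renamed))    # empty set: renaming removed it
```

Renaming changes only the name of the location, so the value flow of the program is untouched while the ordering constraint disappears.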

Control Dependence
• Occurs when the execution of an instruction (as in, will it be executed, or not?) depends on the outcome of some earlier, conditional branch instruction.
• We generally can’t easily change which branches an instruction depends on w/o ruining the program’s functional behavior.
  • However, there are exceptions.

Hazards, Stalls, & Forwarding H&P 3rd ed. §A.2-3

Hazards
• Hazards are circumstances which may lead to stalls in the pipeline if not addressed.
  • Stalls are delays, and may be called “bubbles”
• There are three major types of hazards:
  • Structural hazards:
    • Not enough HW resources to keep all instrs. moving.
  • Data hazards
    • Data results of earlier instrs. not yet avail. when needed.
  • Control hazards
    • Control decisions resulting from earlier instrs. (branches) not yet made; don’t know which new instrs. to execute.

Structural Hazard Example Suppose you had a combined instruction+data memory w. only 1 read port

Hazards Produce “Bubbles”
[Figure: pipeline diagrams showing a bubble; visible labels: Bubble rises, Progress through pipe, Time, Unskew]

Textual View A pipeline stalled for a structural hazard – a load with only one memory port

Example Data Hazards

Forwarding for Data Hazards

Another Forwarding Example

Three Types of Data Hazards
• Let i be an earlier instruction, j a later one.
• RAW (read after write)
  • j is supposed to Read a value After i Writes it,
  • But instead j tries to read the value before i has written it
• WAW (write after write)
  • j should Write to a given place After i Writes there,
  • But they end up writing in the wrong order.
  • Only occurs if >1 pipeline stage can write.
• WAR (write after read)
  • j should Write a new value After i Reads the old,
  • But instead j writes the new value before i has read the old one.
  • Only occurs if writes can happen before reads in pipeline.

An Unavoidable Stall

Stalling in midst of instruction

Data Hazard Prevention
• A clever compiler can often reschedule instructions to avoid a stall.
• A simple example:
  • Original code:
      lw  r2, 0(r4)
      add r1, r2, r3    Note: Stall happens here!
      lw  r5, 4(r4)
  • Transformed code:
      lw  r2, 0(r4)
      lw  r5, 4(r4)
      add r1, r2, r3    No stall needed!
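The effect of the rescheduling can be checked by counting load-use stalls in both orderings. This is a minimal sketch that assumes a 5-stage pipeline with full forwarding, where the only remaining data stall is one cycle when an instruction uses the result of the load immediately before it; the tuple encoding and the function name are illustrative.

```python
def count_load_use_stalls(program):
    """Count 1-cycle stalls caused by using a load result in the next instruction.

    Each instruction is (opcode, destination register, source registers).
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_srcs = cur
        if prev_op == "lw" and prev_dest in cur_srcs:
            stalls += 1
    return stalls


original = [
    ("lw",  "r2", ("r4",)),
    ("add", "r1", ("r2", "r3")),  # needs r2 right after the load
    ("lw",  "r5", ("r4",)),
]
transformed = [
    ("lw",  "r2", ("r4",)),
    ("lw",  "r5", ("r4",)),       # independent load fills the delay
    ("add", "r1", ("r2", "r3")),
]

print(count_load_use_stalls(original))     # 1
print(count_load_use_stalls(transformed))  # 0
```

The swap is legal here because the second `lw` neither reads nor writes any register touched by the `add`.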

Simple RISC Pipeline Stall Statistics
Note that ~1 in 5 loads causes a stall in many programs!
[Figure: chart; axes: Percentage of loads that cause a stall vs. Benchmark]

Data Hazard Detection

Hazard Detection Logic
• Example: Detecting whether an instruction that has just been fetched needs to be stalled 1 cycle because of an immediately preceding load.
[Figure: pipeline stages IF, ID, EX, ME, WB separated by pipeline registers IF/ID, ID/EX, EX/ME, ME/WB]
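In hardware this check is a comparison between fields of two pipeline registers. The sketch below models it in software; the class and field names (`IfId.src_regs`, `IdEx.is_load`, `IdEx.dest_reg`) are assumptions chosen for readability and are not taken from the lecture's figure.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class IfId:
    """Contents of the IF/ID register: the instruction just fetched."""
    src_regs: Tuple[str, ...]


@dataclass
class IdEx:
    """Contents of the ID/EX register: the immediately preceding instruction."""
    is_load: bool
    dest_reg: Optional[str]


def must_stall(if_id: IfId, id_ex: IdEx) -> bool:
    """True if the fetched instruction reads the register a preceding load writes."""
    return (
        id_ex.is_load
        and id_ex.dest_reg is not None
        and id_ex.dest_reg in if_id.src_regs
    )


# add r1, r2, r3 directly behind lw r2, 0(r4): stall one cycle.
print(must_stall(IfId(("r2", "r3")), IdEx(is_load=True, dest_reg="r2")))
```

When the preceding instruction is an ALU operation rather than a load, the same register match does not force a stall in this model, since forwarding can supply the value in time.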

Forwarding Situations in DLX

Implementing Forwarding in HW

Control Hazards, Branch Prediction, Delayed Branches H&P 3rd ed., §§A.2-3 & §4.2

Control Hazards
• Suppose the new PC value was not computed until the MEM stage (like orig. RISC design).
• Then we must stall 3 clocks after every branch!
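The cost of that 3-clock stall depends on how often branches occur. A back-of-the-envelope sketch, assuming an ideal CPI of 1 with no other stalls and a hypothetical branch frequency of 1 in 8 instructions (both values are assumptions for the example):

```python
def cpi_with_branch_stalls(branch_fraction, penalty_cycles, base_cpi=1.0):
    """Effective cycles per instruction when every branch costs a fixed stall."""
    return base_cpi + branch_fraction * penalty_cycles


# Hypothetical workload: 12.5% branches, 3 stall cycles per branch.
cpi = cpi_with_branch_stalls(branch_fraction=0.125, penalty_cycles=3)
print(cpi)        # cycles per instruction
print(cpi / 1.0)  # slowdown relative to the ideal pipeline
```

Under these assumptions the pipeline runs at 1.375 cycles per instruction, and cutting the penalty to 1 cycle would bring it down to 1.125.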

Early Branch Resolution

New Pipeline Logic

Control Instruction Statistics
• ~10% of dynamic insts. are fwd. cond. branches
• only ~3% are backwards cond. branches
• similar percentage are unconditional branches

Stats on Taken Branches
~67% of cond. branches are taken

Predict-Not-Taken

Delayed Branches
Machine code sequence:
  Branch instruction
  Delay slot instruction(s)
  Post-branch instructions
Branch is taken (if taken) at this point, i.e., after the delay slot instruction(s)

Filling the Branch-Delay Slot

Static Branch Prediction
• Earlier we discussed predict-taken, predict-not-taken static prediction strategies
  • Applied uniformly across all branches in program
• Static analysis in compiler may be able to do better, if it can non-uniformly predict whether each specific branch is likely to be taken or not
  • One way: Backwards taken, forwards not taken.
• If we can do better, it can help with static code scheduling to reduce data hazard stalls…
• Also may assist later dynamic prediction

Prediction Helps Static Scheduling

        LD     R1,0(R2)
        DSUBU  R1,R1,R3
        BEQZ   R1,else
        OR     R4,R5,R6
        DADDU  R10,R4,R3
        J      after
else:   DADDU  R7,R8,R9
        …
after:

[Figure annotations: Some data dependences; Code movements to consider; Potential load delay to fill; Which way will this branch go?; If case; Else case; If-then-else control flow]

Some Static Prediction Schemes
• Always predict taken
  • 34% mispredict rate on SPEC (range 9%-54%)
• Backwards predict taken, forwards not taken
  • In SPEC, more than ½ of forwards are taken!
  • This does worse than “always predict taken” strategy
  • Usu. not better than 30-40% misprediction rate
• Better than either: Use profile information!
  • Collect statistics on earlier program runs.
  • Works well because individual branches tend to be strongly biased (taken or not) given average data
  • Bias tends to remain stable across multiple runs
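The three schemes can be compared on a branch trace. The sketch below uses a small made-up trace of (branch address, target address, taken) records; the addresses, counts, and function names are all hypothetical, and the profile-based predictor is trained and evaluated on the same trace, which is more optimistic than profiling an earlier run.

```python
from collections import Counter


def mispredict_rate(trace, predict):
    """Fraction of branches in the trace for which predict(pc, target) is wrong."""
    wrong = sum(1 for pc, target, taken in trace if predict(pc, target) != taken)
    return wrong / len(trace)


def always_taken(pc, target):
    return True


def backward_taken_forward_not(pc, target):
    return target < pc  # backward branch: target lies before the branch


def make_profile_predictor(training_trace):
    """Predict each branch's majority direction seen in the training trace."""
    taken_count, total = Counter(), Counter()
    for pc, target, taken in training_trace:
        total[pc] += 1
        taken_count[pc] += taken
    def predict(pc, target):
        if total[pc] == 0:
            return True  # assumption: fall back to "taken" for unseen branches
        return 2 * taken_count[pc] >= total[pc]
    return predict


def repeat(pc, target, n_taken, n_not_taken):
    return [(pc, target, True)] * n_taken + [(pc, target, False)] * n_not_taken


# Hypothetical trace: one loop-closing branch and two forward branches.
trace = repeat(40, 16, 9, 1) + repeat(24, 32, 8, 2) + repeat(60, 80, 2, 8)

for name, predictor in [
    ("always taken", always_taken),
    ("backward taken, forward not", backward_taken_forward_not),
    ("profile-based", make_profile_predictor(trace)),
]:
    print(f"{name}: {mispredict_rate(trace, predictor):.3f}")
```

On this toy trace the two uniform schemes each miss 11 of 30 branches while the profile-based predictor misses 5, because it can treat the mostly-taken forward branch differently from the rarely-taken one.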

Profile-Based Predictor Statistics
[Figure: chart; visible label: Floating-Point]

Predict-Taken vs. Profile-Based
[Figure: chart of instructions executed in between mispredictions (log scale!); visible label: Floating-point]

Computer Architecture Principles Dr. Mike Frank
