This lecture introduces the fundamental concepts of pipelining in computer architecture, highlighting the transformation from deep combinational logic to pipelined designs that use shallow stages. It discusses how pipelining enables simultaneous instruction processing, uniform delays across stages, and improved throughput. The lecture also covers different types of data and control dependencies, potential hazards, and solutions for effective pipelining. This foundational knowledge is crucial for understanding contemporary CPU design and performance optimization strategies.
Computer Architecture Lecture Notes, Spring 2005, Dr. Michael P. Frank. (New) Competency Area 6: Introduction to Pipelining
Basic Pipelining Concepts (P&H 3rd ed., Chapter 6; H&P 3rd ed., §A.1)
Pipelining - The Basic Concept • In early CPUs, deep combinational logic networks were used in between state updates. • Signal delays may vary widely across different paths. • New input cannot be provided to the network until the slowest paths have finished. • Slow clock speed, slow overall processing rates. • In pipelined design, deep logic networks are subdivided into relatively shallow slices (pipeline stages). • Delays through the network are made uniform. • A new input can be provided to each slice as soon as its quick, shallow network has finished. • Multiple inputs are processed simultaneously across stages. • Clock cycle is only as long as the slowest pipeline stage.
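The last bullet (the clock cycle is set by the slowest stage) can be made concrete with a small calculation. This is only a sketch: the stage delays and the `latch_overhead` parameter are invented for illustration and do not come from the lecture.

```python
def clock_period(stage_delays, latch_overhead=0.0):
    """Clock period of a pipeline: slowest stage plus register overhead."""
    return max(stage_delays) + latch_overhead


def throughput_speedup(stage_delays, latch_overhead=0.0):
    """Steady-state speedup over one unpipelined block built from the same logic.

    The unpipelined block needs sum(stage_delays) plus one latch per input;
    the pipelined version accepts a new input every clock period.
    """
    unpipelined = sum(stage_delays) + latch_overhead
    return unpipelined / clock_period(stage_delays, latch_overhead)


# Hypothetical delays in nanoseconds for a 5-stage split of one logic block.
balanced = [2.0, 2.0, 2.0, 2.0, 2.0]
unbalanced = [1.0, 1.0, 4.0, 2.0, 2.0]

print(throughput_speedup(balanced))    # 5.0 with perfectly balanced stages
print(throughput_speedup(unbalanced))  # 2.5: the 4 ns stage sets the clock
```

With a nonzero `latch_overhead` the speedup drops below the stage count even for balanced stages, which is one reason delays are made as uniform as possible.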
Generic Pipelining Illustration • Let each gate symbol in the figure represent any of a variety of logic gates. • Initial, non-pipelined design for some random block of complex logic: [Figure: a block of logic between an input latch and an output latch]
Pipelining Illustration cont. • Aggressively pipelined version of same logic: • Insert extra “pipeline registers” periodically • Here, after every 1-2 logic layers • This design can process 5x as much data at once! [Figure: the same logic between the two latches, with pipeline registers inserted]
Another View of Pipelining • Space-time diagrams: • Here, each colored area shows which parts of the logic network are occupied with data computed from a given input item, at which times. [Figure: two space-time diagrams, Non-Pipelined and Pipelined (depth 6); axes: Time vs. Depth in logic network; regions labeled Data 1 and Data 2]
Simple Multicycle RISC Datapath [Figure: datapath divided into IF, ID, EX, MEM, and WB; labels include Program Counter, Next PC, Inst. Reg., and Load fr. Mem. Data]
Basic RISC Execution Pipeline • Basic idea of instruction-execution pipelining: • Each instruction spends 1 clock cycle in each of the execution stages (in our example, there are 5). • During 1 clock cycle, the pipeline can be processing (different stages of) 5 different instructions simultaneously! [Figure: pipeline diagram; axes: stage vs. time]
Different Visualizations [Figure: two views of the same pipeline, related by a skew. Labels: Same Time, Different Places; Same Instruction, Different Steps; Same Time, Different Data Item/Instruction; Same Place, Different Times]
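The overlap described above can be tabulated directly. The following is a minimal sketch of an ideal pipeline (one instruction issued per cycle, no stalls); the function names are made up for this example.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(num_instructions, stages=STAGES):
    """Return rows of stage names; row i is instruction i, column c is cycle c.

    Assumes an ideal pipeline: one instruction issued per cycle, no stalls.
    """
    total_cycles = num_instructions + len(stages) - 1
    rows = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = name  # instruction i occupies stage s in cycle i + s
        rows.append(row)
    return rows


def format_diagram(rows):
    """Render the diagram as fixed-width text, one instruction per line."""
    header = "      " + "".join(f"c{c + 1:<4}" for c in range(len(rows[0])))
    lines = [header]
    for i, row in enumerate(rows):
        lines.append(f"i{i + 1:<5}" + "".join(f"{cell:<5}" for cell in row))
    return "\n".join(lines)


print(format_diagram(pipeline_diagram(5)))
```

Reading a row gives one instruction's progress over time; reading a column gives the stages that are busy in the same cycle. In the fifth column all five stages are occupied by five different instructions.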
Dependences (from H&P 3rd ed. §3.1)
Dependences • A dependence is a way in which one instruction can depend on (be impacted by) another for scheduling purposes. • Three major dependence types: • Data dependence • Name dependence • Control dependence • I’ll sometimes use the word dependency for a particular instance of one instruction depending on another. • Instructions linked by a dependence can’t be effectively (as opposed to just syntactically) fully parallelized or reordered.
Data Dependence • Recursive definition: • Instruction B is data dependent on instruction A iff: • B uses a data result produced by instruction A, or • There is another instruction C such that B is data dependent on C, and C is data dependent on A. • When a data dependence is present, there is a potential RAW hazard.
Direct data dependencies in a simple example code fragment (ADDD uses F0 from LD; SD uses F4 from ADDD; BNEZ uses R1 from SUBI):
Loop: LD    F0,0(R1)
      ADDD  F4,F0,F2
      SD    0(R1),F4
      SUBI  R1,R1,#8
      BNEZ  R1,Loop
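The recursive definition can be checked mechanically on the loop from the slide. The sketch below is an illustration under stated assumptions: each instruction is reduced to a hand-written pair of destination and source registers, memory is ignored, and only one iteration is considered.

```python
def direct_dependences(program):
    """Return (producer, consumer) index pairs for register data dependences.

    program is a list of (dest_regs, src_regs) tuples in program order.
    Only the most recent earlier writer of each source register counts.
    """
    last_writer = {}
    deps = set()
    for j, (dests, srcs) in enumerate(program):
        for reg in srcs:
            if reg in last_writer:
                deps.add((last_writer[reg], j))
        for reg in dests:
            last_writer[reg] = j
    return deps


def all_dependences(program):
    """Close the direct dependences under the recursive (transitive) rule."""
    deps = set(direct_dependences(program))
    changed = True
    while changed:
        changed = False
        for (a, c) in list(deps):
            for (c2, b) in list(deps):
                if c == c2 and (a, b) not in deps:
                    deps.add((a, b))
                    changed = True
    return deps


# One iteration of the loop from the slide; memory is ignored in this sketch.
loop = [
    (["F0"], ["R1"]),        # 0: LD   F0,0(R1)
    (["F4"], ["F0", "F2"]),  # 1: ADDD F4,F0,F2
    ([], ["R1", "F4"]),      # 2: SD   0(R1),F4
    (["R1"], ["R1"]),        # 3: SUBI R1,R1,#8
    ([], ["R1"]),            # 4: BNEZ R1,Loop
]

print(sorted(direct_dependences(loop)))  # [(0, 1), (1, 2), (3, 4)]
print(sorted(all_dependences(loop)))     # [(0, 1), (0, 2), (1, 2), (3, 4)]
```

The extra pair (0, 2) is the recursive case: SD depends on LD through ADDD.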
Name Dependence • When two instructions access the same data storage location, but are not data dependent. • Also, at least one of the accesses must be a write. • Two sub-types (for inst. B after inst. A): • Antidependence: A reads, then B writes. • Potential for a WAR hazard. • Output dependence: A writes, then B writes. • Potential for a WAW hazard. • Note: Name dependencies can be avoided by changing instructions to use different locations • (Rather than reusing 1 location for 2 purposes.) • This fix is called renaming. [Figure: timelines showing A followed by B for each sub-type]
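A small sketch of the two sub-types and of renaming, using register names only. The instruction pair and the spare register F6 are hypothetical, and the check looks at names alone (it does not test whether the pair is also data dependent).

```python
def name_dependences(first, second):
    """Classify name dependences of `second` on the earlier `first`.

    Each instruction is a (dest_regs, src_regs) pair of register names.
    """
    a_dests, a_srcs = first
    b_dests, _ = second
    found = []
    for reg in b_dests:
        if reg in a_srcs:
            found.append(("anti", reg))    # A reads, then B writes: WAR risk
        if reg in a_dests:
            found.append(("output", reg))  # A writes, then B writes: WAW risk
    return found


def rename_dest(instr, old, new):
    """Return a copy of instr whose destination `old` is replaced by `new`.

    Later readers of the renamed value would need the same substitution;
    that step is left out of this sketch.
    """
    dests, srcs = instr
    return ([new if r == old else r for r in dests], list(srcs))


# Hypothetical pair: A = ADDD F4,F0,F2 reads F0, then B = LD F0,0(R1) writes it.
a = (["F4"], ["F0", "F2"])
b = (["F0"], ["R1"])

print(name_dependences(a, b))                           # [('anti', 'F0')]
print(name_dependences(a, rename_dest(b, "F0", "F6")))  # []
```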
Control Dependence • Occurs when the execution of an instruction (as in, will it be executed, or not?) depends on the outcome of some earlier, conditional branch instruction. • We generally can’t easily change which branches an instruction depends on w/o ruining the program’s functional behavior. • However, there are exceptions.
Hazards, Stalls, & Forwarding (H&P 3rd ed., §A.2-3)
Hazards • Hazards are circumstances which may lead to stalls in the pipeline if not addressed. • Stalls are delays, and may be called “bubbles”. • There are three major types of hazards: • Structural hazards: • Not enough HW resources to keep all instrs. moving. • Data hazards: • Data results of earlier instrs. not yet avail. when needed. • Control hazards: • Control decisions resulting from earlier instrs. (branches) not yet made; don’t know which new instrs. to execute.
Structural Hazard Example Suppose you had a combined instruction+data memory w. only 1 read port
Hazards Produce “Bubbles” [Figure: a bubble moving through the pipeline; labels: Bubble rises, Progress through pipe, Time, Unskew]
Textual View A pipeline stalled for a structural hazard – a load with only one memory port
Three Types of Data Hazards • Let i be an earlier instruction, j a later one. • RAW (read after write) • j is supposed to Read a value After i Writes it, • But instead j tries to read the value before i has written it. • WAW (write after write) • j should Write to a given place After i Writes there, • But they end up writing in the wrong order. • Only occurs if >1 pipeline stage can write. • WAR (write after read) • j should Write a new value After i Reads the old, • But instead j writes the new value before i has read the old one. • Only occurs if writes can happen before reads in pipeline.
Data Hazard Prevention • A clever compiler can often reschedule instructions to avoid a stall. • A simple example:
• Original code:
    lw  r2, 0(r4)
    add r1, r2, r3    # Note: Stall happens here!
    lw  r5, 4(r4)
• Transformed code:
    lw  r2, 0(r4)
    lw  r5, 4(r4)
    add r1, r2, r3    # No stall needed!
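The effect of the rescheduling can be counted with a short script. This is a sketch under an assumed model that the slide does not spell out: a classic 5-stage pipeline with full forwarding, where the only data stall is a one-cycle stall when an instruction reads a register loaded by the instruction directly before it. The tuple encoding is invented for the example.

```python
def count_load_use_stalls(program):
    """Count one-cycle load-use stalls in a list of (op, dest, srcs) tuples.

    Assumed model: 5-stage pipeline with full forwarding, so the only data
    stall is an instruction that reads the register loaded by its immediate
    predecessor.
    """
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, curr_srcs = curr
        if prev_op == "lw" and prev_dest in curr_srcs:
            stalls += 1
    return stalls


original = [
    ("lw", "r2", ["r4"]),         # lw  r2, 0(r4)
    ("add", "r1", ["r2", "r3"]),  # add r1, r2, r3
    ("lw", "r5", ["r4"]),         # lw  r5, 4(r4)
]
transformed = [original[0], original[2], original[1]]

print(count_load_use_stalls(original))     # 1
print(count_load_use_stalls(transformed))  # 0
```

Moving the independent second load into the slot after the first load gives the loaded value an extra cycle to arrive before `add` needs it.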
Simple RISC Pipeline Stall Statistics • Note that ~1 in 5 loads causes a stall in many programs! [Figure: chart of the percentage of loads that cause a stall, by benchmark]
Hazard Detection Logic • Example: Detecting whether an instruction that has just been fetched needs to be stalled 1 cycle because of an immediately preceding load. [Figure: stages IF, ID, EX, ME, WB with pipeline registers IF/ID, ID/EX, EX/ME, ME/WB]
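In software form, the check compares fields of two pipeline registers. The sketch below is hypothetical: the dictionary keys (`mem_read`, `dest`, `rs`, `rt`) are names chosen for this example, loosely following textbook convention, and are not taken from the slide's figure.

```python
def must_stall(if_id, id_ex):
    """Load-use interlock check between two pipeline registers.

    if_id: source register numbers of the instruction that was just fetched.
    id_ex: control bit and destination register of the instruction ahead of it.
    Returns True when the younger instruction must wait one cycle.
    """
    return bool(
        id_ex["mem_read"]                                 # older instr. is a load
        and id_ex["dest"] in (if_id["rs"], if_id["rt"])   # and feeds a source
    )


# Hypothetical register contents for: lw r2,0(r4) followed by add r1,r2,r3
load_r2 = {"mem_read": True, "dest": 2}
add_r2_r3 = {"rs": 2, "rt": 3}
print(must_stall(add_r2_r3, load_r2))  # True

# The same add behind lw r5,4(r4) does not need to wait on that load.
load_r5 = {"mem_read": True, "dest": 5}
print(must_stall(add_r2_r3, load_r5))  # False
```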
Control Hazards, Branch Prediction, Delayed Branches (H&P 3rd ed., §§A.2-3 & §4.2)
Control Hazards • Suppose the new PC value was not computed until the MEM stage (like orig. RISC design). • Then we must stall 3 clocks after every branch!
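The cost of a fixed stall after every branch follows from simple arithmetic. A minimal sketch, assuming an ideal base CPI of 1 and treating the branch fraction as an input; the 16% used in the example is a made-up value for illustration.

```python
def cpi_with_branch_stalls(branch_fraction, stall_cycles=3, base_cpi=1.0):
    """Average cycles per instruction when every branch stalls the pipeline."""
    return base_cpi + branch_fraction * stall_cycles


def slowdown(branch_fraction, stall_cycles=3, base_cpi=1.0):
    """Ratio of stalled CPI to the ideal base CPI."""
    return cpi_with_branch_stalls(branch_fraction, stall_cycles, base_cpi) / base_cpi


# Hypothetical workload in which 16% of dynamic instructions are branches.
print(round(cpi_with_branch_stalls(0.16), 2))  # 1.48
```

Under these assumptions, a 3-cycle stall on 16% of instructions makes the machine almost 50% slower than the ideal pipeline, which is why the later slides look for ways to resolve or predict branches earlier.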
Control Instruction Statistics • ~10% of dynamic insts. are fwd. cond. branches • only ~3% are backwards cond. branches • similar percentage are unconditional branches
Stats on Taken Branches • ~67% of cond. branches are taken
Delayed Branches • Machine code sequence:
    Branch instruction
    Delay slot instruction(s)
    <- Branch is taken (if taken) at this point
    Post-branch instructions
Static Branch Prediction • Earlier we discussed predict-taken, predict-not-taken static prediction strategies • Applied uniformly across all branches in program • Static analysis in compiler may be able to do better, if it can non-uniformly predict whether each specific branch is likely to be taken or not • One way: Backwards taken, forwards not taken. • If we can do better, it can help with static code scheduling to reduce data hazard stalls… • Also may assist later dynamic prediction
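The "backwards taken, forwards not taken" rule needs only the branch address and its target. A minimal sketch; the addresses and branch names below are invented.

```python
def predict_btfnt(branch_pc, target_pc):
    """Backwards taken, forwards not taken: True means 'predict taken'."""
    return target_pc <= branch_pc


# Hypothetical branches: (address of the branch, address of its target).
branches = {
    "loop_back": (0x120, 0x100),  # target is earlier in the code: backward
    "skip_else": (0x140, 0x160),  # target is later in the code: forward
}

for name, (pc, target) in branches.items():
    print(name, "taken" if predict_btfnt(pc, target) else "not taken")
```

Because the decision depends only on addresses, a compiler or the fetch hardware can apply it per branch without any run-time history.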
Prediction Helps Static Scheduling • If-then-else control flow, with some data dependences and code movements to consider:
        LD    R1,0(R2)
        DSUBU R1,R1,R3      # Potential load delay to fill
        BEQZ  R1,else       # Which way will this branch go?
        OR    R4,R5,R6      # If case
        DADDU R10,R4,R3
        J     after
else:   DADDU R7,R8,R9      # Else case
        …
after:
Some Static Prediction Schemes • Always predict taken • 34% mispredict rate on SPEC (range 9%-54%) • Backwards predict taken, forwards not taken • In SPEC, more than ½ of forwards are taken! • This does worse than “always predict taken” strategy • Usu. not better than 30-40% misprediction rate • Better than either: Use profile information! • Collect statistics on earlier program runs. • Works well because individual branches tend to be strongly biased (taken or not) given average data • Bias tends to remain stable across multiple runs
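The profile-based idea can be sketched in a few lines: record each branch's outcomes on a training run, predict its majority direction, and measure the misprediction rate on a later run. The traces below are fabricated purely to exercise the code and say nothing about SPEC.

```python
from collections import defaultdict


def build_profile(trace):
    """Count outcomes per branch address from a training run.

    trace is an iterable of (branch_pc, taken) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # pc -> [not_taken, taken]
    for pc, taken in trace:
        counts[pc][int(taken)] += 1
    return counts


def profile_predictions(counts):
    """Predict each branch's majority direction; ties go to 'taken'."""
    return {pc: taken >= not_taken for pc, (not_taken, taken) in counts.items()}


def mispredict_rate(trace, predict):
    """Fraction of branches in trace whose outcome differs from predict(pc)."""
    trace = list(trace)
    wrong = sum(1 for pc, taken in trace if predict(pc) != taken)
    return wrong / len(trace)


# Hypothetical traces: branch 0x40 is mostly taken, 0x80 is mostly not taken.
training = ([(0x40, True)] * 9 + [(0x40, False)]
            + [(0x80, False)] * 8 + [(0x80, True)] * 2)
later_run = ([(0x40, True)] * 8 + [(0x40, False)] * 2
             + [(0x80, False)] * 9 + [(0x80, True)])

predictions = profile_predictions(build_profile(training))
print(mispredict_rate(later_run, lambda pc: True))             # always taken: 0.55
print(mispredict_rate(later_run, lambda pc: predictions[pc]))  # profile-based: 0.15
```

The sketch relies on the property stated above: each branch's bias in the later run matches its bias in the training run. A branch that never appeared in training has no entry in `predictions` and would need a fallback rule.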
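Profile-Based Predictor Statistics [Figure: chart of profile-based predictor statistics; panel label: Floating-Point]
Predict-Taken vs. Profile-Based • Instructions executed in between mispredictions [Figure: chart on a log scale; panel label: Floating-point]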