Designing a Pipelined CPU

Read 4.7 for next class Designing a Pipelined CPU Peer Instruction Lecture Materials for Computer ArchitecturebyDr. Leo Porteris licensed under aCreative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Review -- Single Cycle CPU

Review -- Multiple Cycle CPU

Ifetch Reg/Dec Exec Mem Wr Ifetch Reg/Dec Exec Mem Wr Ifetch Reg/Dec Exec Wr Review -- Instruction Latencies • Single-Cycle CPU Load • Multiple Cycle CPU Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Load Add

Load Ifetch Reg/Dec Exec Mem Wr Load Ifetch Reg/Dec Exec Mem Wr Instruction Latencies and Throughput • Single-Cycle CPU • Multiple Cycle CPU Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 • Pipelined CPU Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8

Instruction Latencies and Throughput Which of the statements below is true about a pipelined processor?

Load Ifetch Reg/Dec Exec Mem Wr Load Ifetch Reg/Dec Exec Mem Wr Load Ifetch Reg/Dec Exec Mem Wr Instruction Latencies and Throughput • Single-Cycle CPU • Multiple Cycle CPU Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 • Pipelined CPU Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8

Load Ifetch Reg/Dec Exec Mem Wr Load Ifetch Reg/Dec Exec Mem Wr Load Ifetch Reg/Dec Exec Mem Wr Load Ifetch Reg/Dec Exec Mem Wr Instruction Latencies and Throughput • Single-Cycle CPU • Multiple Cycle CPU Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 • Pipelined CPU Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8

Load Ifetch Reg/Dec Exec Mem Wr Load Ifetch Reg/Dec Exec Mem Wr Load Ifetch Reg/Dec Exec Mem Wr Load Ifetch Reg/Dec Exec Mem Wr Load Ifetch Reg/Dec Exec Mem Wr Load Ifetch Reg/Dec Exec Mem Wr Instruction Latencies and Throughput • Single-Cycle CPU • Multiple Cycle CPU Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 • Pipelined CPU Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8

Pipelining Advantages PI throughput Vs. latency • Higher maximum throughput • Higher utilization of CPU resources • But, more complicated datapath, more complex control(?)

Pipelining Throughput PI Matching Peek Throughput 1. Longest instruction 2. Average instruction 3. Cycle time

A Pipelined Datapath IF: Instruction fetch ID: Instruction decode and register fetch EX: Execution and effective address calculation MEM: Memory access WB: Write back

Idea for a Pipelined Datapath

IM Reg DM Reg ALU IM Reg DM Reg ALU IM Reg DM Reg ALU IM Reg DM Reg ALU IM Reg DM Reg ALU Execution in a Pipelined Datapath CC1 CC2 CC3 CC4 CC5 CC6 CC7 CC8 CC9 IF ID EX MEM WB lw IF ID EX MEM WB lw lw lw lw

IM Reg DM Reg ALU IM Reg DM Reg ALU IM Reg DM Reg ALU IM Reg DM Reg ALU IM Reg DM Reg ALU Execution in a Pipelined Datapath CC1 CC2 CC3 CC4 CC5 CC6 CC7 CC8 CC9 Make sure to point Out Steady State CPI = 1 IF ID EX MEM WB lw IF ID EX MEM WB lw lw lw lw steady state

Pipeline Stages Should we force every instruction to go through all 5 stages? Can we break it up like we did for multi-cycle, with R-type taking 4 cycles instead of 5?

IM Reg DM Reg ALU IM Reg Reg ALU Mixed Instructions in the Pipeline CC1 CC2 CC3 CC4 CC5 CC6 lw add

IM Reg DM Reg ALU Pipeline Principles • All instructions that share a pipeline must have the same stages in the same order. • therefore, add does nothing during Mem stage • sw does nothing during WB stage • All intermediate values must be latched each cycle. • There is no functional block reuse IF ID EX MEM WB

Pipelined Datapath Instruction Fetch Instruction Decode/ Register Fetch Execute/ Address Calculation Memory Access Write Back registers!

The Pipeline in Execution add $10, $1, $2 Instruction Decode/ Register Fetch Execute/ Address Calculation Memory Access Write Back Draw active datapath through example

The Pipeline in Execution lw $12, 1000($4) add $10, $1, $2 Execute/ Address Calculation Memory Access Write Back

The Pipeline in Execution sub $15, $4, $1 lw $12, 1000($4) add $10, $1, $2 Memory Access Write Back

The Pipeline in Execution Instruction Fetch sub $15, $4, $1 lw $12, 1000($4) add $10, $1, $2 Write Back

The Pipeline in Execution Instruction Fetch Instruction Decode/ Register Fetch sub $15, $4, $1 lw $12, 1000($4) add $10, $1, $2

The Pipeline in Execution Instruction Fetch Instruction Decode/ Register Fetch sub $15, $4, $1 lw $12, 1000($4) add $10, $1, $2 Write Register is wrong Did someone spot the error??

The Pipeline in Execution Instruction Fetch Instruction Decode/ Register Fetch Execute/ Address Calculation sub $15, $4, $1 lw $12, 1000($4)

The Pipeline, with controls But….

Pipeline Control Control Logic X. Combinational Logic Y. FSM or Microprogram

control instruction IF/ID ID/EX EX/MEM MEM/WB Pipelined Control

Pipelined Control Signals

The Pipeline with Control Logic

Is it really this simple?

IM Reg DM Reg ALU IM Reg DM Reg ALU IM Reg DM Reg ALU IM Reg DM Reg ALU ALU CC1 CC2 CC3 CC4 CC5 CC6 CC7 CC8 sub $2, $1, $3 and $12, $2, $5 or $13, $6, $2 add $14, $2, $2 IM Reg DM sw $15, 100($2) • What just happened here which is problematic (BEST ANSWER)? • The register file is trying to read and write the same register • The ALU and data memory are both active in the same cycle • A value is used before it is produced • Both A and B • Both A and C Neg edge!

IM Reg DM Reg ALU IM Reg DM Reg ALU IM Reg DM Reg ALU IM Reg DM Reg ALU ALU Data Hazards R2 Available • When a result is needed in the pipeline before it is available, a “data hazard” occurs. CC1 CC2 CC3 CC4 CC5 CC6 CC7 CC8 sub $2, $1, $3 and $12, $2, $5 R2 Needed or $13, $6, $2 add $14, $2, $2 Use red and black lines! IM Reg DM sw $15, 100($2)

Hazards continued • What happens when... add $3, $10, $11 lw $8, 1000($3) sub $11, $8, $7 Draw dependencies

The Pipeline in Execution add $3, $10, $11 Execute/ Address Calculation Memory Access Write Back lw $8, 1000($3)

The Pipeline in Execution sub $11, $8, $7 lw $8, 1000($3) add $3, $10, $11 Memory Access Write Back

The Pipeline in Execution add $10, $1, $2 sub $11, $8, $7 lw $8, 1000($3) • add $3, $10, $11 Write Back

Pipelining Key Points • ET = IC * CPI * CT • We achieve high throughput without reducing instruction latency. • Pipelining exploits a special kind of parallelism (parallelism between functionality required in different cycles). • Pipelining uses combinational logic to generate (and registers to propagate) control signals. • Pipelining creates potential hazards.

Summary comparison (MIPS designs) CPI CT Single Cycle Multi-cycle Pipeline

Designing a Pipelined CPU

Designing a Pipelined CPU

Presentation Transcript

Pipelined ADC

A Pipelined Datapath

Designing a Pipelined CPU

inst.eecs.berkeley.edu/~cs61c UC Berkeley CS61C : Machine Structures Lecture 26 CPU Design: Designing a Single-cycle CPU

The Pipelined CPU

Pipelined Datapath

CS3410 Guest Lecture A Simple CPU: remaining branch instructions CPU Performance Pipelined CPU

Outline Introduction Version 1 EMY CPU : Pipelined EMY CPU

A Pipelined Processor

CPU Performance Pipelined CPU

Machine Structures Lecture 19 – CPU Design: Designing a Single-cycle CPU, pt 2

Pipelined Electronics

Pipelined protocols

Pipelined Implementation

Pipelining a CPU

Pipelined Architecture

A Pipelined Architecture

Pipelined Design

The Pipelined CPU

CA 2007 Project 1 – Pipelined CPU using Verilog –

Pipelined Pattern