CDA 5155

Computer Architecture Week 1.5 CDA 5155

Start with the materials: Conductors and Insulators
Conductor: a material that permits electrical current to flow easily (low resistance to current flow). Lattice of atoms with free electrons.
Insulator: a material that is a poor conductor of electrical current (high resistance to current flow). Lattice of atoms with strongly held electrons.
Semiconductor: a material that can act like a conductor or an insulator depending on conditions (variable resistance to current flow).

Making a semiconductor using silicon
[Figure: silicon lattice with its electrons (e).]
What is a pure silicon lattice? A. Conductor B. Insulator C. Semiconductor

N-type Doping
We can increase the conductivity by adding atoms of phosphorus or arsenic to the silicon lattice. Each has one more electron than silicon, and that extra electron is free to wander. This is called n-type doping since we add some free (negatively charged) electrons.

Making a semiconductor using silicon
[Figure: silicon lattice with one phosphorus (P) atom; annotation: "This electron is easily moved from here."]
What is an n-doped silicon lattice? A. Conductor B. Insulator C. Semiconductor

P-type Doping
Interestingly, we can also improve the conductivity by adding atoms of gallium or boron to the silicon lattice. Each has one fewer electron than silicon, which creates a hole. Holes also conduct current by stealing electrons from their neighbors (thus moving the hole). This is called p-type doping since we have fewer (negatively charged) electrons in the bonds holding the atoms together.

Making a semiconductor using silicon
[Figure: silicon lattice with one gallium (Ga) atom and an empty electron position (?).]
This atom will accept an electron even though it is one too many, since it fills the eighth electron position in this shell. Again, this lets current flow, since the electron must come from somewhere to fill this position.

Using doped silicon to make a junction diode
A junction diode allows current to flow in one direction and blocks it in the other.
[Figure: junction diode connected between GND and Vcc. Electrons like to move to Vcc. Electrons move from GND to fill holes.]

Using doped silicon to make a junction diode
A junction diode allows current to flow in one direction and blocks it in the other.
[Figure: junction diode connected between Vcc and GND with electrons (e) moving through it; label: "Current flows".]

Making a transistor
Our first level of abstraction is the transistor (basically two diodes sitting back-to-back).
[Figure: transistor with the labels Gate and P-type.]

Making a transistor
Transistors are electronic switches connecting the source to the drain if the gate is “on”.
[Figure: transistor examples connected to Vcc.]
http://www.intel.com/education/transworks/INDEX.HTM

Review of basic pipelining
5-stage “RISC” load-store architecture. About as simple as things get.
Instruction fetch: get instruction from memory/cache
Instruction decode: translate opcode into control signals and read regs
Execute: perform ALU operation
Memory: access memory if load/store
Writeback/retire: update register file

Pipelined implementation Break the execution of the instruction into cycles (5 in this case). Design a separate datapath stage for the execution performed during each cycle. Build pipeline registers to communicate between the stages.

Stage 1: Fetch
Design a datapath that can fetch an instruction from memory every cycle.
Use the PC to index memory to read the instruction.
Increment the PC (assume no branches for now).
Write everything needed to complete execution to the pipeline register (IF/ID).
The next stage will read this pipeline register.
Note that the pipeline register must be edge-triggered.
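The fetch stage described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: instruction memory is modeled as a list of assembly strings, and the names `IFID` and `fetch` are hypothetical, not part of the course materials. Returning a fresh register value instead of mutating the old one stands in for the edge-triggered update.

```python
from dataclasses import dataclass

NOOP = "noop"


@dataclass
class IFID:
    """IF/ID pipeline register: what fetch hands to decode (hypothetical name)."""
    instr: str = NOOP
    pc_plus_1: int = 0


def fetch(pc, inst_mem):
    """One fetch cycle: read the instruction at pc and compute PC+1.

    Returns (next_pc, new IF/ID contents). Branches are ignored, as on the
    slide, so next_pc is always PC+1. Past the end of the program we fetch
    noops (an assumption made for this sketch).
    """
    instr = inst_mem[pc] if pc < len(inst_mem) else NOOP
    pc_plus_1 = pc + 1
    return pc_plus_1, IFID(instr=instr, pc_plus_1=pc_plus_1)


program = ["add 1 2 3", "nand 4 5 6", "lw 2 4 20"]
pc, if_id = fetch(0, program)
print(pc, if_id)
```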

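[Figure: Stage 1 (Fetch) datapath. A MUX feeds the PC; an adder computes PC + 1; the PC indexes instruction memory; the instruction bits and PC + 1 are written to the IF/ID pipeline register (with enable inputs, en), which feeds the rest of the pipelined datapath.]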

Stage 2: Decode
Design a datapath that reads the IF/ID pipeline register, decodes the instruction, and reads the register file (specified by regA and regB of the instruction bits).
Decode is easy: just pass on the opcode and let later stages figure out their own control signals for the instruction.
Write everything needed to complete execution to the pipeline register (ID/EX).
Pass on the offset field and both destination register specifiers (or simply pass on the whole instruction!).
Include PC+1 even though decode does not use it.
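As a rough illustration of the decode stage, the sketch below pulls the fields out of an instruction word and reads the register file. The slides name bits 22-24 (opcode), bits 16-18 (regB), and bits 0-2 (destination); the regA position (bits 19-21), the 16-bit offset width, and opcode 0 for add are assumptions made for this example, as are the function names.

```python
def sign_extend_16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def decode_fields(word):
    """Split an instruction word into its fields."""
    return {
        "opcode": (word >> 22) & 0x7,    # bits 22-24
        "regA": (word >> 19) & 0x7,      # bits 19-21 (assumed position)
        "regB": (word >> 16) & 0x7,      # bits 16-18
        "dest": word & 0x7,              # bits 0-2
        "offset": sign_extend_16(word),  # bits 0-15 (assumed width)
    }


def decode(word, pc_plus_1, regfile):
    """Decode stage: read both registers and build the ID/EX contents."""
    f = decode_fields(word)
    return {
        "instr": word,            # pass the whole instruction on
        "pc_plus_1": pc_plus_1,   # not used here; later stages need it
        "valA": regfile[f["regA"]],
        "valB": regfile[f["regB"]],
        "offset": f["offset"],
    }


# Hypothetical encoding of "add 1 2 3": opcode 0, regA=1, regB=2, dest=3.
word = (0 << 22) | (1 << 19) | (2 << 16) | 3
regs = [0, 36, 9, 12, 18, 7, 41, 22]
id_ex = decode(word, 1, regs)
print(decode_fields(word), id_ex)
```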

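[Figure: Stage 2 (Decode) datapath. The IF/ID pipeline register supplies regA and regB to the register file; PC + 1, the contents of regA, the contents of regB, and the instruction bits are written to the ID/EX pipeline register. The register file also has destreg, data, and en inputs. Stage 1 (Fetch) is to the left; the rest of the pipelined datapath is to the right.]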

Stage 3: Execute
Design a datapath that performs the proper ALU operation for the instruction specified and the values present in the ID/EX pipeline register.
The inputs are the contents of regA and either the contents of regB or the offset field of the instruction.
Also, calculate PC+1+offset in case this is a branch.
Write everything needed to complete execution to the pipeline register (EX/Mem):
ALU result, contents of regB, and PC+1+offset
Instruction bits for opcode and destReg specifiers
Result from comparison of regA and regB contents
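A minimal sketch of the execute stage follows, assuming the opcode has already been turned into a mnemonic string. The function name `execute` and the dictionary keys are illustrative only.

```python
def execute(op, val_a, val_b, offset, pc_plus_1):
    """Execute stage: compute the EX/Mem pipeline register contents."""
    # MUX on the second ALU input: lw/sw use the offset, others use regB.
    operand_b = offset if op in ("lw", "sw") else val_b
    if op == "nand":
        alu_result = ~(val_a & operand_b)
    else:
        # add, and the address calculation for lw/sw
        alu_result = val_a + operand_b
    return {
        "op": op,
        "alu_result": alu_result,
        "valB": val_b,                 # store data for sw
        "target": pc_plus_1 + offset,  # PC+1+offset, in case this is a branch
        "eq": val_a == val_b,          # comparison of regA and regB contents
    }


# Operand values taken from the sample-code trace in this deck.
print(execute("add", 36, 9, 0, 1)["alu_result"])
print(execute("nand", 18, 7, 0, 2)["alu_result"])
print(execute("lw", 9, 18, 20, 3)["alu_result"])
```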

[Figure: Stage 3 (Execute) datapath. An adder computes PC+1+offset; a MUX selects the second ALU input; the ALU result, the contents of regB, PC+1+offset, and the instruction bits are written to the EX/Mem pipeline register. Stage 2 (Decode) is to the left; the rest of the pipelined datapath is to the right.]

Stage 4: Memory Operation
Design a datapath that performs the proper memory operation for the instruction specified and the values present in the EX/Mem pipeline register.
The ALU result contains the address for ld and st instructions.
Opcode bits control the memory R/W and enable signals.
Write everything needed to complete execution to the pipeline register (Mem/WB):
ALU result and MemData
Instruction bits for opcode and destReg specifiers
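The memory stage can be sketched as below, assuming data memory is a Python dictionary keyed by address and that unwritten locations read as 0 (both are assumptions for illustration). The mnemonics `lw` and `sw` follow the sample code later in the deck.

```python
def memory_stage(ex_mem, data_mem):
    """Memory stage: do the load or store and build the Mem/WB contents."""
    op = ex_mem["op"]
    mem_data = 0
    if op == "lw":
        # Read enable: the ALU result is the address.
        mem_data = data_mem.get(ex_mem["alu_result"], 0)
    elif op == "sw":
        # Write enable: store the contents of regB at the ALU result address.
        data_mem[ex_mem["alu_result"]] = ex_mem["valB"]
    # Every other opcode leaves memory disabled.
    return {
        "op": op,
        "alu_result": ex_mem["alu_result"],
        "mem_data": mem_data,
    }


data_mem = {29: 99}
load = memory_stage({"op": "lw", "alu_result": 29, "valB": 18}, data_mem)
memory_stage({"op": "sw", "alu_result": 55, "valB": 22}, data_mem)
print(load, data_mem)
```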

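[Figure: Stage 4 (Memory) datapath. The ALU result and the contents of regB from the EX/Mem pipeline register feed data memory; instruction bits drive the memory en and R/W signals; the ALU result, memory read data, and instruction bits are written to the Mem/WB pipeline register. PC+1+offset and the MUX control for the PC input go back to the MUX before the PC in stage 1. Stage 3 (Execute) is to the left; the rest of the pipelined datapath is to the right.]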

Stage 5: Write back
Design a datapath that completes the execution of this instruction, writing to the register file if required.
Write MemData to destReg for the ld instruction.
Write the ALU result to destReg for add or nand instructions.
Opcode bits also control the register write enable signal.
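The two write-back MUXes can be sketched as follows. The destination comes from instruction bits 0-2 for add/nand and from bits 16-18 for a load, matching the MUX labels in the datapath figure; the dictionary layout and the function name `writeback` are assumptions for this example.

```python
def writeback(mem_wb, regfile):
    """Write-back stage: update the register file if the opcode requires it.

    Returns True when the register write enable would be asserted.
    """
    op = mem_wb["op"]
    if op in ("add", "nand"):
        # Data MUX picks the ALU result; dest MUX picks bits 0-2.
        regfile[mem_wb["bits_0_2"]] = mem_wb["alu_result"]
        return True
    if op == "lw":
        # Data MUX picks the memory read data; dest MUX picks bits 16-18.
        regfile[mem_wb["bits_16_18"]] = mem_wb["mem_data"]
        return True
    return False  # sw, noop, ...: write enable stays off


regs = [0, 36, 9, 12, 18, 7, 41, 22]
wrote_add = writeback(
    {"op": "add", "bits_0_2": 3, "bits_16_18": 2, "alu_result": 45, "mem_data": 0},
    regs,
)
wrote_lw = writeback(
    {"op": "lw", "bits_0_2": 4, "bits_16_18": 4, "alu_result": 29, "mem_data": 99},
    regs,
)
wrote_sw = writeback(
    {"op": "sw", "bits_0_2": 2, "bits_16_18": 7, "alu_result": 55, "mem_data": 0},
    regs,
)
print(regs)
```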

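[Figure: Stage 5 (Write back) datapath. A MUX selects between the ALU result and the memory read data from the Mem/WB pipeline register; this goes back to the data input of the register file. A second MUX selects between instruction bits 0-2 and bits 16-18; this goes back to the destination register specifier. Instruction bits also produce the register write enable. Stage 4 (Memory) is to the left.]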

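[Figure: the complete five-stage pipelined datapath: PC, instruction memory, register file, sign extend, ALU, data memory, and the MUXes (including the destination MUX choosing bits 0-2 or bits 16-18), separated by the IF/ID, ID/EX, EX/Mem, and Mem/WB pipeline registers.]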

Sample Test Question (Easy)
Which item does not need to be included in the Mem/WB pipeline register for the LC3101 pipelined implementation discussed in class?
A. ALU result
B. Memory read data
C. PC+1+offset
D. Destination register specifier
E. Instruction opcode
Answer: C. PC+1+offset

Sample Test Question (Hard?)
What items need to be added to one of the pipeline registers (discussed in class) to support the <insert nasty instruction description here>?
A. IF/ID: PC
B. ID/EX: PC+offset
C. EX/Mem: Contents of regA
D. EX/Mem: ALU2 result
E. Mem/WB: Contents of regA

Things to think about… How would you modify the pipeline datapath if you wanted to double the clock frequency? Would it actually double? How do you determine the frequency?

Sample Code (Simple)
Run the following code on pipelined LC3101:
  add  1 2 3    ; reg 3 = reg 1 + reg 2
  nand 4 5 6    ; reg 6 = ~(reg 4 & reg 5)
  lw   2 4 20   ; reg 4 = Mem[reg 2 + 20]
  add  2 5 5    ; reg 5 = reg 2 + reg 5
  sw   3 7 10   ; Mem[reg 3 + 10] = reg 7
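To have something to check the pipelined trace against, here is a small non-pipelined reference interpreter for the four opcodes used above. It is a sketch, not the course simulator. The initial register values and Mem[29] = 99 are taken from the numbers in the trace figures in this deck, and the memory model (a dictionary that reads 0 for unwritten addresses) is an assumption.

```python
def run(program, regs, mem):
    """Execute each instruction to completion, one at a time (no pipeline)."""
    for line in program:
        op, a, b, c = line.split()
        a, b, c = int(a), int(b), int(c)
        if op == "add":
            regs[c] = regs[a] + regs[b]
        elif op == "nand":
            regs[c] = ~(regs[a] & regs[b])
        elif op == "lw":
            regs[b] = mem.get(regs[a] + c, 0)
        elif op == "sw":
            mem[regs[a] + c] = regs[b]
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return regs, mem


program = [
    "add 1 2 3",
    "nand 4 5 6",
    "lw 2 4 20",
    "add 2 5 5",
    "sw 3 7 10",
]
regs = [0, 36, 9, 12, 18, 7, 41, 22]  # R0..R7, from the initial-state figure
mem = {29: 99}                        # value the trace shows being loaded
run(program, regs, mem)
print(regs, mem)
```

Because this program has no dependent instructions close together, the pipelined run should end with the same register and memory contents.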

[Figure: the complete pipelined datapath with the pipeline register fields labeled. IF/ID: instruction, PC+1. ID/EX: PC+1, valA, valB, offset, dest, op. EX/Mem: target, eq?, ALU result, valB, dest, op. Mem/WB: ALU result, mdata, dest, op. Register file: R0 through R7 with regA, regB, and data inputs. The opcode is bits 22-24; dest is selected from bits 0-2 or bits 16-18.]

Time: 0 (initial state). PC = 0. Register file: R0 = 0, R1 = 36, R2 = 9, R3 = 12, R4 = 18, R5 = 7, R6 = 41, R7 = 22. IF/ID, ID/EX, EX/Mem, and Mem/WB all hold noop.

Time: 1. Fetch: add 1 2 3. IF/ID: add 1 2 3 (PC+1 = 1). ID/EX, EX/Mem, Mem/WB: noop.

Time: 2. Fetch: nand 4 5 6. IF/ID: nand 4 5 6 (PC+1 = 2). ID/EX: add (valA = 36, valB = 9, dest = 3). EX/Mem, Mem/WB: noop.

Time: 3. Fetch: lw 2 4 20. IF/ID: lw 2 4 20 (PC+1 = 3). ID/EX: nand (valA = 18, valB = 7, dest = 6). EX/Mem: add (ALU result = 45, dest = 3). Mem/WB: noop.

Time: 4. Fetch: add 2 5 5. IF/ID: add 2 5 5 (PC+1 = 4). ID/EX: lw (valA = 9, valB = 18, offset = 20). EX/Mem: nand (ALU result = -3, dest = 6). Mem/WB: add (ALU result = 45, dest = 3).

Time: 5. Fetch: sw 3 7 10. IF/ID: sw 3 7 10 (PC+1 = 5). ID/EX: add (valA = 9, valB = 7, dest = 5). EX/Mem: lw (ALU result = 29, dest = 4). Mem/WB: nand (ALU result = -3, dest = 6). Register file: R3 = 45.

Time: 6. No more instructions. ID/EX: sw (valA = 45, valB = 22, offset = 10). EX/Mem: add (ALU result = 16, dest = 5). Mem/WB: lw (mdata = 99, dest = 4). Register file: R6 = -3.

Time: 7. No more instructions. EX/Mem: sw (ALU result = 55, valB = 22). Mem/WB: add (ALU result = 16, dest = 5). Register file: R4 = 99.

Time: 8. No more instructions. Mem/WB: sw. Register file: R5 = 16.

Time: 9. No more instructions; sw completes. Final register file: R0 = 0, R1 = 36, R2 = 9, R3 = 45, R4 = 99, R5 = 16, R6 = -3, R7 = 22.

Time graphs
Time:  1          2          3          4          5          6          7          8          9
add    fetch      decode     execute    memory     writeback
nand              fetch      decode     execute    memory     writeback
lw                           fetch      decode     execute    memory     writeback
add                                     fetch      decode     execute    memory     writeback
sw                                                 fetch      decode     execute    memory     writeback
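The time graph follows a simple rule when nothing stalls: instruction i (counting from 0) is in stage s during cycle i + s + 1. A short sketch of that rule, with illustrative function names:

```python
STAGES = ["fetch", "decode", "execute", "memory", "writeback"]


def timing(n_instrs):
    """For each instruction, map cycle number -> stage, assuming no stalls."""
    return [
        {i + s + 1: stage for s, stage in enumerate(STAGES)}
        for i in range(n_instrs)
    ]


def total_cycles(n_instrs, n_stages=len(STAGES)):
    """Cycles needed to drain the pipeline: fill time plus one per instruction."""
    return n_instrs + n_stages - 1


names = ["add", "nand", "lw", "add", "sw"]
for name, row in zip(names, timing(len(names))):
    cells = [row.get(c, "").ljust(10) for c in range(1, total_cycles(len(names)) + 1)]
    print(name.ljust(5), "".join(cells).rstrip())
```

For the five-instruction sample this gives 5 + 5 - 1 = 9 cycles, matching the graph.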

What can go wrong?
Data hazards: since register reads occur in stage 2 and register writes occur in stage 5, it is possible to read the wrong value if it is about to be written.
Control hazards: a branch instruction may change the PC, but not until stage 4. What do we fetch before that?
Exceptions: how do you handle exceptions in a pipelined processor with 5 instructions in flight?

Data Hazards
What are they?
How do you detect them?
How do you deal with them?

Pipeline function for ADD
Fetch: read instruction from memory
Decode: read source operands from reg
Execute: calculate sum
Memory: pass results to next stage
Writeback: write sum into register file

Data Hazards
  add 1 2 3
  nand 3 4 5
time:  1          2          3          4          5          6
add    fetch      decode     execute    memory     writeback
nand              fetch      decode     execute    memory     writeback
If not careful, nand will read the wrong value of R3.
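One way to detect this kind of hazard in a program listing is to check whether an instruction reads a register that one of the preceding instructions has not yet written back. The sketch below assumes a consumer is safe once it is three or more instructions behind the producer (a hazard window of 2), and it covers only the opcodes used in this deck; the helper names are hypothetical.

```python
def parse(line):
    parts = line.split()
    return parts[0], [int(p) for p in parts[1:]]


def source_regs(line):
    """Registers read in decode."""
    op, f = parse(line)
    if op in ("add", "nand", "sw"):
        return {f[0], f[1]}      # regA and regB
    if op == "lw":
        return {f[0]}            # regA only (base address)
    return set()


def dest_reg(line):
    """Register written in writeback, or None."""
    op, f = parse(line)
    if op in ("add", "nand"):
        return f[2]
    if op == "lw":
        return f[1]
    return None


def find_hazards(program, window=2):
    """Return (producer index, consumer index, register) for each hazard."""
    hazards = []
    for i, line in enumerate(program):
        d = dest_reg(line)
        if d is None:
            continue
        for j in range(i + 1, min(i + 1 + window, len(program))):
            if d in source_regs(program[j]):
                hazards.append((i, j, d))
    return hazards


print(find_hazards(["add 1 2 3", "nand 3 4 5"]))
```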

[Figures: the labeled pipelined datapath (IF/ID, ID/EX, EX/Mem, Mem/WB), shown in three successive versions; the last version adds three fwd (forwarding) paths.]

Three approaches to handling data hazards
Avoid: make sure there are no hazards in the code.
Detect and Stall: if hazards exist, stall the processor until they go away.
Detect and Forward: if hazards exist, fix up the pipeline to get the correct value (if possible).

Handling data hazards I: Avoid all hazards
Assume the programmer (or the compiler) knows about the processor implementation.
Make sure no hazards exist: put noops between any dependent instructions.
  add 1 2 3     ; write R3 in cycle 5
  noop
  noop
  nand 3 4 5    ; read R3 in cycle 5
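A compiler-style pass that applies this approach might look like the sketch below. It assumes, as the two-noop example implies, that a consumer is safe once two instructions separate it from the producer. Only the opcodes used in this deck are handled, and all names are illustrative.

```python
def parse(line):
    parts = line.split()
    return parts[0], [int(p) for p in parts[1:]]


def source_regs(line):
    op, f = parse(line)
    if op in ("add", "nand", "sw"):
        return {f[0], f[1]}
    if op == "lw":
        return {f[0]}
    return set()


def dest_reg(line):
    op, f = parse(line)
    if op in ("add", "nand"):
        return f[2]
    if op == "lw":
        return f[1]
    return None


def insert_noops(program, window=2):
    """Pad the program so no consumer is within `window` slots of its producer."""
    out = []
    for line in program:
        srcs = source_regs(line)
        needed = 0
        # Look back over the last `window` instructions already emitted.
        for back in range(1, min(window, len(out)) + 1):
            d = dest_reg(out[-back])
            if d is not None and d in srcs:
                needed = max(needed, window - back + 1)
        out.extend(["noop"] * needed)
        out.append(line)
    return out


print(insert_noops(["add 1 2 3", "nand 3 4 5"]))
```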

Problems with this solution
Old programs (legacy code) may not run correctly on new implementations.
  Longer pipelines need more noops.
Programs get larger as noops are included.
  Especially a problem for machines that try to execute more than one instruction every cycle.
  Intel EPIC: often 25% - 40% of instructions are noops.
Program execution is slower.
  CPI is 1, but some instructions are noops.
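The last point can be made concrete with a back-of-the-envelope calculation: if every issued instruction takes one cycle but a fraction of them are noops, the cycles per useful instruction rise to 1 / (1 - noop fraction). The function below is only a sketch of that arithmetic; the name `effective_cpi` is made up for this example.

```python
def effective_cpi(noop_fraction, base_cpi=1.0):
    """Cycles per useful (non-noop) instruction."""
    if not 0.0 <= noop_fraction < 1.0:
        raise ValueError("noop_fraction must be in [0, 1)")
    return base_cpi / (1.0 - noop_fraction)


for frac in (0.25, 0.40):
    print(f"{frac:.0%} noops -> effective CPI {effective_cpi(frac):.2f}")
```

With 25% to 40% noops, each useful instruction costs roughly 1.33 to 1.67 cycles even though the nominal CPI is 1.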
