This laboratory report by Vittorio Zaccaria explores the intricacies of DLX architecture, particularly focusing on pipelining and hazards. It provides a detailed analysis of the load/store architecture, emphasizing the importance of register speed relative to memory and compiler optimization capabilities. Key topics include structural, data, and control hazards, with solutions such as pipeline stalls and register forwarding. The report includes practical exercises on hazard resolution, performance metrics like CPI and MIPS, and hands-on simulations for comprehensive understanding.
Advanced Computer Architectures Laboratory on DLX Pipelining Vittorio Zaccaria
DLX • Load/store architecture • Registers are faster than memory • The compiler can optimize more aggressively • 16-bit offsets and immediates • 32-bit integer registers • 64-bit floating-point registers • Fixed operation encoding: • Addressing mode is contained in the operation code • Each instruction fits in one word • Faster decoding Vittorio Zaccaria – Laboratory of Architectures
DLX (cont.) • 32 general-purpose registers • 32-bit instructions
DLX Pipeline
Pipeline Visualization
Hazards • Limits to pipelining: hazards prevent the next instruction from executing during its designated clock cycle • Structural hazards: the hardware cannot support this combination of instructions • Data hazards: an instruction depends on the result of a prior instruction that is still in the pipeline • Control hazards: caused by pipelining branches and other instructions that change the PC • A common solution is to stall the pipeline until the hazard is resolved, inserting one or more “bubbles” into the pipeline
Structural Hazards
Data Hazards
Control Hazards
An example program:
        .data
dati_a: .word 1,2,3,4,5,6,7,8
dati_b: .word 2,3,4,5,6,7,7,9
        .text
        .global main
        add  r3,r0,0
loop:   lw   r4,dati_a(r3)
        lw   r5,dati_b(r3)
        sub  r5,r5,r4
        addi r3,r3,4
        bnez r5,loop
exit:
1st Exercise: • Draw the pipeline chart • Indicate: • Data hazards between WB stages and ID stages • Control hazards between the EX stage and the IF stage
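As a starting point for this exercise, the data hazards can be listed mechanically. The sketch below is written in Python rather than DLX assembly; the tuple encoding of the example program and the helper name `raw_hazards` are illustrative assumptions, not part of the lab material. It assumes no forwarding and a register file that is written in the first half of WB and read in the second half of ID, so a consumer conflicts with its producer only when it sits one or two instructions after it. Control hazards are not modeled here.

```python
# Each entry: (instruction text, destination register or None, source registers).
# Hypothetical hand-written encoding of the first pass through the example program.
PROGRAM = [
    ("add r3,r0,0",      "r3", ["r0"]),
    ("lw r4,dati_a(r3)", "r4", ["r3"]),
    ("lw r5,dati_b(r3)", "r5", ["r3"]),
    ("sub r5,r5,r4",     "r5", ["r5", "r4"]),
    ("addi r3,r3,4",     "r3", ["r3"]),
    ("bnez r5,loop",     None, ["r5"]),
]


def raw_hazards(program, window=2):
    """List (producer, consumer, register) read-after-write conflicts.

    A conflict is reported when the consumer is at most `window`
    instructions after the producer (assumed model, no forwarding).
    """
    hazards = []
    for i, (producer, dest, _) in enumerate(program):
        if dest is None or dest == "r0":  # r0 is hard-wired to zero
            continue
        for j in range(i + 1, min(i + 1 + window, len(program))):
            consumer, next_dest, sources = program[j]
            if dest in sources:
                hazards.append((producer, consumer, dest))
            if next_dest == dest:  # register redefined: later reads depend on the new value
                break
    return hazards


for producer, consumer, reg in raw_hazards(PROGRAM):
    print(f"{producer:18} -> {consumer:18} on {reg}")
```

Under these assumptions the straight-line pass reports five conflicts, which correspond to the WB-to-ID arrows to draw on the chart.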
2nd Exercise: Hazard Resolution • Software solution • NOP insertion • Hardware solutions • Bubble/stall generation • Register forwarding • Software optimizations • Code rescheduling
NOP insertion
        add  r3,r0,0
        nop
        nop
loop:   lw   r4,dati_a(r3)
        lw   r5,dati_b(r3)
        nop
        nop
        sub  r5,r5,r4
        addi r3,r3,4
        nop
        bnez r5,loop
        nop
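The placement of the NOPs above follows a simple rule that can be sketched in code. This Python sketch is an illustration under stated assumptions, not the lab's tool: without forwarding, a consumer must issue at least three slots after its producer, and every branch is followed by one NOP. The tuple encoding and the name `insert_nops` are hypothetical.

```python
# Each entry: (text, destination or None, source registers, is_branch).
PROGRAM = [
    ("add r3,r0,0",      "r3", ["r0"],       False),
    ("lw r4,dati_a(r3)", "r4", ["r3"],       False),
    ("lw r5,dati_b(r3)", "r5", ["r3"],       False),
    ("sub r5,r5,r4",     "r5", ["r5", "r4"], False),
    ("addi r3,r3,4",     "r3", ["r3"],       False),
    ("bnez r5,loop",     None, ["r5"],       True),
]


def insert_nops(program, min_distance=3):
    """Return instruction texts with NOPs added for the assumed no-forwarding model."""
    out = []         # emitted instructions, including NOPs
    last_write = {}  # register -> position in `out` of its latest producer
    for text, dest, sources, is_branch in program:
        needed = 0
        for reg in sources:
            if reg in last_write:
                gap = len(out) - last_write[reg]  # distance if emitted right now
                needed = max(needed, min_distance - gap)
        out.extend(["nop"] * needed)
        if dest is not None and dest != "r0":
            last_write[dest] = len(out)
        out.append(text)
        if is_branch:
            out.append("nop")  # fills the slot after the branch
    return out


print("\n".join(insert_nops(PROGRAM)))
```

The linear scan ignores dependencies carried around the loop back edge; in this program they already satisfy the three-slot distance, so the result matches the listing on the slide.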
NOP dynamic execution • [Pipeline diagrams: first and second loop iterations] • Each loop iteration is composed of 5 instructions and 4 NOPs
Performance Indexes • CPI = average clock cycles per instruction • Total clock cycles = no. of instructions + no. of stalls/NOPs + 4, where 4 is the number of additional cycles the last instruction needs to complete in the 5-stage pipeline • CPI = [total clock cycles] / [no. of instructions]
Performance evaluation of NOPs • Actual CPI = (Instructions + NOPs + 4) / Instructions = (13 + 4) / 7 = 17/7 ≈ 2.43 • MIPS = frequency / (CPI × 10^6); with frequency = 200 MHz, MIPS = 200 / (17/7) ≈ 82.35
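The two formulas can be checked with a few lines of Python. This is a minimal sketch; the function names `cpi` and `mips` and the constant name are illustrative choices, while the numbers (7 instructions, 6 NOPs, 200 MHz) are the ones used on the slide.

```python
PIPELINE_DRAIN_CYCLES = 4  # extra cycles for the last instruction to complete


def cpi(instructions, stalls_or_nops, drain=PIPELINE_DRAIN_CYCLES):
    """Clock cycles per instruction: (instructions + stalls/NOPs + drain) / instructions."""
    total_cycles = instructions + stalls_or_nops + drain
    return total_cycles / instructions


def mips(clock_hz, cpi_value):
    """Millions of instructions per second for a given clock frequency and CPI."""
    return clock_hz / (cpi_value * 1e6)


first_pass = cpi(instructions=7, stalls_or_nops=6)
print(round(first_pass, 2), round(mips(200e6, first_pass), 2))  # prints 2.43 82.35
```

The same two functions apply unchanged to the bubble and forwarding cases later on; only the stall count differs.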
NOPs Manual Exercise • Manually execute the loop for two iterations (finishing on the NOP after the 2nd bnez) and calculate CPI and MIPS • 10 minutes
Results • CPI = (21 + 4) / 11 ≈ 2.27 • MIPS ≈ 88
Asymptotic loop performance • Consider an intermediate iteration of the loop. • Count the instructions plus NOPs in that iteration and divide by the number of effective instructions to obtain the asymptotic CPI • 10 minutes
Performance evaluation of NOPs (asymptotic) • Asymptotic loop CPI = ((Instructions + NOPs) × n + 4) / (Instructions × n) = (9n + 4) / (5n) ≈ 1.8 for large n • MIPS = frequency / (CPI × 10^6); with frequency = 200 MHz, MIPS = 200 / 1.8 ≈ 111
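To see why the constant 4 stops mattering, the expression (9n + 4) / (5n) can be evaluated for growing n. The Python sketch below is only an illustration; `loop_cpi` and its parameter names are assumed, and the defaults are the per-iteration counts from the slide (5 instructions, 4 NOPs).

```python
def loop_cpi(iterations, instr_per_iter=5, nops_per_iter=4, drain=4):
    """CPI of the loop body after `iterations` iterations, per the slide's formula."""
    cycles = (instr_per_iter + nops_per_iter) * iterations + drain
    return cycles / (instr_per_iter * iterations)


for n in (1, 10, 100, 1000):
    print(n, round(loop_cpi(n), 3))
```

The drain term contributes 4 / (5n) to the CPI, so the value approaches 9/5 = 1.8 from above as n grows.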
Bubbles • Bubbles are NOPs inserted by the hardware. • A branch instruction causes a NOP to be generated • Subsequent instructions are stalled • Earlier instructions continue to execute
Bubbles Example
Performance evaluation of bubbles • Actual CPI = (Instructions + Bubbles/aborts + 4) / Instructions = (7 + 6 + 4) / 7 = 17/7 ≈ 2.43 • MIPS = frequency / (CPI × 10^6); with frequency = 200 MHz, MIPS = 200 / (17/7) ≈ 82.35
Verify on the simulator • File -> load code ... -> pipe1.s -> select -> load -> yes • Configuration -> disable forwarding • Open the clock cycle diagram • Execute -> single cycle (until the 1st load of the 2nd iteration has been executed)
Result
Manual Exercise • Predict what happens in an intermediate iteration • Calculate the asymptotic CPI and MIPS • 10 minutes
Let’s simulate it • Simulate the program until the 4th iteration
Solutions • After the 1st iteration, every iteration shows the same behavior: • 5 instructions • 1 nop • 3 stalls • Asymptotic values: • CPI = (5 + 1 + 3) / 5 = 1.8 • MIPS = 111.11
Result Forwarding
Forwarding Example
Simulation of 2 iterations of the loop • Configuration -> enable forwarding • Open the clock cycle diagram • File -> Reset DLX • Execute -> single cycle • Stop at the WB of the 2nd bnez
Simulation results
Manual Exercise • Calculate CPI and MIPS for the 2 iterations. • Calculate the asymptotic CPI and MIPS. • 15 minutes
Results • 2 iterations: • 11 instructions • 1 nop • 2 stalls • 4 cycles to flush the pipe • CPI = (11 + 1 + 2 + 4) / 11 = 18/11 ≈ 1.64 • MIPS ≈ 122
Asymptotic Results • 5 instructions • 1 nop • 1 stall • CPI = (7n + 4) / (5n) ≈ 1.4 • MIPS = 142.86
Speedup • Speedup of A with respect to B = Exec. Time B / Exec. Time A
Calculate the asymptotic speedup • Speedup(NOPs, Bubbles) • Speedup(Forwarding, NOPs) • Speedup(Forwarding, Bubbles) • 5 minutes
Asymptotic speedup results • Speedup(NOPs, Bubbles) = 1.8 / 1.8 = 1 • Speedup(Forwarding, NOPs) = 1.8 / 1.4 ≈ 1.29 • Speedup(Forwarding, Bubbles) = 1.8 / 1.4 ≈ 1.29
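These speedups follow directly from the asymptotic CPIs found earlier (1.8 for NOPs and for bubbles, 1.4 for forwarding). The Python sketch below assumes, as the exercise implicitly does, that all three variants execute the same effective instructions at the same clock frequency, so execution time is proportional to CPI; the function and dictionary names are illustrative.

```python
# Asymptotic CPIs taken from the earlier results in this lab.
ASYMPTOTIC_CPI = {"NOPs": 1.8, "Bubbles": 1.8, "Forwarding": 1.4}


def speedup(a, b, cpi_table=ASYMPTOTIC_CPI):
    """Speedup of A with respect to B = Exec. Time B / Exec. Time A.

    With equal instruction count and clock, this reduces to CPI_B / CPI_A.
    """
    return cpi_table[b] / cpi_table[a]


for a, b in (("NOPs", "Bubbles"), ("Forwarding", "NOPs"), ("Forwarding", "Bubbles")):
    print(f"Speedup({a},{b}) = {speedup(a, b):.2f}")
```

NOP insertion and hardware bubbles cost the same number of cycles here, which is why their mutual speedup is 1; forwarding is the only variant that removes cycles.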
Scheduling Optimizations • Change the order of operations to minimize stalls/bubbles (forwarding enabled):
Original order, CPI = (5 + 2 + 4) / 5 = 2.2:
        lw   r3,0(r2)
        add  r3,r3,r7
        lw   r4,0(r2)
        add  r4,r4,r8
        add  r4,r4,r3
Rescheduled order, CPI = (5 + 4) / 5 = 1.8:
        lw   r3,0(r2)
        lw   r4,0(r2)
        add  r3,r3,r7
        add  r4,r4,r8
        add  r4,r4,r3
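The two stalls in the original order are load-use stalls: with forwarding enabled, a loaded value is available only after MEM, so an instruction that uses it immediately after the load must wait one cycle. The Python sketch below counts such stalls for both orderings. It is an illustration under that single assumption (one stall per load immediately followed by a consumer); the tuple encoding and function names are hypothetical.

```python
# Each entry: (opcode, destination register, source registers).
ORIGINAL = [
    ("lw",  "r3", ["r2"]),
    ("add", "r3", ["r3", "r7"]),
    ("lw",  "r4", ["r2"]),
    ("add", "r4", ["r4", "r8"]),
    ("add", "r4", ["r4", "r3"]),
]
RESCHEDULED = [
    ("lw",  "r3", ["r2"]),
    ("lw",  "r4", ["r2"]),
    ("add", "r3", ["r3", "r7"]),
    ("add", "r4", ["r4", "r8"]),
    ("add", "r4", ["r4", "r3"]),
]


def load_use_stalls(program):
    """Count loads whose result is read by the very next instruction."""
    stalls = 0
    for (opcode, dest, _), (_, _, next_sources) in zip(program, program[1:]):
        if opcode == "lw" and dest in next_sources:
            stalls += 1
    return stalls


def sequence_cpi(program, drain=4):
    """(instructions + stalls + drain cycles) / instructions."""
    return (len(program) + load_use_stalls(program) + drain) / len(program)


print(load_use_stalls(ORIGINAL), sequence_cpi(ORIGINAL))
print(load_use_stalls(RESCHEDULED), sequence_cpi(RESCHEDULED))
```

Moving the second load up places an independent instruction between each load and its consumer, which is exactly what removes both stalls.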
1st Exercise
        addi r1,r0,1
        seq  r2,r1,r1
        add  r3,r3,r3
Loop:   lw   r4,0(r3)
        sub  r3,r3,r4
        bnez r1,Loop
Manual Exercises • Draw the conflicts between operations up to the end of the 3rd iteration of the loop (last instruction: bnez). Forwarding is not available. • Insert bubbles/aborts in the right places to resolve the hazards. • Calculate the CPI and throughput of the trace. • Calculate the asymptotic CPI of the loop. • 20 minutes
Hazard Diagram
Bubbles/Stall insertion
CPIs • Trace CPI = (24 + 4) / 12 ≈ 2.33 • Asymptotic CPI = (6n + 4) / (3n) ≈ 2
Manual Exercises • Suppose now that forwarding is available. • Draw the new pipeline execution diagram (up to the execution of the 3rd bnez) and indicate where the hardware must generate stalls. • Calculate CPI and MIPS • Calculate the asymptotic CPI and MIPS • 20 minutes
Pipeline Diagram
Results • CPI = 21/12 = 1.75 • Asymptotic CPI = ((4 + 1)n + 4) / (3n) ≈ 5/3 ≈ 1.67
2nd exercise
loop:   lw   r2,dati_a(r4)
        lw   r3,dati_b(r5)
        add  r1,r2,r3
        sw   dati_a(r6),r1
        addi r4,r4,4
        addi r5,r5,4
        addi r6,r6,4
        j    loop