This overview covers the fundamentals of processor architecture: the program registers, the program counter (PC), condition codes, and memory in a Von Neumann machine. It walks through the sequential execution of instructions, from fetching and decoding to executing arithmetic or logic operations. The role of registers in maintaining the programmer-visible state is emphasized, along with the timing and synchronization of operations and the handling of the stack and conditional branches during instruction execution.
Computer Systems: The Processor Architecture
Basic Knowledge
• Relative timing of the elements is important
Programmer-Visible State
• Program registers: %eax, %ecx, %edx, %ebx, %esi, %edi, %esp, %ebp
• Condition codes (CC) and program counter (PC)
• Memory
• Von Neumann architecture: both instructions and data live in memory
This state is sketched as a C struct below.
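A minimal sketch of the programmer-visible state as a C struct; the struct and field names are illustrative, not taken from any real simulator.

    #include <stdbool.h>
    #include <stdint.h>

    /* Illustrative model of the programmer-visible state listed above. */
    struct visible_state {
        uint32_t reg[8];                 /* %eax, %ecx, %edx, %ebx, %esp, %ebp, %esi, %edi */
        uint32_t pc;                     /* address of the current instruction             */
        struct { bool zf, sf, of; } cc;  /* condition codes set by the ALU                 */
        uint8_t *memory;                 /* instructions and data share one memory (Von Neumann) */
    };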
Memory Layout and the Program Counter
• 0xffffffff down to 0xc0000000: kernel virtual memory (invisible to user code)
• User stack (created at run time)
• 0x40000000: memory-mapped region for shared libraries (e.g. the printf() function)
• Run-time heap (created at run time by malloc)
• Read/write data and read-only code and data, loaded from the hello executable file starting at 0x08048000
• Addresses below 0x08048000 are unused
• The program counter holds the address of the instruction currently being executed
• The next instruction has to be fetched from memory (slow!)
Processing a Single Instruction
• Fetch: read the instruction (1-5 bytes) from memory
• Decode: read the operand values from the registers
• Execute: perform an arithmetic/logic operation OR test the jump conditions
• Memory: read from or write to memory
• Write back: update the registers
• PC update: set the address of the next instruction
These six stages are sketched in code below.
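As a rough illustration, the following C fragment steps one instruction through the six stages for a tiny subset of the instruction set (halt, nop and addl). The encodings follow the Y86 subset used in the book; the struct cpu and step() names are made up for this sketch.

    #include <stdbool.h>
    #include <stdint.h>

    struct cpu {
        uint32_t pc;          /* program counter       */
        uint32_t reg[8];      /* register file         */
        uint8_t  mem[4096];   /* instructions and data */
        bool     halted;
    };

    /* One pass through the six stages (condition codes omitted for brevity). */
    void step(struct cpu *c)
    {
        /* Fetch: read the instruction bytes at PC */
        uint8_t byte0 = c->mem[c->pc];
        uint8_t icode = byte0 >> 4;

        if (icode == 0x0) { c->halted = true; return; }   /* halt */
        if (icode == 0x1) { c->pc += 1;       return; }   /* nop  */

        /* addl rA, rB (icode 6): a register byte follows the opcode */
        uint8_t  rA   = c->mem[c->pc + 1] >> 4;
        uint8_t  rB   = c->mem[c->pc + 1] & 0xf;
        uint32_t valP = c->pc + 2;           /* PC increment                */

        uint32_t valA = c->reg[rA];          /* Decode: read operand A      */
        uint32_t valB = c->reg[rB];          /* Decode: read operand B      */

        uint32_t valE = valB + valA;         /* Execute: ALU operation      */

        /* Memory: addl does not touch data memory */

        c->reg[rB] = valE;                   /* Write back: update register */
        c->pc      = valP;                   /* PC update: next instruction */
    }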
SEQ Architecture
• The sequential (SEQ) design connects the stage hardware into one datapath: instruction memory and PC increment (Fetch), split/align of byte 0 and bytes 1-5 into icode, ifun, rA, rB, valC and valP, the register file (Decode / Write back), the ALU and condition codes CC (Execute), and data memory (Memory)
• Hardware is connected with named wires of different widths (word, byte, bit)
Stage Computation: ALU Operation (OPl rA, rB)
• Fetch: icode:ifun ← M1[PC] (read instruction byte); rA:rB ← M1[PC+1] (read register byte); valP ← PC+2 (compute next PC)
• Decode: valA ← R[rA] (read operand A); valB ← R[rB] (read operand B)
• Execute: valE ← valB OP valA (perform ALU operation); set CC (set condition code register)
• Memory: (nothing)
• Write back: R[rB] ← valE (write back result)
• PC update: PC ← valP (update PC)
• Formulate instruction execution as a sequence of simple steps
• Use the same general form for all instructions
A C sketch of this stage table follows.
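A sketch of the OPl stage table as a C function, assuming regs[] is the register file indexed by register number and ifun selects among addl, subl, andl and xorl as in the book; the zero and sign flags are set as an example, the overflow flag is omitted.

    #include <stdbool.h>
    #include <stdint.h>

    struct cc { bool zf, sf, of; };

    /* OPl rA, rB: valE <- valB OP valA, write back to rB and set CC. */
    void exec_opl(uint32_t regs[8], struct cc *cc, int rA, int rB, int ifun)
    {
        uint32_t valA = regs[rA];                 /* Decode: read operand A */
        uint32_t valB = regs[rB];                 /* Decode: read operand B */
        uint32_t valE;

        switch (ifun) {                           /* Execute: ALU operation */
        case 0:  valE = valB + valA; break;       /* addl */
        case 1:  valE = valB - valA; break;       /* subl */
        case 2:  valE = valB & valA; break;       /* andl */
        case 3:  valE = valB ^ valA; break;       /* xorl */
        default: valE = valB;        break;
        }

        cc->zf = (valE == 0);                     /* Set CC (overflow flag omitted) */
        cc->sf = ((int32_t)valE < 0);

        regs[rB] = valE;                          /* Write back: R[rB] <- valE */
    }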
Stage Computation: Procedure Call (call Dest)
• Fetch: icode:ifun ← M1[PC] (read instruction byte); valC ← M4[PC+1] (read destination address); valP ← PC+5 (compute return point)
• Decode: valB ← R[%esp] (read stack pointer)
• Execute: valE ← valB + (-4) (decrement stack pointer)
• Memory: M4[valE] ← valP (write return address on stack)
• Write back: R[%esp] ← valE (update stack pointer)
• PC update: PC ← valC (set PC to destination)
• Use the ALU to decrement the stack pointer
• Store the incremented PC (the return point) on the stack
A similar C sketch for call follows.
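The same steps for call, sketched in C under the assumption of a little-endian, byte-addressable memory array and register number 4 for %esp; the function name is invented for this illustration.

    #include <stdint.h>
    #include <string.h>

    #define REG_ESP 4

    /* call Dest: push the return point and jump to the destination. */
    void exec_call(uint32_t regs[8], uint8_t mem[], uint32_t *pc, uint32_t valC)
    {
        uint32_t valP = *pc + 5;                /* Fetch: return point (1 + 4 instruction bytes) */
        uint32_t valB = regs[REG_ESP];          /* Decode: read stack pointer                    */
        uint32_t valE = valB - 4;               /* Execute: decrement stack pointer              */

        memcpy(&mem[valE], &valP, sizeof valP); /* Memory: write return address on the stack     */

        regs[REG_ESP] = valE;                   /* Write back: update %esp                       */
        *pc = valC;                             /* PC update: set PC to destination              */
    }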
Stage Computation: Jump (jXX Dest)
• Fetch: icode:ifun ← M1[PC] (read instruction byte); valC ← M4[PC+1] (read destination address); valP ← PC+5 (fall-through address)
• Decode: (nothing)
• Execute: Bch ← Cond(CC, ifun) (take branch?)
• Memory: (nothing)
• Write back: (nothing)
• PC update: PC ← Bch ? valC : valP (update PC)
• Compute both addresses
• Choose based on the setting of the condition codes and the branch condition XX/ifun
A short sketch follows; the branch test itself is shown after the branch-condition table.
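A short sketch of the jump stage computation in C; the branch decision Cond(CC, ifun) is passed in as a boolean here and spelled out after the branch-condition table below.

    #include <stdbool.h>
    #include <stdint.h>

    /* jXX Dest: compute both possible next-PC values and pick one. */
    void exec_jxx(uint32_t *pc, uint32_t valC, bool bch)
    {
        uint32_t valP = *pc + 5;       /* Fetch: fall-through address  */
        *pc = bch ? valC : valP;       /* PC update: Bch ? valC : valP */
    }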
Branch Conditions (jXX)
• jmp: icode 7, ifun 0
• jle: icode 7, ifun 1
• jl: icode 7, ifun 2
• je: icode 7, ifun 3
• jne: icode 7, ifun 4
• jge: icode 7, ifun 5
• jg: icode 7, ifun 6
The condition test itself is sketched below.
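A sketch of the Cond(CC, ifun) test these encodings feed into, assuming the usual zero (ZF), sign (SF) and overflow (OF) condition codes.

    #include <stdbool.h>

    /* Branch-condition test: does ifun's condition hold for the given CC? */
    bool cond(bool zf, bool sf, bool of, int ifun)
    {
        switch (ifun) {
        case 0:  return true;                  /* jmp: always */
        case 1:  return (sf ^ of) || zf;       /* jle: <=     */
        case 2:  return  sf ^ of;              /* jl:  <      */
        case 3:  return  zf;                   /* je:  ==     */
        case 4:  return !zf;                   /* jne: !=     */
        case 5:  return !(sf ^ of);            /* jge: >=     */
        case 6:  return !(sf ^ of) && !zf;     /* jg:  >      */
        default: return false;
        }
    }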
Execute Logic: Datapaths & Control Logic
• ALU fun: select function
• ALU A: select input A
• ALU B: select input B
• Set CC: should the condition code register be loaded?
Execute Stage per Instruction
• OPl rA, rB: valE ← valB OP valA (perform ALU operation)
• rmmovl rA, D(rB): valE ← valB + valC (compute effective address)
• popl rA: valE ← valB + 4 (increment stack pointer)
• jXX Dest: no operation
• call Dest: valE ← valB + (-4) (decrement stack pointer)
• ret: valE ← valB + 4 (increment stack pointer)

Control logic: ALU A (HCL)

    int aluA = [
        icode in { IRRMOVL, IOPL } : valA;
        icode in { IIRMOVL, IRMMOVL, IMRMOVL } : valC;
        icode in { ICALL, IPUSHL } : -4;
        icode in { IRET, IPOPL } : 4;
        # Other instructions don't need ALU
    ];
Hardware Structure
• The full datapath connects the Fetch logic (instruction memory, PC increment), the Decode / Write back logic (register file with ports A, B, E, M and the srcA/srcB/dstE/dstM selectors), the Execute logic (ALU, ALU fun, CC, branch signal Bch), the Memory logic (data memory with address, data and read/write control) and the new-PC selection
• This can be translated into silicon
Sequential Is Too Slow
• The clock has to be slow enough to let the signal propagate through all wires and transistors
• Critical path: the slowest path between any two storage devices; it limits the clock rate (a 320 ps critical path, for example, caps the clock at roughly 3 GHz)
Pipelining
• Example: three 100 ps combinational logic blocks (A, B, C), each followed by a 20 ps pipeline register
• Divide the operation into stages and let the next operation start as soon as the previous one has finished the first stage
• This increases throughput, but it also increases latency (see the worked numbers below)
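Worked numbers for the figure, assuming the usual comparison against an unpipelined version with the same 300 ps of logic and a single 20 ps register; that baseline is an assumption, only the 100 ps / 20 ps stage figures come from the slide.

    #include <stdio.h>

    int main(void)
    {
        double seq_period  = 3 * 100.0 + 20.0;   /* unpipelined: 320 ps         */
        double pipe_period = 100.0 + 20.0;       /* pipelined: 120 ps per stage */

        printf("unpipelined: latency %.0f ps, throughput %.2f GIPS\n",
               seq_period, 1000.0 / seq_period);          /* 320 ps, ~3.12 GIPS */
        printf("3 stages:    latency %.0f ps, throughput %.2f GIPS\n",
               3 * pipe_period, 1000.0 / pipe_period);    /* 360 ps, ~8.33 GIPS */
        return 0;
    }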
Insert Registers Between Stages
• Pipeline registers F, D, E, M and W between the stages hold the values one stage passes to the next (predPC; icode, ifun, rA, rB, valC, valP; valA, valB; Bch, valE, valA; valE, valM and the destination registers)
• With the registers in place, instructions I1-I5 overlap: in cycle 5 one instruction is in Write back while the following ones are in Memory, Execute, Decode and Fetch
• Pipeline registers mean extra silicon and delay
Data Hazards
Additional pipeline control is needed to prevent unintended interactions between instructions:
• Stalling (wait a few cycles until the hazard is gone)
• Data forwarding (pass a value to the Execute stage before it has reached Memory or Write back); a sketch follows below
A pipelined architecture was already used for the i386: http://www.pcmech.com/show/processors/35/
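A sketch of how a forwarding decision for source operand A could look, following the priority order used in the book's PIPE design (the newest value wins); the function and its parameter names are illustrative, not the book's HCL.

    #include <stdint.h>

    /* Pick the value for source register srcA: prefer results from the
     * nearest (newest) pipeline stage, fall back to the register file. */
    uint32_t forward_valA(int srcA, uint32_t rf_valA,
                          int e_dstE, uint32_t e_valE,   /* ALU result in Execute     */
                          int M_dstM, uint32_t m_valM,   /* value read in Memory      */
                          int M_dstE, uint32_t M_valE,   /* ALU result in Memory      */
                          int W_dstM, uint32_t W_valM,   /* value about to be written */
                          int W_dstE, uint32_t W_valE)
    {
        if (srcA == e_dstE) return e_valE;
        if (srcA == M_dstM) return m_valM;
        if (srcA == M_dstE) return M_valE;
        if (srcA == W_dstM) return W_valM;
        if (srcA == W_dstE) return W_valE;
        return rf_valA;                 /* no hazard: use the register file */
    }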
Pipeline Efficiency
Pipeline control can prevent many, but not all, interactions between instructions → bubbles. For the model described in the book:
• Load/use hazards (20% of load instructions → 1 bubble)
• Mispredicted branches (40% of jump instructions → 2 bubbles)
• Return from procedure calls (100% of ret instructions → 3 bubbles)
A rough CPI estimate based on these penalties is sketched below.
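How these penalties add up to a cycles-per-instruction (CPI) figure, assuming a typical textbook instruction mix (25% loads, 20% jumps, 2% returns); the mix is an assumption, only the bubble counts come from the slide.

    #include <stdio.h>

    int main(void)
    {
        double load_penalty = 0.25 * 0.20 * 1;   /* load/use hazards      */
        double jump_penalty = 0.20 * 0.40 * 2;   /* mispredicted branches */
        double ret_penalty  = 0.02 * 1.00 * 3;   /* procedure returns     */

        /* One cycle per instruction plus the average bubble penalties. */
        printf("CPI = %.2f\n", 1.0 + load_penalty + jump_penalty + ret_penalty);
        return 0;
    }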
Today's Architectures
• Superscalar (Pentium): often two instructions per cycle
• Dynamic execution (P6): three instructions out of order per cycle
• Explicit parallelism (Itanium): six execution units
Hyper-Threading
http://or1cedar.intel.com/media/training/detect_ht_dt_v1/tutorial/ch6/topic04.htm
Metrics of Performance
Each level of the system has its own metric:
• Application: answers per month, scaling of algorithms
• Programming language / compiler / ISA: (millions of) instructions per second (MIPS), (millions of) floating-point operations per second (MFLOP/s)
• Datapath / control: megabytes per second
• Function units: cycles per second (clock rate)
• Transistors, wires, pins
Each metric has a place and a purpose, and each can be optimized
Summary
• An instruction set architecture can be translated onto multiple processor architectures
• The datapaths require complicated control logic
• Compilers have to optimize for multiple machines/targets
• A programmer can help or frustrate the compiler
Assignment
• Practice Problem 4.26 (page 430): calculate the throughput and latency of an n-stage pipeline for the given six blocks A-F (delays of 80, 70, 30, 10, 60, 50 ps in the figure) with 20 ps pipeline registers
A small helper for the calculation is sketched below.
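A small helper for working through the problem: for any partition of the blocks into stages, the clock period is the slowest stage plus the register delay, the latency is the period times the number of stages, and the throughput is one instruction per period. The example call only shows the trivial single-stage baseline, not the answer to the problem.

    #include <stdio.h>

    /* period = slowest stage + register delay; latency = period * stages;
     * throughput = one instruction per period (GIPS when delays are in ps). */
    void report(const double stage_ps[], int nstages, double reg_ps)
    {
        double period = 0.0;
        for (int i = 0; i < nstages; i++)
            if (stage_ps[i] + reg_ps > period)
                period = stage_ps[i] + reg_ps;
        printf("%d stage(s): period %.0f ps, latency %.0f ps, throughput %.2f GIPS\n",
               nstages, period, period * nstages, 1000.0 / period);
    }

    int main(void)
    {
        /* Baseline: all six blocks (80+70+30+10+60+50 ps) in a single stage. */
        double one_stage[] = { 80 + 70 + 30 + 10 + 60 + 50 };
        report(one_stage, 1, 20.0);
        return 0;
    }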