
Computer Systems


Presentation Transcript


1. Computer Systems: The Processor Architecture

2. Basic Knowledge
• The relative timing of the hardware elements is important

3. Programmer-visible state
• Program registers: %eax, %ecx, %edx, %ebx, %esi, %edi, %esp, %ebp
• Memory
• Condition codes (CC)
• Program counter (PC)
• Von Neumann architecture: both instructions and data are held in memory
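This state maps naturally onto a small data structure. Below is a minimal sketch (not code from the book) of the programmer-visible state in C; the names and the toy memory size are illustrative assumptions, the register numbering follows the Y86 convention (%eax = 0 ... %edi = 7), and the little-endian read4/write4 helpers are included here because later sketches reuse them:

    /* Sketch of the programmer-visible state: registers, CC, PC, memory. */
    #include <stdint.h>

    #define MEM_SIZE (1 << 16)                 /* toy memory size (assumption)     */

    enum { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI, NREGS };

    typedef struct {
        uint32_t reg[NREGS];                   /* %eax ... %edi                    */
        int      cc_zf, cc_sf, cc_of;          /* condition codes ZF, SF, OF       */
        uint32_t pc;                           /* program counter                  */
        uint8_t  mem[MEM_SIZE];                /* code and data share one memory   */
    } cpu_state;

    /* Little-endian 4-byte memory accessors used by the later sketches. */
    static uint32_t read4(cpu_state *s, uint32_t a)
    {
        return s->mem[a] | s->mem[a+1] << 8 | s->mem[a+2] << 16
             | (uint32_t)s->mem[a+3] << 24;
    }

    static void write4(cpu_state *s, uint32_t a, uint32_t v)
    {
        s->mem[a]   = (uint8_t)v;
        s->mem[a+1] = (uint8_t)(v >> 8);
        s->mem[a+2] = (uint8_t)(v >> 16);
        s->mem[a+3] = (uint8_t)(v >> 24);
    }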

4. Program counter
[Figure: Linux/IA32 process memory layout — kernel virtual memory from 0xc0000000 up to 0xffffffff (invisible to user code), the user stack (created at run time), a memory-mapped region for shared libraries (e.g. the printf() function) near 0x40000000, the run-time heap (created at run time by malloc), read/write data and read-only code and data loaded from the hello executable starting at 0x08048000, and an unused region down to address 0]
• The program counter holds the address of the instruction currently being executed
• The next instruction has to be fetched from memory (slow!)

5. Processing a single instruction
• Fetch: read the instruction (1-5 bytes) from memory
• Decode: read the values from the registers
• Execute: perform an arithmetic/logic operation OR test the jump condition
• Memory: read from / write to memory
• Write back: update the registers
• PC update: set the address of the next instruction
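As a concrete worked example (assuming the Y86 instruction encoding used in the book): consider addl %ebx, %eax stored at address 0x100 as the two bytes 60 30. Fetch reads these bytes and finds icode = 6, ifun = 0 (addl), rA = 3 (%ebx) and rB = 0 (%eax); decode reads the current values of %ebx and %eax; execute adds them and sets the condition codes; the memory stage does nothing; write back stores the sum into %eax; and PC update sets the PC to 0x102, the address of the next instruction.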

6. SEQ architecture
[Figure: the sequential (SEQ) hardware organisation — Fetch (PC, instruction memory reading byte 0 and bytes 1-5, split/align, PC increment, instr valid / need regids / need valC signals), Decode (register file, ports A and B), Execute (ALU, CC), Memory (data memory), Write back (ports E and M), and the icode, ifun, rA, rB, valC, valP signals]
• The hardware elements are connected with named wires (word, byte and bit wide)

7. Stage computation: ALU operation (OPl rA, rB)
Fetch:       icode:ifun ← M1[PC]       Read instruction byte
             rA:rB ← M1[PC+1]          Read register byte
             valP ← PC+2               Compute next PC
Decode:      valA ← R[rA]              Read operand A
             valB ← R[rB]              Read operand B
Execute:     valE ← valB OP valA       Perform ALU operation (OP selected by ifun)
             Set CC                    Set condition code register
Memory:      (no operation)
Write back:  R[rB] ← valE              Write back result
PC update:   PC ← valP                 Update PC
• Formulate instruction execution as a sequence of simple steps
• Use the same general form for all instructions
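The same stage computations can be written out directly in C. The sketch below handles only the addl case (ifun = 0) and reuses the hypothetical cpu_state from the sketch after slide 3; it assumes the instruction byte at the PC has already been fetched and identified as OPl:

    /* Stage computations for addl rA, rB, following the table above. */
    void exec_addl(cpu_state *s)
    {
        uint8_t  rA   = s->mem[s->pc + 1] >> 4;   /* Fetch: register byte, high nibble */
        uint8_t  rB   = s->mem[s->pc + 1] & 0xf;  /* Fetch: register byte, low nibble  */
        uint32_t valP = s->pc + 2;                /* Fetch: compute next PC            */
        uint32_t valA = s->reg[rA];               /* Decode: read operand A            */
        uint32_t valB = s->reg[rB];               /* Decode: read operand B            */
        uint32_t valE = valB + valA;              /* Execute: perform ALU operation    */
        s->cc_zf = (valE == 0);                   /* Execute: set condition codes      */
        s->cc_sf = ((int32_t)valE < 0);
        s->cc_of = (((valA ^ valE) & (valB ^ valE)) >> 31) != 0;  /* signed overflow on add */
        /* Memory: no operation for OPl */
        s->reg[rB] = valE;                        /* Write back: write result to rB    */
        s->pc     = valP;                         /* PC update                         */
    }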

8. Stage computation: procedure call (call Dest)
Fetch:       icode:ifun ← M1[PC]       Read instruction byte
             valC ← M4[PC+1]           Read destination address
             valP ← PC+5               Compute return point
Decode:      valB ← R[%esp]            Read stack pointer
Execute:     valE ← valB + (-4)        Decrement stack pointer
Memory:      M4[valE] ← valP           Write return value on stack
Write back:  R[%esp] ← valE            Update stack pointer
PC update:   PC ← valC                 Set PC to destination
• Use the ALU to decrement the stack pointer
• Store the incremented PC (the return address)
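In the same style, a hedged C rendering of the call computations, reusing the cpu_state, the ESP register index and the read4/write4 helpers from the sketch after slide 3:

    /* Stage computations for call Dest, following the table above. */
    void exec_call(cpu_state *s)
    {
        uint32_t valC = read4(s, s->pc + 1);  /* Fetch: read destination address     */
        uint32_t valP = s->pc + 5;            /* Fetch: compute return point         */
        uint32_t valB = s->reg[ESP];          /* Decode: read stack pointer          */
        uint32_t valE = valB - 4;             /* Execute: decrement stack pointer    */
        write4(s, valE, valP);                /* Memory: write return value on stack */
        s->reg[ESP] = valE;                   /* Write back: update stack pointer    */
        s->pc = valC;                         /* PC update: set PC to destination    */
    }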

9. Stage computation: jump (jXX Dest)
Fetch:       icode:ifun ← M1[PC]       Read instruction byte
             valC ← M4[PC+1]           Read destination address
             valP ← PC+5               Fall-through address
Decode:      (no operation)
Execute:     Bch ← Cond(CC, ifun)      Take branch?
Memory:      (no operation)
Write back:  (no operation)
PC update:   PC ← Bch ? valC : valP    Update PC
• Compute both addresses
• Choose based on the setting of the condition codes and the branch condition XX/ifun

10. Branch conditions (jXX)
Instruction   icode:ifun
jmp           7:0
jle           7:1
jl            7:2
je            7:3
jne           7:4
jge           7:5
jg            7:6
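Read as icode:ifun pairs, these say that all jumps share icode 7 and that the ifun field selects the condition. One plausible C rendering of the Cond(CC, ifun) function from slide 9, using the ZF/SF/OF flags from the earlier cpu_state sketch, is:

    /* Branch condition: returns 1 if the jump with function code ifun is taken. */
    int cond(const cpu_state *s, int ifun)
    {
        int zf = s->cc_zf, sf = s->cc_sf, of = s->cc_of;
        switch (ifun) {
        case 0: return 1;                  /* jmp: always taken           */
        case 1: return (sf ^ of) | zf;     /* jle: less than or equal     */
        case 2: return sf ^ of;            /* jl:  less than              */
        case 3: return zf;                 /* je:  equal                  */
        case 4: return !zf;                /* jne: not equal              */
        case 5: return !(sf ^ of);         /* jge: greater than or equal  */
        case 6: return !(sf ^ of) && !zf;  /* jg:  greater than           */
        default: return 0;                 /* not a valid jump condition  */
        }
    }

The PC update stage then simply selects PC ← Bch ? valC : valP, as on slide 9.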

11. Execute logic: datapaths & control logic
• ALU fun: select the ALU function
• ALU A: select input A
• ALU B: select input B
• Set CC: should the condition code register be loaded?

12. Control logic: ALU A
Execute-stage computations per instruction:
OPl rA, rB         valE ← valB OP valA    Perform ALU operation
rmmovl rA, D(rB)   valE ← valB + valC     Compute effective address
popl rA            valE ← valB + 4        Increment stack pointer
jXX Dest           (no operation)
call Dest          valE ← valB + (-4)     Decrement stack pointer
ret                valE ← valB + 4        Increment stack pointer

HCL description of the ALU A input:
int aluA = [
    icode in { IRRMOVL, IOPL } : valA;
    icode in { IIRMOVL, IRMMOVL, IMRMOVL } : valC;
    icode in { ICALL, IPUSHL } : -4;
    icode in { IRET, IPOPL } : 4;
    # Other instructions don't need ALU
];
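The HCL case expression translates almost mechanically into a C switch. The sketch below assumes the standard Y86 icode numbering (halt = 0, nop = 1, ..., popl = 0xB):

    enum { IHALT, INOP, IRRMOVL, IIRMOVL, IRMMOVL, IMRMOVL,
           IOPL, IJXX, ICALL, IRET, IPUSHL, IPOPL };

    /* Select the value fed into ALU input A, mirroring the HCL above. */
    int alu_a(int icode, int valA, int valC)
    {
        switch (icode) {
        case IRRMOVL: case IOPL:                  return valA;
        case IIRMOVL: case IRMMOVL: case IMRMOVL: return valC;
        case ICALL:   case IPUSHL:                return -4;
        case IRET:    case IPOPL:                 return 4;
        default: return 0;  /* other instructions don't need the ALU */
        }
    }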

13. Hardware structure
[Figure: the complete SEQ datapath — Fetch (PC, instruction memory, PC increment), Decode and Write back (register file with ports A, B, M and E, and the srcA, srcB, dstE, dstM signals), Execute (ALU, ALU A/B inputs, ALU fun, CC, Bch), Memory (data memory, memory control, Addr/Data, valM), and the new-PC selection]
• This can be translated into silicon

14. [Figure-only slide; no transcript text]

15. Sequential is too slow
[Figure: clock signal (Clk) and signal propagation through the combinational logic]
• The clock has to be slow enough to let the signal propagate through all wires and transistors
• Critical path: the slowest path between any two storage devices

16. Pipelining
[Figure: three pipeline stages, each with 100 ps of combinational logic (A, B, C) followed by a 20 ps register, all driven by a common clock]
• Divide the operation into stages and allow the next operation to start as soon as the previous one has left the first stage
• This increases throughput, but also increases latency
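Worked example with the numbers on this slide: without pipelining, one operation has to traverse all three blocks plus a register, so the minimum clock period is 3 × 100 ps + 20 ps = 320 ps — a latency of 320 ps and a throughput of 1/320 ps ≈ 3.12 billion operations per second. Pipelined into three stages, the clock only has to cover 100 ps + 20 ps = 120 ps per stage, so throughput rises to 1/120 ps ≈ 8.33 billion operations per second, while latency grows to 3 × 120 ps = 360 ps because every operation now also passes through three pipeline registers.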

17. Insert registers between the stages
[Figure: the pipelined (PIPE) hardware with pipeline registers F, D, E, M and W between the stages, plus a pipeline diagram showing instructions I1-I5 occupying the F, D, E, M and W stages in successive cycles 1-9, with cycle 5 highlighted when all five stages are busy]
• Pipeline registers mean extra silicon and extra delay

18. Data hazards
Additional pipeline control is needed to prevent unintended interactions between instructions:
• Stalling (wait a few cycles until the hazard is gone)
• Data forwarding (passing a value to E before it has been written back in M/W)
A pipelined architecture was already used for the i386: http://www.pcmech.com/show/processors/35/
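A simplified sketch of the forwarding idea in C (much reduced compared to the full design in the book, reusing uint32_t from the earlier sketches, with illustrative argument names): while decoding, the source register number is compared against the destination registers of the instructions still in the Execute, Memory and Write-back stages, and a newer value is picked up instead of the stale register-file value.

    #define RNONE 0xF   /* "no register" ID */

    /* Select the value for source operand A: forward from a later stage if
     * that stage is about to write the register we want to read. */
    uint32_t sel_fwd_a(int srcA, uint32_t rf_valA,
                       int e_dstE, uint32_t e_valE,   /* just computed in Execute  */
                       int m_dstM, uint32_t m_valM,   /* just read in Memory       */
                       int w_dstE, uint32_t w_valE)   /* about to be written back  */
    {
        if (srcA == RNONE)  return rf_valA;  /* operand not used                */
        if (srcA == e_dstE) return e_valE;   /* forward from Execute            */
        if (srcA == m_dstM) return m_valM;   /* forward from Memory             */
        if (srcA == w_dstE) return w_valE;   /* forward from Write back         */
        return rf_valA;                      /* otherwise use the register file */
    }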

19. Pipeline efficiency
Pipeline control can prevent many, but not all, interactions between instructions → bubbles
For the model described in the book:
• Load/use hazards (20% of load instructions → 1 bubble)
• Mispredicted branches (40% of jump instructions → 2 bubbles)
• Returns from procedure calls (100% of ret instructions → 3 bubbles)
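Worked example, assuming an instruction mix of roughly 25% loads, 20% jumps and 2% returns (an assumed mix, close to the one used in the book's example): the bubbles add 0.25 × 0.20 × 1 + 0.20 × 0.40 × 2 + 0.02 × 1.00 × 3 = 0.05 + 0.16 + 0.06 = 0.27 extra cycles per instruction, so the CPI rises from the ideal 1.0 to about 1.27.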

20. Today's architectures
• Superscalar (Pentium): often two instructions per cycle
• Dynamic execution (P6): three instructions out of order per cycle
• Explicit parallelism (Itanium): six execution units

21. Hyper-Threading
http://or1cedar.intel.com/media/training/detect_ht_dt_v1/tutorial/ch6/topic04.htm

22. Metrics of performance
Each level of a computer system has its own performance metric:
• Application: answers per month, scaling of algorithms
• Programming language / compiler / ISA: (millions of) instructions per second – MIPS, (millions of) floating-point operations per second – MFLOP/s
• Datapath / control: megabytes per second
• Function units: cycles per second (clock rate)
• Transistors, wires, pins
Each metric has a place and a purpose, and each can be optimized.

23. Summary
• An instruction set architecture can be translated onto multiple processor architectures
• The control logic on the datapaths is complicated
• Compilers have to optimize for multiple machines/targets
• A programmer can aid or frustrate the compiler

24. Assignment
[Figure: six combinational blocks A-F with delays of 80, 70, 30, 10, 60 and 50 ps, followed by a register]
• Practice Problem 4.26 (page 430): calculate the throughput and latency of an n-stage pipeline built from the given 6 blocks
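A hedged starting point for the calculation (the exact grouping of blocks into stages is the point of the exercise): the six blocks together contain 80 + 70 + 30 + 10 + 60 + 50 = 300 ps of combinational logic. If they are grouped into n pipeline stages and each pipeline register adds, say, 20 ps (as in the earlier pipelining example), the clock period must cover the slowest stage plus one register delay, so throughput = 1 / (max stage delay + 20 ps) and latency = n × (max stage delay + 20 ps).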
