
The single cycle CPU



Presentation Transcript


  1. The single cycle CPU

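  2. Performance of Single-Cycle Machines
     • Memory Unit: 2 ns
     • ALU and Adders: 2 ns
     • Register file (Read or Write): 1 ns

     Latency per instruction class, in ns ("-" means the unit is not used):

     | Class    | Fetch | Decode | ALU | Memory | Write Back | Total |
     |----------|-------|--------|-----|--------|------------|-------|
     | R-format | 2     | 1      | 2   | 0      | 1          | 6     |
     | LW       | 2     | 1      | 2   | 2      | 1          | 8     |
     | SW       | 2     | 1      | 2   | 2      | -          | 7     |
     | Branch   | 2     | 1      | 2   | -      | -          | 5     |
     | Jump     | 2     | -      | -   | -      | -          | 2     |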
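
The totals on this slide can be reproduced by summing the latencies of the units each class uses. The following is a minimal Python sketch under that reading; the names `STAGES_NS`, `total_ns` and `single_cycle_clock_ns` are illustrative and do not come from the slides.

```python
# Unit latencies in ns, as listed on the slide.
MEM_NS, ALU_NS, REG_NS = 2, 2, 1

# Units used per class: (fetch, decode/register read, ALU, data memory, write back).
# A 0 means the class does not use that unit.
STAGES_NS = {
    "R-format": (MEM_NS, REG_NS, ALU_NS, 0, REG_NS),
    "LW": (MEM_NS, REG_NS, ALU_NS, MEM_NS, REG_NS),
    "SW": (MEM_NS, REG_NS, ALU_NS, MEM_NS, 0),
    "Branch": (MEM_NS, REG_NS, ALU_NS, 0, 0),
    "Jump": (MEM_NS, 0, 0, 0, 0),
}


def total_ns(instr_class):
    """Total latency of one instruction class in ns."""
    return sum(STAGES_NS[instr_class])


def single_cycle_clock_ns():
    """A single-cycle clock must fit the slowest class."""
    return max(total_ns(c) for c in STAGES_NS)


for name in STAGES_NS:
    print(name, total_ns(name))
```

The slowest class (LW) is what fixes the single-cycle clock period at 8 ns.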

  3. What would happen if the clock cycle had a variable length?
     • Let us compare, for a program with the following instruction mix:
     • R-type: 44%, LW: 24%, SW: 12%
     • BRANCH: 18%, JUMP: 2%
     • I - number of instructions in the program
     • T - clock cycle length
     • CPI - cycles per instruction = 1
     Execution = I * T * CPI, where the average cycle length is T = 8*24% + 7*12% + 6*44% + 5*18% + 2*2% = 6.34 ≈ 6.3 ns
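
The weighted sum on this slide is an average instruction latency over the given mix. A small Python sketch of that arithmetic follows; the names `MIX` and `average_latency_ns` are illustrative, and exact fractions are used only to avoid floating-point noise.

```python
from fractions import Fraction

# (latency in ns, share of the program in percent), as given on the slides.
MIX = {
    "R-type": (6, 44),
    "LW": (8, 24),
    "SW": (7, 12),
    "BRANCH": (5, 18),
    "JUMP": (2, 2),
}


def average_latency_ns(mix):
    """Weighted average instruction latency for a variable-length clock."""
    return sum(Fraction(ns * pct, 100) for ns, pct in mix.values())


avg = average_latency_ns(MIX)
print(float(avg))  # 6.34, which the slide rounds to 6.3
```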

  4. The result
     EXE single cycle = T single clock * I, with T single clock = 8 ns
     EXE variable = T variable clock * I, with T variable clock = 6.3 ns
     This is a ratio of 8 / 6.3 ≈ 1.27. The ratio gets worse once we implement complex instructions such as floating-point operations.
     The solution is not a variable-length clock, which is complicated to build.
     The solution: an instruction takes a variable number of cycles.
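
Because both designs run the same I instructions with CPI = 1, the execution-time ratio reduces to the ratio of the cycle times. A minimal sketch of that step, with the hypothetical helper name `slowdown`:

```python
def slowdown(t_single_ns, t_variable_ns):
    """Execution-time ratio when I and CPI (= 1) are equal in both designs."""
    # EXE = I * T * CPI, so I and CPI cancel and only the cycle times remain.
    return t_single_ns / t_variable_ns


ratio = slowdown(8, 6.3)
print(round(ratio, 2))
```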

  5. Multicycle Approach
     The idea behind the multicycle method:
     • Saving time: each instruction takes only the number of clock cycles it needs.
     • Saving components: the same component is used in different stages of the instruction.

  6. How the multicycle architecture is built
     Divide the instruction into steps, one cycle per step:
     - Balance the amount of work required in each step.
     - Reduce the amount of work required in each step: each step performs only one functional operation.
     At the end of each clock cycle:
     - Save the values needed by the following steps.
     - Add internal registers to hold them.

  7. Timing of a lw instruction in a single cycle CPU: PC (0x400000) -> instruction memory output -> Rs, Rt (ALU inputs) -> ALU output (data memory address) -> data memory output.
     We want to replace a long single CK cycle with 5 short ones: fetch (2 ns), decode (1 ns), execute (2 ns), memory (2 ns), write back (1 ns).
     Timing of a lw instruction in a multi-cycle CPU, clock edges 0 to 5 (= 0 of the next instruction): fetch puts the instruction in IR; decode loads A and B; execute loads ALUout; memory loads MDR; then write back.
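
The stage latencies on this slide suggest how the short clock could be chosen. The sketch below assumes, and the slides do not state this, that the multi-cycle clock period equals the slowest stage and that the added registers cost no extra time; all names are illustrative.

```python
# Stage latencies in ns for lw, as listed on the slide.
LW_STAGES_NS = {"fetch": 2, "decode": 1, "execute": 2, "memory": 2, "write back": 1}


def multicycle_clock_ns(stages):
    """Assumption: the clock period is set by the slowest stage."""
    return max(stages.values())


def instruction_time_ns(stages):
    """One clock cycle per stage."""
    return len(stages) * multicycle_clock_ns(stages)


print(multicycle_clock_ns(LW_STAGES_NS), instruction_time_ns(LW_STAGES_NS))
```

Under these assumptions a lw takes 5 cycles of 2 ns, which is longer than the 8 ns single cycle; any gain would have to come from shorter instructions using fewer cycles.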

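  8. Therefore we should add registers to the single cycle CPU shown below.
     [Figure: single-cycle datapath. PC and a +4 adder; instruction memory; register file addressed by [25:21]=Rs, [20:16]=Rt and Rd (5 bits each); sign extension of [15:0] from 16 to 32 bits; ALU; data memory with Address, D.In and D.Out ports.]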

  9. Adding registers to “split” the instruction into 5 stages.
     [Figure: the single-cycle datapath with the added registers IR, A, B, ALUout and MDR, a PCWrite signal on the PC, and labels 0 to 5 on the stage boundaries.]

  10. Here is the book’s version of the multi-cycle CPU. Only PC and IR have write enable signals; all other registers hold data for a single cycle.

  11. Here is our version of a multi-cycle CPU capable of R-type, lw/sw and branch instructions.
      [Figure: datapath with PC, a single instruction & data memory, IR, register file (IR[25:21]=Rs, IR[20:16]=Rt, Rd), A, B, ALU with a constant 4 input, ALUout, sign extension of IR[15:0] from 16 to 32 bits, and << 2 shifters.]

  12. Let us explain the multi-cycle CPU • First we’ll look at a CPU capable of performing only R-type instructions • Then, we’ll add the lw instruction • And the sw instruction • Then, the beq instruction • And finally, the j instruction

  13. Let us remind ourselves how a single cycle CPU capable of performing R-type instructions works. Here you see the data-path and the timing of an R-type instruction.
      [Figure: datapath with PC, +4 adder, instruction memory, register file and ALU. Instruction fields: [31:26] (6 bits), [25:21]=Rs, [20:16]=Rt, [15:11]=Rd (5 bits each), [5:0]=funct (6 bits).]
      [Timing: PC (0x400000, then 0x400004) -> memory output -> Rs, Rt (ALU inputs) -> ALU output -> write back; phases fetch, decode, execute.]
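
The bit ranges on this slide are the standard MIPS R-type fields. As a rough sketch, they can be extracted with shifts and masks; the function name `decode_fields` is illustrative, and the example word is the usual encoding of `add $t0, $t1, $t2`.

```python
def decode_fields(instr):
    """Split a 32-bit instruction word into the fields named on the slide."""
    return {
        "opcode": (instr >> 26) & 0x3F,  # [31:26]
        "rs": (instr >> 21) & 0x1F,      # [25:21]
        "rt": (instr >> 16) & 0x1F,      # [20:16]
        "rd": (instr >> 11) & 0x1F,      # [15:11]
        "funct": instr & 0x3F,           # [5:0]
    }


# add $t0, $t1, $t2  ($t0 = register 8, $t1 = 9, $t2 = 10)
fields = decode_fields(0x012A4020)
print(fields)
```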

  14. A single cycle CPU demo: R-type instruction.
      [Figure: PC, +4, instruction memory, register file ([25:21]=Rs, [20:16]=Rt, [15:11]=Rd), ALU.]

  15. A multi cycle CPU capable of performing R-type instructions.
      [Figure: PC, instruction & data memory, IR, register file (IR[25:21]=Rs, IR[20:16]=Rt, Rd), A, B, ALU, ALUout.]

  16. A multi cycle CPU capable of R-type instructions: fetch.
      [Figure: the R-type multi-cycle datapath (PC, instruction & data memory, IR, register file, A, B, ALU, ALUout) during fetch, labels 0 and 1.]

  17. A multi cycle CPU capable of R-type instructions: decode.
      [Figure: the same datapath during decode, labels 1 and 2.]

  18. A multi cycle CPU capable of R-type instructions: execute.
      [Figure: the same datapath during execute, labels 2 and 3.]

  19. A multi cycle CPU capable of R-type instructions: write back.
      [Figure: the same datapath during write back to Rd, labels 3 and 4.]

  20. Timing of an R-type instruction in a single cycle CPU: PC (0x400000) -> instruction memory output (= the instruction) -> Rs, Rt (ALU inputs) -> ALU output (data = result of the calculation) -> GPR input; phases fetch, decode, execute, write back.
      Timing of an R-type instruction in a multi-cycle CPU, clock edges 0 to 4 (= 0 of the next instruction): fetch (PC -> memory data; IR changes from the previous instruction to the current instruction), decode (A, B), execute (ALUout), write back.

  21. Timing and register transfers of an R-type instruction in a multi-cycle CPU:
      • fetch: IR = M(PC) (IRWrite); IR changes from the previous instruction to the current instruction
      • decode: A = Rs, B = Rt (GPR outputs)
      • execute: ALUout = A op B
      • write back: Rd = ALUout, at the rising edge of CK
      An R-type instruction takes 4 CKs.
      The state diagram: IR = M(PC) -> A = Rs, B = Rt -> ALUout = A op B -> Rd = ALUout
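
The four register transfers of an R-type instruction can be traced in software. This is a simplified sketch rather than a hardware model: it assumes a dictionary as word memory and a Python function as the ALU operation, and the name `run_r_type` is illustrative.

```python
def run_r_type(regs, mem, pc, op):
    """Step one R-type instruction through its four clock cycles."""
    trace = []
    ir = mem[pc]                         # cycle 1: IR = M(PC)
    trace.append(("fetch", "IR = M(PC)"))
    rs = (ir >> 21) & 0x1F
    rt = (ir >> 16) & 0x1F
    rd = (ir >> 11) & 0x1F
    a, b = regs[rs], regs[rt]            # cycle 2: A = Rs, B = Rt
    trace.append(("decode", "A = Rs, B = Rt"))
    alu_out = op(a, b) & 0xFFFFFFFF      # cycle 3: ALUout = A op B
    trace.append(("execute", "ALUout = A op B"))
    regs[rd] = alu_out                   # cycle 4: Rd = ALUout
    trace.append(("write back", "Rd = ALUout"))
    return trace


regs = [0] * 32
regs[9], regs[10] = 5, 7
mem = {0x400000: 0x012A4020}             # add $t0, $t1, $t2
trace = run_r_type(regs, mem, 0x400000, lambda a, b: a + b)
print(len(trace), regs[8])
```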

  22. A multi-cycle CPU capable of R-type instructions (PC calc.).
      [Figure: PC, instruction & data memory, IR, register file (IR[25:21]=Rs, IR[20:16]=Rt, Rd), A, B, ALU with a constant 4 input, ALUout.]

  23. Timing of an R-type instruction including the PC calculation:
      • fetch: the current instruction is loaded into IR, and next PC = current PC + 4 (PC = PC + 4, PCWrite)
      • decode: A, B are loaded from the GPR outputs
      • execute: ALUout = A op B
      • write back: Rd = ALUout, at the rising edge of CK

  24. A multi cycle CPU capable of R-type instructions: fetch.
      [Figure: PC, instruction memory, IR, register file (IR[25:21]=Rs, IR[20:16]=Rt, Rd), A, B, ALU with a constant 4 input, ALUout.]

  25. The state diagram of a CPU capable of R-type instructions only:
      • 0 Fetch: IR = M(PC), PC = PC + 4
      • 1 Decode: A = Rs, B = Rt
      • 6 ALU (R-type): ALUout = A op B
      • 7 WBR: Rd = ALUout
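
The R-type state diagram can be written down as a small table of states, their register transfers and their successor. The following is one possible encoding, assuming that state 7 returns to Fetch; the names `R_TYPE_STATES` and `walk` are illustrative.

```python
# state number -> (name, register transfers, next state)
R_TYPE_STATES = {
    0: ("Fetch", ["IR = M(PC)", "PC = PC + 4"], 1),
    1: ("Decode", ["A = Rs", "B = Rt"], 6),
    6: ("ALU", ["ALUout = A op B"], 7),
    7: ("WBR", ["Rd = ALUout"], 0),  # assumption: WBR returns to Fetch
}


def walk(states, start=0):
    """Return the state numbers visited by one instruction, starting at Fetch."""
    path, state = [], start
    while True:
        path.append(state)
        state = states[state][2]
        if state == start:
            return path


print(walk(R_TYPE_STATES))
```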

  26. The state diagram of a CPU capable of R-type and lw instructions:
      • 0 Fetch
      • 1 Decode
      • lw: 2 AdrCmp: ALUout = A + sext(imm); 3 Load: MDR = M(ALUout); 4 WB: Rt = MDR
      • R-type: 6 ALU; 7 WBR

  27. We added registers to “split” the instruction into 5 stages. Let’s discuss the lw instruction.
      [Figure: the single-cycle datapath with the added registers IR, A, B, ALUout and MDR, a PCWrite signal on the PC, and labels 0 to 5 on the stage boundaries.]

  28. First we draw a multi-cycle CPU capable of R-type & lw instructions. We just moved the data memory. All parts related to lw only are blue.
      [Figure: PC, instruction memory, IR, register file (IR[25:21]=Rs, IR[20:16]=Rt, Rd), A, B, ALU with a constant 4 input, ALUout, sign extension of IR[15:0] from 16 to 32 bits, data memory, MDR.]

  29. A multi-cycle CPU capable of R-type & lw instructions: fetch.
      [Figure: PC, instruction memory, IR, register file (IR[25:21]=Rs, IR[20:16]=Rt, Rd), A, B, ALU with a constant 4 input, ALUout, sign extension of IR[15:0] from 16 to 32 bits, data memory, MDR, during fetch.]

  30. A multi-cycle CPU capable of R-type & lw instructions: decode.
      [Figure: the same datapath during decode; a << 2 shifter follows the sign extension.]

  31. A multi-cycle CPU capable of R-type & lw instructions: AdrCmp.
      [Figure: the same datapath during address computation.]

  32. A multi-cycle CPU capable of R-type & lw instructions: memory.
      [Figure: the same datapath during the data memory access; a Branch Address label and a << 2 shifter also appear.]

  33. A multi-cycle CPU capable of R-type & lw instructions: WB.
      [Figure: the same datapath during write back from MDR to Rt.]

  34. Can we unite the instruction & data memories? (They are not used simultaneously, as they are in the single cycle CPU.)
      [Figure: the R-type & lw datapath with separate instruction memory and data memory.]

  35. So here is a multi-cycle CPU capable of R-type & lw instructions using a single memory for instructions & data.
      [Figure: PC, instruction & data memory, IR, MDR, register file (IR[25:21]=Rs, IR[20:16]=Rt, Rd), A, B, ALU with a constant 4 input, ALUout, sign extension of IR[15:0] from 16 to 32 bits.]

  36. Timing of a lw instruction in a single cycle CPU: PC (0x400000) -> instruction memory output -> Rs, Rt (ALU inputs) -> ALU output (data memory address) -> data memory output; phases fetch, decode, execute, memory, write back.
      Timing of a lw instruction in a multi-cycle CPU: fetch (PC becomes PC+4; IR changes from the previous instruction to the current instruction), decode (A, B), execute (ALUout = data address), memory (MDR = memory data), write back (data to Rt).

  37. Timing and register transfers of a lw instruction in a multi-cycle CPU:
      • fetch: IR = M(PC), PC = PC + 4 (PCWrite, IRWrite)
      • decode: A = Rs, B = Rt (GPR outputs)
      • execute: ALUout = A + sext(imm) (the data address)
      • memory: MDR = M(ALUout)
      • write back: Rt = MDR, at the rising edge of CK

  38. The state diagram of a CPU capable of R-type and lw instructions:
      • 0 Fetch: IR = M(PC), PC = PC + 4
      • 1 Decode: A = Rs, B = Rt
      • lw: 2 AdrCmp: ALUout = A + sext(imm); 3 Load: MDR = M(ALUout); 4 WB: Rt = MDR
      • R-type: 6 ALU: ALUout = A op B; 7 WBR: Rd = ALUout
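
The lw branch of this state diagram (states 1 to 4) can be traced in the same register-transfer style. This is a simplified sketch assuming a dictionary keyed by byte address as data memory; `sext16` and `run_lw` are illustrative names, and the fetch state is left out.

```python
def sext16(imm):
    """Sign-extend a 16-bit immediate (the Sext 16->32 block)."""
    return imm - 0x10000 if imm & 0x8000 else imm


def run_lw(regs, mem, ir):
    """Apply the lw register transfers for an instruction already in IR."""
    rs = (ir >> 21) & 0x1F
    rt = (ir >> 16) & 0x1F
    a = regs[rs]                                       # state 1: A = Rs
    alu_out = (a + sext16(ir & 0xFFFF)) & 0xFFFFFFFF   # state 2: ALUout = A + sext(imm)
    mdr = mem[alu_out]                                 # state 3: MDR = M(ALUout)
    regs[rt] = mdr                                     # state 4: Rt = MDR
    return alu_out


regs = [0] * 32
regs[9] = 0x10010000
mem = {0x10010008: 1234}
lw_word = (0x23 << 26) | (9 << 21) | (8 << 16) | 8     # lw $t0, 8($t1)
address = run_lw(regs, mem, lw_word)
print(hex(address), regs[8])
```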

  39. A multi-cycle CPU capable of R-type & lw & sw instructions.
      [Figure: PC, instruction & data memory, IR, MDR, register file (IR[25:21]=Rs, IR[20:16]=Rt, Rd), A, B, ALU with a constant 4 input, ALUout, sign extension of IR[15:0] from 16 to 32 bits, << 2 shifter, Branch Address label; the lw and sw paths are marked.]

  40. The state diagram of a CPU capable of R-type, lw and sw instructions:
      • 0 Fetch: IR = M(PC), PC = PC + 4
      • 1 Decode: A = Rs, B = Rt
      • lw+sw: 2 AdrCmp: ALUout = A + sext(imm)
      • lw: 3 Load: MDR = M(ALUout); 4 WB: Rt = MDR
      • sw: 5 Store: M(ALUout) = B
      • R-type: 6 ALU: ALUout = A op B; 7 WBR: Rd = ALUout

  41. A multi-cycle CPU capable of R-type & lw/sw & branch instructions.
      [Figure: PC, instruction & data memory, IR, register file (IR[25:21]=Rs, IR[20:16]=Rt, Rd), A, B, ALU with a constant 4 input, ALUout, sign extension of IR[15:0] from 16 to 32 bits, and << 2 shifters.]

  42. Adding the instruction beq to the state diagram:
      • 0 Fetch; 1 Decode
      • lw+sw: 2 AdrCmp; lw: 3 Load, 4 WB; sw: 5 Store
      • R-type: 6 ALU; 7 WBR
      • beq: 8 Branch: calculate Rs - Rt (just to produce the zero signal). If zero: calculate PC = PC + (sext(imm) << 2). If not zero: the PC is left unchanged.

  43. Adding the instruction beq to the state diagram, a more efficient way:
      Let’s use the decode state, in which the ALU is doing nothing, to compute the branch address. We’ll have to store it for 1 more CK cycle, until we know whether to branch or not! (We store it in the ALUout reg.)
      • 1 Decode: calculate ALUout = PC + (sext(imm) << 2)
      • 8 Branch: calculate Rs - Rt. If zero, load the PC with the ALUout data; else do not load the PC.
      The other states are unchanged: 0 Fetch; 2 AdrCmp (lw+sw); 3 Load; 4 WB; 5 Store; 6 ALU (R-type); 7 WBR.
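
The two-step beq handling described on this slide can be sketched as a function that returns the next PC. The sketch assumes the PC already holds PC + 4 when decode runs, since fetch performs PC = PC + 4; `sext16` and `beq_next_pc` are illustrative names.

```python
def sext16(imm):
    """Sign-extend a 16-bit immediate."""
    return imm - 0x10000 if imm & 0x8000 else imm


def beq_next_pc(pc_plus_4, a, b, imm16):
    """Return the PC value after a beq, following states 1 and 8."""
    # State 1 (Decode): ALUout = PC + (sext(imm) << 2), computed speculatively.
    # Assumption: PC was already incremented by 4 during fetch.
    alu_out = (pc_plus_4 + (sext16(imm16) << 2)) & 0xFFFFFFFF
    # State 8 (Branch): Rs - Rt is computed only to produce the zero signal.
    zero = ((a - b) & 0xFFFFFFFF) == 0
    # If zero, the PC is loaded from ALUout; otherwise it is not loaded.
    return alu_out if zero else pc_plus_4


print(hex(beq_next_pc(0x400004, 3, 3, 0xFFFF)))  # taken, offset -1 word
print(hex(beq_next_pc(0x400004, 3, 4, 0xFFFF)))  # not taken
```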

  44. A multi-cycle CPU capable of R-type & lw/sw & branch instructions.
      [Figure: PC, instruction & data memory, IR, register file (IR[25:21]=Rs, IR[20:16]=Rt, Rd), A, B, ALU with a constant 4 input, ALUout, sign extension of IR[15:0] from 16 to 32 bits, << 2 shifter; the PC inputs are labeled PC+4 and Branch Address.]

  45. Adding the instruction j to the state diagram:
      • 0 Fetch; 1 Decode
      • j: 9 Jump: PC = PC[31:28] || (IR[25:0] << 2)
      • lw+sw: 2 AdrCmp; lw: 3 Load, 4 WB; sw: 5 Store
      • beq: 8 Branch
      • R-type: 6 ALU; 7 WBR
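
The jump address on this slide is a concatenation of the upper four PC bits with the 26-bit field shifted left by 2. A minimal sketch with masks and shifts follows; `jump_target` is an illustrative name, and the example assumes the standard MIPS opcode 2 for `j`.

```python
def jump_target(pc, ir):
    """PC = PC[31:28] || (IR[25:0] << 2)."""
    upper = pc & 0xF0000000             # PC[31:28]
    lower = (ir & 0x03FFFFFF) << 2      # IR[25:0] shifted left by 2 (28 bits)
    return upper | lower


j_word = (2 << 26) | (0x400000 >> 2)    # j 0x400000
print(hex(jump_target(0x400004, j_word)))
```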

  46. A multi-cycle CPU capable of R-type & lw/sw & branch & jump instructions.
      [Figure: PC, instruction & data memory, IR, register file (IR[25:21]=Rs, IR[20:16]=Rt, Rd), A, B, ALU with a constant 4 input, ALUout, sign extension of IR[15:0] from 16 to 32 bits, << 2 shifters. The PC inputs are PC+4 (next address), the Branch Address, and the Jump address formed from IR[25:0] << 2 plus PC[31:28].]

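  47. Summary of the stages of the different instructions.
      [Figure: the state diagram with states 0 to 9. States 0 and 1 are shared by all instructions, followed by 6 -> 7 (R-type), 2 -> 3 -> 4 (lw), 2 -> 5 (sw), 8 (beq) or 9 (j).]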
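
Counting the states each instruction passes through gives its cycle count, and weighting by an instruction mix gives an average CPI. The sketch below combines the state paths from the state diagrams with the mix from slide 3; the slides themselves do not state the resulting number, so treat it as an illustration. The names are hypothetical.

```python
from fractions import Fraction

# State paths read off the state diagrams (one clock cycle per state).
STATE_PATHS = {
    "R-type": [0, 1, 6, 7],
    "lw": [0, 1, 2, 3, 4],
    "sw": [0, 1, 2, 5],
    "beq": [0, 1, 8],
    "j": [0, 1, 9],
}

# Instruction mix in percent, taken from slide 3.
MIX_PERCENT = {"R-type": 44, "lw": 24, "sw": 12, "beq": 18, "j": 2}


def average_cpi(paths, mix):
    """Average cycles per instruction for the given mix."""
    return sum(Fraction(len(paths[k]) * mix[k], 100) for k in paths)


cpi = average_cpi(STATE_PATHS, MIX_PERCENT)
print(float(cpi))
```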

  48. MultiCycle implementation with Control

  49. Final State Machine

  50. The final state diagram:
      • 0 Fetch; 1 Decode
      • j: 9 Jump
      • lw+sw: 2 AdrCmp; lw: 3 Load, 4 WB; sw: 5 Store
      • beq: 8 Branch
      • R-type: 6 ALU; 7 WBR
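
The final state diagram amounts to a next-state function of the current state and the instruction class. One way to sketch it in Python is shown below, assuming every final state returns to Fetch; the constant and function names are illustrative.

```python
# State numbers as used in the slides.
FETCH, DECODE, ADRCMP, LOAD, WB, STORE, ALU, WBR, BRANCH, JUMP = range(10)

# Where Decode goes, by instruction class.
AFTER_DECODE = {"R-type": ALU, "lw": ADRCMP, "sw": ADRCMP, "beq": BRANCH, "j": JUMP}


def next_state(state, instr_class):
    """Next state of the control FSM."""
    if state == FETCH:
        return DECODE
    if state == DECODE:
        return AFTER_DECODE[instr_class]
    if state == ADRCMP:
        return LOAD if instr_class == "lw" else STORE
    if state == LOAD:
        return WB
    if state == ALU:
        return WBR
    return FETCH  # assumption: WB, STORE, WBR, BRANCH and JUMP return to Fetch


def path(instr_class):
    """States visited by one instruction, starting at Fetch."""
    states, state = [FETCH], next_state(FETCH, instr_class)
    while state != FETCH:
        states.append(state)
        state = next_state(state, instr_class)
    return states


for name in AFTER_DECODE:
    print(name, path(name))
```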
