
Exam 1 Review

CprE 381 Computer Organization and Assembly Level Programming, Fall 2013. Exam 1 Review. Dr. Zhao Zhang, Iowa State University. What We Have Learned. Ch. 1: Computer Abstraction and Technology: technology trends; CPU performance; instruction count, CPI, and cycle time.


Presentation Transcript


  1. CprE 381 Computer Organization and Assembly Level Programming, Fall 2013. Exam 1 Review. Dr. Zhao Zhang, Iowa State University

  2. What We Have Learned
     • Ch. 1: Computer Abstraction and Technology
       • Technology trends
       • CPU performance
         • Instruction count, CPI, and cycle time
       • Processor power efficiency
       • Processor manufacturing and cost
     Chapter 1 — Computer Abstractions and Technology — 2

  3. Question Styles and Coverage
     • Short conceptual questions
     • Calculation questions
       • Performance improvement (speedup)
       • Power rate and energy saving
       • CPU time, CPI, instruction count (IC), cycle time (CT)
           CPU time = # Cycles × CT = IC × CPI × CT
           Speedup = Old Time / New Time
     • The coverage excludes
       • Manufacturing and cost

  4. Question 1
     • A MIPS processor runs at 1.0 GHz, and for a given benchmark program its CPI is 1.5. A design optimization will raise the clock rate to 1.5 GHz and increase the CPI to 1.8. What is the speedup from the optimization?
     Instruction count remains the same.
     Clock rate change: 1.5/1.0 = 1.5x, so the cycle time improvement factor is 1.50x.
     CPI change: 1.8/1.5 = 1.2x, so the improvement factor is 1/1.2 = 0.83x (a degradation).
     Overall performance improvement: 1.50 × 0.83 = 1.25x.
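The arithmetic in Question 1 can be checked with a short Python sketch of CPU time = IC × CPI × CT. The function names and the instruction count below are hypothetical and chosen only for illustration; since the instruction count is the same before and after, any value gives the same speedup.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time in seconds: IC x CPI x CT, where CT = 1 / clock rate."""
    return instruction_count * cpi / clock_rate_hz


def speedup(old_time, new_time):
    """Speedup = Old Time / New Time."""
    return old_time / new_time


IC = 1_000_000  # hypothetical instruction count; it cancels in the ratio

old_time = cpu_time(IC, cpi=1.5, clock_rate_hz=1.0e9)  # before the optimization
new_time = cpu_time(IC, cpi=1.8, clock_rate_hz=1.5e9)  # after the optimization
q1_speedup = speedup(old_time, new_time)

print(round(q1_speedup, 2))  # 1.25
```

Computing both CPU times and dividing avoids rounding the intermediate 0.83 factor.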

  5. Question 2
     A processor spends 60% of its time on load/store instructions. A new design improves load/store performance by 2.0 times. What is the overall performance improvement?
     Amdahl’s Law: Speedup = 1/((1-f) + f/s)
       f: fraction of time that the optimization applies to
       s: improvement factor of the optimization
     Speedup = 1/(0.4 + 0.6/2.0) = 1/0.7 = 1.43
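As a quick check of Question 2, here is a minimal Python sketch of Amdahl's Law. The function name `amdahl_speedup` is an illustrative choice, not something from the slides.

```python
def amdahl_speedup(f, s):
    """Overall speedup when a fraction f of the time is improved by a factor s."""
    return 1.0 / ((1.0 - f) + f / s)


q2_speedup = amdahl_speedup(f=0.6, s=2.0)
print(round(q2_speedup, 2))  # 1.43

# Even an arbitrarily fast load/store unit cannot beat 1 / (1 - f) = 2.5x here.
print(round(amdahl_speedup(f=0.6, s=1e12), 2))  # 2.5
```

The second call shows the upper bound set by the 40% of time that the optimization does not touch.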

  6. What We Have Learned
     • Ch. 2, Instructions: Language of the Computer
       • Instruction set architecture
       • MIPS binary instruction format
       • Plus floating-point instructions

  7. Question 3
     Translate the following C statement into MIPS. Variables f, g, h are global and located at 100($gp), 104($gp), and 108($gp), respectively.
         extern int f, g, h;
         f = g + 4 * h;
     Try to predict how many instructions you will need.

  8. Question 3
         # Load g, load h, multiply, add, store
         lw   $t0, 104($gp)    # load g
         lw   $t1, 108($gp)    # load h
         sll  $t1, $t1, 2      # 4*h
         add  $t0, $t0, $t1    # g+4*h
         sw   $t0, 100($gp)    # store f

  9. Exam Strategy
     In your exam, write comments with the MIPS code
     • It helps you write the code
     • It helps the grader understand your code
     • You may get more partial credit
       • In case your code is not 100% correct

  10. Load and Store
      • Three factors: address, size, and extension
      • Load/store word: lw, sw
      • Half word: lh, lhu, sh
      • Byte: lb, lbu, sb
      • Choose sign extension or zero extension when loading a half word or a byte
      • Floating-point load and store
        • Single precision: lwc1, swc1
        • Double precision: ldc1, sdc1
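The difference between sign extension (lb, lh) and zero extension (lbu, lhu) can be illustrated with a small Python model. The helper names are hypothetical; the sketch only mimics how the loaded bits are widened to a register value and is not a simulator of the load instructions themselves.

```python
def zero_extend(value, bits):
    """Model lbu/lhu: keep the low `bits` bits and fill the upper bits with 0."""
    return value & ((1 << bits) - 1)


def sign_extend(value, bits):
    """Model lb/lh: copy the top bit of the `bits`-bit value into the upper bits."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


print(zero_extend(0xFF, 8))     # 255     (what lbu would produce for byte 0xFF)
print(sign_extend(0xFF, 8))     # -1      (what lb would produce for byte 0xFF)
print(zero_extend(0x8001, 16))  # 32769   (lhu)
print(sign_extend(0x8001, 16))  # -32767  (lh)
```

For values whose top bit is clear, both forms agree, which is why the choice only matters for unsigned versus signed C types.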

  11. Array Access
      • Load from an array element
          extern unsigned short X[];
          h = X[i];
        Assume h in $s2, X in $s0, i in $s1.
          sll  $t0, $s1, 1      # $t0 = i*2
          add  $t0, $s0, $t0    # $t0 = &X[i]
          lhu  $s2, 0($t0)      # h = X[i]

  12. Array Access
      • Store to an array element
          extern int Y[];
          Y[j] = g;
        Assume g in $s2, Y in $s0, j in $s1.
          sll  $t0, $s1, 2      # $t0 = j*4
          add  $t0, $s0, $t0    # $t0 = &Y[j]
          sw   $s2, 0($t0)      # Y[j] = g

  13. Array Access
      • Load and store floating-point numbers
          extern double X[], Y[];
          Y[i] = X[i];
        Assume i in $s0, X in $a0, Y in $a1.
          sll  $t0, $s0, 3      # $t0 = 8*i (byte offset, kept for both arrays)
          add  $t1, $a0, $t0    # $t1 = &X[i]
          ldc1 $f0, 0($t1)      # $f0:$f1 = X[i]
          add  $t1, $a1, $t0    # $t1 = &Y[i]
          sdc1 $f0, 0($t1)      # Y[i] = $f0:$f1
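The three array-access slides share one pattern: shift the index left by log2(element size), then add the base address. A hedged Python sketch of that address calculation follows; the function name and the base address 0x10010000 are hypothetical values picked for illustration.

```python
def element_address(base, index, elem_size):
    """Address of array[index]: base + (index << log2(elem_size)), as sll + add would compute it."""
    shift = elem_size.bit_length() - 1
    if elem_size <= 0 or (1 << shift) != elem_size:
        raise ValueError("element size must be a power of two to use a shift")
    return (base + (index << shift)) & 0xFFFFFFFF  # keep the result to 32 bits


BASE = 0x10010000  # hypothetical base address of the array

print(hex(element_address(BASE, 3, 2)))  # 0x10010006  (short:  sll by 1)
print(hex(element_address(BASE, 3, 4)))  # 0x1001000c  (int:    sll by 2)
print(hex(element_address(BASE, 3, 8)))  # 0x10010018  (double: sll by 3)
```

Because the scaled offset depends only on the index and element size, the same offset can be reused for two arrays of the same element type.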

  14. 16-bit and 32-bit Constants
      • Load a 16-bit immediate
          f = 0x1000;                 // f in $s0
          addi $s0, $zero, 0x1000
      • Load a 32-bit immediate
          f = 0xFFFF1000;
          lui  $s0, 0xFFFF
          ori  $s0, $s0, 0x1000
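To see why the 32-bit case pairs lui with ori, here is a small Python model of the three instructions on 32-bit values. The helper names mirror the mnemonics but are only illustrative models, and the constant 0x12348000 is a hypothetical example chosen because its lower half has the top bit set.

```python
MASK32 = 0xFFFFFFFF


def lui(imm16):
    """Place the 16-bit immediate in the upper half; the lower half becomes 0."""
    return (imm16 & 0xFFFF) << 16


def ori(reg, imm16):
    """OR with a zero-extended 16-bit immediate."""
    return (reg | (imm16 & 0xFFFF)) & MASK32


def addi(reg, imm16):
    """Add a sign-extended 16-bit immediate (overflow trap not modeled)."""
    imm = imm16 & 0xFFFF
    if imm & 0x8000:
        imm -= 0x10000
    return (reg + imm) & MASK32


print(hex(ori(lui(0xFFFF), 0x1000)))   # 0xffff1000, the constant from the slide
print(hex(ori(lui(0x1234), 0x8000)))   # 0x12348000, as intended
print(hex(addi(lui(0x1234), 0x8000)))  # 0x12338000, sign extension corrupts the upper half
```

Since ori zero-extends its immediate, it fills the lower half without disturbing the upper half that lui set.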

  15. Pointer Access
      • Pointer access
          int h, *p;
        Assume h in $t0, p in $s0.
          h = *p;
          lw   $t0, 0($s0)      # h = *p
          *p = h;
          sw   $t0, 0($s0)      # *p = h

  16. Branches
      • Only two branches in the original MIPS
          beq  rs, rt, label
          bne  rs, rt, label
      • Branch if true/non-zero
          bne  rs, $zero, label
      • Branch if false/zero
          beq  rs, $zero, label

  17. If Statement
      • Evaluate condition, branch if false
          if (a < 0) a = -a;
        Assume a in $s0.
          slt  $t0, $s0, $zero       # a < 0?
          beq  $t0, $zero, endif     # false? skip
          sub  $s0, $zero, $s0       # a = -a
        endif:

  18. If-else Structure
      • Evaluate condition, branch if false
          if (a > b) max = a; else max = b;
        Assume max in $s2, a in $s0, b in $s1.
          slt  $t0, $s1, $s0         # b < a
          beq  $t0, $zero, else      # false?
          add  $s2, $s0, $zero       # max = a
          j    endif
        else:
          add  $s2, $s1, $zero       # max = b
        endif:
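The "evaluate condition, branch if false" pattern can be traced in a short Python sketch. The `slt` helper models the set-on-less-than instruction, and the two functions follow the control flow of the if and if-else examples step by step; all names here are illustrative, not part of the slides.

```python
def slt(rs, rt):
    """Model of slt: 1 if rs < rt (signed comparison), else 0."""
    return 1 if rs < rt else 0


def abs_via_slt(a):
    """if (a < 0) a = -a;"""
    t0 = slt(a, 0)       # a < 0?
    if t0 != 0:          # beq $t0, $zero, endif skips this when t0 == 0
        a = 0 - a        # sub: a = -a
    return a


def max_via_slt(a, b):
    """if (a > b) max = a; else max = b;"""
    t0 = slt(b, a)       # a > b is tested as b < a, since only slt is available
    if t0 == 0:          # beq $t0, $zero, else
        return b         # else: max = b
    return a             # max = a


print(abs_via_slt(-7), max_via_slt(3, 5), max_via_slt(5, 3))  # 7 5 5
```

Note how `a > b` is rewritten as `b < a` by swapping the operands of slt.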

  19. FOR Loop
      [Figure: two panels, "Control and Data Flow Graph" and "Linear Code Layout". Graph nodes: Init-expr, For-body, Incr-expr, Cond (T/F). Linear layout order: (optional prologue), Init-expr, Jump to the test, For-body, Incr-expr, Test cond, Branch if true, (optional epilogue).]

  20. Function with For-loop
      Translate the following C function into MIPS
          short checksum(short X[], int N)
          {
            int i;
            short checksum = 0;
            for (i = 0; i < N; i++)
              checksum = checksum ^ X[i];
            return checksum;
          }

  21. Function with For-loop
        checksum:
          # X=>$a0, N=>$a1, i=>$t0, checksum=>$v0
          addi $v0, $zero, 0         # checksum = 0
          addi $t0, $zero, 0         # i = 0
          j    loop_cond
        loop:
          sll  $t1, $t0, 1           # i*2
          add  $t1, $a0, $t1         # &X[i]
          lh   $t1, 0($t1)           # load X[i]
          xor  $v0, $v0, $t1         # checksum ^= X[i]
          addi $t0, $t0, 1           # i++
        loop_cond:
          slt  $t1, $t0, $a1         # i < N
          bne  $t1, $zero, loop      # loop
          jr   $ra
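When checking a hand-written MIPS loop like this one, a reference model of the C function helps to predict the expected result for small inputs. The following Python sketch mirrors the C semantics, including the 16-bit `short` result; `to_int16` is a hypothetical helper introduced for this purpose.

```python
def to_int16(value):
    """Wrap an integer to a signed 16-bit value, like a C short."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def checksum(X, N):
    """XOR of the first N elements of X, kept in the range of a C short."""
    result = 0
    for i in range(N):   # the test runs before the first iteration, as in the jump-to-test layout
        result = to_int16(result ^ X[i])
    return result


print(checksum([1, 2, 3], 3))         # 0
print(checksum([0x0F0F, 0x00FF], 2))  # 4080, which is 0x0FF0
print(checksum([-1, 1], 2))           # -2
print(checksum([5, 6], 0))            # 0, the loop body never runs when N is 0
```

The last call corresponds to the initial `j loop_cond`: with N equal to 0 the condition fails immediately and the function returns the initial checksum.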

  22. Leaf and Non-Leaf Functions
      • A leaf function doesn’t call another function
        • Stack frame is not necessary
        • Prefer to use temp registers (t-registers)
      • A non-leaf function calls some other function(s)
        • Must use a stack frame, because it has to save $ra
        • Usually has to use saved registers (s-registers)

  23. Non-Leaf Function
      What is the size of the frame?
          extern short xor(short, short);
          short checksum(short X[], int N)
          {
            int i;
            short checksum = 0;
            for (i = 0; i < N; i++)
              checksum = xor(checksum, X[i]);
            return checksum;
          }

  24. Non-Leaf Function
      • X, N, i, and $ra must be preserved
      • Need a stack frame of 16 bytes
          addi $sp, $sp, -16
          sw   $ra, 12($sp)          # for return address
          sw   $s2, 8($sp)
          sw   $s1, 4($sp)
          sw   $s0, 0($sp)
          add  $s0, $a0, $zero       # $s0 = X
          add  $s1, $a1, $zero       # $s1 = N
          addi $s2, $zero, 0         # i = 0

  25. Non-Leaf Function
          …                          # function body
          lw   $s0, 0($sp)
          lw   $s1, 4($sp)
          lw   $s2, 8($sp)
          lw   $ra, 12($sp)
          addi $sp, $sp, 16
          jr   $ra

  26. Register Name and Call Convention
      [Table of register names and calling conventions; its contents are not recoverable from the transcript.]

  27. MIPS Call Convention: FP
      • The first two FP parameters in registers
        • 1st parameter in $f12 or $f12:$f13
          • A double-precision parameter takes two registers
        • 2nd FP parameter in $f14 or $f14:$f15
      • Extra parameters in stack
      • $f0 stores single-precision FP return value
      • $f0:$f1 stores double-precision FP return value
      • $f0-$f19 are FP temporary registers
      • $f20-$f31 are FP saved temporary registers

  28. FP Example: Call a Function
          extern double a, b, c;
          extern double max(double, double);
          c = max(a, b);
      • Assume a, b, c are assigned to 100($gp), 108($gp), and 116($gp)
          ldc1 $f12, 100($gp)        # $f12:$f13 = a
          ldc1 $f14, 108($gp)        # $f14:$f15 = b
          jal  max
          sdc1 $f0, 116($gp)         # c = $f0:$f1

  29. FP Instructions in MIPS
      • Single-precision arithmetic
        • add.s, sub.s, mul.s, div.s
        • e.g., add.s $f0, $f1, $f6
      • Double-precision arithmetic
        • add.d, sub.d, mul.d, div.d
        • e.g., mul.d $f4, $f4, $f6
      Chapter 3 — Arithmetic for Computers — 29

  30. FP Instructions in MIPS
      • Single- and double-precision comparison
        • c.xx.s, c.xx.d (xx is eq, lt, le, …)
        • Sets or clears FP condition-code bit
        • e.g., c.lt.s $f3, $f4
      • Branch on FP condition code true or false
        • bc1t, bc1f
        • e.g., bc1t TargetLabel
