CSC 3210 Computer Organization and Programming

Chapter 2: SPARC Architecture
Dr. Anu Bourgeois

Introduction
• SPARC is a load/store architecture
• Registers are used for all arithmetic and logical operations
• 32 registers are available at a time
• Only load and store instructions access memory

Registers
• Registers are accessed directly for rapid computation
• 32 registers, divided into 4 sets:
-- Global: %g0 - %g7
-- Out: %o0 - %o7
-- In: %i0 - %i7
-- Local: %l0 - %l7
• %g0 always reads as 0
• %o6, %o7, %i6, %i7: do not use (they hold the stack pointer, the frame pointer, and return addresses)
• Register size = 32 bits each

Table of Registers

SPARC Assembler
• The SPARC assembler, as, is a 2-pass assembler
• First pass:
-- Updates the location counter without paying attention to undefined labels used as operands
-- Assigns each label symbol the current value of the location counter
• Second pass:
-- Substitutes values for the labels used as operands
-- Ignores labels followed by colons (they were already defined in the first pass)

Assembly Language Programs
• Programs are line based
• Use mnemonics, which generate machine code upon assembling
• Statements may be labeled
• Comments: ! or /* … */
    /* instructions to add and to subtract the contents of %o0 and %o1 */
    start: add %o0, %o1, %l0 !l0=o0+o1
           sub %o0, %o1, %l1 !l1=o0-o1

Pseudo-ops
• Statements that do not generate machine code
• e.g. data definitions, statements that provide information to the assembler
• Generally start with a period
    a: .word 3
• Can be labeled
    .global main
    main:

Compiling Code: 2-step process
• The C compiler calls as and produces the object files
• Object files are the machine code
• Next, it calls the linker to combine the .o files with library routines to produce the executable program, a.out

Compiling a C program
    % gcc -S program.c
• Produces the .s assembly language file
    % gcc expr.s -o expr
• Assembles the program and produces the executable file

Start of Execution
• The C compiler expects execution to start at an address labeled main
• The label must be at the first statement to execute and must be declared global
    .global main
    main: save %sp, -96, %sp
• The save instruction provides space to save registers

Macros
• If we have macros defined, then the program should be a .m file
• We can expand the macros to produce a .s file by running m4 first
    % m4 expr.m > expr.s
    % gcc expr.s -o expr

SPARC Instructions
• 3 operands: 2 source operands and 1 destination operand
• Source registers are unchanged
• Result is stored in the destination register
• Constants: -4096 ≤ c < 4096
    op reg_rs1, reg_rs2, reg_rd
    op reg_rs1, imm, reg_rd
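The constant range above is what a signed 13-bit immediate field can hold. As a minimal sketch (the helper name fits_simm13 is made up for illustration and is not part of any SPARC tool), the check can be written in C as:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when c satisfies -4096 <= c < 4096,
   the range the slide gives for an immediate operand. */
bool fits_simm13(int32_t c)
{
    return c >= -4096 && c < 4096;
}
```

A constant outside this range cannot be written directly as the imm operand of a single instruction.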

Sample Instructions
    clr reg_rd
• Clears a register to zero
    mov reg_or_imm, reg_rd
• Copies content of source to destination
    add reg_rs1, reg_or_imm, reg_rd
• Adds: oper1 + oper2 → destination
    sub reg_rs1, reg_or_imm, reg_rd
• Subtracts: oper1 - oper2 → destination

Multiply and Divide
• No instruction available in SPARC
• Use a function call instead
• Must use %o0 and %o1 for the sources; %o0 holds the result
• a = b * c:
    mov b, %o0
    mov c, %o1
    call .mul
• a = b ÷ c:
    mov b, %o0
    mov c, %o1
    call .div

Instruction Cycle
• The instruction cycle is broken into 4 stages:
-- Instruction fetch: fetch and decode the instruction, obtain any operands from the register file, update the PC
-- Execute: execute an arithmetic instruction, compute a branch target address, or compute the memory address for a load or store instruction
-- Memory access: access memory for a load or store instruction
-- Store results: write instruction results back to the register file

Pipelining
• SPARC is a RISC machine: the goal is to complete one instruction per cycle
• Overlap the stages of different instructions to achieve parallel execution
• Can obtain a speedup of up to a factor of 4
• Hardware does not have to run 4 times faster: break the h/w into 4 parts that run concurrently

Laundry Analogy: wash / dry / iron
• Wash 3 loads sequentially (use only one m/c at a time):
    W D I W D I W D I
• Wash 3 loads in pipeline fashion (use multiple m/c at a time):
    W D I
      W D I
        W D I

Pipelining
• Sequential: each h/w stage is idle 75% of the time
    time_ex = 4 * i
• Parallel: each h/w stage is working once the pipeline has filled
    time_ex = 3 + i
• Example: for 100 instructions, 400 vs 103 cycles
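The two timing formulas can be checked with a short C sketch. The function names are illustrative only, and the model assumes an ideal pipeline with no stalls, which is the same assumption the slide's formulas make:

```c
#include <assert.h>

/* Cycles when each instruction passes through every stage
   before the next one starts: stages * n (4 * i on the slide). */
unsigned long sequential_cycles(unsigned long n, unsigned long stages)
{
    return stages * n;
}

/* Cycles for an ideal pipeline: (stages - 1) cycles to fill it,
   then one instruction completes per cycle (3 + i on the slide). */
unsigned long pipelined_cycles(unsigned long n, unsigned long stages)
{
    assert(stages > 0);   /* avoids unsigned wrap-around in stages - 1 */
    return (stages - 1) + n;
}
```

With 4 stages and 100 instructions these give 400 and 103 cycles, matching the example. As the instruction count grows, the ratio approaches the stage count.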

Data Dependencies: Load Delay Problem
    ld [%o0], %o1
    add %o1, %o2, %o2
• The add needs the value of %o1 before the preceding load has brought it in from memory

Branch Delay Problem
• The branch target address is not available until after execution of the branch instruction
• Insert a branch delay slot instruction

Branch Delays
• Try to place a useful instruction after the branch; a nop can also be used
• The instruction following a branch instruction will always be fetched
• Updating the PC determines which instruction to fetch next

Actual SPARC Code: expr.m

Expanding Macros
• After running through m4:
    % m4 expr.m > expr.s
• Produce executable:
    % gcc expr.s -o expr
• Execute file:
    % ./expr

The Debugger: gdb
• Used to verify correctness and find bugs
• Can also execute a program, stop execution at any point, and single-step execution
• After assembling the program and placing the output into expr, launch gdb:
    % gdb expr
• To run code in gdb, type "r":
    (gdb) r

gdb Commands
• Breakpoints can be set at any address to stop execution in order to check the status of the program and registers
• To set a breakpoint at a label:
    (gdb) b main
    Breakpoint 1 at 0x106a8
    (gdb)
• Typing "c" continues execution until it reaches the next breakpoint or the end of the code
• Can print the contents of a register:
    (gdb) p $l1
    $2 = -8
    (gdb)
• The best way to learn is by practice

Filling Delay Slots
• The call instruction is a delayed control transfer instruction: it changes the address from which future instructions will be fetched
• The instruction that follows it is called the delayed instruction, and is located in the delay slot
• The delayed instruction is executed before the first instruction at the branch/call target
• Using a nop for the delay slot still wastes a cycle
• Instead, we may be able to move the instruction prior to the branch instruction into the delay slot
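The delayed transfer can be modeled with two counters: the address of the instruction being executed (PC) and the address of the one to execute next (nPC). SPARC keeps such a PC/nPC pair. Everything else in the C sketch below (the three toy opcodes, the struct, the function name) is invented for illustration and is not SPARC's instruction set:

```c
#include <assert.h>
#include <stddef.h>

/* Toy opcodes, hypothetical: ordinary work, a jump, and a stop marker. */
enum toy_op { TOY_WORK, TOY_JUMP, TOY_STOP };

struct toy_insn {
    enum toy_op op;
    size_t target;   /* used only by TOY_JUMP */
};

/* Runs prog with a PC/nPC pair and records the index of every
   instruction executed. Returns the number of entries written to trace. */
size_t run_with_delay_slot(const struct toy_insn *prog, size_t len,
                           size_t *trace, size_t trace_cap)
{
    size_t pc = 0, npc = 1, count = 0;

    assert(prog != NULL && trace != NULL);
    while (pc < len && count < trace_cap) {
        const struct toy_insn *cur = &prog[pc];

        trace[count++] = pc;
        if (cur->op == TOY_STOP)
            break;
        /* A jump only redirects nPC, so the instruction already lined
           up behind it (the delay slot) still executes next. */
        size_t next_npc = (cur->op == TOY_JUMP) ? cur->target : npc + 1;
        pc = npc;
        npc = next_npc;
    }
    return count;
}
```

For a program whose instruction 0 jumps to instruction 3, the recorded order is 0, 1, 3: the delay-slot instruction at index 1 runs before the target, and index 2 is skipped.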

Filling Delay Slots
• Move the sub instructions into the delay slots to eliminate nop instructions
    .global main
    main: save %sp, -96, %sp
    mov 9, %l0 !initialize x
    sub %l0, 1, %o0 !(x - 1) into %o0
    call .mul
    sub %l0, 7, %o1 !(x - 7) into %o1
    call .div
    sub %l0, 11, %o1 !(x - 11) into %o1, the divisor
    mov %o0, %l1 !store it in y
    mov 1, %g1 !trap dispatch
    ta 0 !trap to system

Filling Delay Slots
• Executing the mov instruction while fetching the sub instruction (same listing as on the previous slide)
    EXECUTE → mov 9, %l0 !initialize x
    FETCH → sub %l0, 1, %o0 !(x - 1) into %o0

Filling Delay Slots
• Now executing the sub instruction while fetching the call instruction
    EXECUTE → sub %l0, 1, %o0 !(x - 1) into %o0
    FETCH → call .mul

Filling Delay Slots
• Now executing the call instruction while fetching the sub instruction
    EXECUTE → call .mul
    FETCH → sub %l0, 7, %o1 !(x - 7) into %o1
• Execution of call updates the PC to fetch from the mul routine, but since sub was already fetched, it is executed before any instruction from the mul routine

Filling Delay Slots
• Now executing the sub instruction while fetching from the mul routine
    EXECUTE → sub %l0, 7, %o1 !(x - 7) into %o1
    FETCH → .mul: save …..

Filling Delay Slots
• Now executing the save instruction while fetching the next instruction from the mul routine
    EXECUTE → .mul: save …..
    FETCH → the instruction after save in the mul routine

Filling Delay Slots
• While executing the last instruction of the mul routine, control comes back to main and fetches the call .div instruction
    EXECUTE → last instruction of the mul routine
    FETCH → call .div
• At this point %o0 has the result from the multiply routine; this is the first operand for the divide routine
• The subtract instruction in the delay slot of call .div computes the 2nd operand before execution of the divide routine starts

Branching Instructions
• Delayed control transfer instructions
• The branch target address is not available until after execution of the branch instruction
• Can be conditional or unconditional
• If conditional, the branch tests the condition codes (flags)
• The operand of the instruction is the target label
    b<icc> label
• Here <icc> stands for the condition tested, e.g. bl, bg, ble; ba branches always

Testing
• Able to test information about a previous result
• The state of execution is saved in 4 integer condition codes, or flags:
-- Z: whether the result was zero
-- N: whether the result was negative
-- V: whether the operation caused an overflow
-- C: whether the operation generated a carry out

Updating Condition Codes
• Certain instructions can be modified to update the condition codes
• Append "cc" to the instruction to save the state of execution
    addcc reg_rs1, reg_or_imm, reg_rd
    subcc reg_rs1, reg_or_imm, reg_rd
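To make the four flags concrete, here is a sketch in C of the values a 32-bit subcc would leave in N, Z, V, and C, along with the signed less-than test that bl performs (N xor V). The struct and function names are illustrative. For subtraction, the carry flag is modeled as a borrow, set when the first operand is smaller as an unsigned number:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative container for the four integer condition codes. */
struct icc {
    bool n, z, v, c;
};

/* Flags for a 32-bit subtraction a - b, as subcc would set them. */
struct icc subcc_flags(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;   /* unsigned arithmetic wraps modulo 2^32 */
    struct icc f;

    f.n = (r >> 31) != 0;                     /* sign bit of the result */
    f.z = (r == 0);
    /* Overflow: operand signs differ and the result's sign differs from a's. */
    f.v = (((a ^ b) & (a ^ r)) >> 31) != 0;
    f.c = a < b;                              /* borrow out of bit 31 */
    return f;
}

/* bl (branch on less, signed) is taken when N xor V is set. */
bool bl_taken(struct icc f)
{
    return f.n != f.v;
}
```

For example, subtracting 11 from 5 sets N and C and clears Z and V, so bl is taken. Subtracting 11 from 11 sets only Z, so bl falls through.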

Using Branches
• We can extend our program to compute y for an input range of 0 ≤ x ≤ 10
• First consider a C program, which we will then translate to assembly
• A do-while loop is used
    main ()
    {
        int x, y;
        x = 0;
        do {
            y = ((x-1)*(x-7))/(x-11);
            x++;
        } while (x < 11);
    }
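To see the values the loop produces, the expression can be factored into a helper and tabulated. This is a sketch only: compute_y and tabulate_y are illustrative names, and the division follows C's integer division rules, which truncate toward zero.

```c
#include <assert.h>

/* y = ((x-1)*(x-7))/(x-11), the expression from the loop body.
   x must not be 11, which would divide by zero. */
int compute_y(int x)
{
    assert(x != 11);
    return ((x - 1) * (x - 7)) / (x - 11);
}

/* Fills out[0..10] with y for x = 0..10, using the same do-while shape. */
void tabulate_y(int out[11])
{
    int x = 0;

    do {
        out[x] = compute_y(x);
        x++;
    } while (x < 11);
}
```

For x = 9 the expression is (8 * 2) / (-2) = -8. The loop stops at x = 10, one step short of the value that would make the divisor zero.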

define(x_r, l0)
define(y_r, l1)
    .global main
    main: save %sp, -96, %sp
    clr %x_r !initialize x
    loop: sub %x_r, 1, %o0 !(x-1)
    call .mul
    sub %x_r, 7, %o1 !(x-7)
    call .div
    sub %x_r, 11, %o1 !(x-11)
    mov %o0, %y_r !store result
    add %x_r, 1, %x_r !x++
    subcc %x_r, 11, %g0 !set condition codes
    bl loop
    nop
    mov 1, %g1
    ta 0
• No nop after the calls: sub fills each delay slot
• nop fills the delay slot after the branch
• m4 recognizes a macro call only when the opening parenthesis immediately follows the name, so write define(x_r, l0) with no space

Can we use subcc to fill the delay slot? (same listing as on the previous slide)
    subcc %x_r, 11, %g0 !set condition codes
    bl loop
    nop
• No: bl tests the condition codes that subcc sets, so subcc must execute before the branch

Can we use add to fill the delay slot? (same listing)
    add %x_r, 1, %x_r !x++
    subcc %x_r, 11, %g0 !set condition codes
    bl loop
    nop
• No: subcc compares the incremented value of x, so add must execute before subcc

Can we use mov to fill the delay slot? (same listing)
    mov %o0, %y_r !store result
    add %x_r, 1, %x_r !x++
    subcc %x_r, 11, %g0 !set condition codes
    bl loop
    nop
• Yes: neither add nor subcc reads or writes %o0 or %y_r, so mov can be moved into the delay slot

Revised code with nops removed
define(x_r, l0)
define(y_r, l1)
    .global main
    main: save %sp, -96, %sp
    clr %x_r !initialize x
    loop: sub %x_r, 1, %o0 !(x-1)
    call .mul
    sub %x_r, 7, %o1 !(x-7)
    call .div
    sub %x_r, 11, %o1 !(x-11)
    add %x_r, 1, %x_r !x++
    subcc %x_r, 11, %g0 !set condition codes
    bl loop
    mov %o0, %y_r !store result (fills the delay slot)
    mov 1, %g1
    ta 0

Synthetic Instructions
• Easier for us to read and understand logically
• The assembler automatically substitutes the underlying instruction for each synthetic instruction
• Similar to the way macros are expanded

Example Synthetic Instructions
• cmp reg_rs1, reg_or_imm
  same as subcc reg_rs1, reg_or_imm, %g0
• inc reg
  same as add reg, 1, reg

Update to Previous Code
• Original code without synthetic instructions:
    add %x_r, 1, %x_r
    subcc %x_r, 11, %g0
    bl loop
    mov %o0, %y_r
• Code with synthetic instructions:
    inc %x_r
    cmp %x_r, 11
    bl loop
    mov %o0, %y_r
• Note: the .m program will include the synthetic instructions, but the machine code will correspond to the substituted instructions in the original version

Control Statements
• When writing assembly programs, always first write pseudo-code, then translate
• Consider the following control structures first in C and then map them to assembly:
-- do while loop
-- while loop
-- for loop
-- if-then / if-then-else

While: check condition first
    while (a <= 17) {
        a = a + b;
        c++;
    }

    test: !target/label to branch back
    cmp %a_r, 17 !check a relative to 17, and set condition codes
    bg done !reverse logic of test
    nop !delay slot
    add %a_r, %b_r, %a_r
    inc %c_r
    ba test !branch back to test
    nop !delay slot
    done: !whatever follows while loop
• Now see if we can optimize the code…

Optimizing While Loop
• Currently:
-- We test at the start of the loop to see if we should enter the body of the loop
-- Then we have a ba with a nop at the end of the loop
-- This brings us back to the start to test again
• Revision:
-- We do have to test before entering the first time, but after that we can just test at the end to see if we should go back into the loop
-- This takes one branch/nop pair out of the body of the loop

Revised While Loop
    test: !initial test
    cmp %a_r, 17 !check a relative to 17, and set condition codes
    bg done !never enters loop if true
    nop !delay slot
    loop: add %a_r, %b_r, %a_r
    inc %c_r
    cmp %a_r, 17 !check condition again
    ble loop !branch back into loop
    nop !delay slot
    done:
• Notice it is ble, not bg, now
• The branch at the bottom targets loop, not test, so the initial test is not repeated on every iteration
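In C terms, the revision turns the while loop into an if guarding a do-while. The sketch below places the two forms side by side so they can be compared. The struct and function names are illustrative, and b is assumed to be positive so that both loops terminate:

```c
#include <assert.h>

/* Illustrative pair of values that the loop modifies. */
struct loop_state {
    int a, c;
};

/* Original form: the condition is tested at the top on every iteration. */
struct loop_state while_top_test(int a, int b, int c)
{
    assert(b > 0);   /* assumption: guarantees termination */
    while (a <= 17) {
        a = a + b;
        c++;
    }
    struct loop_state s = { a, c };
    return s;
}

/* Revised form: one guard test on entry, then the test moves to the bottom. */
struct loop_state while_bottom_test(int a, int b, int c)
{
    assert(b > 0);   /* assumption: guarantees termination */
    if (a <= 17) {
        do {
            a = a + b;
            c++;
        } while (a <= 17);
    }
    struct loop_state s = { a, c };
    return s;
}
```

Both forms run the body the same number of times. The second needs only one branch per iteration, which is what removes the ba/nop pair from the assembly version.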
