CS 3843 Computer Organization

CS 3843 Computer Organization Prof. Qi Tian Fall 2013 http://www.cs.utsa.edu/~qitian/CS3843/

Chapter 3 Machine-Level Representations of Programs • 11/11/2013 (Monday) • Section 3.7.5 Procedure • Quiz 4

Chapter 3 Machine-Level Representations of Programs • 11/08/2013 (Friday) • Tracking a recursive procedure Section 3.7.5 • Solution is posted under Resources. • 11/06/2013 (Wednesday) • Tracking a procedure Section 3.7.4 • Tracking a recursive procedure Section 3.7.5 • Reminder: Quiz on Friday Nov. 8 • 11/04/2013 (Monday) • Section 3.7.4 Procedure • 2nd Midterm Exam on Friday Nov. 15

Chapter 3 Machine-Level Representations of Programs • 11/01/2013 (Friday) • Loop slides 89-96 • Questions on Assignment 4 • 10/30/2013 (Wednesday) • Practice Problems on Conditional Flags • Reminder: Quiz on Friday Nov. 1st • 10/28/2013 (Monday) • Jump Instructions slides 76-87 • Assignment 4 is due Nov. 4.

Chapter 3 Machine-Level Representations of Programs • The week of 10/21-10/25 • Replacement Lectures by Prof. Turgay Korkmaz and TA • Slides 52-75

Chapter 3 Machine-Level Representations of Programs • 10/18/2013 (Friday) • Shift Operations • Examples 4-8 • Note: Conference Travel Oct. 21-25. • Replacement Lectures by Prof. Turgay Korkmaz and TA • 10/16/2013 (Wednesday) • Examples 2-3 • Arithmetic and Logical Operations • Practice Problems 4 and 5 • Slides 33-42 • 10/14/2013 (Monday) • Movement Instructions • Practice Problems 2 and 3, Example 1 • Slides 26-32

Chapter 3 Machine-Level Representations of Programs • 10/11/2013 (Friday) • Operand forms • Practice Problem 1 • 10/09/2013 (Wednesday) • An introduction to Assembly Code • Read Sections 3.1-3.3 • Slides 1-16

Example of Assembly Codes
• Example 1 – sum.c
    int add(int x, int y)
    {
        int z;
        z = x + y;
        return z;
    }
• What is its assembly code? gcc -O1 -S sum.c
• Department Machines: elk01(~08).cs.utsa.edu

sum.s
• Ignore the lines that start with a period (.); they are assembler directives.
• The pushl and popl save and restore %ebp.
• In movl, the first operand is the source, and the second is the destination.
• addl adds the source to the destination and stores the result in the destination.
• %eax is used to hold the return value.
• x and y are at 8(%ebp) and 12(%ebp).
• Stack set-up and completion
        .file   "sum.c"
        .text
    .globl add
        .type   add, @function
    add:
        pushl   %ebp
        movl    %esp, %ebp
        movl    12(%ebp), %eax
        addl    8(%ebp), %eax
        popl    %ebp
        ret
        .size   add, .-add
        .ident  "GCC: (Ubuntu 4.3.3-5ubuntu4) 4.3.3"
        .section        .note.GNU-stack,"",@progbits

Assembly Code • Highly machine specific • Why study it? • Being able to read and understand it is an important skill for serious programmers. • The emphasis has shifted over the years from being able to write programs directly in assembly to being able to read and understand the code generated by compilers.

An Introduction to Assembly Language • IA32 – Intel Architecture, 32 bits • IA32 is the dominant machine language of most computers; x86-64 is its extension to run on 64-bit machines. • All the examples in this chapter relate mainly to 32-bit IA32, which is our focus. • Computers execute machine code. • Sequences of bytes encoding low-level operations. • Assembly code: a textual representation of the machine code giving the individual instructions in the program. • A compiler, e.g., the gcc C compiler, invokes an assembler and a linker to generate the executable machine code from the assembly code. • We take a close look at machine code and its human-readable representation as assembly code.

ATT versus Intel Assembly-code formats • In our presentation, we use ATT format • The default format for GCC, OBJDUMP, and the other tools • Other programming tools, including those from Microsoft as well as the documentation from Intel, use Intel format. • gcc -O1 -S -masm=intel sum.c

In the simplest assembly language model, a computer consists of: • A main memory • An array of bytes • Numbered consecutively, starting at 0. • These numbers are called memory addresses. • A program counter, or PC • Holds a memory address (the address of the next instruction to execute). • Called %eip in IA32. • A register file containing a small number of named locations. • Each location (register) can hold a fixed amount of information corresponding to the word size of the machine • Typical word size is 4 bytes (32-bit machine) • %eax, %edx, %ecx, %ebx, %esi, %edi, %esp, %ebp (8 registers) • Condition code registers • Contain information about the most recent arithmetic or logical operation. • For example, ZF (zero flag) is set if the last operation resulted in 0. • For example, SF (sign flag) is set if the last operation yielded a negative value. • A set of floating-point registers for holding floating-point data
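
As a rough illustration of the condition codes described above, the following C sketch mimics how ZF and SF could be derived from the result of a 32-bit addition. The names ToyFlags and toy_addl are hypothetical (they are not from the slides or from any library), and the real processor sets further flags that are not modeled here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for two of the IA32 condition codes. */
typedef struct {
    int zf;  /* zero flag: 1 if the result was 0 */
    int sf;  /* sign flag: 1 if the result was negative (top bit set) */
} ToyFlags;

/* Add two 32-bit values the way addl would (wrapping modulo 2^32),
   store the sum in *result, and report ZF and SF for that sum. */
static ToyFlags toy_addl(uint32_t src, uint32_t dst, uint32_t *result)
{
    ToyFlags flags;
    *result = dst + src;
    flags.zf = (*result == 0);
    flags.sf = ((*result >> 31) & 1u) != 0;
    return flags;
}
```

For instance, adding 5 to 0xFFFFFFFB (the 32-bit encoding of -5) wraps to 0, so this sketch reports ZF = 1 and SF = 0.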

Section 3.1 History of Intel Processor Line • 1972: 8008 (3.5K) - first Intel microprocessor with 8-bit words. The instruction set was designed by Datapoint Corporation, which was a leading maker of programmable CRT terminals. Datapoint was based in San Antonio, so you might say that the Intel architecture started just a few miles from here. • 1974: 8080 (4.5K) - first successful Intel microprocessor, had some 16-bit instructions. • 1978: 8086 (29K) - One of the first 16-bit microprocessors. 20-bit addresses with segmented address space. • 1979: 8088 (29K) - An 8086 with an 8-bit external bus - basis of the original IBM PC • 1980: 8087 (45K) - A floating-point coprocessor for the 8086 and 8088, formed the basis for the IEEE floating-point standard. • 1982: 80286 (134K) - basis of the IBM PC-AT and MS Windows • 1985: 80386 (275K) (also called i386 – expanded the architecture to 32 bits) - added flat address space, could run Linux. • 1989: 80486 (1.2M) - integrated the floating-point processor • 1993: Pentium (3.1M) - improved performance • 1995: PentiumPro (5.5M) - new processor design • 1997: Pentium 2 (7M) - more of the same • 1999: Pentium 3 (8.2M) - new floating-point instructions • 2000: Pentium 4 (42M) - double-precision floating point and many new instructions. • 2004: Pentium 4E (125M) - added hyperthreading • 2006: Core2 Duo (291M) - multiple cores, not hyperthreading • 2008: Core i7 Quad (781M) - multiple cores and hyperthreading • 2010: Itanium Tukwila (2B) - instruction-level parallelism • 2011: Xeon Westmere (2.6B) - 10 cores

Stack • Stack • Some region of memory • A data structure where values can be added or deleted, but only according to a “last-in, first-out” discipline • push: add data • pop: remove data

Consider the following:
    int sum(int x, int y)
    {
        return x + y;
    }
• Before the function is entered, a stack is set up with the stack pointer contained in a designated register (%esp).
• The stack grows toward low memory.
• The stack pointer points to the last item pushed on the stack.
• The values of x and y are pushed on the stack.
• The return address is also pushed on the stack.
• Assume %esp is the stack pointer and all items are 4 bytes.
• On entry to the function, the return address is at 0(%esp); the return value will be stored in %eax.
• x is at 4(%esp).
• y is at 8(%esp).
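
The stack layout described above can be sketched with a toy model in C. This is only an illustration under stated assumptions: ToyStack and the toy_* functions are hypothetical names, memory is modeled as an array of 4-byte words, and the "return address" is just a placeholder number.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_WORDS = 16 };

/* Toy model: the stack is an array of 4-byte words and esp is a word index.
   A lower index stands for a lower memory address. */
typedef struct {
    int32_t words[STACK_WORDS];
    int esp;  /* index of the last item pushed */
} ToyStack;

/* Empty stack: esp sits just past the highest word. */
static void toy_init(ToyStack *s) { s->esp = STACK_WORDS; }

/* push: move the stack pointer toward low memory, then store. */
static void toy_push(ToyStack *s, int32_t value) { s->words[--s->esp] = value; }

/* pop: load the item at the stack pointer, then move it back up. */
static int32_t toy_pop(ToyStack *s) { return s->words[s->esp++]; }

/* Read the word at byte_offset(%esp); byte_offset must be a multiple of 4. */
static int32_t toy_at(const ToyStack *s, int byte_offset)
{
    return s->words[s->esp + byte_offset / 4];
}

/* Lay out a call to sum(x, y): y is pushed first, then x, then the
   (placeholder) return address, so x lands at 4(%esp) and y at 8(%esp). */
static int32_t toy_call_sum(int32_t x, int32_t y, int32_t fake_return_address)
{
    ToyStack s;
    toy_init(&s);
    toy_push(&s, y);
    toy_push(&s, x);
    toy_push(&s, fake_return_address);
    return toy_at(&s, 4) + toy_at(&s, 8);
}
```

In this sketch, pushing y before x is what makes x end up closer to the stack pointer, matching the offsets 4 and 8 listed on the slide.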

Machine code
• cc -c sum.s
• objdump -d sum.o
which produces
---------------------------------------------------------
sum.o:     file format elf32-i386
Disassembly of section .text:
00000000 <add>:
   0:   55          push   %ebp
   1:   89 e5       mov    %esp,%ebp
   3:   8b 45 0c    mov    0xc(%ebp),%eax
   6:   03 45 08    add    0x8(%ebp),%eax
   9:   5d          pop    %ebp
   a:   c3          ret
• To inspect the contents of machine-code files, a class of programs known as disassemblers can be invaluable.
• objdump (for "object dump") generates a format similar to assembly code from the machine code.
• Each instruction takes up 1 to 15 bytes.
• Common instructions, such as push, pop, or ret, are short.

Machine Code
• To use this function, we need a main to call it:
• e1.c
--------------------------------------------------------------------
    #include <stdio.h>   /* needed for printf */
    int add(int x, int y);
    int main()
    {
        int x = 12;
        int y = 31;
        int z;
        z = add(x, y);
        printf("x is %d, y is %d, and z is %d\n", x, y, z);
        return 0;
    }
• We do: cc -O1 -S e1.c to create e1.s, which is…

e1.s – Assembly code
    main:
        leal    4(%esp), %ecx
        andl    $-16, %esp
        pushl   -4(%ecx)
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ecx
        subl    $20, %esp
        movl    $31, 4(%esp)
        movl    $12, (%esp)
        call    add
        movl    %eax, 12(%esp)
        movl    $31, 8(%esp)
        movl    $12, 4(%esp)
        movl    $.LC0, (%esp)
        call    printf
        movl    $0, %eax
        addl    $20, %esp
        popl    %ecx
        popl    %ebp
        leal    -4(%ecx), %esp
        ret

Section 3.2 Program Encoding • gcc -O1 -o sum sum.c • The -O1 option tells the compiler to apply level-one optimization, which limits the optimizations used. • The compiler generates assembly code: sum.s • The assembler converts the assembly code into object code: sum.o • The linker combines the object code with the libraries to produce an executable: sum • The sum.s file is not saved by default. • You can look at the generated assembly code using: gcc -O1 -S sum.c • This produces a file sum.s in ATT format. • To produce sum.s in Intel format instead, use: gcc -O1 -S -masm=intel sum.c

IA32 32-bit registers
• Eight 32-bit registers
    %eax: accumulator
    %ecx: counter
    %edx: data
    %ebx: base
    %esi: source
    %edi: destination
    %esp: stack pointer
    %ebp: frame pointer

Section 3.4 Accessing Information • IA32 Registers • 8 8-bit registers • 8 16-bit registers • 8 32-bit registers • The first 6 32-bit registers can be considered general-purpose registers, but historically they had specific uses. • You can modify the 8-bit registers without modifying the rest of the bits of the corresponding 32-bit register.

Why these strange names? • They go back to the 8080, an 8-bit machine with registers A, B, C, D, etc. • The 8086 had 16-bit registers ax, bx, cx, dx, where ax was made up of two 8-bit registers, al (low byte) and ah (high byte). • Similarly with bx, cx, and dx. • The 32-bit version (80386) extended these to 32 bits, making eax, ebx, etc. • The low 16 bits of eax are just ax, and ax is made up of ah and al. • Intel's separate 64-bit IA-64 (Itanium) architecture has 128 64-bit general registers called r0 - r127; x86-64, by contrast, extends the IA32 registers to 64 bits (%rax, %rbx, etc.) and adds %r8 - %r15.
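
To make the nesting of eax, ax, ah, and al concrete, here is a small C sketch that extracts each piece from a 32-bit value with masks and shifts. The helper names low16, low8, and high8 are hypothetical and only stand in for reading %ax, %al, and %ah.

```c
#include <assert.h>
#include <stdint.h>

/* %ax: the low 16 bits of the 32-bit register. */
static uint16_t low16(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }

/* %al: the low 8 bits (bits 0-7). */
static uint8_t low8(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }

/* %ah: the next 8 bits (bits 8-15), i.e. the high byte of %ax. */
static uint8_t high8(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }
```

With %eax = 0x98765432, this gives %ax = 0x5432, %ah = 0x54, and %al = 0x32; the upper 16 bits (0x9876) have no name of their own.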

Section 3.3 Data Formats for IA32 • b Byte: 8 bits (of course) • used for char • w Word: 16 bits (for compatibility with the 16-bit architecture) • used for short • l Double Word: 32 bits • used for int, long, and pointers • s Single Precision: 32 bits • used for float • l Double Precision: 64 bits • used for double • t Extended Precision: 80 or 96 bits • used for long double • No direct support for long long (64-bit ints). Operations must be done in pieces.

Section 3.4.1 Operand Specifiers • There are 11 basic forms for operands. • 1 for immediate (constant) values • 1 for registers • The rest are for memory. • Three operand types: • Immediate: for constant values • Written with a $ followed by an integer, e.g., $-577 or $0x17 • Register: denotes the contents of one of the registers • Its value is written R[Ea] • Memory: • Mb[Addr] denotes the b-byte value stored in memory starting at address Addr

Operand Forms • Operands can denote immediate (constant) values, register values, or values from memory. • The scaling factor s must be 1, 2, 4, or 8 • The general form, shown at the bottom of the table, is Imm(Eb, Ei, s), which denotes the memory value M[Imm + R[Eb] + R[Ei]*s].

Practice Problem 1
• Assume the following values are stored at the indicated memory addresses and registers:
    Address   Value      Register   Value
    ---------------------------------------
    0x100     0xFF       %eax       0x100
    0x104     0xAB       %ecx       0x1
    0x108     0x13       %edx       0x3
    0x10C     0x11
Fill in the following table:
    Operand   Value       Operand            Value
    -------------------------------------------------
    %eax      ________    9(%eax, %edx)      ________
    0x104     ________    260(%ecx, %edx)    ________
    $0x108    ________    0xFC(, %ecx, 4)    ________
    (%eax)    ________    (%eax, %edx, 4)    ________
    4(%eax)   ________

Practice Problem 1 - Solution
• Assume the following values are stored at the indicated memory addresses and registers:
    Address   Value      Register   Value
    ---------------------------------------
    0x100     0xFF       %eax       0x100
    0x104     0xAB       %ecx       0x1
    0x108     0x13       %edx       0x3
    0x10C     0x11
Fill in the following table:
    Operand   Value       Operand            Value
    -------------------------------------------------
    %eax      0x100       9(%eax, %edx)      0x11
    0x104     0xAB        260(%ecx, %edx)    0x13
    $0x108    0x108       0xFC(, %ecx, 4)    0xFF
    (%eax)    0xFF        (%eax, %edx, 4)    0x11
    4(%eax)   0xAB
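
The memory-operand answers above can be double-checked with a short C sketch. It assumes the general operand form Imm(Eb, Ei, s) with address Imm + R[Eb] + R[Ei]*s; the names toy_mem and toy_address are hypothetical, and the toy memory simply returns 0 for any address the problem does not list.

```c
#include <assert.h>
#include <stdint.h>

/* Toy memory holding only the four values given in Practice Problem 1.
   Assumption of this sketch: unlisted addresses read as 0. */
static uint32_t toy_mem(uint32_t addr)
{
    switch (addr) {
    case 0x100: return 0xFF;
    case 0x104: return 0xAB;
    case 0x108: return 0x13;
    case 0x10C: return 0x11;
    default:    return 0;
    }
}

/* Address named by Imm(Eb, Ei, s): Imm + R[Eb] + R[Ei] * s.
   Pass 0 for an omitted base or index, and 1 for an omitted scale. */
static uint32_t toy_address(uint32_t imm, uint32_t base, uint32_t index, uint32_t scale)
{
    return imm + base + index * scale;
}
```

For example, 9(%eax, %edx) names address 9 + 0x100 + 3 = 0x10C, and toy_mem(0x10C) is 0x11, in agreement with the table.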

Data Movement Instructions • MOV classes • movb, movw, movl • Operate on the data size of 1, 2, and 4 bytes, respectively • movs, movz classes • movsbw, movsbl, movswl • Sign-extended • movzbw, movzbl, movzwl • Zero-extended

Data Movement Instructions

Practice Problem 2
• Assume initially that %dh = 0xCD, %eax = 0x98765432
• movb %dh, %al       %eax = ?
• movsbl %dh, %eax    %eax = ?
• movzbl %dh, %eax    %eax = ?

Practice Problem 2 - Solution
• Assume initially that %dh = 0xCD, %eax = 0x98765432
• movb %dh, %al       %eax = 0x987654CD
• movsbl %dh, %eax    %eax = 0xFFFFFFCD
• movzbl %dh, %eax    %eax = 0x000000CD
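
The three results above follow from how each instruction treats the upper 24 bits of the destination. The C sketch below imitates that behavior; the function names are hypothetical stand-ins for the instructions, not real intrinsics.

```c
#include <assert.h>
#include <stdint.h>

/* movb src, %al: replace only the low byte; bits 8-31 of %eax are kept. */
static uint32_t toy_movb_to_al(uint32_t eax, uint8_t src)
{
    return (eax & 0xFFFFFF00u) | src;
}

/* movsbl: copy the byte and fill the upper 24 bits with its sign bit. */
static uint32_t toy_movsbl(uint8_t src)
{
    return (src & 0x80u) ? (0xFFFFFF00u | src) : (uint32_t)src;
}

/* movzbl: copy the byte and fill the upper 24 bits with zeros. */
static uint32_t toy_movzbl(uint8_t src)
{
    return (uint32_t)src;
}
```

Since 0xCD has its top bit set, sign extension fills with ones (0xFFFFFFCD) while zero extension fills with zeros (0x000000CD).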

Practice Problem 3 • What’s wrong with each line? • movb $0xF, (%bl) • movl %ax, (%esp) • movw (%eax), 4(%esp) • movb %ah, %sh • movl %eax, $0x123 • movl %eax, %dx • movb %si, 8(%ebp)

Practice Problem 3 - Solution
• What’s wrong with each line?
1. movb $0xF, (%bl)        Ans: Cannot use %bl as an address register
2. movl %ax, (%esp)        Ans: Mismatch between instruction suffix and register ID
3. movw (%eax), 4(%esp)    Ans: Cannot have both source and destination be memory references
4. movb %ah, %sh           Ans: No register named %sh
5. movl %eax, $0x123       Ans: Cannot have an immediate as destination
6. movl %eax, %dx          Ans: Destination operand has incorrect size
7. movb %si, 8(%ebp)       Ans: Mismatch between instruction suffix and register ID

Example 1
    int simple(int x)
    {
        return x+17;
    }
Compiles to:
    simple:
        pushl %ebp
        movl  %esp, %ebp
        movl  8(%ebp), %eax     // x into %eax
        addl  $17, %eax         // x+17 into %eax
        popl  %ebp
        ret

Example 2
    int array(int* s, int i)
    {
        return s[i];
    }
Compiles to:
    array:
        pushl %ebp
        movl  %esp, %ebp
        movl  12(%ebp), %eax
        movl  8(%ebp), %edx
        movl  (%edx,%eax,4), %eax
        popl  %ebp
        ret
Question: if we changed this to an array of short, could we just change the 4 to 2?

Example 2
    int array(int* s, int i)
    {
        return s[i];
    }
Compiles to:
    array:
        pushl %ebp
        movl  %esp, %ebp
        movl  12(%ebp), %eax          // i into %eax
        movl  8(%ebp), %edx           // s into %edx
        movl  (%edx,%eax,4), %eax     // M[s+4*i] -> %eax
        popl  %ebp
        ret
Question: if we changed this to an array of short, could we just change the 4 to 2?
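
The scaled-index operand (%edx,%eax,4) reads four bytes at byte address s + 4*i. A minimal C sketch of that address arithmetic follows; load_scaled4 is a hypothetical helper that does the byte-level computation explicitly instead of letting the compiler scale the index.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Emulate movl (%edx,%eax,4), %eax: read the 4 bytes that start
   at byte address s + 4*i and return them as a 32-bit value. */
static int32_t load_scaled4(const void *s, int i)
{
    int32_t value;
    memcpy(&value, (const unsigned char *)s + 4 * i, sizeof value);
    return value;
}
```

For an array of 4-byte ints this returns the same value as s[i], which is why the compiler can use a scale factor of 4 for int elements.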

Example 3
    short array(short* s, int i)
    {
        return s[i];
    }
Compiles to:
    array:
        pushl  %ebp
        movl   %esp, %ebp
        movl   12(%ebp), %eax
        movl   8(%ebp), %edx
        movzwl (%edx,%eax,2), %eax
        popl   %ebp
        ret
Questions: 1) What does the movzwl do? 2) What value would be returned in %eax if the array contained -1?

Example 3
    short array(short* s, int i)
    {
        return s[i];
    }
Compiles to:
    array:
        pushl  %ebp
        movl   %esp, %ebp
        movl   12(%ebp), %eax          // i into %eax
        movl   8(%ebp), %edx           // s into %edx
        movzwl (%edx,%eax,2), %eax     // M[s+i*2] -> %eax
        popl   %ebp
        ret
Questions: 1) What does the movzwl do? 2) What value would be returned in %eax if the array contained -1?
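
One way to explore the two questions above is to imitate the load in C. In this sketch, load_movzwl and load_movswl are hypothetical helpers: the first zero-extends the 16-bit element the way movzwl does, and the second sign-extends it for comparison.

```c
#include <assert.h>
#include <stdint.h>

/* movzwl (%edx,%eax,2), %eax: load 16 bits from s[i] and zero-extend to 32. */
static uint32_t load_movzwl(const int16_t *s, int i)
{
    return (uint32_t)(uint16_t)s[i];
}

/* For comparison, movswl would sign-extend the same 16 bits. */
static uint32_t load_movswl(const int16_t *s, int i)
{
    return (uint32_t)(int32_t)s[i];
}
```

If s[i] is -1 (bit pattern 0xFFFF), the zero-extending load leaves 0x0000FFFF in the 32-bit result, while a sign-extending load would leave 0xFFFFFFFF. The low 16 bits are identical either way, so a caller that looks only at the 16-bit short result still sees -1.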

3.5 Arithmetic and Logical Operations Figure 3.7 Integer arithmetic operations. The load effective address (leal) instruction is commonly used to perform simple arithmetic. The remaining ones are more standard unary or binary operations. We use the notation >>A and >>L to denote arithmetic and logical right shift, respectively.

Section 3.5.2 Unary and Binary Operations • Unary operations: inc, dec, neg, not • Binary operations: • Operate on source and destination, storing results in destination • add, sub, imul • xor, or, and • Bitwise operations

Practice Problem 4
Suppose register %eax holds value x and %ecx holds value y. Fill in the table below with formulas indicating the value that will be stored in register %edx for each of the given assembly code instructions:
    Instruction                     Result
    ---------------------------------------------
    leal 6(%eax), %edx              _____________
    leal (%eax, %ecx), %edx         _____________
    leal 7(%eax, %eax, 8), %edx     _____________
    leal 0xA(, %ecx, 4), %edx       _____________
    leal 9(%eax, %ecx, 2), %edx     _____________

Practice Problem 4 - Solution
Suppose register %eax holds value x and %ecx holds value y. Fill in the table below with formulas indicating the value that will be stored in register %edx for each of the given assembly code instructions:
    Instruction                     Result
    ---------------------------------------------
    leal 6(%eax), %edx              6+x
    leal (%eax, %ecx), %edx         x+y
    leal 7(%eax, %eax, 8), %edx     7+x+8x = 7+9x
    leal 0xA(, %ecx, 4), %edx       10+4y
    leal 9(%eax, %ecx, 2), %edx     9+x+2y
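
Because leal computes an address but never reads memory, it behaves like ordinary integer arithmetic on its operands. The sketch below models that with a hypothetical toy_leal function (not a real intrinsic) and can be used to check formulas such as the ones in this problem for sample values of x and y.

```c
#include <assert.h>
#include <stdint.h>

/* leal Imm(Eb, Ei, s), dst: dst = Imm + R[Eb] + R[Ei] * s.
   Only the address is computed; no memory access takes place.
   Pass 0 for an omitted base or index, and 1 for an omitted scale. */
static uint32_t toy_leal(uint32_t imm, uint32_t base, uint32_t index, uint32_t scale)
{
    return imm + base + index * scale;
}
```

With x = 5, for example, toy_leal(7, x, x, 8) yields 7 + 5 + 40 = 52, which equals 7 + 9*5. This is how a single leal can multiply by 9 and add a constant.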

Practice Problem 5
• Assume the following values are stored at the indicated memory addresses and registers:
    Address   Value      Register   Value
    ---------------------------------------
    0x100     0xFF       %eax       0x100
    0x104     0xAB       %ecx       0x1
    0x108     0x13       %edx       0x3
    0x10C     0x11
Fill in the following table:
    Instruction                  Destination   Value
    --------------------------------------------------
    addl %ecx, (%eax)            ________      ________
    subl %edx, 4(%eax)           ________      ________
    imul $16, (%eax, %edx, 4)    ________      ________
    incl 8(%eax)                 ________      ________
    decl %ecx                    ________      ________
    subl %edx, %eax              ________      ________

Practice Problem 5 - Solution
• Assume the following values are stored at the indicated memory addresses and registers:
    Address   Value      Register   Value
    ---------------------------------------
    0x100     0xFF       %eax       0x100
    0x104     0xAB       %ecx       0x1
    0x108     0x13       %edx       0x3
    0x10C     0x11
Fill in the following table:
    Instruction                  Destination   Value
    --------------------------------------------------
    addl %ecx, (%eax)            0x100         0x100
    subl %edx, 4(%eax)           0x104         0xA8
    imul $16, (%eax, %edx, 4)    0x10C         0x110
    incl 8(%eax)                 0x108         0x14
    decl %ecx                    %ecx          0x0
    subl %edx, %eax              %eax          0xFD

Section 3.5.3: Shift Operations
• D = [x_{n-1}, x_{n-2}, …, x_0]
• Left Shift
• SAL and SHL are the same
• D << k = [x_{n-k-1}, x_{n-k-2}, …, x_0, 0, …, 0]
• Drops the k most significant bits and fills the right end with k zeros
• Right Shift
• SAR: arithmetic right shift
• D >>A k = [x_{n-1}, …, x_{n-1}, x_{n-1}, x_{n-2}, …, x_k] (fills the left end with k copies of the sign bit)
• SHR: logical right shift
• D >>L k = [0, …, 0, x_{n-1}, x_{n-2}, …, x_k] (fills the left end with k zeros)
• Shift Amounts
• k is encoded as a single byte, since only shift amounts between 0 and 31 are possible (only the low-order 5 bits of the shift amount are considered)
• The shift amount is given either as an immediate or in the single-byte register element %cl
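
The three shift behaviors can be imitated in C on unsigned 32-bit values, which avoids relying on how a particular compiler shifts negative signed numbers. The names toy_shl, toy_shr, and toy_sar are hypothetical; each keeps only the low-order 5 bits of the shift amount, as described above.

```c
#include <assert.h>
#include <stdint.h>

/* SAL/SHL: shift left, filling the right end with zeros. */
static uint32_t toy_shl(uint32_t d, uint8_t k)
{
    return d << (k & 31);
}

/* SHR: logical right shift, filling the left end with zeros. */
static uint32_t toy_shr(uint32_t d, uint8_t k)
{
    return d >> (k & 31);
}

/* SAR: arithmetic right shift, filling the left end with copies of the sign bit. */
static uint32_t toy_sar(uint32_t d, uint8_t k)
{
    unsigned n = k & 31u;
    uint32_t result = d >> n;
    if ((d & 0x80000000u) != 0 && n > 0) {
        /* Set the top n bits, which the logical shift left as zeros. */
        result |= (uint32_t)~(UINT32_C(0xFFFFFFFF) >> n);
    }
    return result;
}
```

For instance, shifting 0x80000000 right by 4 gives 0x08000000 logically but 0xF8000000 arithmetically, and a shift amount of 33 behaves like a shift by 1 in this model.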

Practice Problem 6
Suppose we want to generate assembly code for the following C function:
    int shift_left2_rightn(int x, int n)
    {
        x <<= 2;
        x >>= n;
        return x;
    }
The code that follows is a portion of the assembly code that performs the actual shifts and leaves the final value in register %eax. Two key instructions have been omitted. Parameters x and n are stored at memory locations with offsets 8 and 12, respectively, relative to the address in register %ebp.
• movl 8(%ebp), %eax      // get x
• _____________________   // x <<= 2
• movl 12(%ebp), %ecx     // get n
• _____________________   // x >>= n

Practice Problem 6 - Solution
Suppose we want to generate assembly code for the following C function:
    int shift_left2_rightn(int x, int n)
    {
        x <<= 2;
        x >>= n;
        return x;
    }
The code that follows is a portion of the assembly code that performs the actual shifts and leaves the final value in register %eax. Two key instructions have been omitted. Parameters x and n are stored at memory locations with offsets 8 and 12, respectively, relative to the address in register %ebp.
• movl 8(%ebp), %eax      // get x
• sall $2, %eax           // x <<= 2
• movl 12(%ebp), %ecx     // get n
• sarl %cl, %eax          // x >>= n

Example 4
    void array_set(int* s, int i, int value)
    {
        s[i] = value;
    }
Compiles to:
    array_set:
        pushl %ebp
        movl  %esp, %ebp              // add comments
        movl  16(%ebp), %ecx          //
        movl  12(%ebp), %edx          //
        movl  8(%ebp), %eax           //
        movl  %ecx, (%eax,%edx,4)     //
        popl  %ebp
        ret

Example 4
    void array_set(int* s, int i, int value)
    {
        s[i] = value;
    }
Compiles to:
    array_set:
        pushl %ebp
        movl  %esp, %ebp
        movl  16(%ebp), %ecx          // value into %ecx
        movl  12(%ebp), %edx          // i into %edx
        movl  8(%ebp), %eax           // s into %eax
        movl  %ecx, (%eax,%edx,4)     // value into memory at (s + 4*i)
        popl  %ebp
        ret
