Assembly language programming

Assembly language programming
• Learning assembly language programming helps you understand how the microprocessor operates
• To learn it:
• You need to know the functions of the various registers
• You need to know how external memory is organized and how it is addressed to obtain instructions and data (the different addressing modes)
• You need to know which operations (the instruction set) the CPU supports. For example, powerful CPUs support floating-point operations, but simple CPUs support only integer operations

How to learn programming
• C – Concept
• L – Logic thinking
• P – Practice
• Concept – we must learn the basic syntax, such as how a program statement is written
• Logic thinking – programming is problem solving, so we must think logically to derive a solution
• Practice – write programs

Assembly Program
• The processor's native language is machine language (operations are represented with 0s and 1s)
• A single machine instruction can take up one or more bytes of code
• Assembly language lets you write the program with alphanumeric symbols (mnemonics), e.g. ADD, MOV, PUSH
• The program is then assembled (similar to compiling) and linked into an executable program
• The executable program can be a .com, .exe, or .bin file

Flow of program development
Program (.asm) → Assemble → Object file (.obj) → Link → Executable file (.exe)

Example
• The machine code for mov AH, 00H is B4 00 (2 bytes); the same instruction with AL as the destination, mov AL, 00H, is B0 00
• After assembly, the bytes B4 00 are stored in memory
• When the program is executed, the bytes B4 00 are read from memory and decoded, and the task is carried out
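Where do these bytes come from? In the 8086 encoding of MOV reg8, imm8, the opcode byte is B0H plus a 3-bit register number (AL = 0, CL = 1, DL = 2, BL = 3, AH = 4, CH = 5, DH = 6, BH = 7), followed by the immediate byte. So B4 00 is MOV AH, 00H, and MOV AL, 00H assembles to B0 00. The Python sketch below covers only this one instruction form and is not a full assembler. Python serves only as a modelling notation, and the function name is hypothetical.

```python
# Register numbers used in the 8086 "MOV reg8, imm8" encoding (opcode B0H + reg).
REG8 = {"AL": 0, "CL": 1, "DL": 2, "BL": 3, "AH": 4, "CH": 5, "DH": 6, "BH": 7}


def encode_mov_reg8_imm8(reg: str, value: int) -> bytes:
    """Return the 2-byte machine code for MOV reg, value (hypothetical helper)."""
    if not 0 <= value <= 0xFF:
        raise ValueError("an 8-bit register needs an immediate in 0..255")
    opcode = 0xB0 + REG8[reg.upper()]  # register number sits in the low 3 bits
    return bytes([opcode, value])


print(encode_mov_reg8_imm8("AH", 0x00).hex(" ").upper())  # B4 00
print(encode_mov_reg8_imm8("AL", 0x00).hex(" ").upper())  # B0 00
```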

Assembly Program
• Each instruction is represented by one assembly language statement
• The statement must specify which operation (opcode) is to be performed and the operands
• E.g. ADD AX, BX
• ADD is the operation
• AX is called the destination operand
• BX is called the source operand
• The result is AX = AX + BX
• When writing an assembly language program, you need to think at the instruction level

Example
• In C++, you can write A = (B+C)*100
• In assembly language, each statement holds only one instruction:
A = B        ; one instruction - MOV
A = A+C      ; one instruction - ADD
A = A*100    ; one instruction - multiply (MUL)

Format of Assembly language
• General format of an assembly language statement:
Label    Instruction    Comment
Start:   Mov AX, BX     ; copy BX into AX
• Start is a user-defined name. Put a label in a statement only when it is necessary!
• The colon (:) indicates that Start is a label

8086 Software Model

Software model
• In the 8086, memory is divided into segments
• Only four 64K-byte segments are active at a time: code, stack, data, and extra
• When you write an assembly language program for the 8086, you should, in principle, define the different segments!
• The active segments are accessed via the segment registers: CS (code), SS (stack), DS (data), and ES (extra)
• So when writing an assembly language program, you must use the proper segment register together with a pointer or index register when you want to access memory

Registers
• In 8086 assembly programming, you cannot operate on two memory locations in the same instruction
• So you usually move the value of one location into a register first and then perform the operation
• After the operation, you put the result back into the memory location
• Therefore, the move (MOV) operation is one you will use very frequently!
• And you will use registers a lot!

Example
• In C++: A = B+C ; A, B, C are variables
• In assembly language, A, B and C represent memory locations, so you cannot do A = B+C in a single instruction:
MOV AL, B    ; move the value of B into the AL register
ADD AL, C    ; do the add: AL = AL + C
MOV A, AL    ; put the result into A
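Here is a minimal Python model of the three-instruction sequence. It assumes A, B and C are byte variables, held in a dictionary that stands in for the data segment, and that AL is an 8-bit register. The `& 0xFF` mimics the 8-bit register wrapping around. The carry flag that ADD would set is not modelled, and the function name is hypothetical.

```python
def add_via_register(memory: dict) -> dict:
    """Model MOV AL, B / ADD AL, C / MOV A, AL on byte variables."""
    al = memory["B"]                  # MOV AL, B : memory -> register
    al = (al + memory["C"]) & 0xFF    # ADD AL, C : result kept to 8 bits
    memory["A"] = al                  # MOV A, AL : register -> memory
    return memory


print(add_via_register({"A": 0, "B": 20, "C": 22})["A"])  # 42
```

Each step touches memory at most once, which matches the restriction described above: the register carries the intermediate value between memory accesses.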

Data registers
• AX, BX, CX, and DX are general-purpose registers, but each also has a special function. For example, AX is called the accumulator and stores the result of arithmetic operations
• The registers are 16-bit, but each can also be used as two 8-bit registers (AX = AH:AL, BX = BH:BL, CX = CH:CL, DX = DH:DL)
• Each of the four data registers can be the source or destination operand of an arithmetic, logic, shift, or rotate operation
• Some operations assume the use of the accumulator, e.g. input and output operations in I/O-mapped I/O
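The relationship between a 16-bit register and its 8-bit halves can be modelled in a few lines of Python. AH is the high byte of AX and AL is the low byte, so writing one half leaves the other unchanged. The helper names are hypothetical.

```python
def split_ax(ax: int) -> tuple:
    """Return (AH, AL), the high and low bytes of the 16-bit value in AX."""
    ax &= 0xFFFF
    return ax >> 8, ax & 0xFF


def join_ax(ah: int, al: int) -> int:
    """Rebuild AX from its two 8-bit halves."""
    return ((ah & 0xFF) << 8) | (al & 0xFF)


ah, al = split_ax(0x1234)
print(hex(ah), hex(al))          # 0x12 0x34
print(hex(join_ax(0xFF, al)))    # 0xff34  (e.g. after MOV AH, 0FFH)
```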

Data register
• In based addressing mode, the base register BX is used as a pointer to an operand in the current data segment
• CX is used as a counter by some instructions. For example, in multiple-bit shift and rotate instructions, CL holds the number of bit positions by which the operand is shifted or rotated
• DX, the data register, is used in word (16-bit) multiplication and division, where it holds the high word of the 32-bit product or dividend (and the remainder after division). It also holds the I/O port address for some types of input/output operations
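For word-sized MUL and DIV, DX acts as the upper half of a 32-bit value DX:AX. The Python sketch below models only the unsigned forms with a 16-bit source operand. The 8-bit forms, which use AX only, are not covered, and the function names are hypothetical.

```python
def mul16(ax: int, src: int) -> tuple:
    """Unsigned MUL src (word): 32-bit product, high word in DX, low in AX."""
    product = (ax & 0xFFFF) * (src & 0xFFFF)
    return product >> 16, product & 0xFFFF          # (DX, AX)


def div16(dx: int, ax: int, src: int) -> tuple:
    """Unsigned DIV src (word): DX:AX / src, quotient to AX, remainder to DX."""
    src &= 0xFFFF
    if src == 0:
        raise ZeroDivisionError("divide error (type 0 interrupt on the 8086)")
    dividend = ((dx & 0xFFFF) << 16) | (ax & 0xFFFF)
    quotient, remainder = divmod(dividend, src)
    if quotient > 0xFFFF:
        raise OverflowError("quotient too large for AX (also a divide error)")
    return quotient, remainder                      # (AX, DX)


dx, ax = mul16(0x1234, 0x0100)
print(hex(dx), hex(ax))          # 0x12 0x3400
print(div16(dx, ax, 0x0100))     # (4660, 0)  i.e. AX = 1234H, DX = 0
```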

Pointer and index registers
• The stack is used as temporary storage
• Data is stored by the PUSH instruction and retrieved by the POP instruction
• The stack is accessed via the SP (Stack Pointer) and BP (Base Pointer) registers
• BP contains an offset address in the current stack segment. This offset is used in based addressing mode, and it is commonly used by instructions in a subroutine that reference parameters passed on the stack
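Here is a toy Python model of PUSH and POP, assuming word-sized pushes as on the 8086. PUSH first decrements SP by 2 and then stores the word at SS:SP. POP reads the word at SS:SP and then increments SP by 2. Memory is simplified to a dictionary of words keyed by offset, whereas real memory stores each word as two bytes, low byte first. The starting SP of 0100H and the class name are arbitrary, hypothetical choices.

```python
class Stack8086:
    """Toy model of the 8086 stack: word-wide, grows toward lower addresses."""

    def __init__(self, sp: int = 0x0100):
        self.sp = sp              # offset within the stack segment (SS)
        self.memory = {}          # offset -> 16-bit word

    def push(self, value: int) -> None:
        self.sp = (self.sp - 2) & 0xFFFF   # SP is decremented by 2 first
        self.memory[self.sp] = value & 0xFFFF

    def pop(self) -> int:
        value = self.memory[self.sp]       # read the word at SS:SP
        self.sp = (self.sp + 2) & 0xFFFF   # then SP is incremented by 2
        return value


s = Stack8086()
s.push(0x1111)                                 # like PUSH AX
s.push(0x2222)                                 # like PUSH BX
print(hex(s.pop()), hex(s.pop()), hex(s.sp))   # 0x2222 0x1111 0x100
```

The last value pushed is the first one popped (last in, first out), and SP returns to its starting value once every PUSH has a matching POP.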

Pointer and Index Register
• The source index register (SI) and destination index register (DI) hold offset addresses used in indexed addressing of operands in memory
• In ordinary indexed addressing, both refer to the current data segment. In string instructions, SI refers to the current data segment and DI refers to the current extra segment
• The index registers can also be used as source or destination registers in arithmetic and logical operations, but only as 16-bit registers (they have no 8-bit halves)

Data types
• Data can be in three forms: 8-bit (byte), 16-bit (word), or 32-bit (double word)
• Integers can be signed or unsigned, and byte-wide or word-wide
• For signed integers (2's complement format), the MSB is the sign bit (0 for positive, 1 for negative)
• Signed 8-bit integer: -128 to 127
• Signed word: -32768 to 32767
• Later microprocessors also support 64-bit or even 128-bit data
• The 8086 supports only integer operations! (Floating-point arithmetic requires the 8087 coprocessor)
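The signed ranges follow from the two's complement rule. With n bits, the values run from -2^(n-1) to 2^(n-1) - 1, and a bit pattern whose MSB is 1 stands for (pattern - 2^n). The Python sketch below uses hypothetical helper names.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret an unsigned bit pattern as a two's complement signed integer."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


def signed_range(bits: int) -> tuple:
    """Smallest and largest signed values for the given width."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


print(signed_range(8), signed_range(16))   # (-128, 127) (-32768, 32767)
print(to_signed(0xFF, 8), to_signed(0x80, 8), to_signed(0x7FFF, 16))  # -1 -128 32767
```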

A sample program
.model small    ; memory model (needed before .code and .startup)
.code           ; indicate start of code segment
.startup        ; indicate start of program
mov AX, 0
mov BX, 0000H
mov CX, 0
mov SI, AX
mov DI, AX
mov BP, AX
.exit           ; return control to DOS
END             ; end of file
The flow of the program is usually top-down, and instructions are executed one by one!

Assembly programming
• In general, an assembly program must include the code segment! Other segments, such as the stack segment and data segment, are not compulsory
• There are keywords to indicate the beginning and the end of a segment, just like using main(){} in C++ programming
• Example:
DSEG segment 'data'   ; defines the start of a data segment
DSEG ENDS             ; defines the end of a data segment
• segment is the keyword, and DSEG is the name of the segment
• Similarly, keywords are used to define the beginning and the end of a program

Assembly language programming
Example
CSEG segment 'code'
START PROC FAR    ; define the start of a program (procedure)
RET               ; return
START ENDP        ; define the end of a procedure
CSEG ends
End start         ; end of everything
• Different assemblers may use different syntax for these keywords!
• START is just a name; it could be my_prog, ABC, etc.

More sample
Stacksg segment 'stack'   ; define the stack segment
….
Stacksg ends
Datasg segment            ; declare data inside the data segment
……
Datasg ends
Codesg segment 'code'
Main proc far
assume ss:stacksg, ds:datasg, cs:codesg
mov ax, datasg            ; load the address of the data segment
mov ds, ax                ; into DS
….
mov ax, 4c00H             ; DOS function 4CH: terminate the program
int 21H
Main endp
Codesg ends
end main                  ; end of everything

Definitions
• To declare a segment, the syntax is: segment_name SEGMENT class
• Example: Stacksg segment 'stack' (this statement is used in the previous slide)

Definition
• 'data', 'code' – class entries. A class is used to group related segments when linking: the linker automatically groups segments of the same class in memory
• PROC – defines a procedure (similar to a function) inside the code segment. Each procedure must be identified by a unique name, and you must end the procedure with ENDP

Definitions
• FAR – related to program execution. A FAR procedure is entered and left with far (segment:offset) calls and returns, so its RET is a far return. The main procedure of a program is conventionally declared FAR, and the program loader starts execution at the procedure named after END
• ASSUME – associates (assigns) the name of a segment with a segment register. ASSUME only informs the assembler; it does not load the register. The program must still move the base address of the data segment into DS itself (mov ax, datasg followed by mov ds, ax)
• END – ends the entire program and appears as the last statement. Usually the name of the first or only FAR procedure is put after END

Syntax of a simple assembly language program
• If you are doing something simple, you do not need to define the segments
• Everything is then stored in the code segment

start:  mov DL, 0H     ; move 0H to DL
        mov CL, op1    ; move op1 to CL
        mov AL, data   ; move data to AL
step:   cmp AL, op1    ; compare AL and op1
        jc label1      ; if carry = 1 (AL < op1), jump to label1
        sub AL, op1    ; AL = AL - op1
        inc DL         ; DL = DL + 1
        jmp step       ; jump to step
label1: mov AH, DL     ; move DL to AH
        HLT            ; halt: end of program
data    db 45          ; define a variable called data
op1     db 6           ; define a variable called op1
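The program divides data by op1 by repeated subtraction. Each pass subtracts op1 from AL and counts the pass in DL, so at the end AH (copied from DL) holds the quotient and AL holds the remainder. CL is loaded with op1 but not used afterwards. The Python function below mirrors the loop step by step, and its name is hypothetical. The assembly version would loop forever if op1 were 0, so the function rejects a zero divisor instead.

```python
def divide_by_subtraction(data: int, op1: int) -> tuple:
    """Mirror the program above; returns (AH, AL) = (quotient, remainder)."""
    if op1 == 0:
        raise ValueError("op1 = 0 would make the assembly loop run forever")
    dl = 0                        # mov DL, 0H
    al = data & 0xFF              # mov AL, data
    while True:                   # step:
        carry = al < op1          # cmp AL, op1 -> CF = 1 when AL < op1 (unsigned)
        if carry:                 # jc label1
            break
        al = (al - op1) & 0xFF    # sub AL, op1
        dl = (dl + 1) & 0xFF      # inc DL, then jmp step
    ah = dl                       # label1: mov AH, DL
    return ah, al


print(divide_by_subtraction(45, 6))  # (7, 3): 45 = 7 * 6 + 3
```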

Assembler for 8086
• WASM – freeware that can be downloaded from the internet (http://user.mc.net/~warp/software_wasm.html)
• Emu8086 (http://www.emu8086.com) – there is a trial version, but it does not support all features, such as interrupts. Emu8086 includes a tutorial and a reference for the complete instruction set
• Keil – www.keil.com

Defining data in a program
• Data is usually stored in the data segment
• You can define constants and work areas (chunks of memory)
• Data can be defined in different lengths: use DB for 8-bit data and DW for 16-bit data
• The definition for data: [name] Dn expression ; Dn is either DB or DW
• Name – a program references a data item by means of its name; otherwise, the name of an item is optional
• Dn – the data-definition directive, which defines the length of the data
• Expression – defines the values (contents) of the data

Examples for data
FLDA DB ?     ; define an uninitialized 8-bit item called FLDA
FLDB DB 25    ; define an 8-bit item initialized to 25
• Define multiple data items under the same name (like an array):
FLDC DB 21, 22, 23, 34   ; the data are stored in adjacent bytes
• FLDC refers to the first value, FLDC + 1 to the second value, and so on
• So you can do mov AL, FLDC+3, which loads the fourth value (34) into AL
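This Python sketch shows how the assembler lays these definitions out. It assumes FLDA, FLDB and FLDC are placed back to back starting at offset 0 of the data segment; the real offsets depend on what else the segment contains. FLDC+3 is simply the offset of FLDC plus 3.

```python
# Hypothetical layout: FLDA, FLDB and FLDC placed back to back from offset 0.
definitions = [
    ("FLDA", [0]),                # FLDA DB ?   (0 stands in for "uninitialized")
    ("FLDB", [25]),               # FLDB DB 25
    ("FLDC", [21, 22, 23, 34]),   # FLDC DB 21, 22, 23, 34
]

memory = bytearray()              # contents of the data segment
symbols = {}                      # name -> offset of its first byte
for name, values in definitions:
    symbols[name] = len(memory)
    memory.extend(values)

al = memory[symbols["FLDC"] + 3]  # mov AL, FLDC+3
print(symbols, al)                # {'FLDA': 0, 'FLDB': 1, 'FLDC': 2} 34
```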

Example for data definition
• DUP (duplicate) can be used to define multiple storage locations:
DB 10 DUP (?)   ; defines 10 uninitialized bytes
DB 5 DUP (12)   ; defines 5 bytes, all initialized to 12
• A string: DB 'this is a test'
• EQU – this directive does not define a data item. Instead, it defines a value that the assembler substitutes in other instructions (similar to defining a constant in C programming with #define):
factor EQU 12
mov CX, factor
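This Python sketch shows what the assembler produces for these directives. DUP repeats its operand, and a quoted string becomes one ASCII byte per character. EQU only creates a name that is replaced by its value, so no memory is reserved for factor. None marks the '?' (uninitialized) bytes, and the names used are hypothetical.

```python
def dup(count: int, value=None) -> list:
    """Expand `count DUP (value)`; None stands for the uninitialized '?'."""
    return [value] * count


FACTOR = 12                                     # factor EQU 12: a constant, no storage

buf = dup(10)                                   # DB 10 DUP (?)
twelves = dup(5, 12)                            # DB 5 DUP (12)
text = list("this is a test".encode("ascii"))   # DB 'this is a test' (one byte per char)
cx = FACTOR                                     # mov CX, factor  ->  mov CX, 12

print(len(buf), twelves, text[:4], cx)  # 10 [12, 12, 12, 12, 12] [116, 104, 105, 115] 12
```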

Assembly Program
• A well-written assembly program can take up less memory space and run faster than an equivalent program in a high-level language
• In real-time applications, assembly programs are often used because a program written in a high-level language may not respond quickly enough
• You can also put assembly code into your C++ program (inline assembly) to reduce the execution time of critical parts!

Assembly language programming
• The syntax differs between microprocessors, but the concepts are the same. Once you learn assembly programming for one microprocessor, you can more easily program other kinds of systems
• For example, programming the 8051 series is very similar in concept to programming the 8086

Addressing Modes
• The function of the addressing modes is to access the operands
• Available modes (9 modes): register addressing, immediate addressing, direct addressing, register indirect addressing, based addressing, indexed addressing, based indexed addressing, string addressing, and port addressing
• Addressing modes provide different ways of computing the address of an operand

Why is the addressing mode important?
• In C++, you can define an array, a variable, or a pointer:
int x[10], y, *z;
• Then, to access different elements, you can do:
z = x;
*(x+2);
x[0] = y;
• How can this be done in assembly language programming? Via the different addressing modes!

Register addressing mode
• The operand to be accessed resides in an internal register of the 8086
• E.g. MOV AX, BX
• Move (MOV) the contents of BX (the source operand) to AX (the destination operand)
• Both operands are in internal registers

Pay attention to the value of IP and content of AX, BX

Immediate addressing mode
• The source operand is part of the instruction
• Immediate operands usually represent constant data
• The operands can be either a byte or a word
• E.g. MOV AL, 15: 15 is a byte-wide immediate source operand
• Assemblers for some other processors, such as the 8051, mark an immediate with #, e.g. MOV A, #15. 8086 assemblers such as MASM write just MOV AL, 15
• The immediate operand is stored in program storage memory (i.e. the code segment)
• This value is fetched into the instruction queue in the BIU along with the instruction
• No external memory bus cycle is initiated to read the operand!

Direct addressing mode
• Moves a byte or word between a memory location and a register
• The bytes following the instruction opcode hold an effective memory address (EA) instead of data
• The address is a 16-bit offset of the operand's storage location from the start of the current data segment
• Physical address = DS × 10H + offset (the contents of DS shifted left by 4 bits, plus the offset)
• The instruction set does not support a memory-to-memory transfer!
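The segment register selects a 16-byte paragraph, so the 8086 shifts its value left by 4 bits (multiplies it by 10H) before adding the 16-bit offset. The sum is a 20-bit physical address, which is why the 8086 can address 1 MB. Below is a Python sketch with a hypothetical function name, including the wrap-around past FFFFFH.

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 physical address: segment * 10H + offset, kept to 20 bits."""
    return (((segment & 0xFFFF) << 4) + (offset & 0xFFFF)) & 0xFFFFF


print(f"{physical_address(0x1000, 0x0020):05X}")  # 10020
print(f"{physical_address(0x0000, 0x1233):05X}")  # 01233
print(f"{physical_address(0xFFFF, 0x0010):05X}")  # 00000 (wraps past 1 MB)
```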

Direct addressing
• Data is assumed to be stored in the data segment, so DS is used in calculating the physical address!
• An external memory bus cycle is needed to do the read
• Example of direct addressing: mov AL, var1
• where var1 can be regarded as a variable

Register indirect addressing mode
• Transfers a byte or word between a register and a memory location addressed by an index or base register
• Example: MOV AL, [SI]
• SI – index register
• The square brackets [] always indicate indirect addressing
• The effective address (EA) is stored either in a base register or in an index register
• The base register can be either BX or the base pointer register BP
• The index register can be the source index register SI or the destination index register DI
• The default segment is DS, or SS when BP is used

Register indirect addressing
• E.g. MOV AX, [SI]
• The value stored in the SI register is used as the offset address
• The segment register is DS in this example
• This instruction moves the data stored at memory location DS × 10H + SI into the AX register
• In register indirect addressing mode, the EA (effective address) is a variable and depends on the index or base register value
• E.g. mov [BX], CL
• Which segment register will be used for the above operation?

According to the memory map, what will the operation mov [BX], CL do?
• If CL = 88, BX = 1233H and DS = 0H
• Physical address = DS × 10H + BX = 01233H, so the value 88 in CL is written to memory location 01233H
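The worked example can be replayed in a small Python model. The model assumes the default-segment rule for memory operands: SS when BP is the pointer, DS otherwise. It ignores segment override prefixes and string instructions. It treats CL = 88 as the value given on the slide and uses a dictionary as a stand-in for memory. The helper names are hypothetical.

```python
def default_segment(pointer: str) -> str:
    """Default segment register for [pointer]: SS for BP, DS for BX, SI, DI."""
    return "SS" if pointer.upper() == "BP" else "DS"


def mov_indirect_store(regs: dict, memory: dict, pointer: str, src: str) -> int:
    """Model MOV [pointer], src for an 8-bit source; return the address written."""
    segment = regs[default_segment(pointer)]
    address = ((segment << 4) + regs[pointer]) & 0xFFFFF   # segment * 10H + EA
    memory[address] = regs[src] & 0xFF
    return address


regs = {"DS": 0x0000, "SS": 0x0000, "BX": 0x1233, "CL": 88}
memory = {}
addr = mov_indirect_store(regs, memory, "BX", "CL")
print(f"{addr:05X}", memory[addr])  # 01233 88
```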

Different Addressing modes

How to move the address of a variable into a register?
• When using indirect addressing, such as
MOV AL, [SI]
• SI holds an address, so how can we initialize SI?
• With the instruction LEA (Load Effective Address)
• LEA is similar to x = &y in C++
• Syntax of LEA:
LEA SI, ARRAY
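The difference between loading an address and loading the data stored at that address can be modelled in Python. The symbol table, the offset 0010H of ARRAY, the array contents and the helper names are all hypothetical.

```python
# Hypothetical symbol table and data-segment contents.
symbols = {"ARRAY": 0x0010}                      # ARRAY starts at offset 0010H
data_segment = {0x0010: 5, 0x0011: 7, 0x0012: 9}


def lea(name: str) -> int:
    """LEA reg, name: load the offset (address) of name, like &y in C++."""
    return symbols[name]


def mov_from(name: str) -> int:
    """MOV reg, name: load the byte stored at that address, like y in C++."""
    return data_segment[symbols[name]]


si = lea("ARRAY")          # LEA SI, ARRAY
al = data_segment[si]      # MOV AL, [SI]
print(hex(si), al, mov_from("ARRAY"))  # 0x10 5 5
```

After LEA, SI can be changed (for example incremented) to reach the other elements, which a plain MOV of the variable's value cannot do.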

Base-plus-index addressing mode
• Moves a byte or word between a register and the memory location addressed by a base register (BP or BX) plus an index register (DI or SI)
• The effective address is the sum of the base and index registers (plus an optional displacement). The physical address is obtained by adding this EA to DS × 10H when BX is the base, or to SS × 10H when BP is the base
• E.g. MOV [BX+SI], AL
• Moves the value in AL to location DS × 10H + BX + SI
• If BP is used, SS is used instead of DS

Application of Base + index
• The base register (BX) often holds the beginning location of a memory array, while the index register (SI) holds the relative position of an element in the array
• Change the value of SI to access different elements in the array
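Here is a Python sketch of this pattern. BX stays fixed at the start of the array while SI steps through the elements, and each access goes to DS × 10H + BX + SI. Memory is modelled as a bytes object. The helper name, array contents and register values are hypothetical, and the running total is a Python integer, so register overflow is not modelled.

```python
def sum_array(memory: bytes, ds: int, bx: int, count: int) -> int:
    """Add `count` bytes starting at DS:BX, stepping SI as in MOV AL, [BX+SI]."""
    total = 0                                    # Python int: overflow not modelled
    for si in range(count):                      # SI = 0, 1, 2, ... (INC SI)
        ea = (bx + si) & 0xFFFF                  # effective address BX + SI
        total += memory[((ds << 4) + ea) & 0xFFFFF]
    return total


memory = bytes(range(16))                        # memory byte n holds the value n
print(sum_array(memory, ds=0x0000, bx=0x0004, count=3))  # 15 (4 + 5 + 6)
```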

Register relative addressing mode
• Moves a byte or word between a register and a memory location addressed by an index or base register plus a displacement
• E.g. MOV AL, ARRAY[SI]
• EA = value of SI + offset of ARRAY
• Physical address = DS × 10H + EA
• E.g. mov AX, [BX+4]
• E.g. mov AX, array[DI+3]
• This is similar to base-plus-index addressing
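The three examples can be checked with a small Python calculator. The EA is the register value plus the displacement, truncated to 16 bits, and the physical address is DS × 10H + EA. The DS value 1000H, the offset 0200H for ARRAY, the register values and the function name are hypothetical. All three examples use BX, SI or DI, so DS is the segment in each case.

```python
def register_relative(segment: int, register: int, displacement: int) -> tuple:
    """Return (EA, physical address) for operands like ARRAY[SI] or [BX+4]."""
    ea = (register + displacement) & 0xFFFF        # 16-bit offset wraps around
    physical = ((segment << 4) + ea) & 0xFFFFF     # segment * 10H + EA
    return ea, physical


DS, ARRAY = 0x1000, 0x0200      # hypothetical DS value and offset of ARRAY
examples = [
    register_relative(DS, 0x0003, ARRAY),       # MOV AL, ARRAY[SI] with SI = 3
    register_relative(DS, 0x0100, 4),           # mov AX, [BX+4] with BX = 0100H
    register_relative(DS, 0x0002, ARRAY + 3),   # mov AX, array[DI+3] with DI = 2
]
print([f"{ea:04X}/{phys:05X}" for ea, phys in examples])
# ['0203/10203', '0104/10104', '0205/10205']
```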

Assembly language programming