
The Assembly Process

The assembler has to perform two major tasks: translate assembly language code into machine code, and assign addresses to all symbolic labels. This is generally done on a line-by-line basis, and two or more passes are often required.

zorion

Presentation Transcript


  1. The Assembly Process

  2. The Assembly Process • Assembler has to perform two major tasks: • translate assembly language code into machine code • assign addresses for all symbolic labels • This is generally done on a line by line basis • Often two or more passes required • The programmer must determine what the initial state of memory should be to execute the program. • Includes both data and text areas • although program itself may initialise data areas, such initialisation should occur at beginning of program.

  3. Machine Code Generation • Assembling a program entails translating the assembly language into binary machine code • This requires more than simply mapping assembly instructions to machine instructions • Each instruction is bound to an address • Labels are bound to addresses • Assembly instructions which refer to labels generate machine instructions which contain the label's address • Pseudo-instructions are translated into one or more machine instructions

  4. The Symbol Table • The assembler scans the source code and generates the appropriate bit string for each line encountered • The assembler must remember • what memory locations have been allocated • to which address each label is bound • A symbol table is a list of (label, address) pairs • When the data and text segments have been generated, they are stored as an executable file • The file is used by a program called the loader to initialize memory to the appropriate state before execution
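The bookkeeping on this slide can be sketched in Python. This is only a minimal sketch, not the WRAMP assembler: the function name `build_symbol_table` is hypothetical, and it assumes that every instruction occupies exactly one word and that directives (lines starting with `.`) can be skipped.

```python
def build_symbol_table(lines, start=0):
    """First pass: bind each label to the address of the next word emitted.

    Assumption for this sketch: every non-label line occupies one word,
    and directives such as .text or .global are ignored.
    """
    symbols = {}
    address = start
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("."):
            continue  # skip blank lines and directives
        if ":" in line:
            label, _, rest = line.partition(":")
            symbols[label.strip()] = address  # (label, address) pair
            line = rest.strip()
            if not line:
                continue  # a label on its own line emits nothing
        address += 1  # one word per instruction
    return symbols


source = [
    ".text",
    "main:",
    "la $6, a2",
    "loop:",
    "lw $7, 1($6)",
    "beqz $9, loop",
]
table = build_symbol_table(source)
```

Here `main` is bound to address 0 and `loop` to address 1, because `la` is the only word emitted before the `loop:` label.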

  5. Instructions • The .text directive tells the assembler that the lines which follow are instructions. • By default, the text segment starts at 0x00000 • This differs between processors • In some cases, a symbol may not yet have an assigned address when the assembler scans a line that refers to it • A second pass through the code can update instructions containing unresolved labels • Maintain a list of addresses in which each unresolved label appears • When the label is added to the symbol table, all locations in the corresponding list are updated to hold the address associated with the label
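The list-of-unresolved-addresses idea can be sketched as follows. This is an illustrative sketch under simplifying assumptions, not the WRAMP tool: the input format (`"label"`, `"ref"`, `"word"` tuples) and the name `assemble_with_fixups` are hypothetical, and a reference is assumed to fill a whole memory word.

```python
from collections import defaultdict


def assemble_with_fixups(items):
    """items: list of ("label", name), ("ref", name) or ("word", value).

    Each "ref" or "word" occupies one memory word. A "ref" holds the
    address of a label, which may be defined later in the source.
    """
    memory = []
    symbols = {}
    pending = defaultdict(list)  # label -> addresses waiting for it
    for kind, value in items:
        if kind == "label":
            symbols[value] = len(memory)
            for addr in pending.pop(value, []):
                memory[addr] = symbols[value]  # update earlier uses
        elif kind == "ref":
            if value in symbols:
                memory.append(symbols[value])
            else:
                pending[value].append(len(memory))
                memory.append(None)  # placeholder until the label is seen
        else:
            memory.append(value)
    if pending:
        raise ValueError(f"undefined labels: {sorted(pending)}")
    return memory, symbols


memory, symbols = assemble_with_fixups(
    [("ref", "end"), ("word", 7), ("label", "end"), ("ref", "end")]
)
```

The first reference to `end` is emitted as a placeholder and patched as soon as the label is defined; the second reference is resolved immediately from the symbol table.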

  6. Pseudo-instructions • Some assembly languages include pseudo-instructions (not WRAMP). • Pseudo-instructions are not directly implemented in the processor • Pseudo-instructions map to (generally) one or two actual processor instructions • The assembler makes the substitution • E.g. a load high instruction could be implemented as a load followed by a shift
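The substitution step might look like the sketch below. Everything named here is hypothetical: `lhi` stands for the load high pseudo-instruction, `li` and `sll` stand in for the real load and shift instructions of some processor, and the shift amount of 16 assumes a 32-bit word with a 16-bit immediate.

```python
def expand(instruction):
    """Return the list of real instructions for one source instruction.

    'lhi' is a hypothetical pseudo-instruction; 'li' and 'sll' are
    placeholder names for the real load and shift instructions.
    """
    mnemonic, _, operands = instruction.partition(" ")
    if mnemonic == "lhi":
        reg, imm = [part.strip() for part in operands.split(",")]
        # Load the immediate, then shift it into the high half (assumed 16 bits).
        return [f"li {reg}, {imm}", f"sll {reg}, {reg}, 16"]
    return [instruction]  # real instructions pass through unchanged


expanded = expand("lhi $3, 0x12")
```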

  7. Jump target calculation • The jump instruction has two forms • Direct, for j and jal • Register direct, for jr and jalr • jr and jalr specify a register containing the address to be loaded into the PC • j and jal specify most of the address of the target within the instruction. • However, their range is limited by the space allocated in the instruction format • [Figure: instruction word with bit positions labelled f down to 0]

  8. Jump Target Calculation • The jump instruction has 20 bits allocated for the address. 12 bits are opcode and registers. • This limits the jump to a 1M address range. • WRAMP solution • Address space is 20 bits = 1M words = 4 MB • Alternative solution (MIPS, 32-bit, byte addressable) • The bottom 2 bits are always zero, because instructions sit on word boundaries • The highest-order bits of the target are taken from the address currently stored in the program counter • [Figure labels: PC, opcode, jump target bits, 00]
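The two approaches can be compared in a short Python sketch. The function names are hypothetical. The MIPS version uses the 26-bit target field of the MIPS `j`/`jal` format and takes `pc` to be the value currently held in the program counter; the WRAMP version assumes the low 20 bits of the instruction word hold the complete word address, as the slide describes.

```python
def mips_jump_target(pc, field26):
    """MIPS-style: upper 4 bits from the PC, 26-bit field, two zero bits."""
    return (pc & 0xF0000000) | ((field26 & 0x03FFFFFF) << 2)


def wramp_jump_target(instruction):
    """WRAMP-style: the low 20 bits are the full word address."""
    return instruction & 0xFFFFF


mips_target = mips_jump_target(0x10000004, 0x0000001)
wramp_target = wramp_jump_target(0x40000001)
```

Because the MIPS target borrows its top bits from the PC, a jump can only reach addresses in the same 256 MB region as the instruction that follows it. The WRAMP address space is small enough that the 20-bit field covers all of it.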

  9. Branch Offset • In machine code, the target address in a branch must be specified as an offset from the address of the branch. • During execution, this offset is simply added to the program counter to fetch the next instruction • The PC contains the address of the next instruction • The offset is measured in words, not bytes, even on byte-addressable architectures: PC_NEW = offset*1 + PC_OLD • To calculate the offset, the assembler uses the formula: offset = target instruction address - (branch instruction address + 1)

  10. Branch Offset Calculation • The offset is stored in the instruction as a word offset rather than a byte offset. • Instructions are only stored at word boundaries • WRAMP uses word addresses, but on byte-addressable processors the addresses of both the target and the branch instruction have at least their two lowest bits as zero • An offset may be negative • if the target instruction precedes the branch instruction • The offset is stored in the 20-bit Offset field • This means the branch can jump 2^19 instructions before or after the current address • On other architectures this may be a more serious limitation.
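As a worked illustration of the formula and the 20-bit field, here is a minimal Python sketch. The function names are hypothetical, and two's complement storage of the offset is assumed; the example values (a branch at 0x0001A targeting 0x00000, and a branch at 0x00004 targeting 0x00001) are the ones used elsewhere in these slides.

```python
OFFSET_BITS = 20
MASK = (1 << OFFSET_BITS) - 1
HALF = 1 << (OFFSET_BITS - 1)  # 2^19


def encode_offset(branch_addr, target_addr):
    """offset = target - (branch + 1), stored as 20-bit two's complement."""
    offset = target_addr - (branch_addr + 1)
    if not -HALF <= offset < HALF:
        raise ValueError("branch target out of range")
    return offset & MASK


def decode_offset(field):
    """Recover the signed offset from the 20-bit field."""
    return field - (1 << OFFSET_BITS) if field & HALF else field


field = encode_offset(0x0001A, 0x00000)
```

For the first example the offset is -27, which the 20-bit field holds as 0xfffe5; the second example gives -4, stored as 0xffffc.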

  11. Branch Offset Calculation • An entry in the WRAMP instruction list: [0x0001A] 0xb02fffe5 bne $2, -27 26: bnez $2, main • Labelled parts: instruction address (0x0001A), machine code (0xb02fffe5), line number in source file (26), original assembly code (bnez $2, main) • Offset in words (main = 0x00000): 0x00000 - (0x0001A + 1) = -27 • Stored offset: fffe5 = -27 in the 20-bit offset field

  12. Program relocation • It is possible that program modules are developed separately by individual programmers. When these programs are to be loaded into memory they should not be assigned overlapping memory space. • To handle this problem, the modules have to be relocated • relative addresses are relocatable • Any absolute references must be "fixed" by the loader • Use a logical base address known at load time • Absolute addresses are stored as offsets from this TBD base
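The loader's fix-up step could be sketched like this. It is a simplified illustration, not an actual loader: the name `relocate` is hypothetical, each absolute reference is assumed to occupy a whole word, and the list of positions needing a fix is assumed to be supplied alongside the module.

```python
def relocate(words, absolute_refs, base):
    """Return a copy of `words` with `base` added at each absolute reference.

    words: memory words assembled as if the module started at address 0
    absolute_refs: indices of the words that hold absolute addresses
    base: load address chosen by the loader (known only at load time)
    """
    loaded = list(words)
    for index in absolute_refs:
        loaded[index] = words[index] + base
    return loaded


module = [8, 99, 1]  # words 0 and 2 hold addresses, word 1 is plain data
loaded = relocate(module, absolute_refs=[0, 2], base=0x100)
```

Relative references such as branch offsets need no entry in `absolute_refs`, since the distance between a branch and its target does not change when the whole module moves.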

  13. From source to executable • [Figure: toolchain diagram with the labels high-level source code, compiler, asm, assembler, obj, lib, linker, exe, loader, memory]

  14. An Example of Assembling Code • .data • a1: .word 3 • a2: .word 16 • a3: .word 5 • .text • .global main • main: • la $6, a2 • loop: • lw $7, 1($6) • lw $10, 0($6) • mult $9, $10, $7 • beqz $9, loop • j loop • syscall

  15. Some examples of assembling code • .data • a1: .word 3 • a2: .word 16 • a3: .word 5 • .text • .global main • main: • la $6, a2 • loop: • lw $7, 1($6) • lw $10, 0($6) • mult $9, $10, $7 • beqz $9, loop • j loop • syscall • Symbol Table (symbol, address) • a1 00007 • a2 00008 • a3 00009 • main 00000 • loop xxxxx • Memory map of data section (address, contents) • 00007 0000 0003 • 00008 0000 0010 • 00009 0000 0005

  16. Translate to Machine Code (address, contents, source instruction) • 0x00000 c6000008 la $6, a2 • 0x00001 87600001 lw $7, 1($6) • 0x00002 8a600000 lw $10, 0($6) • 0x00003 09a40007 mult $9, $10, $7 • 0x00004 a09xxxxx beqz $9, loop • 0x00005 400xxxxx j loop • 0x00006 200d0000 syscall
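The hex words for `la` and `lw` on this slide suggest a layout of a 4-bit opcode, two 4-bit register fields, and a 20-bit address or offset. The sketch below packs fields on that assumption. The layout and the opcode values (0xc for `la`, 0x8 for `lw`) are inferred from the slide's hex, not taken from a WRAMP reference, and the name `encode` is hypothetical.

```python
def encode(opcode, rd, rs, low20):
    """Pack opcode(4) | rd(4) | rs(4) | address-or-offset(20) into 32 bits.

    Field layout is an assumption inferred from the example words.
    """
    return (opcode << 28) | (rd << 24) | (rs << 20) | (low20 & 0xFFFFF)


la_word = encode(0xC, 6, 0, 0x00008)    # la $6, a2   (a2 is at 0x00008)
lw_word = encode(0x8, 7, 6, 1)          # lw $7, 1($6)
lw_word2 = encode(0x8, 10, 6, 0)        # lw $10, 0($6)
words = [f"{w:08x}" for w in (la_word, lw_word, lw_word2)]
```

Masking the last field to 20 bits also lets a negative branch offset be passed in directly once the label is resolved.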

  17. Resolve Symbols (address, contents, source instruction) • 0x00000 c6000008 la $6, a2 • 0x00001 87600001 lw $7, 1($6) • 0x00002 8a600000 lw $10, 0($6) • 0x00003 09a40007 mult $9, $10, $7 • 0x00004 a09xxxxx beqz $9, loop • 0x00005 40000001 j loop (loop = 0x00001) • 0x00006 200d0000 syscall

  18. Resolve Relative References (address, contents, source instruction) • 0x00000 c6000008 la $6, a2 • 0x00001 87600001 lw $7, 1($6) • 0x00002 8a600000 lw $10, 0($6) • 0x00003 09a40007 mult $9, $10, $7 • 0x00004 a09ffffc beqz $9, loop (offset -4) • 0x00005 40000001 j loop • 0x00006 200d0000 syscall • Offset for beqz: 0x00001 - (0x00004 + 1) = -4 = 0xffffc in the 20-bit offset field

  19. Summary • We have been looking at the relationship between high-level languages and machine code: “C” (wcc) → WRAMP assembler (wasm, wlink) → WRAMP machine code • The WRAMP architecture is used as the example • instructions are based on a 3-operand format • it is an example of a register-register architecture

  20. Summary - WRAMP processor • The register file provides a small, fast memory • Instruction classes: • Arithmetic and logic instructions • Test instructions • Flow of control instructions • Memory instructions

  21. Summary - WRAMP processor • The WRAMP instruction set has four addressing modes • Register direct (e.g. add $3, $4, $5) • Immediate (e.g. addi $3, $4, 0x12) • Base displacement (e.g. lw $3, 8($5)) • PC relative (e.g. beqz $3, 0x123)

  22. Summary - Compilation/Assembly • wcc cross compiler used as example; see exercise 2 • Compiles with WRAMP conventions • parameter passing • register save conventions • Uses a stack frame for each procedure which contains space to: • save parameters • save local variables • use as a register save area (e.g. $ra) • Machine code output of compilation process
