ECE 425

ECE 425 Subroutines and Stacks

Subroutines
• A separate, independent module of a program that performs a specific task.
• Shortens code and provides reusable “tools.”
• High-level languages typically have libraries of subroutines.
• The objective is to avoid “reinventing the wheel.”

Subroutine Modularity
Main Program
  Subroutine 1
  Subroutine 2
    Subroutine 2.1
  Subroutine 3
    Subroutine 3.1
      Subroutine 3.1.1
    Subroutine 3.2

Subroutine Techniques
• Techniques
  • Keep them short.
  • Make them reusable (more on this coming).
  • Main will frequently do nothing but call subroutines.
• Good subroutine practice
  • Independence: does not depend on other code, so it can be used in many programs.
  • Registers: saves and restores key registers on entry and exit using push and pull (pop) operations (where?)
  • Data and code are location independent.
  • Use local variables; do not use hardcoded addresses.

Nothing But Subroutines
• Well-written code will often consist of nothing but subroutine calls at the top level:
    main    BL    GetPosition
            BL    CalcOffset
            BL    DisplayData
• Using BL presumes the subroutine code is within 32 MB of the calling routine. This is almost always a fair assumption.

The Stack
• Calls to subroutines require use of the stack.
• The stack is a piece of memory dedicated to temporary storage of run-time variables.
  • What type of memory? RAM, ROM, DRAM, SRAM, register file…?
• The stack is always organized as a LIFO (Last In, First Out) queue.
  • PUSH an element onto the stack.
  • POP one off.
• Mostly used for saving/restoring machine registers.

An example of stack operations

Load/Store Architecture
• Being a RISC machine, the ARM processor does not have dedicated PUSH/POP instructions.
  • Uses LDR/STR instead.
  • The assembler may translate PUSH/POP mnemonics to load/store opcodes.
• R13 is normally designated as the stack pointer (sp).
  • Points to the top of the stack.
  • The top can be either the next empty location or the last filled one. This gives more flexibility than other architectures.

Push/Pull Multiple
• Going to a subroutine may require saving/restoring a lot of registers.
• ARM has LDM/STM instructions to push/pull multiple registers with one opcode.
• Actual reads/writes are still done sequentially. Can’t push/pull more than one register on any given clock cycle.
• Saves code space, and saves fetching/decoding multiple load/store opcodes.

LDM/STM
• Load/store multiple operations can transfer between 1 and 16 registers to/from memory.
• Only 32-bit words; not valid for bytes or half words.
• The order of the registers to be transferred cannot be specified: reordering the transfer list will be ignored.
• The lowest register number will be transferred to/from the lowest address.
• The contents of the base register will be used to determine the lowest address.

LDM/STM Addressing Modes
• LDM and STM have special operating modes:
  • IA: increment after
  • IB: increment before
  • DA: decrement after
  • DB: decrement before
• Not for use with other instructions.
• The PUSH and POP mnemonics are the same as STMDB sp!, {…} and LDMIA sp!, {…}, respectively.

LDMIA Example
• Load r0, r2, r4 with memory data starting at address 0xBEEF0000.
    LDR    r1, =0xBEEF0000
    LDMIA  r1, {r0, r2, r4}    ; mem32[r1]     -> r0
                               ; mem32[r1 + 4] -> r2
                               ; mem32[r1 + 8] -> r4
    LDMIA  r1, {r4, r0, r2}    ; EXACTLY the same
• r1 will remain unchanged. Effective addresses are generated on the fly.

STMDB Example
• Save the values of r0, r2, r4 to the stack with top byte address 0xBEEF0000.
    LDR    r1, =0xBEEF0000
    STMDB  r1, {r0, r2, r4}    ; r0 -> mem32[r1 - 12]
                               ; r2 -> mem32[r1 - 8]
                               ; r4 -> mem32[r1 - 4]
    STMDB  r1, {r4, r0, r2}    ; EXACTLY the same
• The lowest-numbered register always goes to the lowest address, so r0 lands at r1 - 12 and r4 at r1 - 4.
• r1 will remain unchanged. Effective addresses are generated on the fly.
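The register-ordering rule can be checked with a small Python model. This is only a sketch under simplifying assumptions: memory is a dictionary of word addresses, and the register values 0xAAAA, 0xBBBB and 0xCCCC are made up for illustration. It models STMDB and LDMIA without writeback.

```python
def reg_number(name):
    """Turn a register name such as 'r4' into its number (4)."""
    return int(name[1:])

def stmdb(regs, mem, base, reglist):
    """Model STMDB base, {reglist} without writeback."""
    ordered = sorted(reglist, key=reg_number)   # order in the list is ignored
    lowest = regs[base] - 4 * len(ordered)      # decrement before storing
    for i, name in enumerate(ordered):
        mem[lowest + 4 * i] = regs[name]        # lowest register -> lowest address

def ldmia(regs, mem, base, reglist):
    """Model LDMIA base, {reglist} without writeback."""
    ordered = sorted(reglist, key=reg_number)
    addr = regs[base]                           # read the base once, up front
    for i, name in enumerate(ordered):
        regs[name] = mem[addr + 4 * i]

# Illustrative values only (not from the slides).
regs = {"r0": 0xAAAA, "r1": 0xBEEF0000, "r2": 0xBBBB, "r4": 0xCCCC}
mem = {}
stmdb(regs, mem, "r1", ["r4", "r0", "r2"])      # scrambled list, same result
for addr in sorted(mem):
    print(hex(addr), hex(mem[addr]))
```

In this model r0 ends up at 0xBEEEFFF4 (r1 - 12) and r4 at 0xBEEEFFFC (r1 - 4), whatever order the list is written in, and r1 itself is not modified.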

What’s the Point?
• Can be used for block data transfers.
• Mostly used for context switches.
  • The most common context switch is going to a subroutine.
  • Save the current processor state so it can be restored after the subroutine.
  • Can also be an interrupt (to be covered shortly) or switching users/tasks in multi-threaded applications.

What If You Want Base Register to Change?
• Previous examples all leave the base address pointer alone.
• The final effective address can be written back to it.
• Just append ! to the base register.
    LDMIA  r10!, {r0}       ; r10 will be incremented by 4 (one word)
    LDMIA  r10!, {r0-r3}    ; r10 will be incremented by 16

Up/Down, Empty/Full
• Most processors have rigid definitions for stack ops.
  • e.g., stacks always decrement on push, increment on pop.
  • e.g., stack pointer always points to the next address to be pushed.
• ARM leaves that up to the developer.
  • The stack can grow up or down (i.e., the address can increase or decrease).
  • The stack pointer can point to the next address to be filled or the last address that was filled.

What kind of stack is it?

Up/Down, Empty/Full
In all cases, r9 starts out with 100C. Addresses increase toward the top of the table.

Address   STMIA   STMIB   STMDA   STMDB
1018              r5
1014      r5      r1
1010      r1      r0
100C      r0              r5
1008                      r1      r5
1004                      r0      r1
1000                              r0

STMIA r9!, {r0, r1, r5}    r9 final value: 1018
STMIB r9!, {r0, r1, r5}    r9 final value: 1018
STMDA r9!, {r0, r1, r5}    r9 final value: 1000
STMDB r9!, {r0, r1, r5}    r9 final value: 1000

If the ! was left out, r9 would always be 100C.
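The four cases can be reproduced with a short Python model. This is a sketch, not a description of the hardware: the function name stm and its return format are choices made here for illustration. It computes which address each register is stored to and the final value of the base register.

```python
def stm(mode, base, reglist, writeback=True):
    """Model STM<mode> base{!}, {reglist}.

    Returns (stores, final_base), where stores maps address -> register name.
    """
    ordered = sorted(reglist, key=lambda name: int(name[1:]))
    size = 4 * len(ordered)                 # bytes transferred
    lowest = {
        "IA": base,                         # increment after
        "IB": base + 4,                     # increment before
        "DA": base - size + 4,              # decrement after
        "DB": base - size,                  # decrement before
    }[mode]
    # The lowest-numbered register always goes to the lowest address.
    stores = {lowest + 4 * i: name for i, name in enumerate(ordered)}
    if writeback:                           # the "!" suffix
        base = base + size if mode[0] == "I" else base - size
    return stores, base

for mode in ("IA", "IB", "DA", "DB"):
    stores, final = stm(mode, 0x100C, ["r0", "r1", "r5"])
    layout = {hex(addr): reg for addr, reg in sorted(stores.items())}
    print("STM" + mode, layout, "r9 final:", hex(final))
```

Calling stm with writeback=False models the version without the ! suffix, in which case the returned base is still 0x100C.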

Up/Down, Empty/Full
• A stack can be full or empty.
  • Full: the stack pointer (sp) points to the last address written to.
  • Empty: the stack pointer points to the next available address.
• A stack can be ascending or descending.
  • Ascending: the memory address goes up as data is pushed.
  • Descending: the memory address goes down as data is pushed.
• How to use the stack?
  • Normally, LDM and STM work together.
  • Store register values before executing subroutine tasks.
  • Load the original register values back after the subroutine tasks.

How to use the stack operations?
• A stack can be
  • Full descending (FD)
  • Full ascending (FA)
  • Empty descending (ED)
  • Empty ascending (EA)
• Select a type, and be consistent.
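The four stack types differ only in two choices: whether the pointer moves before or after the access, and in which direction. A minimal Python sketch, assuming a one-word-at-a-time stack and a made-up starting address of 0x1000 (the class name Stack is hypothetical):

```python
class Stack:
    """Toy model of an FD, FA, ED or EA stack of 32-bit words."""

    def __init__(self, kind, sp):
        self.full = kind[0] == "F"                  # F = full, E = empty
        self.step = 4 if kind[1] == "A" else -4     # A = ascending, D = descending
        self.sp = sp
        self.mem = {}

    def push(self, value):
        if self.full:
            self.sp += self.step                    # move first, then write
            self.mem[self.sp] = value
        else:
            self.mem[self.sp] = value               # write first, then move
            self.sp += self.step

    def pop(self):
        if self.full:
            value = self.mem[self.sp]               # read first, then move back
            self.sp -= self.step
        else:
            self.sp -= self.step                    # move back first, then read
            value = self.mem[self.sp]
        return value

for kind in ("FD", "FA", "ED", "EA"):
    s = Stack(kind, 0x1000)
    s.push(11)
    print(kind, "after one push: sp =", hex(s.sp), "data at", hex(next(iter(s.mem))))
```

Whatever type is chosen, pop must undo push exactly, which is why mixing types within one program corrupts the stack.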

Subroutines • The general structure of a subroutine in a program is:

Using stack in subroutines
        …
        BL     SUB1
        …
SUB1    STMFD  sp!, {r0-r2, r5}
        …                          ; r0-r2, r5 are used in here
        …                          ; may be altered by SUB1
        LDMFD  sp!, {r0-r2, r5}
        MOV    pc, lr

        …
        BL     SUB1
        …
SUB1    STMFD  sp!, {r0-r2, r5}
        …
        BL     SUB2                ; calls SUB2 in SUB1
        …
        LDMFD  sp!, {r0-r2, r5}
        MOV    pc, lr

Anything wrong?

Subroutine Nesting
• A subroutine may call another subroutine or itself.
• A routine that can call itself is “recursive.”
  • Example: factorial
  • Factorial(N) = N * Factorial(N-1)
• The return address is stored in r14, which is transferred to pc when the routine terminates.
• But if a subroutine is called recursively (or calls other subroutines), what happens?
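To make the role of the stack in recursion concrete, here is a sketch of factorial written in Python with an explicit stack instead of language-level recursion. It is an illustration only: each "call" saves its own N the way a recursive routine would push its state before calling itself, and the returns unwind in LIFO order.

```python
def factorial(n):
    """Compute n! using an explicit stack in place of recursive calls."""
    stack = []
    while n > 1:              # "call" phase: each level saves its N
        stack.append(n)
        n -= 1
    result = 1                # base case: Factorial(0) = Factorial(1) = 1
    while stack:              # "return" phase: unwind, last saved N first
        result *= stack.pop()
    return result

print(factorial(5))
```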

Subroutine Nesting
• Main calls subroutine A.
  • The return address to main is stored in r14.
• Subroutine A calls subroutine B. r14 is overwritten with the new return address into A.
• Subroutine B finishes; r14 is transferred to the PC.
  • PC now points where? Back into subroutine A, just after the BL.
  • When A finishes and transfers r14 to the PC, r14 still holds that address inside A, so A starts executing again instead of returning to main.
• Solution:
  • Use the stack.
  • Push the link register before the next BL.
  • Pop it upon return.
• Always push/pop the link register value in a subroutine that calls other subroutines.
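The overwritten link register can be traced with a few lines of Python. This is a hand-written trace of main calling A and A calling B, not an ARM simulator; the labels "main+4" and "A+4" are hypothetical stand-ins for the return addresses that BL would place in r14.

```python
def run(save_lr):
    """Trace main -> A -> B and return the addresses control returns to."""
    stack = []
    returns = []

    lr = "main+4"             # main: BL A sets lr to the return address in main
    if save_lr:
        stack.append(lr)      # A entry: STMFD sp!, {lr}

    lr = "A+4"                # A: BL B overwrites lr with a return address in A
    returns.append(lr)        # B exit: MOV pc, lr returns into A (correct)

    if save_lr:
        pc = stack.pop()      # A exit: LDMFD sp!, {pc}
    else:
        pc = lr               # A exit: MOV pc, lr uses the stale value
    returns.append(pc)
    return returns

print("without saving lr:", run(False))
print("with lr on the stack:", run(True))
```

Without the push, the second return lands back inside A; with it, control reaches main as intended.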

Previous Example
        …
        BL     SUB1
        …
SUB1    STMFD  sp!, {r0-r2, r5, lr}
        …
        BL     SUB2                    ; calls SUB2 in SUB1
        …
        LDMFD  sp!, {r0-r2, r5, pc}
        MOV    pc, lr                  ; no longer necessary, why?

                SUB1                          SUB2
   :            STMFD sp!, {regs, lr}           :
BL SUB1           :                             :
   :            BL SUB2                       MOV pc, lr
   :              :
                LDMFD sp!, {regs, pc}

Parameter Passing
• Usually a subroutine needs to operate on some data that has been set by the calling program.
• For flexibility, it would be poor practice to “hardwire” the passed parameters.
  • For example, a square root function that only calculates the square root of 8.
• These data can be passed to the subroutine by “Value” or by “Reference.”

Call by Value
• Uses CPU registers: the subroutine operates on the “values” of the associated registers.
  • Values used by the routine are “parameters.”
  • Call by value means the parameters are held in certain registers.
• Any/all registers r0 – r12 may be used.
  • The ARM Procedure Call Standard calls for them to be held in r0 – r3.
  • If that’s not enough, use the stack, too.
• The calling program sets the registers; the subroutine uses the values found there.

Saving register values
• If the subroutine is going to operate on a register, should that register be pushed before starting the subroutine?
• A subroutine may need registers for temporary storage and intermediate variables.
• The solution is to push any registers the routine will use and pop them when it’s done.
  • Use STM to save the machine state, including the link register.
  • Use LDM to restore the machine state, including the link register (to the program counter).

Call by Reference
• If there are many parameters and they are stored in memory, use call by reference.
  • For example: get the average of 100 grades listed in a table.
• Used to manipulate data in memory.
  • Data are stored in memory consecutively.
  • The start address of the data block is passed to the subroutine.
• How are memory addresses passed?
  • By putting them in registers!
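The grade-averaging case can be sketched in Python, with a list standing in for memory and an index standing in for the start address. The names average, memory and grades_start and the four grade values are all made up for illustration; the point is that the routine receives only where the data starts and how many items there are, not the data itself.

```python
def average(memory, start, count):
    """Average `count` consecutive values beginning at address `start`."""
    total = 0
    for offset in range(count):
        total += memory[start + offset]     # read each item through the passed address
    return total / count

# A toy "memory" with a table of grades stored consecutively from address 4.
memory = [0] * 16
grades_start = 4
memory[grades_start:grades_start + 4] = [70, 80, 90, 100]

print(average(memory, grades_start, 4))
```

The same routine works on any table anywhere in memory, which is the reusability argument made earlier for not hardcoding addresses.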

Call by Reference Example
startaddr   EQU   0x5000          ; start address of block to be scrambled
stopaddr    EQU   0x6000          ; stop scrambling here
            LDR   sp, =SRAM_BASE
            LDR   r0, =startaddr
            LDR   r1, =stopaddr
            BL    scramble
What will happen if sp is not initialized?

ARM APCS
• The ARM Procedure Call Standard sets guidelines for subroutine behavior.
• Allows multiple programmers to write routines that won’t corrupt other segments.
• Says some registers may be changed, others must be preserved.
• Requires the stack to be eight-byte aligned (the top-of-stack address ends in binary 000) and full descending. The stack may be used for parameters and to preserve other register values that the routine would otherwise corrupt.
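Eight-byte alignment means the three lowest bits of the stack pointer are zero, which is easy to test with a mask. A small Python sketch, using made-up example addresses and hypothetical helper names:

```python
def is_eight_byte_aligned(sp):
    """True when the low three address bits are 000."""
    return (sp & 0x7) == 0

def align_down_8(sp):
    """Round an address down to the nearest multiple of 8.

    Rounding down suits a descending stack, which grows toward lower addresses.
    """
    return sp & ~0x7

for sp in (0x20001000, 0x20000FFC):         # illustrative addresses only
    print(hex(sp), is_eight_byte_aligned(sp), hex(align_down_8(sp)))
```

Pushing an even number of 32-bit registers keeps an aligned full-descending stack aligned; pushing an odd number leaves it only four-byte aligned.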

APCS Registers
• r0 – r3: pass parameters.
• r4 – r11: values must be preserved.
• r12: can be changed by the routine and does not need to be restored.
• r13, r14, r15: the stack pointer, link register and program counter. Their functioning does not change in subroutines.
