Efficient Assembler Programming Guide for Embedded Systems
Learn about optimal assembly coding, speed, and compiler optimization in embedded systems. Utilize registers intelligently for function arguments and data processing with simple and complex examples.
Why Assembly? • Speed • Not affected by compiler optimization
Registers that can be used without saving • r0 • r18-r25 • r26-r27 (X) • r30-r31 (Z) • r1 (must be cleared before returning)
Assembler function arguments • Arguments allocated left to right (r25 to r8) • Each argument is aligned to start in an even-numbered register
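The allocation rule above can be sketched as a small host-side C helper. This is only an illustration: `assign_arg_regs` is a hypothetical name, and the sketch does not model the lower register limit or arguments that end up on the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: for each argument size (in bytes), compute the
 * lowest-numbered register that holds it, allocating downward from r25
 * and rounding every size up to an even number of registers. */
static void assign_arg_regs(const uint8_t *sizes, size_t n, uint8_t *low_reg)
{
    uint8_t next = 26;  /* one past r25, the first argument register */
    for (size_t i = 0; i < n; i++) {
        /* round the size up to even so the argument starts on an even register */
        uint8_t slot = (uint8_t)((sizes[i] + 1u) & ~1u);
        assert(slot <= next);  /* sketch only: no stack-passed arguments */
        next = (uint8_t)(next - slot);
        low_reg[i] = next;
    }
}
```

For a `uint32_t` followed by a `uint8_t`, the helper yields r22 (so the first argument occupies r25-r22) and r20, with r21 left unused.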
Simple assembler example

C equivalent:

    uint32_t subit(uint32_t ul, uint8_t b){ return(ul-b);}

Assembler source:

    #include <avr/io.h>

    .text
    .global subit
    subit:
        sub r22, r20 ; subtract b (r20) from ul (r25-r22)
        sbc r23, r1  ; .. NOTE: gcc makes sure r1 is always 0
        sbc r24, r1  ; ..
        sbc r25, r1  ; ..
        ret
    .end
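To see why the sub/sbc chain produces a full 32-bit result, the borrow propagation can be modelled on a host machine in C. This is a rough sketch, not AVR code: `subit_model` is an illustrative name, and the array `r` stands in for r22 to r25.

```c
#include <stdint.h>

/* Host-side model of the sub/sbc chain: subtract one byte from the low
 * byte, then propagate the borrow through the three upper bytes by
 * subtracting zero (the role r1 plays in the assembler version). */
static uint32_t subit_model(uint32_t ul, uint8_t b)
{
    uint8_t r[4];  /* r[0]..r[3] stand in for r22..r25 */
    for (int i = 0; i < 4; i++)
        r[i] = (uint8_t)(ul >> (8 * i));

    unsigned borrow = 0;  /* the first instruction (sub) has no borrow in */
    for (int i = 0; i < 4; i++) {
        unsigned sub = (i == 0) ? b : 0u;      /* r20 first, then r1 == 0 */
        unsigned need = sub + borrow;
        unsigned next_borrow = (r[i] < need);  /* carry flag after sub/sbc */
        r[i] = (uint8_t)(r[i] - need);
        borrow = next_borrow;
    }

    return (uint32_t)r[0] | ((uint32_t)r[1] << 8) |
           ((uint32_t)r[2] << 16) | ((uint32_t)r[3] << 24);
}
```

The model returns the same value as the C expression `ul - b` for every input, including cases such as `subit_model(0x100, 1)` where the borrow ripples into the next byte.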
More complex example:

    #include <avr/io.h>

    ; defines the # of cpu cycles of overhead
    ; (includes the ldi r16,byte0; ldi r17,byte1; ldi r18,byte2;
    ; ldi r19,byte3, and the call _delay_cycles)
    OVERHEAD = 24

    ; some register aliases
    cycles0 = 22
    cycles1 = 23
    cycles2 = 24
    cycles3 = 25
    temp = 19

    .text
    .global delay_cycles
    delay_cycles:
    ;; subtract the overhead
        subi cycles0,OVERHEAD ; subtract the overhead
        sbc cycles1,r1        ; ..
        sbc cycles2,r1        ; ..
        sbc cycles3,r1        ; ..
        brcs dcx              ; return if req’d delay too short

    ;; delay the lsb
        mov r30,cycles0       ; Z = jtable offset to delay 0-7 cycles
        com r30               ; ..
        andi r30,7            ; ..
        clr r31               ; ..
        subi r30,lo8(-(gs(jtable))) ; add the table offset
        sbci r31,hi8(-(gs(jtable))) ; ..
        ijmp                  ; vector into table for partial delay
    jtable:
        nop
        nop
        nop
        nop
        nop
        nop
        nop

    ;; delay the remaining delay
    loop:
        subi cycles0,8        ; decrement the count (8 cycles per loop)
        sbc cycles1,r1        ; ..
        sbc cycles2,r1        ; ..
        sbc cycles3,r1        ; ..
        brcs dcx              ; exit if done
        nop                   ; .. add delay to make 8 cycles per loop
        rjmp loop             ; ..
    dcx:
        ret
    .end

C prototype:

    void delay_cycles(uint32_t cpucycles);
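The structure of the delay routine (a computed jump into a nop table for the low three bits, then an 8-cycle loop for the rest) can be checked with a simplified host-side model in C. The function names here are illustrative assumptions. The model counts only the table nops and the full 8-cycle loop passes. It does not model fixed costs such as the call, the setup instructions, or the return.

```c
#include <stdint.h>

enum { OVERHEAD = 24 };  /* same value as in the listing above */

/* Number of nops executed after the ijmp. The routine complements the
 * low byte and masks it to 0-7, then jumps that many entries into a
 * 7-nop table, so 7 - offset nops remain before falling into the loop. */
static uint32_t table_nops(uint8_t cycles0)
{
    uint8_t offset = (uint8_t)(~cycles0) & 7u;  /* com r30 ; andi r30,7 */
    return 7u - offset;                         /* equals cycles0 & 7 */
}

/* Simplified model: cycles spent in the nop table plus full loop passes. */
static uint32_t modelled_delay(uint32_t cycles)
{
    if (cycles < OVERHEAD)
        return 0;  /* brcs dcx: requested delay too short */

    uint32_t remaining = cycles - OVERHEAD;
    uint32_t spent = table_nops((uint8_t)remaining);

    /* Each pass that does not borrow costs 8 cycles on classic AVR cores:
     * subi (1) + 3 x sbc (3) + untaken brcs (1) + nop (1) + rjmp (2). */
    while (remaining >= 8) {
        remaining -= 8;
        spent += 8;
    }
    return spent;
}
```

In this model the nop table supplies `remaining & 7` cycles and the loop supplies the multiples of 8, so `modelled_delay(100)` accounts for exactly 100 - 24 = 76 cycles.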