This document presents a comprehensive overview of back-end design principles for the CComp project by Zhaopeng Li from the Software Security Lab. It discusses critical elements, including assembly language specifics for x86 architecture, low-level intermediate languages, and design points for future development. Topics include data representation, calling conventions, state relations in operational semantics, and program logic based on abstract machines. This outline aims to lay the groundwork for understanding assembly's role in compiler design and program verification.
Overview of Back-end for CComp Zhaopeng Li Software Security Lab. June 8, 2009
Outline • Design Points • Assembly Language : “x86” • Low-level Intermediate Language • Future Work
Design Points • Assembly Language • Target: SCAP with an x86 abstract machine; • The program logic may change in a later version; • Or a different machine may be targeted. • Low-level Intermediate Language • Hides some machine-specific details; • Note that this level may serve only as a helper for generating code and proofs.
Some Topics about “x86” • Data Representation • 32-bit vs “fake” 32-bit • Don’t care how to store the data as bits. • Integer : 4 bytes • Pointer : 4 bytes • Data Alignment • Callee-saved Registers • EBX, ESI, EDI, EBP
Some Topics about “x86” (cont.) • Calling convention: • Parameters are passed on the stack, pushed from right to left; alternatively, the first three are passed in registers EAX, ECX, and EDX, and the rest are passed on the stack; • Registers EAX, ECX, and EDX may be used freely by the callee; other registers must be saved on the stack and popped before the function returns; • The return value is stored in register EAX; • The caller cleans up the stack (parameters).
Some Topics about “x86” (cont.)
Prolog (typical)
  _function:
    push ebp        ; store the old base pointer
    mov esp, ebp    ; make the base pointer point to the current stack location
    sub x, esp      ; x is the size, in bytes
  Short form: enter x, 0
Epilog (typical)
    mov ebp, esp    ; reset the stack to "clean" away the local variables
    pop ebp         ; restore the original base pointer
    ret             ; return from the function
  Short form: leave; ret
[Figure: three stack snapshots (at function entry, after stack frame setup, after the return), showing the positions of esp and ebp relative to the local variables, old ebp, old eip, and parameters.]
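The calling convention and the prolog/epilog above can be traced on a toy stack. The following is a minimal sketch, not part of the original slides: the function name `run_call`, the start addresses, the placeholder return address `0xDEAD`, and the body that sums its parameters are all assumptions made for illustration.

```python
def run_call(args, frame_size=12):
    """Simulate one stack-based call on a toy 32-bit machine (illustrative only)."""
    R = {"esp": 0x1000, "ebp": 0x2000, "eax": 0}   # hypothetical start values
    M = {}                                          # memory: address -> 4-byte word

    def push(v):
        R["esp"] -= 4
        M[R["esp"]] = v

    def pop():
        v = M[R["esp"]]
        R["esp"] += 4
        return v

    esp_before, ebp_before = R["esp"], R["ebp"]

    # Caller: push parameters from right to left; "call" pushes the old eip.
    for a in reversed(args):
        push(a)
    push(0xDEAD)                                    # placeholder return address

    # Callee prolog: push ebp; mov esp, ebp; sub x, esp
    push(R["ebp"])
    R["ebp"] = R["esp"]
    R["esp"] -= frame_size

    # Body (assumed): the i-th parameter sits at 8 + 4*i (ebp); sum them into eax.
    R["eax"] = sum(M[R["ebp"] + 8 + 4 * i] for i in range(len(args)))

    # Callee epilog: mov ebp, esp; pop ebp (what "leave" does), then ret.
    R["esp"] = R["ebp"]
    R["ebp"] = pop()
    ret_addr = pop()

    # Caller cleans up the parameters.
    R["esp"] += 4 * len(args)

    return {
        "eax": R["eax"],
        "ret_addr": ret_addr,
        "esp_restored": R["esp"] == esp_before,
        "ebp_restored": R["ebp"] == ebp_before,
    }
```

Calling `run_call([1, 2, 3])` leaves the sum in `eax` and restores both `esp` and `ebp`, which is the invariant the prolog and epilog are meant to maintain.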
Assembly Abstract Machine “m86” • Code Heap (C) • Code storage, • Unchanged during execution • Machine State • Memory (M) • Register File (R) • Instruction Pointer (eip), • current instruction c = C(eip) • Or just use instruction sequence (I)
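As a rough illustration of this machine, the sketch below models the code heap C as a read-only mapping and the state as (M, R, eip). It is an assumption-laden toy, not the m86 definition: instructions are indexed by consecutive integers, encoded as tuples, and only three instructions are handled.

```python
from dataclasses import dataclass, field

@dataclass
class State:
    M: dict = field(default_factory=dict)   # memory: address -> word
    R: dict = field(default_factory=dict)   # register file: name -> word
    eip: int = 0                            # instruction pointer

def step(C, s):
    """Execute the current instruction c = C(eip); C is only read, never written."""
    c = C[s.eip]
    op = c[0]
    if op == "movi":                        # movi n, r : r := n
        _, n, r = c
        s.R[r] = n
        s.eip += 1
    elif op == "add":                       # add r1, r2 : r2 := r2 + r1 (AT&T order)
        _, r1, r2 = c
        s.R[r2] = (s.R[r2] + s.R[r1]) & 0xFFFFFFFF
        s.eip += 1
    elif op == "jmp":                       # jmp b : eip := b
        s.eip = c[1]
    else:
        raise ValueError(f"unsupported instruction: {c!r}")
    return s

# Hypothetical code heap with instruction indices as addresses.
CODE = {
    0: ("movi", 2, "eax"),
    1: ("movi", 3, "ebx"),
    2: ("add", "eax", "ebx"),
}

def run(C, s, steps):
    for _ in range(steps):
        step(C, s)
    return s
```

Running `run(CODE, State(), 3)` executes the three instructions in order; the alternative mentioned on the slide, an instruction sequence I, would replace the `eip` lookup with a list that is consumed from the front.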
Assembly Language : “x86” • “AT&T-syntax”
• Reg. r ::= eax | ebx | ecx | edx | esi | edi | esp | ebp
• FReg. fr ::= sf | zf
• Int. b ::= n (integer)
• Instr. i ::= add r1, r2 | addi n, r | sub r1, r2 | subi n, r | mul r1, r2 | muli n, r
  | mov r1, r2 | movi n, r | movs r1, n(r2) | movl n(r1), r2
  | push r | pop r | cmp r1, r2 | cmpi n, r
  | je r, b | jne r, b | jg r, b | jge r, b | jmp b
  | call b | ret | enter n, 0 | leave | malloc r | free r
Program Logic • Based on SCAP • Specification (p, g) • p : State -> Prop • g : State -> State -> Prop • Inference Rules • Well-formed program • Well-formed basic block • Well-formed instruction
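To make the shape of a specification (p, g) concrete, here is a small sketch that represents p and g as executable predicates over a toy state. It only tests a pair of concrete states and proves nothing; the precondition `p` is a made-up example, while `g` is the guarantee used for the function f in the "Figure Out G" slides.

```python
from typing import Callable, Dict, Tuple

State = Dict[str, int]                      # toy state: register name -> value
Pred = Callable[[State], bool]              # p : State -> Prop
Guar = Callable[[State, State], bool]       # g : State -> State -> Prop
Spec = Tuple[Pred, Guar]

def f_spec() -> Spec:
    def p(s: State) -> bool:
        # Hypothetical precondition: the stack pointer is 4-byte aligned.
        return s["esp"] % 4 == 0

    def g(s: State, s2: State) -> bool:
        # R'(ebp) = R(ebp) /\ R'(esp) = R(esp) + 4
        return s2["ebp"] == s["ebp"] and s2["esp"] == s["esp"] + 4

    return p, g

def respects(spec: Spec, entry: State, exit_: State) -> bool:
    """If p holds at entry, g must relate the entry and exit states."""
    p, g = spec
    return (not p(entry)) or g(entry, exit_)
```

In the actual logic, p and g are propositions and the inference rules establish well-formedness for all states; the Boolean functions here merely show the two arities.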
Main Objectives • Code Generation • Minimize the proof size • E.g., temporary results should be kept in registers rather than on the stack • Assertion • Building (p, g) for each basic block • Generating (p, g) for each program point • Proof • Generating proofs for functions/basic blocks • (reusing the proofs of the VCs from the source level)
Assertion Relationship
Intermediate Language:
  f : {p} //{q}
    Basic block1
  L1 : {p1}
    Basic block2
x86 Assembly Language:
  f : {(p', g)}
    Basic block1
  L1 : {(p'1, g1)}
    Basic block2
p' = trans(p) /\ param_p /\ stack-reg_p
g = trans(q) /\ callee-saved-reg_g /\ stack_g
p'1 = trans(p1) /\ param_p1 /\ stack-reg_p1
g1 = ?
Figure Out G
Code:
  f : {R'(ebp) = R(ebp) /\ R'(esp) = R(esp)+4}
    push ebp
    mov esp, ebp
    sub $12, esp
  L1 : {g1}
    Basic block2
    leave
    ret
States: R at the entry of f, R0 after push ebp, R' after ret.
Guarantee at f: R'(ebp) = R(ebp) /\ R'(esp) = R(esp)+4
State relation for push ebp: R0(ebp) = R(ebp) /\ R0(esp) = R(esp)-4
Conjunction: R'(ebp) = R(ebp) /\ R0(ebp) = R(ebp) /\ R'(esp) = R(esp)+4 /\ R0(esp) = R(esp)-4
Result g0: R'(ebp) = R0(ebp) /\ R'(esp) = R0(esp)+8
• The method:
• Get the state relation from the rule of the operational semantics;
• Use the g of the previous program point;
• Do substitution and arithmetic.
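Figure Out G (cont.)
Code:
  f : {R'(ebp) = R(ebp) /\ R'(esp) = R(esp)+4}
    push ebp
    mov esp, ebp
    sub $12, esp
  L1 : {g1}
    Basic block2
    leave
    ret
States: R0 after push ebp, R1 after mov esp, ebp, R' after ret.
Previous guarantee g0: R'(ebp) = R0(ebp) /\ R'(esp) = R0(esp)+8
State relation for mov esp, ebp: R1(ebp) = R0(esp) /\ R1(esp) = R0(esp)
Conjunction: R'(ebp) = R0(ebp) /\ R1(ebp) = R0(esp) /\ R'(esp) = R0(esp)+8 /\ R1(esp) = R0(esp)
Result g1: R'(ebp) = M1(R1(ebp)) /\ R'(esp) = R1(esp)+8
(push ebp saved R0(ebp) in memory at address R0(esp) = R1(ebp), which is why g1 refers to M1(R1(ebp)).)
• The method:
• Get the state relation from the rule of the operational semantics;
• Use the g of the previous program point;
• Do substitution and arithmetic.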
Figure Out G (cont.)
Code:
  f : {R'(ebp) = R(ebp) /\ R'(esp) = R(esp)+4}
    push ebp
    mov esp, ebp
    sub $12, esp
  L1 : {g1}
    Basic block2
    leave
    ret
States: R0 after push ebp, R1 after mov esp, ebp, R2 after sub $12, esp, R' after ret.
Previous guarantees:
  g0: R'(ebp) = R0(ebp) /\ R'(esp) = R0(esp)+8
  g1: R'(ebp) = M1(R1(ebp)) /\ R'(esp) = R1(esp)+8
State relation for sub $12, esp: R2(ebp) = R1(ebp) /\ R2(esp) = R1(esp)-12
Conjunction: R'(ebp) = M1(R1(ebp)) /\ R2(ebp) = R1(ebp) /\ R'(esp) = R1(esp)+8 /\ R2(esp) = R1(esp)-12
Result g2: R'(ebp) = M2(R2(ebp)) /\ R'(esp) = R2(esp)+20
• The method:
• Get the state relation from the rule of the operational semantics;
• Use the g of the previous program point;
• Do substitution and arithmetic.
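The chain of guarantees can be sanity-checked by running the five instructions of f on one concrete state and evaluating each relation against the final state R'. This is only a test on a single trace under assumed start values, not a proof. It checks g2 in the form R'(esp) = R2(esp) + 20, which is what substituting R1(esp) = R2(esp) + 12 into g1 gives. The names `trace_f` and `check` are illustrative.

```python
def trace_f(esp=0x1000, ebp=0x2000):
    """Run push ebp; mov esp, ebp; sub $12, esp; leave; ret and record snapshots."""
    R = {"esp": esp, "ebp": ebp}
    M = {}
    snaps = {}

    def snap(name):
        snaps[name] = (dict(R), dict(M))

    snap("R")                                  # state at the entry of f
    R["esp"] -= 4; M[R["esp"]] = R["ebp"]      # push ebp
    snap("R0")
    R["ebp"] = R["esp"]                        # mov esp, ebp
    snap("R1")
    R["esp"] -= 12                             # sub $12, esp
    snap("R2")
    R["esp"] = R["ebp"]                        # leave: mov ebp, esp ...
    R["ebp"] = M[R["esp"]]; R["esp"] += 4      # ... then pop ebp
    R["esp"] += 4                              # ret pops the return address
    snap("R'")
    return snaps

def check(snaps):
    """Evaluate g, g0, g1, g2 between each intermediate state and R'."""
    R, _ = snaps["R"]
    R0, _ = snaps["R0"]
    R1, M1 = snaps["R1"]
    R2, M2 = snaps["R2"]
    Rf, _ = snaps["R'"]
    return {
        "g":  Rf["ebp"] == R["ebp"] and Rf["esp"] == R["esp"] + 4,
        "g0": Rf["ebp"] == R0["ebp"] and Rf["esp"] == R0["esp"] + 8,
        "g1": Rf["ebp"] == M1[R1["ebp"]] and Rf["esp"] == R1["esp"] + 8,
        "g2": Rf["ebp"] == M2[R2["ebp"]] and Rf["esp"] == R2["esp"] + 20,
    }
```

The body of Basic block2 is omitted here, so the trace assumes it leaves esp, ebp, and the saved ebp slot unchanged.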
Potential Benefits • Hides some machine-specific details; • Some optimizations could be done (optional); • Makes the implementation simple and reusable • (*Note that this level is just a helper for generating code and proofs.*) • Only the code that translates from this level needs to be added when targeting a different assembly logic
The Language
• Loc. l ::= r | s
• Int. o,b ::= n (integer)
• Slot. s ::= local(o) | incoming(o) | outgoing(o)
• Reg. r ::= r1 | r2 | r3 | … //infinite pseudo-registers
• Instr. i ::= bop(bop, l1, l2, l) | uop(uop, l1, l) | load(r, o, l) | store(l, r, o)
  | getstack(s, r) | setstack(r, s) | call(id, l) | return r
  | malloc(r) | free(r) | goto b | label(b) | cond(l1, cmp, l2, btrue)
• BinOp. bop ::= add | sub | mul | …
• UnOp. uop ::= minus | …
• Comp. cmp ::= gt | ge | eq | ne | lt | le
Code Generation (optional) • Do some optimizations that do not affect the proof, such as: • Branch tunneling • Dead code elimination • Future optimizations • Other low-level optimizations may be done here
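Branch tunneling, the first optimization listed above, can be sketched on the intermediate language as follows. This is one common formulation and not necessarily the one used in CComp: instructions are encoded as tuples whose first element is the constructor name (an assumption of this sketch), labels are integers, and a jump whose target label is immediately followed by an unconditional goto is redirected to the end of the chain.

```python
def tunnel_branches(code):
    """Redirect jumps that target a label followed directly by a goto."""
    # forward[b] = b2 when the code contains "label(b); goto b2".
    forward = {}
    for ins, nxt in zip(code, code[1:]):
        if ins[0] == "label" and nxt[0] == "goto":
            forward[ins[1]] = nxt[1]

    def resolve(b):
        # Follow the chain of forwarding labels; 'seen' guards against cycles.
        seen = set()
        while b in forward and b not in seen:
            seen.add(b)
            b = forward[b]
        return b

    out = []
    for ins in code:
        if ins[0] == "goto":                      # goto b
            out.append(("goto", resolve(ins[1])))
        elif ins[0] == "cond":                    # cond(l1, cmp, l2, btrue)
            out.append(ins[:-1] + (resolve(ins[-1]),))
        else:
            out.append(ins)
    return out

# Hypothetical input program: two chained trampolines ending at label 3.
EXAMPLE = [
    ("cond", "r1", "eq", "r2", 1),
    ("goto", 1),
    ("label", 1),
    ("goto", 2),
    ("label", 2),
    ("goto", 3),
    ("label", 3),
    ("return", "r1"),
]
```

After `tunnel_branches(EXAMPLE)`, every jump targets label 3 directly; the now unreachable trampoline blocks are left in place for a later dead code elimination pass to remove.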