Bit-level Types for High-level Reasoning Ranjit Jhala Rupak Majumdar
The Problem
    mget (u32 p) {
      if ((p & 0x1) == 0){ error("permission"); }
      pte = (p & 0xFFFFF000) >> 12;
      b = tab[pte] & 0xFFFFFFFC;
      o = p & 0xFFC;
      return m[(b+o)>>2];
    }
• Bit-level operators in low-level systems code
• Why?
  • Interact with hardware
  • Reduce memory footprint
• Inscrutable to humans, optimizers, verifiers

What's going on?
[Figure: bit layouts of the 32-bit values in mget, built up over several slides. p: 20 | 10 | 1 | 1 bits; pte: 12 | 20; tab[pte]: 32; b: 30 | 2; o: 20 | 10 | 2.]
Q: How to infer complex information flow to understand, optimize, verify code?

Plan • Motivation • Approach
Our approach: (1) Bit-level Types
Bit-level types: sequences of {name,size} pairs
    p   : {idx,20}{addr,10}{wr,1}{rd,1}
    pte : {∅,12}{idx,20}
    b   : {addr,30}{∅,2}
    o   : {∅,20}{addr,10}{∅,2}

Our approach: (2) Translation
• Expressions → Records
• Bit-ops → Field accesses
Bit-level types:
    p   : {idx,20}{addr,10}{wr,1}{rd,1}
    pte : {∅,12}{idx,20}
    b   : {addr,30}{∅,2}
    o   : {∅,20}{addr,10}{∅,2}
Original:
    mget (p) {
      if ((p & 0x1) == 0){ error("permission"); }
      pte = (p&0xFFFFF000)>>12;
      b = tab[pte]&0xFFFFFFFC;
      o = p&0xFFC;
      return m[(b+o)>>2];
    }
Translated (built up statement by statement):
    mget (p) {
      if (p.rd == 0){ error("permission"); }
      pte.idx = p.idx;
      b.addr = tab[pte.idx].addr;
      o.addr = p.addr;
      return m[b.addr+o.addr];
    }

Our approach
Bit-level types + translation turn the masks and shifts of mget into the field accesses p.rd, p.idx, tab[pte.idx].addr, p.addr and m[b.addr+o.addr].
• Low-level operations eliminated
• Program can be understood, optimized, verified

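The claim that the field-access form computes the same index as the bit-level form can be checked on concrete values. The following is a minimal sketch, not the authors' translator: the struct `p_fields` and the helpers `unpack_p`, `index_bits` and `index_fields` are hypothetical names introduced here, and the comparison assumes that b + o does not overflow 32 bits (the talk also assumes no overflow).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record view of p, following {idx,20}{addr,10}{wr,1}{rd,1}. */
typedef struct {
    uint32_t idx;   /* bits 31..12 */
    uint32_t addr;  /* bits 11..2  */
    uint32_t wr;    /* bit 1       */
    uint32_t rd;    /* bit 0       */
} p_fields;

/* Split a 32-bit word into the four fields named by the bit-level type. */
static p_fields unpack_p(uint32_t p) {
    p_fields f;
    f.idx  = (p >> 12) & 0xFFFFFu;
    f.addr = (p >> 2) & 0x3FFu;
    f.wr   = (p >> 1) & 0x1u;
    f.rd   = p & 0x1u;
    return f;
}

/* Bit-level index computation as written in mget: (b + o) >> 2. */
static uint32_t index_bits(uint32_t p, uint32_t tab_entry) {
    uint32_t b = tab_entry & 0xFFFFFFFCu;
    uint32_t o = p & 0xFFCu;
    return (b + o) >> 2;
}

/* Field-level index computation after translation: b.addr + o.addr. */
static uint32_t index_fields(uint32_t p, uint32_t tab_entry) {
    uint32_t b_addr = tab_entry >> 2;     /* the 30-bit addr field of b */
    uint32_t o_addr = unpack_p(p).addr;   /* the 10-bit addr field of o */
    return b_addr + o_addr;
}
```

For example, with p = 0x12345ABD and a table entry 0x1003, both functions return 0x6AF. The two forms differ only when b.addr + o.addr needs more than 30 bits.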
Plan • Motivation • Approach • Bit-level types + Translation • Key: Bit-level type Inference • Experiences • Related work
Constraint-based Type Inference
Remember these? If Alice doubles her age, she would still be 10 years younger than Bob, who was born in 1952. How old are Alice and Bob?
Algorithm:
0. Variables for unknowns: Alice's age a, Bob's age b
1. Generate constraints on vars: 2a = b - 10, b = 2006 - 1952
2. Solve constraints: a = 22, b = 54

Constraint-based Type Inference
Algorithm:
0. Variables for unknowns: bit-level types of all program expressions
1. Generate constraints on vars
2. Solve constraints

Plan • Motivation • Approach • Bit-level types + Translation • Key: Bit-level type Inference • Constraint Generation • Constraint Solving • Experiences • Related work
Constraint Generation
Type variables for each expression of mget: τ_p for p, τ_{p&0x1} for p&0x1, τ_pte for pte, and so on.

Generating Zero Constraints
Mask (o = p&0xFFC): bits cleared by the mask are zero
    τ_{p&0xFFC}[31:12] = ∅    (0^20 in bits 31..12)
    τ_{p&0xFFC}[1:0] = ∅      (0^2 in bits 1..0)
Shift (pte = e>>12, where e is p&0xFFFFF000): bits shifted in are zero
    τ_{e>>12}[31:20] = ∅      (0^12 in bits 31..20)

Why are zeros special?
Consider assignment x = e (value flows from e to x). Should x and e have the same bit-level type?
• Common idiom: k-bit values are a special case of k+-bit values
• Equality results in unnecessary breaks
• Zeros enable precise subtyping (≤)
Inequality constraint: τ_x ≥ τ_e

Generating Inequality Constraints
Mask (o = p&0xFFC): τ_{p&0xFFC}[11:2] ≥ τ_p[11:2]
Shift (e is p&0xFFFFF000): τ_{e>>12}[19:0] ≥ τ_e[31:12]
Assignment (o = p&0xFFC): τ_o ≥ τ_{p&0xFFC}, that is, τ_o[31:0] ≥ τ_{p&0xFFC}[31:0]

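The mask rules above can be read as: every maximal run of 0 bits in the constant yields a zero constraint, and every maximal run of 1 bits yields an inequality constraint between the same bit range of the result and of the operand. A minimal sketch of that reading follows; the types `kind` and `constraint` and the function `mask_constraints` are hypothetical names introduced here, not the authors' implementation.

```c
#include <assert.h>
#include <stdint.h>

/* ZERO: result[hi:lo] is all zeros. FLOW: result[hi:lo] >= operand[hi:lo]. */
typedef enum { ZERO, FLOW } kind;
typedef struct { kind k; int hi; int lo; } constraint;

/* Emit one constraint per maximal run of equal bits in a 32-bit mask,
   scanning from bit 31 down to bit 0. `out` needs room for 32 entries.
   Returns the number of constraints written. */
static int mask_constraints(uint32_t mask, constraint *out) {
    int n = 0;
    int hi = 31;
    while (hi >= 0) {
        uint32_t bit = (mask >> hi) & 1u;
        int lo = hi;
        /* Extend the run downwards while the next lower bit is the same. */
        while (lo > 0 && ((mask >> (lo - 1)) & 1u) == bit) {
            lo--;
        }
        out[n].k = bit ? FLOW : ZERO;
        out[n].hi = hi;
        out[n].lo = lo;
        n++;
        hi = lo - 1;
    }
    return n;
}
```

For the mask 0xFFC this produces three constraints: ZERO on [31:12], FLOW on [11:2] and ZERO on [1:0], matching the constraints listed for p&0xFFC.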
Plan • Motivation • Approach • Bit-level types + Translation • Key: Bit-level type Inference • Constraint Generation • Constraint Solving • Experiences • Related work
Constraint Solutions
Solution is an assignment
• A: type variables → bit-level types
• A(τ)[i:j] = subsequence of A(τ) from bit i through j
Example: A(τ_p) = {idx,20}{addr,10}{wr,1}{rd,1} (idx: bits 31..12, addr: 11..2, wr: 1, rd: 0)
• A(τ_p)[11:1] = {addr,10}{wr,1}
• A(τ_p)[31:2] = {idx,20}{addr,10}
• A(τ_p)[31:5] = undefined (bit 5 lies inside addr)

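The notion of a slice being defined only when both ends fall on block boundaries can be sketched as follows. The representation (an array of `block` values listed from the most significant bit) and the function `slice` are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One {name,size} pair of a bit-level type. */
typedef struct { const char *name; int size; } block;

/* Look up bits hi..lo of a type whose blocks are listed MSB first.
   Returns 1 and stores the indices of the first and last covered block
   when hi is the top bit of some block and lo is the bottom bit of the
   same or a later block. Returns 0 when the slice is undefined. */
static int slice(const block *t, size_t n, int hi, int lo,
                 size_t *first, size_t *last) {
    int top = -1;
    for (size_t i = 0; i < n; i++) {
        top += t[i].size;               /* after the loop: index of the MSB */
    }
    int found_first = 0;
    for (size_t i = 0; i < n; i++) {
        int bottom = top - t[i].size + 1;   /* block i covers top..bottom */
        if (top == hi) {
            *first = i;
            found_first = 1;
        }
        if (found_first && bottom == lo) {
            *last = i;
            return 1;
        }
        top = bottom - 1;
    }
    return 0;
}
```

With the type {idx,20}{addr,10}{wr,1}{rd,1}, the slice [11:1] covers blocks 1 to 2 (addr, wr), [31:2] covers blocks 0 to 1 (idx, addr), and [31:5] is reported as undefined.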
Constraint Solving Overview
Solution is an assignment
• A: type variables → bit-level types
• A(τ)[i:j] = subsequence from bit i through j
A satisfies:
• zero constraint τ[i:j] = ∅ if A(τ)[i:j] = ∅^(i-j+1), i.e. i-j+1 zero bits
• inequality constraint τ[i:j] ≤ τ'[i':j'] if A(τ)[i:j] ≤ A(τ')[i':j']
• In both cases, A(τ)[i:j] must be defined

Constraint Solving Algorithm
Input: Zero constraints {z_1,…,z_m}, inequality constraints {c_1,…,c_n}
Output: Assignment satisfying all constraints
    A_0 = initial assignment satisfying zero constraints (details in paper)
    A = A_0
    for i in [1…n]: A = refine(A, c_i)
    return A
refine(A, c_i) adjusts A such that:
• c_i becomes satisfied
• earlier constraints stay satisfied
• built using Split, Unify

Refine: Split(A,τ,k)
Throughout A, substitute the block being cut by two blocks with fresh names.
Example: A(τ) = {p,32}; A' = Split(A,τ,12) gives A'(τ) = {e,20}{f,12}, where e, f are fresh.

Refine: Split(A,τ,k)
• Used to ensure A(τ)[i:j] is defined
Example: ensure A(τ)[11:2] is defined
    A(τ) = {p,32}
    A' = Split(A,τ,11+1):  A'(τ) = {e,20}{f,12}
    A'' = Split(A',τ,2):   A''(τ) = {e,20}{g,10}{h,2}
    A''(τ)[11:2] is defined

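A simplified sketch of Split on a single type is shown here. It is not the operation from the talk in full: the real Split substitutes throughout the whole assignment A, while this version cuts one type only. The names `btype`, `split` and `next_fresh`, the use of integer block names, and the choice that a zero block (name 0) splits into two zero blocks are all assumptions of this sketch.

```c
#include <assert.h>

#define MAX_BLOCKS 32

typedef struct { int name; int size; } block;            /* name 0: zero block */
typedef struct { block b[MAX_BLOCKS]; int n; } btype;   /* blocks MSB first   */

static int next_fresh = 100;   /* hypothetical supply of fresh block names */

/* Ensure the low k bits of t end on a block boundary.
   Returns 1 if a block was cut, 0 if the boundary already existed
   (or k is outside the type), -1 if there is no room for a new block. */
static int split(btype *t, int k) {
    int below = 0;                          /* bits to the right of block i */
    for (int i = t->n - 1; i >= 0; i--) {
        int size = t->b[i].size;
        if (below == k) {
            return 0;                       /* boundary already present */
        }
        if (below + size > k) {             /* bit k lies inside block i */
            if (t->n == MAX_BLOCKS) {
                return -1;
            }
            for (int j = t->n; j > i + 1; j--) {
                t->b[j] = t->b[j - 1];      /* shift right to make room */
            }
            int low = k - below;
            int is_zero = (t->b[i].name == 0);
            t->b[i].name = is_zero ? 0 : next_fresh++;      /* high part */
            t->b[i].size = size - low;
            t->b[i + 1].name = is_zero ? 0 : next_fresh++;  /* low part  */
            t->b[i + 1].size = low;
            t->n++;
            return 1;
        }
        below += size;
    }
    return 0;
}
```

Starting from a single 32-bit block, `split(&t, 12)` followed by `split(&t, 2)` leaves blocks of sizes 20, 10 and 2, mirroring the {e,20}{g,10}{h,2} example.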
Refine: Unify(A,p,q)
Throughout A, substitute name q for name p: every block {p,k} becomes {q,k}.
Refine(A, τ[31:12] ≤ τ'[19:0])
    A(τ)  = {∅,10}{q,10}{r,12}
    A(τ') = {p,32}                 A(τ')[19:0] undefined
A' = Split(A,τ',19+1)
    A'(τ)  = {∅,10}{q,10}{r,12}
    A'(τ') = {s,12}{t,20}          A'(τ)[31:12] and A'(τ')[19:0] are now both defined
A'' = Unify(A',q,t)
    A''(τ)  = {∅,10}{t,10}{r,12}
    A''(τ') = {s,12}{t,20}
A'' satisfies the constraint

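The Unify step of this example can be sketched as a renaming over every type in the assignment, followed by a check of one subtyping pattern: zero padding followed by a narrower block with the same name as a single wider block. Both `unify` and `zero_extended_le` are hypothetical helpers, and the check covers only this one pattern rather than the full subtyping relation from the paper.

```c
#include <assert.h>

#define MAX_BLOCKS 32
enum { ZERO = 0 };                                       /* name of zero blocks */

typedef struct { int name; int size; } block;
typedef struct { block b[MAX_BLOCKS]; int n; } btype;   /* blocks MSB first */

/* Unify(A,p,q): rename every block named p to q in all types of A. */
static void unify(btype *types, int ntypes, int p, int q) {
    for (int i = 0; i < ntypes; i++) {
        for (int j = 0; j < types[i].n; j++) {
            if (types[i].b[j].name == p) {
                types[i].b[j].name = q;
            }
        }
    }
}

/* Does sub[0..n) fit the single block sup? True when the widths agree and
   sub is some zero blocks followed by exactly one block named like sup. */
static int zero_extended_le(const block *sub, int n, block sup) {
    int width = 0;
    for (int i = 0; i < n; i++) {
        width += sub[i].size;
    }
    if (width != sup.size) {
        return 0;
    }
    int i = 0;
    while (i < n && sub[i].name == ZERO) {
        i++;                                /* skip the leading zero blocks */
    }
    return i == n - 1 && sub[i].name == sup.name;
}
```

Encoding q, r, s, t as 1, 2, 3, 4, the slice {∅,10}{q,10} does not fit {t,20} before `unify(types, 2, 1, 4)` and does fit afterwards, which is the sense in which A'' satisfies the constraint.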
Constraint Solving
Input: Constraints
Output: Assignment satisfying all constraints
    A = A_0
    for i in [1…n]: A = refine(A, c_i)
    return A
Substitution (in Split, Unify)
• ensures earlier constraints stay satisfied
• most general solution found
• Efficiently implemented using graphs

Plan • Motivation • Approach • Bit-level types + Translation • Key: Bit-level type Inference • Constraint Generation • Constraint Solving • Experiences • Related work
Experiences Implemented bit-level type inference for C • pmap: a kernel virtual memory system • Implements the code for our running example • mondrian: a memory protection system • scull: a Linux device driver (1-3 Kloc) • Inference/Translation takes less than 1s
Mondrian [Witchel et al.] • Bit packing for memory and permission bits • 2600 lines of code, generated 775 constraints • Translated to program without bit-operations • 18 different bit-packed structures • 10 assertions provided by programmer • After translation, assertions verified using BLAST • 6 safe: all require bit-level reasoning • Previously, verification was not possible • 4 false positives: imprecise modeling of arrays
Cop outs (i.e. Future Work)
• Truly binary bit-vector operations
  • x << y, x & y
  • Currently: value-flow analysis to infer constants flowing to y, then break into a switch statement
• Flow-sensitivity
  • Currently: SSA renaming
• Arithmetic overflow
  • Does a k-bit value "spill over"?
  • Currently: assume no overflow
• Path-sensitivity (value-dependent types)
  • Type of suffix depends on value of first field
  • e.g. instruction decoder for architecture simulator
  • Number/type of operands depends on opcode

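The workaround for variable shift amounts can be illustrated as follows. This is a sketch of the idea only: the set {2, 12} stands in for whatever constants a value-flow analysis might report for y, and the function names `shift_var` and `shift_switch` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Original form: the shift amount is a variable, so no fixed layout applies. */
static uint32_t shift_var(uint32_t x, unsigned y) {
    return x << y;
}

/* Rewritten form, assuming the analysis found that only 2 and 12 reach y.
   Each case now shifts by a constant and can be typed like any other
   constant shift. */
static uint32_t shift_switch(uint32_t x, unsigned y) {
    switch (y) {
    case 2:
        return x << 2;
    case 12:
        return x << 12;
    default:
        assert(!"shift amount not predicted by the analysis");
        return 0;
    }
}
```

For the predicted amounts the two functions agree, for example both map x = 3, y = 12 to 12288.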
Plan • Motivation • Approach • Bit-level types + Translation • Key: Bit-level type Inference • Constraint Generation • Constraint Solving • Experiences • Related work
Related Work • O'Callahan and Jackson [ICSE 97] • Type Inference • Gupta et al. [POPL 03, CC 02] • Dataflow analyses for packing bit-sections • Ramalingam et al. [POPL 99] • Aggregate structure inference for COBOL
Conclusions • (Automatic) reasoning about bit-operations is hard • Structure: bit-operations pack data into one word • Structure inferred via bit-level type inference • Structure exploited via translation to fields • Precise, efficient reasoning about bit-operations