Modeling Data in Formal Verification Bits, Bit Vectors, or Words

# Modeling Data in Formal Verification Bits, Bit Vectors, or Words

Télécharger la présentation

## Modeling Data in Formal Verification Bits, Bit Vectors, or Words

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
##### Presentation Transcript

1. Modeling Data in Formal VerificationBits, Bit Vectors, or Words Randal E. Bryant Carnegie Mellon University http://www.cs.cmu.edu/~bryant

2. Overview • Issue • How should data be modeled in logical analysis? • Verification, synthesis, test generation, security analysis, … • Approaches • Bits: Every bit is represented individually • Words: View each word as arbitrary value • E.g., unbounded integers • Historic program verification work • Bit Vectors: Finite precision words • Captures true semantics of hardware and software

3. Data Path Com. Log. 1 Com.Log. 2 Bit-Level Modeling Control Logic • Represent Every Bit of State Individually • Behavior expressed as Boolean next-state over current state • Historic method for most CAD, testing, and verification tools • E.g., model checkers, test generators

4. Bit-Level Modeling in Practice • Strengths • Allows precise modeling of system • Well developed technology • BDDs & SAT for Boolean reasoning • Limitations • Every state bit introduces two Boolean variables • Current state & next state • Overly detailed modeling of system functions • Don’t want to capture full details of FPU • Making It Work • Use extensive abstraction to reduce bit count • Hard to abstract functionality

5. x Word-Level Abstraction #1: Bits → Integers x0 • View Data as Symbolic Words • Arbitrary integers • No assumptions about size or encoding • Classic model for reasoning about software • Can store in memories & registers x1 x2 xn-1

6. Data Path Data Path Com. Log. 1 Com. Log. 1 ? Com.Log. 2 Com. Log. 1 ? What do we do about logic functions? Abstracting Data Bits Control Logic

7. ALU Word-Level Abstraction #2: Uninterpreted Functions • For any Block that Transforms or Evaluates Data: • Replace with generic, unspecified function • Only assumed property is functional consistency: a = x b = y f(a, b) = f(x, y) f

8. F1 F2 Abstracting Functions Control Logic • For Any Block that Transforms Data: • Replace by uninterpreted function • Ignore detailed functionality • Conservative approximation of actual system Data Path Com. Log. 1 Com. Log. 1

9. Word-Level Modeling: History • Historic • Used by theorem provers • More Recently • Burch & Dill, CAV ’94 • Verify that pipelined processor has same behavior as unpipelined reference model • Use word-level abstractions of data paths and memories • Use decision procedure to determine equivalence • Bryant, Lahiri, Seshia, CAV ’02 • UCLID verifier • Tool for describing & verifying systems at word level

10. Pipeline Verification Example Pipelined Processor Reference Model

11. Abstracted Pipeline Verification Pipelined Processor Reference Model

12. Experience with Word-Level Modeling • Powerful Abstraction Tool • Allows focus on control of large-scale system • Can model systems with very large memories • Hard to Generate Abstract Model • Hand-generated: how to validate? • Automatic abstraction: limited success • Andraus & Sakallah, DAC 2004 • Realistic Features Break Abstraction • E.g., Set ALU function to A+0 to pass operand to output • Desire • Should be able to mix detailed bit-level representation with abstracted word-level representation

13. Bit Vectors: Motivating Example #1 • Do these functions produce identical results? • Strategy • Represent and reason about bit-level program behavior • Specific to machine word size, integer representations, and operations int abs(int x) { int mask = x>>31; return (x ^ mask) + ~mask + 1; } int test_abs(int x) { return (x < 0) ? -x : x; }

14. Motivating Example #3 bit[W] popSpec(bit[W] x) { int cnt = 0; for (int i=0; i<W; i++) { if (x[i]) cnt++; } return cnt; } bit[W] popSketch(bit[W] x) { loop (??) { x = (x&??) + ((x>>??)&??); } return x; } • Is there a way to expand the program sketch to make it match the spec? • Answer • W=16: • [Solar-Lezama, et al., ASPLOS ‘06] x = (x&0x5555) + ((x>>1)&0x5555); x = (x&0x3333) + ((x>>2)&0x3333); x = (x&0x0077) + ((x>>8)&0x0077); x = (x&0x000f) + ((x>>4)&0x000f);

15. Motivating Example #4 Sequential Reference Model Pipelined Microprocessor • Is pipelined microprocessor identical to sequential reference model? • Strategy • Represent machine instructions, data, and state as bit vectors • Compatible with hardware description language representation • Verifier finds abstractions automatically

16. Bit Vector Formulas • Fixed width data words • Arithmetic operations • Add/subtract/multiply/divide, … • Two’s complement, unsigned, … • Bit-wise logical operations • Bitwise and/or/xor, shift/extract, concatenate • Predicates • ==, <= • Task • Is formula satisfiable? • E.g., a > 0 && a*a < 0 50000 * 50000 = -1794967296 (on 32-bit machine)

17. Decision Procedure Satisfying solution Formula Unsatisfiable (+ proof) Decision Procedures • Core technology for formal reasoning • Boolean SAT • Pure Boolean formula • SAT Modulo Theories (SMT) • Support additional logic fragments • Example theories • Linear arithmetic over reals or integers • Functions with equality • Bit vectors • Combinations of theories

18. Recent Progress in SAT Solving

19. BV Decision Procedures:Some History • B.C. (Before Chaff) • String operations (concatenate, field extraction) • Linear arithmetic with bounds checking • Modular arithmetic • Limitations • Cannot handle full range of bit-vector operations

20. BV Decision Procedures:Using SAT • SAT-Based “Bit Blasting” • Generate Boolean circuit based on bit-level behavior of operations • Convert to Conjunctive Normal Form (CNF) and check with best available SAT checker • Handles arbitrary operations • Effective in Many Applications • CBMC [Clarke, Kroening, Lerda, TACAS ’04] • Microsoft Cogent + SLAM [Cook, Kroening, Sharygina, CAV ’05] • CVC-Lite [Dill, Barrett, Ganesh], Yices [deMoura, et al]

21. Bit-Vector Challenge • Is there a better way than bit blasting? • Requirements • Provide same functionality as with bit blasting • Find abstractions based on word-level structure • Improve on performance of bit blasting • Observation • Must have bit blasting at core • Only approach that covers full functionality • Want to exploit special cases • Formula satisfied by small values • Simple algebraic properties imply unsatisfiability • Small unsatisfiable core • Solvable by modular arithmetic • …

22. Some Recent Ideas • Iterative Approximation • UCLID: Bryant, Kroening, Ouaknine, Seshia, Strichman, Brady, TACAS ’07 • Use bit blasting as core technique • Apply to simplified versions of formula • Successive approximations until solve or show unsatisfiable • Using Modular Arithmetic • STP: Ganesh & Dill, CAV ’07 • Algebraic techniques to solve special case forms • Layered Approach • MathSat: Bruttomesso, Cimatti, Franzen, Griggio, Hanna, Nadel, Palti, Sebastiani, CAV ’07 • Use successively more detailed solvers

23. + Overapproximation More solutions: If unsatisfiable, then so is    + Fewer solutions: Satisfying solution also satisfies  −   Underapproximation − Iterative Approach Background: Approximating Formula • Example Approximation Techniques • Underapproximating • Restrict word-level variables to smaller ranges of values • Overapproximating • Replace subformula with Boolean variable  Original Formula

24. Starting Iterations • Initial Underapproximation • (Greatly) restrict ranges of word-level variables • Intuition: Satisfiable formula often has small-domain solution  1−

25. 1+ UNSAT proof: generate overapproximation If SAT, then done First Half of Iteration • SAT Result for 1− • Satisfiable • Then have found solution for  • Unsatisfiable • Use UNSAT proof to generate overapproximation 1+ • (Described later)  1−

26. If UNSAT, then done SAT: Use solution to generate refined underapproximation 2− Second Half of Iteration • SAT Result for 1+ • Unsatisfiable • Then have shown  unsatisfiable • Satisfiable • Solution indicates variable ranges that must be expanded • Generate refined underapproximation 1+  1−

27. Iterative Behavior 2+ 1+ • Underapproximations • Successively more precise abstractions of  • Allow wider variable ranges • Overapproximations • No predictable relation • UNSAT proof not unique    k+  k−    2− 1−

28. 2+ 1+    UNSAT k+  k−    2− 1− SAT Overall Effect • Soundness • Only terminate with solution on underapproximation • Only terminate as UNSAT on overapproximation • Completeness • Successive underapproximations approach  • Finite variable ranges guarantee termination • In worst case, get k−  

29. 1+ UNSAT proof: generate overapproximation  1− 2− Generating Overapproximation • Given • Underapproximation 1− • Bit-blasted translation of 1− into Boolean formula • Proof that Boolean formula unsatisfiable • Generate • Overapproximation 1+ • If 1+ satisfiable, must lead to refined underapproximation • Generate 2− such that 1−  2−  

30. x = y x + 2 z 1 a w & 0xFFFF = x x % 26 = v Bit-Vector Formula Structure • DAG representation to allow shared subformulas 

31. x = y x + 2 z 1 w Range Constraints x y a z w & 0xFFFF = x x % 26 = v Æ Structure of Underapproximation • Linear complexity translation to CNF • Each word-level variable encoded as set of Boolean variables • Additional Boolean variables represent subformula values −

32. w Range Constraints x y z Ç Æ : Æ Ç UNSAT Proof • Subset of clauses that is unsatisfiable • Clause variables define portion of DAG • Subgraph that cannot be satisfied with given range constraints x = y x + 2 z 1 a Æ w & 0xFFFF = x Ç x % 26 = v

33. w Range Constraints x y z Ç Æ : Æ Ç Extracting Circuit from UNSAT Proof • Subgraph that cannot be satisfied with given range constraints • Even when replace rest of graph with unconstrained variables x = y x + 2 z 1 UNSAT a Æ b1 b2

34. Ç Æ : Ç Generated Overapproximation • Remove range constraints on word-level variables • Creates overapproximation • Ignores correlations between values of subformulas x = y x + 2 z 1 a 1+ Æ b1 b2

35. w Range Constraints x y z Ç Æ : Æ Ç Refinement Property • Claim • 1+ has no solutions that satisfy 1−’s range constraints • Because 1+ contains portion of 1− that was shown to be unsatisfiable under range constraints x = y x + 2 z 1 UNSAT a Æ 1+ b1 b2

36. Ç Æ : Ç Refinement Property (Cont.) • Consequence • Solving 1+ will expand range of some variables • Leading to more exact underapproximation 2− x = y x + 2 z 1 a 1+ Æ b1 b2

37. SAT: Use solution to generate refined underapproximation 2− Effect of Iteration • Each Complete Iteration • Expands ranges of some word-level variables • Creates refined underapproximation 1+ UNSAT proof: generate overapproximation  1−

38. Approximation Methods • So Far • Range constraints • Underapproximate by constraining values of word-level variables • Subformula elimination • Overapproximate by assuming subformula value arbitrary • General Requirements • Systematic under- and over-approximations • Way to connect from one to another • Goal: Devise Additional Approximation Strategies

39. x * y Function Approximation Example • Motivation • Multiplication (and division) are difficult cases for SAT • §: Prohibit Via Additional Range Constraints • Gives underapproximation • Restricts values of (possibly intermediate) terms • §: Abstract as f(x,y) • Overapproximate as uninterpreted function f • Value constrained only by functional consistency

40. Challenges with Iterative Approximation • Formulating Overall Strategy • Which abstractions to apply, when and where • How quickly to relax constraints in iterations • Which variables to expand and by how much? • Too conservative: Each call to SAT solver incurs cost • Too lenient: Devolves to complete bit blasting. • Predicting SAT Solver Performance • Hard to predict time required by call to SAT solver • Will particular abstraction simplify or complicate SAT? • Combination Especially Difficult • Multiple iterations with unpredictable inner loop

41. STP: Linear Equation Solving • Ganesh & Dill, CAV ’07 • Solve linear equations over integers mod 2w • Capture range of solutions with Boolean Variables • Example Problem • Variables: 3-bit unsigned integers x: [x2 x1 x0] y: [y2 y1 y0] z: [z2 z1 z0] • Linear equations: conjunction of linear constraints • General Form • A x + b = 0 mod 2w

42. Summary: Modeling Levels • Bits • Limited ability to scale • Hard to apply functional abstractions • Words • Allows abstracting data while precisely representing control • Overlooks finite word-size effects • Bit Vectors • Realistic semantic model for hardware & software • Captures all details of actual operation • Detects errors related to overflow and other artifacts of finite representation • Can apply abstractions found at word-level

43. Areas of Agreement • SAT-Based Framework Is Only Logical Choice • SAT solvers are good & getting better • Want to Automatically Exploit Abstractions • Function structure • Arithmetic properties • E.g., associativity, commutativity • Arithmetic reductions • E.g., LU decomposition • Base Level Should Be SAT • Only semantically complete approach

44. Choices • Optimize for Special Formula Classes • E.g., STP optimized for conjunctions of constraints • Common in software verification & testing • Iterative Abstraction • Natural framework for attempting different abstractions • Having SAT solver in inner loop makes performance tuning difficult • Others?

45. Observations • Bit-Vector Modeling Gaining in Popularity • Recognition of importance • Benchmarks and competitions • Just Now Improving on Bit Blasting + SAT • Lots More Work to be Done