This project aims to modernize formal verification engines by developing methodologies, algorithms, and software implementations to improve SAT solving, hybrid simulation, counter-example handling, and invariant generation.
Modernizing Formal Verification Engines • Robert Brayton, Niklas Een, Alan Mishchenko • Berkeley Verification and Synthesis Research Center, Department of EECS, UC Berkeley
Task Overview • SRC task ID: 2265.001 • Start date: April 1, 2012 • Thrust area: Verification • Task leaders: • Robert Brayton, Niklas Een, Alan Mishchenko (Univ. of California/Berkeley) • Industrial liaisons: • See next slide • Students: • Jiang Long, Sayak Ray, Baruch Sterin
Industrial Liaisons • Freescale • Himyanshu Anand • IBM • Jason Baumgartner • Intel • Timothy Kam, Ranan Fraer, Alexander Nadel, Murali Talupur • Mentor Graphics • Jeremy Levitt, Christian Stangier (Source: http://www.src.org/library/research-catalog/2265.001/)
Anticipated Results • Methodology and algorithms for next-generation improvements in formal verification, addressing • SAT solving • hybrid simulation • counter-example handling • invariant generation • Public software implementation of the above methodology and algorithms. • Experimental evaluation on industrial benchmarks.
Task Description • We propose to leverage the unique expertise of our group, the Berkeley Verification and Synthesis Research Center (BVSRC), and our previous SRC contracts in solving hard industrial problems arising in formal verification. The goal is to advance the state of the art in formal verification engines by adding the following to the design flow: • Application-specific SAT solvers to improve the performance of key verification engines. Two new design directions will be explored: solvers geared to numerous, related, and relatively easy problems on the one hand, and to monolithic, large, and hard problems on the other. • Hybrid simulation based on new heuristics to improve state-space coverage. New ideas will be explored for improving bit-level simulation and combining it with symbolic simulation, by adding symbolic variables or exhaustively simulating selected input subspaces. • Counter-example minimization to shorten the counter-examples produced by some algorithms, such as random and hybrid simulation. A counter-example minimizer will be developed based on hybrid simulation and bounded model checking. The use of concurrency to speed up the minimization process will also be explored. • Various methods for automated inductive invariant generation. Several ways of generating inductive invariants will be explored: one is based on using high-level information about the design; another extends a previous method based on structural analysis of the AIG. • The new methods developed in this project will be tested on industrial designs in our synthesis and verification tool, ABC, and made available in source code that can be customized to specific applications.
Task Deliverables 2013 • Annual review presentation (27-Mar-2013) • Report on a software release of a circuit-based SAT solver. Evaluation on industrial problems (30-Jun-2013) 2014 • Annual review presentation (30-Apr-2014) • Report on a software release of a counter-example minimizer. Evaluation on industrial problems (30-Jun-2014) 2015 • Report on a software release of a hybrid simulator and invariant generator. Evaluation on industrial problems (30-Apr-2015) • Final report summarizing research accomplishments and future direction (30-Jun-2015)
Current State of the Project • Covered in this presentation • Semi-canonical form for sequential circuits (DATE’13) (http://www.eecs.berkeley.edu/~alanmi/publications/2013/date13_iso.pdf) • Automated gate-level abstraction (DATE’13) (http://www.eecs.berkeley.edu/~alanmi/publications/2013/date13_gla.pdf) • Counter-example analysis (IWLS’13) (http://www.eecs.berkeley.edu/~alanmi/publications/2013/iwls13_cex.pdf) • Advances in simulation (IWLS’12) (http://www.eecs.berkeley.edu/~alanmi/publications/2012/iwls12_sec.pdf) • Other developments / work in progress • Advances in application-specific SAT solving • Towards improved invariant generation • Solving multiple-output properties
Motivation for Canonicizing Circuits • Logic circuits often contain duplicate sub-circuits expressed in terms of different primary inputs • This leads to redundant work • Synthesis tools repeatedly analyze the same sub-circuits • Verification tools repeatedly solve the same instances
Proposed Solution • Key idea: identify duplicate sub-circuits • Ideal solution: exact graph isomorphism • May be expensive and hard to implement • Our solution is heuristic • Find a semi-canonical circuit structure • Computation is similar to simulation • Efficient and straightforward to implement • Uses only structural information • No need for signal names and user hints
Example • Consider unique attributes of each node (illustration omitted)
Circuit Data-Structure • In this work, sequential circuits are represented as And-Inverter Graphs (AIGs) • An AIG is a Boolean network whose logic nodes are two-input AND-nodes and inverters • Inverters are represented as complemented edge attributes • The AIG is a uniform, low-memory data-structure • It allows for efficient implementation of a variety of algorithms working on sequential circuits
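A minimal sketch of this representation follows (illustrative only, not ABC's internal data structure). Each AND node stores two fanin literals, and a literal encodes a node index together with a complement bit, in the spirit of the AIGER convention.

class Aig:
    def __init__(self):
        self.nodes = [None]                  # node 0 reserved for the constant
        self.num_inputs = 0

    def add_input(self):
        self.nodes.append(None)              # primary inputs have no fanins
        self.num_inputs += 1
        return (len(self.nodes) - 1) << 1    # positive literal of the new node

    def add_and(self, lit0, lit1):
        self.nodes.append((lit0, lit1))      # two-input AND node
        return (len(self.nodes) - 1) << 1

    @staticmethod
    def negate(lit):
        return lit ^ 1                       # inverter = complement bit on an edge

# Usage: f = a AND (NOT b)
aig = Aig()
a = aig.add_input()
b = aig.add_input()
f = aig.add_and(a, Aig.negate(b))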
Algorithm Overview • The structural signature of a node is an integer computed from the node's position in the circuit • Initially, the signatures of all nodes are set to 0 • The circuit is repeatedly traversed and the signatures are updated • Goal: assign unique signatures to as many nodes as possible • Motivation for computing unique structural signatures • If a node has a unique signature, it has been uniquely identified by its position in the circuit structure • A one-to-one mapping between the nodes of two circuits can be found using the unique signatures of their nodes • If such a mapping exists, the circuits are structurally isomorphic
Algorithm Overview • Signature propagation is similar to circuit simulation • During circuit simulation, values of the nodes are computed in direct topological order • During signature propagation, signatures of the nodes are computed in direct (or reverse) topological order • An edge value reflects the structure around an edge • It depends on the position (logic level) of the driving node • It depends on whether the edge is complemented or not • Each time a node is traversed, the edge values of its fanins (or fanouts) are added to the signature of the node
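The following sketch illustrates the idea only; the exact edge-value formula is described in the DATE'13 paper, and the hash-based mixing used here is merely a stand-in.

def compute_signatures(nodes, levels, num_rounds=4):
    """nodes: dict node -> list of (fanin, complemented) edges, in topological order.
       levels: dict node -> logic level."""
    sig = {n: 0 for n in nodes}
    for _ in range(num_rounds):
        new_sig = {}
        for n, fanins in nodes.items():
            s = 0
            for fanin, compl in fanins:
                # edge value mixes the fanin's current signature, its level, and
                # the complement flag (a stand-in for the paper's exact formula)
                s += hash((sig[fanin], levels[fanin], compl)) & 0xFFFFFFFF
            new_sig[n] = s & 0xFFFFFFFF
        sig = new_sig
    return sig

# Sorting nodes by (signature, level) gives a semi-canonical order; writing the
# AIG in this order yields the semi-canonical file compared by "write_aiger -u".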
Example: signatures computed for the nodes of a small circuit by iterative propagation (illustration omitted)
Implementation • Computation of unique signatures is implemented in ABC • The unique signatures (integer numbers) are used to put the nodes of a circuit in a semi-canonical order • When the nodes are written into a file in this order, the resulting file is a semi-canonical form of the circuit • If the files for two circuits are identical, the circuits are isomorphic • Application 1: ABC command “write_aiger -u” • Writes the netlist in a semi-canonical form • Application 2: ABC command “&iso” • Discards isomorphic POs
Outputting Semi-Canonical Form • Application 1: ABC command “write_aiger -u” • Writes the netlist in semi-canonical form • Useful for quickly comparing netlists • Workflow: Netlist N1 → write_aiger -u → N1.aig; Netlist N2 → write_aiger -u → N2.aig; compare N1.aig and N2.aig with diff
Removing Isomorphic POs • Application 2: ABC command “&iso” • Derive a semi-canonical form for each PO • Discard POs that have duplicate semi-canonical forms • i.e. “drop isomorphic proof obligations” • Counterexamples/invariants on F/G can be re-mapped to F’/G’ (illustration omitted)
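A sketch of the PO-dropping step, assuming a hypothetical canonical_form() helper that returns the semi-canonical form of a PO's cone (e.g. the bytes produced by writing it with "write_aiger -u"):

def drop_isomorphic_pos(outputs, canonical_form):
    """Keep one representative PO per semi-canonical form; remap the rest to it."""
    kept, remap, seen = [], {}, {}
    for po in outputs:
        key = canonical_form(po)       # hypothetical: semi-canonical form of the PO's cone
        if key in seen:
            remap[po] = seen[key]      # results proved on the representative transfer here
        else:
            seen[key] = po
            kept.append(po)
    return kept, remap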
Deriving a Sequential Miter for a Verification Problem • Problem formulation — given are: • Hardware design • Property to be checked • The sequential miter combines the hardware design with a property monitor into combinational logic gates and flip-flops with an initial state, producing a single property output
Motivation for Abstraction • Verification problems can be large (1M-10M gates) • Often a proof can be completed without looking at the whole instance (~1% of the logic is often enough) • Localization abstraction decides which part of the instance to look at • Our work is motivated by the need to increase the scalability of abstraction beyond what is currently available • (Illustration: the sequential miter split into gates included in the abstraction and gates excluded from it)
Classification of Abstraction Methods • Automatic vs. manual • SAT-based vs. BDD-based vs. other • Proof-based vs. CEX-based vs. hybrid • Flop-level vs. gate-level • The proposed approach is: • Automatic (derived automatically by the tool) • SAT-based (built on top of an efficient BMC engine) • Hybrid (uses both counter-examples and proofs) • Gate-level (uses individual gates as building blocks)
What is BMC? • BMC stands for Bounded Model Checking • BMC checks the property in the initial state and in the following clock cycles (time frames) • In practice, BMC incrementally unfolds the sequential circuit and runs a SAT solver on each time frame of the unrolled combinational circuit • If a bug is detected, BMC stops • Otherwise this continues while resource limits allow • (Illustration: a sequential circuit with primary inputs, flop outputs, combinational logic, flop inputs, and a primary output, unfolded into Frame 0 through Frame 3)
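A toy illustration of the bounded check follows, with the transition relation and property given as Python functions. For clarity it enumerates input sequences exhaustively instead of unrolling into CNF and calling a SAT solver, which is what a real BMC engine (including ABC's) does; the 2-bit counter example is hypothetical.

from itertools import product

def bmc(init_state, next_state, bad, num_inputs, max_frames):
    """Return (failing frame, input sequence) for the first property failure, or None."""
    for k in range(max_frames + 1):
        for seq in product(product([0, 1], repeat=num_inputs), repeat=k + 1):
            state = init_state
            for frame, inputs in enumerate(seq):
                if bad(state, inputs):            # property fails in this time frame
                    return frame, seq[:frame + 1]
                state = next_state(state, inputs)
    return None

# Hypothetical example: a 2-bit counter that must never reach the value 3
init = (0, 0)                                            # state = (high bit, low bit)
step = lambda s, i: (s[0] ^ (s[1] & i[0]), s[1] ^ i[0])  # increments when i[0] = 1
bad  = lambda s, i: s == (1, 1)
print(bmc(init, step, bad, num_inputs=1, max_frames=4))  # counter-example at frame 3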
Why Does BMC Work Well? • The BMC engine adds the complete “tent” (bounded cone-of-influence) in each frame • This quickly leads to large SAT instances • However, BMC has been successfully applied to designs with millions of nodes for hundreds/thousands of time frames • The reason is: • In efficient implementations of BMC, constants are propagated and structural hashing is performed for the logic across the time-frame boundaries
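A sketch of these two simplifications, reusing the AIGER-style literal encoding from the earlier AIG sketch; this is only an illustration, not ABC's implementation.

CONST0, CONST1 = 0, 1            # literals of the constant node (node 0)

class Strash:
    def __init__(self):
        self.table = {}          # (lit0, lit1) -> existing AND literal
        self.next_node = 1

    def new_input(self):
        lit = self.next_node << 1
        self.next_node += 1
        return lit

    def new_and(self, a, b):
        if a > b:
            a, b = b, a          # normalize fanin order
        # constant propagation
        if a == CONST0 or a == (b ^ 1):
            return CONST0        # 0 & x = 0,  x & !x = 0
        if a == CONST1:
            return b             # 1 & x = x
        if a == b:
            return a             # x & x = x
        # structural hashing: reuse an identical AND if one already exists,
        # including across time-frame boundaries during BMC unrolling
        if (a, b) in self.table:
            return self.table[(a, b)]
        lit = self.next_node << 1
        self.next_node += 1
        self.table[(a, b)] = lit
        return lit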
How Is Abstraction Implemented? • Hybrid abstraction (Een et al., FMCAD’10) combines counter-example-based abstraction and proof-based abstraction in one engine (using one SAT solver) • The hybrid abstraction engine is an extension of the BMC engine • Counter-examples are used to grow the abstraction • Proofs are used to prune irrelevant logic • (Illustration: unfolding of the abstracted model over Frames 0-3)
Why Is Traditional Abstraction Less Scalable Than BMC? • The key to BMC's scalability is constant propagation and structural hashing • However, in the abstraction engine these are not allowed, because the complete resolution proof of each UNSAT call is needed to perform proof-based abstraction • Our contributions: (1) Bypass the need for the complete proof, resulting in increased scalability (2) Compute incremental UNSAT cores, resulting in drastic memory savings
How Is This Achieved? • (1) Bypass the need for the complete proof, resulting in increased scalability • Simplify the old gates added in previous time frames • Perform proof-logging only in terms of the new gates added during refinement in the current time frame • (2) Compute incremental UNSAT cores, resulting in drastic memory savings • Use bit-strings to represent the simplified proof recorded for each learned clause • Perform reduced proof-logging using bit-wise operations
Why Is Memory Saved? • Assume full proof logging is performed: • 25 literals in each learned clause → 100 bytes for the clause • 100 antecedents in each learned clause → 400 bytes for the proof • 1M learned clauses with antecedents → 500 MB for the complete proof • Assume simplified proof-logging is performed: • 200 different proof IDs used → 25 bytes per clause • 100K learned clauses in the incremental proof → 2.5 MB for the incremental proof • Memory reduction: 200x
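A sketch of the bit-string proof logging described above, under the assumption that each gate added during the current refinement gets one "proof ID" bit, and a learned clause's bitset is the OR of its antecedents' bitsets; Python integers serve as arbitrary-width bit-strings here.

def assign_proof_ids(new_gates):
    """Each gate added during the current refinement gets one bit position."""
    return {g: 1 << i for i, g in enumerate(new_gates)}

def learn_clause(antecedent_masks):
    """Bitset of a learned clause = bitwise OR of its antecedents' bitsets."""
    mask = 0
    for m in antecedent_masks:
        mask |= m
    return mask

def unsat_core(final_conflict_masks, proof_ids):
    """Gates whose bits appear in the final conflict form the incremental core."""
    core_mask = learn_clause(final_conflict_masks)
    return [g for g, bit in proof_ids.items() if core_mask & bit]

# Hypothetical usage: gates g1..g3 added during refinement
ids = assign_proof_ids(["g1", "g2", "g3"])
c1 = learn_clause([ids["g1"]])          # clause learned from g1 only
c2 = learn_clause([c1, ids["g3"]])      # clause learned from c1 and g3
print(unsat_core([c2], ids))            # -> ['g1', 'g3']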
Components of Abstraction Engine • BMC engine • AIG package • SAT solver • CNF computation • Technology mapper • UNSAT core computation • Proof logger • Refinement engine • Counter-example simulation • Circuit analysis • (UNSAT core computation)
A Typical Run of GLA
abc 02> &r klm.aig; &ps; &gla -vf -F 90 -R 0
klm : i/o = 155/ 1 ff = 3795 and = 20098 lev = 36
Running gate-level abstraction (GLA) with the following parameters:
FrameMax = 90 ConfMax = 0 Timeout = 0 RatioMin = 0 % RatioMax = 30 %
LrnStart = 1000 LrnDelta = 200 LrnRatio = 70 % Skip = 0 SimpleCNF = 0 Dump = 0
Frame   %   Abs  PPI   FF  LUT  Confl  Cex   Vars   Clas   Lrns       Time    Mem
  0 :   0    13   18    2   10      3   12     30     51      0   0.01 sec   0 MB
  1 :   0    15   19    3   11      4    2     67    114      0   0.01 sec   0 MB
  2 :   0    21   20    6   14      8    7    121    217      3   0.01 sec   0 MB
  3 :   0    23   19    7   15     12    2    167    314      7   0.01 sec   0 MB
  5 :   0    29   25    8   20     93   18    324    611     15   0.01 sec   0 MB
  9 :   0    35   25   10   24     36    6    600  1.20k     42   0.01 sec   0 MB
 13 :   0    42   25   12   29     87    9    938  1.98k     65   0.02 sec   0 MB
 17 :   1   134   42   40   93   1838   62  3.17k  7.97k    120   0.16 sec   2 MB
 21 :   1   135   41   40   94     54    1  1.17k  2.49k     84   0.17 sec   0 MB
 29 :   1   178   48   56  121   3396   42  3.23k  7.81k    222   0.59 sec   2 MB
 33 :   1   184   49   58  125   1267   22  2.08k  4.41k    117   0.78 sec   1 MB
 37 :   1   190   54   60  129   2421   31  2.84k  5.87k    157   1.12 sec   2 MB
 41 :   1   191   53   60  130     42    1  3.22k  6.87k    214   1.12 sec   2 MB
 45 :   2   295   61  103  191  10539   86  8.81k  20.8k    287   6.60 sec  13 MB
 49 :   3   402   89  140  261   5300   45  11.0k  25.7k    289   8.12 sec   5 MB
 53 :   3   458  100  158  299   4227   38  10.1k  24.5k    431  10.12 sec   6 MB
 57 :   4   522  121  175  346   6275   39  16.1k  38.5k  1.03k  14.63 sec   7 MB
 61 :   5   656  140  223  432  12139   53  18.6k  45.8k  2.17k  28.69 sec  15 MB
 65 :   6   749  159  250  498  10482   42  27.7k  68.9k  2.90k  44.88 sec  16 MB
 69 :   6   786  156  264  521   4245   16  15.5k  43.1k  2.98k  49.16 sec  10 MB
 73 :   7   818  155  276  541   4915    9  19.6k  54.1k  3.41k  51.21 sec  10 MB
 89 :   7   818  155  276  541  38979    -  26.3k  74.0k  11.4k  64.22 sec  12 MB
SAT solver completed 90 frames and produced a 16-stable abstraction. Time = 64.22 sec
abc 02> &ps; &gla_derive; &put; pdr
klm : i/o = 156/ 1 ff = 276 and = 1345 lev = 18
Property proved. Time = 166.65 sec
Experimental Setting • Comparing 4 abstraction engines • ABS (flop-based hybrid abstraction - N. Een et al., FMCAD 2010) • GLA without simplification and with full proof-logging (&gla -np) • GLA without simplification and with incremental proofs (&gla -n) • GLA with simplification and with incremental proofs (&gla) • Using the suite of IBM benchmarks from the 2011 Hardware Model Checking Competition • Benchmarks 6s40p1 and 6s40p2 are removed as easily SAT • Running on one core of an Intel Xeon CPU E5-2670 @ 2.60GHz • Using a 5 min timeout for each benchmark • Learned clause removal, abstraction manager restarts, and early termination are disabled • The command line is: &gla [-n] [-p] -L 0 -P 0 -R 0 -T 300
Observations • In five minutes, GLA finds abstractions that are tested 59% deeper than those found by ABS (and 10% deeper than those found by GLAn) • GLA produces abstractions that are close to ABS in terms of flops, but 36% smaller in terms of AND gates • GLA uses on average 500x less memory for UNSAT cores than GLAnp, which computes a complete proof
Future Work • Enhancing abstraction refinement by performing a more detailed structural analysis • Improving scalability of refinement for deep failures using partial counter-examples • Applying similar approach to make interpolation-based model checking more scalable
Key Idea • A counter-example (CE) is an assignment of PI values in each time frame that leads to the property failure • Given a CE, the PI values can be divided into three categories • Essential PIs, whose values are needed for the property failure • Don't-care PIs, whose values are not important • Optional PIs (all the remaining ones) • We introduce the notion of a CE-induced network • This network, composed of two-input AND-/OR-gates, implements a unate Boolean function of the PI variables that represents all subsets of the PIs implying the property failure according to the CE • Applications • Design debugging, abstraction refinement, CE depth minimization • Reference: A. Mishchenko, N. Een, and R. Brayton, "A toolbox for counter-example analysis and optimization", to appear in IWLS'13.
Construction of the CE-Induced Network • Unfold the original network for the depth indicated by the CE • Assign values to the primary inputs and internal nodes according to the CE • Replace all primary inputs of the unfolding by free variables • Replace each AND of the unfolding by AND, OR or BUF using the rules • Rehash and sweep dangling nodes • (Illustration: an unfolding annotated with CE values and the resulting CE-induced network)
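A sketch of the AND/OR/BUF replacement rule as we read it from the description above; the exact rules are given in the IWLS'13 paper, so treat this mapping as an assumption. For an AND node of the unfolding with CE value v and fanin edge values v0, v1: if v = 1 both edges must hold (AND of the fanin conditions); if v = 0 and both edges are 0, either one suffices (OR); if v = 0 and exactly one edge is 0, that edge is essential (BUF of its condition).

def ce_induced_gate(v, v0, cond0, v1, cond1):
    """cond0/cond1 are the corresponding nodes of the CE-induced network."""
    if v == 1:
        return ("AND", cond0, cond1)
    if v0 == 0 and v1 == 0:
        return ("OR", cond0, cond1)
    return ("BUF", cond0 if v0 == 0 else cond1)

# Example: an AND with CE value 0 whose first fanin edge evaluates to 0
print(ce_induced_gate(0, 0, "n7", 1, "n9"))   # -> ('BUF', 'n7')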
Experiment: CE Bit Profiling (results table omitted; column legend below) • Engine: the formal verification engine that produced the counter-example • Total bits: the total number of primary inputs in the unrolled testcase • DC/Opt/Essen: percentage of don't-care, optional, and essential bits • Min: percentage of bits in the minimized counter-example • Time: runtime of bit profiling in seconds
Experiment: Bounded Unfolding vs. CE-Induced Network (results table omitted; column legend below) • CE Depth: the time frame where the property fails according to the CE • PI/AND/Level: the number of PIs, AIG nodes, and AIG node levels • Time: runtime of unfolding vs. constructing the CE-induced network, in seconds
Key Idea • Rarity simulation is guided random simulation that prioritizes rarely visited reachable states • Gracefully handles resets by skipping frequently visited states • Visits rare reachable states where hard-to-detect bugs may be found • Typically more efficient than naïve random simulation
Rarity Simulation: Implementation • Divide the flops into fixed-size groups in the order of their appearance in the design • Groups of 8 flops are used by default • Maintain a record of observed flop values • For each group, 256 (= 2^8) counters are used • After simulating a fixed number of frames (20 by default), recompute the frequency of each value in each flop group, and choose the next states for simulation based on the rarity of the values • By default, 1 out of 64 states is chosen • Reference: R. Brayton, N. Een, and A. Mishchenko, "Using speculation for sequential equivalence checking", Proc. IWLS'12, pp. 139-145.
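An illustrative sketch of the bookkeeping just described (the scoring function is a simple stand-in, not the tool's actual heuristic): flops are split into groups of 8, each group keeps 256 counters of observed values, and states whose group values have been seen rarely are preferred as next-state candidates.

from collections import defaultdict

GROUP_SIZE = 8
counters = defaultdict(lambda: [0] * 256)   # group index -> 256 value counters

def group_values(state_bits):
    """Split a list of flop values (0/1) into 8-bit group values."""
    return [
        int("".join(str(b) for b in state_bits[i:i + GROUP_SIZE]), 2)
        for i in range(0, len(state_bits), GROUP_SIZE)
    ]

def record_state(state_bits):
    for g, val in enumerate(group_values(state_bits)):
        counters[g][val] += 1

def rarity_score(state_bits):
    """Lower score = rarer group values; pick the lowest-scoring states
    (e.g. 1 out of 64 by default) as starting points for the next round."""
    return sum(counters[g][val] for g, val in enumerate(group_values(state_bits)))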
Rarity Simulation: Illustration • Start with the initial state • Simulate with random PI values • Accumulate information about the reached states • Decide which next states to simulate from • Repeat: accumulate information about reached states, decide which next states to simulate from, etc.
Experiments • Comparing • rarity-based random simulation • naïve random simulation • Using rarity simulation in these applications: • Finding counter-examples for hard safety properties • Solving many relatively easy SAT problems • Computing candidate equivalence classes of nodes • Quality / runtime improvements range from 0% to 30%
Other Research Directions • Advances in application-specific SAT solving • Towards improved invariant generation • Solving multiple-output properties
Advances in SAT Solving • PunySAT: An application-specific SAT solver • Geared to small, hard SAT instances • A cross between MiniSAT and Espresso • Similar to MiniSAT in everything, except that clauses are not integer arrays but bit-strings with literals in positional notation (similar to how cubes are represented in Espresso) • BCP can be more efficient for small problems • Experimental results are currently inconclusive • Tied with MiniSAT on 50 problems from the SAT competition
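A sketch of the positional clause representation described above: each clause is a bit-string with one bit position per positive literal and one per negative literal, similar to cube notation in Espresso. This is only an illustration of the encoding, not PunySAT's actual code.

def lit_bit(var, positive, num_vars):
    """Bit position of a literal: positive literals first, then negated ones."""
    return 1 << (var if positive else num_vars + var)

def make_clause(lits, num_vars):
    """lits: list of (var, positive) pairs -> clause as a single bit-string."""
    mask = 0
    for var, positive in lits:
        mask |= lit_bit(var, positive, num_vars)
    return mask

def clause_satisfied(clause, assignment_mask):
    """assignment_mask sets the bit of every literal made true by the assignment;
    a clause is satisfied iff it shares at least one true literal."""
    return (clause & assignment_mask) != 0

# Example with 3 variables: clause (x0 | !x2) under assignment x0=0, x1=1, x2=1
n = 3
clause = make_clause([(0, True), (2, False)], n)
assignment = lit_bit(0, False, n) | lit_bit(1, True, n) | lit_bit(2, True, n)
print(clause_satisfied(clause, assignment))   # -> False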
Rediscovery of High-Level Structure in a Bit-Level AIG via Support Hashing • Algorithm • Input: Sequential AIG • Output: Sequential AIG annotated with high-level information • Computation • Select a subset of inputs (or internal nodes) with high fanout • Iterate combinational support computation to derive the sequential support of every node in the AIG in terms of the selected nodes • Hash nodes by their support to find their equivalence classes • Group equivalence classes of small cardinality and/or with similar support to create well-balanced partitions • (optional) Iterate support hashing to break large partitions into smaller ones • Applications • Circuit partitioning • Invariant computation
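A sketch of the hashing step under simplifying assumptions (combinational support only; deriving sequential support would iterate this computation through the flops, as the algorithm above describes): compute each node's structural support in terms of the selected high-fanout nodes, then hash nodes by that support to form equivalence classes.

def compute_supports(nodes, selected):
    """nodes: dict node -> list of fanins, in topological order.
       selected: set of high-fanout nodes used as support variables."""
    support = {}
    for n, fanins in nodes.items():
        s = set()
        for f in fanins:
            s |= {f} if f in selected else support.get(f, set())
        support[n] = s
    return support

def hash_by_support(support):
    classes = {}
    for n, s in support.items():
        classes.setdefault(frozenset(s), []).append(n)
    return classes

# Toy example: nodes x and y have identical support and fall into the same class
nodes = {"a": [], "b": [], "x": ["a", "b"], "y": ["b", "a"], "z": ["a"]}
sup = compute_supports(nodes, selected={"a", "b"})
print(hash_by_support(sup))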
Support Hashing: Illustration • Goal: compute structural partitions • Step 1: Select a subset of nodes with high fanout • Step 2: Compute the structural support in terms of the selected nodes • Step 3: Hash nodes by their support (illustration omitted)