Shuvendu K. Lahiri Sanjit A. Seshia Randal E. Bryant Carnegie Mellon University, USA

Modeling and Verification of Out-of-Order Microprocessors in UCLID Shuvendu K. Lahiri Sanjit A. Seshia Randal E. Bryant Carnegie Mellon University, USA

Processor Verification • Views of System Operation • Instruction Set Architecture (transition: one instruction execution) • Instructions executed in sequential order • Instruction modifies “programmer-visible” state • Microarchitecture (transition: one clock cycle) • At any given time, multiple instructions “in flight” • State held in hidden pipeline registers and buffers • Verification Task • Prove all instruction sequences execute as predicted by instruction set model

Introduction and Related Work • Inorder Pipeline Verification • Burch and Dill, CAV ’94 • Relates implementation and specification by completing partially-executed instructions in the pipeline (flushing) • Infinite data words, memories • Bounded (fixed) resources only • Can’t model a reorder buffer (ROB) of arbitrary length • Out-of-Order Processor Verification • Arbitrarily large (64-128) reorder buffer, reservation stations and load-store queues • Very large number of instructions in the pipeline • No finite flushing function to drain the pipeline

Out-Of-Order Processor Verification • Theorem Proving approaches • Hosabettu et al. (’00), Sawada et al. (’98), Arons et al. (’00) • Write inductive invariants • Manually guide the theorem-provers for proving invariants • Large, complicated proof scripts (fragile) • Seldom have good counterexample facilities • Compositional Model Checking [McMillan et al.] • Use compositional model checking with temporal case splitting, path splitting, symmetry and data-type reduction • Does not need to write inductive invariants • User needs to manually decompose the proof • Has not been demonstrated effective for deep, superscalar pipelines • Other Approaches • Finite State Model Checking [Berezin et al.], Incremental Flushing [Skakkebæk et al.], Decision Procedure [Velev]

Contributions • Extends the work by Bryant & Velev • Restricted to Inorder pipelines with bounded resources • Application of UCLID • Modeling Framework for Out-Of-Order processors • Application of three verification approaches to Out-Of-Order Processor • Effective use of automated decision procedure • For proving large formulas automatically • Simple heuristics for quantifier instantiation

CLU : Logic of UCLID • Terms (T): Integer Expressions • ITE(F, T1, T2): If-then-else • Fun(T1, …, Tk): Function application • succ(T): Increment • pred(T): Decrement • Formulas (F): Boolean Expressions • ¬F, F1 ∧ F2, F1 ∨ F2: Boolean connectives • T1 = T2: Equation • T1 < T2: Inequality • P(T1, …, Tk): Predicate application • Functions (Fun): Integers → Integer • f: Uninterpreted function symbol • λ x1, …, xk . T: Function definition • Predicates (P): Integers → Boolean • p: Uninterpreted predicate symbol • λ x1, …, xk . F: Predicate definition

Decision Procedure • Operation • Series of transformations leading to propositional formula • Propositional formula checked with BDD or SAT tools • Bryant, Lahiri, Seshia [CAV02] • Pipeline: CLU Formula → Lambda Expansion → λ-free Formula → Function & Predicate Elimination → Function-free Formula → Convert to Boolean Formula → Boolean Formula → Boolean Satisfiability

Modeling Memories with λ’s • Memory M Modeled as Function • M(a): Value at location a • Initially • Arbitrary state • Modeled by uninterpreted function m0 • Writing Transforms Memory • next[M] = Write(M, wa, wd) = λ a . ITE(a = wa, wd, M(a)) • Future reads of address wa will get wd
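
The memory model above can be sketched in Python, treating a memory as a function from addresses to values. This is an illustrative sketch only: the concrete body of `m0` is an arbitrary stand-in (in UCLID, m0 is uninterpreted), and the names `Memory` and `write` are assumptions introduced here.

```python
from typing import Callable

# A memory is modeled as a function from addresses to values.
Memory = Callable[[int], int]

def m0(a: int) -> int:
    """Stand-in for the uninterpreted initial memory (contents are arbitrary)."""
    return 1000 + a

def write(M: Memory, wa: int, wd: int) -> Memory:
    """next[M] = lambda a. ITE(a = wa, wd, M(a))"""
    return lambda a: wd if a == wa else M(a)

M1 = write(m0, 5, 42)   # write 42 to address 5
M2 = write(M1, 7, 99)   # then write 99 to address 7
```

Reading `M2(5)` yields the written value 42, while any address that was never written still falls through to `m0`.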

Modeling Unbounded FIFO Buffer • Queue is Subrange of Infinite Sequence • h : INT • Head of the queue • t : INT • Tail of the queue • q : INT → INT • Function mapping indices to values • q(i) valid only when h ≤ i < t • (Figure: infinite sequence …, q(h–2), q(h–1), q(h), q(h+1), …, q(t–2), q(t–1), q(t), q(t+1), … with head at q(h) and tail at q(t))

Modeling FIFO Buffer (cont.) • Example: op = PUSH, Input = x • next[h] := case (operation = POP) : succ(h) ; default : h ; esac • next[t] := case (operation = PUSH) : succ(t) ; default : t ; esac • next[q] := lambda (i). case (operation = PUSH) & (i=t) : x ; default : q(i) ; esac • (Figure: queue before and after the PUSH; x is written at index t, the tail advances to next[t], and the head is unchanged)
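
The three next-state assignments for the FIFO can be mirrored in a small Python sketch. The names `Fifo`, `fifo_step`, and `fifo_contents` are assumptions made for illustration; as on the slide, POP is not guarded against an empty queue.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Fifo:
    h: int                      # head index
    t: int                      # tail index
    q: Callable[[int], int]     # maps indices to values; valid for h <= i < t

def fifo_step(s: Fifo, operation: str, x: int = 0) -> Fifo:
    """One transition, following next[h], next[t], next[q] from the slide."""
    next_h = s.h + 1 if operation == "POP" else s.h
    next_t = s.t + 1 if operation == "PUSH" else s.t
    old_q, old_t = s.q, s.t
    # lambda (i). case (operation = PUSH) & (i = t) : x ; default : q(i)
    next_q = lambda i: x if (operation == "PUSH" and i == old_t) else old_q(i)
    return Fifo(next_h, next_t, next_q)

def fifo_contents(s: Fifo) -> list:
    """Values in the valid subrange h <= i < t."""
    return [s.q(i) for i in range(s.h, s.t)]
```

Starting from an empty queue (h = t), two PUSH steps followed by one POP leave only the second pushed value in the valid subrange.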

Modeling Parallel Updates • Simultaneous-Update Memories • Update arbitrary subset of entries at the same step • next[M] := λ i. ITE(P(i), D(i), M(i)) • Any entry i that satisfies the predicate P(i) is updated with D(i) • Useful for modeling Reorder Buffers • Forwarding data to all dependent instructions • (Figure: memory before and after the update; entries i+1, i+2, j+1, j+3, where P is true, change from M to D, and all other entries keep M)
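
A minimal Python sketch of a simultaneous update, applied to result forwarding. The table `src_tag`, the tag value 7, and the helper names are hypothetical and serve only to illustrate the λ-expression from the slide.

```python
def simultaneous_update(M, P, D):
    """next[M] := lambda i. ITE(P(i), D(i), M(i))"""
    return lambda i: D(i) if P(i) else M(i)

# Hypothetical forwarding example: entries 1 and 3 wait on producer tag 7.
src_tag = {0: 5, 1: 7, 2: 6, 3: 7}

def old_val(i):
    return 0          # operand values before forwarding

result_tag, result = 7, 123

new_val = simultaneous_update(
    old_val,
    lambda i: src_tag.get(i) == result_tag,  # P(i): entry i waits on this result
    lambda i: result,                        # D(i): the forwarded data
)
```

Every waiting entry is updated in the same step, without iterating over the buffer in the model itself.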

(Figure: UCLID tool flow with blocks UCLID description, Term-level Symbolic Simulator, Bounded Property Checking, Correspondence Checking, Inductive Invariant Checking, Decision Procedure (SAT, BDD), Counter Example Generator) • Systems are modeled in CLU logic • Three verification techniques • Based on Symbolic Simulation • Uses the decision procedure • Counter example traces generated for verification failures

Verification Techniques in UCLID • Bounded Property Checking • Start in reset state • Symbolically simulate for fixed number of steps • Verify a safety property for all states reachable within the fixed number of steps from the start state • Correspondence Checking • Run 2 different simulations starting in most general state • Prove that final states are equivalent • e.g. Burch-Dill Technique • Invariant Checking • Start in general state s • Prove Inv(s) ⇒ Inv(next[s]) • Limited support for automatic quantifier instantiation
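
The invariant-checking obligation, that the invariant holding in a state implies it holds in the next state, can be illustrated on a toy head/tail model. This sketch enumerates a small range of states, which is not a proof; UCLID instead discharges the obligation for all states with its decision procedure. The toy transition function and the `guarded` flag are assumptions introduced here.

```python
from itertools import product

def next_state(s, op, guarded=True):
    """Toy queue-pointer model: s = (h, t)."""
    h, t = s
    if op == "PUSH":
        return (h, t + 1)
    if op == "POP" and (h < t or not guarded):
        return (h + 1, t)
    return (h, t)

def inv(s):
    h, t = s
    return h <= t

def check_inductive(guarded):
    """Search a small state range for a violation of the inductive step.

    Returns (state, op) for the first counterexample found, or None.
    """
    for h, t, op in product(range(-3, 4), range(-3, 4), ("PUSH", "POP", "NOP")):
        s = (h, t)
        if inv(s) and not inv(next_state(s, op, guarded)):
            return (s, op)
    return None
```

With POP guarded by h < t no violation is found in the searched range, while the unguarded variant yields a counterexample state, analogous to the counterexample traces UCLID reports.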

An Out-of-order Processor (OOO) • Out of order execution engine • Register Renaming • Inorder retirement • Unbounded Reorder buffer • Arithmetic instructions only • Model different components in UCLID • (Figure: PC with incr, Program memory, DECODE, dispatch; Register Rename Unit with fields valid, tag, val; Reorder Buffer with head and tail and fields valid, value, src1valid, src1val, src1tag, src2valid, src2val, src2tag, dest, op; ALU with 1st and 2nd operand, execute, result bus, retire)

Verification of OOO : Automation vs. Guarantee • Presence of decision procedure • Efficiency : Allows improved bounded property checking and Burch-Dill method • Automation : Reduces manual guidance in proving invariants • Automatic Instantiation of quantifiers

Technique 1 : Bounded Property Checking • Debugging OOO using Bounded Property Checking • All the errors were discovered during this phase • Counterexample trace of great help • Debugging Motorola ELF™ • Superscalar out-of-order processor • Reorder Buffer, memory unit, load-store queues etc. • Applied during early design exploration phase

Bounded Property Checking Results • SVC (Stanford) : Another decision procedure to solve CLU formulas • Can decide more expressive class • CVC (Successor of SVC) runs out of memory on larger cases

Qspec Qspec kspec Abs Abs Qimpl Qimpl impl Technique 2 : Burch-Dill Technique k = issue width of OOO • Restrict the number of entries in the Reorder Buffer • The number of ROB entry = r • Flushing as the abstraction function Abs • Alternate between executing the instruction at the head of the reorder buffer and retiringthe head • Inductive Invariants required for the initial state Qimpl • Critical for Out-of-Order processor verification • Redundancy present in the OOO model • Because of out-of-order execution and register renaming impl = Transition function of OOO spec = Transition function of ISA Abs = Relates OOO state with an ISA state

Qspec Qspec kspec Abs Abs Qimpl Qimpl impl Technique 2 : Burch-Dill Technique k = issue width of OOO • More automated than inductive invariant checking • Does not require auxiliary structures, • Far fewer invariants than invariant checking • Only 4 invariants compared to about 12 for inductive invariant checking approach impl = Transition function of OOO spec = Transition function of ISA Abs = Relates OOO state with an ISA state

Burch-Dill Technique for OOO • Exponential blowup with the number of ROB entries • Limited to r = 8 entries currently • r = 8 finished after case-splitting in 2.5hrs

Technique 3 : Invariant Checking • Deriving the inductive invariants • Require additional (auxiliary) variables to express invariants • Auxiliary variables do not affect system operation • Proving that the invariants are inductive • Automate proof of invariants in UCLID • Eliminates need for large (often fragile) proof script

Restricted Invariants and Proofs • Restricted classes of invariants • ∀x1 ∀x2 … ∀xk Ψ(x1…xk) • Ψ(x1…xk) is a CLU formula without quantifiers • x1…xk are integer variables free in Ψ(x1…xk) • Proving these invariants requires quantifiers • |= (∀x1 ∀x2 … ∀xk Ψ(x1…xk)) ⇒ ∀y1 ∀y2 … ∀ym Φ(y1…ym) • Automatic instantiation of x1…xk with concrete terms • Sound but incomplete method • Reduce the quantified formula to a CLU formula • Can use the decision procedure for CLU

Shadow Structures • Auxiliary variables • Added to predict correct value of state variables • 3 shadow variables for 3 state variables • rob.value : shdw.value • rob.src1val : shdw.src1val • rob.src2val : shdw.src2val • Similar to McMillan’s approach and Arons et al.’s approach

Adding Shadow Structures • Shadow fields shdw.value, shdw.src1val, shdw.src2val added alongside the Reorder Buffer fields • Updated directly from the ISA model during dispatch • shdw.src1val[rob.tail] ← Rfisa(src1) • shdw.src2val[rob.tail] ← Rfisa(src2) • shdw.value[rob.tail] ← ALU(Rfisa(src1), Rfisa(src2), op) • (Figure: OOO block diagram with Shadow Fields next to the Reorder Buffer Fields)

Adding Shadow Structures • ∀t ∈ rob. rob.valid(t) ⇒ rob.value(t) = shdw.value(t) • ∀t ∈ rob. rob.src1valid(t) ⇒ rob.src1val(t) = shdw.src1val(t) • ∀t ∈ rob. rob.src2valid(t) ⇒ rob.src2val(t) = shdw.src2val(t) • (Figure: OOO block diagram with Shadow Fields next to the Reorder Buffer Fields)

Refinement Maps • Correspondence with a sequential ISA model • OOO and ISA synchronized at dispatch • For Register File Contents • ∀r. reg.valid(r) ⇒ reg.val(r) = Rfisa(r) • For Program Counter • PCooo = PCisa • (Figure: OOO block diagram with Shadow Fields next to the Reorder Buffer Fields)

Invariants • Tag Consistency invariants (2) • Instructions only depend on instructions preceding in program order • Register Renaming invariants (2) • Tag in a rename-unit should be in the ROB, and the destination register should match • ∀r. ¬reg.valid(r) ⇒ (rob.head ≤ reg.tag(r) < rob.tail ∧ rob.dest(reg.tag(r)) = r) • For any entry, the destination should have reg.valid as false and tag should contain this or later instruction • ∀t ∈ rob. (¬reg.valid(rob.dest(t)) ∧ t ≤ reg.tag(rob.dest(t)) < rob.tail)

Invariants (cont.) • Executed instructions have operands ready • ∀t ∈ rob. rob.valid(t) ⇒ rob.src1valid(t) ∧ rob.src2valid(t) • Shadow-Value-Operands Relationship • ∀t ∈ rob. shdw.value(t) = Alu(shdw.src1val(t), shdw.src2val(t), rob.op(t)) • Producer-Consumer Values (2) • ∀t ∈ rob. ¬rob.src1valid(t) ⇒ shdw.src1val(t) = shdw.value(rob.src1tag(t)) • Total 13 Invariants • Includes Refinement Maps • Constraints on Shadow Variables

Proving Invariants • Proved automatically • Quantifier instantiation was sufficient in these cases • Relieves the user of writing proof scripts to discharge the proofs • Time spent = 54s on a 1.4GHz machine • Total effort = 2 person days • Not possible to use SVC or CVC • Ordering between integer array indices • ∀t ∈ rob. ¬rob.src1valid(t) ⇒ rob.src1tag(t) < t • SVC/CVC interprets terms over reals • (x < y+1) ⇒ (x ≤ y) • Valid when x, y are integers • Invalid when x, y are reals
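
The integer-versus-real point can be checked concretely. The sketch below tests the implication (x < y+1) ⇒ (x ≤ y) over a small range of integers and then on one rational pair; the helper name `implication_holds` and the chosen ranges are illustrative assumptions.

```python
from fractions import Fraction

def implication_holds(x, y):
    """Evaluate (x < y + 1) => (x <= y) for concrete x and y."""
    return (not (x < y + 1)) or (x <= y)

# Holds for every integer pair in a small sample range.
int_ok = all(implication_holds(x, y)
             for x in range(-5, 6) for y in range(-5, 6))

# Over the rationals (and hence the reals) it fails: 1/2 < 0 + 1 but 1/2 > 0.
real_counterexample = (Fraction(1, 2), Fraction(0))
```

A procedure that interprets terms over the reals must therefore treat the implication as invalid, which is why invariants relying on integer ordering of ROB indices could not be discharged that way.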

Why Quantifier Instantiation works

Extensions to the base model • Increase concurrency of design • Infinite number of execution units • Any subset of {dispatch, execute, retire, nop} can be active • The same invariants were proved inductive without any changes • Scalar → Superscalar • Incorporate issue width = 2 and retire width = 2 • Data forwarding logic of the processor gets complicated • Same set of invariants proved automatically • No change in the proof script !! • Runtime increased from 54s to 134s

Adding circular reorder buffer • ROB modeled as a finite but arbitrary-size circular FIFO • Tags are reused • No dispatch when the reorder buffer is full • Changes in the model • Add a predicate rob.present() to indicate that a ROB entry holds a valid instruction • Change the dispatch logic to stall when the ROB is full • Modify ‘<’ to incorporate wrap-around • Changes in proof script • Add 1 invariant about the relationship of rob.present and active elements of ROB • Again the proof of invariants is automatic !!

Liveness Proof • Liveness • Every dispatched instruction is eventually retired • Assumes a “fair” scheduler • Attempts to execute the instruction at the head infinitely often • Proceed by a high level induction • Not mechanical • Similar to Hosabettu [CAV98] approach • Most lemmas required are already proved during safety proof (in UCLID) • Concise proof

Current Status and Future Work • Use of decision procedure in deductive verification • Automate proof of invariants in micro-architecture verification with speculation, memory instructions [CMU-TR] • Automate proof of invariants in verification of a directory based cache coherence protocol with unbounded clients and unbounded channels • Need ways to generate (some) invariants automatically • Pnueli et al.’s invisible invariant method [CAV01] • Difficult to handle unbounded data, uninterpreted functions and ordering • Detecting convergence of such term-level models • Would enable automatic proof of models with finite buffers

Questions

Introduction and Related Work • Microprocessor Verification • Finite state symbolic Model Checking, • Berezin et al. • Compositional Model Checking, • McMillan et al. • Symbolic Simulation + Decision Procedure based, • Burch & Dill, • Bryant & Velev • Theorem Proving Techniques, • Sawada & Hunt, • Hosabettu et al., • Arons & Pnueli

Exploiting Positive Equality • Decision Procedure exploits “positive-equality” • Bryant, German, Velev , CAV’99 • Extended in presence of succ, pred operations • Bryant, Lahiri, Seshia CAV’02 • Positive Equality • Number of interpretations can be greatly reduced • Equations appearing only under even # of negations assigned false • Except when restricted by functional consistency • Terms compared in these equations get distinct interpretations --- called p-terms • Identifying p-terms is a pre-processing step

Instruction Set Architecture (ISA)

UCLID description

Modeling Circular Queues • Indices range over H0 … T0 and wrap from T0 back to H0 • succ’ := Lambda (x). case x = T0 : H0 ; default : succ(x) ; esac; • next[head] := case (operation = POP) : succ’(head) ; default : head ; esac • next[tail] := case (operation = PUSH) : succ’(tail) ; default : tail ; esac • next[content] := Lambda i. case (operation = PUSH) & (i = tail) : D ; default : content(i) ; esac • (Figure: circular buffer between H0 and T0 with head and tail pointers)
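
A small Python sketch of the wrap-around successor and the circular-queue transition. The bounds `H0 = 0` and `T0 = 3` and the function names are illustrative assumptions; as in the slide's model, no full or empty check is performed here.

```python
H0, T0 = 0, 3   # assumed index bounds of the circular buffer

def succ_wrap(x):
    """succ' := Lambda (x). case x = T0 : H0 ; default : succ(x)"""
    return H0 if x == T0 else x + 1

def circ_step(head, tail, content, operation, D=None):
    """One transition of the circular queue; returns (head, tail, content)."""
    next_head = succ_wrap(head) if operation == "POP" else head
    next_tail = succ_wrap(tail) if operation == "PUSH" else tail
    # Lambda i. case (operation = PUSH) & (i = tail) : D ; default : content(i)
    next_content = lambda i: D if (operation == "PUSH" and i == tail) else content(i)
    return next_head, next_tail, next_content
```

Pushing when the tail is at T0 stores the datum at index T0 and moves the tail back to H0, which is the tag reuse that a circular reorder buffer relies on.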

Term-level modeling • Abstract Bit-Vectors with Integers (Terms) • Allow restricted set of operations • x = y, x < y, succ(x), pred(x) • “Black-box” certain combinational blocks • Replace by uninterpreted functions • Maintain functional consistency • (Figure: ALU block replaced by an uninterpreted function f)
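
Functional consistency means only that equal arguments give equal results; nothing else is assumed about the block. One way to picture this in Python is a table-backed stand-in that hands out a fresh symbolic value per distinct argument tuple. The class `Uninterpreted` is an assumption for illustration and fixes just one possible interpretation (distinct arguments map to distinct results), whereas a decision procedure reasons over all interpretations.

```python
import itertools

class Uninterpreted:
    """Stand-in for an uninterpreted function symbol such as the ALU."""

    def __init__(self, name):
        self.name = name
        self._table = {}                 # argument tuple -> symbolic result
        self._fresh = itertools.count()  # source of fresh result names

    def __call__(self, *args):
        # Same arguments always return the same result (functional consistency).
        if args not in self._table:
            self._table[args] = f"{self.name}#{next(self._fresh)}"
        return self._table[args]

alu = Uninterpreted("alu")
r1 = alu(10, 20, "add")
r2 = alu(10, 20, "add")
r3 = alu(20, 10, "add")
```

Here `r1` and `r2` are equal because the arguments match, while `r3` is a different symbolic value since no property such as commutativity is assumed.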

Example : Motorola ELF™ Processor • Features • 32-bit Dual issue with 64 GPRs • 5 stage pipeline • Out-of-order issue, in order completion of up to 2 instructions • Load/Store unit • 3-cycle load latency • Fully pipelined • Load queue for loads that miss in cache • Store queue for retiring store instruction • Other buffers to hide cache miss latency • 1000 lines of UCLID model derived from 20K lines of RTL

Bounded Property Checking • Compare the micro-architecture with a sequential ISA model w.r.t. Register File, Memory and PC • ISA model synchronized at completion • (Figure: the implementation state advances by δimpl every cycle; the ISA state is unchanged when no instruction completes and advances by δISA when 1 or 2 instructions complete)

Quantifier Instantiation • Prove |= (∀x1 ∀x2 … ∀xk Ψ(x1…xk)) ⇒ ∀y1 ∀y2 … ∀ym Φ(y1…ym) • Introduce Skolem Constants (y*1,…,y*m) • |= (∀x1 ∀x2 … ∀xk Ψ(x1,…,xk)) ⇒ Φ(y*1,…,y*m) • Instantiate x1,…,xk with concrete terms • Assume single-arity functions and predicates • Let Fx = {f | f(x) is a sub-expression of Ψ(x1…xk)} • Let Tf = {t | f(t) is a sub-expression of Φ(y*1…y*m)} • For each bound variable x, Ax = {t | f ∈ Fx and t ∈ Tf} • Instantiate Ψ over Ax1 × Ax2 × … × Axk • Formula size grows exponentially with the number of bound variables
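
The instantiation heuristic can be sketched in Python under a simplifying assumption: formulas are represented only by their lists of single-argument function applications, written as (function name, argument) pairs. The example applications below are hypothetical and the function names `candidate_terms` and `instantiations` are introduced here for illustration.

```python
from itertools import product

def candidate_terms(hyp_apps, goal_apps, bound_vars):
    """Compute A_x for each bound variable x.

    hyp_apps:  applications f(x) in the quantified hypothesis, as (f, x) pairs
    goal_apps: applications f(t) in the Skolemized goal, as (f, t) pairs
    """
    A = {}
    for x in bound_vars:
        F_x = {f for (f, arg) in hyp_apps if arg == x}    # functions applied to x
        A[x] = {t for (f, t) in goal_apps if f in F_x}    # terms those functions see
    return A

def instantiations(A, bound_vars):
    """All tuples in A_x1 x A_x2 x ... x A_xk, as variable-to-term maps."""
    choices = [sorted(A[x]) for x in bound_vars]
    return [dict(zip(bound_vars, combo)) for combo in product(*choices)]

# Hypothetical example with two bound variables and one Skolem constant y1*.
hyp_apps = [("rob.valid", "x1"), ("rob.value", "x1"), ("rob.src1tag", "x2")]
goal_apps = [("rob.valid", "y1*"), ("rob.value", "rob.src1tag(y1*)"),
             ("rob.src1tag", "y1*")]
A = candidate_terms(hyp_apps, goal_apps, ["x1", "x2"])
insts = instantiations(A, ["x1", "x2"])
```

The number of instantiations is the product of the candidate-set sizes, which is the exponential growth in the number of bound variables noted on the slide.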

Updating Shadow Structures • During the dispatch of new instruction • I = <src1,src2,dest,op> • next[shdw.value] := λ t. (t = rob.tail ? Alu(Rfisa(src1),Rfisa(src2),op) : shdw.value(t)); • next[shdw.src1val] := λ t. (t = rob.tail ? Rfisa(src1) : shdw.src1val(t)); • next[shdw.src2val] := λ t. (t = rob.tail ? Rfisa(src2) : shdw.src2val(t));
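
The three dispatch-time updates can be sketched in Python with shadow fields modeled as functions of the ROB index. The tuple-building `alu` is a stand-in for the uninterpreted Alu, and `dispatch_shadow` and the sample register file are assumptions made for illustration.

```python
def alu(a, b, op):
    """Stand-in for the uninterpreted Alu: records its arguments symbolically."""
    return ("alu", a, b, op)

def dispatch_shadow(shdw_value, shdw_src1val, shdw_src2val,
                    rob_tail, rf_isa, instr):
    """Update the three shadow fields at index rob_tail for I = <src1,src2,dest,op>."""
    src1, src2, dest, op = instr
    v1, v2 = rf_isa(src1), rf_isa(src2)
    result = alu(v1, v2, op)

    def updated(old, new):
        # lambda t. (t = rob.tail ? new : old(t))
        return lambda t: new if t == rob_tail else old(t)

    return (updated(shdw_value, result),
            updated(shdw_src1val, v1),
            updated(shdw_src2val, v2))

# Hypothetical ISA register file and one dispatched instruction.
regs = {1: 10, 2: 20}
empty = lambda t: None
val, s1, s2 = dispatch_shadow(empty, empty, empty, 4,
                              lambda r: regs[r], (1, 2, 3, "add"))
```

After dispatch into entry 4, the shadow value equals the ALU applied to the two shadow operands, which is the Shadow-Value-Operands relationship for that entry.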

Adding Shadow Structures • (Figure: OOO block diagram with Shadow Fields shdw.value, shdw.src1val, shdw.src2val next to the Reorder Buffer Fields, alongside the sequential ISA model with its own PC, incr, Program memory and DECODE)

Refinement Maps • For Register File Contents • ∀r. reg.valid(r) ⇒ reg.val(r) = Rfisa(r) • If a register is not being modified by any instruction in ROB, then the value matches the ISA value • For Program Counter • PCooo = PCisa

Invariants • (Figure: a Reorder Buffer entry with fields valid, value, src1valid, src1val, src1tag, src2valid, src2val, src2tag, dest, op)

Burch-Dill Technique • More automated than inductive invariant checking • Does not require auxiliary structures • Far fewer invariants than invariant checking • Only 4 invariants compared to about 12 for inductive invariant checking approach • Invariants on initial state Qooo • Instructions only depend on instructions preceding in program order • Tag in a rename-unit should be in the ROB, and the destination register should match • For any entry, the destination should have reg.valid as false and tag should contain this or later instruction • rob.head ≤ rob.tail ≤ rob.head + r

Invariants • Total 13 invariants required • Refinement map for RF and PC (2) • Shadow structure constraints (3) • Tag Consistency invariants (2) • Instructions only depend on instructions preceding in program order • Circular Register Renaming invariants (2) • Tag in a rename-unit should be in the ROB, and the destination register should match • ∀r. ¬reg.valid(r) ⇒ (rob.head ≤ reg.tag(r) < rob.tail ∧ rob.dest(reg.tag(r)) = r) • For any entry, the destination should have reg.valid as false and tag should contain this or later instruction • ∀t ∈ rob. (¬reg.valid(rob.dest(t)) ∧ t ≤ reg.tag(rob.dest(t)) < rob.tail)
