Connecting High Level Models and RTL: an Ongoing Battle

Jesse Bingham, Intel. Feb 25, 2009

Big Picture
[Diagram, unapologetically stolen from Erik. Design levels: Architecture, RTL, Netlists, Layout/Backend. Verification labels: FPV, CDC, TOV, FEV. A red arrow marks the problem du jour.]

Motivation for Formal Verification
• Simulation: spot coverage of design space
• Formal Verification (ideal case): full coverage of design space
• Formal Verification (real life): full coverage in some areas
[Figure also stolen from Erik.]

Another Dimension…
[Chart with axes "Property coverage" and "State/behavior coverage". Plotted techniques: Type-checking, Traditional Simulation-based Testing, Bounded Model Checking, Protocol Model checking, FEV, Formal specification, Theorem proving, Arithmetic FV @Intel, Today's Topic (checker), Today's Topic (formal).]

Overview
• Protocols are naturally & succinctly specified by high level models (HLM)
    • In a sense, all RTL safety properties are captured by the HLM
• Actual HW design (RTL) is hand-written by engineers
• How do we establish that RTL adheres to its HLM?
    • What does adherence even mean mathematically?
• Two approaches
    • Checker: HDL code that "watches" the design during simulation and raises alarms if it detects non-adherence
        • Most of this talk is about checkers
    • Formal Proof: prove that the checker can never ever ring the alarm
        • having the checker is obviously a prerequisite for formal proof
        • Notoriously hard problem in FV
        • but getting more and more important in HW design

HW Protocols
• Distributed components exchanging messages
• Control oriented
    • Cannot be specified by input/output relations
    • State is king
• Typically message latency insensitive (though message ordering often matters)
• Naturally specified at high level using guarded-command languages (Murphi, TLA, Unity, etc.)
    • we'll call this the high level model (HLM)
    • we use Murphi, but this work is independent of the particular modeling language

HLM: Guarded Commands [Dijkstra 1975]
• Guard: predicate on states
• Command: function mapping states to states
• Guarded Command (GC): a guard & a command
• Command is only allowed to fire if guard is true
• Called rules or rulesets in Murphi…

    rule "go to park"
      !raining
    ==>
    begin
      location := nearest_park();
    end;

    -- a ruleset is a rule parameterized over a type
    ruleset food : FOOD do
      rule "have picnic"
        hungry & !raining
      ==>
      begin
        location := nearest_park();
        eat(food);
      end;
    end;
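The guard/command pairing above can be sketched in a few lines of Python. This is only an illustration, not how Murphi represents rules: the `GuardedCommand` class, the dictionary state encoding, the `FOODS` list, and the modeling of `eat(food)` as setting `hungry` and `eaten` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, object]  # assumed encoding: variable name -> value


@dataclass(frozen=True)
class GuardedCommand:
    name: str
    guard: Callable[[State], bool]     # predicate on states
    command: Callable[[State], State]  # maps a state to its successor


def enabled(gcs: List[GuardedCommand], s: State) -> List[GuardedCommand]:
    """Return the GCs whose guard holds in state s."""
    return [gc for gc in gcs if gc.guard(s)]


def fire(gc: GuardedCommand, s: State) -> State:
    """Fire gc from s; a command may only fire if its guard is true."""
    if not gc.guard(s):
        raise ValueError(f"{gc.name} is not enabled")
    return gc.command(s)


FOODS = ["sandwich", "apple"]  # hypothetical stand-in for the FOOD type

go_to_park = GuardedCommand(
    "go to park",
    lambda s: not s["raining"],
    lambda s: {**s, "location": "park"},
)

# A ruleset expands to one GC per parameter value.
have_picnic = [
    GuardedCommand(
        f"have picnic({food})",
        lambda s: s["hungry"] and not s["raining"],
        # eat(food) is modeled here as clearing `hungry` and recording the food
        lambda s, food=food: {**s, "location": "park", "hungry": False, "eaten": food},
    )
    for food in FOODS
]

ALL_GCS = [go_to_park] + have_picnic
```

When several GCs are enabled at once, as for a hungry person on a dry day, the model may fire any one of them; that choice is where the HLM's nondeterminism comes from.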

HLM Behaviors & Properties
[Diagram: a behavior starts at the initial state; at each step an enabled GC fires.]
• State invariants: all reachable states are "okay"
    • Cache always has at most one entry for each address
• More general safety properties
    • Cache returns most recently written data to a read request
• Liveness (typically assuming fairness)
    • If you send a read request, cache will eventually return data
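A state invariant such as "at most one entry for each address" can be checked on a small model by enumerating reachable states. The sketch below is a toy, not the talk's cache HLM: the two-entry cache, the "fill" and "evict" commands, and the `buggy` switch that drops part of a guard are all assumptions chosen to keep the state space tiny.

```python
from collections import deque
from itertools import product

ADDRS = (0, 1)  # assumed tiny address space
N = 2           # assumed number of cache entries
# A state is a tuple of N entries; each entry is None (invalid) or an address.


def successors(s, buggy=False):
    """Yield every state reachable from s by firing one enabled GC."""
    for i, a in product(range(N), ADDRS):
        # "fill": entry i is invalid and (unless buggy) no entry already holds a
        if s[i] is None and (buggy or a not in s):
            yield s[:i] + (a,) + s[i + 1:]
    for i in range(N):
        # "evict": entry i is valid
        if s[i] is not None:
            yield s[:i] + (None,) + s[i + 1:]


def invariant(s):
    """At most one entry per address."""
    held = [a for a in s if a is not None]
    return len(held) == len(set(held))


def check_invariant(init, buggy=False):
    """Breadth-first search; return a reachable violating state, or None."""
    seen, queue = {init}, deque([init])
    while queue:
        s = queue.popleft()
        if not invariant(s):
            return s
        for t in successors(s, buggy):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return None
```

Calling `check_invariant((None, None))` explores the correct model, while `check_invariant((None, None), buggy=True)` returns a state in which two entries hold the same address.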

Register Transfer Level (RTL)
• Clock/state accurate (or at least close)
• Pipelines
• Schedulers
• Special logic
• Design-for-test
• Clock gating
• Reset
• Written in a hardware description language like SystemVerilog or VHDL (we use SV)
• Can be formalized as finite state automata or Kripke structures; we won't do that today
FV methods and CAD tools below RTL have advanced to the point where one can (if they choose to) safely think of RTL as the real silicon.

Refinement Map
• A function RM taking RTL states to HLM states is called a refinement map
• Intuitively, RM(r) is the HLM state that summarizes RTL state r
• Many-to-one in general
• Human writes this in our methodology
• Generalization: RM may depend on RTL signals at fixed offsets from the current cycle
    • Useful for dealing with RTL pipelines

Behavioral Refinement
[Diagram: an HLM behavior (from the initial state, one guarded command firing per step) and an RTL behavior (from the reset state, one RTL clock cycle per step), linked by the refinement map.]
Each RTL clock cycle corresponds to zero or more guarded commands firing.

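Checking Refinement
[Diagram: one RTL clock cycle takes RTL state r to r′. GC_prediction(r) = (gc1, gc2, …, gck) is fired from RM(r) to give the next HLM state, which is compared against the refinement map of the new RTL state: Next HLM =? RM(r′).]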
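The per-cycle check can be sketched as a small simulation checker. This assumes the comparison is between RM of the next RTL state and the result of firing the predicted GCs from RM of the current one; the toy stalling counter, `toy_rm`, `toy_predict`, and `GOOD_TRACE` are hypothetical and stand in for a real design.

```python
from functools import reduce


def check_step(r, r_next, rm, predict_gcs):
    """True iff firing the predicted GCs from RM(r) yields RM(r_next)."""
    expected = reduce(lambda h, gc: gc(h), predict_gcs(r), rm(r))
    return rm(r_next) == expected


def run_checker(trace, rm, predict_gcs):
    """Return the index of the first cycle that breaks refinement, or None."""
    for t in range(len(trace) - 1):
        if not check_step(trace[t], trace[t + 1], rm, predict_gcs):
            return t
    return None


# Hypothetical toy design: an RTL counter that only advances when not stalled.
def toy_rm(r):
    return r["count"]          # the HLM state is just the count


def inc(h):
    return h + 1               # the only GC of the toy HLM


def toy_predict(r):
    return [] if r["stall"] else [inc]   # zero or one GC per clock cycle


GOOD_TRACE = [
    {"count": 0, "stall": False},
    {"count": 1, "stall": True},
    {"count": 1, "stall": False},
    {"count": 2, "stall": False},
]
```

A stalled cycle predicts the empty GC sequence, so the HLM state must stay put across it; this is the "zero or more guarded commands per clock cycle" case.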

Running Example: Toy Cache Controller
[Diagram: CPU, Cache Controller, Main Memory.]

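Cache Controller HLM
[Diagram: message channels Cpu2Cache, Cache2Cpu, Cache2Mem, and Mem2Cache around a CacheArray whose entries each hold State ∈ {Invalid, Dirty, Clean}, Addr, and Data. Annotation on Cache2Cpu and Mem2Cache: "Let's pretend these don't exist".]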

Cache Controller HLM GC: Recv_Store

    ruleset i : CacheIndex do
      rule "Recv Store"
        Cpu2Cache.opcode = Store &
        CacheArray[i].State != Invalid &
        CacheArray[i].Addr = Cpu2Cache.Addr
      ==>
      begin
        CacheArray[i].Data := Cpu2Cache.Data;
        CacheArray[i].State := Dirty;
        Absorb(Cpu2Cache);
      end;
    end;

Cache Controller HLM GC: Evict

    ruleset i : CacheIndex do
      rule "Evict"
        CacheArray[i].State != Invalid
      ==>
      begin
        -- a dirty entry must be written back before it is dropped
        if (CacheArray[i].State = Dirty) then
          Cache2Mem.opcode := WriteBack;
          Cache2Mem.Addr := CacheArray[i].Addr;
          Cache2Mem.Data := CacheArray[i].Data;
        endif;
        CacheArray[i].State := Invalid;
      end;
    end;
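For readers who want to experiment, the Recv Store and Evict GCs can be mirrored as pure Python functions on a dictionary-encoded HLM state. The encoding is an assumption, and so is the reading of `Absorb(Cpu2Cache)` as "consume the message" (modeled here by setting the channel to `None`); the slides do not define `Absorb`.

```python
from copy import deepcopy

INVALID, CLEAN, DIRTY = "Invalid", "Clean", "Dirty"


def recv_store_guard(h, i):
    """Store hit: a Store is pending and entry i is valid with a matching address."""
    e, msg = h["CacheArray"][i], h["Cpu2Cache"]
    return (msg is not None and msg["opcode"] == "Store"
            and e["State"] != INVALID and e["Addr"] == msg["Addr"])


def recv_store(h, i):
    h = deepcopy(h)                      # commands return a new state
    e = h["CacheArray"][i]
    e["Data"] = h["Cpu2Cache"]["Data"]
    e["State"] = DIRTY
    h["Cpu2Cache"] = None                # assumed meaning of Absorb(Cpu2Cache)
    return h


def evict_guard(h, i):
    return h["CacheArray"][i]["State"] != INVALID


def evict(h, i):
    h = deepcopy(h)
    e = h["CacheArray"][i]
    if e["State"] == DIRTY:              # dirty data is written back first
        h["Cache2Mem"] = {"opcode": "WriteBack", "Addr": e["Addr"], "Data": e["Data"]}
    e["State"] = INVALID
    return h
```

Because each command copies the state before changing it, a checker can hold on to the previous HLM state while computing the next one.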

Cache Controller RTL
[Diagram blocks: Cpu2Cache input, Pipe stage 1, Pipe stage 2, Cache State & Addr Array, Cache Data Array, Hit? logic, Eviction Logic, Cache2Mem output.]

Store with Eviction
[Animation on the RTL diagram: Store(A0,D0) arrives on Cpu2Cache and moves through pipe stages 1 and 2. The State & Addr Array entry changes from Dirty,A1 to Dirty,A0 and the Data Array entry from D1 to D0. WriteBack(A1,D1) is sent on Cache2Mem.]

Store with Eviction Revisited
When do the HLM GCs "happen" in the RTL?
[Same animation of Store(A0,D0) and WriteBack(A1,D1) on the RTL diagram, with the Store and Evict GCs marked.]

Key Point #1 Pipelining causes GCs that are atomic in the HLM to be non-atomic in the RTL. This non-atomicity must be handled by the refinement map.

Key Point #2 In the HLM, GCs are interleaved, while the RTL can exhibit true GC concurrency. This must be resolved by the GC prediction.

Cache Controller Refinement Map (conceptual)

    function HLM_STATE RM(); // refinement map
      HLM_STATE HLM;
      // [] is shorthand for "every cache index"
      HLM.CacheArray[].State = RTL.AddrArray[].State;
      HLM.CacheArray[].Addr = RTL.AddrArray[].Addr;
      HLM.CacheArray[].Data = RTL.DataArray[]@+1;
      HLM.Cpu2Cache = RTL.Cpu2Cache@-1;
      HLM.Cache2Cpu = RTL.Cache2Cpu@+1;
      return HLM;
    endfunction

<signal>@k denotes the value <signal> will have k clock cycles in the future (k can be negative too, to refer to the past)

Cache Controller Refinement Map (with only non-positive temporal offsets)

    function HLM_STATE RM(); // refinement map
      HLM_STATE HLM;
      // every offset of the conceptual map is shifted by -1
      HLM.CacheArray[].State = RTL.AddrArray[].State@-1;
      HLM.CacheArray[].Addr = RTL.AddrArray[].Addr@-1;
      HLM.CacheArray[].Data = RTL.DataArray[];
      HLM.Cpu2Cache = RTL.Cpu2Cache@-2;
      HLM.Cache2Cpu = RTL.Cache2Cpu;
      return HLM;
    endfunction

<signal>@-k can be constructed using SystemVerilog's $past system function
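Shifting every temporal offset by the same amount only delays the cycle at which the map is evaluated; it does not change the HLM state being described. The Python sketch below illustrates this on recorded signal traces. The three-signal subset, the trace contents, and the helper `at` are assumptions made for the example.

```python
def at(trace, t, k):
    """Value of a recorded signal at cycle t + k (the slides' <signal>@k, seen from cycle t)."""
    idx = t + k
    if idx < 0 or idx >= len(trace):
        raise IndexError("offset reaches outside the recorded trace")
    return trace[idx]


def rm_conceptual(tr, t):
    """Offsets 0, +1, -1: needs one cycle of look-ahead."""
    return {
        "State": at(tr["state"], t, 0),
        "Data": at(tr["data"], t, +1),
        "Cpu2Cache": at(tr["cpu2cache"], t, -1),
    }


def rm_shifted(tr, t):
    """Same map with every offset shifted by -1: only past values are needed."""
    return {
        "State": at(tr["state"], t, -1),
        "Data": at(tr["data"], t, 0),
        "Cpu2Cache": at(tr["cpu2cache"], t, -2),
    }


# Hypothetical recorded simulation traces, one sample per clock cycle.
TRACE = {
    "state": ["Invalid", "Clean", "Dirty", "Dirty", "Invalid"],
    "data": [0, 0, 11, 22, 22],
    "cpu2cache": ["Store(A0)", None, "Store(A2)", None, None],
}
```

For any cycle `t` where both are defined, `rm_shifted(TRACE, t + 1)` equals `rm_conceptual(TRACE, t)`, so a checker built on the shifted map sees the same HLM states one cycle later.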

Store with Eviction Re-Revisited
[Animation: an HLM timeline in which Evict fires and then RecvStore fires, aligned with the RTL pipeline animation of Store(A0,D0) passing through pipe stages 1 and 2 and WriteBack(A1,D1) being sent.]

Cache Controller GC Prediction

    function HLM_STATE Next_HLM_STATE(HLM_STATE hs);
      if (RTL.Cpu2Cache.Valid@-2) begin
        i = get_target_cache_index()@-2;
        // an eviction, if needed, is ordered before the store or load
        if (will_need_eviction()@-2)
          hs = Evict(hs, i);
        if (RTL.Cpu2Cache.Op@-2 == STORE)
          hs = Recv_Store(hs, i);
        else if (RTL.Cpu2Cache.Op@-2 == LOAD)
          hs = Recv_Load(hs, i);
      end
      ... // figure out when to fire Send_Memory_Request
          // and Recv_Memory_Response
      return hs;
    endfunction

Can result in 0, 1, or 2 GCs fired
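The CPU-side part of the prediction can be mirrored in Python to see where the "0, 1, or 2 GCs" count comes from. This sketch returns GC names rather than applying them, takes the already-delayed signal values as plain arguments, and leaves out the memory request and response GCs that the slide also leaves open; the function name and signature are assumptions.

```python
from itertools import product


def predict_cpu_side_gcs(valid, op, needs_eviction):
    """Return, in firing order, the GCs predicted for one clock cycle."""
    gcs = []
    if valid:
        if needs_eviction:
            gcs.append("Evict")        # eviction is ordered first
        if op == "STORE":
            gcs.append("Recv_Store")
        elif op == "LOAD":
            gcs.append("Recv_Load")
    return gcs


# Count the predicted GCs for every combination of inputs.
COUNTS = {
    (valid, op, ev): len(predict_cpu_side_gcs(valid, op, ev))
    for valid, op, ev in product([False, True], ["STORE", "LOAD"], [False, True])
}
```

The interleaving order is fixed by the order of the `append` calls, which is how truly concurrent RTL activity gets serialized into an HLM-legal sequence.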

Back-to-back Stores with Eviction
[Animation: HLM timeline labeled Evict, RecvStore(A0), RecvStore(A1), aligned with the RTL pipeline. Store(A0,D0) and Store(A2,D2) enter back to back and pass through pipe stages 1 and 2; the State & Addr Array shows Dirty,A2, Dirty,A0, Dirty,A1 and the Data Array shows D2, D0, D1; WriteBack(A1,D1) is sent.]

FYI, we do everything in SystemVerilog
• Actual design under verification
    • written by HW designers
• Test bench
    • written by HW validators
• HLM
    • written in Murphi by FV team in consultation with Architects
    • compiled into SV by a tool we wrote
• Refinement Map
    • hand-written in SV by FV team
• GC Prediction
    • hand-written in SV by FV team

Formal Proof of Refinement

Formal Proof of Refinement, version 1.0: looks like FEV
[Diagram: a totally symbolic RTL state r (representing all possible RTL states) takes one RTL clock cycle to r′; on the HLM side, RM(r) is taken to Next(RM(r)).]
• Check: RM(r′) =? Next(RM(r))
• Can be decided by a SAT- or BDD-based solver engine
• This will most certainly fail for some unreachable RTL states! Rats!
• Also might blow up

Formal Proof of Refinement, version 2.0: write an invariant
[Diagram: a totally symbolic RTL state r (representing all possible RTL states), constrained by Inv(r), takes one RTL clock cycle to r′; on the HLM side, RM(r) is taken to Next(RM(r)).]
• Check: Inv(r) ⇒ RM(r′) = Next(RM(r))
• Can be decided by a SAT- or BDD-based solver engine
• But concocting Inv is difficult, not to mention you also need to prove that Inv is invariant
• Also might blow up
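The difference between versions 1.0 and 2.0 can be seen by brute force on a toy design small enough to enumerate. Everything here is hypothetical: the RTL is a mod-4 counter with a shadow copy, the HLM is a mod-4 counter, the prediction is fixed at one GC per cycle, and plain enumeration stands in for the SAT- or BDD-based engine.

```python
from itertools import product

M = 4
ALL_RTL = list(product(range(M), range(M)))  # every (count, shadow) pair
RESET = (0, 0)


def rtl_step(r):
    """One clock cycle: both registers load shadow + 1."""
    _count, shadow = r
    nxt = (shadow + 1) % M
    return (nxt, nxt)


def rm(r):
    return r[0]                # refinement map: the HLM state is `count`


def hlm_next(h):
    return (h + 1) % M         # the single GC of the toy HLM


def inv(r):
    return r[0] == r[1]        # candidate invariant: count equals shadow


def refines(r):
    """The one-step check RM(r') = Next(RM(r))."""
    return rm(rtl_step(r)) == hlm_next(rm(r))


# Version 1.0: check every RTL state, reachable or not.
v1_counterexamples = [r for r in ALL_RTL if not refines(r)]

# Version 2.0: assume Inv, and separately show Inv really is an invariant.
inv_holds_at_reset = inv(RESET)
inv_is_inductive = all(inv(rtl_step(r)) for r in ALL_RTL if inv(r))
v2_holds = all(refines(r) for r in ALL_RTL if inv(r))
```

In this toy, every version 1.0 counterexample has `count != shadow`, which never happens after reset, so the failures are spurious; assuming `inv` removes them at the price of two extra proof obligations.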

Formal Proof of Refinement, version 3.0: Model Checking
[Diagram: HLM of Environment connected to RTL & checker.]
• start from initial state of env-HLM & RTL
• compute forward reachability via symbolic model checking
• verify that checker never fires
Will likely blow up; probably need to restrict behaviors, e.g. use 4 addresses rather than 2^32.

Open Problems
• Refinement map is part of spec… or is it?
• Formal proof: best approach?
    • I spent 1.5 years banging my head on the formal side; the fact that I've retreated to checkers says something
• Tool issues: pain in the butt
    • Generated SystemVerilog has hit 4 bugs so far in expensive third-party simulator
• HLM/RTL discrepancies: can we weaken our notion of refinement to allow for reasonable mismatches?
    • E.g. HLM transmits message instantaneously, while RTL scheduling causes arbitrary delay before transmission

Partial Bibliography
• Using formal HLM as a checker
    • Linking Simulation with Formal Verification at a Higher Level, Tasiran, Batson, & Yu, 2004
    • Runtime Refinement Checking of Concurrent Data Structures, Tasiran & Qadeer, 2004
• Original Murphi paper
    • Protocol Verification as a Hardware Design Aid, Dill, Drexler, Hu, & Yang, 1992
• Formal verification of refinement maps for hardware
    • Automatic Verification of Pipelined Microprocessor Control, Burch & Dill, 1994
    • Protocol Verification by Aggregation of Distributed Transactions, Park & Dill, 1996
    • A Methodology for Hardware Verification using Compositional Model Checking, McMillan, 2000
    • The Formal Design of 1M-gate ASICs, Eiriksson, 2000
• Theory involving refinement in the face of fairness
    • On the Existence of Refinement Maps, Abadi & Lamport, 1991
• Commercial Tools
    • BlueSpec (BlueSpec Inc.)
    • Pico (Synfora)
    • SLEC (Calypto)

Backups

Cache Controller HLM (typedefs & var decls in Murphi)

    type ---- Type declarations ----
      <snip>
      CACHE_ENTRY : record
        State : enum {Invalid, Dirty, Clean};
        Addr : ADDR;
        Data : DATA;
      end;
      <snip>

    var ---- State variables ----
      CacheArray : array [0..CACHE_SIZE-1] of CACHE_ENTRY;
      Cpu2Cache : CPU2CACHE_MSG;
      Cache2Cpu : CACHE2CPU_MSG;
      Mem2Cache : MEM2CACHE_MSG;
      Cache2Mem : CACHE2MEM_MSG;

Guarded Commands Formalized
• State space S = type-consistent assignments to variables
• Init: subset of state space specifying initial states
• A guarded command (GC) is a pair (g, c), where
    • g : S → {True, False} is called the guard; GC is enabled in state s if g(s) = True
    • c : S → S is called the command; GC fires from s to c(s)
• Semantics: HLM can transition from s to s′ iff there exists a GC that
    • is enabled in s
    • fires from s to s′
• Nondeterminism arises when multiple GCs are enabled
• In practice GCs are often parameterized
• We assume that the stuttering GC (λs.True, λs.s) is implicit

Refinement Formalized
• Let H and R be the respective state spaces of HLM and RTL
• A function RM: R → H is called a refinement map
• Intuitively, RM(r) is the HLM state that summarizes RTL state r
• Many-to-one in general
• Human writes this in our methodology
• We generalize this so that RM: R^w → H, for some fixed w
    • Hence RM maps a fixed-length sequence of RTL states to H
    • Useful for dealing with RTL pipelines

Cache Controller HLM GCs (1/2)

    ruleset i : CacheIndex do
      rule "Recv Store"
        Cpu2Cache.opcode = Store &
        CacheArray[i].State != Invalid &
        CacheArray[i].Addr = Cpu2Cache.Addr
      ==>
      begin
        CacheArray[i].Data := Cpu2Cache.Data;
        CacheArray[i].State := Dirty;
        Absorb(Cpu2Cache);
      end;
    end;

    ruleset i : CacheIndex do
      rule "Recv Load"
        Cpu2Cache.opcode = Load &
        CacheArray[i].State != Invalid &
        CacheArray[i].Addr = Cpu2Cache.Addr
      ==>
      begin
        Cache2Cpu.Data := CacheArray[i].Data;
        Absorb(Cpu2Cache);
      end;
    end;

    ruleset i : CacheIndex do
      rule "Evict"
        CacheArray[i].State != Invalid
      ==>
      begin
        if (CacheArray[i].State = Dirty) then
          Cache2Mem.opcode := WriteBack;
          Cache2Mem.Addr := CacheArray[i].Addr;
          Cache2Mem.Data := CacheArray[i].Data;
        endif;
        CacheArray[i].State := Invalid;
      end;
    end;

Cache Controller HLM GCs (2/2)

    ruleset i : CacheIndex; a : Addr do
      rule "Send Memory Request"
        CacheArray[i].State = Invalid
      ==>
      begin
        Cache2Mem.opcode := Get;
        Cache2Mem.Index := i;
        Cache2Mem.Addr := a;
      end;
    end;

    ruleset i : CacheIndex do
      rule "Recv Memory Response"
        Mem2Cache.opcode = Response
      ==>
      begin
        CacheArray[Mem2Cache.Index].Data := Mem2Cache.Data;
        CacheArray[Mem2Cache.Index].Addr := Mem2Cache.Addr;
        CacheArray[Mem2Cache.Index].State := Clean;
        Absorb(Mem2Cache);
      end;
    end;

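Load Miss (moot)
[Animation on the RTL diagram, now including Cache2Cpu and Mem2Cache: Load(A0) arrives on Cpu2Cache and misses; Get(A0) passes through pipe stages 1 and 2 and is sent on Cache2Mem; Response(A0,D0) returns on Mem2Cache; the State & Addr Array gets Clean,A0 and the Data Array gets D0; Response(D0) is sent on Cache2Cpu.]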

Cache Controller Refinement Map (conceptual)

    function HLM_STATE RM(); // refinement map
      HLM_STATE HLM;
      for (int i = 0; i < CACHE_SIZE; i++) begin
        HLM.CacheArray[i].State = RTL.AddrArray[i].State;
        HLM.CacheArray[i].Addr = RTL.AddrArray[i].Addr;
        HLM.CacheArray[i].Data = RTL.DataArray[i]@+1;
      end
      HLM.Cpu2Cache = RTL.Cpu2Cache@-1;
      HLM.Cache2Cpu = RTL.Cache2Cpu;
      return HLM;
    endfunction

<signal>@k denotes the value <signal> will have k clock cycles in the future (k can be negative too)
