
Verification at HP Labs

Mark Tuttle (with the help of many friends at HP Labs). Overview of verification work: cache coherence protocols; Alpha EV6, EV7, EV8 protocols; Itanium; bus protocols: PCI-X, Infiniband (FIO/NGIO/SIO); database systems; distributed algorithms.


Presentation Transcript


  1. Verification at HP Labs Mark Tuttle (with the help of many friends at) HP Labs

  2. Overview of verification work
     • Cache coherence protocols
       • Alpha EV6, EV7, EV8 protocols
       • Itanium
     • Bus protocols:
       • PCI-X, Infiniband (FIO/NGIO/SIO)
     • Database systems
     • Distributed algorithms
     • A SAT-based bounded model checker
       • Applications to Itanium software

  3. Most of this work uses TLA+
     • Lamport’s specification language based on set theory, first-order logic, and temporal logic
     • Hierarchical style improves readability and rigor
       • specifications: … becomes … (slide images not preserved)
       • proofs: … becomes a numbered, nested proof:
           <1>1.
             <2>1. CASE
             <2>2. CASE
             <2>3. QED
     • Most find reading easy, writing not too hard

  4. Wildfire: EV6 cache coherence
     Kourosh Gharachorloo, Madhu Sharma, Simon Steely, Steve Van Doren
     32 processor server
     [Diagram labels: processors P1 to P4, arbiter, DTAG, DIR, TTT, local switch, global switch, global port, mem; eight quads]

  5. Directory-based cache coherence
     To get x, go to x’s directory to see who owns x.
     [Diagram labels: processors P1, P2, P3, P4; memory; directory entry for x with fields copies and owner; value 5]

  6. Get read-only copy
     [Diagram: messages Rd(x), Fwd(x), Fill(x,5) among P, Q, and x’s directory; directory entry starts as copies=Q, owner=Q; copies becomes P,Q]

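  7. Get writable copy
     [Diagram: messages RdEx(x), FwdRdEx(x), FillEx(x,5), and two Inval(x) among P, Q, R, S, and x’s directory; directory entry starts as copies=Q,R,S, owner=Q; copies and owner both become P]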
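The two transactions on slides 6 and 7 can be sketched as a toy directory entry. This is a minimal sketch, not Wildfire's protocol: it assumes a single address, instantaneous message delivery, and that the directory is the sender of `Inval` messages (the slide does not say who sends them). The class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryEntry:
    """Toy directory entry for one address (all names are illustrative)."""
    owner: str
    copies: set = field(default_factory=set)
    log: list = field(default_factory=list)  # (message, sender, receiver)

    def read(self, requester):
        """Rd(x): the directory forwards to the owner, who fills the requester."""
        self.log.append(("Rd", requester, "dir"))
        self.log.append(("Fwd", "dir", self.owner))
        self.log.append(("Fill", self.owner, requester))
        self.copies.add(requester)  # ownership does not change

    def read_exclusive(self, requester):
        """RdEx(x): the owner fills the requester; other copies are invalidated."""
        self.log.append(("RdEx", requester, "dir"))
        self.log.append(("FwdRdEx", "dir", self.owner))
        self.log.append(("FillEx", self.owner, requester))
        # Assumption: the directory sends one Inval per remaining copy holder.
        for holder in sorted(self.copies - {self.owner, requester}):
            self.log.append(("Inval", "dir", holder))
        self.owner = requester
        self.copies = {requester}


# Slide 6: P reads x while Q owns it.
entry = DirectoryEntry(owner="Q", copies={"Q"})
entry.read("P")
read_state = (entry.owner, sorted(entry.copies))

# Slide 7: P requests a writable copy while Q, R, S hold copies.
entry2 = DirectoryEntry(owner="Q", copies={"Q", "R", "S"})
entry2.read_exclusive("P")
inval_targets = [dst for (kind, _, dst) in entry2.log if kind == "Inval"]
```

After the read, the entry lists both P and Q as copy holders with Q still the owner; after the exclusive read, P is the sole holder and owner, matching the end states shown on the slides.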

  8. A complicated protocol
     Directory can be many steps ahead of processors
     [Diagram labels: R1, R2, R3, Dir; three RdEx(x) requests, two FwdRdEx(x), three Data messages]

  9. A complicated protocol
     Generates data and commit events independently
     • Memory barriers impose instruction order
     • Maintain count of outstanding off-chip requests
     • Pass memory barrier only when count is 0
     [Diagram: two timelines of the sequence read A, MB, read B; event labels inval(x), inval(y), data(A) and inval(x), inval(y), commit(A), data(A)]
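The counting rule on slide 9 can be sketched as follows. This is a hedged illustration: it assumes the count is decremented by commit events rather than by data arrival, which is one reading of "data and commit events independently"; the `Processor` class and its methods are hypothetical.

```python
class Processor:
    """Toy processor that counts outstanding off-chip requests."""

    def __init__(self):
        self.outstanding = 0  # requests issued but not yet committed
        self.data = {}        # values received so far, by address

    def issue_read(self, adr):
        self.outstanding += 1

    def receive_data(self, adr, value):
        # Data becomes usable at once; the count is untouched (assumption).
        self.data[adr] = value

    def receive_commit(self, adr):
        if self.outstanding == 0:
            raise RuntimeError("commit without an outstanding request")
        self.outstanding -= 1

    def can_pass_memory_barrier(self):
        # Pass the memory barrier only when the count is 0.
        return self.outstanding == 0


proc = Processor()
proc.issue_read("A")
proc.receive_data("A", 5)
blocked_before_commit = not proc.can_pass_memory_barrier()
proc.receive_commit("A")
passes_after_commit = proc.can_pass_memory_barrier()
```

In this sketch the read of A can use its data immediately, while the memory barrier waits only for the commit event.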

  10. Reads are fast, MBs are fast
      Dramatic speedups possible
      [Diagram: timelines "read A … work ... MB, read B" and "read A, MB, read B"; labels owner, data(A), fwd(A), commit(A), inval(x), inval(y)]
      “Intuitively surprising this actually works!”

  11. Wildfire verification
      Paul Harter, Leslie Lamport, Mark Tuttle, Yuan Yu
      • We are asked to look at the protocol
      • We arrive very late (almost tape-out)
      • No time for complete proof
      • But enough time for a rigorous analysis

  12. Wildfire cache coherence in “three easy steps” + “two man-years”
      • Model Alpha memory model (200 lines)
      • Prove implementation (550 lines, 2 months, informal)
      • Model abstract protocol (500 lines)
      • Prove implementation (5500 lines, 4+ months, incomplete)
      • Model complete protocol (2000 lines, 3 months)

  13. Step 1: Alpha memory model
      • Official specification is
        • Informal: an English document
        • Behavioral: defines acceptable sequences of memory operations
      • Our specification is
        • Precise: a single logical formula
        • State-based: required for invariance-style proofs
      • We did simplify the model slightly:
        • Operations read and write entire cache lines
        • Some “impossible” implementations ruled out
      • Compare the specifications: 12 pages vs 200 lines

  14. The heart of the model
      • A Before order
        • Orders reads and writes in an execution
        • Determines return values for the reads
      • A GoodExecutionOrder predicate
        • Defines the Before orders allowed by the model

  15. State machine actions
      • ReceiveRequest(proc, req): receive a request
      • ChooseNewData(proc, idx): choose the return value for a request
      • Respond(proc, idx): return the value to a request
      • ExtendBefore: expand the Before relation
      Actions must preserve GoodExecutionOrder.

  16. GoodExecutionOrder
      This is the hard part; look how short it is!

      GoodExecutionOrder ==
        LET [some definitions deleted]
        IN /\ (*************************************************************)
              (* Before is a partial order.                                *)
              (*************************************************************)
              /\ Before \subseteq ReqId \X ReqId
              /\ \A r1, r2 \in ReqId : IsBefore(r1, r2) => ~IsBefore(r2, r1)
              /\ \A r1, r2, r3 \in ReqId :
                    IsBefore(r1, r2) /\ IsBefore(r2, r3) => IsBefore(r1, r3)
           /\ (*************************************************************)
              (* SourceOrder implies the Before order.                     *)
              (*************************************************************)
              \A r1, r2 \in ReqId : SourceOrder(r1, r2) => IsBefore(r1, r2)
           /\ (*************************************************************)
              (* RequestOrder implies the Before order.                    *)
              (*************************************************************)
              \A r1, r2 \in ReqId : RequestOrder(r1, r2) => IsBefore(r1, r2)

  17.      /\ (*******************************************************)
              (* Writes and successful SCs to the same location that *)
              (* have issued a response are totally ordered.         *)
              (*******************************************************)
              \A r1, r2 \in ReqId :
                 /\ ReqIdQ[r1].req.type \in {"Wr", "SC"}
                 /\ ReqIdQ[r1].req.newData # "Failed"
                 /\ ReqIdQ[r1].req.responded
                 /\ ReqIdQ[r2].req.type \in {"Wr", "SC"}
                 /\ ReqIdQ[r2].req.newData # "Failed"
                 /\ ReqIdQ[r2].req.responded
                 /\ ReqIdQ[r1].req.adr = ReqIdQ[r2].req.adr
                 => IsBefore(r1, r2) \/ IsBefore(r2, r1)

  18.      /\ (*******************************************************************)
              (* LL/SC Axiom: For each successful SC, there is a matching LL and *)
              (* there is no write to the same address from a different         *)
              (* processor between the LL and SC in the Before order.            *)
              (*******************************************************************)
              \A r2 \in ReqId :
                 /\ ReqIdQ[r2].req.type = "SC"
                 /\ ReqIdQ[r2].newData \notin {Failed, NotChosen}
                 => \E r1 \in ReqId :
                       /\ LLSCPair(r1, r2)
                       /\ \A r \in ReqId :
                             /\ \/ ReqIdQ[r].req.type = "Wr"
                                \/ /\ ReqIdQ[r].req.type = "SC"
                                   /\ ReqIdQ[r].newData \notin {NotChosen, Failed}
                             /\ r[1] # r2[1]
                             /\ ReqIdQ[r2].req.adr = ReqIdQ[r].req.adr
                             => ~IsBefore(r1, r) \/ ~IsBefore(r, r2)

  19.      /\ (**************************************************************)
              (* Value Axiom: A read reads from the preceding write in the  *)
              (* Before order.                                              *)
              (**************************************************************)
              \A r1, r2 \in ReqId :
                 /\ ReqIdQ[r2].source # NoSource
                 /\ ReqIdQ[r1].req.type = "Wr"
                 /\ ReqIdQ[r1].req.adr = ReqIdQ[r2].req.adr
                 => IF ReqIdQ[r2].source = FromInitMem
                       THEN ~IsBefore(r1, r2)
                       ELSE \/ ~IsBefore(ReqIdQ[r2].source, r1)
                            \/ ~IsBefore(r1, r2)
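The first three conjuncts of GoodExecutionOrder (slide 16) can be checked on small finite examples. The sketch below is an illustration, not part of the specification: it assumes `IsBefore(r1, r2)` simply means that the pair `(r1, r2)` is in `Before` (the slide elides that definition), and the Python function names are hypothetical.

```python
def is_strict_partial_order(before, req_ids):
    """Check that Before is a subset of ReqId x ReqId, asymmetric, and transitive."""
    if not all(a in req_ids and b in req_ids for (a, b) in before):
        return False
    asymmetric = all((b, a) not in before for (a, b) in before)
    transitive = all(
        (a, c) in before
        for (a, b) in before
        for (b2, c) in before
        if b == b2
    )
    return asymmetric and transitive


def respects(order, before):
    """Check a conjunct of the form: Order(r1, r2) => IsBefore(r1, r2)."""
    return order <= before


req_ids = {1, 2, 3}
good = {(1, 2), (2, 3), (1, 3)}
not_transitive = {(1, 2), (2, 3)}   # (1, 3) is missing
cyclic = {(1, 2), (2, 1)}           # violates asymmetry

checks = (
    is_strict_partial_order(good, req_ids),
    is_strict_partial_order(not_transitive, req_ids),
    is_strict_partial_order(cyclic, req_ids),
)
source_order_ok = respects({(1, 2)}, good)
source_order_bad = respects({(3, 1)}, good)
```

Here `respects` plays the role of the SourceOrder and RequestOrder conjuncts: any ordering the hardware imposes must already be contained in `Before`.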

  20. Step 2: Model abstract protocol
      protocol = abstract protocol + implementation junk
      Surprisingly,
      • abstract protocol’s correctness was far from obvious
      • we discovered a bug… in the memory model
      Proved hardest part of correctness:
      • Proved the Before order is acyclic
      • 35-line invariant based on 300 lines of definitions
      • 550-line proof, cases nested 10 levels deep

  21. Found: Alpha memory model bug
      Initially x=0, y=0
      P: if x=1 then y:=2
      Q: if y=2 then x:=1
      Original Alpha memory model allowed the result x=1, y=2.
      This behavior breaks the critical section implementation recommended in the SRM. (Jim Saxe)
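To see why the outcome x=1, y=2 on slide 21 is surprising, the two-processor program can be run under every sequential interleaving. This sketch assumes each guarded write executes as one atomic step, which is stricter than the Alpha memory model; the function name is hypothetical.

```python
from itertools import permutations


def run_two(order):
    """Run P and Q atomically in the given order, starting from x=0, y=0."""
    mem = {"x": 0, "y": 0}
    for proc in order:
        if proc == "P" and mem["x"] == 1:
            mem["y"] = 2
        elif proc == "Q" and mem["y"] == 2:
            mem["x"] = 1
    return mem["x"], mem["y"]


outcomes = {run_two(order) for order in permutations("PQ")}
```

Under this assumption the only reachable outcome is x=0, y=0: each write needs the other write to have happened first, so x=1, y=2 can arise only if the two writes justify each other in a cycle.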

  22. Revised Alpha memory model
      [Diagram: two copies of the program "P: if x=1 then y:=2; Q: if y=2 then x:=1", labelled "causal cycle" and "break the cycle"]

  23. Wildfire counterexample
      P: x:=1
      Q: if x=1 then y:=2
      R: if y=2 then x:=3
      The Alpha memory model says x=3, but in Wildfire it could be x=1…

  24. P: x:=1
      Q: if x=1 then y:=2
      R: if y=2 then x:=3
      [Diagram labels: P x=1, Q, R x=0, directory; messages ITD(x), ok, Inval(x)]

  25. P: x:=1
      Q: if x=1 then y:=2
      R: if y=2 then x:=3
      [Diagram labels: x=1, P x=1, Q x=1, R x=0, directory; messages Rd(x), Fwd(x), Inval(x)]

  26. P: x:=1
      Q: if x=1 then y:=2
      R: if y=2 then x:=3
      [Diagram labels: P x=1, Q x=1, R x=0, y=2, directory; messages ITD(y), ok, Inval(x)]

  27. P: x:=1
      Q: if x=1 then y:=2
      R: if y=2 then x:=3
      [Diagram labels: y=2, P x=1, Q x=1, R x=0,3, y=2, y=2, directory; messages Rd(y), Fwd(y), Inval(x)]

  28. P: x:=1
      Q: if x=1 then y:=2
      R: if y=2 then x:=3
      The result must be x=3, but the result is x=1.
      The same thing was possible in other machines. (Kourosh Gharachorloo)
      [Diagram labels: P x=1, Q x=1, R x=3, y=2, y=2; message Inval(x)]
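The claim that the result must be x=3 can be checked against a simple reference: run the three-processor program under every sequential interleaving and look at the executions in which R actually performs its write. This is a sketch under the assumption that each statement is one atomic step; it models neither Wildfire nor the full Alpha memory model, and the function name is hypothetical.

```python
from itertools import permutations


def run_three(order):
    """Run P, Q, R atomically in the given order, starting from x=0, y=0."""
    mem = {"x": 0, "y": 0}
    r_wrote = False
    for proc in order:
        if proc == "P":
            mem["x"] = 1
        elif proc == "Q" and mem["x"] == 1:
            mem["y"] = 2
        elif proc == "R" and mem["y"] == 2:
            mem["x"] = 3
            r_wrote = True
    return mem["x"], mem["y"], r_wrote


results = {order: run_three(order) for order in permutations("PQR")}
finals_when_r_writes = {x for (x, _, wrote) in results.values() if wrote}
```

Whenever R writes, it has seen y=2, so Q has seen x=1, so P's write came first; R's write is therefore last and the final value is 3. The Wildfire trace on slides 24 to 28 ends with x=1 even though R wrote, because the delayed Inval(x) lets P's value survive.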

  29. What went wrong?
      An ordering internal to P … forced an ordering for Q:
      [Diagram: two copies of the program "P: if x=1 then y:=2; Q: if y=2 then x:=1"]
      The fix: use internal orderings to forbid orderings, but not to force orderings.

  30. New Alpha memory model
      P: x:=1
      Q: if x=1 then y:=2
      R: if y=2 then x:=3
      There is no dependency/source cycle:
      [Diagram: chain of reads and writes R1, W1, R2, W2, …, Wn]
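The "no dependency/source cycle" condition can be illustrated with a plain graph search. The sketch below rests on an assumed reading of the slide: a dependency edge runs from a read to the write that depends on it within one processor, and a source edge runs from a write to the read that returns its value. The node labels and the function name are hypothetical.

```python
def has_cycle(edges):
    """Return True if the directed graph given as (src, dst) pairs has a cycle."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {node: WHITE for node in graph}

    def visit(node):
        colour[node] = GREY
        for nxt in graph[node]:
            if colour[nxt] == GREY:
                return True  # back edge: a cycle
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in graph)


# Slide 21 outcome x=1, y=2: each read takes its value from the other's write.
thin_air = [
    ("P.read x", "P.write y"),   # dependency
    ("P.write y", "Q.read y"),   # source
    ("Q.read y", "Q.write x"),   # dependency
    ("Q.write x", "P.read x"),   # source
]

# Slide 23 program with R's write performed: a chain, not a cycle.
three_proc = [
    ("P.write x", "Q.read x"),   # source
    ("Q.read x", "Q.write y"),   # dependency
    ("Q.write y", "R.read y"),   # source
    ("R.read y", "R.write x"),   # dependency
]

thin_air_cyclic = has_cycle(thin_air)
three_proc_cyclic = has_cycle(three_proc)
```

Under this reading, the revised model rejects the first execution because of its cycle and accepts the second.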

  31. Step 3: Model complete protocol
      Obstacle: no single, complete description
      English documents: 12 documents, 4-inch stack
      Lisp simulator: crucial to understanding some details
      None compact, none mathematically tractable
      Different levels of abstraction, some inconsistency
      We had to write our own description

  32. Obstacle: algorithm complexity
      ChangeToDirty DummyRdVic FailedChangeToDirty Fetch InvalToDirty InvalToDirtyVic
      Rd RdMod RdVic RdVicMod QV_Fetch QV_Rd QV_RdMod WrVic
      ChangeToDirtyFailure ChangeToDirtySuccess FetchFillMarker FillMarker FillMarkerMod
      ForwardFetch ForwardFetchWithFetchFillMarker ForwardRd ForwardRdMod
      ForwardRdWithFillMarker ForwardRdModWithFillMarkerMod InvalAck InvalToDirtySuccess Invalidate
      LoopComsig LoopComsigWithInvalAck LoopComsigWithShadowClear LoopComsigWithShadowInvalAndShadowClear
      ShadowChangeToDirtySuccess ShadowForwardFetch ShadowForwardRd ShadowForwardRdMod
      ShadowInvalToDirtySuccess ShadowInvalidate ShadowShortFillMod ShadowSnap
      ShortFetchFill ShortFill ShortFillMod VictimAck
      FetchFill Fill FillMod VCFetchFill VCFill VCFillMod

  33. Solution: Quarks
      • Ack
      • ChangeToDirty
      • Clear
      • Comsig
      • Fill
      • ForwardedGet
      • GetValue
      • InvalidToDirty
      • QuadInvalidate
      • ReleaseMAF
      • ReleaseVDB
      • SetCacheLineState
      • Victimize
      • Write
      Quarks combine to form messages.

  34. Quarks form messages, then split up
      [Diagram labels: reader, home quad, global switch, owner, copy holders; messages GetValue; ForwardedGet, QuadInval, Comsig (combined); ForwardedGet; QuadInval; Comsig]

  35. Quarks resolve message overloading
      • “ChangeToDirtySuccess” could mean
        • {AckChangeToDirty, Comsig, QuadInvalidate^*, ClearOutstandingInval}
        • {AckChangeToDirty, Comsig, QuadInvalidate^*}
        • {Comsig, ReleaseMAF, SetCacheLineState}
      Quarks simplify algorithm description
      • Each quark processed separately, independently
      • Each data structure changed by a single quark
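The two simplification claims on slide 35 suggest a dispatch-table structure, sketched below. The quark names come from the slide, but what each handler does, the state fields, and the value "Dirty" are hypothetical placeholders chosen only to show that handlers touching disjoint fields can be applied in any order.

```python
from itertools import permutations

# Hypothetical handlers: each quark updates exactly one field of the state.
HANDLERS = {
    "Comsig": lambda state: state.update(comsig_seen=True),
    "ReleaseMAF": lambda state: state.update(maf_busy=False),
    "SetCacheLineState": lambda state: state.update(line_state="Dirty"),
}


def process_message(quarks, state):
    """Apply each quark's handler separately and return the new state."""
    new_state = dict(state)
    for quark in quarks:
        HANDLERS[quark](new_state)
    return new_state


initial = {"comsig_seen": False, "maf_busy": True, "line_state": "Clean"}
message = ("Comsig", "ReleaseMAF", "SetCacheLineState")

final_states = [process_message(order, initial) for order in permutations(message)]
order_independent = all(state == final_states[0] for state in final_states)
```

Because each field is written by one quark only, every processing order yields the same state, which is what makes a per-quark description simpler than a per-message one.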

  36. Quark handling
      If a processor receives a Fill quark carrying cacheable data, then how is the cache updated?

      ProcFieldsMessage(proc, msg) ==
        /\ ...
        /\ Cache' =
             CASE ...
               [] ("Fill" \in msg) /\ (subtype("Fill") # "Fetch") ->
                    [Cache EXCEPT
                       ![proc, cacheIndex].state =
                          IF subtype("Fill") = "Mod" THEN "ExclusiveDirty" ELSE "Clean",
                       ![proc, cacheIndex].tag = AddressToTag(msg.adr),
                       ![proc, cacheIndex].data = msg.data ]
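For readers less familiar with TLA+ `EXCEPT`, the Fill arm on slide 36 can be mirrored in Python. This is a sketch under assumptions: the message is modelled as a dict with a `quarks` set, `subtype` is passed in as a plain string, and the `address_to_tag` function used in the example is invented for illustration. The other CASE arms are elided on the slide and are left unimplemented here.

```python
def handle_fill(cache, proc, cache_index, msg, subtype, address_to_tag):
    """Return a new cache with one line updated, as Cache' is in the Fill arm."""
    if "Fill" in msg["quarks"] and subtype != "Fetch":
        new_cache = dict(cache)  # like EXCEPT: copy, then override one entry
        new_cache[(proc, cache_index)] = {
            "state": "ExclusiveDirty" if subtype == "Mod" else "Clean",
            "tag": address_to_tag(msg["adr"]),
            "data": msg["data"],
        }
        return new_cache
    # The remaining CASE arms are not shown on the slide.
    raise NotImplementedError("CASE arm elided in the source")


cache = {("P1", 0): {"state": "Invalid", "tag": None, "data": None}}
msg = {"quarks": {"Fill"}, "adr": 0x1240, "data": 5}

# Hypothetical tag function: drop the low 6 address bits.
updated = handle_fill(cache, "P1", 0, msg, "Mod", lambda adr: adr >> 6)
```

As in the TLA+ action, the old cache is left untouched and the new value differs from it only at the line being filled.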

  37. Wildfire invariant
      Define an invariant describing all reachable states. (1000 lines)
      Prove invariance.
      We focused on the most difficult, error-prone parts:
      [Diagram labels: cache, dtag, directory, messages, messages; on quad (150 lines), off quad (150 lines)]

  38. Dir - Dtag Invariant

      DirDTagInvariant ==
        \A adr \in MemBlockAddress, proc \in Processor :
        a.\/ (* local address *) ...
        b.\/ (* nonlocal address *)
        1./\ ProcToQuad(proc) # AddressToQuad(adr)
        2./\ a.\/ (* proc is the owner of adr *)
        1./\ Dir[adr].owner = proc
        b.\/ (* proc is not the owner of adr *) ...
        2./\ a.\/ (* dtag is dirty *)
        1./\ DTagState(adr, proc) = Dirty...
        b.\/ (* dtag is invalid *) ...
        c.\/ (* dtag is clean *) ...
        2./\ Proj(HomeToArbQ) =
               [ [FG* [QFI] QI* AckWrite] QI* AGV(mod,1) | FG* AckCTD(Success)] FG*

      DTagCacheInvariant == ...

      Mother == DirDTagInvariant /\ DTagCacheInvariant /\ ...

  39. DTag-Cache Invariance

      ASSUME: /\ Mother
              /\ Wildfire
              /\ DTagCacheInvariant(proc,adr)
      PROVE:  DTagCacheInvariant(proc,adr)'
      <1>1. CASE a (* DTagState(proc, adr) = "Invalid" *)
      <1>2. CASE b (* DTagState(proc, adr) # "Invalid" *)
      <1>3. QED

  40. DTag-Cache Invariance

      ASSUME: /\ Mother
              /\ Wildfire
              /\ DTagCacheInvariant(proc,adr)
      PROVE:  DTagCacheInvariant(proc,adr)'
      <1>1. CASE a (* DTagState(proc, adr) = "Invalid" *)
        <2>1. CASE a2a (* AddressCache(proc, adr).state' = "Invalid" *)
        <2>2. CASE a2b (* AddressCache(proc, adr).state' # "Invalid" *)
        <2>3. QED
      <1>2. CASE b (* DTagState(proc, adr) # "Invalid" *)
      <1>3. QED

  41. DTag-Cache Invariance

      ASSUME: /\ Mother
              /\ Wildfire
              /\ DTagCacheInvariant(proc,adr)
      PROVE:  DTagCacheInvariant(proc,adr)'
      <1>1. CASE a (* 1./\ DTagState(proc, adr) = "Invalid" *)
        <2>1. CASE a2a (* 1. AddressCache(proc, adr).state' = "Invalid" *)
          ...
            <14>1. CASE doing something at the proc
                   Pf: ....
            <14>2. CASE doing something at the arb
            <14>3. QED
          ...
        <2>2. CASE a2b (* 1. AddressCache(proc, adr).state' # "Invalid" *)
        <2>3. QED
      <1>2. CASE b (* 1./\ DTagState(proc, adr) # "Invalid" *)
      <1>3. QED

  42. The implementation proof
      In Step 2, we defined an abstract model of the Wildfire algorithm.
      In Step 3, we defined a complete model of the Wildfire algorithm.
      Now use the invariant to prove that the complete model implements the abstract model.
      This has not been done.

  43. Results: one bug A fetch is an uncached read. Victimization removes data from the cache. The bug allows a fetch to interfere with victimization. To demonstrate the bug, we need to describe more of the hardware…

  44. The quad architecture
      [Diagram labels: quad with four procs; P, cache, ttt, dtag, directory, memory, GP, Arb, switch; to other quads]

  45. Dtag: a duplicate copy of cache state
      One use: invalidate all copies on a quad.
      [Diagram labels: P, cache, dtag, Arb; line y marked r/w in both cache and dtag; two inval(y) messages]

  46. TTT: tells state of off-quad requests
      [Diagram labels: P, cache, ttt, GP; line y; messages write(y) (three times), ackwrite(y)]

  47. The Bug
      By causing a fetch to interfere with a victimize, generate an Inval(y) to a cache without a copy of y.
      [Diagram labels: P, cache, dtag, Arb; line y marked r/w; two inval(y) messages]

  48. Initial state: P owns y
      [Diagram labels: P, Q, R, S; y; ttt, dtag, gp, arb, dir, mem; dtag y: P; dir y: P; mem y]

  49. Now P victimizes y to read x into same cache line
      [Diagram labels: P, Q, R, S; y; dtag, ttt, gp, arb, dir, mem; dtag y: P; dir y: P; mem y; messages write(y), get(x), write(y), ackwrite(y), write(y), ackwrite(y)]

  50. So P is waiting for x
      [Diagram labels: P, Q, R, S; dtag, ttt, gp, arb, dir, mem; dtag y: P; dir y: (empty); mem y; messages get(x), write(y), ackwrite(y), ackwrite(y)]
