
Verification of cache-coherence protocols with TLA+

Homayoon Akhiani, Damien Doligez, Paul Harter, Leslie Lamport, Joshua Scheid, Mark Tuttle, Yuan Yu. Compaq Computer Corporation.


Presentation Transcript


  1. Verification of cache-coherence protocols with TLA+ Homayoon Akhiani, Damien Doligez, Paul Harter, Leslie Lamport, Joshua Scheid, Mark Tuttle, Yuan Yu Compaq Computer Corporation

  2. TLA+
     • A formal specification language based on set theory, first-order logic, and temporal logic
     • Engineers find it easy to read and not too hard to write

     CacheUnmodified(adr) == \/ SharedMode(adr)
                             \/ /\ ExclusiveMode(adr)
                                /\ ~DirtyBitSet(adr)

     Cache' = [Cache EXCEPT ![adr].state = "Invalid"]
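To make the two TLA+ fragments on this slide concrete, here is a minimal Python sketch (Python rather than TLA+ so that it can be run directly). The `Line` record, its `state` and `dirty` fields, and the state names are assumptions made for illustration; the slide does not define SharedMode, ExclusiveMode, or DirtyBitSet.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Line:
    """Hypothetical cache-line record; field names are assumptions."""
    state: str          # "Shared", "Exclusive", or "Invalid"
    dirty: bool = False


def cache_unmodified(cache, adr):
    """Mirror of CacheUnmodified: shared, or exclusive with the dirty bit clear."""
    line = cache[adr]
    return line.state == "Shared" or (line.state == "Exclusive" and not line.dirty)


def invalidate(cache, adr):
    """Mirror of [Cache EXCEPT ![adr].state = "Invalid"].

    Returns a new mapping that equals `cache` everywhere except at `adr`;
    the old mapping is left untouched, as with a primed variable in TLA+.
    """
    return {**cache, adr: replace(cache[adr], state="Invalid")}


cache = {"x": Line("Exclusive", dirty=True), "y": Line("Shared")}
new_cache = invalidate(cache, "x")
```

The point of the EXCEPT form is that the next-state value is described as "the old function, changed at one place", which the dictionary copy imitates.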

  3. Used TLA+ to demonstrate formal methods to engineering
     • Analyzed cache-coherence protocols for
       • EV6: Alpha 21264 processor
       • EV7: Alpha 21364 processor
     • Built TLC, a model checker for TLA+
     • Analyzed proposals for industry standards
       • PCI-X, …

  4. EV6 cache coherence
     To get x, go to x's directory to see who owns x.
     [Diagram: processors P1 to P4, memory, and a directory whose entry for x lists copies and owner.]

  5. Shared read, data in memory
     [Diagram: messages Rd(x), Data. Directory entry: adr x, copies S,S,S becoming S,S,S,R, owner none.]

  6. Shared read, remote owner
     [Diagram: messages Rd(x), FwdRd(x), Data. Directory entry: adr x, copies S,S,S becoming S,S,S,R, owner O.]

  7. Exclusive read, data in memory
     [Diagram: messages RdEx(x), Inval, Data. Directory entry: adr x, copies S,S,S becoming none, owner none becoming R.]

  8. Exclusive read, remote owner
     [Diagram: messages RdEx(x), FwdRdEx(x), Inval, Data. Directory entry: adr x, copies S,S,S becoming none, owner O becoming R.]
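The four read cases above can be summarized in a small Python sketch of the directory's bookkeeping. This is only an illustration under assumptions: the `DirEntry` class, the message tuples, and the rule that the directory itself sends Data when there is no owner are read off the diagram labels, not taken from the actual protocol specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DirEntry:
    """Hypothetical directory entry for one address."""
    copies: List[str] = field(default_factory=list)  # processors with shared copies
    owner: Optional[str] = None                      # None: memory holds the data


def handle(entry, requester, kind):
    """Update the entry and return the (message, source, destination) tuples sent."""
    msgs = []
    if kind == "Rd":
        if entry.owner is None:
            msgs.append(("Data", "Dir", requester))      # data in memory
        else:
            msgs.append(("FwdRd", "Dir", entry.owner))   # remote owner supplies data
        entry.copies.append(requester)                   # S,S,S becomes S,S,S,R
    elif kind == "RdEx":
        msgs += [("Inval", "Dir", p) for p in entry.copies if p != requester]
        if entry.owner is None:
            msgs.append(("Data", "Dir", requester))
        else:
            msgs.append(("FwdRdEx", "Dir", entry.owner))
        entry.copies = []                                # copies become none
        entry.owner = requester                          # requester becomes owner
    else:
        raise ValueError(f"unknown request kind: {kind}")
    return msgs


shared = DirEntry(copies=["S1", "S2", "S3"])
shared_msgs = handle(shared, "R", "Rd")

owned = DirEntry(copies=["S1", "S2", "S3"], owner="O")
owned_msgs = handle(owned, "R", "RdEx")
```

Note that the sketch sends no acknowledgements for Inval messages, in line with the optimizations described on the following slides.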

  9. No InvalAcks
     Fewer messages sent, and R is not blocked waiting for an InvalAck.
     Now correctness depends on network message ordering.
     [Diagram: R, Dir, S; messages RdEx(x) and Inval; the InvalAck is marked "NO!".]

  10. No dirty write-backs required
      Fewer messages sent. Now correctness depends on the owner always holding the data.
      [Diagram: R, Dir, O; messages RdEx(x), FwdRdEx(x), Data; the WriteBack is marked "NO!".]

  11. Chains of requests
      [Diagram: R1, R2, R3, Dir; three RdEx(x) requests, two FwdRdEx(x) forwards, three Data responses.]

  12. Memory barriers
      All memory ordering is imposed by memory barriers (MB), for example: read flag, MB, read data.
      How do we know when this ordering has been determined? The answer is highly optimized.

  13. Separate commit/data responses
      MB is passed when all outstanding commits are received.
      Commits are generated as early as possible!
      [Diagram: R, Dir, O; messages Rd(x), FwdEx(x), Commit, Data.]
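The rule "MB passed when all outstanding commits are received" can be sketched as a pair of sets kept by the requesting processor. The `Requester` class and its method names are hypothetical; the slide gives only the rule, not the data structures.

```python
class Requester:
    """Hypothetical requester that tracks commits and data separately."""

    def __init__(self):
        self.awaiting_commit = set()  # requests whose Commit has not arrived
        self.awaiting_data = set()    # requests whose Data has not arrived

    def issue(self, req_id):
        self.awaiting_commit.add(req_id)
        self.awaiting_data.add(req_id)

    def on_commit(self, req_id):
        self.awaiting_commit.discard(req_id)

    def on_data(self, req_id):
        self.awaiting_data.discard(req_id)

    def can_pass_mb(self):
        # Only commits gate the memory barrier; data may still be in flight.
        return not self.awaiting_commit


r = Requester()
r.issue("Rd(x)")
blocked_at_first = not r.can_pass_mb()
r.on_commit("Rd(x)")                       # Commit arrives before Data
passed_before_data = r.can_pass_mb() and "Rd(x)" in r.awaiting_data
```

Because the commit and the data travel as separate responses, the barrier can be passed while the data is still outstanding, which is where the speedup on the next slide comes from.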

  14. Significant speedups
      Data can be returned faster. MB can be passed faster.
      But now verification is much harder.
      [Diagram: R and Dir; messages Inval(y), Inval(z), Commit, Data; sequence read flag, MB, read data, with Data and commit shown separately.]

  15. Hierarchical network
      At the home node, always satisfy requests locally if possible...
      [Diagram: global switch, local switches, processors, memory, directory.]

  16. Deadlock: the deadly embrace
      Deadlock: FwdRds are stalled waiting for data to arrive.
      [Diagram: home x and home y; labels owned y, shared x, shared y, owned x; messages Rd(x), FwdRd(x), FwdRd(y).]

  17. Shadow mode
      • FwdRd is a shadow starter (when the reader is on the home node)
      • Subsequent messages are shadowed in shadow mode (bounced off the global switch)
      [Diagram: messages Rd(x) and FwdRd(x); one FwdRd(x) path is marked "NO!".]

  18. Shadow mode solves deadlock
      Data travels in a separate channel: other messages don't block data. Deadlock gone.
      [Diagram: home x and home y; labels owned y, shared x, shared y, owned x; messages FwdRd(x), FwdRd(y).]

  19. This is not your father's cache coherence protocol!
      • Protocol is highly optimized:
        • No InvalAcks or NoAcks, no Dirty Write Backs
        • Long chains of data forwarding
        • Separate commit/data messages
        • Aggressive early commit generation
        • Shadow mode…
      • Protocol was the largest to be analyzed with formal methods (to our knowledge as of 1997).

  20. EV6 cache coherence in “three easy steps” + “two man-years”
      • Model Alpha memory model (200 lines)
      • Prove implementation (550 lines, 2 months, informal)
      • Model abstract protocol (500 lines)
      • Prove implementation (5500 lines, 4+ months, incomplete)
      • Model complete protocol (2000 lines, 3 months)

  21. Step 1: Alpha memory model
      We specified the Alpha memory model:
      • The official specification is an informal description of the allowed sequences of reads and writes.
      • We needed a precise, state-based specification.
      • We specified a slightly simplified memory model (whole cache line access, common point of synchronization).
      Compare the specifications:
      • Official, English specification: 12 pages
      • Logical, precise specification: 200 lines

  22. Key definition: read/write ordering
      The Before order for an execution orders the reads and writes and determines what values are returned by reads.
      GoodExecutionOrder defines the good Before orders, namely the orders allowed by the memory model.

  23. State machine actions
      • ReceiveRequest(proc, req): receive a request
      • ChooseNewData(proc, idx): choose the return value for a request
      • Respond(proc, idx): return the value to a request
      • ExtendBefore: expand the Before relation
      Actions preserve GoodExecutionOrder.

  24. GoodExecutionOrder
      This is the hard part, but look how short it is!

      GoodExecutionOrder ==
        LET [some definitions deleted]
        IN /\ (* Before is a partial order. *)
              /\ Before \subseteq ReqId \X ReqId
              /\ \A r1, r2 \in ReqId : IsBefore(r1, r2) => ~IsBefore(r2, r1)
              /\ \A r1, r2, r3 \in ReqId :
                    IsBefore(r1, r2) /\ IsBefore(r2, r3) => IsBefore(r1, r3)
           /\ (* SourceOrder implies the Before order. *)
              \A r1, r2 \in ReqId : SourceOrder(r1, r2) => IsBefore(r1, r2)
           /\ (* RequestOrder implies the Before order. *)
              \A r1, r2 \in ReqId : RequestOrder(r1, r2) => IsBefore(r1, r2)

  25.      /\ (* Writes and successful SCs to the same location that *)
              (* have issued a response are totally ordered.         *)
              \A r1, r2 \in ReqId :
                 /\ ReqIdQ[r1].req.type \in {"Wr", "SC"}
                 /\ ReqIdQ[r1].req.newData # "Failed"
                 /\ ReqIdQ[r1].req.responded
                 /\ ReqIdQ[r2].req.type \in {"Wr", "SC"}
                 /\ ReqIdQ[r2].req.newData # "Failed"
                 /\ ReqIdQ[r2].req.responded
                 /\ ReqIdQ[r1].req.adr = ReqIdQ[r2].req.adr
                 => IsBefore(r1, r2) \/ IsBefore(r2, r1)

  26.      /\ (* LL/SC Axiom: For each successful SC, there is a matching LL and *)
              (* there is no write to the same address from a different          *)
              (* processor between the LL and SC in the Before order.            *)
              \A r2 \in ReqId :
                 /\ ReqIdQ[r2].req.type = "SC"
                 /\ ReqIdQ[r2].newData \notin {Failed, NotChosen}
                 => \E r1 \in ReqId :
                       /\ LLSCPair(r1, r2)
                       /\ \A r \in ReqId :
                             /\ \/ ReqIdQ[r].req.type = "Wr"
                                \/ /\ ReqIdQ[r].req.type = "SC"
                                   /\ ReqIdQ[r].newData \notin {NotChosen, Failed}
                             /\ r[1] # r2[1]
                             /\ ReqIdQ[r2].req.adr = ReqIdQ[r].req.adr
                             => ~IsBefore(r1, r) \/ ~IsBefore(r, r2)

  27.      /\ (* Value Axiom: A read reads from the preceding write in the *)
              (* Before order.                                             *)
              \A r1, r2 \in ReqId :
                 /\ ReqIdQ[r2].source # NoSource
                 /\ ReqIdQ[r1].req.type = "Wr"
                 /\ ReqIdQ[r1].req.adr = ReqIdQ[r2].req.adr
                 => IF ReqIdQ[r2].source = FromInitMem
                      THEN ~IsBefore(r1, r2)
                      ELSE \/ ~IsBefore(ReqIdQ[r2].source, r1)
                           \/ ~IsBefore(r1, r2)
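The first conjunct of GoodExecutionOrder (Before is a partial order) can be checked by brute force on a small finite relation. The Python sketch below mirrors only that conjunct; representing Before as a set of pairs and the function name are assumptions for illustration, and the remaining axioms (SourceOrder, RequestOrder, LL/SC, Value) are not modelled.

```python
from itertools import product


def is_strict_partial_order(before, req_ids):
    """Check the three clauses of 'Before is a partial order' by enumeration."""

    def is_before(a, b):
        return (a, b) in before

    # Before \subseteq ReqId \X ReqId
    within = all(a in req_ids and b in req_ids for a, b in before)
    # IsBefore(r1, r2) => ~IsBefore(r2, r1); with r1 = r2 this also forbids loops
    asymmetric = all(
        not (is_before(a, b) and is_before(b, a))
        for a, b in product(req_ids, repeat=2)
    )
    # IsBefore(r1, r2) /\ IsBefore(r2, r3) => IsBefore(r1, r3)
    transitive = all(
        is_before(a, c)
        for a, b, c in product(req_ids, repeat=3)
        if is_before(a, b) and is_before(b, c)
    )
    return within and asymmetric and transitive


ids = {1, 2, 3}
good = is_strict_partial_order({(1, 2), (2, 3), (1, 3)}, ids)
not_transitive = is_strict_partial_order({(1, 2), (2, 3)}, ids)
```

Enumeration like this is essentially what a model checker does when it evaluates the quantifiers over a finite ReqId set.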

  28. Step 2: Model abstract protocol
      protocol = abstract protocol + implementation junk
      Surprisingly,
      • the abstract protocol's correctness was far from obvious
      • we discovered a bug… in the memory model
      Proved hardest part of correctness:
      • 35-line invariant based on 300 lines of definitions
      • 550-line proof, cases nested 10 levels deep

  29. Step 3: Model complete protocol
      • Protocol: 9 man-months, 1900 lines of TLA+
      • Partial proof: 7 man-months, 1000-line (partial) invariant

  30. Obstacle: multiple descriptions
      • English documents: 10 documents, 2-inch stack
      • Lisp code: crucial to understanding some details
      • None compact, none mathematically tractable
      Solution: write our own model. We used TLA+.

  31. Obstacle: algorithm complexity
      ChangeToDirty, DummyRdVic, FailedChangeToDirty, Fetch, InvalToDirty, InvalToDirtyVic, Rd, RdMod, RdVic, RdVicMod, QV_Fetch, QV_Rd, QV_RdMod, WrVic, ChangeToDirtyFailure, ChangeToDirtySuccess, FetchFillMarker, FillMarker, FillMarkerMod, ForwardFetch, ForwardFetchWithFetchFillMarker, ForwardRd, ForwardRdMod, ForwardRdWithFillMarker, ForwardRdModWithFillMarkerMod, InvalAck, InvalToDirtySuccess, Invalidate, LoopComsig, LoopComsigWithInvalAck, LoopComsigWithShadowClear, LoopComsigWithShadowInvalAndShadowClear, ShadowChangeToDirtySuccess, ShadowForwardFetch, ShadowForwardRd, ShadowForwardRdMod, ShadowInvalToDirtySuccess, ShadowInvalidate, ShadowShortFillMod, ShadowSnap, ShortFetchFill, ShortFill, ShortFillMod, VictimAck, FetchFill, Fill, FillMod, VCFetchFill, VCFill, VCFillMod

  32. Solution: Quarks
      • Ack
      • ChangeToDirty
      • Clear
      • Comsig
      • Fill
      • ForwardedGet
      • GetValue
      • InvalidToDirty
      • QuadInvalidate
      • ReleaseMAF
      • ReleaseVDB
      • SetCacheLineState
      • Victimize
      • Write
      Quarks combine to form messages.

  33. Protocol example
      If a processor receives a Fill quark carrying cacheable data, how is the cache updated?

      ProcFieldsMessage(proc, msg) ==
        /\ ...
        /\ Cache' = CASE ...
                    [] ("Fill" \in msg) /\ (subtype("Fill") # "Fetch") ->
                         [Cache EXCEPT
                            ![proc, cacheIndex].state =
                               IF subtype("Fill") = "Mod" THEN "ExclusiveDirty" ELSE "Clean",
                            ![proc, cacheIndex].tag = AddressToTag(msg.adr),
                            ![proc, cacheIndex].data = msg.data ]
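The Fill case of ProcFieldsMessage can be mirrored in a few lines of Python. Everything about the encoding here is an assumption made for illustration: a message is a dictionary whose `quarks` entry maps quark names to subtypes, `address_to_tag` is a hypothetical stand-in for AddressToTag, and messages matching none of the shown conditions leave the cache unchanged (the slide elides the other CASE arms).

```python
def address_to_tag(adr):
    """Hypothetical tag function: drop the low 6 bits (64-byte lines assumed)."""
    return adr >> 6


def proc_fields_message(cache, proc, cache_index, msg):
    """Return the next cache for the Fill arm shown on the slide."""
    subtype = msg["quarks"].get("Fill")
    if subtype is None or subtype == "Fetch":
        return cache  # assumption: other arms of the CASE are not modelled
    old_line = cache[(proc, cache_index)]
    new_line = {
        **old_line,  # EXCEPT leaves all other fields of the line unchanged
        "state": "ExclusiveDirty" if subtype == "Mod" else "Clean",
        "tag": address_to_tag(msg["adr"]),
        "data": msg["data"],
    }
    return {**cache, (proc, cache_index): new_line}


cache = {("P1", 0): {"state": "Invalid", "tag": None, "data": None}}
fill_mod = {"quarks": {"Fill": "Mod"}, "adr": 0x1040, "data": 42}
next_cache = proc_fields_message(cache, "P1", 0, fill_mod)
```

Because a message is just a combination of quarks, one handler per quark replaces one handler per message type, which is the simplification the previous slide describes.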

  34. The low-level invariant
      Define the protocol in terms of quarks.
      Define an invariant describing all reachable states.
      We considered only the most difficult parts.
      [Diagram: cache, dtag, and directory connected by messages; on quad and off quad.]

  35. Dir - Dtag Invariant

      DirDTagInvariant ==
        \A adr \in MemBlockAddress, proc \in Processor :
        a.\/ (* local address *) ...
        b.\/ (* nonlocal address *)
        1./\ ProcToQuad(proc) # AddressToQuad(adr)
        2./\ a.\/ (* proc is the owner of adr *)
        1./\ Dir[adr].owner = proc
        b.\/ (* proc is not the owner of adr *) ...
        2./\ a.\/ (* dtag is dirty *)
        1./\ DTagState(adr, proc) = Dirty...
        b.\/ (* dtag is invalid *) ...
        c.\/ (* dtag is clean *) ...
        2./\ Proj(HomeToArbQ) =
               [ [FG* [QFI] QI* AckWrite] QI* AGV(mod,1) | FG* AckCTD(Success)] FG*

      DTagCacheInvariant == ...

      Mother == DirDTagInvariant /\ DTagCacheInvariant /\ ...

  36. DTag-Cache Invariance
      ASSUME: /\ Mother
              /\ Wildfire
              /\ DTagCacheInvariant(proc,adr)
      PROVE:  DTagCacheInvariant(proc,adr)'
      <1>1. CASE a (* DTagState(proc, adr) = "Invalid" *)
      <1>2. CASE b (* DTagState(proc, adr) # "Invalid" *)
      <1>3. QED

  37. DTag-Cache Invariance
      ASSUME: /\ Mother
              /\ Wildfire
              /\ DTagCacheInvariant(proc,adr)
      PROVE:  DTagCacheInvariant(proc,adr)'
      <1>1. CASE a (* DTagState(proc, adr) = "Invalid" *)
        <2>1. CASE a2a (* AddressCache(proc, adr).state' = "Invalid" *)
        <2>2. CASE a2b (* AddressCache(proc, adr).state' # "Invalid" *)
        <2>3. QED
      <1>2. CASE b (* DTagState(proc, adr) # "Invalid" *)
      <1>3. QED

  38. DTag-Cache Invariance
      ASSUME: /\ Mother
              /\ Wildfire
              /\ DTagCacheInvariant(proc,adr)
      PROVE:  DTagCacheInvariant(proc,adr)'
      <1>1. CASE a (* 1./\ DTagState(proc, adr) = "Invalid" *)
        <2>1. CASE a2a (* 1. AddressCache(proc, adr).state' = "Invalid" *)
          ...
          <14>1. CASE doing something at the proc
                 Pf: ....
          <14>2. CASE doing something at the arb
          <14>3. QED
          ...
        <2>2. CASE a2b (* 1. AddressCache(proc, adr).state' # "Invalid" *)
        <2>3. QED
      <1>2. CASE b (* 1./\ DTagState(proc, adr) # "Invalid" *)
      <1>3. QED

  39. The low-level refinement
      For the abstract protocol, we defined the Before ordering for the protocol.
      For the low-level protocol, we defined an invariant describing the reachable states.
      Now use the invariant to prove that the Before ordering is the actual low-level ordering.
      This refinement proof has not been done.

  40. One bug found
      • Quite unexpected to find only one bug!
      • Fix was an easy bookkeeping modification.
      • Demonstrating the bug requires
        • four processors
        • two memory locations
        • fifteen messages
      • Hand proof appears essential to finding this bug:
        • extensive simulation did not find it
        • state space too large for exhaustive model checking

  41. Wildfire challenge problem
      • http://www.research.digital.com/SRC/personal/lamport/tla/wildfire-challenge.html
      • We give you TLA+ models of
        • the Alpha memory model
        • the abstract protocol with one bug inserted
        and challenge you to find the bug.
      • Incredibly, Georges Gonthier found it by inspection (plus a memory model mistake)!

  42. TLC model checker
      • Inputs: a state machine in a rich subset of TLA+ (Initial, NextState); a configuration file making the state machine finite; an invariant
      • Checks for: invariant false, deadlock
      • Output: a minimal state trace from an initial state to a bad state

  43. TLC implementation
      • Require no changes to TLA+ specifications
        • use the richness of TLA+, no primitive language
        • use configuration files instead
      • Interpret specifications, don't compile them
        • better user interaction possible
      • Use explicit state representation, not BDDs
        • BDD encoding of TLA+ formulas difficult
        • use canonical state representation + fingerprinting
        • use efficient disk-based state set and queue implementations
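The explicit-state approach can be illustrated with a breadth-first search that keeps a set of seen states and a queue, and returns a shortest trace to a state violating the invariant. This is a toy sketch of the general technique, not TLC's implementation: it stores whole states in memory instead of fingerprints on disk, it does not check for deadlock, and the function name `check` and the example system are invented for illustration.

```python
from collections import deque


def check(init_states, next_states, invariant):
    """Breadth-first search; returns a shortest bad trace, or None if none exists."""
    parent = {}       # seen-state set, also recording how each state was reached
    queue = deque()
    for s in init_states:
        if s not in parent:
            parent[s] = None
            queue.append(s)
    while queue:
        s = queue.popleft()
        if not invariant(s):
            trace = []
            while s is not None:   # walk back to an initial state
                trace.append(s)
                s = parent[s]
            return trace[::-1]
        for t in next_states(s):
            if t not in parent:
                parent[t] = s
                queue.append(t)
    return None


def toy_next(s):
    """Toy system: a counter that can be incremented or doubled, bounded at 20."""
    return [s + 1, s * 2] if s < 20 else []


trace = check([0], toy_next, lambda s: s != 6)
print(trace)  # [0, 1, 2, 3, 6]
```

Breadth-first order is what makes the reported trace minimal: a state is first reached along a shortest path from an initial state.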

  44. TLC status
      • 20,000 lines of Java
      • Available to alpha testers under nondisclosure
      • Performance is good, sometimes slow: threaded and distributed implementations now exist.
      • Liveness checking/livelock detection coming
      • Coverage analysis is desired: what does lack of an error mean, a correct spec or a buggy spec?

  45. EV7 cache coherence
      • First intense application of TLC model checker
      • First TLA+ specification written by engineers
      • Specification is 1800 lines
      • Specification accepted by TLC w/o modification
      • State space reduced 50% by adding 15 lines to remove a lot of symmetry in state space
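The slide does not say how the 15 added lines removed symmetry. As a general illustration of why symmetry shrinks a state space, the Python sketch below treats three processors as interchangeable and keeps one canonical representative per class of states. The state names and the sorting-based `canonical` function are assumptions for this example only.

```python
from itertools import product

LINE_STATES = ("Invalid", "Shared", "Exclusive")  # assumed per-processor states


def canonical(state):
    """Pick one representative per symmetry class by sorting the tuple."""
    return tuple(sorted(state))


# Every assignment of a line state to each of three processors.
all_states = set(product(LINE_STATES, repeat=3))
# States that differ only by renaming processors collapse to one representative.
representatives = {canonical(s) for s in all_states}

print(len(all_states), len(representatives))  # 27 10
```

In a real protocol not every component is symmetric (home nodes and owners break symmetry), which is consistent with a partial reduction such as the 50% reported above.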

  46. Results
      • 73 bugs found (90% found by TLC):
        • 37 minor: typos, type errors, etc.
        • 12 bugs: wrong message/wrong state
        • 14 missing cases
        • 7 spurious cases (dead code)
        • 3 miscellaneous (1 TLA+, 1 MC, 1 spec design)
      • War story: Find bug B by hand; find bug B' like B by simulation; find bug B'' in bug-fix for B; find “???” written in original documentation!

  47. Lessons learned
      • Learning TLA+ is not a major task, but writing good specifications still requires experience
      • EV6 verification was
        • humbling: only one error actually found
        • encouraging: the basic method works as expected
      • EV7 verification was very satisfying:
        • TLA+ specifications can be written by engineers
        • TLC can handle industrial-sized specifications
      • Formal specification belongs in design process…
