Implementation and Verification of a Cache Coherence protocol using Spin Steven Farago
Goal • To use Spin to design a “plausible” cache coherence protocol • Introduce nothing in the Spin model that would not be realistic in hardware (e.g. instant global knowledge between unrelated state machines) • To verify the correctness of the protocol
Background • Definition: Cache = Small, high-speed memory that is used by a single processor. All processor memory accesses are via the cache. • Problem: • In a multiprocessor system, each processor could have a cache. • Each cache could contain (potentially different) data for the same addresses. • Given this, how to ensure that processors see a consistent picture of memory?
Coherence protocol • A Coherence protocol specifies how caches communicate with processors and each other so that processors will have a predictable view of memory. • Caches that always provide this “predictable view of memory” are said to be coherent.
A Definition of Coherence • A “view of memory” is coherent if the following property holds: • Given cacheline A, two processors may not see storage accesses to A in a conflicting order. • Example:
Processor 0    Processor 1    Processor 2    Processor 3
Store A, 0     Load A, 0      Load A, 0      Load A, 1
Store A, 1     Load A, 1      Load A, 0      Load A, 0
               Coherent       Coherent       ** NOT Coherent
• Informally, a processor may not see “old” data after seeing “new” data.
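The informal rule above can be illustrated with a small Python sketch. This is not part of the Spin model; the function name `is_coherent` and the list representation are assumptions made for illustration, and the sketch assumes every store writes a distinct value so that each loaded value can be mapped back to a position in the store order.

```python
def is_coherent(store_order, observed):
    """Return True if the loaded values never move backwards in store order.

    store_order: values written to cacheline A, oldest first (assumed distinct).
    observed:    values one processor loaded from A, in program order.
    """
    position = {value: i for i, value in enumerate(store_order)}
    last = -1
    for value in observed:
        idx = position[value]
        if idx < last:
            return False  # "old" data seen after "new" data
        last = idx
    return True


stores = [0, 1]                      # Processor 0: Store A,0 then Store A,1
print(is_coherent(stores, [0, 1]))   # Processor 1 in the example
print(is_coherent(stores, [0, 0]))   # Processor 2 in the example
print(is_coherent(stores, [1, 0]))   # Processor 3 in the example
```

Only the Processor 3 sequence is rejected, matching the example.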
Standard Coherence Protocol • MESI (Modified, Exclusive, Shared, Invalid) • Standard protocol that is supposed to guarantee cache coherence • Each cacheline in a cache is marked with one of these states. • Cacheline accesses are only allowed if the cache states are “correct” w.r.t. the coherence protocol • Examples: • A cacheline that is marked “Invalid” may not provide data to a processor. • Cacheline data may not be updated unless the line is in the Exclusive or Modified state.
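The two example rules can be written as simple predicates. The following is a minimal Python sketch; the names `State`, `may_load`, and `may_store` are hypothetical and cover only the two rules listed on this slide, not the full MESI transition table.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def may_load(state):
    """An Invalid line may not provide data to the processor."""
    return state is not State.INVALID


def may_store(state):
    """Data may only be updated in the Exclusive or Modified state."""
    return state in (State.MODIFIED, State.EXCLUSIVE)
```

A store to a Shared or Invalid line is therefore the case in which a cache must first contact the memory controller.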
System Model • Initial version • Three state machines • ProcessorModel: Non-deterministically issues Loads and Stores to cache forever • CacheModel: Two parts - initially combined into a single process • MainCache - Services processor requests. • Snooper - Responds to messages from memory controller • MemoryController - Services requests from each cache and maintains coherency among all caches
System Model • [Diagram: two Processor blocks, two MainCache blocks, two Snooper blocks, and one MemoryController]
ProcessorModel • Simple • Continually issues Load/Store requests to associated Cache. • Communication done via Bus Model. • Read requests are blocking • Coherence verification done when Load receives data (via Spin assert statement)
CacheModel • Two parts: MainCache and Snooper • MainCache services ProcessorModel Load and Store requests and initiates contact with the MemoryController when an “invalid” cache state is encountered • Snooper services independent requests from the MemoryController. These requests are necessary for the MemoryController to coordinate coherence responses.
MemoryControllerModel • Responsible for servicing Cache requests • 3 Types of requests • Data request: Cache requires up-to-date data to supply to processor • Permission-to-store: A Cache may not transition to the Modified state w/o MC’s permission • A combination of these two • All types of requests may require MC to communicate with all system caches (via Snooper processes) to ensure coherence
Implementation of Busses • All processes represent independent state machines and need a communication mechanism. • Use Spin depth 1 queues to simulate communication. • Reads of Spin queues are destructive and blocking, so a global bool is required to indicate bus activity (required for polling). • Sharing this global between processes is considered valid because it makes up for differences between Spin queues and real busses
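The bus idea can be sketched outside Spin as a one-slot channel with a separate flag that can be polled without consuming the message. This Python sketch is only an analogy: the class name `Bus` and its methods are hypothetical, and where a Spin send or receive would block, the sketch returns `False` or `None` instead.

```python
class Bus:
    """Depth-1 channel plus a 'busy' flag that can be polled non-destructively."""

    def __init__(self):
        self.slot = None
        self.busy = False   # plays the role of the global bool in the model

    def send(self, msg):
        if self.busy:
            return False    # a Spin send on a full channel would block here
        self.slot, self.busy = msg, True
        return True

    def receive(self):
        if not self.busy:
            return None     # a Spin receive on an empty channel would block here
        msg, self.slot, self.busy = self.slot, None, False
        return msg
```

A receiver can test `bus.busy` before calling `receive()`, which is the polling behaviour the global bool is meant to provide.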
Problems - Part 1 • MainCache and Snooper initially implemented as a single process. • Process nondeterministically determines which to execute at each iteration • Communication between Processor/Cache and Cache/Memory done with blocking queues • Blocked receive in MainCache --> Snooper cannot execute • Leads to deadlock in certain situations
Solution 1 • Split MainCache and Snooper into separate processes. • Both can access “global” cacheData and cacheState variables independently
--> Problems - Part 2 • As separate processes, Snooper and MainCache could change cache state unpredictably. • Race conditions: Snooper changes cache state/data while MainCache is in mid-transaction --> MainCache returns invalidated data to processor.
Solution 2 • Add locking mechanism to cache. • MainCache or Snooper may only access cache if they first lock it. • Locking mechanism: For simplicity, cheated by using Spin’s atomic keyword to implement test-and-set on a shared variable. • Assumption: Real hardware would have some similar mechanism available to lock caches. • Question: Is the revised model now equivalent to the original??
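A test-and-set lock of this kind can be sketched in Python as follows. This is an illustration only, not the Promela code from the model: the class `CacheLock` is hypothetical, and `threading.Lock` stands in for the atomicity that Spin’s atomic keyword provides.

```python
import threading


class CacheLock:
    """Test-and-set on a shared flag; the inner lock makes read+write atomic."""

    def __init__(self):
        self._atomic = threading.Lock()  # stand-in for Spin's atomic block
        self.locked = False              # the shared variable being tested

    def test_and_set(self):
        """Atomically read the flag and set it. True means the caller got the lock."""
        with self._atomic:
            was_locked = self.locked
            self.locked = True
            return not was_locked

    def release(self):
        with self._atomic:
            self.locked = False
```

MainCache and Snooper would each call `test_and_set()` before touching cache state or data and retry while it returns `False`.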
--> Problem 3 • Memory controller allows multiple outstanding requests from caches. • Snooper of cache which has a MainCache request outstanding cannot respond to MC queries for other outstanding requests (due to locked cacheline). • Deadlock.
Solution 3 • Disallow multiple outstanding Cache/MC transactions. • Introduce global bool variable shared across all caches: outstandingBusOp. • A cache may only issue requests to the memory controller if no requests from other caches are outstanding. • Global knowledge across all caches is unrealistic. • Equivalent to “retries” from MC??
--> Problem 4 • The previous problems caused failures in Spin simulation within 1000 steps. • With the last solution, random simulation runs no longer fail within the first 3000 steps. • Verification still fails after ~20000 steps • Cause of problem as yet unresolved
Verification • How to verify coherence generally?? • Verify something stronger: A processor will never see conflicting ordering of data if it always sees the newest data available in the system. • For all loads, assert that data is “new”
Modeling of Data • Concern that modeling data as random integer would cause Spin to run out of memory • Model data as a bit with values OLD and NEW. • All processor Stores store NEW data. • When transitioning to a Modified state, a cache will change all other values of data in memory and other caches to OLD • Global access to data here strictly a part of verification effort, not algorithm. Thus allowed.
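The OLD/NEW bookkeeping and the assertion on loads can be sketched in Python. The function names `store` and `checked_load` and the dictionary of copies are assumptions made for illustration; in the model itself the check is a Spin assert statement in the ProcessorModel.

```python
OLD, NEW = 0, 1


def store(copies, writer):
    """Mark the writer's copy NEW and every other copy (memory, caches) OLD.

    copies maps a holder name ("memory", "cache0", ...) to OLD or NEW.
    This global update is verification bookkeeping, not protocol behaviour.
    """
    updated = {name: OLD for name in copies}
    updated[writer] = NEW
    return updated


def checked_load(copies, reader):
    """Return the reader's data, asserting that it is the newest in the system."""
    data = copies[reader]
    assert data == NEW, f"{reader} returned stale data"
    return data
```

With only two data values, the state space stays small while any load of stale data still trips the assertion.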
Debugging • Found debugging parallel processes difficult. • Made much easier by Spin’s message sequence diagrams • Graphically shows sends and receives of all messages. • Requires use of Spin queues rather than globals for interprocess communication
Future work • Make existing protocol completely bug free • Activate additional “features” disabled for debugging purposes (e.g. bus transaction types) • Verify protocol specific rules • No two caches may be simultaneously Modified • Cache Modified or Exclusive --> no other cache is Shared
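The two protocol-specific rules listed above could be expressed as invariants over the states of all caches for one cacheline. A minimal Python sketch, using the hypothetical helper names `modified_is_unique` and `owner_excludes_sharers` and one-letter MESI state strings:

```python
def modified_is_unique(states):
    """No two caches may hold the line in Modified at the same time."""
    return states.count("M") <= 1


def owner_excludes_sharers(states):
    """If any cache is Modified or Exclusive, no other cache may be Shared."""
    if any(s in ("M", "E") for s in states):
        return "S" not in states
    return True


# states[i] is the MESI state of the line in cache i
print(modified_is_unique(["M", "I"]), owner_excludes_sharers(["M", "S"]))
```

In Spin, checks like these would typically be asserted over the global cacheState variables rather than computed in a separate program.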