Consistency
Consistency Models of computation
Coherence vs. consistency • coherence deals with accesses to the same memory location • consistency addresses the possible outcomes from legal orderings to all memory locations • common model (sequential consistency) is easy to understand but is difficult to implement, and has poor performance
What do you expect? • Sequential consistency: “Commit results in processor order” • simple enough in a uniprocessor • similarly with context switching: just save and restore state • what about multi-threading, or multiprocessor machines?
MIPS R10000 • issue instructions out of order • in-order commit • speculative loads may execute and pass a value for modification long before the load commits in program order • meantime, some other processor may commit a store to that location
Producer - consumer P1 P2 write (A) ; while(flag != 1) ; flag := 1 ; read (A); • assumes P1’s writes become visible to P2 in program order
One or both proceed
P1: X := 0 ; ... ; if (Y == 0) kill P2 ;
P2: Y := 0 ; ... ; if (X == 0) kill P1 ;
• it’s a race through the critical section
Sequential consistency • results can be mapped to some sequential execution in which the instructions of each process appear in program order • equivalently: • memory operations proceed in program order • all writes are atomic and become visible to all processors at the same time
The need to relax • strict sequential consistency has severe performance drawbacks, so: • keep sequential consistency, and use prefetch and speculation, or • relax the consistency model – and be prepared to think carefully about programs
Attributes of consistency models • system specification • which orders are preserved, and which are not? is there system support to enforce a particular order? • programmer interface: the set of rules that will lead to the expected execution • translation mechanism: how to translate program annotations to hardware actions
Alternative 1 • total store ordering: allows a read to bypass an earlier incomplete write • helps hide write latency • can be provided by fence instructions • SPARC v9 provides various memory barrier instructions
Alternative 2 • partial store ordering: allow writes as well as reads to bypass writes • writes cannot bypass reads • writes are still atomic • very different from sequential consistency • e.g. spinning on a flag doesn’t work • needs a store barrier instruction to emulate sequential consistency
Alternative 3 • processor consistency: same as total store ordering, but does not guarantee atomic writes • implemented in recent Intel processors
Weak ordering • just try to preserve data and control dependencies within a process • don’t worry about the order of memory operations between synchronization points • e.g. don’t worry about the exact order of independent reads and writes within a critical section
Weak ordering • code from outside (before or after) a critical section cannot be reordered with code inside it • code before a barrier must commit before entering, code after a barrier must not issue until the barrier is left • code before a flag wait must commit before waiting, and code after must not issue before flag is set by the producer • code before setting of a flag must commit first, and code after must not issue before the flag is set
Weak ordering • a good match to modern CPUs and aggressive compiler optimizations • hardware must recognize synchronization, or compiler must insert proper barriers • MIPS R10000 provides sync instruction and fence count register • sync disables issue until fence register is zero and all outstanding memory operations have committed • fence count incremented on an L2 miss and decremented on a reply
Release consistency • relax weak ordering further • categorize all synchronization operations as either acquire or release • acquire is a read (load) on a protected variable, like a lock or a waiting on a flag • release is a write (store) granting access to others, like unlock or setting a flag • barrier is release (arrival) and acquire (departure)
In practice • MIPS processors are sequentially consistent • Sun supports total or partial store ordering • Intel supports processor consistency • Alpha and PowerPC support weak ordering; Power4 and Power5 do not guarantee atomic writes
Processor consistency • a simple model with good performance • writes must become visible to all processors in program order • loads can bypass writes
Back to our examples Under these rules, • does producer-consumer work? • does one-or-both work?
Results under processor consistency • producer-consumer is okay because P1’s actions are both writes and they must become visible sequentially • one-or-both can break because loads can bypass writes • if (X == 0) is a load • Y = 0 is a write
Intel Itanium • loads are not reordered with other loads • stores are not reordered with other stores • stores are not reordered with older loads • stores to the same location have a total order • a load may be reordered with an older store to a different location
Itanium example 1 • initially, x = y = 0
P1: R1 <- x (load) ; y <- 1 (store)
P2: R2 <- y (load) ; x <- 1 (store)
• we will never see R1 = R2 = 1 because stores are not reordered with older loads
Itanium example 2 • initially, x = y = 0
P1: x <- 1 (store) ; R1 <- y (load)
P2: y <- 1 (store) ; R2 <- x (load)
• we may see R1 = R2 = 0 because loads may be reordered with older stores