300 likes | 756 Vues
Computer Science 328 Distributed Systems Lecture 6 Global States & Causality Detecting Global Properties Process Histories and States For a process P i , history(P i ) = h i = <e i 0 , e i 1 , … > prefix history(P i k ) = h i k = <e i 0 , e i 1 , …,e i k >
E N D
Computer Science 328Distributed Systems Lecture 6 Global States & Causality
Process Histories and States • For a process Pi, history(Pi) = hi = <ei0, ei1, … > prefix history(Pik) = hik = <ei0, ei1, …,eik > Sik : Pi ‘s state immediately before kth event • For a set of processes, global history:H = i (hi) global state:S = i (Siki) a cut C H = h1c1 h2c2 … hncn the frontier of C = {eici, i = 1,2, … n}
Consistent cut Inconsistent cut Consistent States • A cut C is consistent if e C(if f e then f C) • A global state S is consistent if it corresponds to a consistent cut e12 e10 e11 e13 P1 e21 P2 e22 e20 P3 e31 e30 e32
Global States • ARunis a total ordering of events in H that is consistent with each hi’s ordering • A Linearization is a run consistent with happens-before () relation in H. • Linearizations pass through consistent global states. • A global state Sk is reachable from global state Si, if there is a lineralization, L, that passes through Si and then through Sk. • A DS evolves as a series of transitions between global states S0 ,S1 , ….
Global State Predicates • Aglobal-state-predicateis a function from the set of global states to {true, false} , e.g. deadlock, termination • A stable global-state-predicateis one that once it becomes true, it remains true in subsequent global states, e.g. deadlock • if P is a gobal-state-predicate of being deadlocked, then safety S reachable from S0, P(S) = false • If P is a global-state predicate of reaching termination, then liveness(P) L lineralizatoins from S0 SL :passes through SL & P(SL) = true • We need a way to record global states
The “Snapshot” Algorithm • Records a set of process and channel states such that the combination is a consistent GS. • Assumptions: • No failure, all messages arrive intact, exactly once • Communication channels are unidirectional and FIFO-ordered • There is a comm. path between any two processes • Any process may initiate the snapshot (sends Marker) • Snapshot does not interfere with normal execution • Each process records its state and the state of its incoming channels (no central collection)
The “Snapshot” Algorithm (2) 1. Marker sending rule for process Pk • After Pk has recorded its state • for each outgoing channel C, send a marker on C 2. Marker receiving rule for a process Pk on receipt of a marker over channel C • ifPk has not yet recorded its state • record Pk’s state • record the state of C` as “empty” • turn on recording of messages over other incoming channels else • record the state of C` as all the messages received over C` since Pk saved its state
Chandy and Lamport’s ‘Snapshot’ Algorithm • Marker receiving rule for process pi • On pi’s receipt of a marker message over channel c: • if (pi has not yet recorded its state) it • records its process state now; • records the state of c as the empty set; • turns on recording of messages arriving over other incoming channels; • else • pi records the state of c as the set of messages it has received over c • since it saved its state. • end if • Marker sending rule for process pi • After pi has recorded its state, for each outgoing channel c: • pi sends one marker message over c • (before it sends any other message over c).
Execution of the Processes Send 5 widgets
e11,2 e14 e13 M M e24 M M e21,2,3 M M e31 e32,3,4 1- P1 initiates snapshot: records its state (S1); sends Markers to P2 & P3; turns on recording for channels C21 and C31 2- P2 receives Marker over C12, records its state (S2), sets state(C12) = {} sends Marker to P1 & P3; turns on recording for channel C32 3- P1 receives Marker over C21, sets state(C21) = {a} 4- P3 receives Marker over C13, records its state (S3), sets state(C13) = {} sends Marker to P1 & P2; turns on recording for channel C23 5- P2 receives Marker over C32, sets state(C32) = {b} 6- P3 receives Marker over C23, sets state(C23) = {} 7- P1 receives Marker over C31, sets state(C31) = {} Snapshot Example e10 e13 P1 a e23 P2 e20 b P3 e30
Reachability Between States in Snapshot Algorithm • Let ei and ej be events occurring at pi and pj, respectively such that ei ej • the snapshot algorithm ensures that • if ej is in the cut then ei is also in the cut. • if ej occurred before pj recorded its state, then ei must have occurred before • pi recorded its state.
Include(obj1) obj1.method() P2 has obj1 Causality Violation Physical Time 1 2 P1 0 1 P2 0 5 6 2 4 P3 0 4 3 • Causality violation occurs when order of messages causes an action based on information that another host has not yet received. • In designing a DS, potential for causality violation is important
Violation: (1,0,0) < (2,1,2) Detecting Causality Violation Physical Time 1,0,0 2,0,0 0,0,0 P1 (1,0,0) P2 0,0,0 2,1,2 2,2,2 (2,0,0) (2,0,2) P3 0,0,0 2,0,2 2,0,1 • Potential causality violation can be detected by vector timestamps. • If the vector timestamp of a message is less than the local vector timestamp, on arrival, there is a potential causality violation.
Communication Modes in DS • Unicast (best effort or reliable) • Messages are sent from exactly one host to one host • Best effort guarantees that if a message is delivered it would be intact • Reliable guarantees delivery of messages. • Broadcast • Messages are sent from exactly one host to all hosts on the same network. • Reliable broadcast protocols are not practical • Multicast • Messages are sent from exactly one host to several hosts on the same or different networks. • Multicast can be implemented above a reliable unicast
Basic Multicast • A basic multicast primitive guarantees a correct process will eventually deliver the message, as long as the multicaster does not crash. • A straightforward way to implement B-multicast is to use a reliable one-to-one send operation: • B-multicast(g,m): for each process p in g, send (p,m). • receive(m): B-deliver(m) at p.
Reliable Multicast • Integrity: A correct process pdelivers a message m at most once. • Validity: If a correct process multicasts message m, then it will eventually deliver m. • Guarantees liveness to the sender. • Agreement: If a correct process delivers message m, then all the other correct processes in group(m) will eventually deliver m. • Property of “all or nothing.” • Validity and agreement together ensure that overall liveness: if one process (the sender) eventually delivers a message m, then, since the correct processes agree on the set of messages they deliver, it follows that m will eventually be delivered to all the group’s correct members.
Ordered Multicast • FIFO ordering: If a correct process issues multicast(g,m) and then multicast(g,m’), then every correct process that delivers m’ will deliver m before m’. • Causal ordering: If multicast(g,m) multicast(g,m’) then any correct process that delivers m’ will deliver m before m’. • Totally ordering: If a correct process delivers message m before m’, then any other correct process that delivers m’ will deliver m before m’.
Total, FIFO and Causal Ordering • Totally ordered messages T1 and T2. • FIFO-related messages F1 and F2. • Causally related messages C1 and C3 • Total ordering does not imply causal ordering. • Causal ordering does not imply total ordering. • Hybrid mode: causal-total ordering, FIFO-total ordering.
os.interesting Bulletin board: From Subject Item 23 A.Hanlon Mach 24 G.Joseph Microkernels 25 A.Hanlon Re: Microkernels 26 T.L’Heureux RPC performance 27 M.Walker Re: Mach end Display From Bulletin Board Program
Ordering Guarantees (FIFO) • Process messages from each host in the order they were sent: • Each process keeps a sequence number for each host. • When a message is received, as expected, accept higher than expected, buffer in a queue lower than expected, reject If Message# is
Implementing FIFO Ordering • Spg: the number of messages p has sent to g. • Rqg: the sequence number of the latest group-g message p has delivered from q. • For p to FO-multicast m to g • p piggy backs the value Spg onto the message. • p B-multicasts m to g. • p increments Spg by 1. • Upon receipt of m from q with sequence number S: • p checks whether S=Rqg+1. If so, p FO-delivers m. • If S > Rqg+1, p places the message in the hold-back queue until the intervening messages have been delivered and S=Rqg+1.
Accept: 2 = 1 + 1 Accept 1 = 0 + 1 Reject: 1 < 1 + 1 2 0 0 Accept 1 = 0 + 1 2 0 0 Buffer 2 > 0 + 1 Accept Buffer 2 = 1 + 1 Example: FIFO Multicast Physical Time Reject: 1 < 1 + 1 2 0 0 2 1 0 1 0 0 2 1 0 P1 0 0 0 1 2 2 1 1 1 P2 0 0 0 2 1 0 2 0 0 1 0 0 1 P3 0 0 0 2 1 0 0 0 0 1 0 0 Accept: 1 = 0 + 1 0 0 0 Sequence Vector
ISIS algorithm for total ordering P 2 1 Message 3 2 P 2 4 2 Proposed Seq 1 3 Agreed Seq 1 2 P 1 3 P 3
Causal Ordering using vector timestamps The number of group-g messages from process j that happened before the current message to be multicast
0,0,0 1,1,0 1,0,0 Accept: Accept Buffered message Example: Causal Ordering Multicast Reject: Accept 1,0,0 1,1,0 1,1,0 P1 (1,1,0) (1,1,0) (1,0,0) P2 0,0,0 1,1,0 1,0,0 (1,0,0) (1,1,0) P3 0,0,0 1,1,0 Accept Buffer, missing P1(1) Physical Time