Lecture 3: State, Detection

Lecture 3: State, Detection (Anish Arora, CSE 763)

The Stability Detection Problem
• A stable property of a distributed system is one that persists: once a stable property is true, it remains true thereafter
• Examples:
  • “the computation has terminated”
  • “the system is deadlocked”
  • “all tokens in a token ring have disappeared”
• Solution:
  • Determine the global state of the system
  • Test the global state to see if the stable property holds

Termination Detection
• Processes 0..N-1 arbitrarily connected by channels
• Each process is either idle or active
• An active process can become idle spontaneously
• An idle process can become active only upon receiving a message
The Problem: Detect that all processes are idle and all channels are empty

Program and Proof (hand-in-hand) Design
• Step 0: How to count messages in channels.
process j
  {send msg} → c.j := c.j + 1
▯ {receive msg} → c.j := c.j − 1
Proof: Invariant I1 ≡ (Sum j :: c.j) = # of messages in channels
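The counting step can be exercised with a small simulation. This is a minimal sketch, not part of the lecture: the function name `run_counting`, the random scheduling, and the list `in_transit` (standing in for the channels) are illustrative assumptions, and the idle/active status of processes is ignored here.

```python
import random

def run_counting(n=4, steps=200, seed=0):
    """Randomly interleave sends and receives, checking I1 after each action."""
    rng = random.Random(seed)
    c = [0] * n          # c[j]: messages sent by j minus messages received by j
    in_transit = []      # assumption: channels modelled as a bag of destinations
    for _ in range(steps):
        if in_transit and rng.random() < 0.5:
            # {receive msg}: some in-transit message is delivered to process j
            j = in_transit.pop(rng.randrange(len(in_transit)))
            c[j] -= 1
        else:
            # {send msg}: process j sends to a randomly chosen destination
            j = rng.randrange(n)
            in_transit.append(rng.randrange(n))
            c[j] += 1
        # Invariant I1: (Sum j :: c.j) = # of messages in channels
        assert sum(c) == len(in_transit)
    return c, len(in_transit)
```

Individual counters may go negative (a process can receive more than it sends); only their sum is constrained by I1.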

Refining the program
• Step 1: How to detect that all processes are idle.
Consider a logical ring 0 → N−1 → … → 0 and pass a token
Let t denote the location of the token
process j
  {send msg} → c.j := c.j + 1
▯ {receive msg} → c.j := c.j − 1
▯ {propagate token} j ≠ 0 ∧ t = j ∧ idle.j → t := t − 1 ; q := q + c.j
▯ {retransmit token} j = 0 ∧ t = j ∧ idle.j ∧ ¬(q + c.0 = 0) → t := N − 1 ; q := 0

Refining the proof
Proof: We begin with an idealized Invariant ≡ I1 ∧ Q, where
  Q ≡ (∀j : t < j ∧ j < N : idle.j) ∧ (q = (Sum j : t < j ∧ j < N : c.j))
However, Q is not preserved by one of the actions (the receive action for j, t < j ∧ j < N)
But when Q is violated, R becomes true, where
  R ≡ q + (Sum j : 0 ≤ j ∧ j ≤ t : c.j) > 0
So, we weaken Invariant ≡ I1 ∧ (Q ∨ R)
However, R is not preserved by one of the actions (the receive action for j, 0 ≤ j ∧ j ≤ t)
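A concrete trace may help to see why Q and then R fail. The sketch below is an illustrative assumption, not taken from the slides: it fixes N = 3 with the token at t = 1, and the helper names `Q`, `R`, and `trace` are hypothetical. Each recorded pair is (Q, R) after one step.

```python
def Q(t, q, idle, c):
    """All processes above t are idle and q equals the sum of their counters."""
    return all(idle[t + 1:]) and q == sum(c[t + 1:])

def R(t, q, c):
    """q plus the counters of processes 0..t is positive."""
    return q + sum(c[:t + 1]) > 0

def trace():
    t, q = 1, 0                   # token has left process 2 carrying q = c.2 = 0
    idle = [True, True, True]
    c = [1, 0, 0]                 # process 0 has sent one message to process 2
    out = [(Q(t, q, idle, c), R(t, q, c))]

    c[2] -= 1                     # process 2 (j > t) receives and becomes active:
    idle[2] = False               # Q is violated, but R holds
    out.append((Q(t, q, idle, c), R(t, q, c)))

    c[2] += 1                     # process 2 sends a message to process 1
    out.append((Q(t, q, idle, c), R(t, q, c)))

    c[1] -= 1                     # process 1 (j <= t) receives and becomes active:
    idle[1] = False               # now R is violated as well
    out.append((Q(t, q, idle, c), R(t, q, c)))
    return out
```

In the last state neither Q nor R holds, so I1 ∧ (Q ∨ R) cannot be an invariant of the Step 1 program; some further disjunct is needed to cover receives at processes the token has not yet visited.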

Refining the program again
• Step 2: How to abort a detection when unsure that the token traversal was uninterrupted.
process j
  {send msg} → c.j := c.j + 1
▯ {receive msg} → c.j := c.j − 1 ; blacken j
▯ {propagate token} j ≠ 0 ∧ t = j ∧ idle.j → t := t − 1 ; q := q + c.j ; whiten j
▯ {retransmit token} j = 0 ∧ t = j ∧ idle.j ∧ ¬(q + c.0 = 0 ∧ 0 is white) → t := N − 1 ; q := 0 ; whiten j

Iterated refinement
Proof: Invariant ≡ I1 ∧ (Q ∨ R ∨ S), where
  S ≡ (∃j : 0 ≤ j ∧ j ≤ t : j is black)
However, S is not preserved by one of the actions (the propagate action at a black node)
So we introduce a color for the token and get the final program
program of process j
  {send msg} → c.j := c.j + 1
▯ {receive msg} → c.j := c.j − 1 ; blacken j
▯ {propagate token} j ≠ 0 ∧ t = j ∧ idle.j → t := t − 1 ; q := q + c.j ; if black j then blacken token ; whiten j
▯ {retransmit token} j = 0 ∧ t = j ∧ idle.j ∧ ¬(q + c.0 = 0 ∧ token is white ∧ 0 is white) → t := N − 1 ; q := 0 ; whiten token ; whiten j
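The final program can be tried out on a randomly generated computation. This is a sketch under stated assumptions rather than the lecture's own code: the slides do not give an initial state, so the simulation assumes all processes start active and white, and that a fresh white token with q = 0 sits at process N−1 (which makes Q hold vacuously). The function name `simulate`, the send `budget` that forces the underlying computation to end, and the random scheduler are likewise illustrative.

```python
import random

def simulate(n=5, seed=1, budget=30, max_steps=100_000):
    """Run the final program; return the step at which process 0 detects."""
    rng = random.Random(seed)
    idle = [False] * n          # assumption: every process starts active
    c = [0] * n                 # c[j]: sends minus receives at process j
    black = [False] * n         # assumption: every process starts white
    channel = []                # destinations of messages still in transit
    t, q, token_black = n - 1, 0, False   # assumption: fresh probe at N-1

    for step in range(max_steps):
        active = [j for j in range(n) if not idle[j]]
        actions = []
        if active:
            actions.append("idle")
            if budget > 0:
                actions.append("send")
        if channel:
            actions.append("receive")
        if idle[t]:
            actions.append("token")
        act = rng.choice(actions)   # never empty: all idle implies idle[t]

        if act == "send":
            j = rng.choice(active)
            channel.append(rng.randrange(n))
            c[j] += 1
            budget -= 1
        elif act == "receive":
            j = channel.pop(rng.randrange(len(channel)))
            c[j] -= 1
            black[j] = True         # blacken j
            idle[j] = False         # an idle process becomes active on receipt
        elif act == "idle":
            idle[rng.choice(active)] = True
        elif t != 0:                # {propagate token}
            q += c[t]
            if black[t]:
                token_black = True
            black[t] = False
            t -= 1
        elif q + c[0] == 0 and not token_black and not black[0]:
            # Detection at process 0: check that it is not a false alarm.
            assert all(idle) and not channel
            return step
        else:                       # {retransmit token}
            t, q, token_black = n - 1, 0, False
            black[0] = False
    raise RuntimeError("no detection within max_steps")
```

The assertion inside the detection branch checks safety (detection only when all processes are idle and all channels are empty) on each sampled run; it is a spot check on random schedules, not a proof.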

Termination Detection
Predicate Termination ≡ (∀j :: idle.j) ∧ # of msgs sent − # of msgs received = 0
Invariant ≡ ((Sum j :: c.j) = # of msgs sent − # of msgs received) ∧ (Q ∨ R ∨ S ∨ T)
  Q ≡ (∀j : t < j ∧ j < N : idle.j) ∧ (q = (Sum j : t < j ∧ j < N : c.j))
  R ≡ q + (Sum j : 0 ≤ j ∧ j ≤ t : c.j) > 0
  S ≡ (∃j : 0 ≤ j ∧ j ≤ t : j is black)
  T ≡ token is black

Proof of correctness
• (1) Invariant ∧ t = 0 ∧ 0 is white ∧ idle.0 ∧ q + c.0 = 0 ∧ token is white ⇒ Termination
• (2) Invariant ∧ Termination leads-to t = 0 ∧ 0 is white ∧ idle.0 ∧ q + c.0 = 0 ∧ token is white

Termination Detection
Proof of (1):
• 0 is white ∧ t = 0 ⇒ ¬S
• q + c.0 = 0 ∧ t = 0 ⇒ ¬R
• token is white ⇒ ¬T
• Hence the antecedent implies Invariant ∧ Q ∧ q + c.0 = 0
• With t = 0, Q says that processes 1..N-1 are idle and q = (Sum j : 0 < j ∧ j < N : c.j); together with idle.0 and q + c.0 = 0 this gives (∀j :: idle.j) and (Sum j :: c.j) = 0, i.e., the antecedent implies Termination
Proof of (2):
• If termination has occurred, only the propagation and retransmission actions can execute
• After the first complete traversal of the ring by the token, all processes are white and the token is white
• At the end of the next traversal, when t = 0, the algorithm detects the termination of the underlying computation
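The predicates and claim (1) can be spot-checked on explicit states. The following is a minimal sketch with hypothetical names (`State`, `invariant`, `detects`, `termination`); the field `in_transit` stands for "# of msgs sent − # of msgs received" and is an assumption of this encoding, not a variable of the program.

```python
from dataclasses import dataclass

@dataclass
class State:
    t: int              # token location
    q: int              # count carried by the token
    idle: list          # idle[j]
    c: list             # c[j]: sends minus receives at j
    black: list         # black[j]
    token_black: bool
    in_transit: int     # messages sent but not yet received

def termination(s):
    return all(s.idle) and s.in_transit == 0

def invariant(s):
    i1 = sum(s.c) == s.in_transit
    Q = all(s.idle[s.t + 1:]) and s.q == sum(s.c[s.t + 1:])
    R = s.q + sum(s.c[:s.t + 1]) > 0
    S = any(s.black[:s.t + 1])
    T = s.token_black
    return i1 and (Q or R or S or T)

def detects(s):
    """Antecedent of (1), minus the invariant itself."""
    return (s.t == 0 and s.idle[0] and s.q + s.c[0] == 0
            and not s.black[0] and not s.token_black)

# A state satisfying the invariant and the detection condition: terminated.
good = State(t=0, q=-1, idle=[True, True, True], c=[1, -1, 0],
             black=[False, False, False], token_black=False, in_transit=0)

# The detection condition alone is not enough: here process 1 is active,
# but this state violates the invariant, so the program never reaches it.
bad = State(t=0, q=0, idle=[True, False, True], c=[0, 0, 0],
            black=[False, False, False], token_black=False, in_transit=0)
```

For `good`, the invariant, the detection condition, and Termination all hold. For `bad`, the detection condition holds while Termination does not, which is consistent with (1) only because the invariant is false there.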
