Distributed Algorithms

Distributed Algorithms – 2g1513, Lecture 10 – by Ali Ghodsi • Fault-Tolerance in Asynchronous Networks

Consensus Problems • Consensus problems are very important in distributed systems (DS) • Distributed databases • All processes must agree whether to commit or abort a transaction • If any process says abort, all processes should abort • Atomic broadcast • All processes receive the same set of messages, coming from correct processes only • Atomic broadcast can be used to implement consensus, and vice versa

Fischer, Lynch, Paterson 1983/85 • Consensus cannot be solved deterministically in the asynchronous model • if even one process may crash • http://www.sics.se/~ali/flp85.pdf • Most Influential Paper Award, PODC 2001

Modified Model • To prove the result, we will modify our model of a distributed system slightly • Processes execute local algorithms, modeled by a state transition system (STS) • But, given any state, a correct process can always execute a “dummy” instruction • For any state $s$ of a process, there exists a transition $(s, s')$ for some state $s'$ • Hence there is always an applicable event on every correct process • A crashed process cannot make any transitions

Definition: t-crash fair executions • A t-crash-robust algorithm is a consensus algorithm if it satisfies: • Termination • All correct processes eventually decide • Agreement • In every configuration, all processes that have decided have decided the same value (0 or 1) • Non-triviality • There exists at least one possible input configuration where the decision is 0 • There exists at least one possible input configuration where the decision is 1 • For example, input “0,0,1” might lead to 0 while “0,1,1” leads to 1
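As a minimal sketch, termination and agreement can be checked on the final state of a single finished run. The Python below assumes a hypothetical encoding of that state as a map from process id to decided value (or None if the process never decided), plus the set of crashed processes; it also assumes decisions are irreversible, so checking the final state suffices. Non-triviality is a property of the algorithm over all inputs, so it cannot be checked on one run.

```python
from typing import Dict, Optional, Set

def check_consensus_run(decisions: Dict[str, Optional[int]],
                        crashed: Set[str]) -> Dict[str, bool]:
    """Check termination and agreement on the final state of one finished run.

    decisions: process id -> decided value (0 or 1), or None if it never decided.
    crashed:   ids of the processes that crashed during the run.
    Assumes decisions are irreversible, so checking the final state suffices.
    """
    correct = [p for p in decisions if p not in crashed]
    # Termination: every correct process has decided.
    termination = all(decisions[p] is not None for p in correct)
    # Agreement: all processes that decided (crashed ones included) chose the same value.
    decided_values = {v for v in decisions.values() if v is not None}
    agreement = len(decided_values) <= 1
    return {"termination": termination, "agreement": agreement}


# A run where the crashed p3 never decided still satisfies both properties.
print(check_consensus_run({"p1": 0, "p2": 0, "p3": None}, {"p3"}))
# {'termination': True, 'agreement': True}
```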

Definitions • 0-decided configuration • A configuration in which some process has decided ”0” • 1-decided configuration • A configuration in which some process has decided ”1” • 0-valent configuration • A configuration in which every reachable decided configuration is 0-decided • 1-valent configuration • A configuration in which every reachable decided configuration is 1-decided • Bivalent configuration • A configuration from which both a 0-decided and a 1-decided configuration are reachable
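The three valency notions can be illustrated on a small, explicitly given state space. This sketch assumes a hypothetical finite graph of configurations (`successors`) and a map `decided` from a configuration to the value some process has decided there; it collects the decision values reachable from a configuration (the configuration itself included) and classifies it.

```python
from typing import Dict, List, Optional, Set

def reachable_decisions(start: str,
                        successors: Dict[str, List[str]],
                        decided: Dict[str, Optional[int]]) -> Set[int]:
    """Decision values of all decided configurations reachable from start (start included)."""
    seen: Set[str] = set()
    stack = [start]
    values: Set[int] = set()
    while stack:
        config = stack.pop()
        if config in seen:
            continue
        seen.add(config)
        if decided.get(config) is not None:
            values.add(decided[config])
        stack.extend(successors.get(config, []))
    return values

def valency(start: str,
            successors: Dict[str, List[str]],
            decided: Dict[str, Optional[int]]) -> str:
    """Classify a configuration as 0-valent, 1-valent or bivalent."""
    values = reachable_decisions(start, successors, decided)
    if values == {0, 1}:
        return "bivalent"
    if values == {0}:
        return "0-valent"
    if values == {1}:
        return "1-valent"
    # Termination rules this case out for a real consensus algorithm.
    return "no decision reachable"

# Hypothetical toy state space: from g we can reach a (then 0 is decided) or b (then 1).
SUCCESSORS = {"g": ["a", "b"], "a": ["a0"], "b": ["b1"], "a0": [], "b1": []}
DECIDED = {"a0": 0, "b1": 1}
for c in ["g", "a", "b"]:
    print(c, valency(c, SUCCESSORS, DECIDED))
```

Here g is bivalent, a is 0-valent and b is 1-valent. The traversal keeps a `seen` set so the sketch also terminates on graphs with cycles.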

Definitions Illustrated 1(4) • 0-decided configuration • A configuration in which some process has decided ”0” • Example: P1 in state2, P2 in state5, P3 in state DECIDE-0, P4 in state7, with msg1 and msg2 in transit, i.e. {STATE2, STATE5, DECIDE-0, STATE7, {msg1, msg2} } • At least one of the processes is in state DECIDE-0

Definitions Illustrated 2(4) • 0-valent configuration • No 1-decided configuration is reachable • The future is determined: ”everyone will decide 0” [Figure: from the 0-valent configuration { P1_state, P2_state, P3_state, P4_state, {msg1} }, every reachable configuration is 0-valent, e.g. {decide-0, P2_state, P3_state, P4_state, {msg1, msg2} } and {decide-0, P2_state3, P3_state, decide-0, {} }]

Definitions Illustrated 3(4) • 1-valent configuration • No 0-decided configuration is reachable • The future is determined: ”everyone will decide 1” [Figure: from the 1-valent configuration { P1_state, P2_state, P3_state, P4_state, {msg1} }, every reachable configuration is 1-valent, e.g. {decide-1, P2_state, P3_state, P4_state, {msg1, msg2} } and {decide-1, P2_state3, P3_state, decide-1, {} }]

Definitions Illustrated 4(4) • Bivalent configuration • Both 0-decided and 1-decided configurations are reachable • The future is undetermined and could go either way [Figure: from the bivalent configuration { P1_state, P2_state, P3_state, P4_state, {msg1} }, one branch reaches 0-valent configurations such as {decide-0, P2_state, decide-0, P4_state, {msg1, msg2} } and another reaches 1-valent configurations such as {decide-1, P2_state9, P3_state6, decide-1, {} }]

Bivalent Initial Configuration • Theorem • For any algorithm that solves the 1-crash consensus problem there exists an initial bivalent configuration

Proof 1/(10) • We know that the algorithm must be non-trivial • There must be some initial configuration that leads to a 0-decision • There must be some initial configuration that leads to a 1-decision • Take two such configurations i1 and i2 • E.g. 5 processes • Initial values (0,1,0,1,1) lead to 1 • Initial values (0,0,1,0,0) lead to 0

Proof 2/(10) • We know there exist inputs for p1, p2, p3, p4, p5 • (0,1,0,1,1) leading to 1 • (0,0,1,0,0) leading to 0 • Let's look at other initial configurations, flipping one input at a time to transform the upper input into the lower input

Proof 3/(10) • We know there exist inputs for p1, p2, p3, p4, p5 • (0,1,0,1,1) leading to 1 • (0,0,0,1,1) leading to ? • (0,0,1,0,0) leading to 0 • Let's look at other initial configurations, flipping one input at a time to transform the upper input into the lower input

Proof 4/(10) • We know there exist inputs for p1, p2, p3, p4, p5 • (0,1,0,1,1) leading to 1 • (0,0,0,1,1) leading to ? • (0,0,1,1,1) leading to ? • (0,0,1,0,0) leading to 0 • Let's look at other initial configurations, flipping one input at a time to transform the upper input into the lower input

Proof 5/(10) • We know there exist inputs for p1, p2, p3, p4, p5 • (0,1,0,1,1) leading to 1 • (0,0,0,1,1) leading to ? • (0,0,1,1,1) leading to ? • (0,0,1,0,1) leading to ? • (0,0,1,0,0) leading to 0 • Let's look at other initial configurations, flipping one input at a time to transform the upper input into the lower input

Proof 6/(10) • We know there exist inputs for p1, p2, p3, p4, p5 • (0,1,0,1,1) leading to 1 • (0,0,0,1,1) leading to ? • (0,0,1,1,1) leading to ? • (0,0,1,0,1) leading to ? • (0,0,1,0,0) leading to 0 • Let's look at other initial configurations, flipping one input at a time to transform the upper input into the lower input • Since the two ends of this chain have different outcomes, there must exist two neighboring configurations in it with different outcomes

Proof 7/(10) • We know there exist inputs for p1, p2, p3, p4, p5 • (0,1,0,1,1) leading to 1 • (0,0,0,1,1) leading to 1 • (0,0,1,1,1) leading to 1 • (0,0,1,0,1) leading to 0 • (0,0,1,0,0) leading to 0 • Assume, for example, these outcomes; the neighboring pair with different outcomes is then (0,0,1,1,1) and (0,0,1,0,1)

Proof 8/(10) • We know there exist inputs for p1, p2, p3, p4, p5 • (0,1,0,1,1) leading to 1 • (0,0,0,1,1) leading to 1 • (0,0,1,1,1) leading to 1 • (0,0,1,0,1) leading to 0 • (0,0,1,0,0) leading to 0 • Consider the two neighbors (0,0,1,1,1) and (0,0,1,0,1): they are identical initial configurations except for the input of process p4

Proof 9/(10) • We know there exist inputs for p1, p2, p3, p4, p5 • Consider the two neighbors • (0,0,1,1,1) leading to 1 • (0,0,1,0,1) leading to 0 • The consensus algorithm must tolerate the crash of p4! • (0,0,1,X,1), where X marks the crashed p4, leads to ? (either 0 or 1)

Proof 10/(10) • We know there exist inputs for p1, p2, p3, p4, p5 • (0,0,1,1,1) leading to 1 • (0,0,1,0,1) leading to 0 • The consensus algorithm must tolerate the crash of p4! • (0,0,1,X,1) leads to ? (either 0 or 1) • If p4 crashes at the start, the other processes cannot distinguish (0,0,1,1,1) from (0,0,1,0,1), so the same execution, with the same decision as (0,0,1,X,1), is possible from both • If it leads to 1, then depending on whether p4 crashes or not, (0,0,1,0,1) leads to either 1 or 0 (bivalent) • If it leads to 0, then depending on whether p4 crashes or not, (0,0,1,1,1) leads to either 0 or 1 (bivalent)
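The chain argument of Proof 2/(10) to 10/(10) can be sketched in Python. The function names are hypothetical, and `outcome` stands in for the decision of a failure-free run from a given input vector; the example uses the outcomes assumed in Proof 7/(10), not outcomes of any particular algorithm.

```python
from typing import Callable, List, Tuple

Inputs = Tuple[int, ...]

def input_chain(i1: Inputs, i2: Inputs) -> List[Inputs]:
    """Go from i1 to i2, flipping one differing input at a time (left to right)."""
    chain = [i1]
    current = list(i1)
    for k in range(len(i1)):
        if current[k] != i2[k]:
            current[k] = i2[k]
            chain.append(tuple(current))
    return chain

def neighboring_pair(i1: Inputs, i2: Inputs,
                     outcome: Callable[[Inputs], int]) -> Tuple[Inputs, Inputs]:
    """Return adjacent inputs in the chain whose outcomes differ.

    Such a pair must exist whenever outcome(i1) != outcome(i2).
    """
    chain = input_chain(i1, i2)
    for a, b in zip(chain, chain[1:]):
        if outcome(a) != outcome(b):
            return a, b
    raise ValueError("all inputs in the chain have the same outcome")

# Outcomes assumed in Proof 7/(10) (hypothetical, for illustration only).
ASSUMED = {
    (0, 1, 0, 1, 1): 1, (0, 0, 0, 1, 1): 1, (0, 0, 1, 1, 1): 1,
    (0, 0, 1, 0, 1): 0, (0, 0, 1, 0, 0): 0,
}
a, b = neighboring_pair((0, 1, 0, 1, 1), (0, 0, 1, 0, 0), ASSUMED.__getitem__)
print(a, b)  # (0, 0, 1, 1, 1) (0, 0, 1, 0, 1)
# The two neighbors differ only in the input of one process (p4, index 3).
print([k for k in range(len(a)) if a[k] != b[k]])  # [3]
```

Because every step of the chain flips exactly one input, the pair found always differs in a single process, which is the process whose crash the proof then considers.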

Initial Bivalence • Intuition • Given any algorithm, we can find some initial configuration that, depending on the failure of one process, will lead either to a 0-decision or to a 1-decision [Figure: a bivalent initial configuration { P1_state, P2_state, P3_state, P4_state, {msg1} } with one branch through 1-valent configurations, e.g. { P1_state, P2_state, decide-1, P4_state, {msg1, msg2} }, and another through 0-valent configurations, e.g. {decide-0, decide-0, P3_state, decide-0, {} }]

Coarse-grained Model of Distributed Systems • In our model, we will now let each event be the receipt of a message • After the receipt of a message m, a process deterministically performs all internal and send events it can do • In other words, we make our fine-grained model a bit more coarse-grained • An event represents the receipt of a message, followed by some internal transitions and the sending of some messages • A receipt of message m at process p is always applicable if a message m with destination p is in the network

Intuition behind model
Initial state of p:
  receive <tok, y> from q;
  for x := 1 to 3 do
  begin
    y := y + 1;
    send <tok, y> to neigh_p[x];
  end;
  receive <tok, z> from q;
  print z + y
[Figure: receipt event e (the first receive) followed by deterministic transitions yields the state of p after receipt of e; receipt event f (the second receive) followed by deterministic transitions yields the state of p after receipt of f]
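To make the coarse-grained event concrete, here is a minimal Python sketch of process p from the slide. It assumes a hypothetical encoding in which each call to `on_receive` is one event: it consumes one message and deterministically performs all the internal steps and sends that follow, up to the next receive. The neighbor names and the message tuple layout are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Message = Tuple[str, str, int]  # (destination, tag, value): hypothetical encoding

@dataclass
class ProcessP:
    """Process p from the slide; each call to on_receive is one coarse-grained event."""
    neighbors: List[str]          # neigh_p[1..3]; names are illustrative
    phase: int = 0                # 0: before first receive, 1: before second, 2: done
    y: int = 0
    output: Optional[int] = None  # value printed by "print z + y"

    def on_receive(self, tag: str, value: int) -> List[Message]:
        """Receive one message, then run deterministically up to the next receive."""
        sent: List[Message] = []
        if self.phase == 0:                 # receive <tok, y> from q
            self.y = value
            for x in range(3):              # for x := 1 to 3 do
                self.y += 1
                sent.append((self.neighbors[x], tag, self.y))
            self.phase = 1
        elif self.phase == 1:               # receive <tok, z> from q; print z + y
            self.output = value + self.y
            self.phase = 2
        return sent

p = ProcessP(neighbors=["a", "b", "c"])
print(p.on_receive("tok", 10))  # [('a', 'tok', 11), ('b', 'tok', 12), ('c', 'tok', 13)]
p.on_receive("tok", 5)
print(p.output)                 # 18
```

The loop and its three sends never appear as separate events: they are folded into the receipt event e, so the only choices left to the scheduler are which message is received next.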

Order of events • Intuition • The order in which two applicable events on different processes are executed is not important! • Order Theorem • Let $e_p$ and $e_q$ be two events on two different processes $p$ and $q$ which are both applicable in configuration $\gamma$. Then $e_p$ can be applied to $e_q(\gamma)$, and $e_q$ can be applied to $e_p(\gamma)$. • Moreover, $e_p(e_q(\gamma)) = e_q(e_p(\gamma))$.

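Definitions • A sequence of events $\sigma = (e_1, e_2, \ldots, e_k)$ is applicable in configuration $\gamma$ if • $e_1$ is applicable in $\gamma$, • $e_2$ is applicable in $e_1(\gamma)$ • ... • If the resulting configuration is $\delta$, we write $\sigma(\gamma) = \delta$ or $\gamma \leadsto_\sigma \delta$ • If $\sigma$ only contains events of processes in a subset $P$ of the processes, we write $\gamma \leadsto_P \delta$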

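Order of sequences • Diamond Theorem • Let sequences $\sigma_1$ and $\sigma_2$ be applicable in configuration $\gamma$, and let no process participate in both $\sigma_1$ and $\sigma_2$. Then $\sigma_2$ is applicable in $\sigma_1(\gamma)$, $\sigma_1$ is applicable in $\sigma_2(\gamma)$, and $\sigma_1(\sigma_2(\gamma)) = \sigma_2(\sigma_1(\gamma))$ • Proof • By induction on the lengths of the sequences, using the Order Theorem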
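The Order and Diamond Theorems can be checked on a toy instance of the coarse-grained model. In this Python sketch a configuration is a pair of local states and a multiset of messages in transit, an event is the receipt of one message at its destination, and the handler (add the payload to the local state, forward payload minus one to a fixed neighbor) is a hypothetical stand-in for a real local algorithm. An event only reads and writes its own process's state, removes only its own message and only adds messages, which is why events on different processes commute.

```python
from collections import Counter
from typing import Dict, List, Tuple

Msg = Tuple[str, int]                    # (destination process, payload)
Config = Tuple[Dict[str, int], Counter]  # (local states, messages in transit)
Event = Tuple[str, Msg]                  # receipt of msg at a process

# Hypothetical toy algorithm: each process forwards payload - 1 to a fixed neighbor.
NEXT = {"p": "q", "q": "r", "r": "s", "s": "p"}

def apply_event(config: Config, event: Event) -> Config:
    """One coarse-grained event: consume msg, update the local state, send deterministically."""
    proc, msg = event
    states, network = config
    if msg[0] != proc or network[msg] == 0:
        raise ValueError(f"event {event} is not applicable")
    states, network = dict(states), Counter(network)  # configurations are immutable values
    network[msg] -= 1
    if network[msg] == 0:
        del network[msg]
    states[proc] += msg[1]
    if msg[1] > 0:
        network[(NEXT[proc], msg[1] - 1)] += 1
    return states, network

def apply_sequence(config: Config, sigma: List[Event]) -> Config:
    """sigma(config): apply e1, then e2 to the result, and so on."""
    for event in sigma:
        config = apply_event(config, event)
    return config

gamma: Config = ({"p": 0, "q": 0, "r": 0, "s": 0}, Counter({("p", 2): 1, ("r", 3): 1}))
sigma1 = [("p", ("p", 2)), ("q", ("q", 1))]  # only processes p and q
sigma2 = [("r", ("r", 3)), ("s", ("s", 2))]  # only processes r and s
print(apply_sequence(gamma, sigma1 + sigma2) == apply_sequence(gamma, sigma2 + sigma1))  # True
```

Applying the two sequences in either order gives the same configuration; the Order Theorem is the special case of two sequences of length one.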

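Illustration of the Diamond Theorem [Figure: from $\gamma$, applying $\sigma_1$ gives $\sigma_1(\gamma)$ and applying $\sigma_2$ gives $\sigma_2(\gamma)$; applying the other sequence to each reaches the same configuration $\delta = \sigma_2(\sigma_1(\gamma)) = \sigma_1(\sigma_2(\gamma))$]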

Bivalent Configuration • Any configuration of a 1-crash-robust consensus algorithm is exactly one of these three • Bivalent • 0-valent • 1-valent • Why? • Any configuration leads to a decision because of termination • We know bivalent configurations exist • If a configuration is not bivalent, all decided configurations reachable from it carry the same value, so it is either 0-valent or 1-valent

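Bivalent Configurations • In any bivalent config $\gamma$, either • Case 1: some applicable event leads to a bivalent config, or • Case 2: there exist two applicable events leading to a 0-valent and a 1-valent configuration, respectively • If neither held, all successors of $\gamma$ would be 0-valent, or all would be 1-valent, and $\gamma$ itself would not be bivalent [Figure: Case 1, a bivalent config with a bivalent successor; Case 2, a bivalent config with a 0-valent and a 1-valent successor]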

Staying Bivalent • Theorem • Given any bivalent config $\gamma$ and an event $e$ applicable in $\gamma$ • There exists a config $\delta$ reachable from $\gamma$ in which $e$ is applicable and $e(\delta)$ is bivalent [Figure: from bivalent $\gamma$, a path of other events leads to $\delta$, and applying $e$ there gives a bivalent configuration]

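Proof definitions • Assume $e$ involves process $p$ • Call $C$ the set of all configs reachable from $\gamma$ without applying $e$ • $e$ remains applicable in every config of $C$, since the message it receives stays in the network until $e$ is applied • Apply event $e$ to all configs in $C$ and call the resulting set of configs $D$ [Figure: the region $C$ around bivalent $\gamma$, with an $e$-edge from each config in $C$ into the region $D$]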
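On an explicit, finite transition graph the sets C and D can be computed directly. The Python sketch below assumes a hypothetical graph encoding in which each configuration maps event labels to successor configurations, so that "the same event e" is identified by its label in every configuration; it explores from gamma while never taking an e-edge, then applies e to every config found.

```python
from typing import Dict, Set, Tuple

# Hypothetical encoding: configuration -> {event label: successor configuration}.
Graph = Dict[str, Dict[str, str]]

def split_c_d(graph: Graph, gamma: str, e: str) -> Tuple[Set[str], Set[str]]:
    """C: configs reachable from gamma without applying e; D: e applied to configs in C."""
    C: Set[str] = set()
    stack = [gamma]
    while stack:
        config = stack.pop()
        if config in C:
            continue
        C.add(config)
        for label, nxt in graph.get(config, {}).items():
            if label != e:  # never take an e-transition while exploring C
                stack.append(nxt)
    D = {graph[c][e] for c in C if e in graph.get(c, {})}
    return C, D

# Toy graph: from gamma, events f and h lead to g1 and g2; e is applicable everywhere in C.
GRAPH: Graph = {
    "gamma": {"e": "d0", "f": "g1"},
    "g1": {"e": "d1", "h": "g2"},
    "g2": {"e": "d2"},
    "d0": {}, "d1": {}, "d2": {},
}
C, D = split_c_d(GRAPH, "gamma", "e")
print(sorted(C), sorted(D))  # ['g1', 'g2', 'gamma'] ['d0', 'd1', 'd2']
```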

Proof intuition • We will prove that $D$ contains a bivalent config by contradiction • I.e., assume there exists no bivalent config in $D$, and show that this leads to a contradiction

Proof • Assume D contains no bivalent configs • I.e. all configs in D are either 0-valent or 1-valent • Then it follows that there exists a 0-valent and a 1-valent config in D (next slides)

Proof • Since $\gamma$ is bivalent, we can reach a 0-valent and a 1-valent config from it; call them $\delta_1$ and $\delta_2$ • Either $\delta_1$ and $\delta_2$ are in $C$ or they are not in $C$ • If they are inside $C$, then $e(\delta_1)$ and $e(\delta_2)$ are in $D$, and they are 0-valent and 1-valent respectively, since they are reachable from $\delta_1$ and $\delta_2$ [Figure: the two cases, $\delta_1$ and $\delta_2$ in $C$, and $\delta_1$ and $\delta_2$ not in $C$]

Proof • If $\delta_1$ is not inside $C$, then $e$ was applied on the path from $\gamma$ to $\delta_1$; let $\delta'_1$ be the config in $C$ just before the first application of $e$ on that path, and define $\delta'_2$ likewise for $\delta_2$ • Then $e(\delta'_1)$ and $e(\delta'_2)$ are in $D$, and $\delta_1$ and $\delta_2$ are reachable from them, so they are 0-valent and 1-valent respectively • [Remember we assumed no bivalent config exists in $D$]

Reflection • We now know that $D$ must contain both a 0-valent and a 1-valent config, assuming no bivalent config exists in $D$ • Let's call these 0-valent and 1-valent configs $d_0$ and $d_1$ • We will now show that this situation is itself a contradiction. Hence, $D$ must contain a bivalent config

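Deriving the contradiction • There must exist two configs $c_0$ and $c_1$ in $C$ such that $c_1 = f(c_0)$ for some event $f$, and $d_0 = e(c_0)$ is 0-valent while $d_1 = e(c_1)$ is 1-valent [Figure: $c_0 \xrightarrow{f} c_1$ in $C$, with $e$-edges to $d_0$ and $d_1$ in $D$] • Let's see why!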

Proving that two neighbors exist 1(4) • We know $\gamma$ is bivalent, and $e(\gamma)$ is in $D$, so it is either 0-valent or 1-valent; assume it is 0-valent [Figure: $\gamma$ in $C$, with $e(\gamma)$ 0-valent in $D$]

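Proving that two neighbors exist 2(4) • We know $\gamma$ is bivalent, and $e(\gamma)$ is in $D$, so it is either 0-valent or 1-valent; assume it is 0-valent • There is a reachable 1-valent config in $D$; it equals $e(\gamma_m)$ for some $\gamma_m$ in $C$, reached by a path $\gamma = \gamma_0 \xrightarrow{f_0} \gamma_1 \xrightarrow{f_1} \cdots \xrightarrow{f_{m-1}} \gamma_m$ inside $C$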

Proving that two neighbors exist 3(4) • We know $\gamma$ is bivalent, and $e(\gamma)$ is in $D$, so it is either 0-valent or 1-valent; assume it is 0-valent • There is a reachable 1-valent config $e(\gamma_m)$ in $D$, for a path $\gamma = \gamma_0 \xrightarrow{f_0} \gamma_1 \xrightarrow{f_1} \cdots \xrightarrow{f_{m-1}} \gamma_m$ in $C$ • $e$ is applicable in each $\gamma_i$, and $e(\gamma_i)$ is in $D$, so it must be 0-valent or 1-valent

Proving that two neighbors exist 4(4) • We know $\gamma$ is bivalent, and $e(\gamma)$ is in $D$, so it is either 0-valent or 1-valent; assume it is 0-valent • There is a reachable 1-valent config $e(\gamma_m)$ in $D$ • $e$ is applicable in each $\gamma_i$, and $e(\gamma_i)$ must be 0-valent or 1-valent • Since $e(\gamma_0)$ is 0-valent and $e(\gamma_m)$ is 1-valent, there exist two neighbors $\gamma_i$ and $\gamma_{i+1} = f_i(\gamma_i)$ such that $e(\gamma_i)$ is 0-valent and $e(\gamma_{i+1})$ is 1-valent

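Proving that two neighbors exist 4(4) • We know $\gamma$ is bivalent, and $e(\gamma)$ is in $D$, so it is either 0-valent or 1-valent; assume it is 0-valent • There is a reachable 1-valent config in $D$ • $e$ is applicable in each $\gamma_i$, and $e(\gamma_i)$ is 0-valent or 1-valent • There exist two neighbors, one mapped by $e$ to a 1-valent config and one to a 0-valent config [Figure: neighbors $c_0 \xrightarrow{f} c_1$ in $C$, with $e(c_0)$ 0-valent and $e(c_1)$ 1-valent in $D$]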

Neighbors lead to contradiction 1(3) • We now know there exist two configs $c_0$ and $c_1$ in $C$ such that $c_1 = f(c_0)$, $d_0 = e(c_0)$ is 0-valent and $d_1 = e(c_1)$ is 1-valent • The events $e$ and $f$ happen either on the same process or on different processes; both cases lead to contradictions

Neighbors lead to contradiction 2(3) • We now know there exist two configs $c_0$ and $c_1$ in $C$ such that $c_1 = f(c_0)$, $d_0 = e(c_0)$ is 0-valent and $d_1 = e(c_1)$ is 1-valent • Assume $e$ and $f$ happen on two different processes $p$ and $q$ • Then, by the Order Theorem, the order of their execution can be exchanged: $f(d_0) = f(e(c_0)) = e(f(c_0)) = e(c_1) = d_1$ • Contradiction: $d_0$ is 0-valent, but it can reach the 1-valent config $d_1$, so $d_0$ must be bivalent; but we assumed no bivalent configs exist in $D$

Neighbors lead to contradiction 3(3) • We now know there exist two configs $c_0$ and $c_1$ in $C$ such that $c_1 = f(c_0)$, $d_0 = e(c_0)$ is 0-valent and $d_1 = e(c_1)$ is 1-valent • Assume $e$ and $f$ happen on the same process $p$; the algorithm must still work if $p$ is silent (crashed) • If $p$ is silent, the algorithm should continue from $c_0$ and terminate with a decision in some config $A = \sigma(c_0)$, where $\sigma$ contains no events of $p$ • By the Diamond Theorem, $e(A) = \sigma(d_0)$, which is 0-valent, so from $A$ some execution leading to 0 exists • Likewise, $e(f(A)) = \sigma(d_1)$, which is 1-valent, so from $A$ some execution leading to 1 exists • Contradiction: $A$ is a decided configuration and should therefore be 0-valent or 1-valent, but we have shown that $A$ can lead to both 0 and 1

Proof Map • Assume there is no bivalent config in D • We know all configs in D are 0-valent or 1-valent • Show that we can find a 0-valent and a 1-valent config in D • Show that two neighboring configs c0 ─f→ c1 exist in C, where c0 ─e→ ”0-valent config” and c1 ─e→ ”1-valent config” • Show this is a contradiction • The assumption must be incorrect • D must contain a bivalent configuration

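Final Theorem • No deterministic 1-crash-robust consensus algorithm exists for the asynchronous model • Proof • 1. Start in an initial bivalent config • 2. Given the bivalent config, pick the event $e$ that has been applicable longest • 3. Pick the execution taking us to another config where $e$ is applicable and applying $e$ yields a bivalent config (Staying Bivalent theorem) • 4. Apply $e$, and get a bivalent config • 5. Repeat from step 2 • The resulting execution never reaches a decision, yet it is fair and no process crashes: each round applies the longest-waiting event, so every applicable event is eventually applied. This contradicts termination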

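Consensus not Impossible! • Let's design a deterministic consensus algorithm for a different failure model • Initially dead processes • Assume up to $t$ processes may be dead from the start, where $t = \lfloor (N-1)/2 \rfloor$, the largest $t$ with $t < N/2$ • E.g. $t = 4$ for $N = 10$, $t = 5$ for $N = 11$ • Let $L = N - t$, e.g. $L = 6$ for $N = 10$ and $L = 6$ for $N = 11$, so $N = t + L$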

Intuition • Assume $N$ processes are connected in an underlying graph, and at most $t$ fail • We know at least $L$ processes are alive after the start • Broadcast your identity, and receive/collect $L$ identities • For any two correct processes, their sets of collected identities will overlap • Quorum concept • There are $N$ nodes, and any two processes have $L$ identities each, i.e. $2L \geq N+1$ identities in total • With at least $N+1$ identities spread over only $N$ nodes, at least one identity must appear in both sets (pigeonhole principle, PHP)
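The arithmetic behind the quorum argument can be checked by brute force for small N. This Python sketch assumes $t = \lfloor (N-1)/2 \rfloor$, matching the examples on the previous slide, and tests whether every two L-element subsets of the N processes intersect. With L = N − t this holds, while a smaller quorum such as 5 out of 10 admits two disjoint groups. The function names are illustrative.

```python
from itertools import combinations

def max_initial_failures(n: int) -> int:
    """Largest t with t < n/2, i.e. floor((n - 1) / 2)."""
    return (n - 1) // 2

def quorum_size(n: int) -> int:
    """L = N - t: identities each process collects before proceeding."""
    return n - max_initial_failures(n)

def all_quorums_intersect(n: int, size: int) -> bool:
    """Brute force: do any two size-element subsets of n processes share a member?"""
    quorums = [frozenset(q) for q in combinations(range(n), size)]
    return all(a & b for a, b in combinations(quorums, 2))

for n in (10, 11):
    t, L = max_initial_failures(n), quorum_size(n)
    # 2L >= N + 1 is the pigeonhole condition from the slide.
    print(n, t, L, 2 * L >= n + 1, all_quorums_intersect(n, L))
print(all_quorums_intersect(10, 5))  # False: two disjoint groups of 5 exist
```

The brute force only confirms the counting argument for these two sizes; the pigeonhole bound $2L \geq N + 1$ is what makes it hold for every $N$.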
