Reaching Consensus: Why it can’t be done

  1. Reaching Consensus: Why it can’t be done. For Distributed Algorithms 2014. Presentation by Ziv Ronen. Based on “Impossibility of Distributed Consensus with One Faulty Process” by Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson.

  2. Main Menu • The problem • Why the problem is unsolvable • If time allows: how to solve the problem with initially faulty processors

  3. The Problem: • Consensus in the real world • Our mission • Model: • Objectives • Network • Possible faults

  4. Consensus in the real world • There are many cases in which we want several processors to agree on an action. • Usually, it is more important that all processors agree on the same action than which action is chosen. • For example, in a database we want every transaction to be committed by all processors or by none of them.

  5. Consensus in the real world – cont. • Such agreement in a fault-free network is trivial. • For instance, we can choose a leader that tells all the others what to do. • However, real-world processors are subject to failures: • They might stop working (good case). • They might go haywire (bad case). • They might become malevolent (worse case).

  6. Our mission • We want to find an algorithm that, for any decision in every network, chooses a single action to perform. • However, there must be at least two possible options, and both of them must actually be reachable outcomes.

  7. Our Model - objectives • We will work on a simplified problem, in which the processors only need to agree on a number that can be either 1 (commit) or 0 (discard). • Initially, each processor chooses its initial number randomly (simulating decisions based on the system condition): • 1 if it can commit, 0 if it can’t. • Each processor needs to choose an action. After the action is chosen, it cannot be undone. • In the end, all the processors need to agree on the action, meaning they all choose 1 or all choose 0.

  8. Our Model – objectives (cont.) • We will require that the algorithm can return both 1 and 0 (perhaps for different cases). • So “always discard” or “always commit” is not an acceptable policy for our database.

  9. Our Model – Network • We will assume a fully asynchronous network: • If we send a message to a non-faulty processor, it will reach it after a finite but unbounded time. • We will also assume the network is fully connected. For generality we will also assume full knowledge of direction, • so any other topology can be simulated.

  10. Why asynchronous? If the processors work… [diagram: the same run of P1 and P2 exchanging message M2, once in a synchronous (tick-based) network and once in an asynchronous network]

  11. Why asynchronous? But if one fails… [diagram: the same two runs, except that P2 is faulty]

  12. Our Model – Possible faults • We will assume that processors can only stop working entirely. • We will also assume that only a single processor can malfunction in any given run. • However, we will assume that: • Other processors can’t tell that a processor has stopped working. • A processor can fail at any given time.

  13. Our Model - more formally • N≥2 processors. • For each processor: • Input value Xp ∈ {0,1}, part of the problem input. • Output value yp ∈ {0,1,b}, initially b, can only change once. • Infinite storage. • Messages are of the form (p,m) where p is the target processor and m is the message. Any processor can send such a message to any other processor. • We will assume that every message stays in a “message buffer” between the time it is sent and the time it is received. • Initially, the buffer is empty. • Goal: at the end, for every p1,p2: yp1 = yp2 ≠ b.
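
This formal model can be mirrored almost directly in code. The sketch below is purely illustrative (class and field names such as ProcessorState, Configuration and decide are my own, not the paper's); it captures the write-once output register yp, the input value Xp, the unbounded local storage, and the shared message buffer, and builds the initial configuration used in the example on the next slide.

```python
# A minimal sketch of the slide's formal model (names are illustrative,
# not from the paper): each processor holds an input x, an output y that
# starts as 'b' and may be written at most once, plus unbounded local
# storage; undelivered messages sit in a shared buffer of (target, message).
from dataclasses import dataclass, field

B = 'b'  # the "undecided" output value

@dataclass
class ProcessorState:
    x: int            # input value, 0 or 1
    y: object = B     # output value: 0, 1, or B; written at most once
    memory: dict = field(default_factory=dict)  # "infinite" local storage

    def decide(self, v: int) -> None:
        # The output register is write-once: deciding twice is an error.
        assert self.y == B, "output value can only change once"
        self.y = v

@dataclass
class Configuration:
    states: dict      # processor id -> ProcessorState
    buffer: list      # multiset of (target processor, message)

# Example initial configuration for N = 4 processors with inputs 1,0,1,0
# and an empty buffer, matching the example on the next slide.
initial = Configuration(
    states={p: ProcessorState(x=v) for p, v in enumerate([1, 0, 1, 0], start=1)},
    buffer=[],
)
print(initial.states[1].y)  # -> 'b' (no processor has decided yet)
```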

  14. Our model – example, initial state [diagram: message buffer empty; processor 1: X1=1, Y1=b; processor 2: X2=0, Y2=b; processor 3: X3=1, Y3=b; processor 4: X4=0, Y4=b]

  15. Our model – example, different state [diagram: message buffer holds (2,m1), (4,m2), (4,m3), (2,m2), (2,m3); processor 1: X1=1, Y1=b; processor 2: X2=0, Y2=0; processor 3: X3=1, Y3=0; processor 4: X4=0, Y4=b]

  16. Our model – example, final state [diagram: message buffer holds (2,m1), (4,m2), (2,m3); all processors have decided: Y1=Y2=Y3=Y4=0, with X1=1, X2=0, X3=1, X4=0]

  17. Why Consensus is impossible: • Intuition • Proof • Definitions • Lemma 1 • Lemma 2 • Lemma 3

  18. Intuition • Let us show the intuition for why this is an impossible task. • We will demonstrate it on the problem of database consensus: • All the databases should have output value 1 if all working databases have input value 1. • All the databases should have output value 0 if at least one working database has input value 0. • Here, working means not failing at the beginning of the algorithm.

  19. Initial state • We will choose an initial state where both results are possible. • In our case, if processor 1 fails during the algorithm, the result might be 1. • Otherwise, the result should be 0. [diagram: X1=0, X2=1, X3=1, X4=1; all outputs still b]

  20. Case 1 • If 1 sends its first message: • All processors know that it can’t commit. • The algorithm should decide 0. [diagram: processor 1 tells the others “I failed to commit”; decision 0]

  21. Case 2 • If 1 fails before sending this message, the algorithm should decide without it. • Since all other processors can commit, the algorithm should decide 1. [diagram: processor 1 stays silent; decision 1]

  22. Quasi failure • Let us say that a processor “quasi failed” if: • It may be alive or dead. • If it is alive, it will execute its next step only after the algorithm has “finished” without it. [diagram: processor 1, X1=0, Y1=b]

  23. Quasi failure - Intuition [diagram: a quasi-failed processor compared to Schrödinger's cat; processor 1 (X1=0, Y1=b) shown twice]

  24. Quasi failure – our example • If 1 quasi failed: • The algorithm has three choices: [diagram: X1=0, X2=1, X3=1, X4=1; all outputs still b]

  25. Quasi failure choices (1/3) • Decide 0. • In this case, if processor 1 actually failed: • The result will be wrong! [diagram: decision 0]

  26. Quasi failure choices (2/3) • Decide 1. • In this case, if the processor wakes up: • The result will be wrong! [diagram: decision 1]

  27. Quasi failure choices (3/3) • Not deciding. • In this case, if the processor actually failed: • The algorithm will never decide. [diagram: decision ?]

  28. Intuition – summary • There is an initial state where both answers are possible (Lemma 2). • There is an event at a specific processor (in our case, processor 1 starting to work and sending its message) whose occurrence, no matter when it happens (Lemma 1), determines the outcome. • If a processor quasi-fails, we can’t decide (because the answer depends on whether it actually failed, and we can’t know that). • If we do not decide, we will reach another such state (Lemma 3) and be stuck forever.

  29. Intuition – summary (cont.) • Remember that in the example we forced the processors to agree according to some policy. In the real problem (and in the following proof) we just need them to agree on the same value, no matter which.

  30. Proof – definitions (1/6) • Configuration: the combination of the internal state (input, output, memory) of each processor and the messages in the buffer. • Step: an action of one processor. For processor p, it consists of: • Trying to receive a message (removing it from the message buffer). If it succeeds, p receives (p,m); if it fails, p receives (p,∅). • Conducting computation. It may send any finite number of messages.

  31. Configuration and step [diagram: two steps on the example configuration; the message (2,m1) passes through the message buffer and processor 2 sets Y2=1]

  32. Proof – definitions (2/6) • Event e=(p,m): the receipt of message m by p. • Since our processors are deterministic, the change of the configuration caused by a step depends only on the received message. • The event e=(p,∅) is always possible for any p. • e(C): the configuration reached from C by the event e. • Schedule: a finite or infinite sequence σ of events. • σ(C): the configuration reached by applying the schedule σ to the initial configuration C.
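
To make e(C) and σ(C) concrete, here is a small sketch (my own illustrative code, not from the paper): apply_event delivers a message, or the always-available empty message, to one processor and runs its deterministic step, and apply_schedule simply folds a finite schedule over a configuration. The transition function is an assumed stand-in for the protocol's step function.

```python
# A toy sketch of how an event e = (p, m) turns one configuration into the
# next, and how a schedule sigma is just a fold of events.  `transition`
# stands in for the protocol's deterministic step function of processor p.
import copy

NULL = None  # the always-available "empty" message (p, NULL)

def apply_event(config, event, transition):
    """e(C): deliver message m to p (or NULL), then run p's deterministic step."""
    p, m = event
    new = copy.deepcopy(config)              # configurations are values
    if m is not NULL:
        new['buffer'].remove((p, m))         # m must actually be in the buffer
    new_state, sent = transition(p, new['states'][p], m)
    new['states'][p] = new_state
    new['buffer'].extend(sent)               # sent: finite list of (target, message)
    return new

def apply_schedule(config, schedule, transition):
    """sigma(C): apply the events of a finite schedule in order."""
    for event in schedule:
        config = apply_event(config, event, transition)
    return config

# A trivial transition function: remember the last message, send nothing.
def transition(p, state, m):
    return dict(state, last=m), []

C = {'states': {1: {'x': 1}, 2: {'x': 0}}, 'buffer': [(2, 'm1')]}
C2 = apply_schedule(C, [(1, NULL), (2, 'm1')], transition)
print(C2['buffer'])             # -> []  (m1 was consumed by processor 2)
print(C2['states'][2]['last'])  # -> 'm1'
```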

  33. Event and sequences [diagram: the schedule σ = ((1,∅),(2,m1)) applied to the example configuration; processor 2 receives m1 and sets Y2=1]

  34. Proof – definitions (3/6) • Reachable: configuration C is reachable from C’ if there exists a schedule σ such that σ(C’) = C. • Accessible configuration: configuration C is accessible if there exists an initial configuration C’ such that C is reachable from C’. • DV(C): the set {v | v≠b and ∃p: v=yp}, i.e. the values that were chosen by some processor. • A protocol is partially correct if: • For every accessible configuration C, |DV(C)|≤1. • Two accessible configurations C,C’ exist such that DV(C)={0} and DV(C’)={1}.
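
DV(C) translates directly into code. The helper below is a sketch only, using the same plain-dictionary representation of configurations as in the earlier sketches (the names are mine, not the paper's).

```python
# A small illustrative helper computing DV(C): the set of values other
# than 'b' that some processor has already written to its output register.
B = 'b'

def decision_values(config):
    """DV(C) = { v | v != b and some processor p has y_p = v }"""
    return {s['y'] for s in config['states'].values() if s['y'] != B}

C = {'states': {1: {'y': B}, 2: {'y': 0}, 3: {'y': 0}, 4: {'y': B}}}
print(decision_values(C))   # -> {0}
# Partial correctness demands |DV(C)| <= 1 for every accessible C,
# and that both {0} and {1} occur for some accessible configurations.
```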

  35. Partial correctness [diagram: example configurations with DV(C)={}, DV(C)={0} and DV(C)={0,1}]

  36. Proof – definitions (4/6) • Nonfaulty: a processor is nonfaulty if it takes an infinite number of steps. • Faulty: a processor that is not nonfaulty (it stops taking steps after some time). • Admissible: a run is admissible if it contains at most one faulty processor and the message buffer is fair (every message sent to a nonfaulty processor is eventually received). • Deciding: a run is deciding if eventually, for some processor p, yp≠b. • A protocol P is totally correct in spite of one fault if: • P is partially correct. • Every admissible run of P is a deciding run.

  37. Main Theorem • No consensus protocol is totally correct in spite of one fault. • We will assume the contrary: assume some protocol P’ is totally correct in spite of one fault.

  38. Lemma 1 • For any two disjoint finite schedules σ1, σ2 and any configuration C to which both can be applied: σ1(σ2(C)) = σ2(σ1(C)). • Disjoint: involving different processors. • Proof: • Follows from the system definition, since σ1 and σ2 don’t interact.
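
Lemma 1 can be sanity-checked on a toy model in a few lines. The sketch below is not the paper's proof, only an illustration of the idea under a strong assumption I make explicit in the code: each event touches only its own processor's state, so two schedules over disjoint processors commute.

```python
# A toy check of the commutativity idea behind Lemma 1: if two schedules
# touch disjoint sets of processors, applying them in either order from
# the same configuration yields the same configuration.
import copy

def step(config, p, note):
    """A stand-in event on processor p: it only touches p's own state."""
    new = copy.deepcopy(config)
    new[p] = new.get(p, []) + [note]
    return new

def run(config, schedule):
    for p, note in schedule:
        config = step(config, p, note)
    return config

C = {}                                # toy configuration: per-processor logs
sigma1 = [(1, 'a'), (1, 'b')]         # only involves processor 1
sigma2 = [(2, 'c')]                   # only involves processor 2
assert run(run(C, sigma1), sigma2) == run(run(C, sigma2), sigma1)
print("disjoint schedules commute on this toy model")
```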

  39. Lemma 1 – visually [diagram: two disjoint sequences applied to the example configuration; sequence 1 delivers (1,m2), (1,m3) to processor 1 and sequence 2 delivers (4,m4), (4,m5) to processor 4]

  40. Lemma 1 – visually (opposite order) [diagram: the same two sequences applied in the opposite order]

  41. Lemma 1 – visually [diagram: the normal order and the opposite order reach the same final configuration]

  42. Proof – definitions (5/6) • Let FDV(C) be the union of DV(C’) over all C’ reachable from C. • If FDV(C) = {0,1}, C is bivalent. • If |FDV(C)|=1, C is univalent: • If FDV(C) = {0}, C is 0-valent. • If FDV(C) = {1}, C is 1-valent. • P’ is totally correct, so FDV(C) ≠ ∅. • Intuitively, FDV(C) is the set of decisions still possible from configuration C.
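
FDV(C) is a reachability union, which suggests the following sketch. It is illustrative only: it terminates only on toy protocols with finitely many reachable configurations (not in the real asynchronous model), and successors and decision_values are assumed helper callbacks, not part of the paper.

```python
# An illustrative sketch of FDV(C): collect DV(C') over every C' reachable
# from C.  `successors(c)` is assumed to enumerate e(c) for every applicable
# event e, and `decision_values(c)` computes DV(c).
def fdv(config, successors, decision_values):
    """FDV(C): union of DV(C') over all C' reachable from C (toy models only)."""
    seen, frontier, values = set(), [config], set()
    while frontier:
        c = frontier.pop()
        key = repr(c)                 # crude hashing of a configuration
        if key in seen:
            continue
        seen.add(key)
        values |= decision_values(c)
        frontier.extend(successors(c))
    return values

# Tiny toy usage: from 'A' one can reach 'B' (which decided 0) or 'C'
# (which decided 1), so 'A' is bivalent.
succ = {'A': ['B', 'C'], 'B': [], 'C': []}
dv   = {'A': set(), 'B': {0}, 'C': {1}}
print(fdv('A', lambda c: succ[c], lambda c: dv[c]))   # -> {0, 1}
```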

  43. Lemma 2 • Lemma: There is a bivalent initial configuration.

  44. Lemma 2 – Proof (1/3) • Assume otherwise: • From partial correctness, P’ has both 0-valent and 1-valent initial configurations. • Let us call two initial configurations adjacent if they differ only in a single processor’s input value. • Any two initial configurations can be joined by a chain of adjacent configurations. • Hence, there are two adjacent 0-valent and 1-valent initial configurations.

  45. Lemma 2 – Proof (2/3) • Reminder 1: there are two adjacent 0-valent and 1-valent initial configurations. • Let us call them C0 and C1 accordingly. • C0 and C1 are adjacent, so there is only one processor, p, whose input value differs between them. • Reminder 2: P’ is totally correct in spite of one fault. • So P’ should reach a decision even if one processor fails.

  46. Lemma 2 – Proof (3/3) • Let R be an admissible run from C0 in which p fails. From total correctness in spite of one fault, R must be a deciding run. Let σ be the corresponding schedule. • If 1 ∈ DV(σ(C0)), then 1 ∈ FDV(C0), but C0 is 0-valent. So 1 ∉ DV(σ(C0)), and therefore DV(σ(C0)) = {0}. • However, since the only difference between C0 and C1 is p, and p fails, σ is also legal on C1 and σ(C0) and σ(C1) are equivalent (equal except for p, which failed and therefore didn’t decide). So DV(σ(C1)) = DV(σ(C0)) = {0}, hence 0 ∈ FDV(C1), but C1 is 1-valent – a contradiction.

  47. Proof – definitions (6/6) • For any configuration C and event e=(p,m) such that e(C) is legal, let Rne(C) be the set of all configurations reachable from C without applying e. • Note that e can be applied to any C’ ∈ Rne(C). • Let eR(C) be {e(C’) | C’ ∈ Rne(C)}. • Let two configurations C, C’ be called neighbors if one is reachable from the other in a single step. • Equivalently, there exists an event e such that C’=e(C) or C=e(C’).
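
The two sets Rne(C) and eR(C) can also be written down directly. The sketch below is illustrative and works only for finite toy models; successors_by_event and apply_event are assumed callbacks (not from the paper) that enumerate the applicable events of a configuration and apply one of them.

```python
# Rne(C): configurations reachable from C without ever applying event e.
# eR(C):  { e(C') | C' in Rne(C) }.
def rne(config, e, successors_by_event):
    """All configurations reachable from C using any event except e."""
    seen, frontier, reachable = set(), [config], []
    while frontier:
        c = frontier.pop()
        if c in seen:
            continue
        seen.add(c)
        reachable.append(c)
        for event, nxt in successors_by_event(c):
            if event != e:
                frontier.append(nxt)
    return reachable

def e_r(config, e, successors_by_event, apply_event):
    """eR(C): apply e to every member of Rne(C)."""
    return [apply_event(c, e) for c in rne(config, e, successors_by_event)]

# Toy usage: from 'C0', event 'e' leads to 'D0' and event 'f' leads to 'C1';
# from 'C1', event 'e' leads to 'D1'.  Then Rne('C0') = {'C0','C1'} and
# eR('C0') = {'D0','D1'}.
step = {('C0', 'e'): 'D0', ('C0', 'f'): 'C1', ('C1', 'e'): 'D1'}
succ = lambda c: [(ev, nxt) for (cc, ev), nxt in step.items() if cc == c]
app  = lambda c, ev: step[(c, ev)]
print(rne('C0', 'e', succ))          # -> ['C0', 'C1']
print(e_r('C0', 'e', succ, app))     # -> ['D0', 'D1']
```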

  48. Lemma 3 • If C is bivalent, then for each applicable event e=(p,m), eR(C) contains a bivalent configuration.

  49. Lemma 3 – Proof (1/7) • Let us assume that every D ∈ eR(C) is univalent. • C is bivalent, and therefore for each i ∈ {0,1} there exists an i-valent configuration Ei that is reachable from C. Let σi be a schedule such that Ei=σi(C). • Let the configuration Fi be: • If e ∉ σi, Fi=e(Ei). • If e ∈ σi, then σi=σi'(e(σi'')); take Fi=e(σi''(C)). • In both cases Fi ∈ eR(C), so by our assumption Fi is univalent; it is i-valent • since either Fi is reachable from Ei or vice versa.

  50. Lemma 3 – Proof (2/7) • So eR(C) contains both 0-valent and 1-valent configurations. • By an easy induction on the length of the schedule leading to Fi (when e(C) is j-valent for j≠i), there exist two neighbors C0, C1 such that Di=e(Ci) is i-valent for i ∈ {0,1}. • Without loss of generality, assume C1=e’(C0).
