CS 294-8 Consensus Revisited http://www.cs.berkeley.edu/~yelick/294

Agenda • Consensus overview • Classic impossibility proof by FLP: • Impossibility of consensus in shared memory with n-1 failures • Impossibility of consensus in shared memory with 1 failure • Impossibility of consensus with message passing • What does this mean in practice? • Administrivia

Models • Failures: • Link failures • Processor crash failures • Byzantine processor failures • Timing • Synchronous: lock step algorithms • Asynchronous: unbounded delay • Partially synchronous: bounds on message delay or processor speed differences

The Consensus Problem • In general, the consensus problem is to get all non-faulty processors to agree on something: • To commit a transaction • Which processors are “up” • Which version of a file to use • Abstract problem: Every processor has an input • Termination: Eventually every non-faulty processor must decide on a value. • Agreement: All non-faulty decisions must be the same. • Validity: If all inputs are the same, then the non-faulty decision must be that input.
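
The three conditions can be phrased as checks on the outcome of a single finished run. The sketch below is illustrative only: `check_consensus` and its arguments are hypothetical names, and it assumes the run is long enough that a processor that has not decided yet never will (the real termination condition is about infinite executions).

```python
from typing import Dict, Optional, Set


def check_consensus(inputs: Dict[int, int],
                    decisions: Dict[int, Optional[int]],
                    faulty: Set[int]) -> Dict[str, bool]:
    """Check termination, agreement and validity on one finished run.

    inputs    -- input value of every processor
    decisions -- decided value per processor, None if it never decided
    faulty    -- ids of crashed processors (their decisions are ignored)
    """
    correct = [p for p in inputs if p not in faulty]
    decided = [decisions.get(p) for p in correct]

    # Termination: every non-faulty processor decided on something.
    termination = all(d is not None for d in decided)

    # Agreement: all non-faulty decisions are the same value.
    values = {d for d in decided if d is not None}
    agreement = len(values) <= 1

    # Validity: only constrains runs in which all inputs are equal.
    input_values = set(inputs.values())
    validity = len(input_values) != 1 or values <= input_values

    return {"termination": termination,
            "agreement": agreement,
            "validity": validity}


if __name__ == "__main__":
    # Processor 2 crashed before deciding; the others agree on the common input.
    print(check_consensus({0: 1, 1: 1, 2: 1}, {0: 1, 1: 1, 2: None}, {2}))
```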

Impossibility of Asynchronous Consensus Proof outline: • Show impossible in shared memory with n-1 faults. (Wait-free consensus) • 1 implies there is no 2-proc algorithm resilient to 1 fault • Show impossible in shared memory with 1 fault by reduction • Show impossible in message passing systems by reduction. Original result by Fischer/Lynch/Paterson. This proof presentation due to Welch.

Step 1: Impossibility of Wait-Free Consensus • An algorithm for n processors is wait-free if it can tolerate n-1 crashed processors • Theorem 1: There is no wait-free consensus algorithm in an asynchronous shared memory system. • Proof plan: By contradiction. Classify each configuration C according to how many different decisions are reachable from it: • Bivalent: both 0 and 1 are reachable • Univalent: only one decision is reachable (0-valent or 1-valent) • Three lemmas lead to the result. [Figure: tree of configurations reachable from C, with decisions 0 and 1 at the leaves]

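Impossibility of Wait-Free Consensus (con’t) • Lemma 1: There is an initial configuration that is bivalent. • Proof: Assume all initial configurations are univalent. By validity, the all-0 configuration a0 = 00000 is 0-valent and the all-1 configuration a1 = 11111 is 1-valent. Build a chain of initial configurations from a0 to a1 in which neighbors differ in exactly one input: a0, …, a = xx0xx, a’ = xx1xx, …, a1. Somewhere along the chain there are neighbors a (0-valent) and a’ (1-valent) that differ only in the input of one processor i. Consider executions in which i fails immediately: the other processors cannot tell a from a’, so since a produces 0, so does a’, a contradiction.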
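
The chain argument in Lemma 1 can be sketched in a few lines. This is only an illustration: `input_chain` and `find_flip` are hypothetical helpers, and `valency` stands for an assumed oracle that returns 0 or 1 for each initial configuration (which is exactly what the proof assumes for the sake of contradiction). The toy `majority` oracle is made up for the demo.

```python
from typing import Callable, List, Optional, Tuple

Config = Tuple[int, ...]


def input_chain(n: int) -> List[Config]:
    """Initial configurations from all-0 to all-1, flipping one input per step."""
    return [tuple([1] * k + [0] * (n - k)) for k in range(n + 1)]


def find_flip(chain: List[Config],
              valency: Callable[[Config], int]
              ) -> Optional[Tuple[Config, Config, int]]:
    """Return the first neighbors (a, a') with different valency, plus the
    index i of the single processor whose input differs between them."""
    for a, a_next in zip(chain, chain[1:]):
        if valency(a) != valency(a_next):
            i = next(k for k in range(len(a)) if a[k] != a_next[k])
            return a, a_next, i
    return None


if __name__ == "__main__":
    # Hypothetical oracle: pretend every configuration is univalent and
    # decides the majority input. Validity fixes the two endpoints.
    majority = lambda c: int(sum(c) > len(c) // 2)
    print(find_flip(input_chain(5), majority))
```

Because the endpoints of the chain have different valencies, some neighboring pair must flip; the proof then crashes processor i at the start to show that the pair cannot actually differ.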

Impossibility of Wait-Free Consensus (con’t) • Lemma 2: If C1 and C2 are univalent and C1 and C2 are equivalent at pi then C1 and C2 have the same valency. • Proof: Suppose C1 is v-valent. • Since the algorithm is wait-free (i.e., all other processors could stop), there is a schedule s in which only pi takes steps that causes pi to decide v • Since pi cannot tell the difference between C1 and C2, if s is applied to C2, pi also decides v there. • Thus C2 is also v-valent

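Impossibility of Wait-Free Consensus (con’t) • Lemma 3: If C is bivalent, then at least one processor is not critical, i.e., it can take a step and keep the system bivalent. • Proof: By cases: • Suppose in contradiction that all processors are critical. Then, since C is bivalent, there exist processors pi and pj such that a step by pi from C leads to a 0-valent configuration and a step by pj from C leads to a 1-valent configuration. [Figure: bivalent C (0/1) with a pi edge to a 0-valent configuration and a pj edge to a 1-valent configuration]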

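Impossibility of Wait-Free Consensus (con’t) • Case 1: pi and pj access different registers or both read the same register • But these operations commute: taking pi’s step and then pj’s reaches the same configuration as taking pj’s step and then pi’s, and that configuration would have to be both 0-valent and 1-valent => a contradiction.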

Impossibility of Wait-Free Consensus (con’t) • Case 2: pi writes to and pj reads from the same register R • Let C+i be the configuration after executing pi and C+j+i be the configuration after executing pj then pi. These lie on different branches, so they are univalent with different valencies. But pj’s read changes only pj’s local state, so C+i is equivalent to C+j+i from pi’s perspective, contradicting Lemma 2.

Impossibility of Wait-Free Consensus (con’t) • Case 3: pi and pj write to the same register R • As in case 2, we can “run” the completion of the left-hand execution (the one that starts with pi’s write) after pj’s write. Since pi overwrites R, pi cannot tell whether pj wrote first, so both executions result in 0, even though the configuration after pj’s write is 1-valent: a contradiction.

Impossibility of Wait-Free Consensus (con’t) • Theorem 1: There is no wait-free consensus algorithm in an asynchronous shared memory system. • Proof: Construct an infinite execution in which all configurations are bivalent, so that no processor ever decides, contradicting termination. • Start with the bivalent initial configuration from Lemma 1. • Use Lemma 3 to get the next bivalent configuration • Repeat step 2 forever

Impossibility of Single Failure Consensus Even if the ratio of faulty processors is very low, consensus cannot be solved in asynchronous shared memory. Proof outline: • Assume there exists an algorithm A for n processors and 1 failure • Use A as a subroutine to design an algorithm A’ for 2 processors and 1 failure • Previous result shows A’ cannot exist • Thus A does not exist

Impossibility of Single Failure Consensus (con’t) Proof assumptions: for processors q0,…qn-1 • Each qi has a single register Ri which it writes and others read • Code of each qi alternates reads and writes, beginning with a read • Each write step of each qi writes qi’s entire current state into Ri All of these are without loss of generality.

Impossibility of Single Failure Consensus (con’t) Idea of algorithm A’ for p0 and p1: • Each pi goes through the qj’s in round-robin order, trying to simulate their steps. Steps are grouped into pairs: a read and the following write. • When pi begins the simulation of qj, it uses its own input as the input for qj. If pi ever simulates a decision step by qj, it decides the same thing. • How do p0 and p1 keep their simulations consistent? They need to “agree” on the value of each qj’s local state after each pair of steps by qj.

Impossibility of Single Failure Consensus (con’t) For qj’s kth pair, p0 and p1 each have a flag variable: • Assume qj’s k-1st pair has been computed. • pi calculates its suggestion for qj’s state after the kth pair (see later slides) • pi checks if the other processor, p(1-i), has made a suggestion for this state of qj • If not then pi sets its flag to 1 • If so, then pi sets its flag to 0

Impossibility of Single Failure Consensus (con’t) Note order of operations: • p0: Write suggest0 • Read suggest1 • Write flag0 (1 if suggest1 empty, 0 otherwise) • p1: Write suggest1 • Read suggest0 • Write flag1 (1 if suggest0 empty, 0 otherwise) So two 0 flags are possible, but not two 1’s.
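
The claim about the flags can be checked by brute force. The sketch below is an illustration, not the slides' own code: it models each of p0 and p1 as three atomic steps (write own suggestion, read the other's suggestion, write own flag) and enumerates all 20 interleavings in which both processors finish. The names `run`, `all_schedules` and the placeholder suggestion strings are assumptions made for the example.

```python
from itertools import combinations


def run(schedule):
    """Execute one interleaving; schedule lists which processor (0 or 1)
    takes the next step, with each processor appearing exactly three times."""
    suggest = [None, None]   # shared: suggestion registers
    flag = [None, None]      # shared: flag registers
    seen = [None, None]      # local: what p_i read from the other's suggestion
    pc = [0, 0]              # local: next step of each processor
    for i in schedule:
        if pc[i] == 0:
            suggest[i] = f"state-from-p{i}"        # write own suggestion
        elif pc[i] == 1:
            seen[i] = suggest[1 - i]               # read other's suggestion
        else:
            flag[i] = 1 if seen[i] is None else 0  # 1 iff the other was empty
        pc[i] += 1
    return tuple(flag)


def all_schedules():
    """All ways to interleave three steps of p0 with three steps of p1."""
    for slots in combinations(range(6), 3):
        yield tuple(0 if t in slots else 1 for t in range(6))


if __name__ == "__main__":
    outcomes = {run(s) for s in all_schedules()}
    print(sorted(outcomes))
```

For both flags to be 1, each processor would have to read before the other wrote, which is impossible because each writes before it reads.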

Impossibility of Single Failure Consensus (con’t) Interpretation of flags: • If pi’s flag is 1, then pi is the winner. • If both are 0, then consider p0 the winner. • If one is 0 and the other is not yet set, the winner is not yet determined. • If neither is set, the winner is not yet determined. • Not possible for both to be 1. In cases 1 and 2, the kth pair is said to be computed; otherwise not.
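
The interpretation rules translate directly into a small function. This is a sketch with hypothetical names: `winner` takes the two flag values, using `None` for a flag that has not been set, and returns the index of the winning processor or `None` when the kth pair is not yet computed.

```python
from typing import Optional


def winner(flag0: Optional[int], flag1: Optional[int]) -> Optional[int]:
    """Return 0 or 1 for the winning p_i, or None if undetermined."""
    if flag0 == 1 and flag1 == 1:
        # Ruled out by the order of operations (write suggestion, read, flag).
        raise ValueError("both flags set to 1 cannot occur")
    if flag0 == 1:
        return 0          # a processor whose flag is 1 wins
    if flag1 == 1:
        return 1
    if flag0 == 0 and flag1 == 0:
        return 0          # tie-break: p0 wins when both flags are 0
    return None           # some flag still unset and no flag is 1


if __name__ == "__main__":
    for flags in [(1, None), (0, 1), (0, 0), (0, None), (None, None)]:
        print(flags, "->", winner(*flags))
```

Note that a single flag equal to 1 already determines the winner, even if the other flag is still unset; a single flag equal to 0 does not.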

Impossibility of Single Failure Consensus (con’t) How does pi calculate its suggestion for qj’s state after qj’s kth pair? • pi gets qj’s state after its k-1st pair: • if k-1 = 0, then use qj’s initial state with pi’s input • Otherwise get the suggestion of the winner for qj’s k-1st pair. • Consult qj’s state (just obtained) to determine which qr’s register is to be read in its kth pair • Get the current value of qr’s register by finding the largest m such that qr’s mth pair has been computed and get the winning suggestion • Apply qj’s transition function to get the value of qj’s state after its kth pair

Impossibility of Single Failure Consensus (con’t) Each execution of A’ (by p’s) simulates an execution of A (by q’s). If pi observes a qj making a decision, then it makes the same decision. • If the simulated execution is “admissible” (by failure assumption on q’s) then it satisfies: • Termination: eventually all non-faulty q’s decide • Agreement: all q’s agree • Validity: If all q’s have input v, then the decision is v. • So A’ would be a correct consensus algorithm

Impossibility of Single Failure Consensus (con’t) Why is the simulated execution admissible? We need to show that at least n-1 processors take an infinite number of steps in it. How can a simulation of qj be blocked? If p0 or p1 crashes during its simulation of qj’s kth pair, e.g.: • p0 writes a suggestion then crashes • p1 sees p0’s suggestion and writes 0 to its flag • p0’s flag remains unset forever • So qj’s kth pair is never computed • But the crash of 1 pi can only block the simulation of 1 qj. In the example, p1 would continue simulating all other q’s.

Impossibility of Consensus in Message Passing • Assume there exists an n-processor consensus algorithm A for message passing with 1 fault • Use A as a subroutine to design A’ for shared memory • Previous results show A’ cannot exist • So A cannot exist • Idea of A’: Simulate message channels with read/write registers. Then run A on top of these channels to get A’.
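
The slides do not spell out how a channel is built from registers. One common way, shown here purely as an assumed sketch, is to give each ordered pair (sender, receiver) one single-writer register holding the full history of messages sent; the receiver remembers how many it has already consumed. `Register` and `Channel` are hypothetical names, and the register is modelled as a plain Python object rather than real shared memory.

```python
class Register:
    """Read/write register: supports only whole-value read and write."""

    def __init__(self):
        self._value = ()

    def read(self):
        return self._value

    def write(self, value):
        self._value = value


class Channel:
    """One-directional channel from a single sender to a single receiver."""

    def __init__(self):
        self.reg = Register()   # shared: written by sender, read by receiver
        self._sent = ()         # sender-local copy of everything sent so far
        self._delivered = 0     # receiver-local count of messages consumed

    def send(self, msg):
        # The sender appends to its local history and rewrites the register.
        self._sent = self._sent + (msg,)
        self.reg.write(self._sent)

    def receive(self):
        # The receiver reads the whole history and returns only the new suffix.
        history = self.reg.read()
        new = list(history[self._delivered:])
        self._delivered = len(history)
        return new


if __name__ == "__main__":
    ch = Channel()
    ch.send("a")
    ch.send("b")
    print(ch.receive())   # messages sent so far, in order
    print(ch.receive())   # nothing new
```

Since the sender never waits for the receiver, a crashed receiver cannot block it, which is what the reduction needs.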

Implications and Limitations of the Result • FLP says consensus is impossible in an asynchronous environment if even one processor may crash. • All of the proofs are about liveness, not safety • Castro/Liskov rely on this • Explains “window of vulnerability” in practice: • Interval of time in which a fault can cause entire system to wait indefinitely • Do you care about liveness or response time (soft real-time guarantees)? • From a theoretical perspective, one can also “get around” this result by: • Using randomization (algorithm due to Ben-Or tolerates <= 1/3 faulty processors) • Using RMW registers, rather than just R/W

Overview of Results on Consensus • Let f be the maximum number of faulty processors. • The following are tight bounds for synchronous message passing: • Partially synchronous case is not as well studied.

Administrivia • If you’re doing a project and haven’t met with me in last 3 weeks, let me know asap. • Final project deadlines: • Poster session Dec 13 in pm (with 262) • Final papers due Dec 15 • Papers online for next week by Thursday
