Distributed Algorithms 6. Distributed Consensus with Process failure 7. More Consensus Problems 8. Modeling II: Asynchro

Distributed Algorithms
6. Distributed Consensus with Process Failures
7. More Consensus Problems
8. Modeling II: Asynchronous System Model
Preethi Vishwanath, San Jose State University, Computer Science, October 3rd 2006

Agreement Problem
• Processes start with individual inputs from a particular value set V. All the nonfaulty processes are required to produce outputs from the same value set V, subject to simple agreement and validity conditions.
• Validity: if all processes begin with the same value v, the only allowed decision value is v.
Assumptions
• The network is an n-node connected undirected graph with processes 1,...,n, where each process knows the entire graph.
• For each process there is exactly one start state containing each input value.
• Links are perfectly reliable: all messages sent are delivered.
Relationship between the stopping and the Byzantine agreement problems
• An algorithm that solves the Byzantine agreement problem does not necessarily solve the stopping agreement problem.
• The difference is that in the stopping case we require that all the processes that decide, even those that subsequently fail, must agree.
• If the agreement condition for the stopping failure case is replaced by the one for the Byzantine failure case, then the implication does hold.
• Alternatively, if all the nonfaulty processes in the Byzantine algorithm always decide in the same round, then the algorithm also works for stopping failures.

Failure Modes
Stopping failure model
• Processes may simply stop without warning.
• Intended to model unpredictable processor crashes.
• A process might also stop after sending its messages for some round but before performing its transition for that round.
• Agreement: No two processes decide on different values.
• Validity: If all processes start with the same initial value v ∈ V, then v is the only possible decision value.
• Termination: All nonfaulty processes eventually decide.
Byzantine failure model
• Faulty processes may exhibit completely unconstrained behavior.
• Intended to model any arbitrary type of processor malfunction.
• Also useful in processor fault diagnosis, where such algorithms can permit a collection of processors to agree on which of their number have failed.
• Agreement: No two nonfaulty processes decide on different values.
• Validity: If all nonfaulty processes start with the same initial value v ∈ V, then v is the only possible decision value for a nonfaulty process.
• Termination: Same as in the stopping failure model.

Stopping Failures: FloodSet Algorithm
Notation: V is the input value set; f is the maximum number of processes that may fail; a singleton set is a set with exactly one element.
• Each process maintains a variable W containing a subset of V.
• Initially, process i's variable W contains only i's initial value.
• For each of f+1 rounds, each process broadcasts W, then adds all the elements of the received sets to W.
• After f+1 rounds, process i applies the following decision rule: if W is a singleton set, then i decides on the unique element of W; otherwise, i decides on the default value v0.
Lemmas
• If no process fails during a particular round r, 1 <= r <= f+1, then Wi(r) = Wj(r) for all i and j that are active after r rounds.
• Suppose that Wi(r) = Wj(r) for all i and j that are active after r rounds. Then for any round r1, r <= r1 <= f+1, the same holds, that is, Wi(r1) = Wj(r1) for all i and j that are active after r1 rounds.
• If processes i and j are both active after f+1 rounds, then Wi = Wj at the end of round f+1.
Theorem
• FloodSet solves the agreement problem for stopping failures.
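
The FloodSet rounds and decision rule can be sketched as a small synchronous simulation. This is only an illustration under stated assumptions: the function name flood_set and the crash-schedule format (process id mapped to the round in which it stops and the recipients it still reaches in that round) are hypothetical and not part of the slides.

```python
def flood_set(inputs, f, crashes=None, default=0):
    """Simulate FloodSet for f+1 synchronous rounds.

    inputs  : dict process id -> initial value
    crashes : dict process id -> (round, recipients); the process stops in
              that round after reaching only `recipients` (hypothetical
              failure-schedule format used by this sketch)
    Returns the decisions of the processes that are still active at the end.
    """
    crashes = crashes or {}
    W = {i: {v} for i, v in inputs.items()}      # W initially holds own input
    active = set(inputs)
    for rnd in range(1, f + 2):                  # rounds 1 .. f+1
        inbox = {i: [] for i in inputs}
        for i in active:
            crashing = i in crashes and crashes[i][0] == rnd
            receivers = crashes[i][1] if crashing else list(inputs)
            for j in receivers:
                inbox[j].append(set(W[i]))       # broadcast a copy of W
        # processes that stop in this round take no further steps
        active -= {i for i, (r, _) in crashes.items() if r == rnd}
        for i in active:
            for msg in inbox[i]:
                W[i] |= msg                      # add all received elements
    # decision rule: unique element if W is a singleton, else the default
    return {i: next(iter(W[i])) if len(W[i]) == 1 else default for i in active}


# Process 1 crashes in round 1 after reaching only process 2.
decisions = flood_set({1: 0, 2: 1, 3: 1, 4: 1}, f=1, crashes={1: (1, [2])})
```

With f = 1 the second round lets process 2 forward the value it alone received from the crashed process, which is why all surviving processes end with the same W.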

Exponential Information Gathering (EIG) Algorithms
EIG tree T: a data structure for recording various values along communication paths.
• For every string x that occurs as a label of a node of T, each process has a variable val(x).
• Initially, each process i decorates the root of its tree with its own initial value, that is, it sets its val(λ) to its initial value.
• Round 1: Process i broadcasts val(λ) to all processes, including i itself. Then process i records the incoming information:
  • If a message with value v ∈ V arrives at i from j, then i sets its val(j) to v.
  • If no message with a value in V arrives at i from j, then i sets val(j) to null.
• Round k, 2 <= k <= f+1: Process i broadcasts all pairs (x, v), where x is a level k-1 label in T that does not contain index i, v ∈ V, and v = val(x). Then process i records the incoming information:
  • If xj is a level k node label in T, where x is a string of process indices and j is a single index, and a message saying that val(x) = v ∈ V arrives at i from j, then i sets val(xj) to v.
  • If xj is a level k node label and no message with a value in V for val(x) arrives at i from j, then i sets val(xj) to null.
• At the end of f+1 rounds, process i applies a decision rule. Namely, let W be the set of non-null vals that decorate nodes of i's tree; i decides on the unique element of W if W is a singleton, and on the default value v0 otherwise.
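
The labelling of the EIG tree and the way vals propagate can be sketched as follows. This is a minimal illustration that assumes a failure-free run and processes numbered 1..n; the function names and the use of tuples for labels (with the empty tuple standing in for λ) are choices made for this sketch only.

```python
from itertools import permutations


def eig_labels(n, f):
    """Labels of the EIG tree: all strings of distinct process indices of
    length 0..f+1, represented as tuples; () plays the role of λ."""
    procs = range(1, n + 1)
    return [label for k in range(f + 2) for label in permutations(procs, k)]


def eig_failure_free(inputs, f):
    """Decorate every process's tree for f+1 rounds, assuming no failures.

    inputs: dict process id (1..n) -> initial value.
    Returns val, where val[i][x] is process i's value for label x.
    """
    val = {i: {(): v} for i, v in inputs.items()}     # val(λ) := own input
    for k in range(1, f + 2):
        # j sends (x, val_j(x)) for every level k-1 label x not containing j
        sent = {
            j: {x: v for x, v in val[j].items() if len(x) == k - 1 and j not in x}
            for j in inputs
        }
        for i in inputs:
            for j, pairs in sent.items():
                for x, v in pairs.items():
                    val[i][x + (j,)] = v              # val(xj) := v
    return val


labels = eig_labels(3, 1)
val = eig_failure_free({1: "a", 2: "b", 3: "c"}, f=1)
```

Here val[1][(2, 3)] records what process 3 told process 1 about the value of process 2, which is the sense in which the tree stores values along communication paths.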

Byzantine Failure Algorithms
Assumptions
• Suppose processes 1, 2, and 3 solve the Byzantine agreement problem, tolerating one fault.
• Suppose also that they decide at the end of two rounds and that they operate in a particular, constrained manner: in the first round, each process simply broadcasts its initial value; in the second round, each process reports to each other process what was told to it in the first round by the third process.
Execution α1
• Processes 1 and 2 are nonfaulty and start with initial values of 1. Process 3 is faulty and starts with an initial value of 0.
• 1st round: all processes report their values truthfully.
• 2nd round: processes 1 and 2 report truthfully what they heard in the first round; process 3 tells 1 (falsely) that 2 sent 0 in round 1 and otherwise behaves truthfully.
Execution α2 (symmetric to α1)
• Processes 2 and 3 are nonfaulty and start with initial values of 0, while process 1 is faulty and starts with an initial value of 1.
• 1st round: all processes report their values truthfully.
• 2nd round: processes 2 and 3 report truthfully what they heard in the first round, while process 1 tells 3 (falsely) that 2 sent 1 in round 1 and otherwise behaves truthfully.
Execution α3
• Processes 1 and 3 are nonfaulty and start with 1 and 0, respectively.
• Process 2 is faulty, telling 1 that its initial value is 1 and telling 3 that its initial value is 0.
• All processes behave truthfully in the second round.
Conclusion
• α3 is indistinguishable from α1 to process 1, so process 1 decides 1 (as validity requires in α1); α3 is indistinguishable from α2 to process 3, so process 3 decides 0. This violates agreement in α3, so three processes of this form cannot tolerate one Byzantine fault.

EIG Algorithm for Byzantine Agreement
• Presupposes that the number of processes is large relative to the number of faults: n > 3f.
• The difference from the stopping-failure EIG algorithm is that a process that receives an "ill-formed" message corrects the information to make it look sensible.
EIGByz algorithm
• Processes propagate values for f+1 rounds.
• Exception: if a process i ever receives a message from another process j that is not of the specified form, then i "throws away" the message, that is, acts just as if process j did not send it anything at that round.
• At the end of f+1 rounds, process i adjusts its val assignment so that any null value is replaced by the default value v0.
• To determine its decision, process i works from the leaves up in its adjusted, decorated tree:
  • For each leaf labeled x, newval(x) := val(x).
  • For each non-leaf node labeled x, newval(x) is defined to be the newval held by a strict majority of the children of node x.
  • If no majority exists, process i sets newval(x) := v0.
• Process i's final decision is newval(λ).
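
The leaves-up majority computation can be sketched as a short recursion. This is a sketch under assumptions: labels are tuples of distinct process indices in 1..n, the tree has already been decorated for f+1 rounds, and the function name eig_byz_decide and the example values are hypothetical.

```python
from collections import Counter


def eig_byz_decide(val, n, f, default=0):
    """Compute newval(λ) from one process's decorated EIG tree.

    val maps label tuples to values; missing or None entries are treated as
    null and replaced by the default value, as in the adjustment step.
    """
    def newval(x):
        if len(x) == f + 1:                          # leaf: newval(x) := val(x)
            v = val.get(x)
            return default if v is None else v
        children = [x + (j,) for j in range(1, n + 1) if j not in x]
        counts = Counter(newval(c) for c in children)
        value, count = counts.most_common(1)[0]
        # strict majority of the children, otherwise the default value
        return value if count > len(children) / 2 else default

    return newval(())


# Hypothetical view of one nonfaulty process with n = 4, f = 1.
# Processes 1, 2, 3 are nonfaulty with input 1; process 4 is faulty.
n, f = 4, 1
val = {(j, k): 1 for j in (1, 2, 3) for k in range(1, n + 1) if k != j}
val.update({(1, 4): 0, (2, 4): 0, (3, 4): 0})       # false relays from process 4
val.update({(4, 1): 0, (4, 2): 1, (4, 3): None})    # inconsistent round-1 values
decision = eig_byz_decide(val, n, f)
```

In this example each of the subtrees for processes 1, 2, and 3 has two truthful relays against one false relay, so the majority step masks the faulty process.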

General Byzantine Agreement Using Binary Byzantine Agreement
• Each process has local variables x, y, z, and vote; x is initialized to the process's input value, and y, z, and vote are initialized arbitrarily.
• Round 1: Process i sends value x to all processes, including itself.
  • If i receives at least n-f copies of a particular value v ∈ V, then i sets y := v; otherwise y := null.
• Round 2: Process i sends value y to all processes, including itself.
  • If i receives at least n-f copies of a particular value in V, then i sets vote := 1; otherwise vote := 0.
  • i sets z equal to the non-null value that occurs most often among the messages received by i at this round, with ties broken arbitrarily.
• Round r, r >= 3: Processes run the binary Byzantine agreement subroutine using the values of vote as the input values.
  • If process i decides 1 in the subroutine and z is defined, then the final decision of the algorithm is z.
  • Otherwise, the final decision is the default value v0.
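
The local computation a single process performs in rounds 1 and 2, and the final choice after the binary subroutine, might look like the sketch below. The binary Byzantine agreement subroutine itself is not implemented; its result is passed in as a plain argument, and all function names are hypothetical.

```python
from collections import Counter


def round1(received, n, f):
    """y := v if some value v was received at least n-f times, else None."""
    counts = Counter(v for v in received if v is not None)
    for v, c in counts.items():
        if c >= n - f:
            return v
    return None


def round2(received, n, f):
    """Return (vote, z) computed from the y values received in round 2."""
    counts = Counter(v for v in received if v is not None)
    vote = 1 if any(c >= n - f for c in counts.values()) else 0
    # most frequent non-null value; ties are broken arbitrarily by Counter
    z = counts.most_common(1)[0][0] if counts else None
    return vote, z


def final_decision(binary_decision, z, default):
    """Decide z if the binary subroutine decided 1 and z is defined."""
    return z if binary_decision == 1 and z is not None else default
```

For example, with n = 4 and f = 1 a process that receives [5, 5, 5, 7] in round 1 sets y to 5, while one that receives [5, 5, 7, 7] sets y to null.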

Best Algorithms
• OptFloodSet algorithm
  • Best algorithm for stopping failures
  • Uses f+1 rounds
  • 2n^2 messages
  • O(n^2 b) bits of communication, where b is the number of bits needed to represent a value in V
• EIGByz algorithm
  • Byzantine case
  • Uses f+1 rounds
  • Exponential amount of communication
• PolyByz
  • Byzantine case
  • Uses 2(f+1) rounds
  • Polynomial amount of communication

k-Agreement Problem
• A natural generalization of the ordinary agreement problem: instead of requiring that all processes decide on exactly the same value, we insist only that they limit their decisions to a small number, k, of distinct values.
• Practical situation: it may be desirable for a number of processes to agree on a small number of frequencies to use for the broadcast of a large amount of data, such as a videotape.
Assumptions
• The network is an n-node connected undirected graph with processes 1,...,n, where each process knows the entire graph.
• Each process starts with an input from a fixed set V and is supposed to eventually output a decision from the set V.
• At most f processes might fail. Only stopping failures are considered.
Conditions
• Agreement: There is a subset W of V, |W| = k, such that all decision values are in W.
• Validity: Any decision value for any process is the initial value of some process.
• Termination: All nonfaulty processes eventually decide.
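
The agreement and validity conditions can be checked mechanically on the outcome of a run. The helper below is a hypothetical sketch; it reads the agreement condition as "at most k distinct decision values", which matches the condition above when V has at least k elements.

```python
def check_k_agreement(inputs, decisions, k):
    """Check k-agreement and validity for one finished run.

    inputs    : dict process id -> initial value
    decisions : dict process id -> decision, for the processes that decided
    """
    decided = set(decisions.values())
    agreement = len(decided) <= k                  # all decisions fit in some W, |W| = k
    validity = decided <= set(inputs.values())     # every decision is some input
    return agreement and validity
```

Termination is not checked here, since it is a property of the run rather than of the final decision values.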

FloodMin Algorithm
Notation: n is the number of processes, f is the number of failures tolerated, and k is the allowed number of decision values.
• Each process maintains a variable min-val, originally set to its own initial value.
• For each of ⌊f/k⌋+1 rounds, the processes all broadcast their min-vals.
• Each process resets its min-val to the minimum of its old min-val and all the values in the incoming messages.
• The decision value is min-val.
Formal Algorithm
The message alphabet is V.
states_i:
  rounds ∈ N, initially 0
  decision ∈ V ∪ {unknown}, initially unknown
  min-val ∈ V, initially i's initial value
msgs_i:
  if rounds <= ⌊f/k⌋ then send min-val to all other processes
trans_i:
  rounds := rounds + 1
  let m_j be the message from j, for each j from which a message arrives
  min-val := min({min-val} ∪ {m_j : j ≠ i})
  if rounds = ⌊f/k⌋ + 1 then decision := min-val
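
The formal description translates into a short synchronous simulation. As an illustration only: the function name flood_min and the crash-schedule format (process id mapped to the round in which it stops and the recipients it still reaches) are assumptions of this sketch.

```python
def flood_min(inputs, f, k, crashes=None):
    """Simulate FloodMin for floor(f/k)+1 synchronous rounds.

    inputs  : dict process id -> initial value (values must be comparable)
    crashes : dict process id -> (round, recipients), a hypothetical
              failure-schedule format used by this sketch
    Returns the decisions of the processes that are still active at the end.
    """
    crashes = crashes or {}
    min_val = dict(inputs)                       # min-val := own initial value
    active = set(inputs)
    for rnd in range(1, f // k + 2):             # rounds 1 .. floor(f/k)+1
        inbox = {i: [] for i in inputs}
        for i in active:
            crashing = i in crashes and crashes[i][0] == rnd
            receivers = crashes[i][1] if crashing else list(inputs)
            for j in receivers:
                if j != i:                       # send to all *other* processes
                    inbox[j].append(min_val[i])
        active -= {i for i, (r, _) in crashes.items() if r == rnd}
        for i in active:
            min_val[i] = min([min_val[i]] + inbox[i])
    return {i: min_val[i] for i in active}


# f = 1, k = 2 gives a single round; process 1 reaches only process 2.
decisions = flood_min({1: 0, 2: 1, 3: 2, 4: 3}, f=1, k=2, crashes={1: (1, [2])})
```

In this run the single failure lets the value 0 reach only one survivor, so two distinct values are decided, which is still within the bound k = 2.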

Bermuda Triangle
• A k-dimensional collection of executions rather than a (one-dimensional) chain. Adjacent executions in this collection are indistinguishable to designated nonfaulty processes.
Steps
• Execution assignment: assign an execution in which all processes have input 0 to the lower left-hand corner, an execution in which all processes have input 1 to the lower right-hand corner, and an execution in which all processes have input 2 to the upper right-hand corner. All executions assigned to vertices on the lower edge have inputs chosen from {0,1}.
• Process assignment: for each tiny simplex T, there is a single execution α with at most f faults that is compatible with the executions and processes assigned to the corners of T in the following sense:
  • All the processes assigned to the corners of T are nonfaulty in α.
  • If execution α1 and process i are assigned to some corner of T, then α and α1 are indistinguishable to i.
• Coloring: we color a vertex x having associated execution α and associated process i with the color that corresponds to i's decision value in α. Then:
  • The colors of the k+1 corners of B are all different.
  • The color of each point on an external edge of B is the color of one of the corners at the endpoints of the edge.
  • More generally, the color of each point on any external face of B is the color of one of the corners of the face.
[Figure: the Bermuda Triangle B, with corners labeled "all 0", "all 1", and "all 2" and edges labeled "0 and 1", "0 and 2", and "1 and 2".]

Operations on l-runs
• An l-run is a run augmented with exactly l tokens for each round number t, 1 <= t <= r, in such a way that if some process i fails at some round t, then there is a token attached to some pair (i, t1), t1 <= t.
Operations
• remove(i,j,t)
  • i and j are process indices and t is a round number, 1 <= t <= r.
  • This operation removes the triple (i,j,t) if it is there, and has no effect otherwise.
  • It can only be applied if i and j are both silent after t rounds and there is a token attached to some (i, t1), t1 <= t.
• add(i,j,t)
  • This operation adds the triple (i,j,t) if it is not already there, and has no effect otherwise.
  • It can only be applied if i and j are both silent after t rounds and i is active after t-1 rounds.
• change(i,v)
  • This operation changes process i's input value to v, and has no effect if this input value is already v.
  • It can only be applied if i is silent after 0 rounds and (i,1) has a token.
• move(i,j,t)
  • This operation moves a token from (i,t) to (j,t), where j is either i+1 or i-1.
  • It can only be applied if (i,t) has a token and if all failures have permission from other tokens.

Approximate Agreement Problem
• Processes start with real-valued inputs and are supposed to eventually decide on real-valued outputs.
• Processes are permitted to send real-valued data in messages.
• Instead of having to agree exactly, as in the ordinary agreement problem, the requirement is just that they agree to within a small positive real-valued tolerance ε.
Conditions
• Agreement: The decision values of any pair of nonfaulty processes are within ε of each other.
• Validity: Any decision value for a nonfaulty process is within the range of the initial values of the nonfaulty processes.
• Termination: All nonfaulty processes eventually decide.
Situations in which the problem arises
• In clock synchronization algorithms, where processes attempt to maintain clock values that are close but do not necessarily agree exactly.
• Many real distributed network algorithms work in the presence of approximately synchronized clocks, so approximate agreement on clock values is usually sufficient.

ByzApproxvAgreement algorithm
• Processes run an ordinary Byzantine agreement algorithm to decide on a value for each process. All these algorithms run in parallel.
• In the algorithm for process i, i begins by sending its value to all processes in round 1; then all processes use the received values as their inputs.
• When these algorithms terminate, all nonfaulty processes have the same decision values for all processes.
• Each process chooses the ⌊n/2⌋th largest value in the multiset of decision values as its own final decision value.
ConvergeApproxAgreement algorithm
• Process i maintains a variable val containing its latest estimate. Initially, val_i contains i's initial value.
• At each round, process i does the following:
  • First, it broadcasts its val value to all processes, including itself.
  • Then it collects all the values it has received at that round into a multiset W. If i does not receive a value from some other process, it simply picks some arbitrary default value to assign to that process in the multiset, thus ensuring that |W| = n.
  • Then process i sets val to mean(select(reduce(W))): reduce throws out the f smallest and f largest elements of W; from what is left, select keeps only the smallest element and every fth element thereafter; finally, val is set to the average (mean) of the selected elements.
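
One round's update val := mean(select(reduce(W))) can be sketched directly. This is a minimal sketch assuming n > 3f so that reduce leaves a nonempty multiset; the function names are hypothetical, and treating f = 0 as "keep everything" in select is an assumption of this sketch.

```python
def reduce_(W, f):
    """Throw out the f smallest and the f largest elements of W."""
    s = sorted(W)
    return s[f:len(s) - f]


def select(W, f):
    """Keep the smallest element and every f-th element thereafter."""
    s = sorted(W)
    return s[::f] if f > 0 else s     # f = 0 handling is an assumption


def converge_step(W, f):
    """New estimate for one process given its multiset W of n values."""
    chosen = select(reduce_(W, f), f)
    return sum(chosen) / len(chosen)


# n = 7, f = 2: the outlier 100 is discarded by reduce.
new_val = converge_step([0, 1, 2, 3, 4, 5, 100], f=2)
```

Discarding f values at each end means that any value contributed by a faulty process either is dropped or lies within the range of the nonfaulty values.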

I/O Automata
• An I/O automaton models a distributed system component that can interact with other system components.
• Actions are classified as either input, output, or internal. Inputs and outputs are used for communication with the automaton's environment; internal actions are visible only to the automaton itself.
An I/O automaton A, which we also call simply an automaton, consists of five components:
• sig(A), a signature.
• states(A), a (not necessarily finite) set of states.
• start(A), a nonempty subset of states(A) known as the start states or initial states.
• trans(A), a state-transition relation, where trans(A) ⊆ states(A) × acts(sig(A)) × states(A).
• tasks(A), a task partition, which is an equivalence relation on local(sig(A)) having at most countably many equivalence classes.
[Figure: a process I/O automaton P_i with inputs init(v)_i and receive(m)_j,i and outputs decide(v)_i and send(m)_i,j; a channel I/O automaton C_i,j with input send(m)_i,j and output receive(m)_i,j.]

Operations on Automata
Composition
• Allows an automaton representing a complex system to be constructed by composing automata representing individual system components.
• We define a countable collection {S_i}, i ∈ I, of signatures to be compatible if for all i, j ∈ I, i ≠ j, all of the following hold:
  • int(S_i) ∩ acts(S_j) = ∅
  • out(S_i) ∩ out(S_j) = ∅
  • No action is contained in infinitely many sets acts(S_i).
• The composition S = ∏ S_i of a countable compatible collection of signatures {S_i}, i ∈ I, is defined to be the signature with
  • out(S) = ∪ out(S_i)
  • int(S) = ∪ int(S_i)
  • in(S) = ∪ in(S_i) − ∪ out(S_i)
  where each union and product ranges over i ∈ I.
• The composition A = ∏ A_i of a countable, compatible collection of I/O automata {A_i}, i ∈ I, is the automaton defined as follows:
  • sig(A) = ∏ sig(A_i)
  • states(A) = ∏ states(A_i)
  • start(A) = ∏ start(A_i)
  • trans(A) is the set of triples (s, π, s') such that, for all i ∈ I, if π ∈ acts(A_i), then (s_i, π, s'_i) ∈ trans(A_i); otherwise s_i = s'_i.
  • tasks(A) = ∪ tasks(A_i)
Hiding
• "Hides" output actions of an I/O automaton by reclassifying them as internal actions.
• If S is a signature and Φ ⊆ out(S), then hide_Φ(S) is defined to be the new signature S', where in(S') = in(S), out(S') = out(S) − Φ, and int(S') = int(S) ∪ Φ.
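
For finite collections, the compatibility check, signature composition, and hiding can be sketched with plain sets. The class and function names are hypothetical, and actions are represented as strings purely for illustration; the third compatibility condition holds trivially for a finite collection and is not checked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    inp: frozenset        # in(S)
    out: frozenset        # out(S)
    internal: frozenset   # int(S)

    @property
    def acts(self):
        return self.inp | self.out | self.internal


def compatible(sigs):
    """int(S_i) ∩ acts(S_j) = ∅ and out(S_i) ∩ out(S_j) = ∅ for all i ≠ j."""
    for a, s in enumerate(sigs):
        for b, t in enumerate(sigs):
            if a != b and (s.internal & t.acts or s.out & t.out):
                return False
    return True


def compose(sigs):
    """Composition of a finite compatible collection of signatures."""
    if not compatible(sigs):
        raise ValueError("signatures are not compatible")
    out = frozenset().union(*(s.out for s in sigs))
    internal = frozenset().union(*(s.internal for s in sigs))
    inp = frozenset().union(*(s.inp for s in sigs)) - out
    return Signature(inp, out, internal)


def hide(sig, phi):
    """Reclassify the output actions in phi as internal actions."""
    phi = frozenset(phi)
    if not phi <= sig.out:
        raise ValueError("phi must be a subset of out(S)")
    return Signature(sig.inp, sig.out - phi, sig.internal | phi)


# A process signature and a channel signature, as illustrative examples.
P = Signature(frozenset({"init", "receive"}), frozenset({"send", "decide"}), frozenset())
C = Signature(frozenset({"send"}), frozenset({"receive"}), frozenset())
S = compose([P, C])
H = hide(S, {"send", "receive"})
```

After composition, send and receive are outputs of the combined system and no longer inputs; hiding them leaves init and decide as the only externally visible actions.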

Clock Automaton
The Clock automaton simply "ticks" forever, incrementing a counter. If a request arrives, Clock responds (in a separate step) with the current value of the counter.
Signature:
  Input: request
  Internal: tick
  Output: clock(t), t ∈ N
States:
  counter ∈ N, initially 0
  flag, a Boolean, initially false
Transitions:
  request
    Effect: flag := true
  tick
    Precondition: true
    Effect: counter := counter + 1
  clock(t)
    Precondition: flag = true, counter = t
    Effect: flag := false
Tasks:
  {tick}
  {clock(t) : t ∈ N}
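
The precondition/effect description maps onto a small class with one method per action. This is an illustrative sketch only: the class layout, the clock_enabled helper, and the choice to raise an error when an output action is fired while disabled are assumptions, and task fairness is not modeled.

```python
class Clock:
    """State and transitions of the Clock automaton described above."""

    def __init__(self):
        self.counter = 0      # counter ∈ N, initially 0
        self.flag = False     # flag, initially false

    def request(self):
        """Input action: always enabled. Effect: flag := true."""
        self.flag = True

    def tick(self):
        """Internal action: precondition true. Effect: counter += 1."""
        self.counter += 1

    def clock_enabled(self):
        """clock(t) is enabled exactly when flag is true and t = counter."""
        return self.flag

    def clock(self):
        """Output action clock(t): returns t = counter and resets flag."""
        if not self.flag:
            raise RuntimeError("clock(t) is not enabled: no pending request")
        self.flag = False
        return self.counter


c = Clock()
c.tick()
c.tick()
c.request()
c.tick()               # ticks may still occur between request and response
reported = c.clock()
```

Because the response happens in a separate step, the reported value is the counter at the time clock(t) fires, not at the time the request arrived.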
