240 likes | 356 Vues
Efficient Eventual Leader Election in Crash-Recovery Systems. Mikel Larrea, Cristian Martín, Iratxe Soraluze University of the Basque Country, UPV/EHU. Contents. Motivation System Model Efficiency Definitions A Near-Efficient Algorithm Instability Awareness Efficient Algorithms
E N D
Efficient Eventual Leader Election in Crash-Recovery Systems Mikel Larrea, Cristian Martín, Iratxe Soraluze University of the Basque Country, UPV/EHU
Contents • Motivation • System Model • Efficiency Definitions • A Near-Efficient Algorithm • Instability Awareness • Efficient Algorithms • Relaxing the Assumptions Mikel Larrea – Mannheim, May 2011
Motivation • Unreliable failure detectors have been used to address Consensus and related problems in asynchronous crash-prone distributed systems • Theory: impossibility/possibility results, minimality results • Practice: efficient implementations, transformations • The Omega failure detector satisfies the following property (“eventual leader election”): • there is a time after which every correct process always trusts the same correct process • Omega is the weakest failure detector for solving Consensus in the crash failure model Mikel Larrea – Mannheim, May 2011
p2 p1 p3 p4 p5 p6 p7 Eventual Leader Election Ω=p4 Ω=p4 Ω=p4 Ω=p4 correct crashed Mikel Larrea – Mannheim, May 2011
Is Omega a Failure Detector? • The Eventually Perfect failure detector (P) satisfies: • Strong completeness: eventually every process that crashes is permanently suspected by every correct process • Eventual strong accuracy: there is a time after which correct processes are not suspected by any correct process • The Eventually Strong failure detector (S) satisfies: • Strong completeness • Eventual weak accuracy: there is a time after which some correct process is never suspected by any correct process • Omega is equivalent to S Mikel Larrea – Mannheim, May 2011
This Work • We address the implementation of Omega in the crash-recovery failure model • crashed processes can recover • some (unstable) processes can crash and recover infinitely often • Previously proposed algorithms are not efficient • they require every process to periodically send a message to the rest of processes • We propose several algorithms in which eventually, among correct processes, only one (the elected leader) keeps sending messages forever Mikel Larrea – Mannheim, May 2011
System Model • Finite set of n processes = {p1, p2, ..., pn} that communicate only by message-passing • processes are synchronous • Every pair of processes is connected by two unidirectional communication links, one in each direction • types of links: eventually timely, fair lossy • Crash-recovery failure model • types of processes: eventually up, eventually down, unstable • eventually up processes are correct, the rest incorrect • we assume that at least one process is correct Mikel Larrea – Mannheim, May 2011
Efficiency Definitions • An algorithm implementing Omega in the crash-recovery failure model is efficient if there is a time after which only one process sends messages forever • An algorithm implementing Omega in the crash-recovery failure model is near-efficient if there is a time after which, among correct processes, only one sends messages forever • Since the leader must send messages forever, an efficient algorithm is also near-efficient • In a near-efficient algorithm, besides the leader, unstable processes can send messages forever Mikel Larrea – Mannheim, May 2011
A Near-Efficient Algorithm • Assumptions on communication reliability/synchrony: • (i) for every correct process p, there is an eventually timely link from p to every correct and every unstable process • (ii) for every unstable process u, there is a fair lossy link from u to every correct process • Uses a set of candidates to become leader, and a counter of the number of times that each process has recovered • During initialization (and upon recovery), a RECOVERED message is sent to the rest of processes • The leader is set to the process in the set of candidates with the smallest associated counter • If a process considers itself the leader, it sends a LEADER message periodically to the rest of processes Mikel Larrea – Mannheim, May 2011
A Near-Efficient Algorithm Mikel Larrea – Mannheim, May 2011
A Near-Efficient Algorithm Mikel Larrea – Mannheim, May 2011
Unstable Processes Disagree • With this algorithm, eventually every correct process always trusts the same correct process l. Consequently, eventually among correct processes, only one keeps sending LEADER messages () • Concerning the behavior of unstable processes: • (1) upon recovery, they send a RECOVERED message to the rest of processes • (2) initially they trust themselves, and they can trust other unstable processes before trusting process l () • We propose an adaptation that avoids (2) • initially they do not trust any process, and —if they remain up for sufficiently long— then l until they crash • the adaptation assumes a majority of correct processes Mikel Larrea – Mannheim, May 2011
p7 p6 p2 p1 p3 p4 p5 unstable eventually up eventually down Unstable Processes Disagree Ω=p4 Ω=p4 Ω=p4 Ω=p2 Ω=p2 Ω=p4 Mikel Larrea – Mannheim, May 2011
Instability Awareness Mikel Larrea – Mannheim, May 2011
Instability Awareness Mikel Larrea – Mannheim, May 2011
p7 p6 p2 p1 p3 p4 p5 unstable eventually up eventually down Instability Awareness Ω=p4 Ω=p4 Ω=p4 Ω=NULL Ω=p4 Ω=p4 Mikel Larrea – Mannheim, May 2011
Instability Awareness • The proposed adaptation makes the algorithm no longer near-efficient, since all correct processes may send PONG messages forever () • Can we design an algorithm such that… • processes do not have access to stable storage, • unstable processes eventually do not disagree, • and it is near-efficient? • Yes We Can! () Mikel Larrea – Mannheim, May 2011
A Near-Efficient++ Algorithm Mikel Larrea – Mannheim, May 2011
A Near-Efficient++ Algorithm Mikel Larrea – Mannheim, May 2011
An Efficient Algorithm • Assumes that local stable storage is accessible • process recovery counter • leader identity • Assumption on communication reliability/synchrony: • (i) for every correct process p, there is an eventually timely link from p to every correct and every unstable process • No need of RECOVERED messages • With this algorithm, eventually every process that is up, either correct or unstable, always trusts the same correct process l • assuming that every unstable process succeeds in writing l definitely in its stable storage Mikel Larrea – Mannheim, May 2011
Another Efficient Algorithm • Besides (i), assumes a non-decreasing local clock at each process • The elected leader will be the “oldest” correct process, i.e., the process that first recovers definitely Mikel Larrea – Mannheim, May 2011
Relaxing the Assumptions • Based on message relaying • Weaker assumptions on communication reliability/synchrony: • (i’) for every correct process p, there is an eventually timely path from p to every correct and every unstable process • (ii’) for every unstable process u, there is a fair lossy link from u to some correct process • Algorithms are no longer (near-)efficient Mikel Larrea – Mannheim, May 2011
The One Slide to Remember • The Omega failure detector provides an eventual leader election functionality in a distributed system • Theory: weakest failure detector for solving Consensus • Practice: used by several real fault-tolerant protocols • It is interesting to design efficient algorithms implementing Omega • In the crash-recovery failure model, we have to cope with unstable processes • to avoid them to send messages forever • to avoid disagreement with correct processes • Stablestorage, if available, makes things easier Mikel Larrea – Mannheim, May 2011
An Example: Paxos • Leslie Lamport. The Part-Time Parliament. ACM Transactions on Computer Systems, 1998. First submitted in 1990! • Leader-based Consensus algorithms • Could benefit from efficient leader election • Production use of Paxos (from wikipedia): • Google Chubby distributed lock service • IBM SAN Volume Controller • Microsoft Autopilot cluster management service • WANdisco Distributed Coordination Engine • Scalien Keyspace Mikel Larrea – Mannheim, May 2011