Understanding Consensus Mechanisms: Strong Failure Detectors vs. Two-Phase Commit

Revisiting failure detectors
Some of you asked how implementing consensus using S differs from reaching consensus using P. Here is the comparison. Recall the definition of the strong failure detector S: strong completeness + weak accuracy.
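To make the two properties concrete, here is a minimal Python sketch that checks a finite, recorded trace of failure-detector outputs against them. It uses the usual Chandra and Toueg definitions:

- Strong completeness: every crashed process is eventually permanently suspected by every correct process.
- Weak accuracy: some correct process is never suspected.

A finite trace cannot really show "eventually permanently", so the sketch assumes that the last suspect list output at each correct process is its permanent one. The function name and the trace format are hypothetical.

```python
from typing import Dict, List, Set


def satisfies_strong_fd(processes: Set[int], crashed: Set[int],
                        history: Dict[int, List[Set[int]]]) -> bool:
    """Check a finite trace against the two properties of S.

    history[p] is the (hypothetical) sequence of suspect lists output by
    the detector module at process p, oldest first.
    """
    correct = processes - crashed

    def final_suspects(p: int) -> Set[int]:
        h = history.get(p, [])
        return h[-1] if h else set()

    # Strong completeness (approximated by the last snapshot): every correct
    # process ends up suspecting every crashed process.
    complete = all(crashed <= final_suspects(p) for p in correct)

    # Weak accuracy: some correct process never appears in any suspect list.
    ever_suspected: Set[int] = set()
    for p in processes:
        for suspects in history.get(p, []):
            ever_suspected |= suspects
    accurate = any(q not in ever_suspected for q in correct)

    return complete and accurate
```

By contrast, the strong accuracy required of P would demand that no process is suspected before it crashes. S only guarantees a single never-suspected correct process.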

Consensus using S
{Program for process p}
Vp := (⊥, ⊥, ..., ⊥); Vp[p] := input of p; Dp := Vp
(Phase 1) Same as phase 1 of consensus with P
(Phase 2) send (Vp, p) to all;
    receive (Vq, q) from all q, or q is a suspect;
    k := 1;
    do k ≤ n →
        if ∃ q: Vp[k] ≠ ⊥ ∧ Vq[k] = ⊥ → Vp[k] := Dp[k] := ⊥ fi;
        k := k + 1
    od
(Phase 3) Decide on the first element Vp[j] such that Vp[j] ≠ ⊥
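A minimal Python sketch of Phases 2 and 3 at one process follows. It assumes Phase 1 has already produced each process's vector V, with None standing for ⊥. The parameter heard_from is the set of processes whose Phase 2 message p received before it stopped waiting for the rest as suspects.

The Phase 2 rule it applies is the one from the pseudocode: erase entry k when Vp[k] ≠ ⊥ but some received Vq[k] = ⊥. The names phase2, phase3 and heard_from are illustrative only. Dp is omitted because nothing reads it after Phase 2.

```python
from typing import Dict, List, Optional, Set

BOTTOM = None  # stands for ⊥ in the pseudocode
Vector = List[Optional[int]]


def phase2(p: int, v: Dict[int, Vector], heard_from: Set[int]) -> Vector:
    """Phase 2 at process p.

    v maps each process that sent a Phase 2 message to its vector after
    Phase 1; heard_from is the set of processes p actually heard from.
    """
    vp = list(v[p])
    for k in range(len(vp)):
        # Erase entry k if p knows it but some received vector does not.
        if vp[k] is not BOTTOM and any(v[q][k] is BOTTOM for q in heard_from):
            vp[k] = BOTTOM
    return vp


def phase3(vp: Vector) -> int:
    """Decide on the first entry that is not ⊥."""
    return next(x for x in vp if x is not BOTTOM)
```

Consider a three-process run in which process 2 crashes after its input (30) reached only process 1 during Phase 1. Then v = {0: [10, 20, None], 1: [10, 20, 30]}. Phase 2 erases entry 2 at process 1, so both survivors end with [10, 20, None] and decide 10.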

Example
[Table: for processes 0 to 4, the list of suspects at each process and the vector V at each process after Phase 1 and after Phase 2. Process 3 has crashed; one process is never suspected.]

Atomic Commit Protocols
Consider a network of servers (for example S1, S2, S3) in which servers may crash. The initiator of a transaction is called the coordinator, and the remaining servers are participants.

Requirements of Atomic Commit Protocols
In a network of servers where servers may crash, an atomic commit protocol must satisfy the following:
Termination. All non-faulty servers must eventually reach an irrevocable decision.
Agreement. If any server decides to commit, then every server must have voted to commit.
Validity. If all servers vote commit and there is no failure, then all servers must commit.

One-phase Commit
The client sends its request to the coordinator, which simply tells every participant server to commit or abort. If a participant deadlocks or runs into a problem, the coordinator may never find out. This is too simplistic.

Two-phase commit (2PC)
Phase 1: The coordinator sends VOTE to the participants and receives yes / no from them.
Phase 2: if ∀ server j: vote(j) = yes → multicast COMMIT to all servers
         [] ∃ server j: vote(j) = no → multicast ABORT to all servers
         fi
What if failures occur?
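A minimal Python sketch of the coordinator's side of 2PC when no failures occur. The callbacks ask_vote and send are hypothetical stand-ins for the network, and votes are modeled as the strings "yes" and "no".

```python
from typing import Callable, Dict, Iterable


def decide(votes: Dict[str, str]) -> str:
    # COMMIT only if every participant voted yes; any "no" forces ABORT.
    return "COMMIT" if all(v == "yes" for v in votes.values()) else "ABORT"


def two_phase_commit(participants: Iterable[str],
                     ask_vote: Callable[[str], str],
                     send: Callable[[str, str], None]) -> str:
    participants = list(participants)
    # Phase 1: send VOTE to each participant and collect its yes / no reply.
    votes = {p: ask_vote(p) for p in participants}
    # Phase 2: multicast the decision to all participants.
    decision = decide(votes)
    for p in participants:
        send(p, decision)
    return decision
```

A single "no" is enough to make every participant receive ABORT.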

Failure scenarios in 2PC (Phase 1)
Fault: the coordinator did not receive YES / NO from some participant, or a participant did not receive VOTE.
Solution: broadcast ABORT; abort local transactions.
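One way to detect the first fault is for the coordinator to wait for votes with a deadline and treat a missing YES / NO as a failure. This Python sketch assumes votes arrive on a queue.Queue as (participant, "yes" / "no") pairs. The function name and the timeout value are illustrative. A participant could apply the same idea on its side, aborting its local transaction if VOTE does not arrive in time.

```python
import queue
import time
from typing import Dict, Iterable


def collect_votes(inbox: queue.Queue, participants: Iterable[str],
                  timeout_s: float) -> str:
    """Phase 1 of 2PC at the coordinator, with a deadline on the votes."""
    expected = set(participants)
    votes: Dict[str, str] = {}
    deadline = time.monotonic() + timeout_s
    while expected.difference(votes):  # someone has not voted yet
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return "ABORT"  # a YES / NO never arrived: treat it as a failure
        try:
            sender, vote = inbox.get(timeout=remaining)
        except queue.Empty:
            return "ABORT"
        if sender in expected:
            votes[sender] = vote
    return "COMMIT" if all(v == "yes" for v in votes.values()) else "ABORT"
```

When the result is ABORT because of a timeout, the coordinator then broadcasts ABORT exactly as in the failure-free case.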

Failure scenarios in 2PC (Phase 2)
Fault: a participant does not receive a COMMIT or ABORT message from the coordinator. The coordinator may have crashed after sending COMMIT or ABORT to only a fraction of the servers. The participant then remains undecided until the coordinator is repaired and reinstalled into the system. This blocking is a known weakness of 2PC.

Coping with blocking in 2PC
A non-faulty participant can ask other participants which message (COMMIT or ABORT) they received from the coordinator, and act accordingly. But what if no non-faulty participant received anything? Nobody knows whether the coordinator committed or aborted its local transaction before crashing, so the participants must continue to wait …
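The cooperative step can be sketched as a small Python function. The argument replies is an assumed list of what each reachable participant reports having received from the coordinator: "COMMIT", "ABORT", or None if it received nothing. Returning None models staying blocked.

```python
from typing import Iterable, Optional


def cooperative_termination(replies: Iterable[Optional[str]]) -> Optional[str]:
    """Decide from peers' reports, or return None to keep waiting."""
    for reply in replies:
        if reply in ("COMMIT", "ABORT"):
            # Some peer heard the coordinator's decision: adopt it.
            return reply
    # Nobody received anything: the outcome is unknown, so stay blocked.
    return None
```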

Non-blocking Atomic Commit
A blocking protocol has the potential to prevent non-faulty participants from reaching a final decision. A solution to the atomic commitment problem is called non-blocking if, in spite of server crashes, every non-faulty participant eventually decides. One solution is to impose the requirement of uniform agreement.

Uniform agreement
If any participant (faulty or not) delivers a message m (commit or abort), then all correct processes eventually deliver m. To implement uniform agreement, no server should deliver a COMMIT or ABORT message until it has relayed it to all other servers. If a process times out in phase 2, then it decides abort.
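A Python sketch of the relay-before-deliver rule and the phase 2 timeout follows. It assumes a synchronous system, so a timeout means that no server can have delivered a decision. The send callback is a hypothetical transport and Participant is an illustrative class name. The relayed flag ensures each server relays a decision only once, which also stops relayed copies from bouncing back and forth forever.

```python
from typing import Callable, List, Optional


class Participant:
    def __init__(self, name: str, servers: List[str],
                 send: Callable[[str, str], None]):
        self.name = name
        self.servers = servers
        self.send = send  # hypothetical transport: send(destination, message)
        self.relayed = False
        self.decision: Optional[str] = None  # set only on delivery

    def on_message(self, msg: str) -> None:
        if self.relayed:
            return  # a copy we have already relayed: ignore it
        self.relayed = True
        # Relay COMMIT / ABORT to every other server first ...
        for s in self.servers:
            if s != self.name:
                self.send(s, msg)
        # ... and only then deliver it locally.
        self.decision = msg

    def on_phase2_timeout(self) -> None:
        # No decision arrived in phase 2: decide ABORT, relayed like any other.
        if not self.relayed:
            self.on_message("ABORT")
```

Because a server relays before it delivers, a server that crashes right after delivering has already passed the message on to everyone else.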

Recovery: Stable storage
Stable storage creates the illusion of an incorruptible storage, even if a writer or a disk crashes at any time. The implementation uses at least two independent disks, A0 and A1.

Stable storage
To write, do the following: copy on disk A0; record timestamp T0; compute checksum S0; then copy on disk A1; record timestamp T1; compute checksum S1.
Readers check four cases:
• Both checksums OK and T1 > T0
• Both checksums OK and T1 < T0
• Checksum on A1 wrong
• Checksum on A0 wrong
(Which copy to accept in each case?)
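One reasonable answer to the question above is to accept the valid copy with the larger timestamp. Sketched in Python, that rule resolves the four cases as follows:

- Both checksums OK and T1 > T0: return A1, a completed write identical to A0.
- Both checksums OK and T1 < T0: the writer crashed between the two writes, so A0 holds the newer value.
- Checksum on A1 wrong: A0 is used.
- Checksum on A0 wrong (crash while writing A0): fall back to the old value in A1.

The disks are modeled as an in-memory dict and a logical clock supplies the timestamps. CRC-32 stands in for whatever checksum a real implementation uses. All names are hypothetical.

```python
import zlib
from dataclasses import dataclass
from typing import Dict


@dataclass
class Copy:
    data: bytes
    timestamp: int
    checksum: int


def _crc(data: bytes, timestamp: int) -> int:
    # Checksum over both the data and its timestamp.
    return zlib.crc32(data + timestamp.to_bytes(8, "big"))


def make_copy(data: bytes, timestamp: int) -> Copy:
    return Copy(data, timestamp, _crc(data, timestamp))


def stable_write(disks: Dict[str, Copy], data: bytes, clock: int) -> None:
    # Write A0 first (T0, S0), then A1 (T1, S1), so T1 > T0 after a full write.
    disks["A0"] = make_copy(data, clock)
    disks["A1"] = make_copy(data, clock + 1)


def stable_read(disks: Dict[str, Copy]) -> bytes:
    valid = [c for c in (disks["A0"], disks["A1"])
             if _crc(c.data, c.timestamp) == c.checksum]
    if not valid:
        raise IOError("both copies are corrupted")
    # Accept the most recently written copy whose checksum is OK.
    return max(valid, key=lambda c: c.timestamp).data
```

A fuller design would also copy the accepted version onto the other disk, so that the next crash again finds two good copies.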

Checkpointing
A mechanism for (backward) error recovery. Transaction states are periodically saved on stable storage. Following a failure, the transaction rolls back to the most recent checkpoint. Checkpointing can be independent (unsynchronized) or coordinated (synchronized).

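Classification of checkpointing
Coordinated checkpointing takes a consistent snapshot, at the cost of some synchronization overhead. Uncoordinated checkpointing apparently has no overhead, but it can be inefficient. Independently taken checkpoints may not form a consistent global state, so recovery may cascade into further rollbacks (the domino effect).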

Checkpointing (continued)
Some actions can be reversed, but some cannot (like dispensing cash from an ATM or printing a document). Such irreversible actions are logged, and during replay the logged outcomes substitute for the real actions.
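A Python sketch of rollback to a checkpoint followed by replay. The outcome of an irreversible action (here, dispensing cash) is logged the first time it runs. During replay it is read back from the log instead of being performed again. Process, withdraw and the in-memory checkpoint are hypothetical stand-ins; a real system would keep the checkpoint and the log on stable storage.

```python
import copy
from typing import Any, Callable, Dict, List, Optional


class Process:
    def __init__(self, balance: int):
        self.state: Dict[str, int] = {"balance": balance}
        self.checkpoint = copy.deepcopy(self.state)  # stand-in for stable storage
        self.log: List[Any] = []  # outcomes of irreversible actions since checkpoint
        self.replay_pos: Optional[int] = None  # None means normal execution

    def take_checkpoint(self) -> None:
        self.checkpoint = copy.deepcopy(self.state)
        self.log.clear()

    def rollback(self) -> None:
        # After a failure: restore the latest checkpoint and start replaying.
        self.state = copy.deepcopy(self.checkpoint)
        self.replay_pos = 0

    def irreversible(self, action: Callable[[], Any]) -> Any:
        if self.replay_pos is not None:
            if self.replay_pos < len(self.log):
                result = self.log[self.replay_pos]  # the log substitutes for the action
                self.replay_pos += 1
                return result
            self.replay_pos = None  # log exhausted: back to normal execution
        result = action()  # performed for real, exactly once
        self.log.append(result)
        return result


def withdraw(p: Process, amount: int, cash_out: List[int]) -> None:
    p.state["balance"] -= amount

    def dispense() -> int:
        cash_out.append(amount)  # the physical effect that cannot be undone
        return amount

    p.irreversible(dispense)
```

After a rollback, re-executing the same withdrawal restores the balance change but does not dispense the cash a second time.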

Group Communication
Group-oriented activities are steadily increasing. There are many types of groups:
• Open and closed groups
• Peer-to-peer and hierarchical groups

Major issues
• Atomic multicast
• Ordered multicast
• Dynamic groups
• Failure handling

Atomic multicast
• A multicast is called atomic when the message is delivered to every correct (i.e., functioning) member, or to no member at all.
• Sometimes, features available in the infrastructure of a distributed system simplify the implementation of multicast. Examples are (1) multicast on an Ethernet LAN and (2) IP multicast.

Basic vs. reliable multicast
Basic multicast does not consider crash failures; reliable multicast does. Three criteria for basic multicast:
Liveness. Each process must receive every message.
Integrity. No spurious message is received.
No duplicate. Each process accepts exactly one copy of a message.

Reliable atomic multicast
Sender's program:
    i := 0;
    do i ≠ n → send message to i; i := i + 1 od
Receiver's program:
    if m is new → accept it; multicast m
    [] m is duplicate → discard m
    fi
Tolerates process crashes.
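The receiver's rule can be simulated in Python to see why it tolerates a sender crash. The sender is assumed to crash after sending m to only the first few processes. Every process that accepts a new m multicasts it to the rest. Messages carry a unique id so that duplicates can be recognized. That id, the in-memory network and the function name are assumptions of the sketch.

```python
from collections import deque
from typing import Deque, Dict, List, Set, Tuple

Message = Tuple[str, str]  # (unique message id, payload)


def reliable_multicast(names: List[str], msg: Message,
                       sends_before_crash: int) -> Dict[str, List[str]]:
    """Return the payloads each process accepted."""
    seen: Dict[str, Set[str]] = {n: set() for n in names}
    accepted: Dict[str, List[str]] = {n: [] for n in names}
    # The sender reaches only the first `sends_before_crash` processes.
    network: Deque[Tuple[str, Message]] = deque(
        (n, msg) for n in names[:sends_before_crash])
    while network:
        dest, (msg_id, payload) = network.popleft()
        if msg_id in seen[dest]:
            continue  # m is a duplicate: discard m
        seen[dest].add(msg_id)
        accepted[dest].append(payload)  # m is new: accept it ...
        for other in names:  # ... and multicast m to the others
            if other != dest:
                network.append((other, (msg_id, payload)))
    return accepted
```

If the sender reaches even one process, every process accepts m exactly once. If it reaches none, nobody accepts m, which is the all-or-nothing behavior that atomic multicast requires.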
