Fault Tolerance Chapter 7
Failures in Distributed Systems • Partial failures – characteristic of distributed systems • Goals: • Construct systems which can automatically recover from partial failures • System should operate in an acceptable way even during failures
Basics of Dependable Systems • Availability – Property that the system is operating correctly at a given moment • Reliability – Property that a system can run continuously without failures • Safety – Failures should not lead to catastrophes • Maintainability – How easily a failed system can be repaired
Failures, Errors and Faults • Failure – A system not meeting its promises • Error – Part of system’s state that may lead to failure • Eg: Damaged packets • Fault – Cause of error • Bad transmission medium, bad disk, etc. • Types of faults • Transient – Occur once and disappear • Intermittent – Appear, vanish and reappear • Permanent – Continues until repair
Failure Models • Different types of failures.
Failure Masking by Redundancy • Hiding failures from other processes • Three types of redundancies • Information redundancy – Extra data is added to hide failure. • Eg. Hamming codes • Timing redundancy – Extra actions are performed for hiding failures • Redoing a transaction • Physical redundancy – Extra equipment (processes) for hiding failures • Extra disks, process pools etc.
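Information redundancy can be illustrated with a Hamming(7,4) code, which adds three parity bits to four data bits so that any single flipped bit can be located and corrected. The following is a minimal sketch; the function names and the bit layout (parity bits at positions 1, 2 and 4) are assumptions chosen for illustration, not something the slides specify.

```python
def hamming74_encode(data):
    """Encode 4 data bits as 7 bits laid out as p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(code):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The syndrome, read as a binary number, is the 1-based error position.
    error_position = s1 + 2 * s2 + 4 * s3
    if error_position:
        c[error_position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


if __name__ == "__main__":
    word = hamming74_encode([1, 0, 1, 1])
    word[4] ^= 1  # simulate a damaged bit in transit
    print(hamming74_decode(word))
```

The receiver never learns that a bit was damaged; the extra bits hide the failure, which is the point of this kind of redundancy.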
Process Resilience • Organizing processes into groups • Message sent to group is received by all members • Dynamic groups • Processes can be members of several groups • Flat groups – All processes are equal • Complicated decision making • Hierarchical group – Coordinator and workers • Single point of failure
Flat Groups versus Hierarchical Groups • Communication in a flat group. • Communication in a simple hierarchical group
Group Membership • Group server: Handles group management functions • Single point of failure • Distributed group management • Sending entry/exit messages to all nodes • Exit handling • No polite announcement for crashes • Joins and exits must be synchronous with messages • A process should receive all messages from the moment it joins the group until it exits
Failure Masking via Replication • Primary backup protocol • Replicated write protocol • k-fault tolerance • If processes fail silently – (k+1) processes • For Byzantine failures – (2k+1) processes
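The two replica counts can be sketched as follows. With silent failures, one surviving replica is enough; with Byzantine failures, the k+1 correct replies must outvote the k faulty ones. The helper names below are hypothetical and serve only as an illustration.

```python
from collections import Counter


def replicas_needed(k, byzantine):
    """Replicas required to tolerate k faulty processes."""
    return 2 * k + 1 if byzantine else k + 1


def majority_vote(replies):
    """Return the value reported by a strict majority of replicas, else None."""
    value, votes = Counter(replies).most_common(1)[0]
    return value if votes > len(replies) // 2 else None


if __name__ == "__main__":
    k = 1
    print(replicas_needed(k, byzantine=False))
    print(replicas_needed(k, byzantine=True))
    # Three replicas, one of which returns a wrong answer.
    print(majority_vote([42, 42, 7]))
```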
Agreement in Faulty Systems • Agreement is more complex • Agreement needed for electing coordinator, committing transactions etc. • Goal – Non faulty processes should reach consensus in finite number of steps • Perfect processes, faulty communication • Two army problem
Consensus in Faulty Processes • Byzantine generals problem • Blue army is split into many units • Pair-wise communication • Each general reports his troop strength • Faulty generals may report false strengths • Problem is to arrive at consensus • Need (3m+1) processes to tolerate m faulty generals
Agreement in Faulty Systems (1) • The Byzantine generals problem for 3 loyal generals and 1 traitor. • The generals announce their troop strengths (in units of 1 kilosoldier). • The vectors that each general assembles based on (a) • The vectors that each general receives in step 3.
Agreement in Faulty Systems (2) • The same as in previous slide, except now with 2 loyal generals and one traitor.
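The vector exchange on these two slides can be simulated in a few lines. This is a sketch under stated assumptions: the function name `decide` is hypothetical, generals are numbered from 0, and a traitor is modeled as sending a fresh, distinct false value every time it speaks (real traitors may lie in any way they like).

```python
from collections import Counter
from itertools import count

UNKNOWN = None


def decide(strengths, traitors):
    """Return the result vector computed by each loyal general."""
    n = len(strengths)
    lies = count(100)  # assumption: every lie is a new, distinct value
    # Steps 1-2: heard[i][j] is what general i believes general j announced.
    heard = [[strengths[j] if j == i or j not in traitors else next(lies)
              for j in range(n)] for i in range(n)]
    results = {}
    for i in range(n):
        if i in traitors:
            continue
        # Step 3: every other general sends its vector; traitors send garbage.
        vectors = [heard[k] if k not in traitors
                   else [next(lies) for _ in range(n)]
                   for k in range(n) if k != i]
        # Step 4: keep an element only if a strict majority reports it.
        result = []
        for j in range(n):
            value, votes = Counter(v[j] for v in vectors).most_common(1)[0]
            result.append(value if votes > len(vectors) // 2 else UNKNOWN)
        results[i] = result
    return results


if __name__ == "__main__":
    print(decide([1, 2, 3, 4], traitors={2}))  # 3 loyal generals, 1 traitor
    print(decide([1, 2, 3], traitors={2}))     # 2 loyal generals, 1 traitor
```

With four generals the loyal ones end up with identical vectors in which only the traitor's entry is unknown. With three generals no entry reaches a majority, which matches the (3m+1) requirement.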
RPC Semantics in Presence of Failures • 5 types of exceptions • Client cannot locate server • Request to server is lost • Server crashes after receiving request • Reply message from server is lost • Client crashes after sending in request
Not Locating Server • Causes: • Server might be down • Version mismatch between client and server stubs • Possible solutions • Raising exception • Relying on programming language for a systems problem • Not all languages have exceptions • Transparency is compromised
Lost Request Messages • Easiest to handle • Use timers • Retransmission on timeout • Duplicate detection at server end
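The timer-and-retransmit idea can be sketched as a retry loop. Everything here is a hypothetical stand-in: `LossyChannel` models a network that drops the first few requests, and a return value of `None` models the client's timer expiring.

```python
class LossyChannel:
    """Toy channel that drops the first `drops` requests it is given."""

    def __init__(self, drops, handler):
        self.drops = drops
        self.handler = handler

    def send(self, request):
        if self.drops > 0:
            self.drops -= 1
            return None  # request lost: the client's timer expires
        return self.handler(request)


def call_with_retry(send, request, max_attempts=5):
    """Retransmit on timeout; return (reply, number of attempts used)."""
    for attempt in range(1, max_attempts + 1):
        reply = send(request)
        if reply is not None:
            return reply, attempt
    raise TimeoutError(f"no reply after {max_attempts} attempts")


if __name__ == "__main__":
    channel = LossyChannel(drops=2, handler=lambda req: req.upper())
    print(call_with_retry(channel.send, "ping"))
```

Because the client cannot tell a lost request from a slow server, the server still needs some way to recognize retransmissions it has already handled.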
Server Crashes • Server can crash either before executing or after executing (before sending reply) • Crash after execution needs to be reported to client • Crash before execution can be handled by retransmission • Client’s OS cannot distinguish between the two
Server Crashes • A server in client-server communication • Normal case • Crash after execution • Crash before execution
Handling Server Crashes • Wait until server reboots and try again • At least once semantics • Give up immediately and report failure • At most once semantics • Guarantee nothing • The need is for exactly once semantics • Two messages to clients • Request acknowledgement • Completion message
Server and Client Strategies • Server strategies • Send completion message before operation • Send completion message after operation • Client strategies • Never reissue a request • Always reissue a request • Only reissue request if acknowledgement not received • Only reissue if completion message not received • Client never knows the exact sequence of events before the crash • Server failures change RPC fundamentally
Server Crashes (2) • Different combinations of client and server strategies in the presence of server crashes.
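The combinations can be enumerated mechanically. The sketch below assumes three server events: M (send the completion message), P (perform the operation) and C (crash); events in parentheses never happen because the crash came first. The strategy labels and function names are illustrative choices, not taken from the slides.

```python
SERVER_ORDERINGS = {
    "M->P": ["MPC", "MC(P)", "C(MP)"],  # completion message sent before operation
    "P->M": ["PMC", "PC(M)", "C(PM)"],  # completion message sent after operation
}

# Each client strategy maps "did M arrive?" to "reissue the request?".
CLIENT_STRATEGIES = {
    "always reissue": lambda got_m: True,
    "never reissue": lambda got_m: False,
    "reissue only if M received": lambda got_m: got_m,
    "reissue only if M not received": lambda got_m: not got_m,
}


def outcome(ordering, reissue):
    """Classify one crash scenario as OK, DUP (done twice) or ZERO (never done)."""
    done = set(ordering.split("C")[0])  # events that happened before the crash
    executions = int("P" in done) + int(reissue("M" in done))
    return {0: "ZERO", 1: "OK", 2: "DUP"}[executions]


def table():
    return {(server, client): [outcome(o, policy) for o in orderings]
            for server, orderings in SERVER_ORDERINGS.items()
            for client, policy in CLIENT_STRATEGIES.items()}


if __name__ == "__main__":
    for (server, client), row in table().items():
        print(f"{server:5} {client:32} {row}")
```

No row consists of OK three times: whichever pair of strategies is chosen, some crash ordering yields a duplicate or a missing execution.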
Lost Reply Messages • Timer at client • Client is not sure whether the reply is lost or server is slow • Idempotent operations • Can all operations be made idempotent? • Sequence numbers in requests • Server refuses to perform a duplicate request • Server should maintain state of each client • A bit to distinguish duplicates from originals
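Duplicate filtering with sequence numbers might look like the sketch below. The class name `DedupServer` and its interface are assumptions for illustration: the server keeps the last sequence number and reply per client, and answers a retransmitted request from that record instead of executing it again.

```python
class DedupServer:
    """Executes each (client, sequence number) request at most once."""

    def __init__(self, operation):
        self.operation = operation
        self.last_seen = {}  # client_id -> (sequence number, cached reply)
        self.executions = 0

    def handle(self, client_id, seq, argument):
        seen = self.last_seen.get(client_id)
        if seen is not None and seen[0] == seq:
            return seen[1]  # duplicate: resend the old reply, do not re-execute
        reply = self.operation(argument)
        self.executions += 1
        self.last_seen[client_id] = (seq, reply)
        return reply


if __name__ == "__main__":
    balance = {"amount": 100}

    def withdraw(amount):  # deliberately not idempotent
        balance["amount"] -= amount
        return balance["amount"]

    server = DedupServer(withdraw)
    print(server.handle("client-1", seq=1, argument=30))
    print(server.handle("client-1", seq=1, argument=30))  # retransmission
    print(balance["amount"], server.executions)
```

The per-client table is exactly the state the slide warns about: the server must remember something for every client it has talked to.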
Client Crashes • Can lead to orphans • Waste of resources • Confusion on reboot • Extermination with logging • Reincarnation with epochs • Gentler reincarnation • Expiration
Reliable Group Communication • Reliable multicasting is important for several applications • Transport layer protocols rarely offer reliable multicasting • What is reliable multicasting? • Communication sent to the group should reach each member • What happens if process crashes (or enters) during multicasting? • Multicasting with faulty processes & multicasting with non-faulty processes
Basic Reliable Multicasting • Group is assumed to be stable • Communication may be faulty • Underlying unreliable multicasting service • Easy if the number of processes is small • Use acknowledgements • Either positive or negative • Sequence number for each message • Retransmission on negative ack or on timeout • Poor scalability of positive ack
Basic Reliable-Multicasting Schemes • A simple solution to reliable multicasting when all receivers are known and are assumed not to fail • Message transmission • Reporting feedback
Nonhierarchical Feedback Control • Positive acks are not scalable • Why not use negative acks? • Arbitrary wait times (no timeouts) • Feedback Suppression • Reducing the number of acks returned to the sender • Only negative feedback • Feedback is multicast to all members • Retransmissions are multicast too • Feedback time has to be carefully adjusted • Can unnecessarily interrupt other processes
Nonhierarchical Feedback Control • Several receivers have scheduled a request for retransmission, but the first retransmission request leads to the suppression of others.
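Feedback suppression can be sketched with random timers. In this simplified model (all names and parameters are assumptions), every receiver that missed a message schedules a NACK after a random delay; the first NACK is multicast, and any receiver that sees it before its own timer fires cancels its request. The `latency` parameter models how long the NACK takes to reach the others.

```python
import random


def nack_senders(receivers, latency=0.0, max_delay=1.0, seed=0):
    """Return the receivers that end up multicasting a NACK."""
    rng = random.Random(seed)
    delays = {r: rng.uniform(0.0, max_delay) for r in receivers}
    first = min(delays, key=delays.get)
    # A receiver is suppressed only if the first NACK reaches it in time.
    cutoff = delays[first] + latency
    return [first] + [r for r in receivers
                      if r != first and delays[r] < cutoff]


if __name__ == "__main__":
    missing = ["r1", "r2", "r3", "r4", "r5"]
    print(len(nack_senders(missing, latency=0.0)))
    print(len(nack_senders(missing, latency=2.0)))
```

With negligible latency a single NACK suppresses all others. When the latency is large compared with the timer range, nobody is suppressed, which is why the feedback time has to be tuned carefully.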
Hierarchical Feedback Control • The essence of hierarchical reliable multicasting. • Each local coordinator forwards the message to its children. • A local coordinator handles retransmission requests.
Atomic Multicast • Message is delivered to all or none • Database example • Crashed replica needs to know which updates it missed • Atomic multicasting eliminates this problem • Update is performed if the remaining replicas have agreed what the group looks like
Virtual Synchrony (1) • The logical organization of a distributed system to distinguish between message receipt and message delivery
Atomic Multicast • Each multicast message is associated with a list of processes • Changes to group membership are announced via “View Change” messages • “m” is delivered to all members before “vc” is delivered or “m” is not delivered at all • What happens if sender crashes • Abort message or ignoring m • View changes act as barriers which no multicasting can cross
Virtual Synchrony (2) • The principle of virtual synchronous multicast.
Ordering of Multicast Messages • Unordered • FIFO • Causally-ordered • Totally-ordered
Message Ordering (1) • Three communicating processes in the same group. The ordering of events per process is shown along the vertical axis.
Message Ordering (2) • Four processes in the same group with two different senders, and a possible delivery order of messages under FIFO-ordered multicasting
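FIFO-ordered delivery is commonly built from per-sender sequence numbers and a hold-back queue, which also makes the difference between receiving and delivering a message concrete. The class below is a minimal sketch with hypothetical names; it assumes every sender numbers its messages from 0.

```python
from collections import defaultdict


class FifoReceiver:
    """Delivers each sender's messages in send order, holding back early arrivals."""

    def __init__(self):
        self.next_seq = defaultdict(int)   # next expected number per sender
        self.held_back = defaultdict(dict)  # sender -> {seq: payload}
        self.delivered = []                # what the application has seen

    def receive(self, sender, seq, payload):
        self.held_back[sender][seq] = payload
        # Deliver as long as the next expected message from this sender is here.
        while self.next_seq[sender] in self.held_back[sender]:
            expected = self.next_seq[sender]
            self.delivered.append((sender, self.held_back[sender].pop(expected)))
            self.next_seq[sender] += 1


if __name__ == "__main__":
    receiver = FifoReceiver()
    receiver.receive("P1", 1, "m2")  # arrives early, held back
    receiver.receive("P4", 0, "m3")  # other sender, delivered at once
    receiver.receive("P1", 0, "m1")  # releases m1, then m2
    print(receiver.delivered)
```

Messages from different senders may still interleave differently at different receivers; FIFO ordering constrains only the order per sender.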
Implementing Virtual Synchrony (1) • Six different versions of virtually synchronous reliable multicasting.
Implementing Virtual Synchrony (2) • Process 4 notices that process 7 has crashed, sends a view change • Process 6 sends out all its unstable messages, followed by a flush message • Process 6 installs the new view when it has received a flush message from everyone else
Distributed Commit • Commit – Making an operation permanent • Transactions in databases • One phase commit does not work !!! • Two phase commit & three phase commit • Two phase commit • Coordinator sends a VOTE_REQUEST • Participant sends a VOTE_COMMIT or VOTE_ABORT • Coordinator collects all votes and sends GLOBAL_COMMIT or GLOBAL_ABORT to all • Processes commit or abort the transaction
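The coordinator's decision rule in two-phase commit can be sketched as a single function. This is an in-memory illustration with assumed names: votes are passed in as a dictionary, a vote of `None` stands for a participant that did not answer before the timeout, and the returned list stands in for the coordinator's log.

```python
def two_phase_commit(votes, coordinator_vote="VOTE_COMMIT"):
    """Return (global decision, coordinator log) for one 2PC round.

    votes maps a participant name to "VOTE_COMMIT", "VOTE_ABORT",
    or None when no vote arrived before the timeout.
    """
    log = ["START_2PC"]  # phase 1: VOTE_REQUEST goes out after this record
    unanimous = (coordinator_vote == "VOTE_COMMIT"
                 and all(v == "VOTE_COMMIT" for v in votes.values()))
    decision = "GLOBAL_COMMIT" if unanimous else "GLOBAL_ABORT"
    log.append(decision)  # phase 2: decision is logged, then multicast
    return decision, log


if __name__ == "__main__":
    print(two_phase_commit({"A": "VOTE_COMMIT", "B": "VOTE_COMMIT"}))
    print(two_phase_commit({"A": "VOTE_COMMIT", "B": "VOTE_ABORT"}))
    print(two_phase_commit({"A": "VOTE_COMMIT", "B": None}))
```

A single abort vote or a single missing vote is enough to abort the whole transaction; commit requires every participant to agree.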
Two-Phase Commit (1) • The finite state machine for the coordinator in 2PC. • The finite state machine for a participant.
2 Phase Commit with Failures • Process failures can lead to indefinite blocking • Timeout mechanisms • Wait states • INIT of a participant: Abort and send VOTE_ABORT • WAIT of coordinator: Abort and send GLOBAL_ABORT • READY of participant • When participant P is in READY, it can ask another participant Q • If Q is in INIT, abort the transaction • If Q has received GLOBAL_COMMIT or GLOBAL_ABORT, act accordingly • If Q is also in READY, contact another participant; if all participants are in READY, block
Two-Phase Commit (2) • Actions taken by a participant P when residing in state READY and having contacted another participant Q.
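The actions of a participant P stuck in READY can be written as a lookup over the state of each peer it contacts. The sketch assumes the participant states are INIT, READY, COMMIT and ABORT; the names `ACTION_FOR_PEER_STATE` and `resolve_ready` are hypothetical.

```python
# What P can conclude from the state of a peer Q it has contacted.
ACTION_FOR_PEER_STATE = {
    "COMMIT": "COMMIT",  # Q saw GLOBAL_COMMIT, so P may commit as well
    "ABORT": "ABORT",    # Q saw GLOBAL_ABORT, so P may abort as well
    "INIT": "ABORT",     # Q never voted, so the coordinator cannot have committed
    "READY": None,       # Q knows no more than P: ask someone else
}


def resolve_ready(peer_states):
    """Decide P's next move after contacting peers in the given order."""
    for state in peer_states:
        action = ACTION_FOR_PEER_STATE[state]
        if action is not None:
            return action
    return "BLOCK"  # every peer is READY: wait for the coordinator to recover


if __name__ == "__main__":
    print(resolve_ready(["READY", "INIT"]))
    print(resolve_ready(["READY", "COMMIT"]))
    print(resolve_ready(["READY", "READY"]))
```

The last case is the blocking scenario that motivates three-phase commit.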
Coordinator Actions • Record WAIT and then multicast VOTE_REQUEST to everyone • After all decisions have been received, record the decision and then multicast
Participant Actions • Waits for a vote request • Upon receiving a request, the participant decides the vote • Records the vote and replies • Logs the global decision and then executes • DECISION_REQUEST if timeout
Two-Phase Commit (3)
actions by coordinator:
    write START_2PC to local log;
    multicast VOTE_REQUEST to all participants;
    while not all votes have been collected {
        wait for any incoming vote;
        if timeout {
            write GLOBAL_ABORT to local log;
            multicast GLOBAL_ABORT to all participants;
            exit;
        }
        record vote;
    }
    if all participants sent VOTE_COMMIT and coordinator votes COMMIT {
        write GLOBAL_COMMIT to local log;
        multicast GLOBAL_COMMIT to all participants;
    } else {
        write GLOBAL_ABORT to local log;
        multicast GLOBAL_ABORT to all participants;
    }
• Outline of the steps taken by the coordinator in a two-phase commit protocol
Two-Phase Commit (4)
actions by participant:
    write INIT to local log;
    wait for VOTE_REQUEST from coordinator;
    if timeout {
        write VOTE_ABORT to local log;
        exit;
    }
    if participant votes COMMIT {
        write VOTE_COMMIT to local log;
        send VOTE_COMMIT to coordinator;
        wait for DECISION from coordinator;
        if timeout {
            multicast DECISION_REQUEST to other participants;
            wait until DECISION is received; /* remain blocked */
            write DECISION to local log;
        }
        if DECISION == GLOBAL_COMMIT
            write GLOBAL_COMMIT to local log;
        else if DECISION == GLOBAL_ABORT
            write GLOBAL_ABORT to local log;
    } else {
        write VOTE_ABORT to local log;
        send VOTE_ABORT to coordinator;
    }
• Steps taken by a participant process in 2PC.
Two-Phase Commit (5)
actions for handling decision requests: /* executed by separate thread */
    while true {
        wait until any incoming DECISION_REQUEST is received; /* remain blocked */
        read most recently recorded STATE from the local log;
        if STATE == GLOBAL_COMMIT
            send GLOBAL_COMMIT to requesting participant;
        else if STATE == INIT or STATE == GLOBAL_ABORT
            send GLOBAL_ABORT to requesting participant;
        else
            skip; /* participant remains blocked */
    }
• Steps taken for handling incoming decision requests.
Recovery • Backward Recovery • Restoring system to previous consistent state • Forward Recovery • Attempt to bring the system to the next correct state • Requires knowing what the correct state is • Checkpointing • Logging with checkpointing
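Backward recovery with checkpointing and logging can be sketched on a small key-value store. The class and method names are assumptions for illustration: each update is logged before it is applied, a checkpoint saves the full state and truncates the log, and recovery restores the checkpoint and replays the logged updates.

```python
import copy


class RecoverableStore:
    """Key-value store that recovers from a checkpoint plus an update log."""

    def __init__(self):
        self.state = {}
        self.checkpoint = {}
        self.log = []  # updates applied since the last checkpoint

    def update(self, key, value):
        self.log.append((key, value))  # log first, then apply
        self.state[key] = value

    def take_checkpoint(self):
        self.checkpoint = copy.deepcopy(self.state)
        self.log.clear()  # older updates are now covered by the checkpoint

    def recover(self):
        # Roll back to the last consistent state, then replay later updates.
        self.state = copy.deepcopy(self.checkpoint)
        for key, value in self.log:
            self.state[key] = value


if __name__ == "__main__":
    store = RecoverableStore()
    store.update("a", 1)
    store.take_checkpoint()
    store.update("b", 2)
    store.state = {}  # simulate losing the in-memory state in a crash
    store.recover()
    print(store.state)
```

Logging keeps checkpoints cheap to take infrequently: without the log, recovery would lose every update made after the last checkpoint.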