
Fault Tolerance and Replication



  1. Fault Tolerance and Replication This PowerPoint presentation has been adapted from: (1) web.njit.edu/~gblank/cis633/Lectures/Replication.ppt

  2. Content • Introduction • System model and the role of group communication • Fault tolerant services • Case study: Bayou and Coda • Transaction with replicated data

  3. Content • Introduction • System model and the role of group communication • Fault tolerant services • Case study: Bayou and Coda • Transaction with replicated data

  4. Introduction • Replication • Duplicating resources that are limited or heavily loaded • to provide access and to ensure continued access after failures • Replication is important for performance enhancement, increased availability and fault tolerance.

  5. Introduction • Replication • Performance enhancement • Data are replicated across several servers in the same domain • The workload is shared between the servers by binding all the server IP addresses to the site’s DNS name • This increases performance at little cost to the system
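
As a concrete illustration of the DNS-based scheme above, here is a minimal client-side sketch; it assumes the hostname resolves to several A records, and the hostname and the random selection policy are illustrative assumptions, not from the slides:

```python
# Minimal sketch of client-side load sharing across replicated servers,
# assuming the hostname resolves to several A records.
import random
import socket

def pick_replica(hostname: str, port: int = 80) -> str:
    """Resolve all A records for the host and pick one at random."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    addresses = sorted({info[4][0] for info in infos})
    return random.choice(addresses)

print(pick_replica("example.com"))   # placeholder hostname
```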

  6. Introduction • Replication • Increased availability • Replication is a technique for automatically maintaining the availability of data despite server failures • If data are replicated at two or more failure-independent servers, then client software may be able to access data at an alternative server should the default server fail or become unreachable

  7. Introduction • Replication • Fault tolerance • Highly available data is not necessarily correct data (it may be out of date) • A fault-tolerant service always guarantees strictly correct behaviour: the freshness of the data supplied to the client and the effects of the client’s operations upon the data

  8. Introduction • Replication • Replication requirements: • Transparency • Users should not need to be aware that data are replicated, and the performance and utility of information retrieval should not be noticeably different from unreplicated data • Consistency • Different copies of replicated data should be the same. When data are changed, the changes must be propagated to all replicas

  9. Content • Introduction • System model and the role of group communication • Fault tolerant services • Case study: Bayou and Coda • Transaction with replicated data

  10. System Model & The Role of Group Communication • Introduction • The data in the system are composed of objects (e.g., files, components, Java objects, etc.) • Each logical object is implemented by a collection of physical objects called replicas, each stored on a computer. • The replicas of a given object are not necessarily identical, at least not at any particular point in time. Some replicas may have received updates that others have not received.

  11. System Model & The Role of Group Communication • System Model

  12. System Model & The Role of Group Communication • System Model • Replica Managers (RM) • Components that contain the objects on a particular computer and perform operations on them • Front ends (FE) • Components that handle clients’ requests • They communicate with one or more of the replica managers by message passing • A front end may be implemented in the client’s address space, or it may be a separate process

  13. System Model & The Role of Group Communication • System Model • Five phases in the processing of a request upon replicated objects [Wiesmann et al. 2000] • Request: the front end requests service from one or more RMs, which may communicate with the other RMs. The front end may communicate through one RM or multicast to all of them. • Coordination: RMs coordinate to prepare to execute the request. This may require ordering of the operations. • Execution: RMs execute the request (possibly reversibly). • Agreement: RMs reach agreement on the effect of the request. • Response: one or more RMs pass a response back to the front end.
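
The five phases might be arranged as in the following sketch; all class and method names (FrontEnd, ReplicaManager, coordinate, agree) are assumptions for illustration, not an API from the source:

```python
# Illustrative arrangement of the five phases [Wiesmann et al. 2000].

class ReplicaManager:
    def __init__(self, peers):
        self.peers = peers              # the other replica managers
        self.state = {}                 # the replicated objects

    def handle(self, request):
        self.coordinate(request)        # phase 2: agree on ordering
        result = self.execute(request)  # phase 3: tentative execution
        self.agree(request)             # phase 4: agree on the effect
        return result                   # phase 5: respond to front end

    def coordinate(self, request): ...  # e.g. totally ordered delivery
    def execute(self, request): ...     # may have to be reversible
    def agree(self, request): ...       # commit or roll back

class FrontEnd:
    def __init__(self, replica_managers):
        self.rms = replica_managers

    def invoke(self, operation):
        # phase 1: request service via one RM or multicast to all
        responses = [rm.handle(operation) for rm in self.rms]
        return responses[0]
```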

  14. System Model & The Role of Group Communication • The role of group communication • Managing RMs through group communication is complex, especially in the case of dynamic groups (where members join, leave, or fail). • A group membership service may be used to manage the addition and removal of replica managers, and to detect and recover from crashes and faults.

  15. System Model & The Role of Group Communication • The role of group communication • Tasks of a Group Membership Service • Provide an interface for group membership changes • Implement a failure detector • Notify members of group membership changes • Perform group address expansion for multicast delivery of messages.
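
A hedged sketch of such a service's interface, with method names assumed from the four tasks above:

```python
# Illustrative group membership service; all names are assumptions.
from typing import Callable, Set

class GroupMembershipService:
    def __init__(self):
        self.members: Set[str] = set()
        self.listeners = []

    # 1. Interface for group membership changes
    def join(self, process_id: str):
        self.members.add(process_id)
        self._notify()

    def leave(self, process_id: str):
        self.members.discard(process_id)
        self._notify()

    # 2. Failure detector: a suspected process is removed
    def suspect(self, process_id: str):
        self.leave(process_id)

    # 3. Notify members of membership changes (a new "view")
    def on_view_change(self, callback: Callable[[Set[str]], None]):
        self.listeners.append(callback)

    def _notify(self):
        view = set(self.members)
        for cb in self.listeners:
            cb(view)

    # 4. Group address expansion: the group name maps to its members
    def expand(self) -> Set[str]:
        return set(self.members)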

  16. System Model & The Role of Group Communication • The role of group communication • [Figure: services provided for process groups — a group send passes through group address expansion and multicast communication to reach the process group; group membership management handles Join, Leave, and Fail events]

  17. Content • Introduction • System model and the role of group communication • Fault tolerant services • Case study: Bayou and Coda • Transaction with replicated data

  18. Fault Tolerant Services • Introduction • Replicating data and functionality at replica managers can be used to provide a service that is correct despite process failures • A replication service is correct if it keeps responding despite faults • Clients cannot see the difference between a service provided by replicated data and one with a single copy of the data.

  19. Fault Tolerant Services • Introduction • One correctness criterion for replicated objects is linearizability • Every operation is synchronous • Clients must wait for one operation to complete before starting another. • A replicated shared object is sequentially consistent if, for any execution, there is some interleaving of the clients’ operations that meets the specification of a single correct copy of the objects, and the order of operations in the interleaving is consistent with the program order in which each client executed them
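
To make the distinction concrete, the following toy checker (an illustrative sketch; the history format is invented) brute-forces interleavings. The history is listed in real-time order: client 2 reads x = 0 after client 1 wrote x = 1, which no linearizable execution permits, yet the history is sequentially consistent because some interleaving respecting each client's program order behaves like a single copy:

```python
from itertools import permutations

# History in real-time order: (client, op, var, value); reads record
# the value the client saw. All variables start at 0.
history = [
    (1, "w", "x", 1),
    (2, "w", "y", 2),
    (2, "r", "x", 0),   # client 2 still sees the old x
]

def valid(seq):
    """Does the interleaving behave like a single correct copy?"""
    mem = {}
    for _, op, var, val in seq:
        if op == "w":
            mem[var] = val
        elif mem.get(var, 0) != val:
            return False
    return True

def program_order_ok(seq):
    """Does each client's own order survive in the interleaving?"""
    clients = {c for c, *_ in history}
    return all([e for e in seq if e[0] == c] ==
               [e for e in history if e[0] == c] for c in clients)

print(any(valid(s) and program_order_ok(s) for s in permutations(history)))
# True: sequentially consistent, though not linearizable.
```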

  20. Fault Tolerant Services • Update process • Read-only requests have no impact on the replicated object • Update processes may need to be managed properly to avoid inconsistency. • A strategy to avoid inconsistency • Make all updates to a primary copy of the data and copy that to the other replicas (passive replication). • If the primary fails, one of the backups is promoted to act as primary.

  21. Fault Tolerant Services • Passive (primary-backup) replication

  22. Fault Tolerant Services • Passive (primary-backup) replication • The sequence of events when a client requests an operation • Request: front end issues a request with a unique identifier to the primary replica manager. • Coordination: primary processes request atomically, checking ID for duplicate requests. • Execution: request is processed and stored. • Agreement: if an update, primary sends info to backups, which update and acknowledge. • Response: primary notifies front end, which passes information to client.
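
A minimal sketch of these five steps, assuming in-process backups for brevity; a real system would use message passing, acknowledgement timeouts, and failure detection, and all names here are illustrative:

```python
class PrimaryRM:
    def __init__(self, backups):
        self.state = {}
        self.seen = set()                 # request IDs already handled
        self.backups = backups

    def handle(self, request_id, key, value):
        # Coordination: take requests atomically, filter duplicates
        if request_id in self.seen:
            return self.state.get(key)    # duplicate: don't re-execute
        self.seen.add(request_id)
        # Execution: apply the update and store the result
        self.state[key] = value
        # Agreement: send the update to every backup, await acks
        acks = [b.apply(request_id, key, value) for b in self.backups]
        assert all(acks)
        # Response: hand the result back to the front end
        return value

class BackupRM:
    def __init__(self):
        self.state = {}

    def apply(self, request_id, key, value):
        self.state[key] = value
        return True                       # acknowledgement
```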

  23. Fault Tolerant Services • Passive (primary-backup) replication • It gives fault tolerance at a cost in performance. • There is high overhead in updating the replicas, so it gives lower performance than non-replicated objects. • To mitigate this: • Allow read-only requests to be made to backup RMs, but send all updates to the primary. • This is of limited value for transaction processing systems but is very effective for decision support systems (mostly read-only requests).

  24. Fault Tolerant Services • Active Replication

  25. Fault Tolerant Services • Active Replication • Active Replication steps: • Request: front end attaches unique ID to request and multicasts (totally ordered, reliable) to RMs. Front end is assumed to fail only by crashing. • Coordination: every correct RM receives request in same total order. • Execution: every RM executes the request. • Coordination: (not required due to multicast) • Response: each RM sends response to front end, which manages responses depending on failure assumptions and multicast algorithm.
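
A sketch of active replication under an assumed totally ordered, reliable multicast; the TotalOrderMulticast class below is a stand-in that simply delivers to every RM in the same order:

```python
class ActiveRM:
    def __init__(self):
        self.state = {}

    def deliver(self, op, key, value=None):
        # Every correct RM sees the same requests in the same total
        # order, so identical state machines stay identical.
        if op == "write":
            self.state[key] = value
        return self.state.get(key)

class TotalOrderMulticast:
    """Stand-in for a totally ordered, reliable multicast layer."""
    def __init__(self, rms):
        self.rms = rms

    def send(self, op, key, value=None):
        # Deliver to all RMs in one agreed order; the front end may
        # take the first response or vote over all of them, depending
        # on the failure assumptions.
        return [rm.deliver(op, key, value) for rm in self.rms]

group = TotalOrderMulticast([ActiveRM(), ActiveRM(), ActiveRM()])
group.send("write", "x", 42)
print(group.send("read", "x"))   # [42, 42, 42]
```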

  26. Fault Tolerant Services • Active Replication • The model assumes totally ordered and reliable multicasting. • This is equivalent to solving consensus, which requires either a synchronous system or a technique such as failure detectors in an asynchronous system. • The model can be simplified if updates are assumed to be commutative, so that the effect of two operations is the same in any order. • E.g., for a bank account, daily deposits and withdrawals can be done in any order unless the balance goes below zero. If a process avoids overdrafts, the effects are commutative.
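
A small demonstration of the commutativity argument, assuming a toy account with an initial balance of 200 (all values illustrative):

```python
from itertools import permutations

ops = [("deposit", 100), ("withdraw", 30), ("deposit", 50)]

def apply_all(balance, sequence):
    for kind, amount in sequence:
        balance += amount if kind == "deposit" else -amount
        assert balance >= 0, "overdraft: operations no longer commute"
    return balance

# Starting from 200, every ordering of the three operations yields
# the same final balance and never overdraws.
print({apply_all(200, order) for order in permutations(ops)})   # {320}
```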

  27. Content • Introduction • System model and the role of group communication • Fault tolerant services • Case study: Bayou and Coda • Transaction with replicated data

  28. Case study: Bayou and Coda • Introduction • Implementation of replication techniques to make services highly available • Giving clients access to the service (with reasonable response times) • In fault tolerant systems, updates are sent to all correct RMs, which receive them as soon as possible. • That eager agreement may be unacceptable for high availability systems. • It may be desirable to increase performance by providing slower (but still acceptable) updates through a minimal set of RMs. • Weaker consistency tends to require less agreement and provides more availability.

  29. Case study: Bayou and Coda • Bayou • Bayou is an approach to high availability • Users working in a disconnected fashion can make any updates in any partition at any time, with the updates recorded at any replica manager. • The replica managers are required to detect and manage conflicts at the time when two partitions are rejoined and the updates are merged. • Domain-specific policies, called operational transformations, are used to resolve conflicts by giving priority to some partitions.

  30. Case study: Bayou and Coda • Bayou • Bayou holds state values in a database to support queries and updates. • Updates are a special case of a transaction, using the equivalent of a stored procedure to guarantee the ACID properties. • Eventually every RM gets the same set of updates and applies them so that their databases are identical. • However, since this is delayed, in an active system with a constant stream of updates the databases may never really be identical.

  31. Case study: Bayou and Coda • Bayou • Bayou Update Resolution • Updates are marked as tentative when they are first applied to a database. • Once coordination with the other RMs makes it possible to resolve conflicts and place the updates in a canonical order, they are committed. • Once committed, they remain applied in their allotted order. Usually, this is achieved by designating a primary RM. • Every update includes a dependency check and a merge procedure.
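
A hedged sketch of a Bayou-style update carrying a dependency check and a merge procedure; the meeting-room scenario and all names are illustrative assumptions, not Bayou's actual API:

```python
def apply_update(db, update):
    if update["check"](db):
        update["op"](db)        # dependency holds: apply as intended
    else:
        update["merge"](db)     # conflict: run the merge procedure

db = {"room_10am": "bob"}       # someone already booked 10:00

update = {
    # tentative write: book the 10:00 slot for alice
    "op":    lambda d: d.update({"room_10am": "alice"}),
    # dependency check: is 10:00 still free?
    "check": lambda d: d.get("room_10am") is None,
    # merge procedure: fall back to the 11:00 slot
    "merge": lambda d: d.update({"room_11am": "alice"}),
}

apply_update(db, update)
print(db)   # {'room_10am': 'bob', 'room_11am': 'alice'}
```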

  32. Case study: Bayou and Coda • Bayou

  33. Case study: Bayou and Coda • Bayou • In Bayou, replication is not transparent to the application. • Knowledge of the application semantics is required to increase data availability while maintaining a replication state that can be called eventually sequentially consistent. • Disadvantages include increased complexity for the application programmers and the users. • The operational transformation approach is particularly suited for groupware, where workers access documents remotely.

  34. Case study: Bayou and Coda • Coda • The Coda file system is a descendant of the Andrew File System (AFS) • It addresses several requirements that AFS does not meet – particularly the requirement to provide high availability despite disconnected operation • It was developed in a research project at Carnegie Mellon University • An increasing number of AFS users work on laptops: • hence a need to support disconnected use of replicated data and to increase performance and availability.

  35. Case study: Bayou and Coda • Coda • The Coda architecture: • Coda has Venus processes at the client computers and Vice processes at the file servers. • The Vice processes are replica managers. • A set of servers holding replicas of a file volume is a volume storage group (VSG). • Clients access a subset known as the available volume storage group (AVSG), which varies as servers are connected or disconnected. • Updates are distributed by broadcasting to the AVSG after a close. • If the AVSG is empty (disconnected operation) files are cached until reconnected.

  36. Case study: Bayou and Coda • Coda • Coda uses an optimistic replication strategy • files can be updated when the network is partitioned or during disconnected operation. • A Coda version vector (CVV) is a timestamp that is used at each site to determine whether there are any conflicts among updates at the time of reconnection. • If no conflict, updates are performed. • Coda does not attempt to resolve conflicts. • If there is a conflict, the file is marked inoperable, and the owner of the file is notified. This is done at the AVSG level, so conflicts may recur at the VSG level.
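
A sketch of conflict detection with version vectors, one update counter per server in the volume storage group; the dominance test is the standard one, but the vectors shown are invented examples:

```python
# v1 dominates v2 when it is greater than or equal element-wise; if
# neither vector dominates, the updates happened in different
# partitions and conflict.

def dominates(v1, v2):
    return all(a >= b for a, b in zip(v1, v2))

def compare(v1, v2):
    if dominates(v1, v2):
        return "v1 is newer or equal: propagate v1"
    if dominates(v2, v1):
        return "v2 is newer: propagate v2"
    return "conflict: mark the file inoperable and notify the owner"

print(compare([2, 2, 1], [2, 1, 1]))   # v1 is newer or equal
print(compare([2, 1, 1], [1, 2, 1]))   # conflict
```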

  37. Content • Introduction • System model and the role of group communication • Fault tolerant services • Case study: Bayou and Coda • Transaction with replicated data

  38. Transaction with Replicated Data • Introduction • Clients should see transactions on replicated objects appear the same as transactions on non-replicated objects • Client transactions are interleaved in a serially equivalent manner. • One-copy serializability: • Transactions performed on replicated objects have the same effect as if they had been performed one at a time on a single set of objects

  39. Transaction with Replicated Data • Introduction • Three replication schemes that cope with network partitions: • Available copies with validation • Available copies replication is applied in each partition. When a partition is repaired, a validation procedure is applied and any inconsistencies are dealt with. • Quorum consensus: • A subgroup must have a quorum (sufficient members) in order to be allowed to continue providing a service in the presence of a partition. When a partition is repaired (and when a replica manager restarts after a failure), replica managers bring their objects up to date by means of recovery procedures. • Virtual partition: • A combination of quorum consensus and available copies. If a virtual partition has a quorum, it can use available copies replication.

  40. Transaction with Replicated Data • Available copies • Allows for some RMs to be unavailable. • Updates must be made to all available replicas of the data, with provisions to restore and update an RM that has crashed.

  41. Transaction with Replicated Data • Available copies

  42. Transaction with Replicated Data • Available copies with validation • An optimistic approach that allows updates in different partitions of a network. • When the partition is corrected, conflicts must be detected and compensating actions must be taken. • This approach is limited to situations in which such compensation is possible.

  43. Transaction with Replicated Data • Quorum consensus • Is a pessimistic approach to replicated transactions. • A quorum is a subgroup of RMs that is large enough to give it the right to carry out transactions even if some RMs are not available. • This limits updates to a single subset of the RMs, which update other RMs after a partition is corrected. • Gifford’s File Replication: • a Quorum scheme in which a number of votes is assigned to each copy of a replicated file. • A certain number of votes are required for either read or update operations, with writes limited to subsets of more than half the RMs. • The rest of the RMs will be updated as a background task when they are available. • Copies of data without enough read votes are considered weak copies and may be read locally with limits assumed on their currency and quality.
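
A hedged sketch of Gifford's weighted voting; the vote assignment and the thresholds R and W are illustrative, subject to the usual constraints W > total/2 and R + W > total so that any read quorum overlaps any write quorum:

```python
votes = {"rm1": 1, "rm2": 1, "rm3": 1}   # illustrative assignment
TOTAL = sum(votes.values())
R, W = 2, 2                               # read and write quorum sizes
assert W > TOTAL / 2 and R + W > TOTAL

def has_quorum(reachable, needed):
    return sum(votes[rm] for rm in reachable) >= needed

reachable = {"rm1", "rm3"}          # rm2 is cut off by a partition
print(has_quorum(reachable, R))     # True: reads may proceed
print(has_quorum(reachable, W))     # True: writes may proceed here
```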

  44. Transaction with Replicated Data • Virtual Partition Algorithm • This approach combines Quorum Consensus to handle partitions and Available Copies for faster read operations. • A virtual partition is an abstraction of a real partition and contains a set of replica managers.

  45. Transaction with Replicated Data • Virtual Partition Algorithm

  46. Transaction with Replicated Data • Virtual Partition Algorithm

  47. Transaction with Replicated Data • Virtual Partition Algorithm • Issues: • If network partitions are intermittent, different virtual partitions can form: • Overlapping virtual partitions violate one-copy serializability. • Virtual partitions carry logical timestamps; the virtual partition with the higher timestamp is selected, which keeps the scheme consistent when real partitions are uncommon.

  48. Transaction with Replicated Data • Virtual Partition Algorithm

  49. End of the Chapter …
