Consistency & Replication Chapter No. 6

Consistency & Replication, Chapter No. 6. Presenters: Muhammad Rafi, Meng Min, Guan Donghai

Motivation • Large-scale distributed systems require scalability in every aspect of their function. • Data are generally replicated to enhance reliability, improve performance, and increase availability. • One major problem of replication is maintaining consistency. • Maintaining consistency requires • Distribution of updates (amount, frequency, means) • Keeping the replicas consistent (immediate updates are required) • We will discuss a variety of models for data-centric and client-centric consistency • We also discuss the distribution protocols and consistency protocols • Example • Simulation

Outline • Motivation (Rafi) • Introduction (Meng Min) • Data-Centric Consistency Models (Meng Min) • Client-Centric Consistency Models (Donghai) • Distribution Protocols (Donghai) • Consistency Protocols (Rafi) • Examples (Rafi) • Simulation of Consistency Technique (Rafi)

Content • Definition of Consistency and Replication • 6.1 Introduction: Reasons for Replication & Problems of Replication, Object Replication • 6.2 Data-centric consistency models

Consistency and Replication Definition: • Replication: • Keeping multiple copies of data • Consistency: • Agreement among the replicated copies Reason for Consistency: • Keep all replicas the same

Consistency and Replication 6.1 Introduction: • Reasons for Replication: • Reliability • Performance • Problems of Replication: • Consistency problem: whenever a replica is updated, that replica becomes different from the others. • Synchronization problem: keeping replicas consistent requires synchronization, which costs performance. [Figure: relationship between performance, replication, consistency, and synchronization]

Object Replication (1) • Purpose: managing data in distributed systems • Object Replication: • Consider objects instead of data alone • Benefit: objects encapsulate data together with the operations on that data

Object Replication (2) Organization of a distributed remote object shared by two different clients.

Object Replication (3) • A remote object capable of handling concurrent invocations on its own. • A remote object for which an object adapter is required to handle concurrent invocations

Data-centric Consistency Models The general organization of a logical data store, physically distributed and replicated across multiple processes. • Consistency Model: • A contract between processes and the data store.

Data-centric Consistency Models (ordered from strict to weak) • Strict Consistency • Linearizability and Sequential Consistency • Causal Consistency • FIFO Consistency • Weak Consistency • Release Consistency • Entry Consistency

Strict Consistency Condition: Any read on a data item x returns a value corresponding to the result of the most recent write on x. Disadvantage: Assumes the existence of absolute global time. Behavior of two processes operating on the same data item. • A strictly consistent store. • A store that is not strictly consistent.

Linearizability and Sequential Consistency • Sequential consistency condition: The result of any execution is the same as if the operations by all processes were executed in some sequential order, and the operations of each individual process appear in this sequence in the order specified by its program. Time does not play a role. • A sequentially consistent data store. • A data store that is not sequentially consistent. • Linearizability condition: sequential consistency, plus: if ts_OP1(x) < ts_OP2(y), then OP1(x) should precede OP2(y) in this sequence.
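The sequential consistency condition can be checked by brute force for very small histories. The sketch below is only an illustration under stated assumptions: the function name `sequentially_consistent`, the `("W"|"R", item, value)` encoding, and the two sample histories are hypothetical and not taken from the slides. It searches for one interleaving that respects each process's program order and in which every read returns the latest written value.

```python
def sequentially_consistent(programs, initial=None):
    """Return True if some interleaving explains every read.

    programs: one tuple of operations per process, each operation being
    ("W", item, value) or ("R", item, value). Encoding is an assumption.
    """
    programs = tuple(tuple(p) for p in programs)

    def search(positions, store):
        # All operations placed in the sequence: a valid order exists.
        if all(pos == len(p) for pos, p in zip(positions, programs)):
            return True
        for i, prog in enumerate(programs):
            pos = positions[i]
            if pos == len(prog):
                continue
            kind, item, value = prog[pos]
            nxt = positions[:i] + (pos + 1,) + positions[i + 1:]
            if kind == "W":
                if search(nxt, {**store, item: value}):
                    return True
            elif store.get(item, initial) == value:
                # A read may be scheduled only if it sees the latest write.
                if search(nxt, store):
                    return True
        return False

    return search((0,) * len(programs), {})


# Hypothetical histories: P1 writes a, P2 writes b, P3 and P4 only read x.
valid_history = (
    (("W", "x", "a"),),
    (("W", "x", "b"),),
    (("R", "x", "b"), ("R", "x", "a")),
    (("R", "x", "b"), ("R", "x", "a")),
)
invalid_history = (
    (("W", "x", "a"),),
    (("W", "x", "b"),),
    (("R", "x", "b"), ("R", "x", "a")),
    (("R", "x", "a"), ("R", "x", "b")),
)
```

In the second history the two readers disagree on the order of the two writes, so no single sequential order can satisfy both. The search is exponential and is meant only for slide-sized examples.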

Causal Consistency (1) • Necessary condition: Writes that are potentially causally related must be seen by all processes in the same order. Concurrent writes may be seen in a different order on different machines. Figure: this sequence is allowed with a causally-consistent store, but not with a sequentially or strictly consistent store.

Causal Consistency (2) • A violation of a causally-consistent store. • A correct sequence of events in a causally-consistent store.

FIFO Consistency • Necessary condition: Writes done by a single process are seen by all other processes in the order in which they were issued, but writes from different processes may be seen in a different order by different processes. A valid sequence of events for FIFO consistency.
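The FIFO condition can be expressed as a small check: for every observer and every writer, the writes of that writer must appear in issue order. This is a minimal sketch; the function name `fifo_consistent` and the dictionary layout are assumptions made for illustration.

```python
def fifo_consistent(issued, observed):
    """issued:   {writer: [write ids in the order they were issued]}
    observed: {observer: [write ids in the order that observer saw them]}
    """
    for seen in observed.values():
        for writes in issued.values():
            rank = {w: i for i, w in enumerate(writes)}
            # Keep only this writer's writes, in the order the observer saw them.
            ranks = [rank[w] for w in seen if w in rank]
            if ranks != sorted(ranks):
                return False
    return True


# Hypothetical example: P1 issues a; P2 issues b, then c.
issued = {"P1": ["a"], "P2": ["b", "c"]}
ok_views = {"P3": ["b", "a", "c"], "P4": ["a", "b", "c"]}
bad_views = {"P3": ["c", "b", "a"]}
```

P3 and P4 may disagree on where `a` falls relative to `b` and `c`, which FIFO consistency permits; seeing `c` before `b` is not permitted because both come from P2.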

Weak Consistency (1) Properties: • Accesses to synchronization variables associated with a data store are sequentially consistent. • No operation on a synchronization variable is allowed to be performed until all previous writes have been completed everywhere. • No read or write operation on data items is allowed to be performed until all previous operations on synchronization variables have been performed.

Weak Consistency (2) • A valid sequence of events for weak consistency. • An invalid sequence for weak consistency.

Release Consistency (1) Rules: • Before a read or write operation on shared data is performed, all previous acquires done by the process must have completed successfully. • Before a release is allowed to be performed, all previous reads and writes by the process must have completed • Accesses to synchronization variables are FIFO consistent (sequential consistency is not required).

Release Consistency (2) A valid event sequence for release consistency.
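The acquire and release rules can be illustrated with a toy model in which a process refreshes its local copy on acquire and publishes its writes on release. This is a sketch under strong assumptions: the class names `SharedStore` and `Process` are hypothetical, there is no real locking or concurrency, and a single in-memory dictionary stands in for the distributed store.

```python
class SharedStore:
    """Stand-in for the shared data store (assumption: one central copy)."""

    def __init__(self):
        self.data = {}


class Process:
    def __init__(self, store):
        self.store = store
        self.local = {}   # this process's local copy of the shared data
        self.dirty = {}   # writes not yet published

    def acquire(self):
        # On acquire, bring the local copy up to date.
        self.local.update(self.store.data)

    def write(self, item, value):
        self.local[item] = value
        self.dirty[item] = value

    def read(self, item):
        return self.local.get(item)

    def release(self):
        # On release, all earlier writes by this process are made visible.
        self.store.data.update(self.dirty)
        self.dirty.clear()
```

A process that reads without first doing an acquire gets no guarantee about freshness: in this model it simply sees whatever its local copy holds.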

Entry Consistency (1) Conditions: • An acquire access of a synchronization variable is not allowed to perform with respect to a process until all updates to the guarded shared data have been performed with respect to that process. • Before an exclusive mode access to a synchronization variable by a process is allowed to perform with respect to that process, no other process may hold the synchronization variable, not even in nonexclusive mode. • After an exclusive mode access to a synchronization variable has been performed, any other process's next nonexclusive mode access to that synchronization variable may not be performed until it has performed with respect to that variable's owner.

Entry Consistency (2) A valid event sequence for entry consistency

Summary of Consistency Models

6. Consistency and Replication • 6.3 Client-Centric Consistency Models • 6.4 Distribution Protocols. Guan Donghai, 2005-11-15

Client-Centric Consistency Models • The previously studied consistency models concern themselves with maintaining a consistent (globally accessible) data store in the presence of concurrent read/write operations. • Another class of distributed data store is characterized by the lack of simultaneous updates. Here, the emphasis is on maintaining a consistent view for the individual client process that is currently operating on the data store.

More Client-Centric Consistency How fast should updates (writes) be made available to read-only processes? • Think of most database systems: mainly reads. • Think of the DNS: write-write conflicts do not occur. • Think of the WWW: as with DNS, except that client-side caching is heavily used; even the return of stale pages is acceptable to most users. • These systems all tolerate a high degree of inconsistency, with the replicas gradually becoming consistent over time.

Toward Eventual Consistency • The only requirement is that all replicas will eventually be the same. • All updates must be guaranteed to propagate to all replicas … eventually! • This works well if every client always accesses the same replica. • Things become more difficult if clients are mobile and access different replicas.

Eventual Consistency A mobile user accessing different replicas of a distributed database has problems with eventual consistency.

Client-Centric Consistency • Four models in client-centric consistency: – Monotonic-Read Consistency – Monotonic-Write Consistency – Read-Your-Writes Consistency – Writes-Follow-Reads Consistency

Client-Centric Consistency: Notation • x_i[t]: the version of data item x at local copy L_i at time t • x_i[t] is the result of a set of write operations applied to x at L_i since initialization • This set is denoted WS(x_i[t]) • If the operations in WS(x_i[t1]) have also been performed on x at local copy L_j at a later time t2, this is written WS(x_i[t1]; x_j[t2])

Monotonic Reads If a process reads the value of a data item x, any successive read operation on x by that process will always return that same value or a more recent value. • A monotonic-read consistent data store. • A data store that does not provide monotonic reads. • Example: distributed email database
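One way to picture monotonic reads is a client session that remembers the newest version it has read and refuses any replica that is older. The sketch below assumes each replica exposes a single increasing version number; `Replica`, `MonotonicReadSession`, and `StaleReplicaError` are hypothetical names used only for illustration.

```python
class StaleReplicaError(Exception):
    """Raised when a replica is older than what the session already read."""


class Replica:
    def __init__(self, version=0, value=None):
        self.version = version   # assumption: one version counter per item
        self.value = value


class MonotonicReadSession:
    def __init__(self):
        self.last_version = 0    # newest version this client has read so far

    def read(self, replica):
        if replica.version < self.last_version:
            # Serving this read would return an older value than before.
            raise StaleReplicaError("replica is behind this session")
        self.last_version = replica.version
        return replica.value
```

In the email example, a user who has already seen a message at one replica would not be served by a replica that has not yet received it; a real system might wait or forward the request instead of raising an error.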

Monotonic Writes A write operation by a process on a data item x is completed before any successive write operation on x by the same process. • A monotonic-write consistent data store. • A data store that does not provide monotonic-write consistency. • Example: the monotonic-write guarantee could be used by a text editor when editing replicated files
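Monotonic writes can be sketched as a replica that accepts a client's n-th write only after it has applied that client's first n-1 writes. The per-client sequence numbers and the `Replica.write` signature below are assumptions made for this illustration.

```python
class Replica:
    def __init__(self):
        self.applied = {}   # client id -> count of that client's writes applied here
        self.log = []       # values in the order they were applied

    def write(self, client, seq, value):
        """Apply the client's seq-th write only if writes 1..seq-1 are already here.

        Returns True when the write was applied, False when earlier writes
        by the same client are still missing at this replica.
        """
        if self.applied.get(client, 0) != seq - 1:
            return False
        self.log.append(value)
        self.applied[client] = seq
        return True
```

When `write` returns False, the caller would first have to bring the missing earlier writes to this replica, which mirrors the text editor example: a later edit must not be applied to a copy that lacks the earlier edits.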

Read Your Writes The effect of a write operation by a process on data item x will always be seen by a successive read operation on x by the same process. • A data store that provides read-your-writes consistency. • A data store that does not. • Example: updating passwords
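Read-your-writes can be sketched in the same style: the session remembers the version produced by its own latest write and only reads from replicas that have caught up to it. The names `Replica`, `Session`, and `NotYetPropagatedError`, and the single version counter per item, are hypothetical.

```python
class NotYetPropagatedError(Exception):
    """Raised when a replica has not yet received this session's write."""


class Replica:
    def __init__(self):
        self.version = 0
        self.value = None

    def apply(self, version, value):
        # Ignore updates that are older than what this replica already holds.
        if version > self.version:
            self.version, self.value = version, value


class Session:
    def __init__(self):
        self.own_write_version = 0

    def write(self, replica, value):
        self.own_write_version = replica.version + 1
        replica.apply(self.own_write_version, value)

    def read(self, replica):
        if replica.version < self.own_write_version:
            raise NotYetPropagatedError("replica lacks this session's write")
        return replica.value
```

For the password example, a user who changes a password at one replica is not authenticated against a replica that still holds the old password until the update has arrived there.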

Writes Follow Reads A write operation by a process on a data item x following a previous read operation on x by the same process is guaranteed to take place on the same or a more recent value of x than the one that was read. • A writes-follow-reads consistent data store. • A data store that does not provide writes-follow-reads consistency. • Example: replicated bulletin board database

Client-Centric Models: Summary • Monotonic reads: If a process reads the value of a data item x, any successive read operation on x by that process will always return that same value or a more recent value. • Monotonic writes: A write operation by a process on a data item x is completed before any successive write operation on x by the same process. • Read your writes: The effect of a write operation by a process on a data item x will always be seen by a successive read operation on x by the same process. • Writes follow reads: A write operation by a process on a data item x following a previous read operation on x by the same process is guaranteed to take place on the same or a more recent value of x than the one that was read.

Distribution Protocols • Purpose: answer the following questions – What exactly is propagated? – Where are updates propagated? – By whom is propagation initiated? • Three topics – Replica Placement – Update Propagation – Epidemic Protocols

Replica Placement The logical organization of different kinds of copies of a data store into three concentric rings.

Permanent Replicas • Permanent Replicas – The initial set of replicas that constitutes a distributed data store – Typically small in number • Example: Web site

Server-Initiated Replicas • Server-Initiated Replicas – Created at the initiative of the owner of the data store – Exist to enhance performance • Work Scheme: how to decide where and when replicas should be created or deleted? – Each server keeps track of access counts per file, and where access requests come from. – When a server Q decides to reevaluate the placement of the files it stores, it checks the access count for each file. – If the total number of access requests for a file F at Q drops below the deletion threshold del(Q,F), Q deletes F unless it is the last copy.
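The deletion rule in the work scheme can be sketched as a single pass over the access counts kept by server Q. The function name `files_to_delete` and its three dictionary arguments are assumptions; in particular, how Q learns the number of copies held elsewhere is not specified on the slide.

```python
def files_to_delete(access_counts, del_threshold, copies_elsewhere):
    """Return the files server Q may drop when it reevaluates placement.

    access_counts:    {file: total access requests seen at Q}
    del_threshold:    {file: deletion threshold del(Q, F)}
    copies_elsewhere: {file: number of copies held by other servers}
    """
    return [
        f
        for f, count in access_counts.items()
        # Delete only rarely used files, and never the last remaining copy.
        if count < del_threshold[f] and copies_elsewhere[f] > 0
    ]


# Hypothetical numbers for three files stored at Q.
counts = {"F1": 3, "F2": 50, "F3": 1}
thresholds = {"F1": 10, "F2": 10, "F3": 10}
elsewhere = {"F1": 2, "F2": 1, "F3": 0}
```

With these numbers only F1 qualifies: F2 is still popular, and F3 is below the threshold but is the last copy.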

Server-Initiated Replicas Counting access requests from different clients.

Client-Initiated Replicas • Client-Initiated Replicas – Created at the initiative of clients – Commonly known as client caches • Work Scheme – When a client wants access to some data, it connects to the nearest copy of the data store, from which it fetches the data it wants to read. – When most operations involve only reading data, performance can be improved by letting the client store requested data in a nearby cache. – The next time that same data needs to be read, the client can simply fetch it from this local cache.
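The client-cache work scheme amounts to a read-through cache. The sketch below is illustrative only: `CachingClient` and the `fetch` callback, which stands in for contacting the nearest copy of the data store, are hypothetical names.

```python
class CachingClient:
    def __init__(self, fetch):
        self.fetch = fetch   # callable that reads an item from the nearest copy
        self.cache = {}      # client-initiated replica, i.e. the client cache

    def read(self, key):
        if key not in self.cache:
            # First access: go to the data store and keep the result nearby.
            self.cache[key] = self.fetch(key)
        # Later accesses are served from the local cache.
        return self.cache[key]


# Usage with a hypothetical in-memory "server" that records each request.
requests = []
server_data = {"page": "<html>...</html>"}

def fetch_from_server(key):
    requests.append(key)
    return server_data[key]

client = CachingClient(fetch_from_server)
first = client.read("page")
second = client.read("page")
```

The sketch never refreshes cached entries, so it can return stale data once the server copy changes; keeping such caches consistent is what the update propagation choices address.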

Update Propagation •Introduction – Generally initiated at a client – Subsequently forwarded to one of the copies •Three design issues – State Vs. Operations What is actually to be propagated – Pull Vs. Push Protocols Whether updates are pulled or pushed – Unicasting Vs. Multicasting Whether unicasting or multicasting should be used

State Vs. Operations What is actually to be propagated? • Propagate only a notification of an update – Uses little network bandwidth – Works best when the read-to-write ratio is relatively small • Transfer data from one copy to another – Useful when the read-to-write ratio is relatively high • Propagate the update operation to other copies – Tell each replica which update operation it should perform – Updates can often be propagated at minimal bandwidth cost
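The three propagation choices can be contrasted by how a replica handles each kind of message. This is a minimal sketch: the message tuples, the `apply_update` function, and the dictionary used as a replica are assumptions made for illustration.

```python
def apply_update(replica, message):
    """Apply one propagated message to a replica.

    replica: {"value": ..., "valid": bool}
    message: ("invalidate", None) | ("state", new_value) | ("operation", fn)
    """
    kind, payload = message
    if kind == "invalidate":
        # Notification only: the copy is marked stale and fetched later if needed.
        replica["valid"] = False
    elif kind == "state":
        # The modified data itself is transferred.
        replica["value"] = payload
        replica["valid"] = True
    elif kind == "operation":
        # Only the operation travels; the replica computes the new value itself.
        replica["value"] = payload(replica["value"])
    else:
        raise ValueError(f"unknown message kind: {kind}")
    return replica
```

For a large data item, an invalidation or a small operation such as "add 5" is far cheaper to send than the full state, at the price of a later fetch or of more computation at each replica.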

Pull versus Push Protocols Are updates pulled or pushed? • Push-based approach (server-based) • Updates are propagated to other replicas without those replicas even asking for them • Pull-based approach (client-based) • A server or client requests another server to send it any updates it has at that moment. Table: a comparison between push-based and pull-based protocols in the case of multiple-client, single-server systems.

Unicasting Vs. Multicasting Whether unicasting or multicasting should be used? • Introduction – With multicasting, the underlying network takes care of sending a message efficiently to multiple receivers. – In unicast communication, when a server that is part of the data store sends its update to N other servers, it does so by sending N separate messages, one to each server. • Compare – In many cases, it is cheaper to use available multicasting facilities. – Multicasting can often be efficiently combined with a push-based approach to propagating updates.

Epidemic Protocols • Ensure eventual consistency • Epidemic spreading propagates updates to all replicas efficiently • Do not solve update conflicts directly • Concern: propagate an update to all replicas in as few messages as possible • Terminology: – Infective: a server that holds an update and is willing to spread it – Susceptible: a server that has not yet been updated – Removed: an updated server that is not willing or able to spread its update

Epidemic Protocols • Anti-entropy Propagation Approach: – A server P picks another server Q at random, three approaches in exchanging updates: 1. P only pushes its updates to Q 2. P only pulls in new updates from Q 3. P and Q exchange updates (push-pull) • Gossip Protocol: – If server P has just been updated for data item x, • Contact an arbitrary server Q and push update to Q • If Q was already updated, P may lose interest with some probability – Good way of rapidly spreading updates – Note: cannot guarantee that all servers will actually be updated
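Anti-entropy can be simulated in a few lines. The sketch below assumes synchronous rounds in which every server picks one random partner, and it tracks a single update that starts at server 0; the function name `anti_entropy`, the round structure, and the seed parameter are illustrative choices, not part of the slides.

```python
import random


def anti_entropy(n, mode="push-pull", seed=0):
    """Return the number of rounds until all n servers hold the update."""
    rng = random.Random(seed)
    updated = [False] * n
    updated[0] = True          # server 0 starts out infective
    rounds = 0
    while not all(updated):
        rounds += 1
        snapshot = list(updated)   # state at the start of the round
        for p in range(n):
            q = rng.choice([s for s in range(n) if s != p])
            if mode in ("push", "push-pull") and snapshot[p]:
                updated[q] = True  # P pushes its update to Q
            if mode in ("pull", "push-pull") and snapshot[q]:
                updated[p] = True  # P pulls the update from Q
    return rounds
```

Calling `anti_entropy(100, mode="push")` and `anti_entropy(100, mode="push-pull")` with several seeds gives a feel for how quickly each exchange style spreads an update. The gossip variant, in which P may lose interest, is not modeled here.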

Consistency & Replication, Chapter No. 6. Muhammad Rafi • 6.5 Consistency Protocols • 6.6 Examples: Orca

Consistency Protocols • A consistency protocol describes an implementation of a specific consistency model. • The consistency models in which operations are globally serialized are the most important and most widely applied: • Sequential consistency • Weak consistency with synchronization variables • Atomic transactions • Consistency protocols can be classified by whether writes go through a primary copy or may be carried out at any replica. • We discuss the following protocols: • Primary-Based Protocols: Remote-Write Protocols & Local-Write Protocols • Replicated-Write Protocols: Active Replication & Quorum-Based Protocols • Cache-Coherence Protocols

Primary-Based Protocols • Primary-based protocols make a primary server responsible for every write operation on a data item in the distributed data store. • There are several strategies for the primary-based approach: • Remote-write protocol with no backup replica • Remote-write protocol: primary-backup approach (passive replication) • Local-write protocol with a single migrating primary copy (no backup) • Local-write protocol with a migrating primary copy and non-migrating backups (primary-backup approach)
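The remote-write primary-backup strategy can be sketched as follows. The class names `Backup` and `Primary`, the blocking update loop, and the boolean acknowledgements are assumptions for illustration; failures and concurrent writers are not modeled.

```python
class Backup:
    def __init__(self):
        self.store = {}

    def update(self, item, value):
        self.store[item] = value
        return True                      # acknowledgement to the primary

    def read(self, item):
        # Reads can be served locally by any replica.
        return self.store.get(item)


class Primary(Backup):
    def __init__(self, backups):
        super().__init__()
        self.backups = backups

    def write(self, item, value):
        # Every write goes through the primary, which updates its own copy,
        # forwards the update to all backups, and waits for their acks.
        self.store[item] = value
        acks = [b.update(item, value) for b in self.backups]
        return all(acks)
```

Because the primary orders all writes and does not report completion until every backup has acknowledged, readers at any replica see the writes in the same order, at the cost of a slower write path.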
