
Consistency and Replication





  1. Consistency and Replication Chapter 6 Part II Distribution Models & Consistency Protocols

  2. Distribution Models • How do we propagate updates between replicas? • Replica Placement: • Where is a replica placed? • When is a replica created? • Who creates the replica?

  3. Types of Replicas • The logical organization of different kinds of copies of a data store into three concentric rings: permanent replicas at the core, surrounded by server-initiated replicas, surrounded in turn by client-initiated replicas.

  4. Permanent Replicas • Initial set of replicas • Other replicas can be created from them • Small and static set • Example: Web site horizontal distribution • Replicate the Web site on a limited number of machines on a LAN • Distribute requests round-robin • Replicate the Web site on a limited number of machines on a WAN (mirroring) • Clients choose which site to talk to

  5. Server-Initiated Replicas (1) • Created dynamically at the request of the owner of the data store • Example: push caches • Web hosting servers can dynamically create replicas close to demanding clients • Need a dynamic policy to create and delete replicas • One approach is to track Web page hits • Keep a counter and an access-origin list for each page

  6. Server-Initiated Replicas (2) • Counting access requests from different clients.

  7. Server-Initiated Replicas (3) • Each server can determine which server is closest to a given client • Hits from clients that share the same closest server go to the same counter • cntQ(P,F) = count of hits on replica F hosted at server Q, initiated by clients whose closest server is P • If cntQ(P,F) drops below a threshold D, delete replica F from Q

  8. Server-Initiated Replicas (4) • If cntQ(P,F) exceeds a threshold R, replicate or migrate F to P • Choose D and R with care • Do not allow deletion of permanent replicas • If, for some server P, cntQ(P,F) exceeds more than half the total requests for F, migrate F from Q to P • What if it cannot be migrated to P?
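To make the policy concrete, here is a minimal Python sketch of the decision rule; placement_decision, the cnt dictionary, and the reading of D as a bound on the total demand for F at Q are illustrative assumptions, not part of the slides:

```python
def placement_decision(cnt, D, R, is_permanent):
    """Return (action, server) decisions for replica F hosted at server Q."""
    total = sum(cnt.values())
    decisions = []

    # Deletion rule: overall demand for F at Q has dropped below D, and F is
    # not one of the permanent replicas (those may never be deleted).
    if total < D and not is_permanent:
        return [("delete", "Q")]

    for P, hits in cnt.items():
        if hits > total / 2:
            # P's clients account for more than half of all requests for F:
            # migrate F from Q to P.
            decisions.append(("migrate", P))
        elif hits > R:
            # Enough demand near P to justify an extra copy there.
            decisions.append(("replicate", P))
    return decisions

# Example: clients whose closest server is P1 dominate, so F moves there.
print(placement_decision({"P1": 60, "P2": 10}, D=5, R=25, is_permanent=False))
```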

  9. Client-Initiated Replicas • These are caches • Temporary storage (entries expire quickly) • Managed by clients • Cache hit: return data from the cache • Cache miss: load a copy from the original server • Kept on the client machine, or on the same LAN • Multi-level caches

  10. Design Issues for Update Propagation • A client sends an update to a replica. The update is then propagated to all other replicas • Propagate state or operation • Pull or Push protocols • Unicast or multicast propagation

  11. State versus Operation Propagation (1) • Propagate a notification of update • Invalidation protocols • When data item x is changed at one replica, it is invalidated at the other replicas • An attempt to read the item causes an “item-fault”, which triggers updating the local copy before the read can complete • The update is done according to the consistency model supported • Uses little network bandwidth • Good when the read-to-write ratio is low
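A minimal sketch of an invalidation protocol as described above; InvalidatingReplica and its fetch callback are hypothetical names standing in for whatever update mechanism the supported consistency model prescribes:

```python
# On a remote write we only record that x is stale; the value is re-fetched
# lazily on the next read ("item-fault"). Invalidation messages carry no data,
# which is why this uses little network bandwidth.

class InvalidatingReplica:
    def __init__(self, fetch):
        self.store = {}          # local copies: item -> value
        self.invalid = set()     # items known to be stale
        self.fetch = fetch       # fetch(x) -> fresh value from another replica

    def on_invalidate(self, x):
        # Notification of update: a tiny message, no data is shipped.
        self.invalid.add(x)

    def read(self, x):
        if x in self.invalid or x not in self.store:
            # Item-fault: update the local copy before the read completes.
            self.store[x] = self.fetch(x)
            self.invalid.discard(x)
        return self.store[x]

# Usage: after on_invalidate("a"), the next read("a") triggers one fetch.
r = InvalidatingReplica(fetch=lambda x: 42)
r.on_invalidate("a")
print(r.read("a"))   # -> 42, fetched on demand
```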

  12. State versus Operation Propagation (2) • Transfer the modified data • Uses much network bandwidth • Good when the read-to-write ratio is high • Propagate the update operation • Each replica must have a process capable of performing the update • Uses very little network bandwidth • Good when the read-to-write ratio is high

  13. Pull versus Push • Push: updates are propagated to other replicas without solicitation • Typically used from permanent to server-initiated replicas • Used to achieve a high degree of consistency • Efficient when the read-to-write ratio is high • Pull: a replica asks another for an update • Typically used by client-initiated replicas • Efficient when the read-to-write ratio is low • Cache misses result in longer response times

  14. Pull versus Push Protocols • A comparison between push-based and pull-based protocols in the case of multiple-client, single-server systems.
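In outline, the comparison usually drawn for this setting is: • State at the server: a push-based server tracks its clients' replicas and caches; a pull-based server keeps none • Messages sent: push sends updates (possibly followed by a later fetch of the update); pull requires a poll followed by an update • Response time at the client: immediate (or fetch-update time) under push; fetch-update time under pull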

  15. Unicast versus Multicast • Unicast: a replica sends n-1 separate updates, one to every other replica • Multicast: a replica sends one message to multiple receivers • The network takes care of multicasting • Can be cheaper • Suits push-based protocols

  16. Consistency Protocols • A consistency protocol is an implementation of a specific consistency model • Consider an implementation where each server (site) has a complete copy of the data store • The data store is fully replicated • We will present simple protocols that implement P-RAM (pipelined RAM), CC (causal consistency), and SC (sequential consistency) • Exercise: suggest protocols for WC (weak consistency) and Coherence

  17. Consistency Protocols – Design Issues • Primary replica protocols • Each data item has a primary replica, an owner, denoted x.prime • If x is not replicated, it lives only at site x.prime • Primary replica, remote-write protocol • All writes to x are sent to x.prime • Primary replica, local-write protocol • Each write is performed locally • x migrates (x.prime changes) • A process gets exclusive permission to write from x.prime

  18. Design Issues – Distribution • Sites p, q, s, t each hold a disjoint part of the data store (p: X, Y, Z; q: A, B, C; s: E, F, G; t: U, V, W) • Only p has X; all operations on X must be forwarded to p • X can migrate

  19. Design Issues – Primary Fixed Replica • Each of the sites p, q, s, t holds a full copy A, B, C, D • Only p can write to A; all writes to A must be forwarded to p • The other sites can read A locally

  20. Design Issues – Primary Variable Replica • Each site holds a full copy A, B, C, D • p is the current owner of A; all writes to A must be forwarded to p • Ownership can change

  21. Design Issues – Free Replication • Each site holds a full copy A, B, C, D • Any process can write to A • Writes are propagated to the other replicas
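A toy sketch of the primary fixed replica (remote-write) rule from slide 19; the Site class and the synchronous push to backups are illustrative simplifications:

```python
class Site:
    def __init__(self, name):
        self.name = name
        self.store = {}     # local replica: item -> value
        self.peers = []     # the other sites

    def read(self, x):
        # Reads are served from the local copy (slide 19).
        return self.store.get(x)

    def write(self, x, v, owner):
        if self is not owner:
            # Writes must be forwarded to x's primary.
            owner.write(x, v, owner)
            return
        self.store[x] = v            # the primary applies the write...
        for q in self.peers:
            q.store[x] = v           # ...and pushes it to every backup

# Usage: p owns A; a write issued at q is forwarded to p, then pushed out.
p, q = Site("p"), Site("q")
p.peers, q.peers = [q], [p]
q.write("A", 1, owner=p)
print(p.read("A"), q.read("A"))      # -> 1 1
```

Under the local-write variant of slide 20, write() would instead transfer ownership of x to the caller before applying the write.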

  22. System Model (1) • Processes p, q, s, t, each holding a full replica of the data store: DSp, DSq, DSs, DSt

  23. System Model (2) • Each process p (of n processes) maintains: • DSp, a complete replica of the data store DS • A data item x in replica DSp is denoted by DSp.x • OutQp, a FIFO queue for outgoing messages • InQp, a queue for incoming messages (need not be FIFO) • Sending/receiving messages (removing from OutQp / adding to InQp) is handled by independent daemons

  24. Sending and Receiving Messages • sendp(): /* executed infinitely often */ • If not OutQp.empty() then • remove message m from OutQp • send m to each q ≠ p • receivep(): • upon receiving a message m from q • add m to InQp • Assume each m contains exactly one write • Better: send a batch of writes in one message
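A Python sketch of this per-process state and its daemons (the Process class is illustrative; the network is simulated by calling receive directly, where a real system would run the daemons in background threads):

```python
from collections import deque

class Process:
    def __init__(self, pid, peers):
        self.pid = pid
        self.DS = {}           # DSp: complete replica of the data store
        self.OutQ = deque()    # FIFO queue of outgoing messages
        self.InQ = deque()     # incoming messages; ordering is protocol-specific
        self.peers = peers     # every process q != p

    def send_daemon(self):
        # sendp(): executed infinitely often; here, one call drains OutQ.
        while self.OutQ:
            m = self.OutQ.popleft()
            for q in self.peers:       # send m to each q != p
                q.receive(m)

    def receive(self, m):
        # receivep(): upon receiving message m, add it to InQ.
        self.InQ.append(m)

# Usage: q receives whatever p's daemon sends.
q = Process("q", [])
p = Process("p", [q])
p.OutQ.append(("write", "p", "x", 1))
p.send_daemon()
print(q.InQ)    # deque([('write', 'p', 'x', 1)])
```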

  25. P-RAM Protocol (P-RAM)

  26. P-RAM Protocol - (P-RAM) Messages • Messages have the format: • <write,p,x,v>, p = process id, x data item, v is a value: • Representing writep(x)v • InQp is a FIFO queue, for each process p • Let Tp be a sequence of operations for process p

  27. P-RAM Protocol - (P-RAM) Read and Write Implementation • readp(x)?: • return DSp.x • /* add rp(x)* to Tp */ • writep(x)v: • DSp.x = v • add <write,p,x,v> to OutQp • /* add wp(x)v to Tp */

  28. P-RAM Protocol - (P-RAM) Apply Implementation • applyp(): /* to apply remote writes locally */ • /* executed infinitely often */ • If not InQp.empty() then • let <write,q,x,v> be at the head of InQp • remove <write,q,x,v> from InQp • DSp.x = v • /* Add wq(x)v to Tp */

  29. (P-RAM) – Correctness • Claim: any computation of (P-RAM) satisfies P-RAM • Proof: • Let (O|p ∪ O|w, <p) be Tp for each process p • Clearly, program order is maintained, Tp is valid, and contains all of O|p ∪ O|w

  30. CC Protocol (CC)

  31. CC Protocol – (CC) • Need an additional variable for each process p: • vtp[n], a vector timestamp for write operations • If vtp[q] = k, then p thinks q has performed k writes • Messages now have the structure: • <write,q,x,v,t> representing writeq(x)v with vector timestamp t • InQp is a priority queue based on vt values • Add a new message, • in front of all the messages with higher vt • behind the messages that have incomparable vt

  32. (CC) – Read and Write Implementation • readp(x)?: • return DSp.x • (read operation has timestamp of vtp) • /* Add rp(x)* to Tp */ • writep(x)v: • vtp[p]++ /* One more write by p*/ • DSp.x = v • add <write,p,x,v,vtp> to OutQp • (write operation has timestamp of vtp) • /* Add wp(x)v to Tp */

  33. (CC) – Apply Implementation • applyp(): /* to apply remote writes locally */ • If not InQp.empty() then • let <write,q,x,v,t> be at the head of InQp • if t[k] ≤ vtp[k] for each k ≠ q • and t[q] == vtp[q] + 1 then • remove <write,q,x,v,t> from InQp • vtp[q] = t[q] • DSp.x = v • (the write operation has timestamp t) • /* Add wq(x)v to Tp */

  34. Why “t[k] ≤ vtp[k] for each k ≠ q and t[q] == vtp[q] + 1”? • If not (t[k] ≤ vtp[k] for each k ≠ q) • then there is a j such that t[j] > vtp[j] • i.e., there is at least one write by j that q knows of, but p does not • If so, p must wait • t[q] == vtp[q] + 1 • To apply this write of q with t[q] = x, it must be the case that the last write by q that p applied had timestamp x – 1
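As a sketch of slides 33–34, assuming process ids are indices 0..n-1 into plain Python lists:

```python
# (CC) apply test: process p may apply a write by q carrying vector timestamp
# t only when (a) it has already applied every earlier write by q
# (t[q] == vt[q] + 1) and (b) every write by any other process k that q had
# already seen when it wrote (t[k] <= vt[k] for k != q).

def can_apply(t, vt, q):
    """t: timestamp of the incoming write by q; vt: p's current vector."""
    if t[q] != vt[q] + 1:                 # a write by q is still missing at p
        return False
    return all(t[k] <= vt[k] for k in range(len(vt)) if k != q)

def apply_write(DS, vt, msg):
    """msg = (q, x, v, t); apply if causally safe, else leave it queued."""
    q, x, v, t = msg
    if not can_apply(t, vt, q):
        return False                      # p must wait for missing writes
    DS[x] = v
    vt[q] = t[q]                          # p now knows this write by q
    return True

# Usage with 3 processes: p (index 0) cannot apply q's (index 1) second
# write before q's first.
DS, vt = {}, [0, 0, 0]
print(apply_write(DS, vt, (1, "x", 5, [0, 2, 0])))  # False: q's write #1 missing
print(apply_write(DS, vt, (1, "x", 3, [0, 1, 0])))  # True
```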

  35. (CC) – Timestamps Reflect Causality • Observation 1: if o1 <co o2 then o1.vt ≤ o2.vt • If o1 <po o2: processes never decrement vt • If o1 <wbr o2: after o1 is applied by p, vtp ≥ o1.vt, and any subsequent operation by p will have a timestamp of at least vtp • Transitivity: there is some operation o’ such that o1 <co o’ <co o2; by induction o1.vt ≤ o’.vt ≤ o2.vt, and ≤ is transitive, thus o1.vt ≤ o2.vt • Observation 2: if o1 <co o2 and o2 is a write then o1.vt < o2.vt • Proof similar to Observation 1

  36. (CC) – Liveness (1) • Observation 3: If w is a write by p then each q eventually applies w • If q = p, trivial • If q ≠ p, then w is either: • In OutQp: will eventually be sent (send is executed infinitely often) • In transit: will be delivered to q (channels are reliable) • In InQq: need to argue that it will eventually be applied • Applied by q: we’re done

  37. (CC) – Liveness (2) • Any w will eventually be applied by q • When q adds w to InQq, InQq has a finite number of writes • After w is added to InQq, more writes with lower vt can be added • There is a finite number of such writes • w eventually reaches the head of InQq

  38. (CC) – Liveness (3) • A write w by p at the head of InQq is eventually applied by q • Let vt be vtq when w is at the head of InQq • We must show that vt[k] ≥ w.vt[k] for all k ≠ p and vt[p] + 1 == w.vt[p]

  39. (CC) – Liveness (4): vt[k] ≥ w.vt[k] for all k ≠ p • Let k be a process other than p • Let w’ be the w.vt[k]-th write by k • p applies w’ before it performs w • This implies that w’.vt < w.vt • Thus, w’ is ahead of w in InQq • When w is at the head of InQq, w’ has already been applied • Once q applies w’, vt[k] ≥ w.vt[k]

  40. (CC) – Liveness (5): vt[p] + 1 == w.vt[p] • Let w’’ be the (w.vt[p] – 1)-st write by p • By Observation 1, w’’.vt ≤ w.vt • Thus, w’’ is ahead of w in InQq • w’’ must have already been applied • Therefore, vt[p] = w.vt[p] – 1

  41. (CC) – Correctness (1) • Claim: any computation of (CC) satisfies CC • Proof: • Let (O|p ∪ O|w, <p) be Tp for each process p • By Observation 3, Tp contains all of O|p ∪ O|w • By the memory property, Tp is valid • Tp is acyclic (exercise) • Need to show that (O|p ∪ O|w, <co) ⊆ Tp

  42. (CC) – Correctness (2): (O|p ∪ O|w, <co) ⊆ Tp • Let o1 and o2 be two operations such that o1 <co o2; we show that o1 precedes o2 in Tp [let q ≠ p] • By Observation 1, o1.vt ≤ o2.vt • o1, o2 ∈ O|p: trivial from (CC) • o1 ∈ O|q|w & o2 ∈ O|p: p sets its vt[q] = o1.vt[q] after applying o1; since o2.vt[q] ≥ o1.vt[q], o2 can only occur after o1 • o1 ∈ O|p|w & o2 ∈ O|q|w: o1.vt ≤ o2.vt ⇒ o1.vt[p] ≤ o2.vt[p]; p can apply o2 only once vtp[p] ≥ o2.vt[p], i.e., after p has performed its first o2.vt[p] writes, which include o1, so p applies o1 before o2 • o1 ∈ O|p|r & o2 ∈ O|q|w: there must be a write o’ with o1 <co o’ <co o2; cases 1 and 3 apply • o1, o2 ∈ O|q|w: exercise (Hint: Observation 2 ⇒ o1.vt < o2.vt)

  43. SC Protocol (SC)

  44. SC Protocol – (SC) • Arrange process on a logical ring • Circulate a special message (token) <token> • Introduce another message type <ack,p,x,v,t> • Ack to p that w(x)v of ts t has been applied • Protocol uses the token-ring to apply writes in a mutual exclusive manner • Another way is to enforce a total order on all read and write operations using timestamps (exercise)

  45. (SC) – Read and Write Implementation • readp(x)?: • return DSp.x • writep(x)v: • wait for <token> • DSp.x = v • add <write,p,x,v> to OutQp • wait for <ack,p,x,v> from each q • add <token> to OutQp

  46. (SC) – Apply Implementation • applyp(): /* to apply remote writes locally */ • If not InQp.empty() then • let <write,q,x,v> be at the head of InQp • remove <write,q,x,v> from InQp • DSp.x = v • add <ack,q,x,v> to OutQp
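Combining slides 44–46, a sequential toy sketch of (SC); the Token and SCSite classes are illustrative, and the "wait for" steps are simulated with assertions and direct calls rather than real blocking:

```python
# A write blocks until the site holds the token, is pushed to every replica,
# is acknowledged by all of them, and only then is the token released; so
# writes are applied in a mutually exclusive, globally agreed order.

class SCSite:
    def __init__(self, name):
        self.name, self.DS, self.peers = name, {}, []

    def read(self, x):
        return self.DS.get(x)             # reads stay local

    def write(self, x, v, token):
        assert token.holder is self       # "wait for <token>" (simulated)
        self.DS[x] = v
        acks = 0
        for q in self.peers:              # broadcast <write,p,x,v>
            q.DS[x] = v                   # q applies the write...
            acks += 1                     # ...and returns <ack,p,x,v>
        assert acks == len(self.peers)    # "wait for <ack> from each q"
        token.holder = token.next(self)   # pass <token> along the ring

class Token:
    def __init__(self, ring):
        self.ring, self.holder = ring, ring[0]
    def next(self, site):
        return self.ring[(self.ring.index(site) + 1) % len(self.ring)]

# Usage: p writes while holding the token; the token then moves on to q.
p, q = SCSite("p"), SCSite("q")
p.peers, q.peers = [q], [p]
tok = Token([p, q])
p.write("x", 9, tok)
print(q.read("x"), tok.holder.name)   # -> 9 q
```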

  47. (SC) – Correctness • Claim: any computation of (SC) satisfies SC • Note that any computation of (SC) satisfies P-RAM

  48. (SC) Provides SC • [Figure: the same write order w3, w5, w1 appears in (O|w, <w), in (O|p ∪ O|w, <p), in (O|q ∪ O|w, <q), and in the resulting (O, <)] • All processes agree on one order of writes (O|w, <w) • (O|w, <p) = (O|w, <q) = (O|w, <w), for each pair p, q • To get (O, <) s.t. (O, <po) ⊆ (O, <): merge each process’s own sequence with the agreed write order, as the figure illustrates
