1 / 67

Strong Eventual Consistency for Conflict-Free Replicated Data

Explore Strong Eventual Consistency theory and CRDTs for conflict-free replicated data types in distributed systems. Understand the principles, challenges, and practical solutions for achieving consistency. Presented by Ron Zisman, Marc Shapiro, Nuno Preguiça, Carlos Baquero, and Marek Zawirski.

eslater
Télécharger la présentation

Strong Eventual Consistency for Conflict-Free Replicated Data

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Conflict-free Replicated Data Types Presented by: Ron Zisman Marc Shapiro, NunoPreguiça, Carlos Baquero and MarekZawirski

  2. Replication and Consistency - essential features of large distributed systems such as www, p2p, and cloud computing • Lots of replicas • Great for fault-tolerance and read latency • Problematic when updates occur • Slow synchronization • Conflicts in case of no synchronization Motivation

  3. We look for an approach that: • supports Replication • guaranteesEventual Consistency • isFastand Simple • Conflict-free objects = no synchronization whatsoever • Is this practical? Motivation

  4. Contributions Theory Strong Eventual Consistency (SEC) • A solution to the CAP problem • Formal definitions • Two sufficient conditions • Strong equivalence between the two • Incomparable to sequential consistency Practice CRDTs = Convergent or Commutative Replicated Data Types • Counters • Set • Directed graph

  5. Strong Consistency Ideal consistency: all replicas know about the update immediately after it executes • Preclude conflicts • Replicas update in the same total order • Any deterministic object • Consensus • Serialization bottleneck • Tolerates < n/2 faults • Correct, but doesn’t scale

  6. Strong Consistency Ideal consistency: all replicas know about the update immediately after it executes • Preclude conflicts • Replicas update in the same total order • Any deterministic object • Consensus • Serialization bottleneck • Tolerates < n/2 faults • Correct, but doesn’t scale

  7. Strong Consistency Ideal consistency: all replicas know about the update immediately after it executes • Preclude conflicts • Replicas update in the same total order • Any deterministic object • Consensus • Serialization bottleneck • Tolerates < n/2 faults • Correct, but doesn’t scale

  8. Strong Consistency Ideal consistency: all replicas know about the update immediately after it executes • Preclude conflicts • Replicas update in the same total order • Any deterministic object • Consensus • Serialization bottleneck • Tolerates < n/2 faults • Correct, but doesn’t scale

  9. Strong Consistency Ideal consistency: all replicas know about the update immediately after it executes • Preclude conflicts • Replicas update in the same total order • Any deterministic object • Consensus • Serialization bottleneck • Tolerates < n/2 faults • Correct, but doesn’t scale

  10. Eventual Consistency • Update local and propagate • No foreground synch • Eventual, reliable delivery • On conflict • Arbitrate • Roll back • Consensus moved to background • Better performance • Still complex

  11. Eventual Consistency • Update local and propagate • No foreground synch • Eventual, reliable delivery • On conflict • Arbitrate • Roll back • Consensus moved to background • Better performance • Still complex

  12. Eventual Consistency • Update local and propagate • No foreground synch • Eventual, reliable delivery • On conflict • Arbitrate • Roll back • Consensus moved to background • Better performance • Still complex

  13. Eventual Consistency • Update local and propagate • No foreground synch • Eventual, reliable delivery • On conflict • Arbitrate • Roll back • Consensus moved to background • Better performance • Still complex

  14. Eventual Consistency • Update local and propagate • No foreground synch • Eventual, reliable delivery • On conflict • Arbitrate • Roll back • Consensus moved to background • Better performance • Still complex

  15. Eventual Consistency • Update local and propagate • No foreground synch • Eventual, reliable delivery • On conflict • Arbitrate • Roll back • Consensus moved to background • Better performance • Still complex

  16. Eventual Consistency • Update local and propagate • No foreground synch • Eventual, reliable delivery • On conflict • Arbitrate • Roll back • Consensus moved to background • Better performance • Still complex Reconcile

  17. Strong Eventual Consistency • Update local and propagate • No synch • Eventual, reliable delivery • No conflict • deterministic outcome of concurrent updates • No consensus: ≤ n-1 faults • Solves the CAP problem

  18. Strong Eventual Consistency • Update local and propagate • No synch • Eventual, reliable delivery • No conflict • deterministic outcome of concurrent updates • No consensus: ≤ n-1 faults • Solves the CAP problem

  19. Strong Eventual Consistency • Update local and propagate • No synch • Eventual, reliable delivery • No conflict • deterministic outcome of concurrent updates • No consensus: ≤ n-1 faults • Solves the CAP problem

  20. Strong Eventual Consistency • Update local and propagate • No synch • Eventual, reliable delivery • No conflict • deterministic outcome of concurrent updates • No consensus: ≤ n-1 faults • Solves the CAP problem

  21. Strong Eventual Consistency • Update local and propagate • No synch • Eventual, reliable delivery • No conflict • deterministic outcome of concurrent updates • No consensus: ≤ n-1 faults • Solves the CAP problem

  22. Eventual delivery: An update delivered at some correct replica is eventually delivered to all correct replicas • Termination: All method executions terminate • Convergence: Correct replicas that have delivered the same updates eventually reach equivalent state • Doesn’t preclude roll backs and reconciling Definition of EC

  23. Eventual delivery: An update delivered at some correct replica is eventually delivered to all correct replicas • Termination: All method executions terminate • Strong Convergence: Correct replicas that have delivered the same updates haveequivalent state Definition of SEC

  24. System model System of non-byzantine processes interconnected by an asynchronous network Partition-tolerance and recovery What are the two simple conditions that guarantee strong convergence?

  25. Query • Client sends the query to any of the replicas • Local at source replica • Evaluate synchronously, no side effects

  26. Query • Client sends the query to any of the replicas • Local at source replica • Evaluate synchronously, no side effects

  27. Query • Client sends the query to any of the replicas • Local at source replica • Evaluate synchronously, no side effects

  28. An object is a tuple • Local queries, local updates • Send full state; on receive, merge • Update is said ‘delivered’ at some replica when it is included in its casual history • Causal History: State-based approach payload set merge initial state update query

  29. State-based replication Causal History: • on query: • on update: • Local at source .u(a), .u(b), … • Precondition, compute • Update local payload

  30. State-based replication Causal History: • on query: • on update: • Local at source .u(a), .u(b), … • Precondition, compute • Update local payload

  31. State-based replication Causal History: • on query: • on update: • on merge: • Local at source .u(a), .u(b), … • Precondition, compute • Update local payload • Convergence • Episodically: send payload • On delivery: merge payloads

  32. State-based replication Causal History: • on query: • on update: • on merge: • Local at source .u(a), .u(b), … • Precondition, compute • Update local payload • Convergence • Episodically: send payload • On delivery: merge payloads

  33. State-based replication Causal History: • on query: • on update: • on merge: • Local at source .u(a), .u(b), … • Precondition, compute • Update local payload • Convergence • Episodically: send payload • On delivery: merge payloads

  34. A poset is a join-semilattice if for all x,y in S a LUB exists LUB = Least Upper Bound • Associative: • Commutative: • Idempotent: Examples: Semi-lattice

  35. State-based: monotonic semi-lattice CvRDT If: then replicas converge to LUB of last values • payload type forms a semi-lattice • updates are increasing • merge computes Least Upper Bound

  36. An object is a tuple • prepare-update • Precondition at source • 1st phase: at source, synchronous, no side effects • effect-update • Precondition against downstream state (P) • 2nd phase, asynchronous, side-effects to downstream state Operation-based approach payload set delivery precondition effect-update initial state prepare-update query

  37. Operation-based replication • Local at source • Precondition, compute • Broadcast to all replicas Causal History: • on query/prepare-update:

  38. Operation-based replication • Local at source • Precondition, compute • Broadcast to all replicas • Eventually, at all replicas: • Downstream precondition • Assign local replica Causal History: • on query/prepare-update: • on effect-update:

  39. Operation-based replication • Local at source • Precondition, compute • Broadcast to all replicas • Eventually, at all replicas: • Downstream precondition • Assign local replica Causal History: • on query/prepare-update: • on effect-update:

  40. Op-based: commute CmRDT • Liveness: all replicas execute all operations in delivery order where the downstream precondition (P) is true • Safety: concurrent operations all commute If: then replicas converge

  41. Monotonic semi-latticeCommutative A state-based object can emulate an operation-based object, and vice-versa Use state-based reasoning and then covert to operation based for better efficiency

  42. Comparison State-based Operation-based Update operation Higher level, more complex More powerful, more constraining Small messages Collaborative editing (Treedoc), Bayou, PNUTS • Update ≠ merge operation • Simple data types • State includes preceding updates; no separate historical information • Inefficient if payload is large • File systems (NFS, Dynamo) State-based or op-based, as convenient

  43. There is a SEC object that is not sequentially-consistent: Consider a Set CRDT S with operations add(e) and remove(e) • remove(e) → add(e) e ∈ S • add(e) ║ remove(e’)e ∈ S ∧ e’ S • add(e) ║ remove(e)e ∈ S (suppose add wins) Consider the following scenario with replicas , : • [add(e); remove(e’)] ║ [add(e’); remove(e)] • merges the states from and : e ∈ S ∧ e’∈ S The state of replica will never occur in a sequentially-consistent execution (either remove(e) or remove(e’) must be last) SEC is incomparable to sequential consistency

  44. There is a sequentially-consistent object that is not SEC: • If no crashes occur, a sequentially-consistent object is SEC • Generally, sequential consistency requires consensus to determine the single order of operations – cannot be solved if n-1 crashes occur (while SEC can tolerate n-1 crashes) SEC is incomparable to sequential consistency

  45. Example CRDTs Multi-master counter Observed-Remove Set Directed Graph

  46. Increment • Payload: • Partial order: • value() = • increment() = ++ • merge(x,y) = = Multi-master counter

  47. Increment / Decrement • Payload: , • Partial order: • value() = - • increment() = ++ • decrement() = ++ • merge(x,y) = = ( ) Multi-master counter

  48. Sequential specification: • {true} add(e){e ∈ S} • {true} remove(e){e ∈ S} • Concurrent: {true} add(e) ║remove(e) {???} • linearizable? • error state? • last writer wins? • add wins? • remove wins? Set design alternatives

  49. Observed-Remove Set

  50. Observed-Remove Set • Payload: added, removed (element, unique token) • add(e) =

More Related