Eventual Consistency



Presentation Transcript


  1. Eventual Consistency Jinyang

  2. Sequential consistency • Sequential consistency properties: • Every read must see the latest write (this handles caching) • All writes are applied in a single order (this handles concurrent writes) • Realizing sequential consistency: • Reads/writes from a single node execute one at a time • All reads/writes to address X must be ordered by the one memory/storage module responsible for X

  3. Realizing sequential consistency • [Figure: caches or replicas issuing W(A)1, W(A)2, W(B)3 and R(B), with an Invalidate message]

  4. Disadvantages of sequential consistency • Requires highly available connections • Lots of chatter between clients/servers • Not suitable for certain scenarios: • Disconnected clients (e.g. your laptop) • Apps might prefer potential inconsistency to loss of availability

  5. Why (not) eventual consistency? • Support disconnected operations • Better to read a stale value than nothing • Better to save writes somewhere than nothing • Potentially anomalous application behavior • Stale reads and conflicting writes…

  6. Operating w/o total connectivity • Client writes to its local replica • Sync w/ server resolves non-conflicting changes, reports conflicting ones to user • No sync between clients • [Figure: two replicas, with writes W(A)1 and W(A)2]

  7. Pair-wise synchronization • Pair-wise sync resolves non-conflicting changes, reports conflicting ones to users • [Figure: three replicas, with writes W(A)1, W(A)2 and W(B)3]

  8. Example usages? • File synchronizers • One user, many gadgets

  9. File synchronizer • Goal • All replica contents eventually become identical • No lost updates • Do not replace a new version with an old one

  10. Prevent lost updates • Detect if updates were sequential • If so, replace old version with new one • If not, detect conflict • “Optimistic” vs. “Pessimistic” • Eventual Consistency: Let updates happen, worry about whether they can be serialized later • Sequential Consistency: Updates cannot take effect unless they are serialized first

  11. How to prevent lost updates? • Strawman: use mtime to decide which version should replace the other • Problem w/ wallclock: cannot detect disagreement on ordering • [Figure: hosts H1 and H2 hold file f; writes W(f)a, W(f)b and W(f)c; mtimes shown: 12354, 15648, 16679, 23657]
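A minimal sketch of the mtime strawman, assuming each replica stores a (content, mtime) pair. `Replica` and `sync_by_mtime` are hypothetical names, and the mapping of writes to mtimes follows the histories on the next slide. It illustrates how the rule can silently drop a concurrent update.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    content: str
    mtime: int  # wall-clock modification time


def sync_by_mtime(a: Replica, b: Replica) -> None:
    """Strawman rule: the copy with the larger mtime overwrites the other."""
    if a.mtime > b.mtime:
        b.content, b.mtime = a.content, a.mtime
    elif b.mtime > a.mtime:
        a.content, a.mtime = b.content, b.mtime


# Both hosts start from the version written by W(f)a.
h1 = Replica("a", 15648)
h2 = Replica("a", 15648)

h1.content, h1.mtime = "b", 16679  # W(f)b on H1
h2.content, h2.mtime = "c", 23657  # W(f)c on H2, without having seen "b"

sync_by_mtime(h1, h2)  # "c" wins on both hosts; "b" is lost, no conflict reported
```

The two writes were concurrent, yet the wall clock imposes an order on them, so the update from H1 disappears without anyone being told.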

  12. Strawman fix • Carry the entire modification history • If history X is a prefix of Y, Y is newer • [Figure: on H1, W(f)a gives history H1:15648 and W(f)b gives H1:15648 H1:16679; on H2, W(f)c applied to H1:15648 gives H1:15648 H2:23657]
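The prefix rule can be sketched as follows, assuming a history is a list of (host, mtime) entries. The function name `compare_histories` and its return labels are illustrative choices, not from the slides.

```python
def compare_histories(x, y):
    """Classify two modification histories using the prefix rule."""
    if x == y:
        return "same"
    if len(x) < len(y) and y[:len(x)] == x:
        return "x_older"   # x is a prefix of y, so y is newer
    if len(y) < len(x) and x[:len(y)] == y:
        return "y_older"   # y is a prefix of x, so x is newer
    return "conflict"      # neither is a prefix: the updates were concurrent


base = [("H1", 15648)]              # after W(f)a
on_h1 = base + [("H1", 16679)]      # after W(f)b on H1
on_h2 = base + [("H2", 23657)]      # after W(f)c on H2

first = compare_histories(base, on_h1)    # "x_older"
second = compare_histories(on_h1, on_h2)  # "conflict"
```

Unlike the mtime comparison, the histories of W(f)b and W(f)c diverge after the shared entry, so the conflict is detected instead of being hidden.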

  13. Compress version history • H1:2 implies H1:1, so we only need one number per host • [Figure: on H1, W(f)a gives H1:1 and W(f)b gives H1:1 H1:2, which compresses to H1:2; on H2, W(f)c gives H1:1 H1:2 H2:1, which compresses to H1:2 H2:1]
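A small sketch of the compression step, assuming each host numbers its own writes 1, 2, 3, ... so that a higher counter implies all lower ones. The name `compress` is hypothetical.

```python
def compress(history):
    """Collapse a list of (host, counter) entries to {host: highest counter}."""
    vector = {}
    for host, counter in history:
        vector[host] = max(vector.get(host, 0), counter)
    return vector


full_history = [("H1", 1), ("H1", 2), ("H2", 1)]
vector = compress(full_history)  # {"H1": 2, "H2": 1}
```

The result holds one number per host, which is the vector timestamp used on the following slides.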

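  14. Compare vector timestamp • (H1:1 H2:3 H3:2) < (H1:1 H2:5 H3:7): every entry on the left is less than or equal to the matching entry on the right • (H1:1 H2:3 H3:2) vs. (H1:2 H2:1 H3:7): not ordered, because H1 and H3 are larger on the right but H2 is larger on the left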
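The comparison can be sketched as below, using the two pairs of vectors from this slide. Vectors are assumed to be dicts from host to counter, with a missing host treated as 0; `leq` and `compare` are hypothetical helper names.

```python
def leq(a, b):
    """True if vector a is element-wise <= vector b."""
    return all(a.get(h, 0) <= b.get(h, 0) for h in set(a) | set(b))


def compare(a, b):
    """Return 'equal', 'before', 'after', or 'concurrent'."""
    a_le_b, b_le_a = leq(a, b), leq(b, a)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # a happened before b: b may replace a
    if b_le_a:
        return "after"
    return "concurrent"      # neither dominates: a conflict


v1 = {"H1": 1, "H2": 3, "H3": 2}
v2 = {"H1": 1, "H2": 5, "H3": 7}
v3 = {"H1": 2, "H2": 1, "H3": 7}

ordered = compare(v1, v2)    # "before"
unordered = compare(v1, v3)  # "concurrent"
```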

  15. Using vector timestamp • [Figure: H1 performs W(f)a and W(f)b, H2 performs W(f)c; the vector timestamps shown use the entries H1:1, H1:2 and H2:1]

  16. Using vector timestamp • [Figure, continued: H1 performs W(f)a and W(f)b, H2 performs W(f)c; the vector timestamps shown use the entries H1:1, H1:2 and H2:1]

  17. How to deal w/ conflicts? • Easy: mailboxes w/ two different sets of messages • Medium: changes to different lines of a C source file • Hard: changes to the same line of a C source file • After conflict resolution, what should the vector timestamp be?
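The slide leaves the last question open. One common answer in version-vector systems (an assumption here, not something the slides state) is to take the element-wise maximum of the two conflicting vectors and then count the resolution as a new write on the resolving host. The name `merge_after_resolution` is hypothetical.

```python
def merge_after_resolution(a, b, resolver):
    """Vector for the resolved version: element-wise max, plus one new write."""
    merged = {h: max(a.get(h, 0), b.get(h, 0)) for h in set(a) | set(b)}
    merged[resolver] = merged.get(resolver, 0) + 1  # the resolution is itself a write
    return merged


on_h1 = {"H1": 2}
on_h2 = {"H1": 1, "H2": 1}
resolved = merge_after_resolution(on_h1, on_h2, "H1")  # {"H1": 3, "H2": 1}
```

The resolved vector dominates both inputs, so every replica can later replace either conflicting version with the resolved one.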

  18. What about file deletion? • Can we forget about the vector timestamp for deleted files? • Simple solution: treat deletion as a write • Conflicts involving a deleted file are easy • Downside: • Need to remember vector timestamp for deleted files indefinitely
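A minimal sketch of "treat deletion as a write", assuming each file entry is a dict with `content`, `deleted` and `vector` fields (a hypothetical layout). The deletion leaves behind a marker that carries a bumped vector timestamp.

```python
def delete(entry, host):
    """Record a deletion as one more write by `host`; the marker keeps the vector."""
    vector = dict(entry["vector"])
    vector[host] = vector.get(host, 0) + 1
    return {"content": None, "deleted": True, "vector": vector}


f = {"content": "a", "deleted": False, "vector": {"H1": 1}}
marker = delete(f, "H2")  # vector becomes {"H1": 1, "H2": 1}
```

The marker can be compared against other versions like any write, which is why it has to be kept around for as long as some replica might still hold an older copy.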

  19. Tra [Cox, Josephson] • What are Tra’s novel properties? • Easy to compress storage of vector timestamps • No need to check every file’s version vector during sync • Allows partial sync of subtrees • No need to keep timestamp for deleted files forever

  20. Tra’s key technique • Two vector timestamps: • One represents modification time • Tracks what a host has • One represents synchronization time • Tracks what a host knows • Sync time implies no modification has happened since mod time • [Example: mod time H1:1 H2:5 H3:7, sync time H1:10 H2:20 H3:25]
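A rough sketch of how the two timestamps could drive a one-way file sync. The skip test matches the rule given later in the deck (source mtime covered by destination synctime); the copy and conflict branches, the dict layout, and the names `sync_file`, `leq` and `vmax` are assumptions made for illustration.

```python
def leq(a, b):
    """True if vector a is element-wise <= vector b (missing entries count as 0)."""
    return all(a.get(h, 0) <= b.get(h, 0) for h in set(a) | set(b))


def vmax(a, b):
    """Element-wise maximum of two vectors."""
    return {h: max(a.get(h, 0), b.get(h, 0)) for h in set(a) | set(b)}


def sync_file(src, dst):
    """Sync one file from src to dst and report what happened."""
    if leq(src["mtime"], dst["synctime"]):
        action = "skip"          # dst already knows about src's version
    elif leq(dst["mtime"], src["synctime"]):
        dst["content"] = src["content"]
        dst["mtime"] = dict(src["mtime"])
        action = "copy"          # src's version was derived from dst's
    else:
        return "conflict"        # each side has a write the other has not seen
    dst["synctime"] = vmax(dst["synctime"], src["synctime"])
    return action


on_h1 = {"content": "b", "mtime": {"H1": 2}, "synctime": {"H1": 2, "H2": 0}}
on_h2 = {"content": "a", "mtime": {"H1": 1}, "synctime": {"H1": 1, "H2": 0}}

first = sync_file(on_h1, on_h2)   # "copy"
second = sync_file(on_h1, on_h2)  # "skip"
```

After the first call the destination's sync time covers the source's mod time, so the second call can skip without looking at the content.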

  21. Using sync time • [Figure: H1 performs W(f1)a (H1:1) and W(f2)b (H1:2); the copies of f1 and f2 on H1 and H2 are annotated with vector timestamps such as H1:0, H1:1, H1:2, H1:0 H2:0, H1:1 H2:0 and H1:2 H2:0]

  22. Compress mtime and synctime • dir synctime = element-wise min of child sync times • dir mtime = element-wise max of child mod times • Sync(d1 → d1’) • Skip d1 if mtime of d1 is less than synctime of d1’ • Can we achieve this with single mtime? • Skip d1 if mtime of d1 is less than mtime of d1’
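The directory rules above can be sketched as follows, assuming vectors are dicts from host to counter with missing hosts treated as 0, and reading "less than" as element-wise less-than-or-equal. `combine`, `dir_times` and `can_skip` are hypothetical names.

```python
def leq(a, b):
    """True if vector a is element-wise <= vector b."""
    return all(a.get(h, 0) <= b.get(h, 0) for h in set(a) | set(b))


def combine(op, vectors):
    """Apply min or max element-wise across a list of vectors."""
    hosts = set().union(*vectors)
    return {h: op(v.get(h, 0) for v in vectors) for h in hosts}


def dir_times(children):
    """children: list of (mtime, synctime) pairs; returns the directory's pair."""
    mtime = combine(max, [m for m, _ in children])
    synctime = combine(min, [s for _, s in children])
    return mtime, synctime


def can_skip(src_dir_mtime, dst_dir_synctime):
    """Skip the whole subtree if dst already knows everything src has."""
    return leq(src_dir_mtime, dst_dir_synctime)


children = [
    ({"H1": 1}, {"H1": 2, "H2": 0}),  # f1: (mtime, synctime)
    ({"H1": 2}, {"H1": 2, "H2": 0}),  # f2: (mtime, synctime)
]
d_mtime, d_synctime = dir_times(children)  # {"H1": 2} and {"H1": 2, "H2": 0}
```

Because the directory's sync time is the minimum over its children, a single comparison is enough to rule out the whole subtree.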

  23. Synctime enables partial synchronization • Directory d1 contains f1 and f2; suppose a host syncs only a subtree (d1/f1) • With synctime+mtime: synctime of d1 does not change. Mtime of d1 increases • With mtime only: Mtime of d1 increases • Host later syncs subtree d1/f2 • With synctime+mtime: will pull in modifications in f2 because synctime of d1 is smaller • With mtime only: skips d1 because mtime is high enough
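A worked sketch of this argument. The write order used here (f2 written at H1:1, then f1 at H1:2) is an assumption chosen so that the mtime-only failure shows up; the helper names are hypothetical.

```python
def leq(a, b):
    """True if vector a is element-wise <= vector b (missing entries count as 0)."""
    return all(a.get(h, 0) <= b.get(h, 0) for h in set(a) | set(b))


def combine(op, vectors):
    """Apply min or max element-wise across a list of vectors."""
    hosts = set().union(*vectors)
    return {h: op(v.get(h, 0) for v in vectors) for h in hosts}


# Source directory d1 on H1 (assumed history: f2 at H1:1, then f1 at H1:2).
src_mtimes = {"f1": {"H1": 2}, "f2": {"H1": 1}}
src_dir_mtime = combine(max, list(src_mtimes.values()))

# Destination after syncing only d1/f1: f1 is current, f2 was never synced.
dst_mtimes = {"f1": {"H1": 2}, "f2": {}}
dst_synctimes = {"f1": {"H1": 2}, "f2": {}}
dst_dir_mtime = combine(max, list(dst_mtimes.values()))
dst_dir_synctime = combine(min, list(dst_synctimes.values()))

# Later sync of d1: should the destination skip the directory?
skip_with_mtime_only = leq(src_dir_mtime, dst_dir_mtime)    # True: f2 is missed
skip_with_synctime = leq(src_dir_mtime, dst_dir_synctime)   # False: descend, pull f2
```

The directory's mod time rose when f1 arrived, but its sync time stayed at the minimum over the children, so only the sync time still records that f2 has not been brought up to date.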

  24. Using sync time • [Figure: H1 performs W(f1)a (H1:1) and W(f2)b (H1:2) in directory d; H2 first syncs f1 only, then syncs f2 only; files and directories on both hosts are annotated with vector timestamps such as H1:0, H1:1, H1:2, H1:0 H2:0 and H1:2 H2:0]

  25. How to deal w/ deletion • Deletion notice for a deleted file contains its sync time • [Figure: H1 performs W(f1)a and D(f2) in directory d; files and directories on H1 and H2 are annotated with vector timestamps such as H1:0, H1:1, H1:2, H1:0 H2:0 and H1:2 H2:0]

  26. How to deal w/ deletion • Deletion notice for a deleted file contains its sync time • [Figure: H1 performs W(f1)a and D(f2) in directory d; files and directories on H1 and H2 are annotated with vector timestamps such as H1:0, H1:1, H1:2, H2:1, H1:0 H2:1 and H1:2 H2:1]

  27. Another definition of eventual consistency • Eventual consistency (Tra) • All replica contents are eventually identical • Do not care about individual writes, just overwrite old replica w/ new one • Eventual consistency (Bayou) • Writes are eventually applied in total order • Reads might not see most recent writes in total order

  28. Bayou • [Figure: nodes N0, N1 and N2, each with a write log and a version vector; no log entries are shown and every version vector is 0:0 1:0 2:0]

  29. Bayou propagation • [Figure: N0 has log 1:0 W(x), 2:0 W(y), 3:0 W(z) and version vector 0:3 1:0 2:0; N1 has log 1:1 W(x) and version vector 0:0 1:1 2:0; N2 has version vector 0:0 1:0 2:0; N0's log and version vector appear a second time in the figure]
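A simplified sketch of one propagation step, assuming log entries are (timestamp, node, op) tuples and that the tentative order is by timestamp, then node, which is consistent with the merged log on the next slide. The function names are hypothetical. The later slides also show the receiver's own version-vector entry advancing after a sync; this sketch does not model that.

```python
def writes_unknown_to(log, peer_vv):
    """Writes the peer has not seen, judged by the peer's version vector."""
    return [w for w in log if w[0] > peer_vv.get(w[1], 0)]


def receive(log, vv, incoming, sender_vv):
    """Merge incoming writes into the log and merge the version vectors."""
    merged_log = sorted(set(log) | set(incoming))  # orders by (timestamp, node, op)
    merged_vv = {n: max(vv.get(n, 0), sender_vv.get(n, 0))
                 for n in set(vv) | set(sender_vv)}
    return merged_log, merged_vv


n0_log = [(1, 0, "W(x)"), (2, 0, "W(y)"), (3, 0, "W(z)")]
n0_vv = {0: 3, 1: 0, 2: 0}
n1_log = [(1, 1, "W(x)")]
n1_vv = {0: 0, 1: 1, 2: 0}

to_send = writes_unknown_to(n0_log, n1_vv)          # all three of N0's writes
n1_log, n1_vv = receive(n1_log, n1_vv, to_send, n0_vv)
```

After the step, N1's log interleaves its own write with N0's writes in timestamp order, and its vector records that it has seen N0's writes up to timestamp 3.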

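  30. Bayou propagation • [Figure: N1 now has log 1:0 W(x), 1:1 W(x), 2:0 W(y), 3:0 W(z) and version vector 0:3 1:4 2:0; N0 still has log 1:0 W(x), 2:0 W(y), 3:0 W(z) and version vector 0:3 1:0 2:0; N2 has version vector 0:0 1:0 2:0; the write 1:1 W(x) and the vector 0:3 1:4 2:0 appear a second time in the figure]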

  31. Bayou propagation: which portion of the log is stable? • [Figure: N1 has log 1:0 W(x), 1:1 W(x), 2:0 W(y), 3:0 W(z) and version vector 0:3 1:4 2:0; N0 has the same log and version vector 0:4 1:4 2:0; N2 has version vector 0:0 1:0 2:0]

  32. Bayou propagation • [Figure: all three nodes have log 1:0 W(x), 1:1 W(x), 2:0 W(y), 3:0 W(z); version vectors: N1 0:3 1:4 2:0, N0 0:4 1:4 2:0, N2 0:3 1:4 2:5]

  33. Bayou propagation • [Figure: all three nodes have log 1:0 W(x), 1:1 W(x), 2:0 W(y), 3:0 W(z); version vectors: N1 0:3 1:6 2:5, N0 0:4 1:4 2:0 (with 0:3 1:4 2:5 shown beside it), N2 0:4 1:4 2:5]

  34. Bayou uses a primary to commit a total order • Why is it important to make log stable? • Stable writes can be committed • Stable portion of the log can be truncated • Problem: If any node is offline, the stable portion of all logs stops growing • Bayou’s solution: • A designated primary defines a total commit order • Primary assigns CSNs (commit-seq-no) • Any write with a known CSN is stable • All stable writes are ordered before tentative writes
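The last two bullets can be sketched as a sort order, assuming each write is a (csn, timestamp, node, op) tuple and that a tentative write carries an infinite CSN. The tuple layout and the names `log_order` and `commit` are assumptions for illustration.

```python
import math

TENTATIVE = math.inf  # placeholder CSN for a write the primary has not committed


def log_order(write):
    """Committed writes sort by CSN; tentative ones follow, by (timestamp, node)."""
    csn, timestamp, node, _op = write
    return (csn, timestamp, node)


def commit(log, timestamp, node, csn):
    """Return a re-sorted log in which the write (timestamp, node) has this CSN."""
    return sorted(
        ((csn if (t, n) == (timestamp, node) else c, t, n, op)
         for c, t, n, op in log),
        key=log_order,
    )


log = sorted(
    [(TENTATIVE, 1, 1, "W(x)"), (1, 1, 0, "W(x)"), (2, 2, 0, "W(y)"), (3, 3, 0, "W(z)")],
    key=log_order,
)
committed = commit(log, timestamp=1, node=1, csn=4)
stable = [w for w in committed if w[0] != TENTATIVE]
```

Before the commit, the tentative write sits after the three committed ones even though its timestamp is 1; once it receives CSN 4 its position is final and it becomes part of the stable prefix.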

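  35. Bayou propagation • [Figure: log entries now carry a leading commit sequence number, shown as ∞ when none has been assigned. N1 has log ∞:1:1 W(x) and version vector 0:0 1:1 2:0; N0 has log 1:1:0 W(x), 2:2:0 W(y), 3:3:0 W(z) and version vector 0:3 1:0 2:0; N2 has version vector 0:0 1:0 2:0; N1's log and version vector appear a second time in the figure]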

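  36. Bayou propagation • [Figure: N1 has log ∞:1:1 W(x) and version vector 0:0 1:1 2:0; N0 has log 1:1:0 W(x), 2:2:0 W(y), 3:3:0 W(z) and version vector 0:4 1:1 2:0; N2 has version vector 0:0 1:0 2:0; the entry 4:1:1 W(x) is shown, as is the sequence 1:1:0 W(x), 2:2:0 W(y), 3:3:0 W(z), 4:1:1 W(x) with vector 0:4 1:1 2:0]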

  37. Bayou’s limitations • Primary cannot fail • Server creation & retirement makes nodeID grow arbitrarily long • Anomalous behaviors for apps? • Calendar app
