
Distributed Shared Memory for Large-Scale Dynamic Systems

Vincent Gramoli, supervised by Michel Raynal

Presentation Transcript


  1. Distributed Shared Memory for Large-Scale Dynamic Systems. Vincent Gramoli, supervised by Michel Raynal

  2. My Thesis: Implementing a distributed shared memory for large-scale dynamic systems

  3. My Thesis: Implementing a distributed shared memory for large-scale dynamic systems is NECESSARY,

  4. My Thesis: Implementing a distributed shared memory for large-scale dynamic systems is NECESSARY, DIFFICULT,

  5. My Thesis: Implementing a distributed shared memory for large-scale dynamic systems is NECESSARY, DIFFICULT, DOABLE!

  6. RoadMap • Necessary? Communicating in Large-Scale Systems • An Example of Distributed Shared Memory • Difficult? Facing Dynamism is not trivial • Difficult? Facing Scalability is tricky too • Doable? Yes, here is a solution! • Conclusion

  7. RoadMap • Necessary? Communicating in Large-Scale Systems • An Example of Distributed Shared Memory • Difficult? Facing Dynamism is not trivial • Difficult? Facing Scalability is tricky too • Doable? Yes, here is a solution! • Conclusion

  8. Distributed Systems Enlarge • Internet explosion: IPv4 -> IPv6 • Multiplication of personal devices • 17 billion network devices by 2012 (IDC prediction) [Figure: the Internet]

  9. Distributed Systems are Dynamic Independent computational entities act asynchronously and are affected by unpredictable events (joining/leaving). These sporadic activities make the system dynamic.

  10. Massively Accessed Applications Web services use large amounts of information • eBay: Auctioning service • Wikipedia: Collaborative encyclopedia • LastMinute: Booking application …but require too much power supply and cost too much. [Figure labels: increase (auction), modify (article), reserve (tickets)]

  11. Massively Distributed Applications Peer-to-Peer applications share resources • BitTorrent: File Sharing • Skype: Voice over IP • Joost: Video Streaming …but prevent large-scale collaboration. [Figure labels: copy, exchange, create]

  12. Filling the Gap is Necessary Providing distributed applications where entities (nodes) can fully collaborate • P2Pedia: using P2P to build a collaborative encyclopedia • P2P eBay: using P2P as an auctioning service

  13. There are 2 Ways of Collaborating • Using a Shared Memory • A node writes information in the memory • Another node reads information from the memory • Using Message Passing • A node sends a message to another node • The second node receives the message from the first [Figure: nodes 1 to 3 sharing a memory through Write v / Read v; nodes 1 to 3 exchanging Send v / Recv v]

  14. Shared Memory is Easier to Use • Shared Memory is easy to use • If information is written, collaboration progresses! • Message Passing is difficult to use • To which node should the information be sent?

  15. Message Passing Tolerates Failures • Shared Memory is failure-prone • Communication relies on memory availability • Message Passing is fault-tolerant • As long as there is a way to route a message [Figure: nodes 1 to 3 sharing a memory through Write v / Read v; nodes 1 to 3 exchanging Send v / Recv v]

  16. The Best of the 2 Ways • Distributed Shared Memory (DSM) • emulates a Shared Memory to provide simplicity, • in the Message Passing model to tolerate failures. [Figure: the DSM takes read / write(v) operations and returns read-ack(v) / write-ack]

  17. RoadMap • Necessary? Communicating in Large-Scale Systems • An Example of Distributed Shared Memory • Difficult? Facing Dynamism is not trivial • Difficult? Facing Scalability is tricky too • Doable? Yes, here is a solution! • Conclusion

  18. Our DSM Consistency: Atomicity Atomicity (Linearizability) defines an operation ordering: • If an operation ends before another starts, then it cannot be ordered after it • Write operations are totally ordered and read operations are ordered with respect to write operations • A read returns the last value written (or the default one if none exists)

  19. Quorum-based DSM • "Sharing memory robustly in message-passing systems", H. Attiya, A. Bar-Noy, D. Dolev, JACM 1995 • Quorums: mutually intersecting sets of nodes • Ex. 3 quorums of size q=2, with memory size m=3: Q1 ∩ Q2 ≠ Ø, Q1 ∩ Q3 ≠ Ø, Q2 ∩ Q3 ≠ Ø • Each node of the quorums maintains: • A local value v of the object • A unique tag t, the version number of this value [Figure: quorums Q1, Q2, Q3]
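
The intersection property of the slide's example can be checked mechanically. The following is a minimal sketch, assuming the three quorums are all the 2-node subsets of a 3-node memory; the node identifiers 1 to 3 and the helper name `is_quorum_system` are illustrative and do not come from the presentation.

```python
from itertools import combinations


def is_quorum_system(quorums):
    """Return True if every pair of quorums shares at least one node."""
    return all(a & b for a, b in combinations(quorums, 2))


# Slide example: memory size m = 3, quorums of size q = 2.
nodes = [1, 2, 3]  # hypothetical node identifiers
quorums = [frozenset(c) for c in combinations(nodes, 2)]

print(is_quorum_system(quorums))
# A counter-example: two disjoint sets do not form a quorum system.
print(is_quorum_system([frozenset({1}), frozenset({2, 3})]))
```

With m = 3 and q = 2, any two quorums hold 4 node slots over only 3 nodes, so they must overlap; the second call shows what fails when that overlap is missing.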

  20. Quorum-based DSM • Read and write operations • A node i reads the object value vk by • Asking each node j of a quorum for vj and tj • Choosing the value vk with the largest tag tk • Replicating vk and tk to all nodes of a quorum • A node i writes a new object value vn by • Asking each node j of a quorum for tj • Choosing a tag tn larger than any tj returned • Replicating vn and tn to all nodes of a quorum [Figure: read = Get <vk,tk>, then Set <vk,tk>; write = Get <vk,tk>, tn = tk + 1, then Set <vn,tn>]
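
The two phases of each operation can be sketched as a small in-process simulation. This is an illustration under strong simplifying assumptions, not the algorithm as published: there is a single sequential client, no failures and no real messages, and tags are plain integers (a multi-writer version would also need a writer identifier to keep tags unique). The class `Replica` and the functions `read` and `write` are hypothetical names.

```python
import random


class Replica:
    """One node of the memory: a local value and its tag (version number)."""

    def __init__(self):
        self.value, self.tag = None, 0  # default value, initial tag

    def get(self):
        return self.value, self.tag

    def set(self, value, tag):
        if tag > self.tag:  # only adopt strictly newer versions
            self.value, self.tag = value, tag


def read(replicas, quorums):
    # Phase 1: ask every node of one quorum for its value and tag.
    answers = [replicas[j].get() for j in random.choice(quorums)]
    value, tag = max(answers, key=lambda vt: vt[1])
    # Phase 2: replicate the freshest pair to all nodes of a quorum.
    for j in random.choice(quorums):
        replicas[j].set(value, tag)
    return value


def write(replicas, quorums, new_value):
    # Phase 1: ask every node of one quorum for its tag.
    max_tag = max(replicas[j].get()[1] for j in random.choice(quorums))
    # Phase 2: replicate the new value with a larger tag to a quorum.
    for j in random.choice(quorums):
        replicas[j].set(new_value, max_tag + 1)


replicas = {j: Replica() for j in (1, 2, 3)}
quorums = [{1, 2}, {1, 3}, {2, 3}]
write(replicas, quorums, "v1")
print(read(replicas, quorums))
```

Whichever quorums the random choices pick, the read returns the last written value, because the quorum consulted in phase 1 always intersects the quorum updated by the previous write.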

  21. Quorum-based DSM • Reading a value [Figure: quorums Q1, Q2, Q3; value? tag? v1,t1]

  22. Quorum-based DSM • Reading a value [Figure: quorums Q1, Q2, Q3; v1,t1]

  23. Quorum-based DSM • Reading a value [Figure: quorums Q1, Q2, Q3; Output: v1]

  24. Quorum-based DSM • Writing a value v2 [Figure: quorums Q1, Q2, Q3; Input: v2]

  25. Quorum-based DSM • Writing a value v2 [Figure: quorums Q1, Q2, Q3; max tag? t1]

  26. Quorum-based DSM • Writing a value v2 [Figure: quorums Q1, Q2, Q3; v2,t2 (with t2 > t1)]

  27. Quorum-based DSM • Works well in a static system • The number of failures f must satisfy f ≤ m - q • All operations can access a quorum [Figure: quorums Q1, Q2, Q3 with Q1 ∩ Q2 ≠ Ø and Q2 ∩ Q3 ≠ Ø]
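
The bound f ≤ m - q can be illustrated on the earlier example with m = 3 and q = 2. The sketch below assumes that every set of q nodes is a quorum, so an operation can proceed as long as at least q nodes are alive; the function names are hypothetical.

```python
from itertools import combinations


def max_tolerated_failures(m, q):
    """Largest f that still leaves q live nodes, i.e. f = m - q."""
    return m - q


def some_quorum_alive(quorums, failed):
    """Return True if at least one quorum contains no failed node."""
    return any(not (quorum & failed) for quorum in quorums)


m, q = 3, 2
quorums = [frozenset(c) for c in combinations(range(1, m + 1), q)]

print(max_tolerated_failures(m, q))        # one failure is tolerated
print(some_quorum_alive(quorums, {3}))     # quorum {1, 2} is still intact
print(some_quorum_alive(quorums, {2, 3}))  # two failures hit every quorum
```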

  28. Quorum-based DSM • Does not work in dynamic systems • All quorums may fail if failures are unbounded • Problem: Q1 ∩ Q2 = Ø and Q1 ∩ Q3 = Ø and Q2 ∩ Q3 = Ø [Figure: quorums Q1, Q2, Q3]

  29. RoadMap • Necessary? Communicating in Large-Scale Systems • An Example of Distributed Shared Memory • Difficult? Facing Dynamism is not trivial • Difficult? Facing Scalability is tricky too • Doable? Yes, here is a solution! • Conclusion

  30. Reconfiguring • Dynamism produces an unbounded number of failures • Solution: Reconfiguration • Replacing the quorum configuration periodically • Problem: Q1 ∩ Q2 = Ø and Q1 ∩ Q3 = Ø and Q2 ∩ Q3 = Ø [Figure: quorums Q1, Q2, Q3]

  31. Agreeing on the Configuration • All must agree on the next configuration • Quorum-based consensus algorithm: Paxos • Before, a consensus block complemented the DSM service: • Paxos, 3-phase leader-based algorithm • Prepare a ballot (2 message delays) • Propose a configuration to install (2 message delays) • Propagate the decided configuration (1 message delay) • "RAMBO: Reconfigurable Atomic Memory Service for Dynamic Networks", N. Lynch, A. Shvartsman, DISC 2002

  32. RDS: Reconfigurable Distributed Storage • RDS integrates consensus service into the reconfigurable DSM • Fast version of Paxos: • Remove the first phase (in some cases) • Quorums also propagate configuration • Ensuring Read/Write Atomicity: • Piggyback object information into Paxos messages • Parallelizing Obsolete Configuration Removal: • Add an additional message to the propagate phase of Paxos

  33. Contributions • Operations are fast (sometimes optimal) • 1 to 2 message delays • Reconfiguration is fast (fault-tolerance) • 3 to 5 message delays • While: • Operation atomicity and • Operation independence are preserved
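
The range of 3 to 5 message delays for reconfiguration is consistent with the per-phase costs listed on the Paxos slide. The arithmetic below is an inference from those numbers, assuming the fast case is the one where the prepare phase is removed; the slides do not spell out this breakdown.

```latex
\begin{align*}
\text{all three phases:}\quad
  & \underbrace{2}_{\text{prepare}} + \underbrace{2}_{\text{propose}}
    + \underbrace{1}_{\text{propagate}} = 5 \text{ message delays} \\
\text{prepare phase removed:}\quad
  & \underbrace{2}_{\text{propose}} + \underbrace{1}_{\text{propagate}}
    = 3 \text{ message delays}
\end{align*}
```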

  34. Facing Dynamism • "Reconfigurable Distributed Storage", G. Chockler, S. Gilbert, V. Gramoli, P. Musial, A. Shvartsman, Proceedings of OPODIS 2005

  35. RoadMap • Necessary? Communicating in Large-Scale Systems • An Example of Distributed Shared Memory • Difficult? Facing Dynamism is not trivial • Difficult? Facing Scalability is tricky too • Doable? Yes, here is a solution! • Conclusion

  36. Facing Scalability is Difficult • Problems: • Large-scale participation induces load • When load is too high, requests can be lost • Bandwidth resources are limited • Goal: Tolerate load by preventing communication overhead • Solution: A DSM that adapts to load variations and that restricts communication

  37. Using Logical Overlay Object replicas r1, …, rk share a 2-dim coordinate space

  38. Benefiting from Locality Each replica ri can communicate only with its nearest neighbors

  39. Repairing the Overlay Topology • Takeover mechanism: if a node ri fails, a takeover node rj replaces it • "A Scalable Content-Addressable Network", S. Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker, SIGCOMM 2001

  40. Dynamic Bi-Quorums Bi-Quorums: • Quorums of two types where not all quorums intersect • Quorums of different types intersect • Vertical Quorum: All replicas responsible for an abscissa x • Horizontal Quorum: All replicas responsible for an ordinate y • For any horizontal quorum H and any vertical quorum V: H ∩ V ≠ Ø
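
The bi-quorum property can be illustrated on a discrete grid. This is a simplified sketch, assuming each replica is responsible for exactly one cell (x, y) of a rows × cols grid, whereas the overlay on the slides uses a continuous coordinate space; the function names and grid dimensions are illustrative.

```python
def horizontal_quorum(cols, y):
    """All replicas responsible for ordinate y (one row of the grid)."""
    return {(x, y) for x in range(cols)}


def vertical_quorum(rows, x):
    """All replicas responsible for abscissa x (one column of the grid)."""
    return {(x, y) for y in range(rows)}


rows, cols = 3, 4  # hypothetical grid dimensions

# Quorums of different types always intersect, in the single cell (x, y).
for x in range(cols):
    for y in range(rows):
        assert horizontal_quorum(cols, y) & vertical_quorum(rows, x) == {(x, y)}

# Quorums of the same type need not intersect: two distinct rows are disjoint.
print(horizontal_quorum(cols, 0) & horizontal_quorum(cols, 1))
```

A row or a column contains far fewer replicas than a majority of the grid, which is what makes quorums of this shape attractive when the number of replicas grows.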

  41. Operation Execution • Read Operation: 1) Get the up-to-date value and largest tag on a horizontal quorum, 2) Propagate this value and tag on a vertical quorum. • Write Operation: 1) Get the up-to-date value and largest tag on a horizontal quorum, 2) Propagate the value to write (and a higher tag) twice on the same vertical quorum.

  42. Load Adaptation • Thwart: requests follow the diagonal until a non-overloaded node is found. • Expansion: a node is added to the memory if no non-overloaded node is found. • Shrink: if underloaded, a node leaves the memory after having notified its neighbors.
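
The thwart rule can be sketched as a walk along the diagonal of a grid of replicas. This is only an illustration: the square grid with wrap-around, the `load` dictionary, and the `threshold` parameter are assumptions made for the example, and the slides do not specify how overload is measured.

```python
def thwart(load, start, size, threshold):
    """Follow the diagonal from `start` on a size x size grid.

    Returns the first cell whose load is below `threshold`, or None when
    every cell on the diagonal is overloaded (the expansion case).
    """
    x, y = start
    for _ in range(size):
        if load[(x, y)] < threshold:
            return (x, y)
        x, y = (x + 1) % size, (y + 1) % size  # next cell on the diagonal
    return None


size = 3
load = {(x, y): 0 for x in range(size) for y in range(size)}
load[(0, 0)] = 10  # overloaded
load[(1, 1)] = 10  # overloaded

print(thwart(load, (0, 0), size, threshold=5))
```

Moving diagonally changes both the row and the column of the request, so the horizontal and vertical quorums used to serve it are both different from the overloaded ones.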

  43. Contributions SQUARE is a DSM that: • Scales well by tolerating load variations • Defines load-optimal quorums (under reasonable assumptions) • Uses communication-efficient reconfiguration

  44. Operation Latency Bad News: The operation latency increases with the load (request rate)

  45. Facing Scalability is Difficult • "P2P Architecture for Self-* Atomic Memory", E. Anceaume, M. Gradinariu, V. Gramoli, A. Virgillito, Proceedings of ISPAN 2005 • "SQUARE: Scalable Quorum-Based Atomic Memory with Local Reconfiguration", V. Gramoli, E. Anceaume, A. Virgillito, Proceedings of ACM SAC 2007

  46. RoadMap • Necessary? Communicating in Large-Scale Systems • An Example of Distributed Shared Memory • Difficult? Facing Dynamism is not trivial • Difficult? Facing Scalability is tricky too • Doable? Yes, here is a solution! • Conclusion

  47. Probability for Modeling Reality Motivations for Probabilistic Solutions: • Tradeoffs prevent deterministic solutions from being efficient • Allowing more Realistic Models • Any node can fail independently • Even if it is unlikely that many nodes fail at the same time

  48. What is Churn? Churn is the dynamism intensity! Dynamic System: • n interconnected nodes • Nodes join/leave the system • A joining node is new • Here, we model the churn simply as c: • At each time unit, cn nodes leave the network • At each time unit, cn nodes enter the network
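
The churn model can be simulated in a few lines. The sketch below assumes that the cn leaving nodes are chosen uniformly at random at each time unit, which the slide does not state; under that assumption the expected fraction of initial nodes still present after t time units is (1 - c)^t. The function name and parameters are illustrative.

```python
import random


def simulate_churn(n, c, steps, seed=0):
    """Return (system size, fraction of the initial nodes still present)."""
    rng = random.Random(seed)
    nodes = list(range(n))  # identifiers of the initial nodes
    next_id = n
    k = round(c * n)  # number of nodes leaving and joining per time unit
    for _ in range(steps):
        leaving = set(rng.sample(nodes, k))  # assumption: uniform choice
        nodes = [v for v in nodes if v not in leaving]
        nodes.extend(range(next_id, next_id + k))  # a joining node is new
        next_id += k
    survivors = sum(1 for v in nodes if v < n)
    return len(nodes), survivors / n


size, fraction = simulate_churn(n=1000, c=0.1, steps=5)
print(size, fraction, 0.9 ** 5)
```

The system size stays at n while the set of participants drifts away from the initial one, which is why a quorum that was fixed at some point in time cannot be relied upon indefinitely.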

  49. Relaxing Consistency Every operation verifies all atomicity rules with high probability! Unsuccessful operation: an operation that violates at least one of those rules. Probabilistic Atomicity: • If an operation Op1 ends before another Op2 starts, then Op1 is ordered after Op2 with probability $\epsilon = e^{-\beta^2}$ (with β a constant); if this happens, operation Op2 is considered unsuccessful • Write operations are totally ordered and read operations are ordered w.r.t. write operations • A read returns the last successfully written value (or the default one if none exists) with probability $1 - e^{-\beta^2}$ (with β a constant); if this does not hold, the read is unsuccessful
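
To get a feel for the bound, the probability e^{-β²} can be evaluated for a few values of β. The values of β below are arbitrary illustrations; the slide only states that β is a constant.

```python
import math


def violation_probability(beta):
    """Probability epsilon = e^(-beta^2) that an atomicity rule is violated."""
    return math.exp(-beta ** 2)


for beta in (1, 2, 3):  # hypothetical values of the constant
    eps = violation_probability(beta)
    print(f"beta={beta}: epsilon={eps:.6f}, success={1 - eps:.6f}")
```

The bound shrinks quickly: it is about 0.37 for β = 1, about 0.018 for β = 2, and about 0.00012 for β = 3.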

  50. TQS: Timed Quorum System • Intersection is provided during a bounded period of time with high probability • Gossip-based algorithm in parallel • Shuffle set of neighbors using gossip-based algorithm • Traditional read/write operations using two message round-trips between the client and a quorum • Consult value and tag from a quorum • Create a new larger tag (if write) • Propagate value and tag to a quorum
