Logical Time

Presentation Transcript


  1. Logical Time M. Liu

  2. Introduction • The concept of logical time has its origin in a seminal paper by Leslie Lamport: “Time, Clocks, and the Ordering of Events in a Distributed System,” Communications of the ACM, July 1978. • The topic remains of interest: a more recent paper, “Logical Time: Capturing Causality in Distributed Systems” by Raynal and Singhal, appeared in IEEE Computer, February 1996 (see handout).

  3. Applications of Logical Time • Logical Time in Visualizations Produced by Parallel Computations. • Bank system algorithm. • Efficient solutions to the Replicated Log and Dictionary problems by Wuu & Bernstein.

  4. Background – 1 (source: Raynal and Singhal) • A distributed computation consists of a set of processes that cooperate and compete to achieve a common goal. These processes do not share a common global memory and communicate solely by passing messages over a communication network.

  5. Background – 2 (source: Raynal and Singhal) • In a distributed system, a process's actions are modeled as three types of events: internal, message send, and message receive. • An internal event affects only the process at which it occurs, and the events at a process are linearly ordered by their order of occurrence. • Send and receive events signify the flow of information between processes and establish causal dependency from the sender process to the receiver process.

  6. Background – 3 (source: Raynal and Singhal) • The execution of a distributed application results in a set of distributed events produced by the processes. • The causal precedence relation induces a partial order on the events of a distributed computation.

  7. Background – 4 (source: Raynal and Singhal) “Causality among events, more formally the causal precedence relation, is a powerful concept for reasoning, analyzing, and drawing inferences about a distributed computation. Knowledge of the causal precedence relation between processes helps programmers, designers, and the system itself solve a variety of problems in distributed computing.”

  8. Background – 5 (source: Raynal and Singhal) “The notion of time is basic to capturing the causality between events. Distributed systems have no built-in physical time and can only approximate it. However, in a distributed computation, both the progress and the interaction between processes occur in spurts. Consequently, logical clocks can be used to accurately capture the causality relation between events. This article presents a general framework of a system of logical clocks in distributed systems and discusses three methods--scalar, vector, and matrix--for implementing logical time in these systems.”

  9. Notations • A distributed program is composed of a set of n independent and asynchronous processes p1, p2, …, pi, …, pn. These processes do not share a global clock. • Each process can execute an event spontaneously; when sending a message, it does not have to wait for the delivery to be complete. • The execution of each process pi produces a sequence of events ei^0, ei^1, …, ei^x, ei^(x+1), …. The set of events produced by pi has a total order determined by the sequencing of the events: ei^x → ei^(x+1). We say that ei^x happens before ei^(x+1). The happens-before relation → is transitive: ei^x → ei^y for all x < y.

  10. Notations - 2 • Events that occur at two different processes are generally unrelated, except for those that are causally related as follows: for every message m exchanged between two processes Pi and Pj, we have ei^x = send(m), ej^y = receive(m), and ei^x → ej^y. • Events in a distributed execution are partially ordered: • Local events are totally ordered. • Causally related events are ordered. • All other events are unordered. For any two events e1 and e2 in a distributed execution, either (i) e1 → e2, (ii) e2 → e1, or (iii) e1 || e2 (that is, e1 and e2 are concurrent). A compact statement of the relation → appears below.
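
Consolidating slides 9 and 10, the happens-before (causal precedence) relation can be stated in a single definition. This is only a restatement of the rules above in LaTeX notation, not additional material:

    e \rightarrow e' \;\iff\;
      \begin{cases}
        e = e_i^x,\; e' = e_i^y,\; x < y & \text{(both occur at } p_i\text{, } e \text{ earlier)} \\
        e = send(m),\; e' = receive(m)   & \text{(same message } m\text{)} \\
        \exists\, e'' : e \rightarrow e'' \text{ and } e'' \rightarrow e' & \text{(transitivity)}
      \end{cases}
    \qquad
    e \parallel e' \;\iff\; \neg(e \rightarrow e') \wedge \neg(e' \rightarrow e)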

  11. Which of these events are → related? Which ones are concurrent?

  12. Clock conditions • In a system of logical clocks, every participating process has a logical clock that is advanced according to a protocol. • Every event is assigned a timestamp in such a manner that satisfies the clock consistency condition: if e1 → e2 then C(e1) < C(e2), where C(ei) is the timestamp assigned to event ei. • If the protocol satisfies the following condition as well, then the clock is said to be strongly consistent: if C(e1) < C(e2) then e1 → e2.

  13. A logical clock implementation - the Lamport Clock R1: Before executing an event (send, receive, or internal), pi executes the following: Ci = Ci + d (d > 0, usually d = 1) R2: Each message carries the clock value of its sender at sending time. When pi receives a message with the timestamp Cmsg, it executes the following: • Ci = max(Ci, Cmsg) • Execute R1. • Deliver the message. The logical clock at any process is monotonically increasing. (A sketch of these rules in code follows.)
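
A minimal Python sketch of rules R1 and R2; the class and method names (LamportClock, tick, send, receive) are illustrative choices, not from the slides:

    # Lamport (scalar) logical clock: one integer per process.
    class LamportClock:
        def __init__(self, d=1):
            self.c = 0              # Ci, the local clock value
            self.d = d              # increment d > 0, usually 1

        def tick(self):
            # R1: advance the clock before an internal, send, or receive event.
            self.c += self.d
            return self.c

        def send(self):
            # The outgoing message piggybacks the sender's clock value.
            return self.tick()

        def receive(self, c_msg):
            # R2: take the max of the local clock and the message timestamp,
            # then execute R1 before delivering the message.
            self.c = max(self.c, c_msg)
            return self.tick()

    # Example: p2's clock jumps past the timestamp carried by p1's message.
    p1, p2 = LamportClock(), LamportClock()
    t = p1.send()        # p1's send event, timestamp 1
    p2.receive(t)        # p2's clock becomes 2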

  14. Fill in the logical clock values:

  15. Correctness of the Lamport Clock Does the Lamport clock satisfy the clock consistency condition? Does the Lamport clock satisfy the strong clock consistency condition?

  16. Logical Clock Protocols • The Lamport Clock is an example of a logical clock protocol. There are others. • The Lamport Clock is a scalar clock – it uses a single integer to represent the clock value.

  17. Lamport clock paper PODC Influential Paper Award: 2000, http://www.podc.org/influential/2000.html “Time, clocks, and the ordering of events in a distributed system” by Leslie Lamport, obtainable from the ACM Digital Library.

  18. An application of scalar logical time – bank system algorithm See bank system algorithm slides

  19. Vector Logical Clock • Developed by several people independently. • Each Pi of the n participating processes maintains an integer vector (array) of size n: • vti[1,…,n], where vti[i] is the local logical clock of Pi, and • vti[j] represents Pi’s latest knowledge of Pj’s local time.

  20. Vector clock protocol At process Pi: • Before executing an event, Pi updates its local logical time as follows: vti[i] = vti[i] + d (d > 0) • Each sender piggybacks its vector clock value, at sending time, on the message m. Upon receiving such a message (m, vt), Pi updates its vector clock as follows: • For 1 <= k <= n: vti[k] = max(vti[k], vt[k]) • vti[i] = vti[i] + d (d > 0) (See the sketch below.)
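
A minimal Python sketch of this protocol; the class name VectorClock and the 0-based indexing are illustrative (the slides use indices 1..n):

    # Vector clock for process Pi.
    class VectorClock:
        def __init__(self, i, n, d=1):
            self.i = i                  # this process's index
            self.vt = [0] * n           # vti[1..n] on the slides
            self.d = d

        def tick(self):
            # Before executing an event: vti[i] = vti[i] + d.
            self.vt[self.i] += self.d

        def send(self):
            # Piggyback a copy of the vector clock on the outgoing message.
            self.tick()
            return list(self.vt)

        def receive(self, vt_msg):
            # Component-wise max with the received vector, then tick locally.
            self.vt = [max(a, b) for a, b in zip(self.vt, vt_msg)]
            self.tick()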

  21. Vector clock The system of vector clocks is strongly consistent • Every event is assigned a timestamp in such a manner that satisfies the clock consistency condition: if e1 → e2 then vt(e1) < vt(e2), using vector comparison, where vt(ei) is the timestamp assigned to event ei • If the protocol satisfies the following condition as well, then the clock is said to be strongly consistent: if vt(e1) < vt(e2) then e1 → e2, using vector comparison

  22. Vector comparison Given two vectors V1 and V2, both of size n: V1 < V2 if V1[i] <= V2[i] for i = 1, …, n, and there exists some k, 1 <= k <= n, such that V1[k] < V2[k] • Example: V1 = {1, 2, 3, 4}; V2 = {2, 3, 4, 5} V1 < V2 • Example: V1 = {1, 2, 3, 4}; V2 = {2, 2, 4, 4} • Example: V1 = {1, 2, 3, 4}; V2 = {2, 3, 4, 1} (These examples are worked through in the sketch below.)
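
A small helper illustrating the comparison rule; the function names vec_less and concurrent are illustrative, and the vectors are the ones from the slide:

    def vec_less(v1, v2):
        # V1 < V2: every component of V1 is <= the matching component of V2,
        # and at least one component is strictly smaller.
        return all(a <= b for a, b in zip(v1, v2)) and \
               any(a < b for a, b in zip(v1, v2))

    def concurrent(v1, v2):
        # Neither vector dominates the other.
        return not vec_less(v1, v2) and not vec_less(v2, v1)

    print(vec_less([1, 2, 3, 4], [2, 3, 4, 5]))    # True:  V1 < V2
    print(vec_less([1, 2, 3, 4], [2, 2, 4, 4]))    # True:  V1 < V2
    print(concurrent([1, 2, 3, 4], [2, 3, 4, 1]))  # True:  V1 || V2 (concurrent)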

  23. Vector clock • Because vector clocks are strongly consistent, we can use them to determine whether two events are causally related by comparing their vector timestamps, using vector comparison.

  24. Matrix Time • Proposed by Michael and Fischer in 1982. • A process Pi maintains a matrix mti[1…n, 1…n] where • mti[i, i] denotes the logical clock of Pi • mti[i, j] denotes the latest knowledge that Pi has about the local clock mtj[j, j] of Pj (row i is the vector clock of Pi). • mti[j, k] represents what Pi knows about the latest knowledge that Pj has about the local logical clock mtk[k, k] of Pk.

  25. Matrix Time Protocol At process Pi: • Before executing an event, Pi updates its local logical time as follows: mti[i, i] = mti[i, i] + d (d > 0) • Each sender piggybacks its matrix clock value, at sending time, on the message m. Upon receiving such a message (m, mt) from Pj, Pi updates its matrix clock as follows: 1. for 1 <= k <= n: mti[i, k] = max(mti[i, k], mt[j, k]) 2. for 1 <= k <= n, for 1 <= q <= n: mti[k, q] = max(mti[k, q], mt[k, q]) 3. mti[i, i] = mti[i, i] + d (d > 0) (See the sketch below.)
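
A minimal Python sketch of the matrix clock protocol; the class name MatrixClock and the 0-based indexing are illustrative:

    # Matrix clock for process Pi.
    class MatrixClock:
        def __init__(self, i, n, d=1):
            self.i = i
            self.n = n
            self.d = d
            self.mt = [[0] * n for _ in range(n)]   # mti[1..n, 1..n] on the slides

        def tick(self):
            # Before executing an event: mti[i, i] = mti[i, i] + d.
            self.mt[self.i][self.i] += self.d

        def send(self):
            # Piggyback a copy of the whole matrix on the outgoing message.
            self.tick()
            return [row[:] for row in self.mt]

        def receive(self, j, mt_msg):
            # 1. Merge the sender Pj's row into row i (Pi's own vector clock).
            for k in range(self.n):
                self.mt[self.i][k] = max(self.mt[self.i][k], mt_msg[j][k])
            # 2. Element-wise max of the two matrices.
            for k in range(self.n):
                for q in range(self.n):
                    self.mt[k][q] = max(self.mt[k][q], mt_msg[k][q])
            # 3. Tick the local clock.
            self.tick()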

  26. Matrix clock consistency The system of matrix clocks is strongly consistent • Every event is assigned a timestamp in such a manner that satisfies the clock consistency condition: if e1 → e2 then mt(e1) < mt(e2), using matrix comparison, where mt(ei) is the timestamp assigned to event ei • If the protocol satisfies the following condition as well, then the clock is said to be strongly consistent: if mt(e1) < mt(e2) then e1 → e2, using matrix comparison

  27. Matrix comparison • Given two matrices M1 and M2, both of size n by n: M1 < M2 if M1[i, j] <= M2[i, j] for i = 1, …, n and j = 1, …, n, and there exist some k and p, 1 <= k <= n, 1 <= p <= n, such that M1[k, p] < M2[k, p] • Because matrix clocks are strongly consistent, we can use them to determine whether two events are causally related by comparing their matrix timestamps

  28. An application of matrix time: Wuu and Bernstein paper • The dictionary problem: a dictionary is replicated among multiple nodes. Each node maintains its own view of the dictionary by performing operations on it independently. • The network may be unreliable. • The dictionary data must be consistent among the nodes. • Serializability (using locking) is the database approach to such a problem. • The paper (as did other papers preceding it) describes an algorithm which does not require serializability.

  29. Wuu and Bernstein protocol • A replicated log is used to achieve mutual consistency of replicated data in an unreliable network. • The log contains records of invocations of operations which access a data object. • Each node updates its local copy of the data object by performing the operations contained in its local copy of the log. • The operations are commutative so that the order in which operations are performed does not affect the final state of the data.

  30. The problem environment • n nodes N1, N2, …, Nn are connected over a network. • Each node maintains a dictionary V – a set of words {s1, s2, …}, kept in stable storage impervious to crashes. • Vi denotes the local view of the dictionary at Ni. • Two types of operations may be issued by any node to perform on the dictionary: • insert(x) • delete(x) delete(x) can be invoked at Ni only if x is in Vi; note that this operation may be issued by multiple nodes. insert(x) is issued by only one node.

  31. The problem environment - 2 • The unique event which inserts x is denoted ex. • An event which deletes x is called an x-delete event. • If V(e) is the dictionary view at a node after the occurrence of event e, then x is in V(e) iff ex → e and there does not exist an x-delete event g such that g → e.

  32. The log • Each node maintains a log of events L, and a distributed algorithm is employed to keep the dictionary views up to date. • An event is recorded in the log as a record/object containing these fields: operation, time, nodeID. For example: (add a, 3, 2) if Node 2 issued “add a” at its local time 3. • The event record describing event e is denoted eR; eR.node is the node that issued the event, eR.op is the operation, and eR.time is the local time at which the operation was issued.

  33. The log • Nodes exchange messages containing appropriate portions of their individually maintained logs in order to achieve data consistency. • L(e) denotes the contents of the log at a node immediately after event e completes. • The log problem: (P1) f → e iff fR is in L(e)

  34. A trivial solution • Each node i that generates an event e adds a record for the event, eR, to its local log Li. • Each time the node sends a message, it includes its log Li in the message. • Upon receiving a message, a node j looks at the log enclosed in the message and applies the event in each record to its dictionary view Vj. • The logs are maintained indefinitely. If a node j is cut off from the network due to failures, its dictionary view may fall behind those of other nodes, but as soon as the network is repaired and messages can be sent to node j again, the events logged by other nodes will eventually be made known to j.

  35. Trivial solution • The trivial solution • is fault-tolerant. • satisfies the log problem and the dictionary problem. • However, the log Li maintained by each node i grows without bound, which has these ramifications: • The entire log is sent with each message – excessive communication costs. • A new view of the dictionary is repeatedly computed based on the log received in each message – excessive computational costs. • The entire log is stored at each node – excessive storage costs.

  36. Wuu and Bernstein's improved solutions • Uses matrix time to purge event records that have already been seen by all participants. • Each node i maintains a matrix clock Ti. • When i receives a log which contains a record eR for an event e initiated by node eR.node, it determines whether node k has already seen this record with this “predicate” (boolean function): boolean hasrec(Ti, eR, k) { return Ti[k, eR.node] >= eR.time; }

  37. Wuu and Bernstein's improved solutions pp.236-7 • Kept at each node i are: • Vi – the dictionary view, e.g., {a, b, c} • PLi – a partial log of events • Ti – the matrix clock Initialization: Vi = {}; PLi = {} // set both empty, set the matrix clock to all 0s

  38. Wuu and Bernstein's improved solutions pp.236-7 • When node i issues insert(x): • Update the matrix clock • Add the event record to the partial log PLi • Add x to Vi • When node i issues delete(x): • Update the matrix clock • Add the event record to the partial log PLi • Delete x from Vi

  39. Wuu and Bernstein's improved solutions pp.236-7 • When node i sends to node k: • Create a subset NP of the partial log PLi, consisting of those entries for which hasrec(Ti, eR, k) returns false. • Send NP and Ti to node k.

  40. Wuu and Bernstein's improved solutions pp.236-7 • When node i receives from node k: • Extract from the received log a subset NE, consisting of those entries for which hasrec(Ti, eR, i) returns false. These entries have not already been seen by i. • Update the dictionary view Vi based on NE. • Update the matrix clock Ti. • Add to the partial log PLi (note: from the received log, not just NE) those records for which hasrec(Ti, eR, j) returns false for at least one node j. Such a record has not yet been seen by at least one other node. (A sketch of the send and receive steps follows.)
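
Below is a hedged Python sketch tying slides 36-40 together. The class Node, its field names, and the dict-based event records are illustrative choices; message transport, crash recovery, and stable storage are omitted. The purge of the partial log implements the idea stated on slide 36 (drop records that everyone has already seen).

    # Replicated dictionary with a matrix clock and a partial log (slides 36-40).
    class Node:
        def __init__(self, i, n):
            self.i = i                               # this node's id
            self.n = n                               # number of nodes
            self.V = set()                           # Vi: dictionary view
            self.PL = []                             # PLi: partial log of event records
            self.T = [[0] * n for _ in range(n)]     # Ti: matrix clock

        def hasrec(self, rec, k):
            # True if node k is already known to have seen record rec.
            return self.T[k][rec["node"]] >= rec["time"]

        def _log_event(self, op, word):
            # Update the matrix clock, then append the event record (op, time, node).
            self.T[self.i][self.i] += 1
            self.PL.append({"op": op, "word": word,
                            "time": self.T[self.i][self.i], "node": self.i})

        def insert(self, word):
            self._log_event("insert", word)
            self.V.add(word)

        def delete(self, word):
            # Only legal if word is in the local view (slide 30).
            self._log_event("delete", word)
            self.V.discard(word)

        def make_message(self, k):
            # NP: records node k may not have seen, sent together with Ti.
            NP = [rec for rec in self.PL if not self.hasrec(rec, k)]
            return NP, [row[:] for row in self.T]

        def receive(self, k, NP, Tk):
            # NE: records this node has not yet seen.
            NE = [rec for rec in NP if not self.hasrec(rec, self.i)]
            # Update the dictionary view from NE (insert(x) is unique, and any
            # delete(x) causally follows it, so this set arithmetic is safe).
            inserts = {r["word"] for r in NE if r["op"] == "insert"}
            deletes = {r["word"] for r in NE if r["op"] == "delete"}
            self.V = (self.V | inserts) - deletes
            # Update the matrix clock: merge the sender's row into our row,
            # then take the element-wise max of the two matrices.
            for j in range(self.n):
                self.T[self.i][j] = max(self.T[self.i][j], Tk[k][j])
            for p in range(self.n):
                for q in range(self.n):
                    self.T[p][q] = max(self.T[p][q], Tk[p][q])
            # Keep only records that at least one node may not have seen yet
            # (the purge of slide 36); draw from both the old PLi and NP.
            seen = {}
            for r in self.PL + NP:
                seen[(r["node"], r["time"])] = r
            self.PL = [r for r in seen.values()
                       if any(not self.hasrec(r, j) for j in range(self.n))]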

  41. Wuu and Bernstein's improved solutions pp.236-7 • The size of the log sent with each message is minimized, based on the matrix clock. • The number of log entries based on which the local dictionary view is updated is minimized, again based on the matrix clock. • The algorithm ensures that each log record is retained by at least one node until all nodes have seen it, so that the knowledge will eventually be propagated to a recovered node.
