
Presentation Transcript


  1. Govindrao Wanjari College of Engineering & Technology, Nagpur • Department of CSE • “LAMPORT CLOCK” • Branch/Sem: CSE/8th sem • Session: 2017-18 • Subject: DOS • Subject Teacher: Prof. P. Y. Jane

  2. Today’s outline • Global Time • Time in distributed systems • A baseball example • Synchronizing real clocks • Cristian’s algorithm • The Berkeley Algorithm • Network Time Protocol (NTP) • Logical time • Lamport logical clocks • Vector Clocks

  3. Why Global Timing? • Suppose there were a globally consistent time standard • Would be handy • Who got last seat on airplane? • Who submitted final auction bid before deadline? • Did defense move before snap?

  4. Time Standards • UT1 • Based on astronomical observations • “Greenwich Mean Time” • TAI • Started Jan 1, 1958 • Each second is 9,192,631,770 cycles of radiation emitted by a cesium-133 atom • Has diverged from UT1 due to slowing of the earth’s rotation • UTC • TAI + leap seconds, to stay within 800ms of UT1 • Currently 34 leap seconds

  5. Comparing Time Standards • (Figure: plot of the UT1 − UTC offset over time.)

  6. Distributed time • Premise • The notion of time is well-defined (and measurable) at each single location • But the relationship between time at different locations is unclear • Can minimize discrepancies, but never eliminate them • Reality • Stationary GPS receivers can get global time with < 1µs error • Few systems designed to use this

  7. A baseball example • Four locations: pitcher’s mound, first base, home plate, and third base • Ten events: e1: pitcher throws ball to home e2: ball arrives at home e3: batter hits ball to pitcher e4: batter runs to first base e5: runner runs to home e6: ball arrives at pitcher e7: pitcher throws ball to first base e8: runner arrives at home e9: ball arrives at first base e10: batter arrives at first base

  8. A baseball example • Pitcher knows e1 happens before e6, which happens before e7 • Home plate umpire knows e2 is before e3, which is before e4, which is before e8, … • Relationship between e8 and e9 is unclear

  9. Ways to synchronize • Send a message from first base to home? • Or to a central timekeeper • How long does this message take to arrive? • Synchronize clocks before the game? • Clocks drift • one part per million => 1 second in ~11 days • Synchronize continuously during the game? • GPS, pulsars, etc.

  10. Perfect networks • Messages always arrive, with propagation delay exactly d • Sender sends time T in a message • Receiver sets clock to T+d • Synchronization is exact

  11. Synchronous networks • Messages always arrive, with propagation delay at most D • Sender sends time T in a message • Receiver sets clock to T + D/2 • Synchronization error is at most D/2

  12. Synchronization in the real world • Real networks are asynchronous • Propagation delays are arbitrary • Real networks are unreliable • Messages don’t always arrive

  13. Cristian’s algorithm • Request the time, get a reply • Measure the actual round-trip time d • The server read its clock as T at some point between send time t1 and receive time t2 • Receiver sets its time to T + d/2 • Synchronization error is at most d/2 • Can retry until we get a relatively small d
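A minimal sketch of one Cristian exchange, assuming a hypothetical request_server_time() helper that queries the time server and returns its timestamp:

    import time

    def cristian_sync(request_server_time):
        """Estimate the server's time from one request/reply exchange."""
        t1 = time.monotonic()        # local time when the request is sent
        T = request_server_time()    # server's clock, read somewhere inside the round trip
        t2 = time.monotonic()        # local time when the reply arrives
        d = t2 - t1                  # measured round-trip time
        return T + d / 2, d / 2      # estimate, and an error bound of at most d/2

    # Retry a few times and keep the answer with the smallest round trip:
    # estimate, bound = min((cristian_sync(rpc) for _ in range(5)), key=lambda r: r[1])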

  14. The Berkeley algorithm • Master uses Cristian’s algorithm to get time from many clients • Computes average time • Can discard outliers • Sends time adjustments back to all clients
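A sketch of one master round, assuming the clients’ readings were already collected (e.g. with Cristian’s algorithm); the node names and the outlier cutoff are illustrative:

    def berkeley_round(master_time, client_times, max_skew=1.0):
        """Average the readings, drop outliers, return per-node adjustments."""
        readings = dict(client_times)        # {node_id: clock reading}
        readings["master"] = master_time

        # Discard outliers: readings too far from the master's own clock.
        usable = [t for t in readings.values() if abs(t - master_time) <= max_skew]
        avg = sum(usable) / len(usable)

        # Send back signed adjustments rather than absolute times, so the
        # delay of the return message matters less.
        return {node: avg - t for node, t in readings.items()}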

  15. The Network Time Protocol (NTP) • Uses a hierarchy of time servers • Class 1 servers have highly-accurate clocks • connected directly to atomic clocks, etc. • Class 2 servers get time from only Class 1 and Class 2 servers • Class 3 servers get time from any server • Synchronization similar to Cristian’s alg. • Modified to use multiple one-way messages instead of immediate round-trip • Accuracy: Local ~1ms, Global ~10ms

  16. Real synchronization is imperfect • Clocks never exactly synchronized • Often inadequate for distributed systems • might need totally-ordered events • might need millionth-of-a-second precision

  17. Logical time • Capture just the “happens before” relationship between events • Discard the infinitesimal granularity of time • Corresponds roughly to causality • Time at each process is well-defined • Definition (→i): We say e →i e’ if e happens before e’ at process i

  18. Global logical time • Definition (→): We define e → e’ using the following rules: • Local ordering: e→ e’ if e→ie’ for any process i • Messages: send(m) → receive(m) for any message m • Transitivity: e → e’’ if e→ e’ and e’→ e’’ • We say e “happens before” e’ if e →e’

  19. Concurrency • → is only a partial-order • Some events are unrelated • Definition (concurrency): We say e is concurrent with e’ (written e║e’) if neither e→e’ nor e’→e

  20. The baseball example revisited • e1→ e2 • by the message rule • e1 → e10, because • e1 → e2, by the message rule • e2 → e4, by local ordering at home plate • e4 → e10, by the message rule • Repeated transitivity of the above relations • e8║e9, because • No application of the → rules yields either e8 → e9 or e9 → e8

  21. Lamport logical clocks • Lamport clock L orders events consistent with logical “happens before” ordering • If e → e’, then L(e) < L(e’) • But not the converse • L(e) < L(e’) does not imply e → e’ • Similar rules for concurrency • L(e) = L(e’) implies e║e’ (for distinct e,e’) • e║e’ does not imply L(e) = L(e’) • i.e., Lamport clocks arbitrarily order some concurrent events

  22. Lamport’s algorithm • Each process i keeps a local clock, Li • Three rules: • At process i, increment Li before each event • To send a message m at process i, apply rule 1 and then include the current local time in the message: i.e., send(m,Li) • To receive a message (m,t) at process j, set Lj = max(Lj,t) and then apply rule 1 before time-stamping the receive event • The global time L(e) of an event e is just its local time • For an event e at process i, L(e) = Li(e)
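The three rules above as a small Python sketch, one LamportClock instance per process:

    class LamportClock:
        def __init__(self):
            self.time = 0

        def local_event(self):
            self.time += 1                     # rule 1: increment before each event
            return self.time

        def send(self):
            return self.local_event()          # rule 2: increment, then stamp the message

        def receive(self, t_msg):
            self.time = max(self.time, t_msg)  # rule 3: take the max of local and message time
            return self.local_event()          # ...then count the receive event itself

    # e.g. pitcher throws (e1) and home plate receives (e2):
    # pitcher, home = LamportClock(), LamportClock()
    # t = pitcher.send()     # L(e1) = 1
    # home.receive(t)        # L(e2) = 2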

  23. Lamport on the baseball example • Initializing each local clock to 0, we get L(e1) = 1 (pitcher throws ball to home) L(e2) = 2 (ball arrives at home) L(e3) = 3 (batter hits ball to pitcher) L(e4) = 4 (batter runs to first base) L(e5) = 1 (runner runs to home) L(e6) = 4 (ball arrives at pitcher) L(e7) = 5 (pitcher throws ball to first base) L(e8) = 5 (runner arrives at home) L(e9) = 6 (ball arrives at first base) L(e10) = 7 (batter arrives at first base) • For our example, Lamport’s algorithm says that the run scores!

  24. Total-order Lamport clocks • Many systems require a total-ordering of events, not a partial-ordering • Use Lamport’s algorithm, but break ties using the process ID • L(e) = M * Li(e) + i • M = maximum number of processes
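A one-line sketch of the tie-breaking rule; comparing the packed integers is equivalent to comparing (local time, process ID) pairs:

    def total_order_stamp(local_time, pid, max_procs):
        # Equal Lamport times are broken by process ID, giving a total order.
        return max_procs * local_time + pid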

  25. Vector Clocks • Goal • Want ordering that matches causality • V(e) < V(e’) if and only if e → e’ • Method • Label each event by vector V(e) [c1, c2 …, cn] • ci = # events in process i that causally precede e

  26. Vector Clock Algorithm • Initially, all vectors are [0,0,…,0] • For an event on process i, increment own ci • Label each message sent with the local vector • When process j receives a message with vector [d1, d2, …, dn]: • Set each local entry ck to max(ck, dk) • Increment the value of cj
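A sketch of the algorithm for process i out of n processes, plus the causality test V(e) < V(e'):

    class VectorClock:
        def __init__(self, i, n):
            self.i = i
            self.v = [0] * n

        def local_event(self):
            self.v[self.i] += 1                # increment own entry c_i
            return list(self.v)

        def send(self):
            return self.local_event()          # label the message with the local vector

        def receive(self, d):
            self.v = [max(c, dk) for c, dk in zip(self.v, d)]  # entry-wise max
            self.v[self.i] += 1                # then count the receive event
            return list(self.v)

    def happened_before(a, b):
        # V(a) < V(b): every entry <= and the vectors are not equal.
        return all(x <= y for x, y in zip(a, b)) and a != b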

  27. Vector clocks on the baseball example • Each event is labeled with a vector [p, f, h, t], one entry per location. (Figure: the labeled event diagram.)

  28. Important Points • Physical Clocks • Can keep closely synchronized, but never perfect • Logical Clocks • Encode causality relationship • Lamport clocks provide only one-way encoding • Vector clocks provide exact causality information

  29. Govindrao Wanjari College of Engineering & Technology, Nagpur • Department of CSE • “DEADLOCK DETECTION” • Branch/Sem: CSE/8th sem • Session: 2017-18 • Subject: DOS • Subject Teacher: Prof. P. Y. Jane

  30. Deadlocks • Resource Deadlocks • A process needs multiple resources for an activity. • Deadlock occurs if each process in a set requests resources held by another process in the same set, and it must receive all the requested resources to move further. • Communication Deadlocks • Processes wait to communicate with other processes in a set. • Each process in the set is waiting on another process’s message, and no process in the set initiates a message until it receives a message for which it is waiting.

  31. Graph Models • Nodes of the graph are processes; edges represent pending requests for or assignments of resources. • Wait-for Graphs (WFG): P1 -> P2 implies P1 is waiting for a resource from P2. • Transaction-wait-for Graphs (TWF): WFG in databases. • Deadlock: a directed cycle in the graph. • Cycle example: a cycle between P1 and P2 (P1 -> P2 -> P1)
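Deadlock detection in this model is cycle detection; a depth-first-search sketch over a wait-for graph given as a dictionary (names illustrative):

    def has_deadlock(wfg):
        """wfg: {process: set of processes it waits on}, e.g. {"P1": {"P2"}, "P2": {"P1"}}."""
        WHITE, GREY, BLACK = 0, 1, 2
        color = {}

        def dfs(p):
            color[p] = GREY
            for q in wfg.get(p, ()):
                if color.get(q, WHITE) == GREY:            # back edge: a directed cycle
                    return True
                if color.get(q, WHITE) == WHITE and dfs(q):
                    return True
            color[p] = BLACK
            return False

        return any(color.get(p, WHITE) == WHITE and dfs(p) for p in wfg)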

  32. Graph Models • Wait-for Graphs (WFG): P1 -> P2 implies P1 is waiting for a resource from P2. • (Figure: resource-allocation graph with resources R1, R2 and processes P1, P2.)

  33. AND, OR Models • AND Model • A process/transaction can simultaneously request multiple resources. • Remains blocked until it is granted all of the requested resources. • OR Model • A process/transaction can simultaneously request multiple resources. • Remains blocked until any one of the requested resources is granted.

  34. Sufficient Condition • Deadlock?? (Figure: wait-for graph over processes P1–P6.)

  35. AND, OR Models • AND Model • Presence of a cycle. • (Figure: wait-for graph containing a cycle among processes P1, P2, ….)

  36. AND, OR Models • OR Model • Presence of a knot. • Knot: subset of a graph such that, starting from any node in the subset, it is impossible to leave the knot by following the edges of the graph. • (Figure: wait-for graph over processes P1–P6 containing a knot.)
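A direct (and deliberately inefficient) check of the knot definition above: a node lies in a knot when every node it can reach has exactly the same reachable set, so no path ever leaves that set. The graph encoding is illustrative:

    def reachable(graph, start):
        """Nodes reachable from start via one or more edges."""
        seen, stack = set(), [start]
        while stack:
            for m in graph.get(stack.pop(), ()):
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        return seen

    def in_knot(graph, v):
        r = reachable(graph, v)
        # v must be able to reach itself, and every reachable node must reach
        # exactly the same set: following edges can then never leave r.
        return v in r and all(reachable(graph, u) == r for u in r)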

  37. Deadlock Handling Strategies • Deadlock Prevention: difficult • Deadlock Avoidance: before allocation, check for possible deadlocks. • Difficult as it needs global state info in each site (that handles resources). • Deadlock Detection: Find cycles. Focus of discussion. • Deadlock detection algorithms must satisfy 2 conditions: • No undetected deadlocks. • No false deadlocks.

  38. Distributed Deadlocks • Centralized Control • A control site constructs wait-for graphs (WFGs) and checks for directed cycles. • The WFG can be maintained continuously (or) built on demand by requesting WFGs from individual sites. • Distributed Control • The WFG is spread over different sites. Any site can initiate the deadlock detection process. • Hierarchical Control • Sites are arranged in a hierarchy. • A site checks for cycles only in its descendants.

  39. Centralized Algorithms • Ho-Ramamoorthy 2-phase Algorithm • Each site maintains a status table of all processes initiated at that site: includes all resources locked & all resources being waited on. • Controller requests (periodically) the status table from each site. • Controller then constructs a WFG from these tables and searches for cycle(s). • If no cycles, no deadlocks. • Otherwise (a cycle exists): request the status tables again. • Construct a WFG based only on the transactions common to the 2 tables. • If the same cycle is detected again, the system is in deadlock. • Later proved: cycles in 2 consecutive reports need not imply a deadlock. Hence, this algorithm can report false deadlocks.
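A sketch of the controller’s two-phase check; collect_wfg_edges() (polls every site’s status table and returns the set of wait-for edges) and find_cycles() (returns cycles as hashable objects, e.g. frozensets of edges) are hypothetical helpers:

    def two_phase_check(collect_wfg_edges, find_cycles):
        first = collect_wfg_edges()      # phase 1: build a WFG from the status tables
        cycles_1 = find_cycles(first)
        if not cycles_1:
            return set()                 # no cycle, no deadlock reported

        second = collect_wfg_edges()     # phase 2: request the tables again
        common = first & second          # keep only edges present in both snapshots
        cycles_2 = find_cycles(common)

        # A cycle detected in both phases is reported as a deadlock
        # (as the slide notes, this can still be a false deadlock).
        return cycles_1 & cycles_2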

  40. Centralized Algorithms... • Ho-Ramamoorthy 1-phase Algorithm • Each site maintains 2 status tables: resource status table and process status table. • Resource table: transactions that have locked or are waiting for resources. • Process table: resources locked by or waited on by transactions. • Controller periodically collects these tables from each site. • Constructs a WFG from transactions common to both the tables. • No cycle, no deadlocks. • A cycle means a deadlock.

  41. Distributed Algorithms • Path-pushing: resource dependency information is disseminated through designated paths (in the graph). • Edge-chasing: special messages or probes are circulated along the edges of the WFG. A deadlock exists if a probe is received back by its initiator. • Diffusion computation: status queries are sent to processes in the WFG. • Global state detection: take a snapshot of the distributed system. Not discussed further in class.

  42. Edge-Chasing Algorithm • Chandy-Misra-Haas’s Algorithm: • A probe(i, j, k) is used by a deadlock detection process Pi. This probe is sent by the home site of Pj to Pk. • The probe message is circulated via the edges of the graph. A probe returning to Pi implies deadlock detection. • Terms used: • Pj is dependent on Pk if a wait-for path Pj, Pi1, …, Pim, Pk exists. • Pj is locally dependent on Pk if the above holds and Pj, Pk are on the same site. • Each process Pi maintains a boolean array dependent_i: dependent_i(j) is true if Pi knows that Pj is dependent on it (initially false for all i and j).

  43. Chandy-Misra-Haas’s Algorithm
  Sending the probe:
    if Pi is locally dependent on itself then declare a deadlock
    else for all Pj and Pk such that
      (a) Pi is locally dependent upon Pj, and
      (b) Pj is waiting on Pk, and
      (c) Pj and Pk are on different sites,
    send probe(i,j,k) to the home site of Pk.
  Receiving the probe:
    if (d) Pk is blocked, and
       (e) dependent_k(i) is false, and
       (f) Pk has not replied to all requests of Pj,
    then begin
      dependent_k(i) := true;
      if k = i then Pi is deadlocked
      else ...

  44. Chandy-Misra-Haas’s Algorithm
  Receiving the probe (continued):
    ... else for all Pm and Pn such that
      (a’) Pk is locally dependent upon Pm, and
      (b’) Pm is waiting on Pn, and
      (c’) Pm and Pn are on different sites,
    send probe(i,m,n) to the home site of Pn.
    end.
  Performance: For a deadlock that spans m processes over n sites, m(n-1)/2 messages are needed. Message size: 3 words. Delay in deadlock detection: O(n).
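A sketch of the probe-receiving rule at process P_k. The transport send() and the cross_edges list (every triple (m, n, n_site) where P_k is locally dependent on P_m, P_m waits on P_n, and P_m, P_n are on different sites) are assumed to be provided; rule (f) is omitted for brevity:

    class CMHProcess:
        def __init__(self, k):
            self.k = k
            self.blocked = False
            self.dependent = set()            # ids i with dependent_k(i) == true

        def on_probe(self, i, j, cross_edges, send):
            if not self.blocked or i in self.dependent:
                return                        # rules (d)/(e): discard the probe
            self.dependent.add(i)             # P_k now knows P_i depends on it
            if i == self.k:
                print(f"P{i} is deadlocked")  # the probe returned to its initiator
                return
            for m, n, n_site in cross_edges:  # forward along every cross-site edge
                send(("probe", i, m, n), n_site)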

  45. C-M-H Algorithm: Example • (Figure: wait-for graph over processes P0–P7, with probes probe(1,3,4) and probe(1,7,1) in flight.)

  46. Diffusion-based Algorithm
  Initiation by a blocked process Pi:
    send query(i,i,j) to all processes Pj in the dependent set DSi of Pi;
    num_i(i) := |DSi|; wait_i(i) := true;
  Blocked process Pk receiving query(i,j,k):
    if this is the engaging query for process Pk  /* first query from Pi */
    then send query(i,k,m) to all Pm in DSk;
         num_k(i) := |DSk|; wait_k(i) := true;
    else if wait_k(i) then send a reply(i,k,j) to Pj.
  Process Pk receiving reply(i,j,k):
    if wait_k(i) then
      num_k(i) := num_k(i) - 1;
      if num_k(i) = 0 then
        if i = k then declare a deadlock
        else send reply(i,k,m) to Pm, which sent the engaging query.
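A sketch of the same rules in Python for one process P_k; send(message, destination) is a hypothetical transport, and the engaging-query test is simplified to “first query seen from this initiator” (the sequence-number mechanism is on a later slide):

    class DiffusionProcess:
        def __init__(self, k, ds):
            self.k = k
            self.ds = ds          # dependent set DS_k: processes P_k waits on (OR request)
            self.blocked = False
            self.num = {}         # num_k(i): outstanding replies per initiator i
            self.wait = {}        # wait_k(i): engaged on behalf of initiator i
            self.engager = {}     # who sent us the engaging query for initiator i

        def initiate(self, send):
            for m in self.ds:
                send(("query", self.k, self.k, m), m)
            self.num[self.k] = len(self.ds)
            self.wait[self.k] = True

        def on_query(self, i, j, send):
            if not self.blocked:
                return
            if not self.wait.get(i):           # engaging query: first one from P_i
                self.engager[i] = j
                for m in self.ds:
                    send(("query", i, self.k, m), m)
                self.num[i] = len(self.ds)
                self.wait[i] = True
            else:                              # already engaged: echo a reply
                send(("reply", i, self.k, j), j)

        def on_reply(self, i, j, send):
            if self.wait.get(i):
                self.num[i] -= 1
                if self.num[i] == 0:
                    if i == self.k:
                        print(f"P{i} is deadlocked")
                    else:
                        send(("reply", i, self.k, self.engager[i]), self.engager[i])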

  47. Diffusion Algorithm: Example • (Figure: processes P1–P7 exchanging query and reply messages, including query(1,3,4), query(1,7,1), reply(1,6,2), and reply(1,1,7).)

  48. Engaging Query • How to distinguish an engaging query? • query(i,j,k) from the initiator contains a unique sequence number for the query apart from the tuple (i,j,k). • This sequence number is used to identify subsequent queries. • (e.g.,) when query(1,7,1) is received by P1 from P7, P1 checks the sequence number along with the tuple. • P1 understands that the query was initiated by itself and it is not an engaging query. • Hence, P1 sends a reply back to P7 instead of forwarding the query on all its outgoing links.

  49. AND, OR Models • AND Model • A process/transaction can simultaneously request multiple resources. • Remains blocked until it is granted all of the requested resources. • The edge-chasing algorithm can be applied here. • OR Model • A process/transaction can simultaneously request multiple resources. • Remains blocked until any one of the requested resources is granted. • The diffusion-based algorithm can be applied here.

  50. Hierarchical Deadlock Detection • Follows Ho-Ramamoorthy’s 1-phase algorithm, with more than one control site organized in a hierarchical manner. • Each control site applies the 1-phase algorithm to detect (intracluster) deadlocks. • The central site collects info from the control sites and applies the 1-phase algorithm to detect intercluster deadlocks. • (Figure: a central site above several control sites.)
