
Synchronization in distributed systems



  1. A distributed system is a collection of independent computers that appears to its users as a single coherent system. Synchronization in distributed systems

  2. Physical clocks • each computer has its own clock, but clocks are not perfect • the Network Time Protocol (NTP) is used to synchronize clocks • top-level time servers are connected to physical clocks, e.g. atomic clocks • second-level time servers connect to several of them • an algorithm corrects for network delay • typically achieving 10 ms accuracy over the internet • e.g. ntp1.science.ru.nl, time.windows.com
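The delay-correction step can be illustrated with the standard NTP offset/delay computation from the four timestamps of a request/reply exchange (a minimal Python sketch; the function name and the sample timestamps are illustrative, not from any NTP implementation):

```python
def estimate_offset(t1, t2, t3, t4):
    """NTP-style clock correction from four timestamps (seconds):
    t1 = client sends request, t2 = server receives it,
    t3 = server sends reply,   t4 = client receives it."""
    delay = (t4 - t1) - (t3 - t2)          # round-trip network delay
    offset = ((t2 - t1) + (t3 - t4)) / 2   # estimated client clock error
    return offset, delay

# Illustrative exchange: the client's clock turns out to run 0.1 s behind.
offset, delay = estimate_offset(100.0, 100.6, 100.7, 101.1)
```

The client then adjusts its clock by `offset`; combining several such exchanges and filtering out outliers is what lets NTP reach roughly 10 ms accuracy over the internet.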

  3. Distributed global states Suppose a customer has a bank account distributed over two branches, and suppose one wishes to calculate the total amount at exactly 3:00. This can go wrong in various ways: an amount may be in transit, or the clocks may not be perfectly synchronized. The problem can be solved by also examining all messages in transit at the time of the observation. The state of a branch then consists of both the current balance and the messages that have been sent or received.

  4. Some definitions • A channel exists between two processes if they exchange messages; for convenience, each channel is viewed as one-way. • The state of a process includes the sequence of messages it has sent and received, together with the internal conditions of the process. • A snapshot records the state of a process, including all messages sent and received since the last snapshot. • A distributed snapshot is a collection of snapshots, one for each process. • A global state is the combined state of all processes. • A true global state cannot be determined, because of the time lapse associated with message transfers and the difficulty of synchronizing clocks. One can, however, attempt to define a global state by collecting snapshots from all processes.

  5. Distributed snapshot algorithm • Chandy and Lamport (1985) gave an algorithm to record a consistent global state. • They assume that messages are delivered in the order they are sent and that no messages are lost. • The method uses a special control message, called a marker. • Some process initiates the algorithm by recording its state and sending a marker on all of its outgoing channels. • After the algorithm terminates, the snapshots present at each process record a consistent global state. • The algorithm can be used to adapt any centralized algorithm to a distributed environment, because the basis of any centralized algorithm is knowledge of the global state.

  6. The algorithm • Marker Sending Rule for process i • Process i records its state. • For each outgoing channel C on which a marker has not yet been sent, i sends a marker along C before sending any further messages along C. • Marker Receiving Rule for process j, on receiving a marker along channel C • if j has not yet recorded its state, then • record the state of C as the empty set • follow the Marker Sending Rule • else • record the state of C as the set of messages received along C after j's state was recorded and before j received the marker along C
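The two rules can be played out in a toy single-threaded simulation of the bank example (a hedged sketch: the class names, the balance-as-state choice, and the direct-delivery loop are illustrative conveniences, not part of the Chandy-Lamport paper):

```python
from collections import deque

MARKER = "MARKER"

class Process:
    def __init__(self, pid, balance):
        self.pid = pid
        self.balance = balance
        self.state = None        # recorded local state, set by the snapshot
        self.channel_state = {}  # sender pid -> messages recorded in transit
        self.recording = {}      # sender pid -> still recording that channel?

    def start_snapshot(self, channels):
        # Marker Sending Rule: record own state, then send a marker on
        # every outgoing channel before any further messages.
        self.state = self.balance
        for chan in channels[self.pid].values():
            chan.append(MARKER)
        # begin recording on every incoming channel
        for src in channels:
            if self.pid in channels[src]:
                self.recording[src] = True
                self.channel_state[src] = []

    def receive(self, src, msg, channels):
        if msg == MARKER:
            if self.state is None:             # Marker Receiving Rule:
                self.start_snapshot(channels)  # first marker seen
            self.recording[src] = False        # channel state from src is fixed
            return
        if self.state is not None and self.recording.get(src):
            self.channel_state[src].append(msg)  # message caught in transit
        self.balance += msg                    # process the message normally

# Two branches; $30 is in transit from P0 to P1 when the snapshot starts.
channels = {0: {1: deque()}, 1: {0: deque()}}
procs = {0: Process(0, 100), 1: Process(1, 200)}
procs[0].balance -= 30
channels[0][1].append(30)

procs[0].start_snapshot(channels)              # P0 initiates: records 70
while channels[0][1]:
    procs[1].receive(0, channels[0][1].popleft(), channels)
while channels[1][0]:
    procs[0].receive(1, channels[1][0].popleft(), channels)
```

In this run the $30 is delivered before P1 sees the marker, so P1 records 230 and the channel states are empty: the recorded total, 70 + 230 = 300, equals the true total even though no two states were taken at the same instant.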

  7. Distributed mutual exclusion The following model is used for examining approaches to mutual exclusion in a distributed context. A number of systems or nodes are assumed to be interconnected by some network. In each node, one process is responsible for resource allocation: it controls a number of resources and serves a number of local processes. Algorithms for mutual exclusion may be centralized or distributed.

  8. Centralized • In a centralized algorithm, one node is designated as the control node; it controls access to all resources shared over the network. • When a process wants access to a critical resource, it issues a request to its local resource-controlling process. This process sends the request to the resource-controlling process on the control node, which returns a permission message when the shared resource becomes available. When a process has finished with a resource, it sends a release message to the control node. • Such a centralized algorithm has two key properties: • Only (a process in) the control node makes resource-allocation decisions. • All necessary information is concentrated in the control node, including the identity and location of all resources and the allocation status of each resource. • It is easy to see how mutual exclusion is enforced. • There are, however, drawbacks. If the control node fails, the mutual exclusion mechanism breaks down, at least temporarily. Furthermore, every resource allocation and deallocation requires an exchange of messages with the control node, and execution time on it. Thus the control node may become a bottleneck.
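The request/permission/release exchange at the control node reduces to a few lines when messages are simulated as direct calls (an illustrative sketch; the class and method names are assumptions, not from any particular system):

```python
from collections import deque

class ControlNode:
    """Resource-controlling process on the control node for one resource."""
    def __init__(self):
        self.holder = None        # pid currently holding the resource
        self.waiting = deque()    # queued requests, in arrival order

    def request(self, pid):
        """A request message arrives: grant permission if the resource
        is free, otherwise queue the requester."""
        if self.holder is None:
            self.holder = pid
            return True           # permission message sent back
        self.waiting.append(pid)
        return False              # requester must wait

    def release(self, pid):
        """A release message arrives: pass the resource to the next
        waiter (returns the pid now granted permission, or None)."""
        assert self.holder == pid
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder
```

The single-point-of-failure and bottleneck drawbacks are visible in the sketch: every `request` and `release` runs on this one object.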

  9. Distributed • A fully distributed algorithm is characterized by the following properties: • All nodes have an equal amount of information, on average. • Each node has only a partial picture of the total system and must make decisions based on it. • All nodes bear equal responsibility for the final decision. • All nodes expend equal effort, on average, in effecting a final decision. • Failure of a node, in general, does not result in a total system collapse. • There exists no system-wide common clock with which to regulate the timing of events. • On point 2: some distributed algorithms require that all information known to any node be communicated to all other nodes. Even then, some of that information will be in transit and will not yet have arrived at all other nodes. Thus a node's information is usually not completely up to date, hence partial. • On point 6: because of communication delays, it is impossible to maintain a system-wide clock that is instantly available to all systems. It is also technically impractical to maintain one central clock and keep all local clocks synchronized to it.

  10. Ordering of events We would like to be able to say that event a at system i occurred before (or after) event b at system j, and to arrive consistently at this conclusion at all nodes. Lamport (1978) proposed a method, called timestamping, which orders events in a distributed system without using physical clocks. The technique is so efficient and effective that it is used in many algorithms for mutual exclusion and deadlock prevention. Ultimately, we are concerned with actions that occur at a local system, such as a process entering or leaving its critical section. In a distributed system, however, processes interact by means of messages, so it makes sense to associate events with messages; a local event can easily be bound to a message. To avoid ambiguity, we associate events only with the sending of messages, not with their receipt.

  11. Timestamping Each node i in the network maintains a local counter Ci, which functions as a clock. Each time a system transmits a message, it first increments its clock by 1. The message is sent in the form:         (contents, Ti = Ci, i) When a message is received, the receiving node j sets its clock to:         Cj := 1 + max(Cj, Ti) At each node the ordering is then: message x from i precedes message y from j if one of the following conditions holds:         Ti < Tj        or Ti = Tj and i < j Each message is sent from one process to all other processes. If some messages are not sent this way, it is impossible for all sites to have the same ordering of messages; only a collection of partial orderings exists.
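The timestamping rules above fit in a few lines of Python (an illustrative sketch; the class and function names are assumptions):

```python
class LamportNode:
    def __init__(self, node_id):
        self.node_id = node_id
        self.clock = 0            # local counter Ci

    def send(self, contents):
        self.clock += 1           # increment before transmitting
        return (contents, self.clock, self.node_id)   # (contents, Ti, i)

    def receive(self, message):
        contents, t_i, _ = message
        self.clock = 1 + max(self.clock, t_i)         # Cj := 1 + max(Cj, Ti)
        return contents

def precedes(msg_x, msg_y):
    """Total order on messages: Ti < Tj, or Ti = Tj and i < j,
    i.e. lexicographic comparison of (Ti, i)."""
    _, ti, i = msg_x
    _, tj, j = msg_y
    return (ti, i) < (tj, j)

# A short run matching the example on the next slide:
p1, p2, p3 = LamportNode(1), LamportNode(2), LamportNode(3)
m = p1.send("a")          # ("a", 1, 1)
p2.receive(m)             # p2's clock becomes 1 + max(0, 1) = 2
p3.receive(m)
x = p2.send("x")          # ("x", 3, 2)
```

Because ties in the timestamp are broken by node identity, every node that has seen both messages computes the same order, here a before x.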

  12. Examples In the first example P1 begins with its clock at 0. It increments its clock by 1 and transmits (a, 1, 1); the first 1 is the timestamp and the second is the identity of P1. On receipt of the message, P2 and P3 set their clocks to 1 + max(1, 0) = 2. P2 then increments its clock by 1 and transmits (x, 3, 2), and so on. At the end, every process agrees on the order {a, x, b, j}. Note that b might have been sent after j in physical time. The second example shows that the algorithm works in spite of differences in transmission time between pairs of systems. The message from P1 arrives earlier than that of P4 at site 2, but later at site 3. Nevertheless, after all messages have been received, the ordering is the same at all sites, namely {a, q}.

  13. Distributed queue • One of the earliest proposed approaches to providing distributed mutual exclusion is based on the concept of a distributed queue (Lamport 1978). It uses the model described previously and assumes a fully connected network: every process can send a message directly to every other process. For simplicity, we describe the case in which each site controls only a single resource. • At each site, a data structure is maintained that keeps a record of the most recent message received from each site and the most recent message sent at this site. Lamport refers to this structure as a queue; it is actually an array with one entry for each site. At any instant, entry qi[j] in the local array contains a message from Pj. • The array is initialized as:   qi[j] = (Release, 0, j) for j = 1, ..., N • Three types of messages are used in this algorithm: • (Request, Ti, i): a request for access to the resource, made by Pi • (Reply, Tj, j): Pj replies to a request message • (Release, Tk, k): Pk releases a resource previously allocated to it

  14. …algorithm • When Pi wants access to a resource, it sends (Request, Ti, i) to all other processes and places it in its own qi[i]. • When Pj receives (Request, Ti, i), it places it in qj[i] and sends (Reply, Tj, j) to Pi. • Pi may access the resource (enter its critical section) when both of the following conditions hold: • qi[i] is the earliest Request message in qi • all other messages in qi are later than qi[i] • Pi releases the resource by sending (Release, Ti, i) to all processes and placing it in its own qi[i]. • When Pj receives (Release, Ti, i) or (Reply, Ti, i), it places it in qj[i]. • This algorithm enforces mutual exclusion, is fair, and avoids deadlock and starvation. • Note that 3(N-1) messages (assuming no errors) are required for each mutual exclusion access. If broadcast is used, this reduces to N+1 messages. • Ricart and Agrawala (1981) optimized Lamport's method by eliminating Release messages (2(N-1) messages; with broadcast, N). • Token-passing algorithms reduce the number of messages further.
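The request/reply/release logic can be sketched as follows. This is a hedged reconstruction in the spirit of Lamport's original formulation: instead of the single message array, it tracks the pending Requests and the latest timestamp seen from each process (which together carry the same information as the two entry conditions above); message delivery is simulated by direct calls, and all names are illustrative:

```python
class Node:
    def __init__(self, pid, n):
        self.pid, self.n, self.clock = pid, n, 0
        self.pending = {}                        # pid -> (T, pid) of open Requests
        self.latest = {j: 0 for j in range(n)}   # latest timestamp seen from j

    def _stamp(self, t=0):
        self.clock = max(self.clock, t) + 1      # Lamport clock update
        return self.clock

    def request(self, nodes):
        """Send (Request, Ti, i) to all other processes."""
        t = self._stamp()
        self.pending[self.pid] = (t, self.pid)
        for node in nodes:                       # nodes is indexed by pid
            if node.pid != self.pid:
                node.on_request(t, self.pid, nodes)

    def on_request(self, t, i, nodes):
        """Record Pi's request and deliver (Reply, Tj, j) back to Pi."""
        self.latest[i] = t
        self.pending[i] = (t, i)
        nodes[i].latest[self.pid] = self._stamp(t)

    def can_enter(self):
        mine = self.pending.get(self.pid)
        if mine is None:
            return False
        earliest = all(mine < req                # ties broken by process id
                       for j, req in self.pending.items() if j != self.pid)
        heard_back = all(self.latest[j] > mine[0]
                         for j in range(self.n) if j != self.pid)
        return earliest and heard_back

    def release(self, nodes):
        """Send (Release, Tk, k) to all other processes."""
        t = self._stamp()
        del self.pending[self.pid]
        for node in nodes:
            if node.pid != self.pid:
                node.latest[self.pid] = t
                node.pending.pop(self.pid, None)

nodes = [Node(i, 3) for i in range(3)]
nodes[0].request(nodes)      # P0 asks first: earliest request everywhere
nodes[1].request(nodes)      # P1's request carries a later timestamp
```

After these two calls only P0 may enter its critical section; once P0 releases, P1's request becomes the earliest and P1 enters. The message counting on the slide is visible here: one request to each of the N-1 peers, N-1 replies, and N-1 releases.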
