Distributed Systems
Distributed System models • Physical Networks • Logical Models • Different Failure Models • Communication constructs (semantics of distributed programs) • Ordering of events and Execution Semantics
System Model Two ways of viewing a DS: • As defined by the physical components of the system – physical model • As defined from the point of view of processing or computation – logical model
The goal of fault tolerance in DS is to ensure that some property or service in the logical model is preserved despite the failure of some component(s) in the physical system.
Physical Network of a DS Consists of many computers called nodes that are typically: • Autonomous • Geographically separated • Communicating through communication networks
Distributed vs. Parallel Systems
Distributed systems: • Nodes loosely coupled • Essentially no shared memory • Private clocks for nodes
Parallel systems: • Nodes closely coupled • May have shared memory b/w nodes • May have a single global clock for many/all nodes
Point-to-Point Physical Network
Topologies: Fully Connected • Star • Tree
Communication protocols used: TCP/IP, OSI, etc.
Bus Topology
[Figure: nodes attached to a common bus]
Communication protocol used: CSMA/CD
Logical Model • A distributed application consists of a set of concurrently executing processes that cooperate with each other to perform some task. • A process is the execution of a sequential program, which is a list of instructions.
Concurrent Processes Can be classified in three categories: • Independent processes: the sets of objects accessed are disjoint. • Competing processes: share resources but there is no information exchange between them. • Cooperating processes: exchange information either by using shared data or by message passing.
A few logical level assumptions • Finite progress assumption: since no assumptions about the relative speeds of processes can be made, it is assumed that they all have positive rates of execution. • The underlying network is treated as fully connected/topology is not considered. • At the logical level the system is made of processes and channels between them. • Channels are assumed to have infinite buffers and to be error-free. • Channels deliver messages in the order in which they are sent (NB: on a particular channel).
Assumptions about Time Bounds on the performance of the system are also made. • A system is said to be synchronous if, whenever it is working correctly, it performs its intended function within a finite and known time bound; otherwise it is said to be asynchronous. • A synchronous channel is one in which the max. message delay is known and bounded. • A synchronous processor is one in which the time to execute a sequence of instructions is finite and bounded. • Advantage of synchronous systems: the failure of components can be deduced from the lack of response.
Failures and Fault Classification • Crash fault: causes a component to halt or to lose its internal state; the component never undergoes any incorrect state transition when it fails. • Omission fault: causes a component to not respond to some inputs. • Timing/performance fault: causes a component to respond either too early or too late. • Byzantine fault: causes the component to behave in a totally arbitrary manner during failure. • Incorrect computation fault: produces incorrect outputs.
Fault Hierarchy
crash ⊂ omission ⊂ timing ⊂ byzantine
Incorrect computation faults are a subset of byzantine faults but distinct from the other fault classes.
Assumptions about fault types • For a processor: crash fault or byzantine fault. • For a communication network: all the different types of faults. • For a clock: timing fault, byzantine fault, and sometimes omission fault. • For a storage medium: crash, timing, omission, and incorrect computation faults. • For software components: most of the above faults, but the most important is the incorrect computation fault.
Interprocess Communication • Synchronization and communication are both achieved by message passing primitives. • In shared memory systems, primitives like semaphores, conditional critical regions and monitors are used.
Process Creation • Processes are created in a system by the use of some operating system-provided system call. • At the language level, this is done by using language primitives, e.g. fork and join and the cobegin-coend statement.
Fork and Join Primitive
Program P1:
  ...
  fork P2;
  ...
  join P2;
  ...
Program P2:
  ...
  end
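The fork/join pattern above can be sketched in Python; this is an illustrative mapping in which a thread stands in for process P2 (the primitives themselves are language-level constructs):

```python
import threading

def p2():
    # Body of program P2: runs concurrently with the rest of P1.
    print("P2 running")

# fork P2: start P2 as a separate thread of control.
t = threading.Thread(target=p2)
t.start()

# ... P1 continues its own work here, in parallel with P2 ...

# join P2: P1 blocks until P2 terminates.
t.join()
print("P1 resumes only after P2 has ended")
```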
Cobegin-Coend Primitive cobegin S1 || S2 || S3 || ... || Sn coend The above statement causes n different processes to be created, each executing a different statement Si; the construct terminates only when all the Si's have terminated.
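A minimal Python sketch of cobegin-coend, again using threads as stand-in processes: all n statements are started together, and control passes coend only when every Si has finished.

```python
import threading

results = {}

def make_stmt(i):
    def stmt():
        results[i] = i * i   # Si: each statement does its own work
    return stmt

# cobegin S1 || S2 || S3 coend
threads = [threading.Thread(target=make_stmt(i)) for i in (1, 2, 3)]
for t in threads:
    t.start()   # create the n concurrent processes
for t in threads:
    t.join()    # coend: reached only after every Si has terminated
```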
Asynchronous Message Passing • In a DS without shared memory, message passing is used both for communication and synchronization. • A message is sent by a process by executing a send command: send(data, destination) • Receiving of data is done with a receive command: receive(data, source), or receive(message) in client–server interaction.
Assumptions • Message passing requires some buffer between sender and receiver: • In asynchronous message passing an infinite buffer to store messages is assumed, so the sender can go on sending without blocking; the receiver, however, blocks when no message is available. • In reality, buffers are of finite size, so the sender may also have to block; this is called buffered message passing.
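These buffer assumptions map directly onto Python's queue module; the sketch below is illustrative (the send/receive wrapper names are assumptions): an unbounded queue models the ideal infinite buffer, a bounded one gives buffered message passing.

```python
import queue

# Unbounded queue: models the infinite buffer of asynchronous
# message passing, so send never blocks.
channel = queue.Queue()

def send(data):
    channel.put(data)        # sender continues immediately

def receive():
    return channel.get()     # receiver blocks until a message arrives

# Bounded queue: buffered message passing; put() blocks once
# 8 messages are outstanding.
buffered_channel = queue.Queue(maxsize=8)
```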
NB: • Asynchronous and synchronous message passing is different from asynchronous and synchronous DS. The former refers to communication primitives and the size of the buffer between sender and receiver, while the latter deals with bounds on message delays. • In a synchronous DS, both asynchronous and synchronous message passing can be supported.
Synchronous msg passing & CSP • Has no buffering. • Has the advantage that at each communication command it is easier to make assertions about processes. • Has been employed in Communicating Sequential Processes (CSP), a notation proposed for specifying distributed programs. • CSP uses Guarded Command Language.
Guarded Commands A GC is a statement list that is prefixed by a Boolean expression called a guard: guard → statement list. The statement list is eligible for execution only if the guard evaluates to true, i.e. it succeeds. Evaluation of a guard is assumed to have no side effects, i.e. it does not alter the state of the program in any manner. The alternative construct is formed by using a set of guarded commands as follows:
[ G1 → S1 [] G2 → S2 [] ... [] Gn → Sn ]
The execution of this alternative construct aborts if all the guards evaluate to false. If any guard is true, the corresponding statement is eligible for execution. In case multiple guards evaluate to true, the statement to be executed is selected non-deterministically. The repetitive construct is similar but with a * prefix. The GC notation allows non-determinism within a program.
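The alternative and repetitive constructs can be simulated in Python; a sketch under the stated assumptions, with guards as side-effect-free callables and non-deterministic selection approximated by random choice:

```python
import random

def alternative(guarded_commands):
    """[ G1 -> S1 [] ... [] Gn -> Sn ]: abort if all guards are false,
    otherwise execute one true-guarded statement, chosen
    non-deterministically."""
    eligible = [stmt for guard, stmt in guarded_commands if guard()]
    if not eligible:
        raise RuntimeError("alternative construct aborts: all guards false")
    random.choice(eligible)()

def repetitive(guarded_commands):
    """*[ ... ]: repeat the alternative construct until all guards fail."""
    while any(guard() for guard, _ in guarded_commands):
        alternative(guarded_commands)
```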
Communicating Sequential Processes • Is a programming notation for expressing concurrent programs. • Employs synchronous message passing. • Uses guarded commands to allow selective communication. • A CSP program may consist of many concurrent processes: e.g. a process Pi sends a message, msg, to a process Pj by an output command of the form: Pj!msg. • Pj receives a message from Pi by an input command: Pi?m. • For a process Pj the overall code is of the form: • Pj :: Initialize; *[G1 → C1 [] G2 → C2 [] ... [] Gn → Cn].
Remote Procedure Call • A higher level primitive to support client–server interaction. • An extension of the procedure call mechanism available in most programming languages. • The service to be provided by the server is treated as a procedure that resides on the machine on which the server runs. The client process that needs that service makes "calls" to this procedure and RPC takes care of the underlying communication. • A call statement is of the form: • call service(value_args, result_args)
The states of the server and the client both may change as a result of a 'call'. • However, idempotent remote procedures leave the server in the same state no matter how many times a 'call' is repeated. • Idempotent servers simplify the task of fault tolerance. • Two basic approaches to specifying the server side in RPC: • The remote procedure is just like a sequential procedure, i.e. a single process executes the procedure as calls are made. • A new process is created every time a call is made. These processes can be concurrent.
Semantics of the RPC in failure conditions The classification for semantics of remote calls: • At least once: the remote proc. has been executed one or more times if the invocation terminates normally. If it terminates abnormally, nothing can be said about the number of times the remote proc. executed. • Exactly once: the remote proc. has executed exactly once if the invocation terminates normally; if not, then it can be asserted that the remote proc. did not execute more than once. • At most once: same as exactly once if the invocation terminates normally; otherwise it is guaranteed that the remote proc. has been executed completely once or has not been executed at all.
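One standard way a server can avoid re-executing retransmitted requests is a reply cache keyed by request id; the sketch below is illustrative (the class and parameter names are assumptions, not from the slides):

```python
class AtMostOnceServer:
    """De-duplicates calls by request id: a retransmitted request is
    answered from the reply cache instead of executing the remote
    procedure a second time (illustrative sketch)."""

    def __init__(self):
        self.replies = {}   # request_id -> cached result

    def call(self, request_id, proc, *args):
        if request_id in self.replies:
            return self.replies[request_id]   # duplicate: not re-executed
        result = proc(*args)                  # execute the procedure once
        self.replies[request_id] = result
        return result
```

Note that an idempotent procedure would make the reply cache unnecessary, which is why idempotent servers simplify fault tolerance.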
Orphans: Unwanted executions of remote procedures caused by communication or processor failure. E.g. a client that crashes after issuing a call may restart on recovery and reissue the call even though the previous call is still being executed by the server. The presence of orphans can violate the semantics of RPC and lead to inconsistency. Call ordering: this property requires that a sequence of invocations generated by a given client result in computations performed by the server in the same order. It is automatically satisfied if there are no failures. Not a strict requirement in the case of idempotent servers.
Object-Action Model • Another high-level communication paradigm. • In this paradigm a system consists of many objects, each containing some data and well defined methods (operations) on that data. • The encapsulated data can only be accessed through the methods defined for it. • The objects may reside on different nodes. • A process sends a message to the object concerned, which performs an action by executing a method and returns the result to the process.
Nested remote procedure calls may be created. • Methods on objects may execute in parallel. • Concurrent calls may be made to the same method or to the same object. • Becoming popular since it supports fault tolerance through possible replication of objects.
Ordering of Events There is no single global clock for defining a happened-before relationship between events of different processors. Partial ordering: the relation → on a set of events in a distributed system is the smallest relation satisfying the following three conditions: • If a and b are events performed by the same process and a is performed before b, then a → b. • If a is the sending of a message by one process and b is the receiving of the same message by another process, then a → b. • If a → b and b → c, then a → c. Two events are said to be concurrent if neither a → b nor b → a.
Logical Clocks • The logical clock Ci, for a process Pi, is a function which assigns a value Ci(a) to an event a of the process Pi. • The system of logical clocks is considered to be correct if it is consistent with the → relation, i.e. for any events a, b: if a → b then C(a) < C(b). • When a msg m is sent from process Pi, the timestamp of the sending event is included in the msg and can be retrieved by the receiver. • Let Tm be the timestamp of the message m. There are two conditions that a system of logical clocks should satisfy in order to be correct: • Each Pi increments Ci between any two successive events. • Upon receiving a msg m, Pj sets Cj to a value greater than or equal to its present value and greater than Tm.
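The two correctness conditions translate into a few lines of Python; a minimal sketch of such a logical clock (Lamport's rules):

```python
class LamportClock:
    """Logical clock Ci for a process Pi."""

    def __init__(self):
        self.c = 0

    def local_event(self):
        self.c += 1              # condition 1: increment between events
        return self.c

    def send_event(self):
        self.c += 1
        return self.c            # timestamp Tm carried in the message

    def receive_event(self, tm):
        # condition 2: Cj becomes greater than both its present value and Tm
        self.c = max(self.c, tm) + 1
        return self.c
```

With one clock per process, a send followed by the matching receive always yields C(a) < C(b), as the consistency condition requires.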
Total Ordering of Events • Order the events by the timestamps assigned to them by the logical clock system. Ties are broken by ordering processes in the lexicographic order of their names. • A relation => on the set of events is defined as follows: for events a and b of processes Pi and Pj respectively, a => b iff either Ci(a) < Cj(b), or Ci(a) = Cj(b) and Pi comes before Pj in the ordering.
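Since Python compares tuples lexicographically, the => relation with its process-name tie-break reduces to a single tuple comparison; a small illustrative helper (the event encoding as a pair is an assumption):

```python
def totally_ordered_before(a, b):
    """a => b iff Ci(a) < Cj(b), or the timestamps are equal and a's
    process precedes b's in the lexicographic name ordering.
    Events are modeled as (timestamp, process_name) pairs."""
    return a < b   # tuple comparison: timestamp first, then process name
```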
Execution Model and System State • At a logical level, a distributed system can be modeled as a directed graph with nodes representing processes and edges representing channels between them. • The state of a channel in this model is the sequence of msgs that are still in the channel. • A process can be considered as consisting of a set of states, an initial state, and a sequence of events (or actions). • The state of a process is an assignment of a value to each of its variables, along with the specification of its control point, which specifies the event executed last. • Each event or action of a process is assumed to be atomic. • An event e of a process p can change the state of p and at most one channel c that is incident on p.
Each event has an enabling condition, which is a condition on the state of the process and the channel attached to it. • An event e can occur only if this enabling condition is true, e.g. when the program counter has a specific value. • The global state or the system state of a DS consists of the states of each of the processes in the system and the states of the channels in the system. • The initial global state is one in which each process is in its initial state and all channels are empty. • An event e can change the system state S by changing the state of process p iff the enabling condition for e is true in S.
A function ready(S) is defined on a global state S as the set of events for which the enabling condition is satisfied in S. • The events in ready(S) can belong to different processes; however, only one of these events will take place. • Which of the events in ready(S) will occur cannot be predicted deterministically. • We define another function next, where next(S, e) is the global state immediately following the occurrence of the event e in the global state S. • The computation of a DS can be defined as a sequence of events. • Let the initial state of the system be S0 and let seq = (ei, 0 <= i <= n) be a sequence of events. • Suppose that the system state when ei occurs is Si; the sequence of events seq is a computation of the system if the following conditions are satisfied:
The event ei belongs to ready(Si), 0 <= i <= n. • Si+1 = next(Si, ei), 0 <= i <= n. • Example: a concurrent shared-memory program.
a: x := 0
b: cobegin
c:   y := 0
d:   cobegin
e:     y := 2*y
    || f: y := y + 3
     coend
  || g: while y = 0 do
h:       x := x + 1
   coend
j: x := 2*y
An execution sequence for the program (each state is shown as [(x, y); ready set]):
S0: [(2,7); {a}] (a) S1: [(0,7); {c,g}] (c) S2: [(0,0); {e,f,g}] (g) S3: [(0,0); {e,f,h}] (h) S4: [(1,0); {e,f,g}] (f) S5: [(1,3); {e,g}] (e) S6: [(1,6); {g}] (g) S7: [(1,6); {j}] (j) S8: [(12,6); {}]
The possible states of the system can also be represented as a tree with its root as the initial state and each event in the ready set producing a child of a node. Such a tree is called a reachability tree, in which each node represents a state, and the number of children of a node equals the cardinality of the ready set at that state. Each path from the initial node to a leaf node shows one possible execution sequence of the system. The states in the path are called valid or consistent states.
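The reachability-tree idea can be sketched generically: given ready(S) and next(S, e), a depth-first walk of the tree enumerates every possible execution sequence. The function names mirror the text; the event and state encodings are illustrative assumptions.

```python
def execution_sequences(state, ready, next_state):
    """Yield every execution sequence (root-to-leaf path of the
    reachability tree) starting from `state`."""
    events = ready(state)
    if not events:
        yield []                 # leaf: no enabled events
        return
    for e in sorted(events):     # each event in ready(S) -> one child
        for tail in execution_sequences(next_state(state, e), ready, next_state):
            yield [e] + tail
```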
Reachability Tree
[Figure: reachability tree for the example program, rooted at S0; each event in ready(Si) produces a child state]
This model is also called the interleaving model.
For details, please refer to: Fault Tolerance in Distributed Systems, by Pankaj Jalote.
Thank you.