
Distributed Snapshot


yuli-hays




Presentation Transcript


  1. Distributed Snapshot: Distributed Systems

  2. Introduction: What Is a Distributed System? • A network of processes. The nodes are processes, and the edges are communication channels.
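As a minimal sketch of this model (assuming Python and a hypothetical `Network` class that does not appear in the slides), each process is a node id and each directed edge is a FIFO channel backed by a `collections.deque`:

```python
from collections import deque


class Network:
    """Hypothetical model: nodes are processes, directed edges are FIFO channels."""

    def __init__(self, processes, edges):
        self.processes = list(processes)
        # One FIFO queue per directed channel (src, dst).
        self.channels = {edge: deque() for edge in edges}

    def send(self, src, dst, msg):
        # Append at the tail of channel src -> dst.
        self.channels[(src, dst)].append(msg)

    def receive(self, src, dst):
        # Remove from the head, so messages arrive in the order they were sent.
        return self.channels[(src, dst)].popleft()

    def outgoing(self, p):
        return [e for e in self.channels if e[0] == p]

    def incoming(self, p):
        return [e for e in self.channels if e[1] == p]


net = Network([0, 1, 2], [(0, 1), (1, 2), (2, 0)])
net.send(0, 1, "a")
net.send(0, 1, "b")
print(net.receive(0, 1))  # a
```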

  3. Introduction • A computation is a sequence of atomic actions that transform a given initial state to the final state. While such actions are totally ordered in a sequential process, they are only partially ordered in a distributed system.
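To make "partially ordered" concrete, the sketch below uses a small hypothetical run (event names are invented for illustration): P0 executes a1 then a2, P1 executes b1 then b2, and b2 receives a message sent at a1. It enumerates every total order that respects these constraints; a sequential process would allow exactly one.

```python
from itertools import permutations

# Hypothetical run. Each pair (x, y) means "x happens before y".
events = ["a1", "a2", "b1", "b2"]
order = {("a1", "a2"),   # local order on P0
         ("b1", "b2"),   # local order on P1
         ("a1", "b2")}   # a1 sends the message that b2 receives


def respects(seq):
    # A total order is valid if every "happens before" pair keeps its direction.
    pos = {e: i for i, e in enumerate(seq)}
    return all(pos[x] < pos[y] for x, y in order)


linear = [seq for seq in permutations(events) if respects(seq)]
print(len(linear))  # 5 total orders are consistent with one partial order
```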

  4. Introduction • In this context, the state (also known as the global state) of a distributed system is the set of local states of all the component processes, together with the states of every channel through which messages flow.
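A global state therefore has two parts. The following sketch (a hypothetical `GlobalState` dataclass, not taken from the slides) shows why the channel part matters: a message that has been sent but not yet received appears in no local state at all.

```python
from dataclasses import dataclass, field


@dataclass
class GlobalState:
    """Hypothetical container: local states of all processes plus channel states."""
    local: dict = field(default_factory=dict)     # process id -> local state
    channels: dict = field(default_factory=dict)  # (src, dst) -> messages in transit

    def in_transit(self):
        # Number of messages currently travelling on all channels.
        return sum(len(msgs) for msgs in self.channels.values())


# Process 0 has sent m to process 1, but 1 has not received it yet:
# the message appears only in the channel state, not in any local state.
g = GlobalState(
    local={0: "sent m", 1: "waiting", 2: "idle"},
    channels={(0, 1): ["m"], (1, 2): [], (2, 0): []},
)
print(g.in_transit())  # 1
```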

  5. Introduction • So the important question is: when, or how, do we record the states of the processes and the channels? Depending on when the states of the individual components are recorded, the value of the global state can vary widely.

  6. Difficulties • Recording the global state may look simple to an external observer who looks at the system from outside. The same problem is surprisingly challenging when one takes a snapshot from inside the system.

  7. Difficulties • Consider a system of three processes numbered 0, 1, and 2 connected by FIFO channels, and assume that an unknown number of indistinguishable tokens are circulating indefinitely through this network. • We want the processes to cooperate with one another to count the exact number of tokens circulating in the system (without ever stopping the system).
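The following sketch shows what goes wrong without coordination. It assumes a ring 0 → 1 → 2 → 0 and a single token (both hypothetical choices for illustration): each process writes down its token count at a different moment while the token keeps moving, and the same token is counted twice.

```python
from collections import deque

# Hypothetical setup: one token on a ring 0 -> 1 -> 2 -> 0 with FIFO channels.
tokens = {0: 1, 1: 0, 2: 0}
channels = {(0, 1): deque(), (1, 2): deque(), (2, 0): deque()}


def pass_token(p):
    # Process p forwards one token to its ring successor.
    tokens[p] -= 1
    channels[(p, (p + 1) % 3)].append("token")


def deliver(p, q):
    # Process q receives the next message on channel p -> q.
    channels[(p, q)].popleft()
    tokens[q] += 1


# Uncoordinated recording: each process records at a different time.
recorded = {}
recorded[0] = tokens[0]   # 0 records 1: it holds the token
pass_token(0)
deliver(0, 1)
recorded[1] = tokens[1]   # 1 records 1: it now holds the same token
recorded[2] = tokens[2]   # 2 records 0
in_transit = sum(len(c) for c in channels.values())  # 0 at this moment

print(sum(recorded.values()) + in_transit)  # 2, although only 1 token exists
```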

  8. Difficulties • Deadlock detection. Any process that has had no eligible action for a prolonged period would like to find out whether the system has reached a deadlock configuration. • Termination detection. To begin the computation in a certain phase, a process must know whether every other process has finished its computation in the previous phase. • Network reset. In case of a malfunction or a loss of coordination, a distributed system will need to roll back to a consistent global state and initiate a recovery. Previous snapshots may be helpful.

  9. Properties of Consistent Snapshots • A snapshot state (SSS) consists of a set of local states, where each local state is the outcome of a recording event that follows a send, a receive, or an internal action. The important notion here is that of a consistent cut.

  10. Properties of Consistent Snapshots • A cut is a set of events; it contains at least one event per process. • A cut is called consistent if, for each event that it contains, it also includes all events causally ordered before it.
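The definition can be checked mechanically. This sketch assumes a hypothetical event model: an event is `(process, index)` with indices counting from 1 on each process, and `messages` maps each receive event to its send event. The causal order is then generated by local order plus send/receive pairs.

```python
def causal_predecessors(event, messages):
    """Direct causal predecessors: the previous event on the same process and,
    for a receive event, the matching send event."""
    proc, idx = event
    preds = []
    if idx > 1:
        preds.append((proc, idx - 1))
    if event in messages:
        preds.append(messages[event])
    return preds


def is_consistent_cut(cut, processes, messages):
    # A cut needs at least one event from every process.
    if any(all(e[0] != p for e in cut) for p in processes):
        return False
    # Closure under direct predecessors implies closure under the whole
    # (transitive) causal order.
    return all(pred in cut
               for event in cut
               for pred in causal_predecessors(event, messages))


# P0: (0, 1) internal, (0, 2) sends m.  P1: (1, 1) internal, (1, 2) receives m.
messages = {(1, 2): (0, 2)}
print(is_consistent_cut({(0, 1), (0, 2), (1, 1)}, [0, 1], messages))  # True: m in transit
print(is_consistent_cut({(0, 1), (1, 1), (1, 2)}, [0, 1], messages))  # False: receive without send
```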

  11. Properties of Consistent Snapshots • The set of local states following the most recent recorded event of each process in a consistent cut forms a consistent snapshot. • In a distributed system, many consistent snapshots can be recorded. A snapshot that is often of practical interest is the most recent one.

  12. The Chandy-Lamport Algorithm • Let the topology of a distributed system be represented by a strongly connected graph. Each node represents a process and each directed edge represents a FIFO channel. • A process called the initiator starts the distributed snapshot algorithm. Any process can be an initiator. The initiator sends a special message, called a marker (*), that prompts the other processes in the system to record their states. • The global state consists of the states of the processes as well as the channels. However, channels are passive entities, so the responsibility for recording the state of a channel lies with the process at its receiving end.
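Strong connectivity matters because any process may initiate, and markers must be able to reach every process from wherever they start. A small check, sketched here with hypothetical function names, runs one search along the edges and one along the reversed edges from an arbitrary node:

```python
from collections import defaultdict


def reachable(start, adj):
    # Iterative depth-first search collecting every node reachable from start.
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def strongly_connected(nodes, edges):
    # Strongly connected iff one node reaches every node along the edges
    # and along the reversed edges.
    fwd, rev = defaultdict(list), defaultdict(list)
    for u, v in edges:
        fwd[u].append(v)
        rev[v].append(u)
    start = next(iter(nodes))
    return reachable(start, fwd) == set(nodes) == reachable(start, rev)


print(strongly_connected([0, 1, 2], [(0, 1), (1, 2), (2, 0)]))  # True: a ring
print(strongly_connected([0, 1, 2], [(0, 1), (1, 2)]))          # False: a chain
```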

  13. The Chandy-Lamport Algorithm • DS1. The initiator process, in one atomic action, does the following: turns red, records its own state, and sends a marker along all its outgoing channels. • DS2. Every process, upon receiving a marker for the first time and before doing anything else, does the following in one atomic action: turns red, records its state, and sends markers along all its outgoing channels.

  14. The Chandy-Lamport Algorithm • The snapshot algorithm terminates when: • every process has turned red, and • every process has received a marker through each of its incoming channels.
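The rules above can be exercised on the token-counting problem. The simulation below is a sketch, not the authors' code: the ring topology, the random scheduler, and the names `Process` and `run_snapshot` are assumptions. The channel-recording rule it uses is the usual Chandy-Lamport one, which the slides leave implicit: a channel's state is the sequence of messages its receiver gets on it after turning red and before the marker arrives on that channel. The loop stops exactly at the termination condition above.

```python
import random
from collections import deque

MARKER = "MARKER"


class Process:
    def __init__(self, pid, tokens):
        self.pid = pid
        self.tokens = tokens         # local state: tokens currently held
        self.red = False             # True once the local state is recorded
        self.recorded_tokens = None  # recorded local state
        self.chan_state = {}         # incoming channel -> messages recorded on it
        self.open = set()            # incoming channels still awaiting a marker


def run_snapshot(tokens=(2, 0, 1), initiator=0, seed=1, warmup=5):
    n = len(tokens)
    rng = random.Random(seed)
    edges = [(i, (i + 1) % n) for i in range(n)]  # ring 0 -> 1 -> ... -> 0
    chans = {e: deque() for e in edges}            # FIFO channels
    procs = [Process(i, t) for i, t in enumerate(tokens)]

    def turn_red(p):
        # DS1 / DS2 in one step: turn red, record own state, start recording
        # incoming channels, send a marker along every outgoing channel.
        proc = procs[p]
        proc.red = True
        proc.recorded_tokens = proc.tokens
        for e in edges:
            if e[1] == p:
                proc.chan_state[e] = []
                proc.open.add(e)
            if e[0] == p:
                chans[e].append(MARKER)

    def deliver(e):
        proc = procs[e[1]]
        msg = chans[e].popleft()
        if msg == MARKER:
            if not proc.red:
                turn_red(e[1])       # DS2: first marker, so channel e is recorded empty
            proc.open.discard(e)     # marker received on e: its state is final
        else:
            if e in proc.open:       # red, but no marker yet on e: message was in transit
                proc.chan_state[e].append(msg)
            proc.tokens += 1

    def terminated():
        # Every process is red and has received a marker on each incoming channel.
        return all(p.red and not p.open for p in procs)

    step = 0
    while not terminated():
        if step == warmup:
            turn_red(initiator)      # DS1
        step += 1
        actions = [("deliver", e) for e in edges if chans[e]]
        actions += [("pass", p.pid) for p in procs if p.tokens > 0]
        kind, arg = rng.choice(actions)
        if kind == "deliver":
            deliver(arg)
        else:                        # application step: forward one token
            procs[arg].tokens -= 1
            chans[(arg, (arg + 1) % n)].append("token")

    recorded = sum(p.recorded_tokens for p in procs)
    in_transit = sum(len(m) for p in procs for m in p.chan_state.values())
    return recorded + in_transit


print(run_snapshot())  # 3
```

Because the recorded state is consistent, the returned count equals the true number of tokens for every seed, even though tokens keep moving while the snapshot is taken. The sketch assumes at least one token exists, so the scheduler always has an action to choose.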

  15. The Chandy-Lamport Algorithm

  16. The Chandy-Lamport Algorithm • The individual processes record only fragments of the snapshot state SSS. Another phase of activity is required to collect these fragments and form a composite view of SSS. Global state collection is not part of the snapshot algorithm.
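Once the fragments have been gathered, assembling the composite view is a simple merge. This sketch assumes a hypothetical fragment format (each process reports its recorded local state and the states of its incoming channels) and leaves out how the fragments travel to the collector:

```python
def collect(fragments):
    """Merge per-process fragments into one composite global state.

    Hypothetical fragment format: {"state": <recorded local state>,
    "channels": {(src, dst): [recorded messages]}} for the process's incoming channels.
    """
    local, channels = {}, {}
    for pid, frag in fragments.items():
        local[pid] = frag["state"]
        for chan, msgs in frag["channels"].items():
            # Each channel is recorded only by its receiver, so it must appear once.
            if chan in channels:
                raise ValueError(f"channel {chan} recorded twice")
            channels[chan] = list(msgs)
    return {"local": local, "channels": channels}


fragments = {
    0: {"state": 1, "channels": {(2, 0): []}},
    1: {"state": 0, "channels": {(0, 1): ["token"]}},
    2: {"state": 1, "channels": {(1, 2): []}},
}
g = collect(fragments)
tokens = sum(g["local"].values()) + sum(len(m) for m in g["channels"].values())
print(tokens)  # 3
```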

  17. The Lai-Yang Algorithm • Lai and Yang proposed an algorithm for distributed snapshots on a network of processes where the channels need not be FIFO. • A message is white if it is sent by a process that has not recorded its state, and red if the sender has already recorded its state. • Unlike in the Chandy-Lamport algorithm, there are no markers; processes are allowed to record their local states spontaneously.

  18. The Lai-Yang Algorithm • LY1. The initiator records its own state. When it needs to send a message m to another process, it sends (m, red). • LY2. When a process receives a message (m, red), it records its state if it has not already done so, and then accepts the message m.
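A minimal sketch of the coloring and of rules LY1 and LY2, assuming a hypothetical `LYProcess` whose local state is just a count of received messages (recording of channel states is not shown). The usage deliberately lets the red message overtake the white one, which non-FIFO channels allow:

```python
class LYProcess:
    """Hypothetical Lai-Yang participant; local state = number of received messages."""

    def __init__(self, pid):
        self.pid = pid
        self.recorded = False
        self.state = 0
        self.snapshot = None

    def record(self):
        # Record the local state at most once (LY1 for the initiator, LY2 otherwise).
        if not self.recorded:
            self.recorded = True
            self.snapshot = self.state

    def send(self, m):
        # The sender colors the message: white before recording, red after.
        return (m, "red" if self.recorded else "white")

    def receive(self, packet):
        m, color = packet
        # LY2: a red message triggers recording *before* the message is accepted.
        if color == "red":
            self.record()
        self.state += 1
        return m


p0, p1 = LYProcess(0), LYProcess(1)
w = p0.send("a")   # ("a", "white"): p0 has not recorded yet
p0.record()        # LY1: p0 is the initiator
r = p0.send("b")   # ("b", "red")
p1.receive(r)      # red overtakes white: p1 records state 0, then accepts "b"
p1.receive(w)      # white: accepted normally; "a" was in transit in the snapshot
print(p1.snapshot, p1.state)  # 0 2
```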

  19. The Lai-Yang Algorithm • The approach is “lazy” inasmuch as processes do not send or use any control messages for the sake of recording a consistent snapshot. • The advantage is that if a complete snapshot is taken, it is guaranteed to be consistent. • However, there is no guarantee that a complete snapshot will eventually be taken. For example, if a process i wants to detect termination, i records its own state after its last action but sends no further messages, so the other processes may never record their states. One workaround is to send dummy control messages.
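The failure mode and the dummy-message workaround can be sketched with an even smaller model that only tracks who has recorded. It assumes processes 1 and 2 are direct neighbours of process 0; in a sparser topology the dummy messages would have to reach every process some other way.

```python
# Hypothetical minimal model: each process only tracks whether it has recorded.
recorded = {0: False, 1: False, 2: False}


def deliver(dst, color):
    # LY2: a red message forces the receiver to record if it has not yet done so.
    if color == "red":
        recorded[dst] = True


# Process 0 records its state after its last action and then sends nothing.
recorded[0] = True
print(all(recorded.values()))  # False: no red message ever reaches 1 or 2

# Workaround: 0 sends a red dummy control message to each neighbour
# (assumed here to be processes 1 and 2).
for q in (1, 2):
    deliver(q, "red")
print(all(recorded.values()))  # True
```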
