Explore the distributed snapshot algorithm used for fault tolerance, testing, and debugging in operating systems. Learn how snapshots detect stable system properties such as deadlock, termination, and garbage.
Uncoordinated Checkpointing: The Global State Recording Algorithm
The Model
(Figure: nodes connected by channels.)
Node properties:
• No shared memory
• No global clock
Channel properties:
• FIFO
• Loss free
• Nonduplicating
CS 5204 – Operating Systems
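The Problem
(Figure: two accounts joined by channels C1 and C2, shown at three instants. Balances are $500 and $200 with C1 empty, then $450 and $200 with "transfer $50" in C1, then $450 and $250 with C1 empty again. C2 is empty throughout.)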
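A minimal sketch of why this is a problem, assuming the figure shows balances of $500/$200, then $450/$200 with $50 in transit on C1, then $450/$250. The `timeline` list and the function name are illustrative, not from the slides. If each account and the channel are recorded at different instants, the recorded total can lose or double-count the $50.

```python
# Hypothetical timeline read off the figure:
# (balance of first account, balance of second account, amount in flight on C1)
timeline = [
    (500, 200, 0),   # before the transfer
    (450, 200, 50),  # $50 is in transit on C1
    (450, 250, 0),   # transfer delivered
]

def recorded_total(t_a, t_b, t_c1):
    """Total seen when the two accounts and C1 are each recorded at their own instant."""
    return timeline[t_a][0] + timeline[t_b][1] + timeline[t_c1][2]

# Recording everything at the same instant always gives the true total.
consistent = [recorded_total(t, t, t) for t in range(3)]

# First account recorded after sending; second account and C1 recorded before: $50 is lost.
missing = recorded_total(1, 0, 0)

# First account recorded before sending; second account recorded after receiving:
# the $50 is counted twice.
doubled = recorded_total(0, 2, 0)
```

There is no global clock in the model, so "the same instant" is not available to the nodes; the marker-based algorithm is what stands in for it.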
Distributed Snapshot (Global State Recording)
• Motivation for recording a “consistent” state of the global computation:
  • checkpointing for fault tolerance (rollback, recovery)
  • testing and debugging
  • monitoring and auditing
• Method: detecting stable properties in a distributed system via snapshots. A property is “stable” if, once it holds in a state, it holds in all subsequent states. Examples:
  • termination
  • deadlock
  • garbage collection
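Definitions
Local state and actions:
• local state: $LS_i$
• message send: $\mathrm{send}(m_{ij})$
• message receive: $\mathrm{rec}(m_{ij})$
• time: $\mathrm{time}(x)$
• $\mathrm{send}(m_{ij}) \in LS_i$ iff $\mathrm{time}(\mathrm{send}(m_{ij})) < \mathrm{time}(LS_i)$
• $\mathrm{rec}(m_{ij}) \in LS_j$ iff $\mathrm{time}(\mathrm{rec}(m_{ij})) < \mathrm{time}(LS_j)$
Predicates:
• $\mathrm{transit}(LS_i, LS_j) = \{\, m_{ij} \mid \mathrm{send}(m_{ij}) \in LS_i \wedge \neg(\mathrm{rec}(m_{ij}) \in LS_j) \,\}$
• $\mathrm{inconsistent}(LS_i, LS_j) = \{\, m_{ij} \mid \neg(\mathrm{send}(m_{ij}) \in LS_i) \wedge \mathrm{rec}(m_{ij}) \in LS_j \,\}$
Consistent global state:
$\forall i, j : 1 \le i, j \le n :: \mathrm{inconsistent}(LS_i, LS_j) = \emptyset$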
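The predicates above can be sketched as set operations, assuming each local state is represented by the sets of send and receive events that happened before it was recorded. The `Msg` and `LocalState` types and the `states` dictionary are hypothetical names introduced only for this illustration.

```python
from typing import NamedTuple

class Msg(NamedTuple):
    """A message m_ij, identified by sender, receiver, and a sequence number."""
    src: str
    dst: str
    seq: int

class LocalState:
    """LS_i, reduced to the send and receive events it contains."""
    def __init__(self, sent=(), received=()):
        self.sent = set(sent)          # messages m with send(m) in LS_i
        self.received = set(received)  # messages m with rec(m) in LS_i

def transit(i, j, states):
    """Messages from i to j that are sent in LS_i but not received in LS_j."""
    sent = {m for m in states[i].sent if m.dst == j}
    received = {m for m in states[j].received if m.src == i}
    return sent - received

def inconsistent(i, j, states):
    """Messages from i to j that are received in LS_j but not sent in LS_i."""
    sent = {m for m in states[i].sent if m.dst == j}
    received = {m for m in states[j].received if m.src == i}
    return received - sent

def is_consistent(states):
    """A global state is consistent if no ordered pair has an inconsistent message."""
    return all(not inconsistent(i, j, states)
               for i in states for j in states if i != j)

m = Msg("A", "B", 1)
# A recorded after sending, B recorded before receiving: m is in transit.
in_flight = {"A": LocalState(sent=[m]), "B": LocalState()}
# A recorded before sending, B recorded after receiving: m has no recorded sender.
orphan = {"A": LocalState(), "B": LocalState(received=[m])}
```

Here `in_flight` is a consistent global state with one message in transit, while `orphan` is inconsistent because it records a receive whose send is missing.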
Global State Recording Algorithm

Marker-Sending Rule for a process p:
    for (each channel c, incident on, and directed away from p) {
        p sends one marker along c after p records its state
        and before p sends further messages along c;
    }

Marker-Receiving Rule for a process q (on receiving a marker along channel c):
    if (q has not recorded its state) then {
        q records its state;
        q records the state of c as the empty sequence;
    } else {
        q records the state of c as the sequence of messages received along c
        after q's state was recorded and before q received the marker along c.
    }
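The two rules can be tried out in a small single-threaded simulation. This is a sketch under stated assumptions, not the slides' own code: channels are modelled as FIFO deques, ordinary messages are integer amounts, and the `Process` class, `connect`, `deliver`, and the starting balances of 500 and 200 are all hypothetical names and values chosen for illustration.

```python
from collections import deque

MARKER = "MARKER"  # sentinel; ordinary messages in this sketch are integer amounts

class Process:
    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self.outgoing = {}       # destination name -> FIFO channel (deque)
        self.incoming = set()    # names of processes with a channel into this one
        self.recorded = None     # recorded local state; None until recorded
        self.channel_state = {}  # source name -> messages recorded for that channel
        self.open = set()        # incoming channels whose marker has not arrived yet

    def send(self, dst, amount):
        self.balance -= amount
        self.outgoing[dst].append(amount)

    def record(self):
        """Marker-sending rule: record state, then send a marker on every outgoing channel."""
        self.recorded = self.balance
        self.open = set(self.incoming)
        self.channel_state = {src: [] for src in self.incoming}
        for channel in self.outgoing.values():
            channel.append(MARKER)

    def receive(self, src, msg):
        """Marker-receiving rule, plus ordinary message handling."""
        if msg == MARKER:
            if self.recorded is None:
                self.record()       # the channel from src stays recorded as empty
            self.open.discard(src)  # recording of the channel from src is finished
        else:
            self.balance += msg
            if self.recorded is not None and src in self.open:
                # Arrived after the state was recorded and before the marker.
                self.channel_state[src].append(msg)

def connect(a, b):
    """Create a FIFO channel from a to b."""
    a.outgoing[b.name] = deque()
    b.incoming.add(a.name)

def deliver(src, dst):
    """Deliver the oldest message on the channel from src to dst."""
    dst.receive(src.name, src.outgoing[dst.name].popleft())

p, q = Process("p", 500), Process("q", 200)
connect(p, q)
connect(q, p)

p.record()        # p initiates: records 500 and sends a marker to q
q.send("p", 50)   # q sends before the marker reaches it
deliver(p, q)     # q gets the marker: records 150, channel p->q empty, sends its marker
deliver(q, p)     # p gets the 50 after recording: it is recorded in channel q->p
deliver(q, p)     # p gets q's marker: recording of channel q->p ends

snapshot_total = (p.recorded + q.recorded
                  + sum(p.channel_state["q"]) + sum(q.channel_state["p"]))
```

In this run the recorded states are 500 for p and 150 for q, with the 50 captured as the state of the channel from q to p, so the snapshot accounts for the full 700 even though no process ever held that combination of values at one moment. The FIFO assumption matters: q's marker is queued behind the 50, so p cannot see the marker first.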
Example (figure: four successive states, S0 to S3, of two processes p and q joined by one channel in each direction)
• p records its state (A) and sends marker M on the channel.
• Before receiving the marker, q changes its state and sends message D.
• q receives the marker and records its state (D) and the incoming channel as empty; q sends marker M' on its outgoing channel.
• On receiving the marker, p records the channel as having message D.
Recorded state: p in state A, q in state D, the channel from q to p holding message D, and the channel from p to q empty.
Snapshot/State Recording Example
(Figure sequence: nodes p, q, and r connected by channels c1 to c4; M denotes a marker. The values below are what can be recovered from each slide, with the p and q balances read in slide order.)
• Initial: p = 500, q = 500, r = 500; nothing in transit.
• Step 1: p = 470, q = 490, r = 500; in transit: 10, 20, 10, and one marker.
• Step 2: p = 480, q = 490, r = 475; in transit: 20, 10, 25, and two markers.
• Step 3: p = 480, q = 470, r = 485; in transit: 20, 20, 25, and two markers.
• Step 4: p = 500, q = 490, r = 485; in transit: 25, and one marker.
• Step 5: p = 500, q = 515, r = 485; nothing in transit.
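As a quick sanity check on the example, the numbers on the slides can be summed per step. This sketch assumes the first two balances on each slide belong to p and q respectively and that the remaining unlabelled numbers are amounts in transit; the `steps` dictionary is an illustrative reading of the figures, not data stated in the text.

```python
# (p, q, r) balances and amounts in transit, as read off each slide (assumed mapping)
steps = {
    "initial": ((500, 500, 500), []),
    "step 1": ((470, 490, 500), [10, 20, 10]),
    "step 2": ((480, 490, 475), [20, 10, 25]),
    "step 3": ((480, 470, 485), [20, 20, 25]),
    "step 4": ((500, 490, 485), [25]),
    "step 5": ((500, 515, 485), []),
}

# Money held by the nodes plus money on the channels, for every step.
totals = {name: sum(balances) + sum(in_transit)
          for name, (balances, in_transit) in steps.items()}

conserved = len(set(totals.values())) == 1
```

Under this reading the node balances plus the amounts in transit add up to the same total at every step, which is the invariant a consistent snapshot of the system has to reproduce.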