Message Logging Pessimistic & Optimistic

Message Logging: Pessimistic & Optimistic • CS717 Lecture, 10/16/01-10/18/01 • Kamen Yotov, kyotov@cs.cornell.edu

Introduction • Context & Applications • Check-pointing • Message Logging • Pessimistic (failure-free mode suffers) • Optimistic (good for failure-free mode) • Causal (to be discussed in next lectures...) • Main problems • Consistency • Orphans

Fault Tolerance “Why”s • Flow of events • Check-point • Log messages • Crash • Restore • Replay
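The flow of events above can be sketched with a toy deterministic process. This is only an illustration under stated assumptions, not a real protocol: the `Accumulator` class, the in-memory `stable_log` list standing in for stable storage, and the use of `copy.deepcopy` as a check-point are all hypothetical.

```python
import copy


class Accumulator:
    """Toy deterministic process: its state depends only on delivered messages."""

    def __init__(self):
        self.total = 0
        self.delivered = 0

    def deliver(self, value):
        self.total += value
        self.delivered += 1


process = Accumulator()
process.deliver(5)

checkpoint = copy.deepcopy(process)   # check-point: save the current state
stable_log = []                       # stand-in for stable storage (assumption)
for value in (7, 11):
    stable_log.append(value)          # log the message, then deliver it
    process.deliver(value)
state_before_crash = (process.total, process.delivered)

process = None                        # crash: the volatile state is gone
process = copy.deepcopy(checkpoint)   # restore from the check-point
for value in stable_log:              # replay the log in the original order
    process.deliver(value)
```

Because delivery is deterministic, replaying the logged messages on top of the restored check-point reproduces the state the process had before the crash.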

Common Assumptions • Fail-stop model • Failure eventually detectable by all • Channels • Asynchronous • Reliable • FIFO • Unbounded message delivery • Failures • Transiently dropping • No duplication and/or corruption • Stable storage • Spare processing capacity

Common goals • Application independence • Application transparency • Simple • Independent evolution • Handles preexisting programs • High throughput • Failure-free model with little overhead • Maximum fault-tolerance • Any number of failures

Formal Terminology • Delivery (as opposed to receipt) • Non-faulty processes eventually deliver all messages that they have received • Receive sequence number • If p delivers m and m.rsn=l then m is the lth message p delivers • Run • Sequence of system states • Asynchronous • Only one process changes state at once

Formal Terminology (cont.) • Properties: Logical expressions over runs • □ - Always  • ◊ - Eventually  • Message determinant • #m = <m.src, m.ssn, m.dest, m.rsn, m.data> • m.data and m.dest not essential • Logging determinants vs. actual messages • Other notation • N – set of all processes • C – set of failed processes • Log(m) – set of processes possessing a copy of #m • Depend(m) – set of processes that depend on m
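A message determinant can be sketched as a small record, and the receive sequence numbers then fix the replay order for a recovering process. The `Determinant` class and the `replay_order` helper are hypothetical names introduced for illustration; `data` is optional here because the slide notes it is not essential.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Determinant:
    """#m = <m.src, m.ssn, m.dest, m.rsn, m.data>."""
    src: str
    ssn: int                      # send sequence number at the sender
    dest: str
    rsn: int                      # receive sequence number at the destination
    data: Optional[bytes] = None  # payload; may be omitted from the log


def replay_order(log, process):
    """Determinants of messages delivered by `process`, in delivery order."""
    return sorted((d for d in log if d.dest == process), key=lambda d: d.rsn)


log = [
    Determinant("q", 1, "p", 2),
    Determinant("r", 1, "p", 1),
    Determinant("q", 2, "s", 1),
    Determinant("q", 3, "p", 3),
]
order = [(d.src, d.ssn) for d in replay_order(log, "p")]
```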

Orphan Properties • Before failure, by definition #mLog(m) • #m lost if Log(m) ⊆ C • stable(m) if #m cannot be lost • p orphan of C if • p did not fail • p ∈ Depend(m) • #m is lost
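The orphan definition can be sketched with plain sets, reading "#m is lost" as every process in Log(m) having failed, i.e. Log(m) ⊆ C. The function names and the process labels below are assumptions made for the example.

```python
def is_lost(log_m, failed):
    """#m is lost when every process holding a copy of it has failed."""
    return log_m <= failed        # subset test: Log(m) ⊆ C


def orphans(depend_m, log_m, failed):
    """Processes that did not fail but depend on a message whose #m is lost."""
    if not is_lost(log_m, failed):
        return set()
    return depend_m - failed


failed = {"p1"}
depend_m = {"p1", "p2"}

orphans_when_lost = orphans(depend_m, {"p1"}, failed)
orphans_when_copy_survives = orphans(depend_m, {"p1", "p3"}, failed)
```

In the second call a copy of #m survives at the hypothetical process `p3`, so no process is an orphan.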

Orphan Properties (cont.)

Performance Metrics • Number of forced roll-backs • Time spent blocking • Number of messages • Size of messages

Got to the real-world stuff! • No additional messages • Any number of failures (including total) • No assumptions about the logging protocol • Pessimistic doesn’t require that generality

The Model: Process states • Process states • State interval • Instantiates a new one on each message received • State interval index (auto increment) • [Figure: state intervals of processes p1, p2, p3]

The Model: Process states (cont.) • Dependencies between process states (p_i depends on p_j) • Maximum index of any interval of p_j on which p_i depends • Inside a process each interval depends on the previous one • Dependence vector • d_i = <δ_*> = <δ_1, δ_2, δ_3, δ_4, …, δ_n>, δ_k = ⊥, 0, 1, … • [Figure: state intervals of processes p1, p2, p3]
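One way to maintain a dependence vector is sketched below. It assumes that every message is tagged with the sender's current state interval index and that only this direct dependency is recorded on receipt; the `receive` function and the encoding of ⊥ as -1 are choices made for the example, not something the slides specify.

```python
BOTTOM = -1  # assumption: encode "no dependency" (⊥) as -1 so it sorts below 0


def receive(dep_vector, receiver, sender, sender_interval):
    """Return the receiver's dependence vector after it receives one message."""
    new_vector = list(dep_vector)
    new_vector[receiver] += 1     # each received message starts a new interval
    new_vector[sender] = max(new_vector[sender], sender_interval)
    return new_vector


# p3 (index 2) starts in its interval 0 with no dependencies on p1 or p2.
d3 = [BOTTOM, BOTTOM, 0]
d3 = receive(d3, receiver=2, sender=0, sender_interval=1)
d3 = receive(d3, receiver=2, sender=1, sender_interval=0)
```

After the two receipts, p3 is in its interval 2 and depends on interval 1 of p1 and interval 0 of p2.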

The Model: System states • Process state – dependence vector • d_i = <δ_*> = <δ_1, δ_2, δ_3, δ_4, …, δ_n>, δ_k = ⊥, 0, 1, … • System state – dependence matrix • n × n • Row i – process state for p_i • Diagonal – current state intervals

The Model: System states (cont.) • S – set of all system states • A = [α_**] ∈ S and B = [β_**] ∈ S • A ≼ B ⟺ ∀ i = 1..n: α_ii ≤ β_ii • Partial order different from Lamport’s • Orders system states vs. events • Only events are state intervals • Lattice • A ⊔ B = [φ_**], φ_ik = α_ii ≥ β_ii ? α_ik : β_ik • A ⊓ B = [φ_**], φ_ik = α_ii ≤ β_ii ? α_ik : β_ik
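The ordering and the two lattice operations can be sketched on dependence matrices stored as lists of rows. The sketch assumes the order compares only the diagonals, that join and meet pick each whole row from whichever matrix has the larger (respectively smaller) diagonal entry, and that ⊥ is encoded as -1; the function names are illustrative.

```python
def leq(A, B):
    """A precedes B when no process is further along in A than in B."""
    return all(A[i][i] <= B[i][i] for i in range(len(A)))


def join(A, B):
    """Row i comes from the matrix in which process i is further along."""
    return [list(A[i]) if A[i][i] >= B[i][i] else list(B[i])
            for i in range(len(A))]


def meet(A, B):
    """Row i comes from the matrix in which process i is less advanced."""
    return [list(A[i]) if A[i][i] <= B[i][i] else list(B[i])
            for i in range(len(A))]


A = [[2, 0], [1, 1]]
B = [[1, -1], [0, 3]]   # -1 encodes ⊥ (assumption)

upper = join(A, B)
lower = meet(A, B)
```

Here A and B are incomparable (process 0 is further along in A, process 1 in B), yet both lie between `lower` and `upper`.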

The Model: Consistent System states • Consistent state • All received messages • Sent in the current state of the sender • Can be deterministically sent in the future • Messages not yet received are not a problem • Definition: D = [δ_**] ∈ S, ∀ i, k = 1..n: δ_ik ≤ δ_kk • A process cannot depend on a state interval of another process that has not been reached yet • C = { D ∈ S | D is consistent } • C is a sub-lattice of S – proof straightforward!
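The consistency condition translates directly into a check over the dependence matrix: no entry in column k may exceed the diagonal entry of process k. The sketch assumes ⊥ is encoded as -1 so that "no dependency" never violates the condition.

```python
def is_consistent(D):
    """True when no process depends on an interval another has not reached."""
    n = len(D)
    return all(D[i][k] <= D[k][k] for i in range(n) for k in range(n))


# Process 1 depends on interval 1 of process 0, which process 0 has reached.
consistent_state = [[1, -1],
                    [1, 2]]

# Process 1 depends on interval 1 of process 0, but process 0 is still in 0.
inconsistent_state = [[0, -1],
                      [1, 2]]

ok = is_consistent(consistent_state)
bad = is_consistent(inconsistent_state)
```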

The Model: Logging and Stability • logged(i,σ) • Message that started state interval σ of process i has been logged on stable storage • checkpoint(i,σ) • A check-point containing the state of process i in state interval σ exists on stable storage • checkpoint(i,0) is implicit • Effective check-point for σ on i is checkpoint(i,ε), ε ≤ σ, ε is maximal • stable(i,σ) ⟺ ∀ α: ε < α ≤ σ ⟹ [logged(i,α)]
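For a single process, the stability predicate can be sketched as follows. The sets `checkpoints` and `logged` hold state interval indices and are hypothetical stand-ins for what is on stable storage; index 0 is placed in `checkpoints` explicitly to model the implicit checkpoint(i,0).

```python
def effective_checkpoint(checkpoints, sigma):
    """Largest check-pointed interval index that does not exceed sigma."""
    return max(c for c in checkpoints if c <= sigma)


def stable(checkpoints, logged, sigma):
    """Interval sigma can be rebuilt from its effective check-point and the log."""
    eps = effective_checkpoint(checkpoints, sigma)
    return all(alpha in logged for alpha in range(eps + 1, sigma + 1))


checkpoints = {0, 4}      # intervals with a check-point (0 is implicit)
logged = {1, 2, 5, 6}     # intervals whose starting message has been logged

results = {sigma: stable(checkpoints, logged, sigma) for sigma in range(8)}
```

Interval 3 is not stable because its starting message was never logged, while interval 4 is stable on its own check-point and intervals 5 and 6 build on it.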

The Model: Recoverable System states • Recoverable system state • System state is consistent • All current process states are stable • D = [δ_**] ∈ S • recoverable(D) ⟺ D ∈ C && ∀ i: stable(i, δ_ii) • R = { D ∈ S | recoverable(D) } • R is a sub-lattice of S – proof straightforward! • Theorem: A single maximum recoverable state exists! • Proof • R ⊆ S • A ⊔ B ∈ R if A, B ∈ R • A, B ≼ A ⊔ B • Therefore the maximum is ⊔_{D ∈ R} D, obviously unique!
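The theorem can be illustrated by brute force on a tiny system: enumerate every system state, keep the recoverable ones, and fold them together with the join. This is an exhaustive sketch for intuition only, not an efficient method. `vectors[i][s]` (the dependence vector of process i in interval s), the `is_stable` callback, and the encoding of ⊥ as -1 are assumptions of the example.

```python
from itertools import product

BOTTOM = -1  # assumption: ⊥ encoded as -1


def recoverable(D, is_stable):
    n = len(D)
    consistent = all(D[i][k] <= D[k][k] for i in range(n) for k in range(n))
    return consistent and all(is_stable(i, D[i][i]) for i in range(n))


def join(A, B):
    """Row i comes from the matrix in which process i is further along."""
    return [a if a[i] >= b[i] else b for i, (a, b) in enumerate(zip(A, B))]


def max_recoverable(vectors, is_stable):
    best = None
    for indices in product(*(range(len(v)) for v in vectors)):
        D = [vectors[i][s] for i, s in enumerate(indices)]
        if recoverable(D, is_stable):
            best = D if best is None else join(best, D)
    return best


# p0 sends in interval 0; p1 receives it (starting its interval 1) and replies;
# p0 receives the reply (starting its interval 1).
vectors = [
    [[0, BOTTOM], [1, 1]],   # p0: intervals 0 and 1
    [[BOTTOM, 0], [0, 1]],   # p1: intervals 0 and 1
]
stable_intervals = {(0, 0), (0, 1), (1, 0)}   # interval 1 of p1 not yet logged


def is_stable(i, s):
    return (i, s) in stable_intervals


before = max_recoverable(vectors, is_stable)
stable_intervals.add((1, 1))                  # the message is now logged
after = max_recoverable(vectors, is_stable)
```

Before interval 1 of p1 is stable, interval 1 of p0 cannot be part of any recoverable state because it depends on it; once it is logged, the maximum recoverable state includes both.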

The Model: Recoverable System states (cont.) • Current recovery state • The maximum recoverable state at any time • Never decreases • If D = [δ_**], then for every process i no state interval σ ≤ δ_ii is ever rolled back • Proof: • D will always remain consistent • δ_ii will always remain stable • Since R is a lattice, any new state formed after D will be greater than D • In any new current recovery state: • δ_ii ≤ state interval index for each process • Therefore, no state interval σ ≤ δ_ii for each i will ever need to be rolled back!

The Model: Wrap-up… • Corollary 1: If all messages received are eventually logged, no domino effect occurs • If D = [δ_**] is the current recovery state • Corollary 2: Any messages sent by process i from a state interval σ ≤ δ_ii may be committed • With ε_i being the effective checkpoint of δ_ii • Corollary 3: All previous checkpoints of process i may be discarded • Corollary 4: All messages that begin state intervals prior to ε_i may be discarded

The Algorithm: Overview • Keep a current recovery state • On each new interval σ for some process k becoming stable • Try to improve the current recovery state, such that: • State of process k advances to σ • Add more state intervals from other processes to maintain consistency • Succeed if all such included intervals are stable

The Algorithm: Basic implementation • Notation • D = [δ_**] – the current recovery state • σ – state interval of process k becoming stable • d_k = <δ_*> = <δ_1, δ_2, δ_3, δ_4, …, δ_n>, δ_j = ⊥, 0, 1, … – state of process k (dependence vector) • Algorithm • if (σ > δ_kk) { ∀ i: δ_ki ← δ_i /* update row k of D */ while (∃ i, j: δ_ij > δ_jj) if (∃ α ≥ δ_ij: stable(j, α)) /* α – an interval for j */ ∀ i: δ_ji ← δ_i /* update row j of D with d_j for α */ else fail }
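The basic algorithm can be sketched as runnable code under a few assumptions: `vectors[j][a]` gives the dependence vector of process j in interval a, `is_stable(j, a)` answers the stability predicate, ⊥ is encoded as -1, and failure is signalled by returning `None`. The name `try_advance` is hypothetical; this follows the pseudocode on the slide rather than the optimized implementation in the paper.

```python
BOTTOM = -1  # assumption: ⊥ encoded as -1


def try_advance(D, vectors, is_stable, k, sigma):
    """Try to move process k to interval sigma; return the new state or None."""
    if sigma <= D[k][k]:
        return D                              # nothing to improve
    n = len(D)
    new = [list(row) for row in D]
    new[k] = list(vectors[k][sigma])          # update row k of D
    while True:
        violation = next(((i, j) for i in range(n) for j in range(n)
                          if new[i][j] > new[j][j]), None)
        if violation is None:
            return new                        # consistent: recovery state found
        i, j = violation
        # Minimum stable interval of process j that satisfies the dependency.
        alpha = next((a for a in range(new[i][j], len(vectors[j]))
                      if is_stable(j, a)), None)
        if alpha is None:
            return None                       # fail: keep the old recovery state
        new[j] = list(vectors[j][alpha])      # update row j of D


vectors = [
    [[0, BOTTOM], [1, 1]],   # p0: interval 1 depends on interval 1 of p1
    [[BOTTOM, 0], [0, 1]],   # p1: interval 1 depends on interval 0 of p0
]
stable_intervals = {(0, 0), (1, 0), (0, 1)}


def is_stable(j, a):
    return (j, a) in stable_intervals


D = [[0, BOTTOM], [BOTTOM, 0]]
rejected = try_advance(D, vectors, is_stable, k=0, sigma=1)

stable_intervals.add((1, 1))                  # interval 1 of p1 becomes stable
D = try_advance(D, vectors, is_stable, k=1, sigma=1)
after_p1 = [list(row) for row in D]
D = try_advance(D, vectors, is_stable, k=0, sigma=1)   # recheck the rejected one
```

Each pass of the loop strictly raises some diagonal entry, so the loop terminates. The first attempt is rejected because interval 1 of p0 depends on an interval of p1 that is not yet stable; it succeeds when rechecked after p1 advances.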

The Algorithm: Some details • The chosen α should be the minimum stable state interval with α ≥ δ_ij • The comparisons δ_ij > δ_jj can be made in any order without affecting the final result • When state interval σ of process k becomes stable, the algorithm finds some recoverable D with δ_kk = σ, if one exists • No stable process state interval σ that was rejected needs to be checked again before the current recovery state advances • Corollary: When the recovery state advances from some D to D’, the rejected σ’s that need to be rechecked are those with a direct dependency on some α on any process i such that δ_ii < α < δ’_ii

The Algorithm: Proof of Correctness • The algorithm presented always finds the current recovery state of the system • Only finds recoverable system states • Any such state found is greater • Following the observations stated before, all possible new states are considered • Therefore, the correct one is always found!

The Algorithm: Optimizations & Implementation • Optimization considerations • Keeping work list of rows to update D • Keep only the one with max index • Keeping only the diagonal of D • Implementation • Provided in the paper • Follows everything said till now • Takes advantage of some specifics

Conclusions • General Model and Algorithm • Work for both pessimistic and optimistic protocols • Does not need the generality for the pessimistic case • Optimistic logging is desirable from a performance standpoint in low-failure environments • Unifies existing approaches to fault tolerance • Check-pointing • Message Logging • Results • Existence of unique maximum recoverable state • Never decreases (progress is being made) • Domino effect cannot occur

Future work list… • Address non-determinism • Switch between • check-pointing for the non-deterministic part • Check-pointing + message logging elsewhere • Output-driven optimistic message logging and check-pointing • Pay attention to communication of the results • Application specific knowledge
