The Complexity of Adding Failsafe Fault-tolerance
Sandeep S. Kulkarni, Ali Ebnenasir
Motivations
• Why automatic addition of fault-tolerance?
• Why begin with a fault-intolerant program?
  • Reuse of the fault-intolerant program
  • Separation of concerns (functionality vs. fault-tolerance)
  • Potential to preserve properties such as efficiency
• One obstacle
  • Adding masking fault-tolerance to distributed programs is NP-hard [FTRTFT 2000]
Motivation (Continued)
• Approaches for dealing with the complexity
  • Heuristics [SRDS 2001]
  • Weaker forms of tolerance
    • Failsafe: only safety is guaranteed in the presence of faults
    • Nonmasking: safety may be violated temporarily
  • Restricting the input
    • Programs
    • Specifications
Motivation (Continued)
• Why failsafe fault-tolerance?
  • Simplifies the design of masking fault-tolerance
  • Enables partial automation of masking fault-tolerance (using TSE'98)
[Figure: an intolerant program, failsafe and nonmasking fault-tolerant programs, and a masking fault-tolerant program, with two steps labeled "Automate"]
Outline of the Talk
• Problem of adding fault-tolerance
• Difficulties caused by distribution
• Complexity of failsafe fault-tolerance
• Class of programs and specifications for which polynomial synthesis is possible
Basic Concepts: Programs and Faults
• State space Sp
• Program transitions δp, fault transitions δf
• Invariant S, fault-span T
• Specification spec: safety is specified by the transitions (sj, sk) that should not be executed
[Figure: invariant S inside fault-span T, with transitions labeled p, f, and p/f]
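A fault-span T must contain the invariant S and be closed under both program and fault transitions; the smallest such set is the set of states reachable from S when program and fault transitions interleave. The sketch below computes it for an explicit transition system. It is a minimal illustration under assumptions: states are arbitrary hashable values, δp and δf are Python sets of (s0, s1) pairs, and the name `fault_span` is hypothetical.

```python
from collections import deque


def fault_span(invariant, program, faults):
    """Smallest set of states that contains `invariant` and is closed under
    program and fault transitions (a hypothetical explicit-state encoding:
    `program` and `faults` are sets of (s0, s1) pairs)."""
    successors = {}
    for s0, s1 in program | faults:
        successors.setdefault(s0, []).append(s1)
    span = set(invariant)
    frontier = deque(invariant)
    while frontier:
        state = frontier.popleft()
        for nxt in successors.get(state, []):
            if nxt not in span:
                span.add(nxt)
                frontier.append(nxt)
    return span


# Toy example: invariant {0, 1}; a fault moves 1 to 2, after which the
# program moves 2 -> 3 -> 0. State 4 is never reached.
program = {(0, 1), (1, 0), (2, 3), (3, 0), (4, 0)}
faults = {(1, 2)}
T = fault_span({0, 1}, program, faults)
print(sorted(T))  # [0, 1, 2, 3]
```

Any superset of this T that is still closed under δp and δf is also a valid fault-span; the reachable set is simply the tightest choice.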
Problem Statement
• Inputs: program p, invariant S, faults f, specification spec
• Outputs: program p', invariant S'
• Requirement: only fault-tolerance is added; no new functional behavior is added
[Figure: invariants of the fault-intolerant and fault-tolerant programs; no new transition may be added within the invariant, while new transitions may be added outside it]
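The "no new functional behavior" requirement can be made concrete as a check on explicit transition sets. The sketch assumes a formulation the slide does not spell out: S' is a nonempty subset of S, S' is closed in p' (as any invariant must be), and inside S' the new program only uses transitions that p already had, i.e. p'|S' ⊆ p|S'. The names `restrict` and `adds_only_tolerance` are hypothetical.

```python
def restrict(transitions, states):
    """p|X: the transitions whose source and target both lie in X."""
    return {(s0, s1) for (s0, s1) in transitions if s0 in states and s1 in states}


def adds_only_tolerance(S, p, S_new, p_new):
    """Hypothetical check of the requirement (assumed formulation):
    S' is a nonempty subset of S, S' is closed in p', and p'|S' ⊆ p|S'."""
    if not S_new or not S_new <= S:
        return False
    closed = all(s1 in S_new for (s0, s1) in p_new if s0 in S_new)
    return closed and restrict(p_new, S_new) <= restrict(p, S_new)


S = {0, 1, 2}
p = {(0, 1), (1, 2), (2, 0)}
p_recover = p | {(3, 0)}    # new transition outside the invariant: allowed
p_shortcut = p | {(0, 2)}   # new transition inside the invariant: not allowed
print(adds_only_tolerance(S, p, S, p_recover))   # True
print(adds_only_tolerance(S, p, S, p_shortcut))  # False
```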
Difficulties with Distribution
• Read/write restrictions
  • Two Boolean variables a and b
  • The process cannot read b
• Can we include the transition (a=1, b=0) → (a=0, b=0)?
  • Only if we also include the transition (a=1, b=1) → (a=0, b=1)
• Groups of transitions (instead of individual transitions) must be chosen
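The grouping effect can be sketched by enumerating, for one transition, every copy of it that differs only in the variables the process cannot read. This is a minimal illustration assuming Boolean (0/1) variables, states encoded as dicts, and a process that can neither read nor write its unreadable variables; `transition_group` is a hypothetical name.

```python
from itertools import product


def transition_group(s0, s1, variables, readable):
    """All transitions a process must include together with (s0, s1) when it
    cannot read the variables outside `readable` (Boolean 0/1 variables).
    Assumption: unreadable variables are also unwritable, so they keep
    their value across the transition and may hold any value."""
    unreadable = [v for v in variables if v not in readable]
    if any(s0[v] != s1[v] for v in unreadable):
        raise ValueError("transition writes a variable the process cannot read")
    group = []
    for values in product([0, 1], repeat=len(unreadable)):
        hidden = dict(zip(unreadable, values))
        group.append(({**s0, **hidden}, {**s1, **hidden}))
    return group


for t0, t1 in transition_group({"a": 1, "b": 0}, {"a": 0, "b": 0}, ["a", "b"], {"a"}):
    print(t0, "->", t1)
# {'a': 1, 'b': 0} -> {'a': 0, 'b': 0}
# {'a': 1, 'b': 1} -> {'a': 0, 'b': 1}
```

A synthesis procedure for distributed programs has to accept or reject each such group as a unit, which is what makes the choice combinatorial.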
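Reduction from 3-SAT
[Figure: gadgets of the reduction. For a propositional variable x0, one group of transitions is included iff x0 is true and another iff x0 is false. For a clause cj = xj ∨ xk ∨ xl (one literal is negated in the original figure), transitions are labeled "included iff xj is false", "included iff xk is true", and "included iff xl is false".]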
Dealing with the Complexity of Adding Failsafe Fault-tolerance
• For what class of problems can failsafe fault-tolerance be added in polynomial time?
• Restrictions on
  • Fault-tolerant programs
  • Specifications
  • Faults
• Our approach to restrictions:
  • In the absence of faults, preserve all computations of the fault-intolerant program
Restrictions on Programs and Specifications
• Monotonicity requirements
  • Capture the notion that safe assumptions can be made about variables that cannot be read
  • Focus on specifications and on transitions of fault-intolerant programs
Monotonicity of Specifications
• Definition: A specification spec is positive monotonic with respect to a Boolean variable x iff, for every s0, s1, s'0, s'1:
  • if the values of all other variables in s0 and s'0 are the same,
  • the values of all other variables in s1 and s'1 are the same,
  • x is false in s0 and s1, and x is true in s'0 and s'1, and
  • (s0, s1) does not violate safety,
  • then (s'0, s'1) does not violate safety
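For small examples the definition can be checked by brute force. The sketch assumes all variables are Boolean and that the safety specification is given as a predicate `violates(s0, s1)` on transitions; the function names and the two toy specifications are hypothetical. Enumeration is exponential in the number of variables, so this is only practical for tiny state spaces.

```python
from itertools import product


def all_states(variables):
    """Every assignment of Boolean values to `variables`, as a dict."""
    for values in product([False, True], repeat=len(variables)):
        yield dict(zip(variables, values))


def is_positive_monotonic_spec(violates, variables, x):
    """Brute-force check of positive monotonicity with respect to x."""
    for s0 in all_states(variables):
        for s1 in all_states(variables):
            if s0[x] or s1[x] or violates(s0, s1):
                continue  # only safe transitions with x false impose a constraint
            # s'0 and s'1 agree with s0 and s1 except that x is true.
            if violates({**s0, x: True}, {**s1, x: True}):
                return False
    return True


# Hypothetical spec: setting y is unsafe only while x is false.
def spec_a(s0, s1):
    return not s1["x"] and not s0["y"] and s1["y"]


# Hypothetical spec: setting y is unsafe only while x is true.
def spec_b(s0, s1):
    return s1["x"] and not s0["y"] and s1["y"]


print(is_positive_monotonic_spec(spec_a, ["x", "y"], "x"))  # True
print(is_positive_monotonic_spec(spec_b, ["x", "y"], "x"))  # False
```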
Monotonicity of Programs
• Definition: Program p with invariant S is negative monotonic with respect to a Boolean variable x iff, for every s0, s1, s'0, s'1:
  • if the values of all other variables in s0 and s'0 are the same,
  • the values of all other variables in s1 and s'1 are the same,
  • x is true in s0 and s1, and x is false in s'0 and s'1, and
  • (s0, s1) is a transition of p within the invariant S,
  • then (s'0, s'1) is also a transition of p
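The program-side definition can be checked the same way. In this sketch, states are tuples, x is the index of a Boolean component, and "a transition of p within S" is read as both endpoints lying in S; the function name and the two toy programs are hypothetical.

```python
def is_negative_monotonic_program(program, invariant, x):
    """Check negative monotonicity of program p (with invariant S) with
    respect to the Boolean variable stored at index x of each state tuple."""
    def with_x(state, value):
        return state[:x] + (value,) + state[x + 1:]

    for s0, s1 in program:
        inside = s0 in invariant and s1 in invariant
        if inside and s0[x] and s1[x]:
            # The same transition with x false in both states must be in p.
            if (with_x(s0, False), with_x(s1, False)) not in program:
                return False
    return True


# States are (x, y) tuples; all four states form the invariant.
S = {(x, y) for x in (False, True) for y in (0, 1)}
p_blind = {((True, 0), (True, 1)), ((False, 0), (False, 1))}  # sets y regardless of x
p_peek = {((True, 0), (True, 1))}                              # sets y only when x is true
print(is_negative_monotonic_program(p_blind, S, 0))  # True
print(is_negative_monotonic_program(p_peek, S, 0))   # False
```

Intuitively, `p_blind` behaves the same whatever the value of x, which is exactly what a process that cannot read x is forced to do.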
Theorem
• Adding failsafe fault-tolerance can be done in polynomial time if either:
  • the program is negative monotonic and the specification is positive monotonic, or
  • the program is positive monotonic and the specification is negative monotonic
• If only one of these conditions (on the program or on the specification) is satisfied, adding failsafe fault-tolerance is still NP-hard
• For many problems, these requirements are easily met
Example: Byzantine Agreement
• Processes: the general g and three non-generals j, k, and l
• Variables
  • d.g : {0, 1}
  • d.j, d.k, d.l : {0, 1, ⊥}
  • b.g, b.j, b.k, b.l : {true, false}
  • f.g, f.j, f.k, f.l : {0, 1}
• Fault-intolerant program transitions
  • d.j = ⊥ ∧ f.j = 0 → d.j := d.g
  • d.j ≠ ⊥ ∧ f.j = 0 → f.j := 1
• Fault transitions
  • ¬b.g ∧ ¬b.j ∧ ¬b.k ∧ ¬b.l → b.j := true
  • b.j → d.j, f.j := 0|1, 0|1
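The guarded commands above can be simulated directly to see both the fault-free behavior and why the program is fault-intolerant. This is a minimal sketch, assuming a dict-of-dicts state encoding, `None` for ⊥, a random scheduler, and a caller that resolves the nondeterministic 0|1 choice of the fault action; all function names are hypothetical (requires Python 3.8+).

```python
import random

BOT = None  # stands for ⊥ (no decision yet)
NON_GENERALS = ("j", "k", "l")


def initial_state(dg):
    return {
        "d": {"g": dg, "j": BOT, "k": BOT, "l": BOT},
        "f": {"g": 0, "j": 0, "k": 0, "l": 0},
        "b": {"g": False, "j": False, "k": False, "l": False},
    }


def enabled_actions(s):
    """Enabled guarded commands of the fault-intolerant program."""
    actions = []
    for p in NON_GENERALS:
        if s["d"][p] is BOT and s["f"][p] == 0:
            actions.append((p, "copy"))       # d.p = ⊥ ∧ f.p = 0 → d.p := d.g
        elif s["d"][p] is not BOT and s["f"][p] == 0:
            actions.append((p, "finalize"))   # d.p ≠ ⊥ ∧ f.p = 0 → f.p := 1
    return actions


def execute(s, action):
    p, kind = action
    if kind == "copy":
        s["d"][p] = s["d"]["g"]
    else:
        s["f"][p] = 1


def byzantine_fault(s, p, new_d, new_f):
    """Fault transitions: p may become Byzantine if no process is Byzantine
    yet; a Byzantine p may set d.p and f.p to any values in {0, 1}."""
    if not any(s["b"].values()):
        s["b"][p] = True
    if s["b"][p]:
        s["d"][p], s["f"][p] = new_d, new_f


def run(s, rng):
    """Execute enabled program actions in random order until none is enabled."""
    while actions := enabled_actions(s):
        execute(s, rng.choice(actions))
    return s


rng = random.Random(0)
ok = run(initial_state(1), rng)
print(ok["d"])  # {'g': 1, 'j': 1, 'k': 1, 'l': 1}

# The general turns Byzantine after j has copied d.g, so j, k, l disagree.
faulty = initial_state(1)
execute(faulty, ("j", "copy"))
byzantine_fault(faulty, "g", 0, 0)
print(run(faulty, rng)["d"])  # {'g': 0, 'j': 1, 'k': 0, 'l': 0}
```

The second run ends with two non-Byzantine non-generals finalized on different values, which is the kind of safety violation that adding failsafe fault-tolerance must rule out.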
Example: Byzantine Agreement (Continued)
• Safety specification:
  • Agreement: no two non-Byzantine non-generals can finalize with different decisions
  • Validity: if g is not Byzantine, no process can finalize with a decision different from that of g
• Read/write restrictions
  • Process j can read b.j, d.j, f.j, d.g, d.k, and d.l
  • Process j can write d.j and f.j
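Both safety properties can be written as predicates on a state. The sketch reuses a dict-of-dicts encoding and assumes that validity only constrains non-Byzantine non-generals (a Byzantine process may finalize with anything); the helper names are hypothetical.

```python
NON_GENERALS = ("j", "k", "l")


def finalized_honest(s):
    """Non-Byzantine non-generals that have finalized (f.p = 1)."""
    return [p for p in NON_GENERALS if not s["b"][p] and s["f"][p] == 1]


def violates_agreement(s):
    return len({s["d"][p] for p in finalized_honest(s)}) > 1


def violates_validity(s):
    # Assumption: validity only constrains non-Byzantine non-generals.
    if s["b"]["g"]:
        return False
    return any(s["d"][p] != s["d"]["g"] for p in finalized_honest(s))


def state(dg, d, f, byz=()):
    """Build a state; d and f give each non-general's decision and flag."""
    b = {p: p in byz for p in ("g",) + NON_GENERALS}
    return {"d": {"g": dg, **d}, "f": {"g": 0, **f}, "b": b}


split = state(1, {"j": 1, "k": 0, "l": None}, {"j": 1, "k": 1, "l": 0})
print(violates_agreement(split), violates_validity(split))  # True True
split_k_byz = state(1, {"j": 1, "k": 0, "l": None}, {"j": 1, "k": 1, "l": 0}, byz=("k",))
print(violates_agreement(split_k_byz), violates_validity(split_k_byz))  # False False
```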
Example: Byzantine Agreement (Continued)
• Observation 1: the specification is positive monotonic with respect to b.j
• Observation 2: the program consisting of the transitions of j is negative monotonic with respect to b.k
• Observation 3: the specification is negative monotonic with respect to f.j
• Observation 4: the program consisting of the transitions of j is positive monotonic with respect to f.k
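Observations 1 and 3 can be spot-checked by enumerating every state of the example. The sketch rests on two assumptions not stated on the slide: a transition violates safety iff its target state violates agreement or validity (so the monotonicity definitions reduce to a check over single states), and validity only constrains non-Byzantine non-generals. Function names are hypothetical.

```python
from itertools import product

BOT = None
NON_GENERALS = ("j", "k", "l")
PROCESSES = ("g",) + NON_GENERALS


def all_states():
    """Every state of the example (f.g fixed to 0: the spec ignores it)."""
    for ds in product([0, 1], [0, 1, BOT], [0, 1, BOT], [0, 1, BOT]):
        for bs in product([False, True], repeat=4):
            for fs in product([0, 1], repeat=3):
                yield {
                    "d": dict(zip(PROCESSES, ds)),
                    "b": dict(zip(PROCESSES, bs)),
                    "f": {"g": 0, **dict(zip(NON_GENERALS, fs))},
                }


def bad(s):
    """Agreement or validity is violated in state s."""
    honest = [p for p in NON_GENERALS if not s["b"][p] and s["f"][p] == 1]
    disagree = len({s["d"][p] for p in honest}) > 1
    invalid = not s["b"]["g"] and any(s["d"][p] != s["d"]["g"] for p in honest)
    return disagree or invalid


def monotonic(kind, p, low, high):
    """True iff every safe state with kind.p == low stays safe when kind.p := high."""
    for s in all_states():
        if s[kind][p] == low and not bad(s):
            t = {k: dict(v) for k, v in s.items()}
            t[kind][p] = high
            if bad(t):
                return False
    return True


print(monotonic("b", "j", False, True))  # Observation 1: True
print(monotonic("f", "j", 1, 0))         # Observation 3: True
print(monotonic("f", "j", 0, 1))         # finalizing j can break safety: False
```

The last line shows why the direction matters: the specification is negative, not positive, monotonic with respect to f.j.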
Summary
• Complexity analysis for failsafe fault-tolerance
  • Reduction from 3-SAT
• Restrictions on specifications and programs for which polynomial synthesis is possible
  • Several problems fall in this category: Byzantine agreement, consensus, commit, …
  • Necessity of these restrictions
Future Work
• Simplifying the design of masking fault-tolerance using the two-step approach
• Refining the boundary between classes for which polynomial synthesis is possible and classes for which exponential complexity is inevitable
• Using monotonicity requirements to simplify the addition of masking fault-tolerance
Thank You
• Questions?
Conclusion and Future Work
• Conclusion
  • Specifying the boundary between problems for which
    • fault-tolerance addition can be done in polynomial time, and
    • exponential complexity is inevitable
  • Goal: which problems can benefit from automation?
  • Necessity and sufficiency of the monotonicity requirements
• Future Work
  • How can we change a non-monotonic program into a monotonic one by modifying its invariant?
  • How can we strengthen a non-monotonic specification into a monotonic one?
  • How can a nonmasking program be designed manually to satisfy the monotonicity requirements?
Basic Concepts: Fault-tolerant Program
• Fault-tolerance in the presence of faults:
  • Failsafe: satisfies its safety specification
  • Nonmasking: satisfies its liveness specification (safety may be violated temporarily)
  • Masking: satisfies both its safety and liveness specifications
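Because failsafe tolerance only concerns safety, it can be checked on an explicit transition system by exploring every transition reachable from the invariant under program and fault transitions and rejecting any that the specification forbids. A minimal sketch, assuming hashable states, transitions as sets of pairs, and safety given as a set of bad transitions; `is_failsafe` and the toy programs are hypothetical.

```python
from collections import deque


def is_failsafe(invariant, program, faults, bad_transitions):
    """Safety-only check: no computation that starts in `invariant` and
    interleaves program and fault transitions executes a transition in
    `bad_transitions`. Liveness is deliberately ignored."""
    successors = {}
    for s0, s1 in program | faults:
        successors.setdefault(s0, []).append(s1)
    seen = set(invariant)
    queue = deque(invariant)
    while queue:
        s0 = queue.popleft()
        for s1 in successors.get(s0, []):
            if (s0, s1) in bad_transitions:
                return False
            if s1 not in seen:
                seen.add(s1)
                queue.append(s1)
    return True


faults = {(1, 2)}
bad_transitions = {(2, 3)}
intolerant = {(0, 1), (1, 0), (2, 3), (3, 0)}
failsafe = {(0, 1), (1, 0)}  # simply stops in state 2 after the fault
print(is_failsafe({0, 1}, intolerant, faults, bad_transitions))  # False
print(is_failsafe({0, 1}, failsafe, faults, bad_transitions))    # True
```

The failsafe version deadlocks in state 2; that is acceptable here because only safety is required, whereas a nonmasking or masking program would also have to recover to the invariant.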
The Complexity of Adding Failsafe Fault-tolerance
• Adding (failsafe, nonmasking, or masking) fault-tolerance in the high atomicity model is in P
• Adding masking fault-tolerance to distributed programs is NP-hard
• How about failsafe?
  • Adding failsafe fault-tolerance to distributed programs is also NP-hard (proof in the paper)
  • Reduction from 3-SAT to the problem of adding failsafe fault-tolerance
Our Approach
• Stepwise towards masking fault-tolerance:
  • Automating the addition of failsafe fault-tolerance
• How hard is adding failsafe fault-tolerance?
• Where are the polynomial-time boundaries for adding failsafe fault-tolerance?