The Complexity of Adding Failsafe Fault-tolerance

Sandeep S. Kulkarni and Ali Ebnenasir

Motivations • Why automatic addition of fault-tolerance? • Why begin with a fault-intolerant program? • Reuse of the fault-intolerant program • Separation of concerns (functionality vs. fault-tolerance) • Potential to preserve properties such as efficiency • One obstacle • Adding masking fault-tolerance to distributed programs is NP-hard [FTRTFT 2000]

Motivation (Continued) • Approaches for dealing with complexity • Heuristics [SRDS 2001] • Weaker forms of tolerance • Failsafe • Only safety is guaranteed in the presence of faults • Nonmasking • Safety may be temporarily violated • Restricting input • Programs • Specifications

Motivation (Continued) • Why failsafe fault-tolerance? • Simplify the design of masking fault-tolerance • Partial automation of masking fault-tolerance (using TSE’98) [Figure: intolerant program, failsafe fault-tolerant program, nonmasking fault-tolerant program, and masking fault-tolerant program; two arrows labeled "Automate"]

Outline of the Talk • Problem of adding fault-tolerance • Difficulties caused by distribution • Complexity of failsafe fault-tolerance • Class of programs and specifications for which polynomial synthesis is possible

Basic Concepts: Programs and Faults • State space Sp • Program transitions δp, fault transitions δf • Invariant S, fault-span T • Specification spec: Safety is specified by a set of transitions (sj, sk) that should not be executed [Figure: invariant S and fault-span T, with transitions labeled p, f, and p/f]
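
The concepts on this slide can be made concrete with a small sketch. It assumes states are plain integers and transitions are pairs of states, and it reads the fault-span as the set of states reachable from the invariant under program and fault transitions. The names `fault_span` and `unsafe_program_transitions`, and the toy data, are hypothetical and do not come from the talk.

```python
from collections import deque

def fault_span(invariant, program, faults):
    """Return every state reachable from the invariant via program or fault transitions."""
    successors = {}
    for source, target in program | faults:
        successors.setdefault(source, set()).add(target)
    reached = set(invariant)
    frontier = deque(invariant)
    while frontier:
        state = frontier.popleft()
        for nxt in successors.get(state, ()):
            if nxt not in reached:
                reached.add(nxt)
                frontier.append(nxt)
    return reached

def unsafe_program_transitions(span, program, bad_transitions):
    """Program transitions that start inside the span and are forbidden by the safety specification."""
    return {(s, t) for (s, t) in program if s in span and (s, t) in bad_transitions}

# Toy instance (assumed data): states 0 and 1 form the invariant,
# a fault moves the program from 1 to 2, and executing (2, 3) violates safety.
invariant = {0, 1}
program = {(0, 1), (1, 0), (2, 3)}
faults = {(1, 2)}
bad_transitions = {(2, 3)}

T = fault_span(invariant, program, faults)
unsafe = unsafe_program_transitions(T, program, bad_transitions)
print(sorted(T), sorted(unsafe))
```

In this toy instance a failsafe version of the program would have to drop the transition reported in `unsafe`, since a fault can lead the program to a state from which that transition executes.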

Problem Statement • Inputs: program p, invariant S, faults f, specification spec • Outputs: program p’, invariant S’ • Requirements: Only fault-tolerance is added; no new functional behavior is added [Figure: invariant of the fault-intolerant program and invariant of the fault-tolerant program, with regions labeled "No new transition here" and "New transitions may be added here"]

Difficulties with Distribution • Read/Write restrictions • Two Boolean variables a and b • Process cannot read b • Can we include the transition between states (a=0,b=0) and (a=1,b=0)? • Only if we include the transition between states (a=0,b=1) and (a=1,b=1) • Groups of transitions (instead of individual transitions) must be chosen.
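
The grouping effect can be sketched in a few lines. The sketch assumes that a process cannot write a variable it cannot read, and it picks the direction a=0 to a=1 for the example transition; both are assumptions made for illustration. The helper names `as_tuple` and `transition_group` are hypothetical.

```python
from itertools import product

VARIABLES = ("a", "b")

def as_tuple(state):
    """Freeze a state dictionary into a hashable (a, b) tuple."""
    return tuple(state[v] for v in VARIABLES)

def transition_group(before, after, readable, domain=(0, 1)):
    """All transitions a process cannot tell apart from (before, after).

    Unreadable variables may hold any value, but they keep that value
    across the transition (assumption: unreadable implies unwritable).
    """
    hidden = [v for v in VARIABLES if v not in readable]
    if any(before[v] != after[v] for v in hidden):
        raise ValueError("a process cannot write a variable it cannot read")
    group = set()
    for values in product(domain, repeat=len(hidden)):
        assignment = dict(zip(hidden, values))
        group.add((as_tuple({**before, **assignment}),
                   as_tuple({**after, **assignment})))
    return group

# The process reads only a, so changing a from 0 to 1 comes as a pair of transitions.
group = transition_group({"a": 0, "b": 0}, {"a": 1, "b": 0}, readable={"a"})
print(sorted(group))
```

Either both transitions in `group` are included in the program or neither is, which is why a synthesis procedure has to choose among groups rather than individual transitions.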

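Reduction from 3-SAT [Figure: reduction gadgets. For a variable x0 (states a0 and an, with an = a0), one transition is included iff x0 is true and another is included iff x0 is false. For a clause cj = xj \/ ¬xk \/ xl, three transitions are included iff xj is false, iff xk is true, and iff xl is false, respectively.]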

Dealing with the Complexity of Adding Failsafe Fault-tolerance • For what class of problems can failsafe fault-tolerance be added in polynomial time? • Restrictions on • Fault-tolerant programs • Specifications • Faults • Our approach for restrictions: • In the absence of faults, preserve all computations of the fault-intolerant program

Restrictions on Programs and Specifications • Monotonicity requirements • Capture the notion that safe assumptions can be made about variables that cannot be read • Focus on specifications and transitions of fault-intolerant programs

Monotonicity of Specifications • Definition: A specification spec is positive monotonic with respect to variable x iff: • For every s0, s1, s’0, s’1: • The values of all other variables in s0 and s’0 are the same • The values of all other variables in s1 and s’1 are the same • x = false in s0 and s1, and x = true in s’0 and s’1 • If (s0, s1) does not violate safety, then (s’0, s’1) does not violate safety
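
A brute-force check of this definition can be sketched as follows. It assumes all variables are Boolean, states are tuples, the variable x is identified by its index, and the safety specification is given as a set `bad` of forbidden transitions; it reads the definition as "a transition that is safe with x = false stays safe with x = true". The function names and the two toy specifications are hypothetical.

```python
from itertools import product

def with_value(state, index, value):
    """Copy of `state` with the variable at `index` set to `value`."""
    return state[:index] + (value,) + state[index + 1:]

def spec_is_positive_monotonic(bad, n_vars, x):
    """True iff every safe transition with x = false stays safe when x is set to true."""
    states = list(product((False, True), repeat=n_vars))
    for s0, s1 in product(states, repeat=2):
        if s0[x] or s1[x] or (s0, s1) in bad:
            continue  # only safe transitions with x = false on both ends matter
        if (with_value(s0, x, True), with_value(s1, x, True)) in bad:
            return False
    return True

# Toy specifications over two variables (index 0 is x, index 1 is y).
states = list(product((False, True), repeat=2))
# Setting y is forbidden unless x holds in the target state.
bad_when_x_false = {(s0, s1) for s0, s1 in product(states, repeat=2)
                    if not s0[1] and s1[1] and not s1[0]}
# Setting y is forbidden when x holds in the target state.
bad_when_x_true = {(s0, s1) for s0, s1 in product(states, repeat=2)
                   if not s0[1] and s1[1] and s1[0]}

print(spec_is_positive_monotonic(bad_when_x_false, 2, 0))
print(spec_is_positive_monotonic(bad_when_x_true, 2, 0))
```

The check enumerates all pairs of states, so it is only meant for tiny examples; a negative-monotonicity check would swap the roles of true and false.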

Monotonicity of Programs • Definition: Program p with invariant S is negative monotonic with respect to variable x iff: • For every s0, s1, s’0, s’1: • The values of all other variables in s0 and s’0 are the same • The values of all other variables in s1 and s’1 are the same • x = true in s0 and s1, and x = false in s’0 and s’1 • If (s0, s1) is a transition of p within S, and s’0 and s’1 are in S, then (s’0, s’1) is also a transition of p
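
The program-side definition can be checked in the same brute-force style. The sketch assumes Boolean variables, states as tuples, and the reading that every program transition inside S with x = true must have a matching program transition with x = false whenever both flipped states are in S. That reading, and the names used here, are assumptions for illustration.

```python
from itertools import product

def with_value(state, index, value):
    """Copy of `state` with the variable at `index` set to `value`."""
    return state[:index] + (value,) + state[index + 1:]

def program_is_negative_monotonic(program, invariant, x):
    """True iff each transition in S with x = true also exists with x = false."""
    for s0, s1 in program:
        if s0 not in invariant or s1 not in invariant:
            continue  # only transitions inside the invariant are constrained
        if not (s0[x] and s1[x]):
            continue  # only transitions with x = true on both ends are constrained
        t0, t1 = with_value(s0, x, False), with_value(s1, x, False)
        if t0 in invariant and t1 in invariant and (t0, t1) not in program:
            return False
    return True

# Toy programs over two variables (index 0 is x, index 1 is y); S is the whole state space.
S = set(product((False, True), repeat=2))
sets_y_always = {((x, False), (x, True)) for x in (False, True)}
sets_y_only_if_x = {((True, False), (True, True))}

print(program_is_negative_monotonic(sets_y_always, S, 0))
print(program_is_negative_monotonic(sets_y_only_if_x, S, 0))
```

Intuitively, a program that passes this check never relies on x being true, so a process that cannot read x can safely assume x = false.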

Theorem • Adding failsafe fault-tolerance can be done in polynomial time if either: • Program is negative monotonic, and • Spec is positive monotonic • Or • Program is positive monotonic, and • Spec is negative monotonic • If only one of the two requirements (monotonicity of the program or monotonicity of the specification) is satisfied, then adding failsafe fault-tolerance is still NP-hard • For many problems, these requirements are easily met

Example: Byzantine Agreement • Processes: General, g, and three non-generals j, k, and l • Variables • d.g : {0, 1} • d.j, d.k, d.l : {0, 1, ⊥} • b.g, b.j, b.k, b.l : {true, false} • f.g, f.j, f.k, f.l : {0, 1} • Fault-intolerant program transitions • d.j = ⊥ /\ f.j = 0 → d.j := d.g • d.j ≠ ⊥ /\ f.j = 0 → f.j := 1 • Fault transitions • ¬b.g /\ ¬b.j /\ ¬b.k /\ ¬b.l → b.j := true • b.j → d.j, f.j := 0|1, 0|1
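
The guarded commands on this slide can be written out as successor functions. The sketch assumes that states are dictionaries keyed by variable name, that ⊥ is represented by `None`, and that the actions shown for j apply symmetrically to k and l; faults affecting the general are not modeled. The function names are hypothetical.

```python
BOTTOM = None  # stands for the undecided value ⊥
NON_GENERALS = ("j", "k", "l")
PROCESSES = ("g",) + NON_GENERALS

def initial_state(general_decision):
    """No process is Byzantine, no non-general has decided or finalized."""
    state = {"d.g": general_decision}
    for p in NON_GENERALS:
        state[f"d.{p}"] = BOTTOM
    for p in PROCESSES:
        state[f"b.{p}"] = False
        state[f"f.{p}"] = 0
    return state

def program_successors(state):
    """States reachable by one fault-intolerant program action."""
    result = []
    for p in NON_GENERALS:
        d, f = f"d.{p}", f"f.{p}"
        if state[d] is BOTTOM and state[f] == 0:      # copy the general's decision
            result.append({**state, d: state["d.g"]})
        if state[d] is not BOTTOM and state[f] == 0:  # finalize the decision
            result.append({**state, f: 1})
    return result

def fault_successors(state):
    """States reachable by one fault action."""
    result = []
    if not any(state[f"b.{p}"] for p in PROCESSES):   # at most one process turns Byzantine
        for p in NON_GENERALS:
            result.append({**state, f"b.{p}": True})
    for p in NON_GENERALS:
        if state[f"b.{p}"]:                           # a Byzantine process sets d and f arbitrarily
            for d in (0, 1):
                for f in (0, 1):
                    result.append({**state, f"d.{p}": d, f"f.{p}": f})
    return result

start = initial_state(1)
after_program = program_successors(start)
after_fault = fault_successors(start)
print(len(after_program), len(after_fault))
```

Dictionaries keep the correspondence with the slide's variable names direct; a real state-space exploration would likely use a hashable encoding instead.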

Example: Byzantine Agreement (Continued) • Safety Specification: • Agreement: No two non-Byzantine non-generals can finalize with different decisions • Validity: If g is not Byzantine, no process can finalize with a decision different from that of g • Read/Write restrictions • Process j can read • b.j, d.j, f.j • d.g, d.k, d.l • Process j can write • d.j, f.j
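
As a rough illustration, the two safety conditions and the write restriction can be phrased as predicates over states. The sketch assumes dictionary states keyed by variable name, checks safety on states rather than on transitions for simplicity, and applies validity only to non-Byzantine non-generals; those simplifications and all function names are assumptions.

```python
NON_GENERALS = ("j", "k", "l")

def violates_agreement(state):
    """Two non-Byzantine non-generals have finalized with different decisions."""
    finalized = {state[f"d.{p}"] for p in NON_GENERALS
                 if not state[f"b.{p}"] and state[f"f.{p}"] == 1}
    return len(finalized) > 1

def violates_validity(state):
    """g is not Byzantine, yet a non-Byzantine non-general finalized a different decision."""
    if state["b.g"]:
        return False
    return any(not state[f"b.{p}"] and state[f"f.{p}"] == 1
               and state[f"d.{p}"] != state["d.g"]
               for p in NON_GENERALS)

def respects_write_restriction(before, after, writable):
    """Every variable outside `writable` keeps its value across the transition."""
    return all(before[v] == after[v] for v in before if v not in writable)

# Assumed example state: j and k have finalized with different decisions.
state = {
    "d.g": 1, "d.j": 1, "d.k": 0, "d.l": None,
    "b.g": False, "b.j": False, "b.k": False, "b.l": False,
    "f.g": 0, "f.j": 1, "f.k": 1, "f.l": 0,
}
print(violates_agreement(state), violates_validity(state))

# If k is Byzantine, its finalized value no longer counts against safety.
excused = {**state, "b.k": True}
print(violates_agreement(excused), violates_validity(excused))
```

The second pair of checks hints at why the specification is monotonic in b.k: marking a process as Byzantine can only remove violations under this reading.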

Example: Byzantine Agreement (Continued) • Observation 1: • Positive monotonicity of specification with respect to b.j • Observation 2: • Negative monotonicity of program, consisting of the transitions of j, with respect to b.k • Observation 3: • Negative monotonicity of specification with respect to f.j • Observation 4: • Positive monotonicity of program, consisting of the transitions of j, with respect to f.k

Summary • Complexity analysis for failsafe fault-tolerance • Reduction from 3-SAT • Restrictions on specifications and programs for which polynomial synthesis is possible • Several problems fall in this category • Byzantine agreement, consensus, commit, … • Necessity of these restrictions

Future Work • Simplifying the design of masking fault-tolerance using the two-step approach • Refining boundary between classes for which polynomial synthesis is possible and for which exponential complexity is inevitable • Using monotonicity requirements for simplifying masking fault-tolerance

Thank You • Questions?

Future Work • Conclusion • Specifying the boundary between cases where • Fault-tolerance addition can be done in polynomial time • Exponential complexity is inevitable • Goal: Which problems can benefit from automation? • Necessity and sufficiency of monotonicity requirements • Future Work • How can we change a non-monotonic program to a monotonic one by modifying its invariant? • How can we strengthen a non-monotonic specification to a monotonic one? • How can a nonmasking program be designed manually to satisfy monotonicity requirements?

Basic Concepts: Fault-tolerant Program • Fault-tolerance in the presence of faults: • Failsafe: Satisfies its safety specification • Nonmasking: Satisfies its liveness specification (safety may be violated temporarily) • Masking: Satisfies both its safety and liveness specifications

The Complexity of Adding Failsafe Fault-tolerance • Adding (failsafe/nonmasking/masking) fault-tolerance in the high atomicity model is in P • Adding masking fault-tolerance to distributed programs is in NP • How about failsafe? • Adding failsafe fault-tolerance to distributed programs is NP-hard (proof in the paper) • Reduction from 3-SAT to the problem of failsafe fault-tolerance addition

Our Approach • Stepwise towards masking fault-tolerance: • Automating the addition of failsafe fault-tolerance • How hard is adding failsafe fault-tolerance? • Polynomial time boundaries for failsafe tolerance addition?

