Fault-Tolerant Computing Systems #6 Network Reliability

Pattara Leelaprute, Computer Engineering Department, Kasetsart University (pattara.l@ku.ac.th)

Network
• A network is made up of network components.
• Network components
  • Nodes
  • Links (arcs, edges): connections realized by hardware or software components
• States of a network component
  • Operational
  • Failed

Network Reliability
• Problem
  • Input: the probability that each component operates normally
  • Output: the network reliability
• Network model
  • Undirected graph G = (V, E) (V = vertices, E = edges)
  • Each edge is either operational or failed
  • pe = Pr[edge e is operational] = reliability of e
  • Time does not need to be considered (the measure corresponds to availability)

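Fault Model
• pe = Pr[edge e is operational] = reliability of e
• Example edge reliabilities: pa = 0.9, pb = 0.8, pc = 0.9, pd = 0.9, pe = 0.95
[Figure: example network with nodes v1, v2, v3, v4 and edges a, b, c, d, e, followed by the possible situations (states) of the network]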

Network Reliability
K = set of terminal nodes of interest, V = set of all nodes
• K-terminal reliability
  • Probability that operating paths exist between every pair of nodes in K
• Two-terminal reliability
  • Probability that an operating path exists between two given nodes (|K| = 2)
• All-terminal reliability
  • Probability that operating paths exist between all pairs of nodes (K = V)

Minpaths
• Pathset
  • A set of components (edges) whose operation implies (guarantees) system operation
• Minpath
  • A minimal pathset
• Example: K = {v1, v4}
[Figure: minpaths between v1 and v4 in the example network]
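To make the minpath definition concrete for K = {v1, v4}, here is a minimal sketch. It assumes the example network is a bridge with a = (v1, v2), b = (v2, v4), c = (v1, v3), d = (v3, v4), e = (v2, v3). The slides give the layout only as a figure, so this edge list and the helper name two_terminal_minpaths are assumptions. For two terminals, the minpaths are exactly the simple paths between them.

```python
# Assumed topology (shown only as a figure in the slides): a bridge network.
EDGES = {
    "a": ("v1", "v2"),
    "b": ("v2", "v4"),
    "c": ("v1", "v3"),
    "d": ("v3", "v4"),
    "e": ("v2", "v3"),
}


def two_terminal_minpaths(edges, source, target):
    """Return every simple source-target path as a frozenset of edge names."""
    found = []

    def walk(node, visited, used):
        if node == target:
            found.append(frozenset(used))
            return
        for name, (u, v) in edges.items():
            if name in used:
                continue
            if node == u:
                nxt = v
            elif node == v:
                nxt = u
            else:
                continue  # edge does not touch the current node
            if nxt in visited:
                continue  # revisiting a node would make the path non-minimal
            walk(nxt, visited | {nxt}, used + [name])

    walk(source, {source}, [])
    return found


minpaths = two_terminal_minpaths(EDGES, "v1", "v4")
for path in sorted(sorted(p) for p in minpaths):
    print(path)
```

Under the assumed layout this lists four minpaths: {a, b}, {c, d}, {a, e, d} and {c, e, b}.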

Mincuts
• Cutset
  • A set of components (edges) whose failure implies (guarantees) system failure
• Mincut
  • A minimal cutset
• Example: K = {v1, v4}
[Figure: mincuts separating v1 and v4 in the example network]
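The mincut definition can be checked by brute force on a small network. The sketch below again assumes a bridge layout (a = (v1, v2), b = (v2, v4), c = (v1, v3), d = (v3, v4), e = (v2, v3)), which the slides show only as a figure. The function names are illustrative. It tries edge subsets in order of increasing size and keeps a cutset only if it contains no smaller cutset.

```python
from itertools import combinations

# Assumed topology (shown only as a figure in the slides): a bridge network.
EDGES = {
    "a": ("v1", "v2"),
    "b": ("v2", "v4"),
    "c": ("v1", "v3"),
    "d": ("v3", "v4"),
    "e": ("v2", "v3"),
}


def connected(edges, up, source, target):
    """True if target is reachable from source using only the edges in `up`."""
    reached, frontier = {source}, [source]
    while frontier:
        node = frontier.pop()
        for name in up:
            u, v = edges[name]
            for here, there in ((u, v), (v, u)):
                if here == node and there not in reached:
                    reached.add(there)
                    frontier.append(there)
    return target in reached


def two_terminal_mincuts(edges, source, target):
    """Enumerate minimal cutsets by trying subsets from smallest to largest."""
    names = sorted(edges)
    mincuts = []
    for size in range(1, len(names) + 1):
        for removed in combinations(names, size):
            removed = frozenset(removed)
            if any(cut <= removed for cut in mincuts):
                continue  # contains a smaller cutset, so it is not minimal
            if not connected(edges, set(names) - removed, source, target):
                mincuts.append(removed)
    return mincuts


mincuts = two_terminal_mincuts(EDGES, "v1", "v4")
for cut in mincuts:
    print(sorted(cut))
```

This exhaustive search is exponential in the number of edges, so it is only meant for checking small examples by hand.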

Computation of Reliability
• Complexity of two-terminal and all-terminal reliability
  • NP-hard (#P-complete)
• Algorithms
  • Efficient algorithms for restricted classes of networks
  • Exponential-time algorithms for general networks

Transformations and Reductions
R(G) = (multiplicative factor) * R(G’)
• G’ = contraction of G
• R(G) = reliability of G
• R(G’) = reliability of G’
• Contraction
  • G is replaced by G’ = G•e (the contraction of edge e in G)
  • Multiplicative factor = pe
  • Applies when e is mandatory (mandatory = an edge that appears in every minpath)
[Figure: in G, edge e joins nodes u and v; in G’, u and v are merged into a single node]

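Transformations and Reductions
• Parallel reduction
  • G is replaced by G’: two parallel edges with reliabilities p1 and p2 become one edge with reliability 1 - (1 - p1)(1 - p2)
  • Multiplicative factor = 1
• Series reduction
  • G is replaced by G’: two edges in series with reliabilities p1 and p2 become one edge with reliability p1*p2
  • Multiplicative factor = 1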

Series-Parallel Graphs
• A graph that can be reduced to a single edge by repeated series and parallel replacements
  • Series replacement
  • Parallel replacement
• For such graphs, an algorithm exists that computes the K-terminal reliability in polynomial time.
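The idea of reducing a graph to one edge can be sketched for the two-terminal case as follows. This is not the polynomial-time K-terminal algorithm the slide refers to, only a simple illustration of the two replacement rules. The edge layout, the choice of terminals and all function names are assumptions. A series replacement is applied only at a non-terminal node of degree 2.

```python
def _parallel_step(edges):
    """Merge one pair of parallel edges, or return None if there is none."""
    for i in range(len(edges)):
        for j in range(i + 1, len(edges)):
            (u1, v1, p1), (u2, v2, p2) = edges[i], edges[j]
            if {u1, v1} == {u2, v2}:
                rest = [e for k, e in enumerate(edges) if k not in (i, j)]
                return rest + [(u1, v1, 1 - (1 - p1) * (1 - p2))]
    return None


def _series_step(edges, s, t):
    """Merge the two edges at one non-terminal degree-2 node, or return None."""
    nodes = {n for u, v, _ in edges for n in (u, v)}
    for node in sorted(nodes - {s, t}):
        incident = [k for k, (u, v, _) in enumerate(edges) if node in (u, v)]
        if len(incident) == 2:
            i, j = incident
            (u1, v1, p1), (u2, v2, p2) = edges[i], edges[j]
            far1 = v1 if u1 == node else u1
            far2 = v2 if u2 == node else u2
            rest = [e for k, e in enumerate(edges) if k not in (i, j)]
            return rest + [(far1, far2, p1 * p2)]
    return None


def reduce_series_parallel(edges, s, t):
    """Return the s-t reliability, or None if the two rules do not suffice."""
    edges = list(edges)
    if not edges:
        return None
    while len(edges) > 1:
        # Parallel edges are merged first, so a series step never forms a loop.
        step = _parallel_step(edges) or _series_step(edges, s, t)
        if step is None:
            return None
        edges = step
    u, v, p = edges[0]
    return p if {u, v} == {s, t} else None


# Assumed layout and reliabilities (the slides show the network as a figure).
EDGES = [
    ("v1", "v2", 0.9),   # a
    ("v2", "v4", 0.8),   # b
    ("v1", "v3", 0.9),   # c
    ("v3", "v4", 0.9),   # d
    ("v2", "v3", 0.95),  # e
]

print(reduce_series_parallel(EDGES, "v1", "v2"))  # reducible for these terminals
print(reduce_series_parallel(EDGES, "v1", "v4"))  # None: not reducible this way
```

With terminals v1 and v4 the same graph behaves as a bridge, where neither rule applies, which is why a general method such as factoring is still needed.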

An Example
[Figure: a network reduced step by step, alternating series replacement and parallel replacement]

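An Example
• Replacement rules
  • Series: p1, p2 become p1*p2
  • Parallel: p1, p2 become 1 - (1 - p1)(1 - p2)
• Reduction steps
  • Series replacement of b and d: pb*pd
  • Parallel replacement with e: 1 - (1 - pe)(1 - pb*pd)
  • Series replacement with c: pc*(1 - (1 - pe)(1 - pb*pd))
  • Parallel replacement with a: 1 - (1 - pa)(1 - pc*(1 - (1 - pe)(1 - pb*pd)))
[Figure: the network with edges a, b, c, d, e after each replacement step]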
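As a quick numeric check of the final expression, the sketch below evaluates it step by step. It assumes the edge reliabilities from the Fault Model slide (pa = 0.9, pb = 0.8, pc = 0.9, pd = 0.9, pe = 0.95) apply to this example. The helper names series and parallel are illustrative.

```python
def series(p1, p2):
    """Two edges in series work only if both work."""
    return p1 * p2


def parallel(p1, p2):
    """Two edges in parallel fail only if both fail."""
    return 1 - (1 - p1) * (1 - p2)


# Assumed values, taken from the Fault Model slide.
p = {"a": 0.9, "b": 0.8, "c": 0.9, "d": 0.9, "e": 0.95}

bd = series(p["b"], p["d"])            # pb*pd
e_bd = parallel(p["e"], bd)            # 1-(1-pe)(1-pb*pd)
c_e_bd = series(p["c"], e_bd)          # pc*(1-(1-pe)(1-pb*pd))
reliability = parallel(p["a"], c_e_bd) # 1-(1-pa)(1-pc*(...))

print(round(reliability, 5))
```

With these values the intermediate results are 0.72, 0.986 and 0.8874, and the final reliability is about 0.98874.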

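Factoring
• A naive approach
  • Enumerate every combination of edge states and sum the probabilities of the states in which the network operates:
    pa*pb*pc*pd*pe + (1-pa)*pb*pc*pd*pe + pa*(1-pb)*pc*pd*pe + …
  • The reliability calculation costs too much: the number of states grows as 2^|E|.
[Figure: network with edges a, b, c, d, e]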
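The naive state enumeration can be written down directly. The sketch below assumes a bridge layout for the example network and the reliabilities from the Fault Model slide. The terminal set K = {v1, v4} and all names are assumptions for illustration. It visits all 2^5 = 32 edge states.

```python
from itertools import product

# Assumed topology and values (the slides show the network as a figure).
EDGES = {
    "a": ("v1", "v2"),
    "b": ("v2", "v4"),
    "c": ("v1", "v3"),
    "d": ("v3", "v4"),
    "e": ("v2", "v3"),
}
PROBS = {"a": 0.9, "b": 0.8, "c": 0.9, "d": 0.9, "e": 0.95}


def terminals_connected(edges, up, terminals):
    """True if every terminal is reachable using only the edges in `up`."""
    terminals = set(terminals)
    start = next(iter(terminals))
    reached, frontier = {start}, [start]
    while frontier:
        node = frontier.pop()
        for name in up:
            u, v = edges[name]
            for here, there in ((u, v), (v, u)):
                if here == node and there not in reached:
                    reached.add(there)
                    frontier.append(there)
    return terminals <= reached


def reliability_by_enumeration(edges, probs, terminals):
    """Sum the probability of every edge state in which K is connected."""
    names = sorted(edges)
    total = 0.0
    for state in product([True, False], repeat=len(names)):
        weight = 1.0
        for name, ok in zip(names, state):
            weight *= probs[name] if ok else 1 - probs[name]
        up = [name for name, ok in zip(names, state) if ok]
        if terminals_connected(edges, up, terminals):
            total += weight
    return total


print(reliability_by_enumeration(EDGES, PROBS, {"v1", "v4"}))
```

The loop body runs once per state, so the cost doubles with every added edge, which is the problem factoring and the reductions try to limit.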

Factoring
• Concept
  • Select one edge e
  • R(G) = pe*R(G•e) + (1 - pe)*R(G-e)
    • G•e = graph obtained by contracting edge e in G
    • G-e = graph obtained by deleting edge e in G
  • When G-e is failed, any sequence of contractions and deletions results in a failed network, so there is no need to factor G-e further.
[Figure: graph G with edge e, the contracted graph G•e, and the graph G-e with e deleted]
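A minimal recursive sketch of the factoring formula is shown below. It assumes a bridge layout for the example network, the reliabilities from the Fault Model slide, and K = {v1, v4}. The function names are illustrative. The edge to factor on is simply the first in the list, with no reductions applied in between. The recursion stops early when the terminals can no longer be connected, as the slide suggests.

```python
def connects(edges, terminals):
    """True if all terminals are joined when every remaining edge works."""
    start = next(iter(terminals))
    reached, frontier = {start}, [start]
    while frontier:
        node = frontier.pop()
        for x, y, _ in edges:
            for here, there in ((x, y), (y, x)):
                if here == node and there not in reached:
                    reached.add(there)
                    frontier.append(there)
    return terminals <= reached


def factor(edges, terminals):
    """R(G) = pe*R(G.e) + (1-pe)*R(G-e); edges is a list of (u, v, p)."""
    terminals = frozenset(terminals)
    if len(terminals) <= 1:
        return 1.0  # all terminals already merged into one node
    if not connects(edges, terminals):
        return 0.0  # failed network: no need to factor further
    (u, v, p), rest = edges[0], edges[1:]

    def rename(node):
        return u if node == v else node  # contraction merges v into u

    # Contract e: relabel endpoints and drop any self-loops that appear.
    contracted = [
        (rename(x), rename(y), q) for x, y, q in rest if rename(x) != rename(y)
    ]
    contracted_terminals = {rename(k) for k in terminals}
    return p * factor(contracted, contracted_terminals) + (1 - p) * factor(
        rest, terminals
    )


# Assumed layout and values (the slides show the network as a figure).
EDGES = [
    ("v1", "v2", 0.9),   # a
    ("v2", "v4", 0.8),   # b
    ("v1", "v3", 0.9),   # c
    ("v3", "v4", 0.9),   # d
    ("v2", "v3", 0.95),  # e
]

print(factor(EDGES, {"v1", "v4"}))
```

Factoring by hand on edge e under the same assumptions gives 0.95*0.9702 + 0.05*0.9468, about 0.96903, which is a useful check for the recursion.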

Quiz
• Find the minpaths of a system that is considered to operate normally when 3 of its nodes are connected.
[Figure: network with nodes v1, v2, v3, v4 and edges a, b, c, d, e]

Review of Fault Tree

Fault Trees
• A pictorial representation of the combinations of events that can cause an undesirable event (failure) to occur.
• The starting point (of the tree) is the definition of a single undesirable event (failure).
• An event is reduced to a combination of lower-level events by means of logic gates.
• F(t) = probability that the failure event has occurred (F(t) is a CDF)
• Signal convention: 0 = normal, 1 = fails
• Gates: OR gate, AND gate, k-out-of-n
• k-out-of-n gate over n inputs that each fail with probability F(t): $\sum_{i=k}^{n} \binom{n}{i} F(t)^{i} (1 - F(t))^{n-i}$
[Figure: fault tree for TMR, with top event Failure and a 2-out-of-3 gate over S1, S2, S3]
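The k-out-of-n expression can be evaluated directly. The sketch below assumes n independent inputs that all share the same failure probability F(t) at the time of interest. The function name and the value 0.1 are illustrative.

```python
from math import comb


def k_out_of_n_failure(k, n, f):
    """Probability that at least k of n independent inputs have failed.

    f is the failure probability F(t) of a single input at the chosen time.
    """
    return sum(comb(n, i) * f**i * (1 - f) ** (n - i) for i in range(k, n + 1))


# TMR: the system fails when 2 or more of its 3 modules have failed.
f = 0.1  # hypothetical value of F(t)
print(k_out_of_n_failure(2, 3, f))
```

For TMR the sum has two terms, 3*F^2*(1 - F) + F^3. The special cases k = 1 and k = n reproduce the OR gate and the AND gate respectively.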

Fault Tree Model & Reliability Block Model
• Reliability block model
  • The structure that shows when the system is functioning.
• Fault tree model
  • The structure that shows when the system has failed.
  • The output of the top event is a logic 1 (0 = normal, 1 = fails).
• Example system: 2 processors (P) and 3 memory modules (M)
[Figure: reliability block diagram and the corresponding fault tree, with top event Failure = OR of AND(P1, P2) and AND(M1, M2, M3)]

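Example of Fault Tree
• Signal convention: 0 = normal, 1 = fails
• OR gate over S2 and S3: 1 - (1 - F2(t))*(1 - F3(t))
• AND gate over S1 and the OR output (top event, Failure): F1(t)*(1 - (1 - F2(t))*(1 - F3(t)))
[Figure: fault tree with top event Failure, an AND gate over S1 and an OR gate, and the OR gate over S2 and S3]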

Fault Trees (Basic)
• A fault tree in which no component is repeated
• The failure distributions of the components are independent
• Example: 2 processors (P) and 3 memory modules (M). The system is operational if at least one processor and one memory module are operational.
• F(t) = ??
[Figure: reliability block diagram and fault tree, with top event Failure = OR of AND(P1, P2) and AND(M1, M2, M3)]
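One way to approach the question is to evaluate the gates bottom-up, which is valid here because no component is repeated and failures are independent. The sketch below assumes the tree is an OR over AND(P1, P2) and AND(M1, M2, M3), as the system description implies. The failure probabilities are hypothetical, and the result is cross-checked by enumerating all component states.

```python
from itertools import product


def and_gate(*inputs):
    """Output fails only if every (independent) input has failed."""
    result = 1.0
    for f in inputs:
        result *= f
    return result


def or_gate(*inputs):
    """Output fails if at least one (independent) input has failed."""
    all_ok = 1.0
    for f in inputs:
        all_ok *= 1 - f
    return 1 - all_ok


# Hypothetical failure probabilities F_i(t) at some fixed time t.
F = {"P1": 0.1, "P2": 0.1, "M1": 0.2, "M2": 0.2, "M3": 0.2}

top = or_gate(and_gate(F["P1"], F["P2"]),
              and_gate(F["M1"], F["M2"], F["M3"]))


def brute_force(F):
    """Sum the probability of every state in which the top event occurs."""
    names = sorted(F)
    total = 0.0
    for state in product([0, 1], repeat=len(names)):  # 1 = fails
        s = dict(zip(names, state))
        weight = 1.0
        for name in names:
            weight *= F[name] if s[name] else 1 - F[name]
        if (s["P1"] and s["P2"]) or (s["M1"] and s["M2"] and s["M3"]):
            total += weight
    return total


print(top, brute_force(F))
```

Under these assumptions the gate evaluation corresponds to F(t) = 1 - (1 - FP1(t)*FP2(t))*(1 - FM1(t)*FM2(t)*FM3(t)).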

Fault Trees (Advanced)
• A fault tree in which a component is repeated
• The failure events feeding the gates are not independent, because the same component appears more than once
• Suppose that instead of all three memory modules being shared between the two processors, one of the memory modules (M3) is shared and the other two are private, one for each processor.
• What is its reliability block diagram?
• We have to use the factoring technique.
[Figure: fault tree with top event Failure = AND of OR(P1, AND(M1, M3)) and OR(P2, AND(M2, M3))]

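Factoring
• Condition on the state of the repeated component M3 (0 = normal, 1 = fails)
  • M3 has failed: failure probability Fa(t)
  • M3 has not failed: failure probability Fb(t)
• F(t) = FM3(t)*Fa(t) + (1 - FM3(t))*Fb(t)
• Multiply the result for each case by the probability that the case happens, then add the products.
[Figure: the fault tree over P1, P2, M1, M2, M3 and the two simplified fault trees obtained when M3 has failed (over P1, P2, M1, M2) and when M3 has not failed (over P1, P2)]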
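The conditioning on M3 can be checked numerically. The sketch below assumes the top event is AND of OR(P1, AND(M1, M3)) and OR(P2, AND(M2, M3)), which follows from the system description. With M3 failed this simplifies to AND of OR(P1, M1) and OR(P2, M2), and with M3 working it simplifies to AND(P1, P2). The probabilities and names are hypothetical.

```python
from itertools import product

# Hypothetical failure probabilities F_i(t) at some fixed time t.
F = {"P1": 0.1, "P2": 0.1, "M1": 0.2, "M2": 0.2, "M3": 0.2}


def or2(f1, f2):
    """OR gate over two independent inputs (1 = fails)."""
    return 1 - (1 - f1) * (1 - f2)


# Case a: M3 has failed, so each processor depends on its private memory.
Fa = or2(F["P1"], F["M1"]) * or2(F["P2"], F["M2"])
# Case b: M3 has not failed, so only the processors matter.
Fb = F["P1"] * F["P2"]

factored = F["M3"] * Fa + (1 - F["M3"]) * Fb

# Treating the two OR branches as independent ignores the shared M3.
branch = or2(F["P1"], F["M1"] * F["M3"])
naive = branch * or2(F["P2"], F["M2"] * F["M3"])


def brute_force(F):
    """Sum the probability of every state in which the top event occurs."""
    names = sorted(F)
    total = 0.0
    for state in product([0, 1], repeat=len(names)):  # 1 = fails
        s = dict(zip(names, state))
        weight = 1.0
        for name in names:
            weight *= F[name] if s[name] else 1 - F[name]
        left = s["P1"] or (s["M1"] and s["M3"])
        right = s["P2"] or (s["M2"] and s["M3"])
        if left and right:
            total += weight
    return total


print(factored, brute_force(F), naive)
```

With these values the factored result agrees with the exhaustive enumeration, while the naive gate-by-gate product does not, which illustrates why a repeated component calls for factoring.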
