Fault Diagnosis for Timed Automata

Fault Diagnosis forTimed Automata Stavros Tripakis VERIMAG

Fault diagnosis Plant (event + fault generator) Observable events Diagnoser (event reader) Fault announcements

Assumptions • The plant behaves according to a known model. • The diagnoser receives the (observable) events immediately when they occur. • The diagnoser reacts immediately.

Requirements • The diagnoser does not produce any false positives (announces a fault when no fault occurred). • The diagnoser always announces a fault within a bounded delay after the fault occurred. • Other sanity requirements (diagnoser is causal, does not change its mind, etc).

a f u b a Fault! b Example Blue events are unobservable. Red events are observable. f is the fault event. The plant model The diagnoser

a f u a Not all plants are diagnosable!

Timed fault diagnosis Plant (event + fault generator) Observable events + delays Diagnoser (event reader) Fault announcements

Assumptions • The plant behaves according to a known timed automaton model. • The diagnoser receives the (observable) events immediately when they occur and reacts immediately. • The diagnoser measures delays between two events (i.e., has a timer). It can also set timeouts (in case an event is not observed for a long time).

a f x > 3 x:=0 a u x  2 Fault! y > 2 The diagnoser: a y:=0 y  2 Example of timed diagnosis The plant model is a Timed Automaton (with invariants for urgency) This plant is diagnosable! In this case, the diagnoser can be modeled as a timed automaton. This is not always the case!

“Infinite-clock” diagnoser(example due to Peter Niebert) a a f a b x:=0 x = 1 x  0 x  1 u a x:=0 a b y:=0 x > 1 y < 1 y < 1 • Faulty behaviors: there is some a exactly 1 time unit before the b. • Correct behaviors: either no b, or no a exactly 1 time unit before the b. This plant is diagnosable. However, the diagnoser needs to check whether some a was exactly 1 time unit before the b. To do this, the diagnoser needs an unbounded number of clocks.

Example (2) a f x > 3 x:=0 a u 1 x  2 This plant is NOT diagnosable! After an f or a u, the plant need not perform an a (it can stay forever in state 1).

Formal definitions • Timed behaviorsover some alphabet : •  = o  u a 1.1 u 0.4 b 3 f 2.2 c

Formal definitions • Observablebehaviors: a 1.1 u 0.4 b 3 f 2.2 c Projection to observable events a 1.5 b 5.2 c

Formal definitions • Faultybehavior: contains a fault event. a 1.1 u 0.4 b 3 f 2.2 c faulty a 1.1 u 0.4 b 3 u 2.2 c non-faulty

Formal definitions • T-faultybehavior (T is a delay): • faulty behavior, • at least T time elapses after the first occurrence of the fault. • Examples: • 2-faulty (but not 3-faulty) behaviors: a 1.1 u 0.4 b 3 f 2.5 u a 1.1 u 0.4 b 3 f 1.9 0.1 a 1.1 u 0.4 b 3 f 2.2 u 0 c 0.3 u

Formal definitions • Timed automata: • As usual. • Delays are rationals (to be machine-representable) • No acceptance conditions. • Urgency modeled using state invariants. • Non-zeno run: (infinite) run where time diverges.

 D : (o Q)  {0,1} Diagnosers • A T-diagnoser for a timed automaton A is a function • such that, for every behavior  of A, If  is not faulty, then D( Proj() ) = 0 If  is T-faulty, then D( Proj() ) = 1

Diagnosability • A timed automaton A is called T-diagnosable if there exists a T-diagnoser for it. • A is diagnosable if there exists T such that A is T-diagnosable. • Note: if A is T-diagnosable then it is also (T+1)-diagnosable.

Necessary and sufficient condition for diagnosability • A is T-diagnosable iff • or, equivalently • , ’ .  is T-faulty, ’ is not faulty, and Proj() = Proj(’) , ’ . if  is T-faulty and ’ is not faulty, then Proj()  Proj(’)

Questions • How to test whether a given timed automaton A is diagnosable? • How to find the minimum T such that A is T-diagnosable (but not (T-1)-diagnosable)? • How to build a diagnoser?

Testing diagnosability • Assumption: A is non-zeno. • Make two copies of A, A1 and A2: • Copy/rename states, transitions, clocks, etc. • Copy/rename unobservable events. • Copy but do not rename observable events. • Remove all faulty transitions from A2. • Take the product B of A1 and A2: synchronize on common labels (i.e., observable events). • A is diagnosable iff all faulty runs of B are zeno.

Testing diagnosability • The proof is based on the following facts: • Every run of B corresponds to a pair of runs , ’ of A which have the same projection. • ’ cannot be faulty. • If a TA has a T-faulty run for all T, then it has a non-zeno faulty run.

a a a f1 f2 f x > 3 x1 > 3 x2 > 3 x1:=0 x2:=0 x:=0 a a a u u2 u1 x  2 x1 2 x2 2 Testing diagnosability • Example:

a a f1 f x > 3 x1 > 3 x:=0 x1:=0 a a u1 u x1 2 x  2 Testing diagnosability • Example: x2:=0 a u2 x2 2

f1 u2 a x1:=0 x1 > 3 x2:=0 x2 2 u1 x1 2 u2 u2 a f1 x1 2 u1 x2 2 x2 2 Testing diagnosability • Example:

Testing diagnosability • Example: f1 u2 a x1:=0 x1 > 3 x2:=0 x2 2 u2 f1 x2 2

f1 f1 z:=0 z > T Finding the minimum T for T-diagnosability • Assumption: A is diagnosable. • Take the product B as before. • Take the product of B and the observer automaton below (where T is a parameter). • A is T-diagnosable iff the final state of the observer cannot be reached. • Perform a binary search on T : 0, 1, 2, 4, …etc. Complexity: log(T) reachability checks.

Representing diagnosers • A diagnoser will be represented as a deterministic machine M=(W, w0, F, G, H), where • W is its set of states, • w0 is its initial state, • F : W  o  W (event transition function) • G : W  Q  W (time transition function) • H : W  {0,1} (decision function)

Representing diagnosers  = a 1.5 b 5.2 c • Given an observed behavior, • Feed it to the machine: • w1 = F(w0, a), w2 = G(w1, 1.5), • w3 = F(w2, b), w4 = G(w3, 5.2), • w5 = F(w4, c), • Then apply the decision function to the state where the machine stopped: • D( ) = H(w5).

a f faulty states u b Building a diagnoser • Assumption: the structure of A is such that no discrete state can be reached by both a faulty and a non-faulty path. • Every automaton can be transformed so that it meets the assumption (by doubling, at most, its discrete states). f a u b Not OK OK

Building a diagnoser • Preliminary definition: • Let S be a set of states of the timed automaton A. • ReachUnobs(S, ) = { s‘ |  sS, s reaches s’ via a run of exactly  time units that takes only unobservable actions }. • ReachUnobs() is easily implementable using standard TA model-checking techniques (DBMs, reachability, etc).

Building a diagnoser • W : the set of all possible subsets of states of the timed automaton, • w0 = ReachUnobs({s0}, 0), • F(S, a) = { s’ |  sS, s s’}, • G(S, ) = ReachUnobs(S, ), • H(S) = 0, if  s=(q,v)S such that q is non-faulty, 1, otherwise. That is, the diagnoser works as an on-line state estimator a

Summary • Introduced notions of diagnosers and diagnosability for timed automata. • Necessary and sufficient conditions. • Conditions reduce to finding non-zeno runs on a product automaton: • this can be done efficiently on-the-fly [BTY-RTSS’97]. • Diagnosers can be effectively built.

Future work • Easily extendable to timed automata with acceptance conditions (meaning?). • Represent diagnosers as timed automata (when can this be done?). • Controller synthesis for timed automata based on partial observability of events.

a u f b Acceptance conditions The automaton below is in principle diagnosable: b will eventually happen after f (due to the acceptance condition). However, there is no bound of b happening after f.

Fault Diagnosis for Timed Automata