This presentation, given in November 2009 by Anders P. Ravn at Aalborg University, covers tolerating timing faults through dynamic redundancy: error detection, damage confinement and assessment, error recovery, and fault treatment with continued service. It also covers redundancy as the basis of fault tolerance and a fault tree for a missed deadline.
Tolerating Timing Faults
TSW, November 2009
Anders P. Ravn, Aalborg University
FT basis: Redundancy
• Time
• Space
[Figure: diagram labelled with Try and Retry]
BW 2.5 p. 41
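The two kinds of redundancy can be sketched in a few lines. This is a minimal illustration, not taken from the slides: `retry` and `vote` are hypothetical helper names, the retry count is an assumed parameter, and majority voting is just one possible way to use space redundancy.

```python
from collections import Counter


def retry(operation, attempts=3):
    """Time redundancy: run the same operation again after a failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except Exception as exc:  # any failure triggers another try
            last_error = exc
    raise last_error  # every try failed


def vote(replicas, value):
    """Space redundancy: run several replicas and take the majority result."""
    results = [replica(value) for replica in replicas]
    winner, count = Counter(results).most_common(1)[0]
    if count <= len(results) // 2:
        raise RuntimeError("no majority among replicas")
    return winner
```

Time redundancy costs extra time on the same resource, while space redundancy costs extra resources; for timing faults the first is the more delicate choice, because each retry consumes part of the deadline.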
Dynamic Redundancy
• Error detection
• Damage confinement and assessment
• Error recovery
• Fault treatment and continued service
BW p. 41
Error Detection
f: State × Input → State × Output
• Environment (exception)
• Application
  • Assertion:
    • precondition(input, state)
    • postcondition(input, state, state', output)
    • invariant(state, state')
  • Timing:
    • WCET(f, input)
    • Deadline(f, input)
BW Ch 13
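The application-level checks above can be sketched as a wrapper around f. This is a minimal sketch under assumptions: `checked` and `DetectedError` are hypothetical names, the predicates are passed in by the caller, and the deadline is checked only after f returns, so a miss is detected but not prevented.

```python
import time


class DetectedError(Exception):
    """Raised when one of the run-time checks on f fails."""


def checked(f, pre, post, invariant, deadline_s):
    """Wrap f: (state, input) -> (state', output) with run-time checks."""
    def wrapper(state, inp):
        if not pre(inp, state):
            raise DetectedError("precondition violated")
        start = time.monotonic()
        new_state, out = f(state, inp)
        elapsed = time.monotonic() - start
        if not post(inp, state, new_state, out):
            raise DetectedError("postcondition violated")
        if not invariant(state, new_state):
            raise DetectedError("invariant violated")
        if elapsed > deadline_s:  # detected after the fact
            raise DetectedError("deadline missed")
        return new_state, out
    return wrapper


# Usage: the state is an account balance, the input an amount to withdraw.
def withdraw(balance, amount):
    return balance - amount, amount


safe_withdraw = checked(
    withdraw,
    pre=lambda amount, balance: 0 < amount <= balance,
    post=lambda amount, old, new, out: new == old - amount and out == amount,
    invariant=lambda old, new: new >= 0,
    deadline_s=1.0,
)
```

The sketch assumes f returns a new state rather than mutating the old one; otherwise the postcondition and invariant could not compare state with state'.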
Fault Tree
[Figure: fault tree with top event "Missed D_i"; other node labels: Platform fails, EC_i > C_i, EI_i > I_i, ET_i < T_i, EB_i < B_i, EC_k > C_k, ET_k < T_k]
Error Detection
• Deadline D missed (Platform Error)
• Overrun of C
• Minimum interarrival time T too small
• Blocking time B too small
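The four timing errors listed above could be detected by comparing observed values with the values assumed in the schedulability analysis. This is a minimal sketch: `TaskSpec`, `Observation`, and `timing_errors` are hypothetical names, and "B too small" is read here as the observed blocking exceeding the assumed bound B, which is an interpretation rather than something the slides state.

```python
from dataclasses import dataclass


@dataclass
class TaskSpec:
    C: float  # assumed worst-case execution time
    T: float  # assumed minimum interarrival time
    B: float  # assumed worst-case blocking time
    D: float  # relative deadline


@dataclass
class Observation:
    exec_time: float      # measured execution time of one release
    interarrival: float   # measured time since the previous release
    blocking: float       # measured blocking time
    response_time: float  # measured time from release to completion


def timing_errors(spec, obs):
    """Return the timing errors detected for one release of a task."""
    errors = []
    if obs.exec_time > spec.C:
        errors.append("overrun of C")
    if obs.interarrival < spec.T:
        errors.append("interarrival time below T")
    if obs.blocking > spec.B:  # assumed reading of "B too small"
        errors.append("blocking exceeds B")
    if obs.response_time > spec.D:
        errors.append("deadline D missed")
    return errors
```

Checking C, T, and B separately lets a monitor flag the violated assumption before it shows up as a missed deadline, which is the point of detecting the causes and not only the effect.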
Damage Confinement
• Static structure
  • one task
  • lower priority tasks?
• Dynamic structure
BW p. 457
Error Recovery
• Forward: repair the state, if you can!
• Backward:
  • define recovery points
  • checkpoint state at each recovery point
  • roll back
  • retry
Domino effect
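The backward recovery steps can be sketched for a single task as follows. This is an illustrative sketch only: `Checkpointed` and `run_with_recovery` are hypothetical names, the checkpoint is an in-memory deep copy, and the acceptance test is supplied by the caller.

```python
import copy


class Checkpointed:
    """Holds a state together with one saved recovery point."""

    def __init__(self, state):
        self.state = state
        self._checkpoint = None

    def establish_recovery_point(self):
        # Checkpoint: save a private copy of the current state.
        self._checkpoint = copy.deepcopy(self.state)

    def roll_back(self):
        if self._checkpoint is None:
            raise RuntimeError("no recovery point established")
        # Restore a fresh copy so the checkpoint itself stays intact.
        self.state = copy.deepcopy(self._checkpoint)


def run_with_recovery(holder, action, acceptable, retries=2):
    """Run action on the state; on error, roll back and retry."""
    holder.establish_recovery_point()
    for _ in range(retries + 1):
        try:
            action(holder.state)
            if acceptable(holder.state):
                return True
        except Exception:
            pass  # treat an exception as a detected error
        holder.roll_back()
    return False
```

With a single task there is no domino effect. It appears when several communicating tasks each keep their own recovery points, so that rolling one task back can force the others to roll back as well.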