This presentation covers understanding system failures, analyzing risks, and accounting for human error in order to anticipate and plan for unavoidable risks. Techniques include fault tree analysis, WB-Graphs (Why-Because graphs), and finding causal relationships in failures. It also covers managing requirements changes and implementing effective change management processes.
Failure Analysis and Requirements Maintenance
Anticipating Failure • We cannot engineer away all possible failures • The system has only partial control over its environment • The system is made up of components which may themselves fail (especially Web services!) • Unavoidable risks must be anticipated and planned for • Estimate both likelihood and severity (expected cost = likelihood × severity) • For each risk, choose whether to ignore it or plan for it
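The "likelihood and severity" estimate can be made concrete with a small calculation. The sketch below is illustrative only: the `Risk` class, the `triage` helper, and the rule "plan for a risk when its expected cost exceeds the cost of mitigating it" are assumptions introduced here, not part of the original slides.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Risk:
    name: str
    likelihood: float  # estimated probability of occurrence, 0.0 to 1.0
    severity: float    # estimated cost if the failure occurs

    @property
    def expected_cost(self) -> float:
        # Expected cost = likelihood x severity
        return self.likelihood * self.severity


def triage(risks: List[Risk], mitigation_costs: Dict[str, float]) -> Dict[str, str]:
    """Decide per risk whether to plan for it or ignore it.

    Assumed rule: plan when the expected cost exceeds the mitigation cost.
    A risk with no known mitigation cost is treated as too expensive to plan for.
    """
    decisions = {}
    for risk in risks:
        cost = mitigation_costs.get(risk.name, float("inf"))
        decisions[risk.name] = "plan" if risk.expected_cost > cost else "ignore"
    return decisions


if __name__ == "__main__":
    risks = [Risk("web service outage", 0.5, 200.0), Risk("disk failure", 0.25, 40.0)]
    print(triage(risks, {"web service outage": 50.0, "disk failure": 30.0}))
```

In practice the decision also depends on factors a single number hides, such as safety or legal exposure, so the threshold rule should be read as a starting point.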
Assessing Risks • Traditional approach • Failure is a system state • Logically analyse events leading to failure state • Assume that failure is catastrophic; remains until repair action is taken • Special characteristics of software failure • Not necessarily a bad state • May be incorrect sequence of events instead • Non-catastrophic • Example: user inserts coins into vending machine, gets item, but no change given
Kinds of Risk • Failure may be due to interaction between human and machine • Example: Therac-25 • Software error: backspace not registered correctly • User error: typo • Usability error: value not presented for confirmation • End result: too much radiation!
Fault Tree Analysis • Another kind of AND/OR tree • Single root: the failure state, and severity estimate • Leaf events labelled with probabilities • Probabilities propagated upward • Be careful of independence! • Can be structured hierarchically • Make a leaf the root of its own tree
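Upward propagation of probabilities can be sketched as a recursive walk over the AND/OR tree. This is a minimal illustration that assumes all child events are statistically independent, which is exactly the assumption the slide warns about; the class names and the vending-machine numbers are hypothetical.

```python
from dataclasses import dataclass, field
from math import prod
from typing import List, Union


@dataclass
class Leaf:
    name: str
    probability: float  # estimated probability of this basic event


@dataclass
class Gate:
    name: str
    kind: str  # "AND" or "OR"
    children: List[Union["Gate", Leaf]] = field(default_factory=list)


def probability(node: Union[Gate, Leaf]) -> float:
    """Propagate probabilities from the leaves up to this node.

    Assumes independent child events:
      AND gate: product of the child probabilities
      OR gate:  1 - product of (1 - p) over the children
    """
    if isinstance(node, Leaf):
        return node.probability
    child_probs = [probability(child) for child in node.children]
    if node.kind == "AND":
        return prod(child_probs)
    if node.kind == "OR":
        return 1.0 - prod(1.0 - p for p in child_probs)
    raise ValueError(f"unknown gate kind: {node.kind!r}")


if __name__ == "__main__":
    # Hypothetical tree for "no change given" on a vending machine.
    tree = Gate("no change given", "OR", [
        Leaf("coin hopper empty", 0.02),
        Gate("dispense command lost", "AND", [
            Leaf("sensor fault", 0.1),
            Leaf("software ignores retry", 0.5),
        ]),
    ])
    print(probability(tree))  # about 0.069
```

Because a `Gate` can contain other `Gate` nodes, a leaf of one tree can later be replaced by a subtree of its own, which matches the hierarchical structuring mentioned on the slide.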
Caveat Ratiocinator • ... let the analyst beware! • Probabilities for component failure are relatively easy to estimate (given, say, MTBF) • Probabilities for operator error must be based on statistics, and may depend on many factors • Work environment, experience, ... • Meaningless probabilities give a meaningless analysis
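As an illustration of deriving a component failure probability from MTBF, the sketch below uses the common constant-failure-rate (exponential) model, in which the probability of at least one failure within time t is 1 - exp(-t / MTBF). The choice of model is an assumption made here, not something the slides specify, and it does not apply to components that wear out or to operator error.

```python
import math


def failure_probability(mtbf_hours: float, mission_hours: float) -> float:
    """Probability of at least one failure during the mission time.

    Assumes a constant failure rate of 1 / MTBF (exponential model).
    """
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    if mission_hours < 0:
        raise ValueError("mission time must be non-negative")
    return 1.0 - math.exp(-mission_hours / mtbf_hours)


if __name__ == "__main__":
    # Hypothetical component: MTBF of 10,000 hours, used for 100 hours.
    print(failure_probability(10_000, 100))
```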
Why-Because Analysis • Ignore probability, focus on causal relationships only • Flow of events in time is more explicit • “Why is some goal not fulfilled?” • Meant to deal with open, complex, heterogeneous systems • Open: significant effect by environment • Heterogeneous: different types of components (digital, analog, human, business logic)
Example: Lufthansa A320 Accident, Warsaw, 1993 • Craft landed at Warsaw Airport during a thunderstorm • None of the braking systems worked for 9 seconds • Aircraft ran off the end of runway • Collided with an earth bank, caught fire: 2 deaths • Initial report cited only failure of braking systems as a cause • But presence of the earth bank was an original cause! • That is: no other factor contributed to its presence
WB-Graphs • Shows cause-effect relations of all states and events contributing to a failure • Two steps: • List all events and states of significance (empirical) • Determine causal relations • A causes B if, in the nearest 'possible world' where A did not happen, B did not happen • For instance: 'my office door is closed, because I closed it'. • It is imaginable that, had I not closed it, some other cause (wind, another person) would have; but that is not in the nearest possible alternate reality
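A WB-Graph can be stored as a mapping from each event or state to its direct causal factors. The sketch below walks that mapping backwards from a failure to find the factors with no recorded cause of their own. The node names are a deliberately truncated, illustrative reading of the Warsaw example and not the published analysis: the causes of the braking delay are omitted, so it appears as a root here only because the graph stops there.

```python
from typing import Dict, List, Set

# Maps each effect to the factors that directly caused it.
WBGraph = Dict[str, List[str]]


def original_causes(graph: WBGraph, effect: str) -> Set[str]:
    """Return all factors reachable from `effect` that have no recorded cause."""
    roots: Set[str] = set()
    seen: Set[str] = set()
    stack = [effect]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        causes = graph.get(node, [])
        if not causes and node != effect:
            roots.add(node)
        stack.extend(causes)
    return roots


if __name__ == "__main__":
    warsaw: WBGraph = {
        "fire": ["collision with earth bank"],
        "collision with earth bank": ["runway overrun", "earth bank present"],
        "runway overrun": ["braking systems inactive for 9 s"],
    }
    print(sorted(original_causes(warsaw, "fire")))
```

Each edge in such a graph should have passed the counterfactual test first; the code only records and traverses the relations, it does not establish them.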
Finding All the Facts • The Method of Difference • Let F be a significant fact • How would behaviour have been different if F were not the case? Call this (counterfactual) behaviour B. • What is the first place where B differs from the actual behaviour? • This event or state contains a causal factor of F • Try to identify it, and label it G; then continue the same way with G
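The "first place where B differs from the actual behaviour" step can be sketched as a comparison of two event traces. Representing behaviour as a list of event names is an assumption made for illustration, and both traces below are hypothetical; constructing a credible counterfactual trace remains the analyst's job.

```python
from itertools import zip_longest
from typing import Optional, Sequence, Tuple

Divergence = Tuple[int, Optional[str], Optional[str]]


def first_difference(actual: Sequence[str],
                     counterfactual: Sequence[str]) -> Optional[Divergence]:
    """Return (index, actual_event, counterfactual_event) at the first point
    where the two traces differ, or None if they are identical.

    A missing event (one trace is shorter) is reported as None.
    """
    for index, (a, c) in enumerate(zip_longest(actual, counterfactual)):
        if a != c:
            return index, a, c
    return None


if __name__ == "__main__":
    # Hypothetical traces: what happened, and what might have happened
    # had the earth bank not been there.
    actual = ["touchdown", "brakes inactive", "runway overrun",
              "collision with earth bank"]
    without_bank = ["touchdown", "brakes inactive", "runway overrun",
                    "aircraft stops on open ground"]
    print(first_difference(actual, without_bank))
```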
Analyzing Human Actions • Sequence of stages in human decision-making is abbreviated PARDIA • Perception • Attention • Reasoning • Decision • Intention • Action • Human error may occur at any of these stages • System design flaws can be identified as contributing factors to human error
Three Responses to Requirements Change • Add new requirements during development • But avoid 'feature creep' • Modify requirements during development • Prototypes help discover necessity • Remove requirements during development • As feasibility or business importance drops
Elements of Change Management • Configuration Items • Each configuration item is a distinct product during development • Has its own requirements and version control • Baselines • Stable version of a document for sharing • Change Management Process • All proposed changes are submitted as change requests • A review board reviews them periodically, considers interactions • If agreed, becomes part of next baseline
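The change management process above can be sketched as a small state model: requests are submitted, reviewed together in a periodic batch, and approved ones are folded into the next baseline. All class and method names are hypothetical, and the review board's judgment is reduced to a callback for the sake of the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ChangeRequest:
    item: str                  # configuration item the change applies to
    description: str
    status: str = "submitted"  # becomes "approved" or "rejected" after review


@dataclass
class Baseline:
    version: int
    changes: List[str] = field(default_factory=list)


class ChangeControl:
    def __init__(self) -> None:
        self.pending: List[ChangeRequest] = []
        self.baselines: List[Baseline] = [Baseline(version=1)]

    def submit(self, request: ChangeRequest) -> None:
        """All proposed changes enter the process as change requests."""
        self.pending.append(request)

    def review(self, approve: Callable[[ChangeRequest], bool]) -> Baseline:
        """Periodic review of all pending requests.

        `approve` stands in for the review board's decision. Approved
        changes become part of a new baseline; if none are approved,
        the current baseline stays in force.
        """
        approved = []
        for request in self.pending:
            request.status = "approved" if approve(request) else "rejected"
            if request.status == "approved":
                approved.append(request.description)
        self.pending = []
        if approved:
            next_version = self.baselines[-1].version + 1
            self.baselines.append(Baseline(next_version, approved))
        return self.baselines[-1]


if __name__ == "__main__":
    control = ChangeControl()
    control.submit(ChangeRequest("requirements doc", "add refund requirement"))
    control.submit(ChangeRequest("requirements doc", "add loyalty points"))
    baseline = control.review(lambda r: "refund" in r.description)
    print(baseline.version, baseline.changes)
```

Reviewing requests as a batch, rather than one at a time, is what gives the board a chance to consider interactions between proposed changes.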