
Reliability engineering

William W. McMillan. 28 March 2013.


Presentation Transcript


  1. Reliability engineering. William W. McMillan, 28 March 2013.

  2. In non-functional requirements, what are some of the reliability targets that might be defined?

  3. General Approaches
     • Avoiding faults
       • Develop in a way to prevent faults.
       • Careful specification and programming.
     • Detecting faults
       • Formal verification
       • Extensive testing
     • Tolerating faults
       • Run-time response to faults
       • Recover and proceed

  4. Diminishing Returns
     • Cost to catch each error goes up dramatically as more and more are caught.
     • Considered impossible to catch all errors.
       • Especially in systems with complex interactions among modules, with hardware, or between threads.
     • “Six Sigma” aims at 3.4 defects per 1 million items.
       • From Motorola, used by GE and others.
       • Spec limit is 6 SDs away from mean of measure.
       • E.g., spec is 1000 ± 0.6; if mean is 1000, SD ≤ 0.1.
       • Still not perfect!
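The Six Sigma arithmetic on this slide can be checked numerically. The sketch below assumes a normally distributed measure, and its function names are illustrative. One point worth knowing: the widely quoted 3.4 defects per million corresponds to a one-sided tail at 4.5 SDs, which reflects the convention of allowing the mean to drift 1.5 SDs toward a limit. A perfectly centered process with limits at 6 SDs would produce only about 0.002 defects per million.

```python
import math

def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def max_sd(tolerance: float, sigmas: float = 6.0) -> float:
    """Largest SD that keeps the spec limit `sigmas` SDs from the mean."""
    return tolerance / sigmas

# Slide example: spec 1000 ± 0.6 with the mean at 1000.
sd = max_sd(0.6)

# Centered process, limits at ±6 SD: both tails count as defects.
centered_dpm = 2 * upper_tail(6.0) * 1e6

# Conventional Six Sigma figure: mean drifted 1.5 SD toward one limit,
# so the nearer limit is only 4.5 SD away (the far tail is negligible).
shifted_dpm = upper_tail(4.5) * 1e6

print(f"max SD = {sd:.3f}")
print(f"centered: {centered_dpm:.4f} defects per million")
print(f"shifted:  {shifted_dpm:.2f} defects per million")
```

Either way the defect rate is small but not zero, which is the slide's point.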

  5. What Six Sigma goal could be defined for software reliability?

  6. Redundancy
     • Multiple versions of the software.
       • N-version programming
       • Different developers
       • Different languages and libraries
     • Installations on multiple hardware platforms.
     • Multiple methods to verify software.
     • Multiple sets of eyes on code.
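One way to picture N-version programming is a voter that runs every version on the same input and accepts the majority answer. This is a minimal sketch, not a production design: the three `version_*` functions and the 60 °C rule are hypothetical, and one version is deliberately faulty to show the vote masking it.

```python
from collections import Counter
from typing import Callable, Sequence

def majority_vote(versions: Sequence[Callable[[float], bool]], reading: float) -> bool:
    """Run every version on the same input and return the majority answer."""
    votes = []
    for version in versions:
        try:
            votes.append(version(reading))
        except Exception:
            # A crashed version simply loses its vote.
            continue
    if not votes:
        raise RuntimeError("all versions failed")
    answer, count = Counter(votes).most_common(1)[0]
    if count <= len(versions) / 2:
        raise RuntimeError("no majority among versions")
    return answer

# Three hypothetical, independently written implementations of the same spec:
# "report True when the temperature in °C exceeds 60".
def version_a(temp_c: float) -> bool:
    return temp_c > 60.0

def version_b(temp_c: float) -> bool:
    return (temp_c * 9 / 5 + 32) > 140.0  # same rule, computed in °F

def version_c(temp_c: float) -> bool:
    return temp_c >= 6.0  # deliberately faulty: a typo in the threshold

versions = [version_a, version_b, version_c]
print(majority_vote(versions, 25.0))  # the faulty version is outvoted
print(majority_vote(versions, 75.0))
```

Requiring a majority of all versions, not just of those that answered, is a conservative choice: if too many versions crash, the voter refuses to answer.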

  7. How would you use redundancy in creating software to set off water sprinklers for fire suppression?

  8. Observation
     • Process
       • Documented, archived, standardized
     • Monitoring at runtime
       • Performance: time, space, transmission rates
       • Inconsistencies between versions or measures
       • Deadlocks
       • Memory access problems
       • Failure of assertions
       • State of hardware
     • Keep a trace.

  9. Runtime Recovery
     • Exception handling is critical.
       • Record state and problem.
       • Run diagnostic routines.
       • Reset hardware.
       • Return to functional state.
     • Might have different versions “vote.”
     • Can sometimes reduce performance and still do job.
       • Slow down data transmission.
       • Throw away some packets.
       • Disable some functions.
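The recovery steps on this slide might be sketched as follows: catch a specific exception, record what happened, slow down, and only then start discarding packets. `LinkError`, `send_all`, and the rate values are assumptions made for illustration, and `send` stands in for whatever transmission routine the real system provides.

```python
import logging

log = logging.getLogger("recovery")

class LinkError(Exception):
    """Hypothetical error raised when the transmission link misbehaves."""

def send_all(packets, send, rates=(1000, 500, 100)):
    """Send packets, stepping down through `rates` each time the link fails.

    `send(packet, rate)` is a caller-supplied function (an assumption of this
    sketch). Returns the list of packets that had to be dropped.
    """
    level = 0
    dropped = []
    for packet in packets:
        while True:
            try:
                send(packet, rates[level])
                break
            except LinkError as err:
                # Record state and problem before attempting recovery.
                log.warning("send failed at rate %d: %s", rates[level], err)
                if level + 1 < len(rates):
                    level += 1              # degrade: slow down transmission
                else:
                    dropped.append(packet)  # last resort: throw the packet away
                    break
    return dropped
```

The function keeps running at the reduced rate once it has stepped down. A fuller version might probe periodically to see whether the original rate can be restored.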

  10. Backup or Protection System
      • Runs in parallel with primary system.
      • Simpler than primary system.
      • Monitors sensors (possibly alternate ones), performance, etc.
      • Can intervene to:
        • Shut something down.
        • Start emergency actions (fire suppression, brakes, alarms…).
        • Take control from primary to get into safe state.
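A protection system can be far simpler than the controller it guards. The sketch below is one hypothetical shape for it: `Plant`, `ProtectionSystem`, and the temperature limit are invented for illustration. In a real installation `check` would be called periodically from a separate thread, process, or processor, which is not shown here.

```python
from dataclasses import dataclass

@dataclass
class Plant:
    """Hypothetical controlled process, read through an alternate sensor."""
    temperature_c: float
    heater_on: bool = True
    alarm: bool = False

class ProtectionSystem:
    """Deliberately simple monitor that runs alongside the primary controller."""

    def __init__(self, limit_c: float):
        self.limit_c = limit_c

    def check(self, plant: Plant) -> bool:
        """Return True if the monitor had to intervene."""
        if plant.temperature_c > self.limit_c:
            plant.heater_on = False   # shut something down
            plant.alarm = True        # start an emergency action
            return True
        return False

monitor = ProtectionSystem(limit_c=90.0)
overheated = Plant(temperature_c=95.0)
print(monitor.check(overheated), overheated)
```

The monitor has a single rule and no dependency on the primary controller's code, so a fault in the primary is unlikely to be repeated in the backup.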

  11. What kinds of systems could not function well with degraded performance?

  12. Programming Practices
      • Validate data.
        • Range checks
        • Consistency checks
          • E.g., car in “park” is not going 50 mph.
      • Encapsulate.
        • Use good languages
        • Object-oriented design or similar
        • Private data
        • Simple interfaces
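The range and consistency checks above could look like this in practice. The class name, field names, and the 0 to 200 mph limits are assumptions chosen for the example. The checks run in the constructor, so an invalid reading can never exist as an object.

```python
from dataclasses import dataclass

class ValidationError(ValueError):
    """Raised when a reading fails a range or consistency check."""

@dataclass(frozen=True)
class VehicleReading:
    gear: str          # e.g. "park", "drive", "reverse"
    speed_mph: float

    def __post_init__(self):
        # Range check: the limits are illustrative.
        if not 0.0 <= self.speed_mph <= 200.0:
            raise ValidationError(f"speed out of range: {self.speed_mph}")
        # Consistency check: a car in park should not be moving.
        if self.gear == "park" and self.speed_mph > 0.0:
            raise ValidationError("gear is 'park' but speed is nonzero")

try:
    VehicleReading(gear="park", speed_mph=50.0)
except ValidationError as err:
    print("rejected:", err)
```

Making the dataclass frozen also touches on encapsulation: once validated, a reading cannot be changed into an inconsistent one.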

  13. Programming Practices
      • Control memory access
        • Array bounds
        • Pointers
      • Handle exceptions
        • Throw specific exception types and info.
      • Use assertions
        • Throw exception when one fails.
      • Time out when waiting for resource.
      • Install switches for debug mode, audit trails.
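Three of these practices fit in one short sketch: a specific exception type that carries details, an assertion-style check that raises when it fails, and a timeout while waiting for a resource. `ResourceTimeout` and `with_resource` are hypothetical names, and a `threading.Lock` stands in for whatever resource the program actually waits on.

```python
import threading

class ResourceTimeout(Exception):
    """Specific exception type carrying details about what timed out."""

    def __init__(self, name: str, seconds: float):
        super().__init__(f"timed out after {seconds}s waiting for {name}")
        self.name = name
        self.seconds = seconds

def with_resource(lock, name: str, action, timeout: float = 1.0):
    """Run `action()` while holding `lock`, giving up after `timeout` seconds."""
    if not lock.acquire(timeout=timeout):
        raise ResourceTimeout(name, timeout)
    try:
        result = action()
        # Explicit check rather than `assert`, which is skipped under `python -O`.
        if result is None:
            raise AssertionError(f"{name}: action returned no result")
        return result
    finally:
        lock.release()   # always give the resource back

port_lock = threading.Lock()
print(with_resource(port_lock, "serial port", lambda: 42))
```

A caller can catch `ResourceTimeout` separately from other errors and decide whether to retry, switch to an alternate resource, or report the problem.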

  14. Programming Practices
      • Check versions of other components.
      • Define hierarchy of hardware needed.
        • Alternate ports, sensors, actuators,…
        • Alternate storage devices
        • Move to another if there’s a problem.
      • Make UI bulletproof
        • Consistency
        • Data types and ranges
      • Keep in sandbox
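A hardware hierarchy can be expressed as an ordered list of devices to try, and a version check as a comparison made before use. Both functions below are illustrative sketches. The device paths are made up, and `probe` is a caller-supplied test that is assumed to raise `OSError` when a device is unusable.

```python
def first_working(devices, probe):
    """Return the first device, in priority order, for which probe(device) succeeds."""
    problems = []
    for device in devices:
        try:
            probe(device)
            return device
        except OSError as err:
            problems.append(f"{device}: {err}")  # keep a record, then move on
    raise RuntimeError("no usable device; tried " + "; ".join(problems))

def check_version(found: tuple, required: tuple) -> None:
    """Refuse to run against a component older than the required version."""
    if found < required:
        raise RuntimeError(f"component version {found} is older than required {required}")

def demo_probe(device):
    # Hypothetical probe: only the backup device responds.
    if device != "/dev/backup":
        raise OSError("not responding")

print(first_working(["/dev/primary", "/dev/backup"], demo_probe))
check_version(found=(2, 1, 0), required=(2, 0, 0))
```

Versions are compared as tuples of integers so that (2, 10, 0) correctly ranks above (2, 9, 0), which a plain string comparison would get wrong.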

  15. Programming Practices
      • Beware of recursion.
        • Can be inefficient.
        • Can blow the stack.
      • Beware of interrupts.
        • Device might send interrupt and halt a time-critical operation.
      • Program should have a plan for full data structure.
        • Buffer
        • Disk file
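Two of these warnings can be illustrated briefly. The first pair of functions computes the same value recursively and with an explicit stack. On deeply nested input the recursive form can exceed the interpreter's recursion limit, while the iterative form cannot. The `BoundedBuffer` class shows one possible plan for a full buffer, namely rejecting the newest item and counting the loss. All names are hypothetical, and other policies, such as dropping the oldest item, are equally valid.

```python
from collections import deque

def depth_recursive(node):
    """Depth of a nested-list tree; recursion depth grows with the tree."""
    return 1 + max((depth_recursive(child) for child in node), default=0)

def depth_iterative(node):
    """Same result using an explicit stack, so the call stack stays flat."""
    deepest, stack = 0, [(node, 1)]
    while stack:
        current, depth = stack.pop()
        deepest = max(deepest, depth)
        stack.extend((child, depth + 1) for child in current)
    return deepest

class BoundedBuffer:
    """Fixed-capacity buffer with an explicit policy for the full case."""

    def __init__(self, capacity: int):
        self.items = deque()
        self.capacity = capacity
        self.overflowed = 0   # number of items rejected because the buffer was full

    def put(self, item) -> bool:
        if len(self.items) >= self.capacity:
            self.overflowed += 1   # policy in this sketch: reject newest, count it
            return False
        self.items.append(item)
        return True

# Build a list nested 5000 levels deep; only the iterative version is used on it.
deep = []
for _ in range(5000):
    deep = [deep]
print(depth_iterative(deep))
```

Counting rejected items, instead of discarding them silently, gives the monitoring code something to report.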

  16. Think of a language that would not support these programming practices well. How would you use that language so as to overcome its deficiencies?

  17. Measures of Reliability
      • Mean time between failures
      • Probability of failure on demand
        • When service is requested, how often is it not given?
      • Percent time available
        • E.g., web services
      • Percent of completed operations
        • Initiated by the program, e.g.,
          • Step of motor, writing to port, saving data item,…
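The first three measures reduce to simple ratios. The functions below use common textbook definitions, and the numbers fed to them are made up for illustration.

```python
def mtbf(operating_hours: float, failures: int) -> float:
    """Mean time between failures: operating time divided by failure count."""
    return operating_hours / failures

def pofod(failed_requests: int, total_requests: int) -> float:
    """Probability of failure on demand: fraction of requests not served."""
    return failed_requests / total_requests

def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Fraction of time the service was available."""
    return uptime_hours / (uptime_hours + downtime_hours)

# Made-up observations for one 30-day month (720 hours).
print(f"MTBF: {mtbf(operating_hours=718.0, failures=4)} hours")
print(f"POFOD: {pofod(failed_requests=3, total_requests=12_000)}")
print(f"Availability: {availability(718.0, 2.0):.4%}")
```

Which measure matters most depends on the system: availability suits a continuously running web service, while probability of failure on demand suits a protection system that is rarely called on.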

  18. Measures of Reliability
      • Percent of data acquired
        • E.g., reading from stream, how many values lost?
      • Average quality of data
        • E.g., video
      • Percent time that status bits are not as expected.
      • …?
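Percent of data acquired is easy to measure when each value in the stream carries a sequence number. That is an assumption of this sketch, as is the function name. Duplicates and out-of-range numbers are ignored.

```python
def percent_acquired(sequence_numbers, first: int, last: int) -> float:
    """Percentage of the expected values first..last that actually arrived."""
    expected = last - first + 1
    received = len({n for n in sequence_numbers if first <= n <= last})
    return 100.0 * received / expected

# Values 3, 6, 7 and 8 never arrived; value 5 arrived twice.
print(percent_acquired([0, 1, 2, 4, 5, 5, 9], first=0, last=9))
```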

  19. Think of some other reliability measures that might be useful.
