
Whither Generic Recovery from Application Faults? A Fault Study using Open-Source Software

Subhachandra Chandra and Peter M. Chen, University of Michigan. Presentation by Lin Tan. Published in DSN 2000. Hypothesis: most faults in released applications are transient [Gray 86].


Presentation Transcript


  1. Whither Generic Recovery from Application Faults? A Fault Study using Open-Source Software • Subhachandra Chandra, Peter M. Chen • University of Michigan • Presentation – Lin Tan • Published in DSN 2000

  2. Hypothesis • Most faults in released applications are transient [Gray 86] • Transient faults are more difficult to reproduce and to debug • Can generic recovery techniques survive most application faults without using application-specific information?

  3. Methodology • Classify software faults into 3 types • One type: eliminated by generic recovery techniques • How many faults are this type? • Study a subset of faults of 3 applications • Apache – widely used HTTP server • Gnome – desktop environment • MySQL – multi-thread SQL database server • Conclusions

  4. Fixed environment -> deterministic execution • Given a fixed operating environment, a set of concurrent, sequential processes is completely deterministic. [Dijkstra 72]

  5. Software Fault Classification • Environment-independent (deterministic) • Example: long URL • Environment-dependent • Environment-dependent non-transient (subjective) • Example: disk full • Environment-dependent transient (subjective) • Example: race condition
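
The three-way classification can be sketched in code. This is a minimal illustration, not the authors' tooling: the names `FaultClass`, `classify`, and `survivable_by_generic_recovery` are hypothetical, and the two yes/no questions stand in for what the slide marks as a subjective judgment. It also assumes that the "one type" eliminated by generic recovery (slide 3) is the environment-dependent transient class.

```python
from enum import Enum


class FaultClass(Enum):
    ENV_INDEPENDENT = "environment-independent"
    ENV_DEPENDENT_NONTRANSIENT = "environment-dependent non-transient"
    ENV_DEPENDENT_TRANSIENT = "environment-dependent transient"


def classify(depends_on_environment: bool, condition_likely_to_clear: bool) -> FaultClass:
    """Map two (subjective) judgments about a bug onto the three classes."""
    if not depends_on_environment:
        # Same input always triggers the fault, whatever the environment.
        return FaultClass.ENV_INDEPENDENT
    if condition_likely_to_clear:
        # The triggering condition (e.g. thread timing) will probably differ on retry.
        return FaultClass.ENV_DEPENDENT_TRANSIENT
    # The triggering condition (e.g. a full disk) persists across a retry.
    return FaultClass.ENV_DEPENDENT_NONTRANSIENT


def survivable_by_generic_recovery(fault_class: FaultClass) -> bool:
    """Assumption: only transient faults disappear when the work is simply redone."""
    return fault_class is FaultClass.ENV_DEPENDENT_TRANSIENT


# The three examples named on the slide.
EXAMPLES = {
    "long URL": classify(depends_on_environment=False, condition_likely_to_clear=False),
    "disk full": classify(depends_on_environment=True, condition_likely_to_clear=False),
    "race condition": classify(depends_on_environment=True, condition_likely_to_clear=True),
}

for name, fault_class in EXAMPLES.items():
    print(f"{name}: {fault_class.value}, "
          f"survivable={survivable_by_generic_recovery(fault_class)}")
```

Under these assumptions only the race-condition example is marked survivable; the long URL and the full disk would fail again after a generic restart.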

  6. Program Operating Environment • Software • Other programs • Kernel • Hardware • ECC errors • Interrupts • Thread scheduler • Timing of workload requests: typing speed • User Input: • part of the program • NOT part of the environment

  7. Selection of Bugs • Apache: 50 bugs out of 5220 bug reports • Severe or critical bugs • Gnome: 45 bugs out of 500 bug reports • Only in core files, libraries, and four commonly used Gnome applications • MySQL: 44 bugs out of 44,000 messages from the mailing list • Serious bugs

  8. Example Bugs • Apache • A long URL causes an overflow. • MySQL • Lack of file descriptors. • Gnome • Race condition between a request for action from an applet and its removal. • Race condition between an image viewer and a property editor.

  9. Results - Apache

  10. Results - Gnome

  11. Results - MySQL

  12. Limitations & Discussions • May differ for other applications • Only 3 applications • Only manually studied reported severe bugs (50/5220, 45/500, 44/44,000) • Use automated tools? • Better to implement a general recovery approach and verify the results.
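
To make the sampling limitation concrete, the fractions quoted on the slide can be worked out directly. This is only arithmetic on the slide's own numbers; the names `SAMPLES` and `sampled_fraction` are illustrative, and the MySQL total counts mailing-list messages rather than bug reports, so the three ratios are not strictly comparable.

```python
# (bugs studied, total reports or messages), taken from the slide.
SAMPLES = {
    "Apache": (50, 5220),
    "Gnome": (45, 500),
    "MySQL": (44, 44000),
}


def sampled_fraction(studied: int, total: int) -> float:
    """Fraction of the available reports that were studied by hand."""
    return studied / total


for app, (studied, total) in SAMPLES.items():
    print(f"{app}: {studied}/{total} = {sampled_fraction(studied, total):.2%}")
```

This works out to roughly 0.96% for Apache, 9% for Gnome, and 0.1% for MySQL, which is why the slide asks whether the results generalize.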

  13. Limitations & Discussions • Why so few transient faults? • Do people tend not to report transient bugs? • The study ignores how frequently each bug occurs • Do more reliable systems have more transient bugs?

  14. Related Work • 5-13% of faults in the MVS OS and the DB2 and IMS databases were timing or synchronization related. [Sullivan91, Sullivan92] • 14% of faults in the Tandem GUARDIAN OS were timing and race conditions. [Lee and Iyer 93] • 29% were transient and could be recovered by the Tandem process pair. [Lee and Iyer 93]

  15. Conclusions • Classical application-generic recovery techniques, such as process pairs, that use no application-specific information will NOT be sufficient to enable these applications to survive most software faults.
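
The conclusion can be illustrated with a toy simulation. This is a sketch under stated assumptions, not a process-pair implementation: a plain in-process retry loop stands in for generic recovery, and `run_with_generic_recovery`, `handle_long_url`, and `RacyHandler` are hypothetical names invented for the example.

```python
def run_with_generic_recovery(operation, max_attempts=3):
    """Redo the operation on failure, using no knowledge of what it does.

    Returns (succeeded, attempts_used, result_or_last_error).
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return True, attempt, operation()
        except Exception as exc:  # generic: any failure just triggers a retry
            last_error = exc
    return False, max_attempts, last_error


def handle_long_url():
    """Environment-independent fault: the same input fails on every attempt."""
    url = "/" + "a" * 10_000
    if len(url) > 8_000:  # illustrative limit, not Apache's actual one
        raise ValueError("URL too long for buffer")
    return "200 OK"


class RacyHandler:
    """Transient fault: fails only under the 'unlucky' first interleaving."""

    def __init__(self):
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls == 1:
            raise RuntimeError("applet removed while request in flight")
        return "done"


print(run_with_generic_recovery(handle_long_url)[:2])  # deterministic fault
print(run_with_generic_recovery(RacyHandler())[:2])    # transient fault
```

In this toy model the retry masks only the transient fault; the deterministic one fails on every attempt, which is the behavior the study argues applies to most of the reported bugs.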
