Locating Causes of Program Failures at Texas State University CS 5393 Software Quality Project

Locating Causes of Program Failures Texas State University CS 5393 Software Quality Project Yin Deng

Topics • Introduction • What is the problem? • Overview of major solutions • A Sample Failure • Case Study • Complexity and other issues • Conclusion • Related Material

Introduction • Locating Causes of Program Failures • Holger Cleve and Andreas Zeller • ICSE 2005, research paper on fault localization • Holger Cleve is a member of the software engineering research group at Saarland University in Germany. • Andreas Zeller is a full professor and the chair of the software engineering research group at Saarland University. His research in SE focuses on analyzing why large, complex software systems fail to work as they should. http://www.st.cs.uni-sb.de/~cleve/ http://www.st.cs.uni-sb.de/zeller/

What’s the Problem? • Definitions • Failure: A program’s behavior doesn’t satisfy its requirement specification. • Fault / Infection: An incorrect intermediate state that may be entered during program execution. • Failure → Infection → Defect in code, but not vice versa. • Problem • Why does a program fail? • How to find the defects that cause a software failure?

Overview of major solutions • Searching in Space • Search across a program state to find the infected variable(s), often among thousands. • Focus on the differences between the program states where the failure occurs and the states where it does not occur. • Using Delta Debugging, those initial differences can be systematically narrowed down to a small set of variables. • Searching in Time • Search over millions of program states to find the moment when the defect was executed. • Focus on cause transitions (CTS)!

Searching in Space • Compare the program states of a passing run r✔ and a failing run r✘ at a certain moment. • Of all the state differences, only some may be relevant for the failure. • How to find a subset of relevant variables that is as small as possible? • Delta Debugging, which behaves very much like a binary search.
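The binary-search flavor of this narrowing can be sketched in C. This is a deliberately simplified halving, not the full Delta Debugging algorithm: it assumes exactly one of the k state differences is responsible for the failure. The oracle type `fails_with_fn` and the names `isolate_difference` and `culprit_in_range` are hypothetical, introduced only for illustration; in the real technique the oracle would apply the chosen differences to the passing run and resume execution.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical test oracle: returns true when applying the state
   differences with indices in [lo, hi) to the passing run makes the
   program fail. */
typedef bool (*fails_with_fn)(size_t lo, size_t hi, void *ctx);

/* Narrow k state differences down to a single failure-inducing one.
   Simplifying assumption: exactly one difference is responsible, so
   testing one half always tells us which half contains it. */
size_t isolate_difference(size_t k, fails_with_fn fails_with, void *ctx,
                          size_t *tests_run)
{
    size_t lo = 0, hi = k;

    assert(k > 0);
    *tests_run = 0;
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        ++*tests_run;
        if (fails_with(lo, mid, ctx))
            hi = mid;   /* the cause lies in the lower half */
        else
            lo = mid;   /* otherwise it must be in the upper half */
    }
    return lo;
}

/* Illustrative oracle: the run fails if and only if the difference
   whose index is stored in *ctx is among the applied ones. */
bool culprit_in_range(size_t lo, size_t hi, void *ctx)
{
    size_t culprit = *(const size_t *)ctx;
    return lo <= culprit && culprit < hi;
}
```

With 8 differences and the culprit at index 5, the sketch needs 3 oracle calls, which matches the logarithmic behavior the slide describes. When several differences must be applied together to trigger the failure, this halving is no longer sufficient and the full algorithm is needed.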

Searching in Time • A cause transition is where a cause originates. It points to the program code that causes the transition and hence the failure. • During a transition, some variables cease to be a failure cause and other variables begin to be one. • Cause transitions are not only good locations for fixes, they actually locate the defects that cause the failure.

Example – Source Code
 1 /* sample.c -- Sample C program */
 2
 3 #include <stdio.h>
 4 #include <stdlib.h>
 5
 6 static void shell_sort(int a[], int size)
 7 {
 8     int i, j;
 9     int h = 1;
10     do {
11         h = h * 3 + 1;
12     } while (h <= size);
13     do {
14         h /= 3;
15         for (i = h; i < size; i++)
16         {
17             int v = a[i];
18             for (j = i; j >= h && a[j - h] > v; j -= h)
19                 a[j] = a[j - h];
20             if (i != j)
21                 a[j] = v;
22         }
23     } while (h != 1);
24 }
25
26 int main(int argc, char *argv[])
27 {
28     int i = 0;
29     int *a = NULL;
30
31     a = (int *)malloc((argc - 1) * sizeof(int));
32     for (i = 0; i < argc - 1; i++)
33         a[i] = atoi(argv[i + 1]);
34
35     shell_sort(a, argc);
36
37     for (i = 0; i < argc - 1; i++)
38         printf("%d ", a[i]);
39     printf("\n");
40
41     free(a);
42     return 0;
43 }

Example – Running Result • A passing run r✔: $ sample 9 8 7 prints 7 8 9 • A failing run r✘: $ sample 11 14 prints 0 11 • What’s wrong?

Example – Searching in Space • State differences between r✔ and r✘. One of these differences causes sample to fail.

Example – Searching in Space (cont.) • Procedure • Run r✔ up to Line 9. • Apply half of the differences to r✔. • Resume execution and determine the outcome. • Result • At Line 9, a[2] being zero causes sample to fail. • What causes a[2] to be zero?

Example – Searching in Time

Example – Searching in Time (cont.) • Procedure • Find an interval of matches to start with; • There is a cause transition between argc in Step 1 and a[0] in Step 44; • Use Delta Debugging to find the relevant variables between argc and a[0] (function calls are preferred); a[2] is isolated; • CTS: Step 26 (a[2] again); • CTS: Step 35 (v). • Result • argc → a[2] in Lines 32–35 (Steps 8–11); • a[2] → v in Line 17 (Step 29); • v → a[0] in Line 21 (Step 36).
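The narrowing of an interval down to a single cause transition can be sketched as a binary search over execution steps. This is a minimal sketch under stated assumptions: the oracle `still_cause_fn` is hypothetical and stands in for a full Delta Debugging run on the program states at a given step, and `find_cause_transition` and `before_step` are illustrative names, not part of the original tool.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical oracle: true while the earlier variable is still the
   failure cause at 'step'. In the real technique each query would be a
   Delta Debugging run comparing the two runs' states at that step. */
typedef bool (*still_cause_fn)(long step, void *ctx);

/* Binary search for the step at which the failure cause moves from one
   variable to another.
   Precondition: still_cause(first) is true, still_cause(last) is false.
   Returns the first step at which the earlier variable is no longer
   the cause. */
long find_cause_transition(long first, long last,
                           still_cause_fn still_cause, void *ctx,
                           int *queries)
{
    assert(first < last);
    *queries = 0;
    while (last - first > 1) {
        long mid = first + (last - first) / 2;
        ++*queries;
        if (still_cause(mid, ctx))
            first = mid;   /* transition happens after mid */
        else
            last = mid;    /* transition happened at or before mid */
    }
    return last;
}

/* Illustrative oracle: the earlier variable stops being the cause at
   the step stored in *ctx. */
bool before_step(long step, void *ctx)
{
    return step < *(const long *)ctx;
}
```

Searching Steps 1 to 44 for a transition placed at Step 29 (the a[2] → v transition listed above) takes 6 oracle queries in this sketch. Repeating the search on each remaining interval yields the other transitions.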

Example – Debugging Result
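The slide's figure is not reproduced in this transcript. The chain argc → a[2] → v → a[0] is consistent with Line 35 passing argc rather than argc - 1 as the array size, so that shell_sort also sorts a[2], one element past the allocated array. The sketch below simulates this without reading out of bounds: the buffer has one spare slot set to zero, the value observed for a[2] in the failing run. The function `sort_args` and its `extra` parameter are hypothetical, introduced only to switch between the two calls.

```c
#include <assert.h>
#include <stddef.h>

/* Same algorithm as shell_sort in sample.c, repeated here so the block
   compiles on its own. */
static void shell_sort(int a[], int size)
{
    int i, j;
    int h = 1;
    do {
        h = h * 3 + 1;
    } while (h <= size);
    do {
        h /= 3;
        for (i = h; i < size; i++) {
            int v = a[i];
            for (j = i; j >= h && a[j - h] > v; j -= h)
                a[j] = a[j - h];
            if (i != j)
                a[j] = v;
        }
    } while (h != 1);
}

/* Sorts n arguments the way main does. 'extra' is a hypothetical knob:
   1 mimics shell_sort(a, argc), 0 mimics shell_sort(a, argc - 1).
   buf must have room for n + 1 ints; buf[n] stands in for the
   uninitialized element and is set to 0, as in the failing run. */
void sort_args(int buf[], int n, int extra)
{
    buf[n] = 0;
    shell_sort(buf, n + extra);
}
```

For the arguments 11 14, calling `sort_args(buf, 2, 1)` leaves 0 and 11 in the first two slots, which reproduces the failing output, while `sort_args(buf, 2, 0)` leaves 11 and 14.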

Case Study: The GCC Failure The program that crashes GCC

Complexity • Searching in space • Best case: Delta Debugging needs $2s \log k$ test runs to isolate $s$ failure-inducing variables from $k$ state differences. • Worst case: $k^2 + 3k$ test runs. • In practice, Delta Debugging behaves much more logarithmically than linearly. • Searching in time • A simple binary search over $n$ program steps, repeated for each cause transition. • For $m$ cause transitions, we need $m \log n$ runs of Delta Debugging.
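To give a feel for the gap between the two bounds for searching in space, here is a worked example. The values $s = 1$ and $k = 1024$ are illustrative and not taken from the paper, and the logarithm is assumed to be base 2.

```latex
\begin{align*}
\text{best case: }  & 2s \log_2 k = 2 \cdot 1 \cdot \log_2 1024 = 20 \text{ test runs},\\
\text{worst case: } & k^2 + 3k = 1024^2 + 3 \cdot 1024 = 1\,051\,648 \text{ test runs}.
\end{align*}
```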

Practical Issues • Accessing state • Currently using GDB, which is painfully slow; • More efficient ways need to be explored. • Capturing accurate states • Several heuristics are used to determine how state is transferred; • When such heuristics fail, the state cannot be transferred. • Incomparable states • When control flow reaches different points in r✔ and r✘, the resulting states are not comparable, simply because the sets of local variables differ. • Some effort is required to determine where the control flows of r✔ and r✘ diverge and converge.

Conclusion • Cause transitions locate the software defect that causes a given failure, performing twice as well as any other technique previously known. • The technique requires an automated test, a means to observe and manipulate the program state, and at least one alternate passing test run. • The technique could be used as an add-on to running an automated test suite; we would know not only that a test has failed, but also why and where it failed.

Related Material • Isolating cause-effect chains from computer programs. • A. Zeller. In W. G. Griswold, editor, Proc. Tenth ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE-10), pages 1–10, Charleston, South Carolina, Nov. 2002. ACM Press. • Simplifying and isolating failure-inducing input. • A. Zeller and R. Hildebrandt. IEEE Transactions on Software Engineering, 28(2):183–200, Feb. 2002. • Visualizing memory graphs. • T. Zimmermann and A. Zeller. In S. Diehl, editor, Proc. of the International Dagstuhl Seminar on Software Visualization, volume 2269 of Lecture Notes in Computer Science, pages 191–204, Dagstuhl, Germany, May 2002. Springer-Verlag. • Why Programs Fail: A Guide to Systematic Debugging. • A. Zeller. Morgan Kaufmann Publisher, October, 2005. • ISBN 1558608664.

Any Questions?
