Bug Isolation via Remote Program Sampling

Bug Isolation via Remote Program Sampling • Ben Liblit, Alex Aiken, Alice X. Zheng, Michael I. Jordan • UC Berkeley • Presented by Chao Liu

Debugging is Hard • Limited Resources • Time • Human Effort • Test Cases • Triage and Guesswork • Windows 2000: 35M LOC, 63,000 known bugs at the time of release, about 2 per 1000 lines • Quoted from Monica Lam’s slides

Leverage End-users • [Figure: Program Source → Compiler (with Sampler and Predicates) → Shipping Application → Counts & ☺/☹ → Statistical Debugging → Top bugs with likely causes] • Courtesy of Ben Liblit

Outline • Low-overhead Sampling • Bug Isolation • Related Work • Conclusion and Discussion

Low-overhead Sampling • Program predicate • Any proposition • Fingerprint of execution • Straightforward Checking

Periodic Countdown
counter = 100;
while ( … ) {
    if (--counter == 0) {
        check(p != NULL);
        counter = 100;
    }
    p = p->next;
    if (--counter == 0) {
        check(i < max);
        counter = 100;
    }
    total += size[i];
}

Randomize it! • No free lunch

From Bernoulli to Geometric • Randomized • Fair • Low-overhead
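The step from Bernoulli to geometric can be sketched in C as follows. Tossing a biased coin at every site is fair but expensive; counting the tosses until the next head gives the same sampling behavior while leaving only a decrement and test on the fast path. This is a minimal illustration under assumptions, not the authors' instrumentation: the names `coin_toss`, `next_countdown`, and `should_sample` are hypothetical, and `rand()` stands in for a better random source. The loop literally counts tosses to make the equivalence visible; a real implementation could instead draw the gap in closed form or take it from a precomputed bank of countdowns.

```c
#include <assert.h>
#include <stdlib.h>

/* One biased coin toss: returns 1 with probability `density`. */
static int coin_toss(double density)
{
    return rand() / ((double)RAND_MAX + 1.0) < density;
}

/* Number of sites until the next sample (always >= 1).
 * Counting tosses up to and including the first head yields a
 * geometrically distributed gap with mean 1/density. */
static long next_countdown(double density)
{
    assert(density > 0.0 && density <= 1.0);
    long gap = 1;
    while (!coin_toss(density))
        gap++;
    return gap;
}

/* Fast path run at every instrumentation site: decrement and test.
 * Returns 1 when this site should be sampled, and then resets the
 * countdown to a fresh random gap. */
static int should_sample(long *countdown, double density)
{
    if (--*countdown > 0)
        return 0;
    *countdown = next_countdown(density);
    return 1;
}
```

A caller would seed the countdown once with `next_countdown(density)` and then guard each check with `if (should_sample(&countdown, density)) check(...)`.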

Bug Isolation • Assumptions • Predicates capture incorrect behavior. • Each predicate P should always be false during correct execution. • Therefore, when P is true, the program • either fails (a deterministic bug) • or is at increased risk of failing (a nondeterministic bug).

Isolating Deterministic Bugs • Winnowing Strategy • Predicates observed true on some bad runs • Predicates never observed true on any good run • Case Study: ccrypt • Instrument 570 call sites that return scalar values • 3 predicates per site (< 0, == 0, > 0): 3 × 570 = 1710 counters • Simulate a large user community • 2990 randomized runs; 88 crashes
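The counter arithmetic in the case study comes from tracking the sign of each returned value: one counter each for negative, zero, and positive. A minimal sketch of that bookkeeping, with hypothetical names (`counters`, `record_return`) and without the sampling guard that would surround it in practice:

```c
#include <assert.h>

enum { NUM_SITES = 570 };   /* call sites in the ccrypt case study */

/* counters[site][0]: returned value < 0
 * counters[site][1]: returned value == 0
 * counters[site][2]: returned value > 0 */
static unsigned long counters[NUM_SITES][3];

/* Record one observed return value at the given call site. */
static void record_return(int site, long value)
{
    assert(site >= 0 && site < NUM_SITES);
    /* Map the sign of the value (-1, 0, +1) to a slot (0, 1, 2). */
    int slot = (value > 0) - (value < 0) + 1;
    counters[site][slot]++;
}
```

Each run would then report the full 570 × 3 table of counts together with whether the run crashed.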

Winnowing • 1710 counters • 1569 are always zero • 141 remain • 139 are nonzero on some successful run • Not much left! Only 2 remain: • file_exists() > 0 • xreadline() == 0 • Courtesy of Ben Liblit
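The elimination above can be sketched as a single pass over the reported counters. This is an illustrative sketch rather than the authors' tooling: the function name `winnow`, the row-major layout of `counts`, and the `failed` flag array are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* counts[run * num_preds + pred] is how often `pred` was observed true
 * in `run`; failed[run] is nonzero if that run crashed.
 * Sets keep[pred] to 1 for predicates that were true in at least one
 * failed run and never true in a successful run. Returns the number
 * of surviving predicates. */
static size_t winnow(const unsigned long *counts, const int *failed,
                     size_t num_runs, size_t num_preds, int *keep)
{
    assert(counts != NULL && failed != NULL && keep != NULL);
    size_t survivors = 0;
    for (size_t p = 0; p < num_preds; p++) {
        int true_in_bad = 0, true_in_good = 0;
        for (size_t r = 0; r < num_runs; r++) {
            if (counts[r * num_preds + p] == 0)
                continue;               /* never observed true in this run */
            if (failed[r])
                true_in_bad = 1;
            else
                true_in_good = 1;
        }
        keep[p] = true_in_bad && !true_in_good;
        survivors += (size_t)keep[p];
    }
    return survivors;
}
```

Applied to the ccrypt data, a filter of this shape is what reduces 1710 counters to the two predicates listed on the slide.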

Nondeterministic Bugs • Logistic Regression

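Maximum Likelihood Estimation • Maximize the log-likelihood function $LL(\beta) = \sum_{i=1}^{n} \left[ y_i \log \mu_\beta(x_i) + (1 - y_i) \log\left(1 - \mu_\beta(x_i)\right) \right]$ • where $\mu_\beta(x) = 1 / (1 + e^{-\beta^T x})$ is the modeled probability that a run with predicate counters $x$ fails, and $y_i \in \{0, 1\}$ records whether run $i$ failed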

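Regularized Logistic Regression • Maximize the penalized log-likelihood function $LL(\beta) - \lambda \lVert \beta \rVert_1$ • where $LL(\beta)$ is the log-likelihood of the logistic model, $\lVert \beta \rVert_1 = \sum_j |\beta_j|$, and $\lambda \ge 0$ sets the strength of the penalty, which drives most coefficients to zero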
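The slides do not say how the maximization is carried out. One common approach is gradient ascent, sketched below under stated assumptions: $y_i \in \{0, 1\}$ marks whether run $i$ failed, $x_{ij}$ is the count of predicate $j$ in run $i$, $\mu_i = 1 / (1 + e^{-\beta^T x_i})$ is the modeled failure probability, the penalty is the $\ell_1$ norm with weight $\lambda$, and $\epsilon$ is a step size chosen by the user.

```latex
\begin{align*}
  \frac{\partial}{\partial \beta_j}
    \Bigl[ LL(\beta) - \lambda \lVert \beta \rVert_1 \Bigr]
    &= \sum_{i=1}^{n} (y_i - \mu_i)\, x_{ij}
       - \lambda \,\operatorname{sign}(\beta_j)
    \qquad (\beta_j \neq 0) \\
  \beta_j &\leftarrow \beta_j
    + \epsilon \Bigl[ \sum_{i=1}^{n} (y_i - \mu_i)\, x_{ij}
       - \lambda \,\operatorname{sign}(\beta_j) \Bigr]
\end{align*}
```

The first term pulls $\beta_j$ up when predicate $j$ is observed in runs that fail more often than the model predicts; the second term pulls every coefficient toward zero, so only predicates with consistent predictive value keep large weights.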

Case Study: bc-1.06
void more_arrays ()
{
    old_count = a_count;
    a_count += STORE_INCR;
    /* Copy the old arrays. */
    for (indx = 1; indx < old_count; indx++)
        arrays[indx] = old_ary[indx];
    /* Initialize the new elements. */
    for (; indx < v_count; indx++)
        arrays[indx] = NULL;
    …
}
Top-ranked predicates: • #1: indx > scale • #2: indx > use_math • #3: indx > opterr • #4: indx > next_func • #5: indx > i_base • Courtesy of Ben Liblit

Bug Found: Buffer Overrun
void more_arrays ()
{
    old_count = a_count;
    a_count += STORE_INCR;
    /* Copy the old arrays. */
    for (indx = 1; indx < old_count; indx++)
        arrays[indx] = old_ary[indx];
    /* Initialize the new elements. */
    for (; indx < v_count; indx++)    /* BUG: bound should be a_count, not v_count */
        arrays[indx] = NULL;
    …
}
Courtesy of Ben Liblit
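The overrun happens because the initialization loop is bounded by `v_count`, the size of a different table, rather than by `a_count`, the new size of `arrays`. A corrected version of the same pattern can be sketched as below. This is a simplified stand-in, not bc's actual code: `grow_arrays`, the `void *` element type, and the value of `STORE_INCR` are assumptions made so the example is self-contained.

```c
#include <assert.h>
#include <stdlib.h>

enum { STORE_INCR = 32 };   /* growth step; value assumed for illustration */

/* Grow *arrays from *count slots to *count + STORE_INCR slots.
 * Returns 0 on success, -1 if allocation fails (state unchanged). */
static int grow_arrays(void ***arrays, int *count)
{
    assert(arrays != NULL && count != NULL && *count >= 0);
    int old_count = *count;
    int new_count = old_count + STORE_INCR;
    void **old_ary = *arrays;
    void **new_ary = malloc((size_t)new_count * sizeof *new_ary);
    if (new_ary == NULL)
        return -1;

    int indx;
    new_ary[0] = NULL;      /* slot 0 is skipped by the loops below */
    /* Copy the old entries. */
    for (indx = 1; indx < old_count; indx++)
        new_ary[indx] = old_ary[indx];
    /* Initialize the new entries, bounded by this array's own new size. */
    for (; indx < new_count; indx++)
        new_ary[indx] = NULL;

    free(old_ary);
    *arrays = new_ary;
    *count = new_count;
    return 0;
}
```

Bounding the second loop by the array's own size removes the dependence on an unrelated counter, which is the relationship the top-ranked `indx > …` predicates point toward.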

Related Work • Fault Localization • Program spectra-based • NN/Perm [RR03], ASE’03 • Memory graph-based • Delta-Debugging [Z02], FSE’02 • Cause-Transition (CT) [CZ05], ICSE’05 • Predicate-based • Liblit03 [LA+03], PLDI’03 • Liblit05 [LN+05], PLDI’05 • SOBER [LY+05], FSE’05 • …

Quality Comparison • CT vs. NN/Perm [CZ05]

Shameless Advertisement [LY+05]

Conclusions • Fault localization is possible • Semantic bugs can also be localized • Intense competition in this problem

Discussion • How many of you believe in the applicability of fault localization? • Industry use, … • Personal use, … • Is a low overhead, say less than 10%, acceptable to you?

References • [RR03] M. Renieris and S. Reiss. Fault localization with nearest neighbor queries. In Proc. 18th IEEE Int. Conf. Automated Software Engineering (ASE’03), 2003. • [CZ05] H. Cleve and A. Zeller. Locating causes of program failures. In Proc. 27th Int. Conf. Software Engineering (ICSE’05), 2005. • [LN+05] B. Liblit, M. Naik, A. Zheng, A. Aiken, and M. Jordan. Scalable statistical bug isolation. In Proc. ACM SIGPLAN 2005 Int. Conf. Programming Language Design and Implementation (PLDI’05), 2005. • [LA+03] B. Liblit, A. Aiken, A. Zheng, and M. Jordan. Bug isolation via remote program sampling. In Proc. ACM SIGPLAN 2003 Int. Conf. Programming Language Design and Implementation (PLDI’03), pp. 141–154, 2003. • [Z02] A. Zeller. Isolating cause-effect chains from computer programs. In Proc. ACM 10th Int. Symp. Foundations of Software Engineering (FSE’02), 2002. • [LY+05] C. Liu, X. Yan, L. Fei, J. Han, and S. Midkiff. SOBER: Statistical model-based bug localization. In Proc. ACM 13th Int. Symp. Foundations of Software Engineering (FSE’05), 2005.
