Simplifying Failure: Finding Bugs in Programs Effectively

Vikas, Purdue • Simplifying Failure-Inducing Input

Problem • When testing a program, some test case fails • How do we find the bug? • Difficulties: • the test input is huge, say 800 lines • do we run a binary search by hand? • how much time would that take?

Problem - example • Mozilla web browser, 1999 • Bugzilla listed more than 370 bug reports • The project faced imminent doom • Mozilla opened a program called 'BugAThon' • Whoever finds bugs gets a prize • Finding bugs is a SERIOUS PROBLEM

Problem – example • One of the bugs: • Mozilla crashed after 95 user actions • It could not print a particular HTML page of 896 lines • The 95 user actions were simplified to 3 • The HTML page was simplified to 1 line • 896 lines of input reduced to just 1 line!

How to solve? • The “Delta Debugging” algorithm • Take a failing test case • and take a passing test case • Simplify that and produce a minimal test case • The simplified test case still produces the failure

Example - Continued

Conflicting issues • Decomposing a specific bug report into a simple test case • A bug report should be as specific as possible • On the other hand, a test case should be as simple as possible • Test case simplification does both! • it allows for short problem descriptions • it subsumes all details in the bug report

Delta Debugging – how • Define what a successful test case is • Feed in a failing test case • ddmin simplifies it by successive testing • Stop when a minimal test case is reached • At that point, removing any single input entity makes the failure disappear

Analogous example - flight simulation • Problem: the flight crashes a few seconds after take-off; how do we find the bug? • Repeat the situation again and again under changed circumstances • Find out what is relevant and what is not • e.g., leave out the passenger seats: still crashes • e.g., leave out the coffee vending machine: still crashes! • Hence both of them are irrelevant

Example continued - DDMin

DDMin • Not only minimises the failing input • Also maximises the passing input • Not limited to HTML input, character input, or program input • Can be applied to all circumstances that can make a program crash or that affect the program execution

input

Assumptions – reasons for failure • program code • data from storage or input devices • the program's environment • the specific hardware • the operating system • “All of the above are called circumstances”

Changes that cause failure • We are interested only in the changeable circumstances • In most cases, these changeable circumstances make up the program input

Definitions

What is the change (delta)? - decomposition? • There is no single prescribed way to obtain the changes • HTML example: • a delta can be a single character • a single tag • or a single line • HOW TO DECOMPOSE THEM?
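The three granularities from the HTML example can be made concrete with a small Python sketch. The sample page and the function names are made up for illustration; the tag splitter is a naive regular expression, not a real HTML parser.

```python
import re

# Made-up sample page; only the <SELECT ...> line echoes the case study.
html = '<td align=left>\n<SELECT NAME="priority" MULTIPLE SIZE=7>\n</td>\n'


def by_character(text):
    """Every character is one delta."""
    return list(text)


def by_line(text):
    """Every line (with its newline) is one delta."""
    return text.splitlines(keepends=True)


def by_tag(text):
    """Every tag, and every run of text between tags, is one delta."""
    return [piece for piece in re.split(r"(<[^>]*>)", text) if piece]


for decompose in (by_character, by_line, by_tag):
    deltas = decompose(html)
    # Joining the deltas must give back the original input.
    assert "".join(deltas) == html
    print(decompose.__name__, len(deltas))
```

Coarser deltas mean fewer tests but a less precise result; finer deltas can shrink the input further at the cost of more test runs.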

Definitions – composition of changes

Definitions - Test cases and tests • According to the POSIX standard for testing: • the test can succeed • the test can fail • the test can produce an indeterminate (unresolved) result • We need a function 'rtest' that takes a program run and gives one of the above outcomes.
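A three-valued test function along these lines might look as follows in Python. This is only a sketch: `Outcome`, `make_rtest`, and the stand-in program `toy_print` are hypothetical names, and mapping "any other exception" to an unresolved result is an assumption, not something the slides prescribe.

```python
from enum import Enum


class Outcome(Enum):
    PASS = "pass"              # the run succeeded
    FAIL = "fail"              # the run produced the failure we are chasing
    UNRESOLVED = "unresolved"  # anything else, e.g. the input was rejected


def make_rtest(program, failure_type):
    """Build an rtest-like function around a callable program (assumed interface)."""
    def rtest(run_input):
        try:
            program(run_input)
        except failure_type:
            return Outcome.FAIL
        except Exception:
            return Outcome.UNRESOLVED
        return Outcome.PASS
    return rtest


def toy_print(page):
    # Stand-in for the program under test: rejects malformed pages,
    # and "crashes" on a <SELECT> tag.
    if page.count("<") != page.count(">"):
        raise ValueError("malformed page")
    if "<SELECT>" in page:
        raise RuntimeError("simulated crash")


rtest = make_rtest(toy_print, RuntimeError)
print(rtest("<p>hi</p>"))  # prints Outcome.PASS
print(rtest("<SELECT>"))   # prints Outcome.FAIL
print(rtest("<SELECT"))    # prints Outcome.UNRESOLVED
```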

Definitions – test case and test

Test cases

Minimizing test case

Minimizing algorithm

The Algorithm

Complexity of DDMin • The worst-case number of tests is Cx^2 + 3·Cx, where Cx is the number of entities in the failing test case • The worst case has 2 phases • 1. Every test has an unresolved result • then we go up to the maximum granularity of Cx • # of tests carried out is 2 + 4 + 8 + ... + 2·Cx • ≈ 4·Cx • 2. Only testing the last complement (∆n) fails • this results in Cx − 1 calls • Total: 2(Cx−1) + 2(Cx−2) + ... + 2 = Cx^2 − Cx • Adding up everything: 4·Cx + Cx^2 − Cx = Cx^2 + 3·Cx
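The two sums can be written out step by step. In the derivation below, n stands for the size of the failing test case (what the slide calls Cx), and n is assumed to be a power of two so that the doubling in phase 1 ends exactly at n.

```latex
% n : number of entities in the failing test case (Cx on the slide)
\begin{align*}
T_1 &= 2 + 4 + 8 + \dots + 2n = 4n - 2 < 4n
      && \text{(phase 1: granularity doubles up to } n\text{)} \\
T_2 &= 2(n-1) + 2(n-2) + \dots + 2
     = 2\sum_{k=1}^{n-1} k = n(n-1) = n^2 - n
      && \text{(phase 2: } n-1 \text{ calls, two tests each)} \\
T_1 + T_2 &< 4n + n^2 - n = n^2 + 3n
\end{align*}
```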

Minimizing algorithm 2 8 16

Case Study 1 – GCC gets a fatal signal! - run in WYNOT • gcc crashes when compiling the program bug.c • but it crashes only with some optimization options given • it does not crash with all optimizations enabled! • the code is 755 characters, each character a component (hence it may contain a lot of useless C code)

Case study 1 • size of z = 1 • z[1] will segfault

Minimizing the test case 177 100 77

Minimizing GCC options • gcc -O -fforce-addr bug.c

Case study – Mozilla crashing • One of the bug reports filed against Mozilla • The following operations cause Mozilla to crash: • Start Mozilla • Go to bugzilla.mozilla.org • Print to file, setting the margins to .50 • Once it's done printing, do the exact same thing again on the same file (/var/tmp/netscape.ps) • This causes the browser to crash with a segfault

Mozilla crashes • Mozilla's input consists of two items • 1. The sequence of input events • i.e. the succession of mouse motions, • pressed keys, and clicked buttons • captured with XLAB: 711 actions • 2. The HTML code of the erroneous WWW page

Mozilla crashes • Out of 711 actions, only 95 were user actions; the rest were notifications from the X server • Out of the 95 user actions, only 3 are left after 82 test runs: • Invoke the Print dialog • Press mouse button 1 on the Print button • Release mouse button 1

Mozilla crashes • 95 user actions • after 82 runs: 3 user actions

Mozilla crashes – excerpt of input

Mozilla crashes – sample run

Mozilla crashes - run • 896 lines • after 58 runs: 1 line

Mozilla crashes! - 1 line • <SELECT NAME="priority" MULTIPLE SIZE=7> is the culprit • or just <SELECT> • In other words, the bug report is now just: • Create an HTML page containing “<SELECT>” • Load the page and print it using the Alt+P command • The browser crashes with a segmentation fault • Or: printing <SELECT> crashes!

Minimizing fuzz • Bart Miller and his team examined the robustness of UNIX utilities by sending them fuzz input (a huge number of random characters) • They found that 40% of the basic programs crashed or went into infinite loops • The ddmin algorithm was tested on fuzz input sequences • for NROFF, TROFF, FLEX, CRTPLOT, UL, and UNITS
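To make "fuzz input" concrete, here is a minimal Python sketch of a random-character generator. It is an assumption for illustration only: the slides do not describe the generator used in the original study, and this version sticks to printable ASCII and a fixed seed so that a failing input can be reproduced and then minimized.

```python
import random


def fuzz(length, seed=None):
    """Return `length` random printable ASCII characters (assumed alphabet)."""
    rng = random.Random(seed)  # a private generator keeps runs reproducible
    return "".join(chr(rng.randrange(32, 127)) for _ in range(length))


sample = fuzz(1000, seed=1)
print(len(sample))  # prints 1000
```

A string produced this way can be fed to the utility under test; if the utility crashes, the same string becomes the failing test case to minimize.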

Minimizing fuzz

Simplifying failure-inducing input • The 3 case studies discussed show that • the larger the simplified input, the higher the number of tests required • because determining 1-minimality of a test case with n entities requires at least n tests • since each individual entity is removed and tested in turn • for FLEX, the number of tests varies from up to 104 for low precision to 36,000 for high precision

Other approaches? • Simply stop the process when a certain time limit is reached • Simply stop the process when the input test case has been reduced by a certain extent • A better approach is 'isolation' • Find one relevant part of the test case: removing this particular part makes the failure go away • Simplifying, by contrast, meant that the simplified test case contained all the relevant parts

Isolating example • 7 tests!

Simplifying • 26 tests

Isolation

Future work • Domain-specific methods • Knowledge about the input structure can greatly enhance performance • e.g., valid program inputs are described by grammars; it would be nice to rely on such grammars • this can exclude syntactically invalid inputs
