An Empirical Study on Testing and Fault Tolerance for Software Reliability Engineering

Michael R. Lyu, Zubin Huang, Sam Sze, Xia Cai • The Chinese University of Hong Kong

Outline • Introduction • Motivation • Project Descriptions and Experimental Procedure • Static Analysis of Mutants: Fault Classification and Distribution • Dynamic Analysis of Mutants: Effects on Software Testing and Fault Tolerance • Software Testing using Domain Analysis • Conclusion

Introduction • Fault removal and fault tolerance are two major approaches in software reliability engineering • Software testing is the main fault removal technique • Data flow coverage testing • Mutation testing • The main fault tolerance technique is software design diversity • Recovery blocks • N-version programming • N self-checking programming
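As a rough illustration of N-version programming, the sketch below runs a majority vote over the outputs of several independently developed versions. The function name `majority_vote` and the choice to return `None` when no majority exists are assumptions made for this example; the slides do not describe a specific voter.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value produced by a strict majority of versions, else None.

    `outputs` holds one result per program version (hypothetical interface).
    """
    value, count = Counter(outputs).most_common(1)[0]
    # A strict majority is required; otherwise the voter reports no decision.
    return value if count > len(outputs) / 2 else None


# Two of three versions agree, so the single faulty output is masked.
print(majority_vote([42, 42, 41]))
# If two versions share the same fault, the wrong value wins the vote.
print(majority_vote([41, 41, 42]))
```

The second call shows why failure correlation among versions matters: a voter cannot mask a fault that a majority of versions have in common.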

Introduction • Conclusive evidence about the relationship between test coverage and software reliability is still lacking • Mutants with hypothetical faults are either too easily killed or too hard to activate • The effectiveness of design diversity depends heavily on the failure correlation among the multiple program versions, which remains a debatable research issue

Motivation • The lack of real world project data for investigation on software testing and fault tolerance techniques • The lack of comprehensive analysis and evaluation on software testing and fault tolerance together

Our Contribution • Conduct a real-world project that engages multiple teams in the independent development of program versions • Perform detailed experimentation to study the nature, source, type, detectability and effect of faults uncovered in the versions • Apply mutation testing with real faults and investigate data flow coverage, mutation coverage, and design diversity for fault coverage • Examine different hypotheses on software testing and fault tolerance schemes • Employ a new software test case generation technique based on a domain analysis approach and evaluate its effectiveness

Project descriptions • In spring 2002, 34 teams were formed to develop a critical industry application in a 12-week project for a software engineering course • Each team was composed of 4 senior-level undergraduate students majoring in computer science at the Chinese University of Hong Kong

Project descriptions • The RSDIMU project • Redundant Strapped-Down Inertial Measurement Unit • RSDIMU System Data Flow Diagram

Software development procedure • Initial design document (3 weeks) • Final design document (3 weeks) • Initial code (1.5 weeks) • Code passing unit test (2 weeks) • Code passing integration test (1 week) • Code passing acceptance test (1.5 weeks)

Program metrics

Mutant creation • Revision control was applied in the project and code changes were analyzed • Faults found during each stage were identified and injected into the final program of each version to create mutants • Each mutant contains one design or programming fault • 426 mutants were created for 21 program versions

Setup of evaluation test • The ATAC tool was employed to analyze and compare testing coverage • 1200 test cases were exercised on 426 mutants • All the resulting failures from each mutant were analyzed, their coverage measured, and cross-mutant failure results compared • 60 Sun machines running Solaris were involved in the test • One cycle took 30 hours, and a total of 1.6 million files (around 20GB) were generated

Static analysis: fault classification and distribution • Mutant defect type distribution • Mutant qualifier distribution • Mutant severity distribution • Fault distribution over development stage • Mutant effect code lines

Static Analysis result (1) Qualifier Distribution Defect Type Distribution

Static Analysis result (2) Severity Distribution

Static Analysis result (3) Development Stage Distribution Fault Effect Code Lines

Dynamic analysis of mutants • Software testing related • Effectiveness of code coverage • Test case contribution: test coverage vs. mutant coverage • Finding non-redundant set of test cases • Software fault tolerance related • Relationship between mutants • Relationship between the programs with mutants

Test case description

Fault Detection Related to Changes of Test Coverage

Relations between Numbers of Mutants against Effective Percentage of Coverage

Test Case Contribution on Program Coverage

Percentage of Test Case Coverage

Test Case Contributions on Mutant Average: 248 (58.22%) Maximum: 334 (78.40%) Minimum: 163 (38.26%)
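The percentages above can be checked with a few lines of arithmetic. The sketch assumes each figure is a count of mutants killed by a single test case, expressed as a share of the 426 mutants; the helper names `kill_percentage` and `contributions` and the boolean kill-matrix layout are illustrative assumptions, not part of the study's tooling.

```python
TOTAL_MUTANTS = 426  # number of mutants reported in the study


def kill_percentage(killed, total=TOTAL_MUTANTS):
    """Share of mutants killed, as a percentage rounded to two decimals."""
    return round(100.0 * killed / total, 2)


def contributions(kill_matrix):
    """Count mutants killed per test case.

    kill_matrix[t][m] is True when test case t makes mutant m fail
    (hypothetical layout chosen for this example).
    """
    return [sum(row) for row in kill_matrix]


for label, killed in [("Average", 248), ("Maximum", 334), ("Minimum", 163)]:
    print(label, killed, kill_percentage(killed))
```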

Non-redundant Set of Test Cases • Gray: redundant test cases (502/1200) • Black: non-redundant test cases (698/1200) • Reduced set size: 58.2% of the original 1200 test cases
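The slide does not state how redundancy was determined. One plausible reading, sketched below, treats a test case as redundant when every mutant it kills has already been killed by a test case kept earlier. The function name `non_redundant` and the set-based input format are assumptions for illustration only.

```python
def non_redundant(kills):
    """Return indices of test cases that kill at least one new mutant.

    kills[i] is the set of mutant ids killed by test case i
    (hypothetical representation). Test cases are scanned in order.
    """
    covered = set()  # mutants killed by the test cases kept so far
    kept = []
    for index, killed in enumerate(kills):
        if not killed <= covered:  # kills something not yet covered
            kept.append(index)
            covered |= killed
    return kept


example = [{1, 2}, {2}, {3}, {1, 3}, set()]
print(non_redundant(example))
```

Because the scan is order-dependent, a different ordering of test cases can yield a different (and possibly smaller) retained set.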

Mutants Relationship • Related mutants: two mutants have the same success/failure result on the 1200-bit binary string • Similar mutants: two mutants have the same binary string and the same erroneous output variables • Exact mutants: two mutants have the same binary string and the same erroneous output variables, and their erroneous output values are exactly the same
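The three levels of relationship defined on this slide nest inside one another, which the following sketch makes explicit. It labels the levels related, similar, and exact, following the later slide titles. The `MutantResult` record and its field names are hypothetical; the study's actual data format is not given in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MutantResult:
    outcomes: str        # one '0'/'1' character per test case (pass/fail)
    bad_vars: frozenset  # names of the erroneous output variables
    bad_values: tuple    # the erroneous output values themselves


def relationship(a, b):
    """Classify a pair of mutants by the strongest level they satisfy."""
    if a.outcomes != b.outcomes:
        return "unrelated"
    if a.bad_vars != b.bad_vars:
        return "related"   # same pass/fail string only
    if a.bad_values != b.bad_values:
        return "similar"   # same string and same erroneous variables
    return "exact"         # erroneous values also identical


m1 = MutantResult("0110", frozenset({"heading"}), (1.5,))
m2 = MutantResult("0110", frozenset({"heading"}), (2.0,))
print(relationship(m1, m2))
```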

Program Versions with Similar Mutants

Program Versions with Exact Mutants

Relationship between the Programs with Exact Mutants • Exact Pair: Versions 4 and 8 • Exact Fault Pair 3: Versions 15 and 33 • Exact Fault Pair 2: Versions 12 and 31

Relationship between the Programs with Exact Mutants Exact Fault Pairs: Versions 4, 15 and 17 Exact Fault Pair 7: Versions 31 and 32

Software Testing using Domain Analysis • A new approach has been proposed to generate test cases based on domain analysis of specifications and programs • The differences of functional domain and operational domain are examined by analyzing the set of boundary conditions • Test cases are designed by verifying the overlaps of operational domain and functional domain to locate the faults resulting from the discrepancies between these two domains • 90 new test cases are developed, and all the 426 mutants can be killed by these test cases
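The slides do not give the test generation procedure in detail. As a minimal sketch of the general idea, assume a single numeric input whose valid range is defined once by the specification (functional domain) and once by the implementation (operational domain). Test points are placed on and next to each boundary, and any point where the two domains disagree exposes a fault. All names and the example ranges here are hypothetical.

```python
def boundary_points(low, high, step):
    """Points on each boundary and one step either side of it."""
    return [low - step, low, low + step, high - step, high, high + step]


def discrepancies(spec_accepts, impl_accepts, points):
    """Points where the functional and operational domains disagree."""
    return [p for p in points if spec_accepts(p) != impl_accepts(p)]


# Hypothetical example: the specification accepts 0 <= x <= 10,
# but the implementation mistakenly uses a strict upper bound.
def spec(x):
    return 0 <= x <= 10


def impl(x):
    return 0 <= x < 10


points = boundary_points(0, 10, 1)
print(discrepancies(spec, impl, points))
```

Only the point on the upper boundary separates the two domains in this example, which is why boundary-focused test cases can detect faults that interior points miss.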

Test cases generated by domain analysis

Contribution of Test Cases Generated by Domain Analysis Average: 183 (42.96%) Maximum: 223 (52.35%) Minimum: 139 (32.63%)

Non-redundant Test Set for Test Cases Generated by the Domain Analysis

Observation • Coverage measures and mutation scores cannot be evaluated in isolation, and an effective mechanism to distinguish related faults is critical • A good test case should be characterized not only by its ability to detect more faults, but also by its ability to detect faults which are not detected by other test cases in the same test set • Domain analysis is an effective approach to generating test cases

Observation • The individual fault detection capability of each test case in a test set does not represent the overall capability of the test set to cover more faults; the diversity of the test cases is more important • Design diversity involving multiple program versions can be an effective solution for software reliability engineering, since the portion of program versions with exact faults is very small • Software fault removal and fault tolerance are complementary rather than competitive, yet the quantitative tradeoff between the two remains a research issue

Conclusion • We performed an empirical investigation evaluating fault removal and fault tolerance as software reliability engineering techniques • Mutation testing was applied with real faults • Static as well as dynamic analysis was performed to evaluate the relationship between fault removal and fault tolerance techniques • Domain analysis was adopted to generate more powerful test cases
