#GHC13 Looking for Bugs in all the Right Places. Elaine Weyuker, October 3, 2013
Goal • To determine which files of a large software system are likely to contain the largest numbers of bugs in the future.
Why is this Important? • Help testers prioritize testing efforts. • Help developers decide when to do design and code reviews and what to re-implement. • Help managers allocate resources.
Approach • Verified that bugs were distributed non-uniformly among files. • Identified properties likely to affect fault-proneness, then built a statistical model and, ultimately, a tool to make predictions.
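One way to check whether bugs are concentrated in a few files is a Gini coefficient over per-file bug counts: 0 means every file has the same number of bugs, and values near 1 mean a handful of files hold nearly all of them. The slides do not say how non-uniformity was verified; the Gini coefficient here is an illustrative stand-in, and `gini` is a hypothetical helper name.

```python
from typing import Sequence


def gini(bug_counts: Sequence[int]) -> float:
    """Gini coefficient of per-file bug counts (0 = perfectly uniform).

    Illustrative only: the talk does not specify how non-uniformity was checked.
    """
    xs = sorted(bug_counts)          # the formula needs counts in ascending order
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0                   # no files or no bugs: treat as uniform
    # G = sum_i (2i - n - 1) * x_(i) / (n * sum x), with i = 1..n over sorted counts
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


print(gini([0, 0, 0, 4]))  # 0.75, the maximum possible value for n = 4
print(gini([1, 1, 1, 1]))  # 0.0
```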
Information Needed for Predictions • Size of file (KLOC). • Number of changes to the file in the previous 2 releases. • Number of bugs in the file in the previous release. • Age of file (number of releases the file has been in the system). • Language the file is written in.
Data Source • All of the systems we’ve studied to date use a configuration management system that integrates version control and change management functionality, including bug history. • Data is extracted automatically from the associated data repository and passed to the prediction engine.
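The extraction step boils down to aggregating raw repository records into the per-file attributes listed under Information Needed for Predictions. A minimal sketch, assuming hypothetical record types (`FileVersion`, `ChangeRecord`) rather than the real AT&T repository schema; it also assumes every change record counts as a "change" and that age means the number of earlier releases containing the file.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FileVersion:
    """Hypothetical snapshot of one file in one release."""
    path: str
    release: int      # 0-based release index
    kloc: float
    language: str


@dataclass
class ChangeRecord:
    """Hypothetical change-management record: one change to one file in one release."""
    path: str
    release: int
    is_bug: bool      # True if the change fixed a reported bug


@dataclass
class Features:
    kloc: float
    changes_prev2: int   # changes in the two releases before the target
    bugs_prev1: int      # bugs in the release just before the target
    age: int             # assumption: number of earlier releases containing the file
    language: str


def build_features(versions: List[FileVersion], changes: List[ChangeRecord],
                   target: int) -> Dict[str, Features]:
    """Aggregate predictors for every file present in release `target`."""
    feats: Dict[str, Features] = {}
    for v in (v for v in versions if v.release == target):
        age = len({w.release for w in versions if w.path == v.path and w.release < target})
        prev2 = sum(1 for c in changes
                    if c.path == v.path and target - 2 <= c.release < target)
        bugs1 = sum(1 for c in changes
                    if c.path == v.path and c.is_bug and c.release == target - 1)
        feats[v.path] = Features(v.kloc, prev2, bugs1, age, v.language)
    return feats
```

Records from the target release itself are ignored, since they would not exist yet when the prediction is made.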
Making Predictions • Used Negative Binomial Regression • Also considered machine learning algorithms including: • Recursive Partitioning • Random Forests • BART (Bayesian Additive Regression Trees)
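Negative binomial regression models each file's bug count as a count whose variance can exceed its mean, with the expected count linked to the predictors through exp(·). A minimal sketch, assuming the common NB2 form (variance = mu + alpha·mu²), a log link, and a fixed, user-chosen dispersion `alpha`, fitted by iteratively reweighted least squares in pure Python. The slides do not say how the model was fitted; a production fit would more likely use a library such as statsmodels, which can also estimate alpha. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_negbin(X: Sequence[Sequence[float]], y: Sequence[float],
               alpha: float = 1.0, max_iter: int = 100, tol: float = 1e-10) -> List[float]:
    """NB2 regression with log link and fixed alpha, fitted by IRLS.

    Each row of X must start with 1.0 so that beta[0] is the intercept.
    """
    n, p = len(y), len(X[0])
    eta = [math.log(yi + 0.5) for yi in y]          # common start: log(y + 0.5)
    beta = [0.0] * p
    for _ in range(max_iter):
        mu = [math.exp(e) for e in eta]
        w = [m / (1.0 + alpha * m) for m in mu]      # IRLS weights mu / (1 + alpha*mu)
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]  # working response
        xtwx = [[sum(w[i] * X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
                for j in range(p)]
        xtwz = [sum(w[i] * X[i][j] * z[i] for i in range(n)) for j in range(p)]
        new_beta = _solve(xtwx, xtwz)
        step = max(abs(nb - ob) for nb, ob in zip(new_beta, beta))
        beta = new_beta
        eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
        if step < tol:
            break
    return beta


def predict_bugs(beta: Sequence[float], row: Sequence[float]) -> float:
    """Expected bug count exp(x . beta) for one file's predictor row."""
    return math.exp(sum(b * x for b, x in zip(beta, row)))
```

In the talk's setting, each row would hold one file's attributes (size, recent changes, prior bugs, age, language indicators), possibly transformed; which transformations the model used is not stated in the slides.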
Prediction Tool • Consists of two parts. • The back end extracts data needed to make the predictions. • The front end makes the predictions and displays them.
Tool Functionality • Extracts necessary data from the repository. • Predicts how many bugs will be in each file in the next release of the system. • Sorts the files in decreasing order of the number of predicted bugs. • Displays results to the user.
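The front-end steps (predict, sort in decreasing order, display) can be sketched as follows. This is a minimal illustration, assuming the back end has already produced a mapping from file path to features and that `predict` is any callable returning an expected bug count; `rank_files`, `render`, and `run` are hypothetical names, and the tie-break by path is a design choice, not something the slides specify.

```python
from typing import Callable, Dict, List, Tuple, TypeVar

F = TypeVar("F")  # whatever feature object the back end produces


def rank_files(predicted: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort files by predicted bugs, highest first; ties broken by path for a stable view."""
    return sorted(predicted.items(), key=lambda item: (-item[1], item[0]))


def render(ranked: List[Tuple[str, float]], limit: int = 20) -> str:
    """Format the top `limit` files as a plain-text table."""
    rows = [f"{'Rank':>4}  {'Predicted':>9}  File"]
    for rank, (path, bugs) in enumerate(ranked[:limit], start=1):
        rows.append(f"{rank:>4}  {bugs:>9.2f}  {path}")
    return "\n".join(rows)


def run(features: Dict[str, F], predict: Callable[[F], float], limit: int = 20) -> str:
    """Front end: predict each file's bug count, sort, and render for display."""
    predicted = {path: predict(feat) for path, feat in features.items()}
    return render(rank_files(predicted), limit)
```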
Assessing Success • Percentage of actual bugs that occurred in the N% of files predicted to have the largest numbers of bugs (N = 20). • Considered other measures that are less sensitive to the specific value of N.
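The top-N% measure can be computed directly from predicted and actual per-file bug counts. A minimal sketch, assuming the file count k is rounded up and that bugs in files without a prediction are ignored; `cutoff_average` is a hypothetical illustration of one measure that is less sensitive to N (averaging the top-k percentage over every cutoff k), not necessarily one of the measures the talk refers to.

```python
import math
from typing import Dict, List


def _ranked(predicted: Dict[str, float]) -> List[str]:
    # Highest prediction first; sorted() is stable, so ties keep input order.
    return sorted(predicted, key=lambda f: -predicted[f])


def pct_bugs_in_top(predicted: Dict[str, float], actual: Dict[str, int],
                    n_percent: float = 20.0) -> float:
    """Percentage of actual bugs found in the top n_percent of files by prediction."""
    ranked = _ranked(predicted)
    k = math.ceil(len(ranked) * n_percent / 100)    # assumption: round the file count up
    total = sum(actual.get(f, 0) for f in ranked)   # bugs in unpredicted files are ignored
    if total == 0:
        return 0.0
    return 100.0 * sum(actual.get(f, 0) for f in ranked[:k]) / total


def cutoff_average(predicted: Dict[str, float], actual: Dict[str, int]) -> float:
    """Mean of the top-k bug percentage over every cutoff k = 1..n files."""
    ranked = _ranked(predicted)
    total = sum(actual.get(f, 0) for f in ranked)
    if not ranked or total == 0:
        return 0.0
    running, acc = 0, 0.0
    for f in ranked:
        running += actual.get(f, 0)
        acc += running / total
    return 100.0 * acc / len(ranked)
```

For example, with 10 files ranked f0 to f9 and actual bugs {f0: 6, f1: 2, f5: 2}, the top 20% (two files) contain 8 of 10 bugs, so `pct_bugs_in_top` returns 80.0.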
Fault Prediction Tool Overview [figure: Version Mgmt/Fault Database (previous releases); release to be predicted; user-supplied parameters; Prediction Engine (statistical analysis); fault-proneness predictions]
• User specifies that all problems reported in the System Test phase are counted as faults. • User enters the system name. • Available releases are found in the version management database. • User chooses the releases to analyze. • User selects 4 file types. • User asks for fault predictions for release “Bluestone2008.1”.
• User confirms the configuration. • User enters a filename to save the configuration. • User clicks the Save & Run button to start the prediction process.
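The choices made in the configuration screens amount to a small record that can be saved and reloaded. A minimal sketch, assuming a JSON file format and hypothetical field names (`PredictionConfig`, `fault_phases`, and so on); the slides do not describe how the tool actually stores configurations.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class PredictionConfig:
    """Hypothetical record of the settings chosen in the configuration screens."""
    system: str              # system name entered by the user
    releases: List[str]      # past releases chosen for analysis
    file_types: List[str]    # file types to include
    fault_phases: List[str]  # problem reports from these phases count as faults
    target_release: str      # release to predict


def to_json(cfg: PredictionConfig) -> str:
    return json.dumps(asdict(cfg), indent=2)


def from_json(text: str) -> PredictionConfig:
    return PredictionConfig(**json.loads(text))


def save_config(cfg: PredictionConfig, path: str) -> None:
    """Write the configuration to `path` (the 'Save' half of Save & Run)."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(to_json(cfg))
```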
Initial prediction view for Bluestone2008.1 • All files are listed in decreasing order of predicted faults.
Current Status • The prediction tool is fully operational. • 750 lines of Python • 2,150 lines of C (75K bytes compiled) • The current version’s back end is specific to the internal AT&T configuration management system but can be adapted to other configuration management systems. All that is needed is a source of the data required by the prediction model.
Other Factors We’ve Studied • Developers • Counts – How many people worked on the code in the most recent release or in all previous releases? • Individuals – Who worked on the code? • Calling Structure • How many calls are made from/to a file? • Are the calling/called files new, changed, or faulty? • Amount of Code Changed • How many lines were added, deleted, or changed?
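Each of these additional factors reduces to a simple per-file count. A minimal sketch, assuming hypothetical inputs: (file, developer) pairs from change records, (caller file, callee file) pairs from a call graph, and two versions of a file as lists of lines. The added/deleted/changed split uses Python's `difflib.SequenceMatcher` opcodes and treats paired lines in a `replace` block as changed; that convention is an assumption, not the talk's definition.

```python
import difflib
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def developer_counts(changes: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Number of distinct developers per file, from (file, developer) pairs."""
    devs: Dict[str, Set[str]] = defaultdict(set)
    for path, developer in changes:
        devs[path].add(developer)
    return {path: len(names) for path, names in devs.items()}


def fan_in_out(calls: Iterable[Tuple[str, str]]) -> Dict[str, Tuple[int, int]]:
    """Map each file to (calls into it, calls out of it), from (caller, callee) pairs."""
    fan_in: Counter = Counter()
    fan_out: Counter = Counter()
    for caller, callee in calls:
        fan_out[caller] += 1
        fan_in[callee] += 1
    return {f: (fan_in[f], fan_out[f]) for f in set(fan_in) | set(fan_out)}


def churn(old: List[str], new: List[str]) -> Dict[str, int]:
    """Count lines added, deleted, and changed between two versions of a file."""
    counts = {"added": 0, "deleted": 0, "changed": 0}
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":
            counts["added"] += j2 - j1
        elif tag == "delete":
            counts["deleted"] += i2 - i1
        elif tag == "replace":
            # Pair old and new lines as "changed"; any surplus is added or deleted.
            paired = min(i2 - i1, j2 - j1)
            counts["changed"] += paired
            counts["added"] += (j2 - j1) - paired
            counts["deleted"] += (i2 - i1) - paired
    return counts
```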
What’s Ahead? • Research • How well can we make predictions from the attributes available in systems that use other bug-reporting systems? • What are the most accurate models that can be built from those attributes? • What are the best ways to take advantage of fault predictions? • Can predictions be made for units smaller than files? • Can run-time attributes (execution time, execution frequency, memory use, …) be used to make fault predictions? • What is the most meaningful way to assess the effectiveness and accuracy of the predictions? • Engineering • Build prediction models for different configuration management systems and bug databases. • Design and build a better user interface. • Integrate the prediction tool into development and testing environments.