Comparison of Blackbox and Whitebox Fuzzers in Bug Detection

Comparison of Blackbox and Whitebox Fuzzers in Finding Software Bugs
Marjan Aslani, Nga Chung, Jason Doherty, Nichole Stockman, and William Quach
Summer Undergraduate Program in Engineering Research at Berkeley (SUPERB) 2008
Team for Research in Ubiquitous Secure Technology

Overview
• Introduction to fuzz testing
• Our research
• Result
"Comparison of Blackbox and Whitebox Fuzzers in Finding Software Bugs", Marjan Aslani

What Is Fuzzing?
• A method of finding software holes by feeding purposely invalid data as input to a program.
– B. Miller et al.; inspired by line noise
• Apps: image processors, media players, OS
• Fuzz testing is generally automated.
• Finds many problems related to reliability, many of which are potential security holes.

Types of Fuzz Testing
• Blackbox: Randomly generated data is fed to a program as input to see if it crashes.
– Does not require knowledge of the program source code or deep code inspection.
– A quick way of finding defects without knowing details of the application.
• Whitebox: Creates test cases considering the target program's logical constraints and data structures.
– Requires knowledge of the system and how it uses the data.
– Deeper penetration into the program.

Zzuf - Blackbox Fuzzer
• Finds bugs in applications by corrupting random bits in user-contributed data.
• To make new test cases, Zzuf uses a range of seeds and fuzzing ratios (corruption ratios).
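The seed-and-ratio idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Zzuf's actual algorithm: the function names `flip_random_bits` and `make_test_cases` and their parameters are hypothetical, chosen only to show how a seed and a corruption ratio together determine one reproducible test case.

```python
import random


def flip_random_bits(data: bytes, seed: int, ratio: float) -> bytes:
    """Return a copy of data with about ratio * (number of bits) bits flipped.

    Hypothetical helper for illustration; not Zzuf's implementation.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    rng = random.Random(seed)  # same seed and ratio give the same test case
    out = bytearray(data)
    total_bits = len(out) * 8
    n_flips = round(total_bits * ratio)
    # Pick distinct bit positions so that exactly n_flips bits change.
    for pos in rng.sample(range(total_bits), n_flips):
        out[pos // 8] ^= 1 << (pos % 8)
    return bytes(out)


def make_test_cases(data: bytes, seeds, ratio: float):
    """Yield (seed, fuzzed bytes) for every seed in the given range."""
    for seed in seeds:
        yield seed, flip_random_bits(data, seed, ratio)
```

Because the random generator is seeded, a crashing test case can be regenerated later from just the original file, the seed, and the ratio.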

Catchconv - Whitebox Fuzzer
• To create test cases, Catchconv (CC) starts with a valid input, observes the program's execution on this input, collects the path condition followed by the program on that sample, and attempts to infer related path conditions that lead to an error. It then uses these as the starting point for bug finding.
• CC has some downtime when it only traces a program and is not generating new fuzzed files.
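The trace-then-negate loop behind this style of fuzzing can be illustrated with a toy. The sketch below is not Catchconv's implementation: the target program `trace`, its three byte checks, and the helper names are all hypothetical. The toy can compute the new input directly because each condition involves a single byte, which is an assumption made here for brevity; it stands in for the solving step that real path conditions would need.

```python
def trace(data: bytes):
    """Toy target. Returns (path, crashed).

    Each path entry is (byte_index, compared_value, branch_taken).
    """
    path = []
    checks = [(0, ord("F")), (1, ord("U")), (2, ord("Z"))]
    for index, value in checks:
        taken = len(data) > index and data[index] == value
        path.append((index, value, taken))
        if not taken:
            return path, False  # early exit: deeper code is not reached
    return path, True  # every check passed: simulated bug


def negate_last(data: bytes, path) -> bytes:
    """Build an input that keeps the path prefix and flips the last branch."""
    index, value, taken = path[-1]
    new = bytearray(data.ljust(index + 1, b"\x00"))
    if taken:
        new[index] = (value + 1) % 256  # any byte that differs from value
    else:
        new[index] = value  # the one byte that satisfies the check
    return bytes(new)


def explore(seed: bytes, max_iterations: int = 10):
    """Flip the last branch repeatedly until the simulated bug is reached."""
    data = seed
    for _ in range(max_iterations):
        path, crashed = trace(data)
        if crashed:
            return data
        data = negate_last(data, path)
    return None
```

Starting from `b"AAAA"`, each iteration gets one branch deeper, which is the "deeper penetration" that purely random bit flips rarely achieve on multi-byte checks. The tracing pass in each iteration also produces no new test file, which mirrors the downtime noted above.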

Valgrind
• A tool for detecting memory management errors.
• Reports the line number in the code where the program error occurred.
• Helped us find and report more errors than we would have if we had focused solely on segmentation faults.

Types of errors reported by Valgrind
By tracking a program’s execution of a file, Valgrind determines the types of errors that occur, which may include:
• Invalid writes
• Invalid reads
• Double free
• Uninitialized values
• Syscall param
• Memory leak
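Sorting Valgrind output into these categories can be automated by matching on the message text. The sketch below assumes the usual wording of Memcheck messages (for example "Invalid write of size 4"); the category labels and the function names `classify_line` and `summarize` are hypothetical, and this is not the script used in the study.

```python
import re
from collections import Counter

# Left: substrings of typical Memcheck messages.
# Right: labels chosen for this sketch.
PATTERNS = [
    (re.compile(r"Invalid write of size"), "invalid write"),
    (re.compile(r"Invalid read of size"), "invalid read"),
    (re.compile(r"Invalid free\(\)"), "invalid or double free"),
    (re.compile(r"Syscall param"), "syscall param"),
    (re.compile(r"uninitialised value"), "uninitialized value"),
    (re.compile(r"definitely lost"), "memory leak"),
]


def classify_line(line: str):
    """Return the category label for one log line, or None if none matches."""
    for pattern, label in PATTERNS:
        if pattern.search(line):
            return label
    return None


def summarize(log_text: str) -> Counter:
    """Count how many log lines fall into each category."""
    counts = Counter()
    for line in log_text.splitlines():
        label = classify_line(line)
        if label is not None:
            counts[label] += 1
    return counts
```

Calling `summarize` on the captured error output of one run gives a per-type tally that can be stored alongside the test file that triggered it.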

Program run under Valgrind

Methodology

Metafuzz
All of the test files that triggered bugs were uploaded to Metafuzz.com. The webpage contained:
• Link to the test file
• Bug type
• Program that the bug was found in
• Stack hash number where the bug was located

Metafuzz webpage

Target applications
• MPlayer, Antiword, ImageMagick Convert, and Adobe Flash Player
• MPlayer was the primary target:
– Open-source software
– Preinstalled on many Linux distributions
– Updates available via Subversion
– Convenient to file a bug report
– Developers would get back to us!
• Adobe's bug-reporting protocol requires a bug to receive a number of votes from users before Flash developers will look at it.
• VLC requires building development versions from nightly snapshots.

Research Highlights
• In 6 weeks, we generated more than 1.2 million test cases.
• We used the UC Berkeley PSI cluster, which consists of 81 machines (270 processors). Zzuf, MPlayer, and CC were installed on them.
• We created a de-duplication script to find the unique bugs.
• We reported 89 unique bugs; developers have already eliminated 15 of them.
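The slides do not describe how the de-duplication script worked. One common approach, sketched here under that caveat, is to treat two reports as the same bug when they share an error type and the same top few stack frames. The report layout (`type`, `frames`), the depth of 3, and the function names are all assumptions made for illustration.

```python
import hashlib


def stack_hash(frames, depth: int = 3) -> str:
    """Hash the top `depth` frames of a stack trace (depth is an assumed choice)."""
    key = "|".join(frames[:depth])
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]


def unique_bugs(reports, depth: int = 3):
    """Keep the first report seen for each (error type, stack hash) pair."""
    seen = {}
    for report in reports:
        key = (report["type"], stack_hash(report["frames"], depth))
        seen.setdefault(key, report)
    return list(seen.values())
```

A shallow depth merges crashes that differ only far from the fault, while a deeper one splits them; choosing it is a trade-off between duplicate reports and missed distinct bugs.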

Result
To assess the two fuzzers, we gathered several metrics:
• Number of test cases generated
• Number of unique test cases generated
• Total bugs and total unique bugs found by each fuzzer

Result cont’d
• Generated 1.2 million test cases:
– 962,402 by Zzuf
– 279,953 by Catchconv
• From the test cases:
– Zzuf found 1,066,000 errors.
– Catchconv reported 304,936 errors.
• Unique (nonduplicate) errors found:
– 456 by Zzuf
– 157 by Catchconv

Result cont’d
• Zzuf reports a disproportionately larger number of errors than CC. Is Zzuf better than CC?
• No! The two fuzzers generated different numbers of test cases.
• How could we have a fair comparison of the fuzzers’ efficiency?
– Need to gauge the amount of duplicate work performed by each fuzzer.
– Find how many of these test cases were unique.
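Why raw totals mislead can be seen by dividing each fuzzer's error count by the number of test cases it generated. The sketch below uses only the counts reported on the preceding result slide; the `FuzzerCounts` class is a hypothetical container, and errors per test case is one simple normalization, not one of the metrics the authors report.

```python
from dataclasses import dataclass


@dataclass
class FuzzerCounts:
    name: str
    test_cases: int
    total_errors: int

    def errors_per_test_case(self) -> float:
        """Total errors divided by test cases generated."""
        return self.total_errors / self.test_cases


# Counts taken from the result slide above.
zzuf = FuzzerCounts("Zzuf", 962_402, 1_066_000)
catchconv = FuzzerCounts("Catchconv", 279_953, 304_936)

for fuzzer in (zzuf, catchconv):
    rate = fuzzer.errors_per_test_case()
    print(f"{fuzzer.name}: {rate:.3f} errors per test case")
```

Normalized this way the two fuzzers come out close, at roughly 1.11 and 1.09 errors per test case, so most of the gap in raw totals reflects how many test cases each fuzzer produced.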

Average Unique Errors per 100 Unique Test Cases
• First, we compared the performance of the fuzzers by the average number of unique bugs found per 100 unique test cases.
– Zzuf: 2.69
– CC: 2.63
• Zzuf’s apparent superiority diminishes.

Unique Errors as % of Total Errors
• Next, we analyzed the fuzzers’ performance based on the percentage of unique errors found out of the total errors.
– Zzuf: .05%
– CC: .22%
• Less than a quarter of a percentage point separates the fuzzers.

Types of Errors (as % of Total Errors)
• We also considered comparing the fuzzers by the types of bugs they found.
• Zzuf performed better at finding “invalid write” errors, which are a more important security bug type.
• Not an accurate comparison, since we couldn’t tell which bug specifically caused a crash.

Conclusion
• We were not able to draw a solid conclusion about the superiority of either fuzzer based on the metrics we gathered.
• Knowing which fuzzer is able to find serious errors more quickly would allow us to make a more informed conclusion about their comparative efficiencies.

Conclusion cont’d
• Need to record the number of CPU clock cycles required to execute test cases and find errors.
• Unfortunately, we did not record this data during our research, so we are unable to make such a comparison between the fuzzers.
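Recording this kind of cost per test case is straightforward on Unix-like systems. The sketch below measures CPU seconds rather than clock cycles, which is a substitution made here for simplicity, and the function name `child_cpu_seconds` is hypothetical. It relies on Python's `resource` module, which is not available on Windows.

```python
import resource
import subprocess
import sys


def child_cpu_seconds(command):
    """Run command to completion and return (exit code, CPU seconds it used).

    CPU time is user plus system time accumulated by waited-for children.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    completed = subprocess.run(
        command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return completed.returncode, cpu


if __name__ == "__main__":
    # Stand-in workload; a real run would launch the target on one test file.
    code, cpu = child_cpu_seconds([sys.executable, "-c", "sum(range(10**6))"])
    print(f"exit code {code}, {cpu:.3f} CPU seconds")
```

Summing this value over every test case a fuzzer executes, plus the time the fuzzer itself spends generating inputs, would give the denominator needed for an errors-per-CPU-second comparison.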

Guides for Future Research
To perform a precise comparison of Zzuf and CC:
1. Record the difference between the number of test cases generated by Zzuf and CC for a given seed file and a specific time frame.
2. Measure CPU time to compare the number of unique test cases generated by each fuzzer in a given time.
3. Need a new method to identify unique errors and avoid reporting duplicate bugs:
– Need to automatically generate a unique hash for each reported error that can then be used to identify duplicate errors.

Guides for Future Research cont’d
4. Use a more robust data collection infrastructure that can accommodate the massive amount of data collected.
– Our ISP shut Metafuzz down due to excess server load.
– Berkeley storage was full.
5. Include an internal issue tracker that keeps track of whether a bug has been reported, to avoid reporting duplicate bugs.

Whitebox or Blackbox?
• With a lower budget or less time: use blackbox.
• Once low-hanging bugs are gone, fuzzing must become smarter: use whitebox.
• In practice, use both.

Acknowledgment
• National Science Foundation (NSF) for funding this project through the SUPERB-TRUST (Summer Undergraduate Program in Engineering Research at Berkeley - Team for Research in Ubiquitous Secure Technology) program
• Kristen Gates (Executive Director for Education for the TRUST Program)
• Faculty advisor David Wagner
• Graduate mentors Li-Wen Hsu, David Molnar, Edwardo Segura, Alex Fabrikant, and Alvaro Cardenas

Questions? Thank you!
