A Randomized Scheduler with Probabilistic Guarantees of Finding Bugs

Sebastian Burckhardt (Microsoft Research), Madanlal Musuvathi (Microsoft Research), Pravesh Kothari (Indian Institute of Technology, Kanpur), Santosh Nagarakatte (University of Pennsylvania)

What is Concurrency Testing? • Whether a test finds a bug depends on • the configuration • the inputs • the schedule • Concurrency bugs are bugs that surface only for some schedules • The Concurrency Testing Problem • How to cover buggy schedules as best we can? • Testing all schedules is infeasible!

Idea: Randomize the Schedule
Parent: void* p = 0; RandDelay(); CreateThd(child); RandDelay(); p = malloc(…);
Child: Init(); RandDelay(); DoMoreWork(); RandDelay(); p->f ++;
• Instrument code with calls to insert random delays • If we are lucky, a delay exposes the bug • But: how long to delay? Where not to delay?
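The random-delay idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `rand_delay`, `run_once`, and the 2 ms delay cap are hypothetical names and values chosen here, a `None` check stands in for the null dereference `p->f ++`, and the outcome depends on the real OS scheduler, so the failure count varies from run to run.

```python
import random
import threading
import time


def rand_delay(rng, max_ms=2.0):
    """Sleep for a random duration (hypothetical stand-in for RandDelay())."""
    time.sleep(rng.uniform(0.0, max_ms) / 1000.0)


def run_once(rng):
    """Run parent and child once; return True if the child saw p == None."""
    state = {"p": None}          # void* p = 0;
    failures = []

    def child():
        rand_delay(rng)          # Init(); RandDelay();
        rand_delay(rng)          # DoMoreWork(); RandDelay();
        if state["p"] is None:   # stands for a crashing p->f ++
            failures.append(True)

    thread = threading.Thread(target=child)
    rand_delay(rng)
    thread.start()               # CreateThd(child);
    rand_delay(rng)
    state["p"] = {"f": 0}        # p = malloc(...);
    thread.join()
    return bool(failures)


rng = random.Random()
results = [run_once(rng) for _ in range(20)]
print(f"bug exposed in {sum(results)} of {len(results)} runs")
```

The sketch also makes the slide's open questions concrete: the result is sensitive to the arbitrary delay cap, and nothing bounds the probability of hitting the bug.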

What is a Randomized Algorithm? • A randomized algorithm: • “An algorithm that makes nondeterministic choices” • An algorithm using a random source with a precisely defined distribution • A probabilistic guarantee: • “A guarantee that doesn’t always hold” • A lower bound on the probability of success

What we did / Talk Outline • Define bug depth in such a way that common bugs have low depth • Develop the PCT algorithm (probabilistic concurrency testing), a randomized scheduling algorithm with a good probabilistic guarantee to find bugs of low depth • Build it into Cuzz, a concurrency fuzzing tool that improves the efficiency of stress testing

Part I: Bug Depth

Bug Depth • Bug depth = the number of ordering constraints a schedule has to satisfy to find the bug. • More constraints means more things have to go “just right” to find the bug. • Conjecture: many typical bugs have low depth. Let’s look at 3 examples.

Ordering Violation Example: A Bug of Depth 1
Parent Thread: … start(child); p = malloc(); …
Child Thread: … do_init(); p->f ++; …
Bug depth = the number of ordering constraints sufficient to find the bug. All schedules that satisfy the single constraint (the child's "p->f ++" runs before the parent's "p = malloc()") find the bug.

Atomicity Violation Example: A Bug of Depth 2
Parent Thread: p = malloc(); start(child); … if (p != null) p->f++ …
Child Thread: … p = null; …
Bug depth = the number of ordering constraints sufficient to find the bug. All schedules that satisfy both constraints (the check "p != null" runs before "p = null", and "p = null" runs before "p->f++") find the bug.

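Deadlock Example: A Bug of Depth 2
Parent Thread: … Lock(A); … Lock(B); …
Child Thread: … Lock(B); … Lock(A); …
Bug depth = the number of ordering constraints sufficient to find the bug. All schedules that satisfy both constraints (the parent's Lock(A) runs before the child's Lock(A), and the child's Lock(B) runs before the parent's Lock(B)) find the bug.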

Part II: The PCT Algorithm

PCT Algorithm: Randomly Assign & Change Thread Priorities
Input:
  int k; // no. of steps - guessed from previous runs
  int d; // target bug depth - randomly chosen
State:
  int pri[]; // thread priorities
  int change[]; // when to change priorities
  int stepCnt; // current step count
PCT::Init() {
  stepCnt = 0;
  foreach tid
    pri[tid] = rand() + d;
  for( i=0; i<d-1; i++ )
    change[i] = rand() % k;
}
PCT::RandDelay( tid ) {
  stepCnt ++;
  if stepCnt == change[i] for some i
    pri[tid] = i;
  if (tid is not highest pri enabled thread)
    spin;
}
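To make the priority scheme concrete, here is a minimal Python simulation in the spirit of the pseudocode above. It is a sketch under assumptions, not the Cuzz implementation: threads are modeled as lists of step labels that are always enabled, initial priorities are a random permutation of d..d+n-1, and the priority change is applied right after the step at a change point executes. The names `pct_schedule`, `hit_rate`, `checker`, and `nuller` are hypothetical.

```python
import random


def pct_schedule(threads, k, d, rng):
    """Return one interleaving (list of step labels) chosen PCT-style."""
    names = list(threads)
    n = len(names)
    # Distinct initial priorities, all >= d (higher number = higher priority).
    prio = dict(zip(names, rng.sample(range(d, d + n), n)))
    # d-1 priority change points, each a step count in 1..k.
    change = [rng.randint(1, k) for _ in range(d - 1)]
    pos = {t: 0 for t in names}
    order, step = [], 0
    while True:
        runnable = [t for t in names if pos[t] < len(threads[t])]
        if not runnable:
            return order
        t = max(runnable, key=prio.get)   # run the highest-priority thread
        order.append(threads[t][pos[t]])
        pos[t] += 1
        step += 1
        for i, c in enumerate(change):
            if c == step:
                prio[t] = i               # drop below every initial priority


def hit_rate(threads, k, d, is_buggy, runs, seed=0):
    """Fraction of simulated runs whose schedule triggers the bug."""
    rng = random.Random(seed)
    hits = sum(is_buggy(pct_schedule(threads, k, d, rng)) for _ in range(runs))
    return hits / runs


# Depth 1: the use must run before the allocation.
ordering = {"parent": ["p = malloc()"], "child": ["do_init()", "p->f++"]}


def ordering_bug(order):
    return order.index("p->f++") < order.index("p = malloc()")


# Depth 2: the null store must land between the check and the use.
atomicity = {"checker": ["if (p != null)", "p->f++"], "nuller": ["p = null"]}


def atomicity_bug(order):
    return (order.index("if (p != null)") < order.index("p = null")
            < order.index("p->f++"))


rate1 = hit_rate(ordering, k=3, d=1, is_buggy=ordering_bug, runs=6000)
rate2 = hit_rate(atomicity, k=3, d=2, is_buggy=atomicity_bug, runs=6000)
print(f"depth 1: {rate1:.3f}   depth 2: {rate2:.3f}")
```

In this toy model the depth-1 bug is hit whenever the child draws the higher initial priority (about half the runs), and the depth-2 bug additionally needs the single change point to fall on step 1 (about one run in six).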

The PCT Guarantee • Given a program with • n threads (~tens) • k steps (~millions) • a bug of depth d (1, 2) • In each run, PCT finds the bug with a probability of at least $\frac{1}{n \cdot k^{d-1}}$ (this is a worst-case guarantee)
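A short calculation shows what such a bound means for the slide's ballpark figures. The sketch assumes the worst-case per-run bound is 1/(n·k^(d-1)), as stated for PCT, and that repeated runs are independent; `pct_lower_bound` and `prob_after_runs` are hypothetical helper names, and n = 10, k = 10^6 are illustrative values for "tens" and "millions".

```python
from fractions import Fraction


def pct_lower_bound(n, k, d):
    """Worst-case per-run probability 1 / (n * k**(d-1)) of finding the bug."""
    return Fraction(1, n * k ** (d - 1))


def prob_after_runs(p, m):
    """Probability that at least one of m independent runs finds the bug."""
    return 1 - (1 - p) ** m


n, k = 10, 10**6
for d in (1, 2):
    p = pct_lower_bound(n, k, d)
    after = float(prob_after_runs(p, 50))
    print(f"d={d}: per-run bound {p}, after 50 runs at least {after:.6f}")
```

For depth 1 the bound does not depend on k at all, which is why a handful of runs suffices; for depth 2 the worst case degrades by a factor of k.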

Part III: The Cuzz Tool & Results

How it Works • Intercept at synchronization points • Detour Win32 synchronization calls • Optionally instrument data accesses • No manual instrumentation required
[Diagram labels: Program; binary instrumentation for data accesses (optional); Cuzz: Randomized Algorithm; Win32 API; Kernel: Scheduler]

Some Results

Practice Beats Worst-Case • Measured Probability often significantly better than worst-case guaranteed probability

Why Does Practice Beat Worst-Case? • Worst-case guarantee applies to hardest-to-find bug of given depth • If bugs can be found in multiple ways, probabilities add up! • Example: Increasing the number of threads helps:

Internal Tool Status • The Cuzz tool is available internally at Microsoft • We are working with several product groups that actively use Cuzz to improve their stress testing

Demo

Demo Conclusion • Measure probabilities on cluster • Without Cuzz: 1 fail in 238’820 runs (ratio = 0.000004817) • With Cuzz: 12 fails in 320 runs (ratio = 0.0375) • Resource savings: factor 7,800 • 1 day of stress testing = 11 seconds of Cuzz testing
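The savings factor and the "1 day = 11 seconds" comparison follow from dividing the two failure ratios. The sketch below redoes that arithmetic, taking both ratios exactly as printed on the slide; the variable names are illustrative.

```python
# Failure ratios as printed on the slide.
ratio_without_cuzz = 0.000004817
ratio_with_cuzz = 0.0375

# How many times more likely a single run is to fail under Cuzz.
factor = ratio_with_cuzz / ratio_without_cuzz

# One day of plain stress testing, expressed in equivalent Cuzz time.
seconds_per_day = 24 * 60 * 60
equivalent_seconds = seconds_per_day / factor

print(f"factor = {factor:.0f}, 1 day = {equivalent_seconds:.1f} s of Cuzz testing")
```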

Conclusions • Bug depth is a useful metric to focus testing efforts • Systematic randomization improves concurrency testing • No reason not to use Cuzz Thank You For Your Attention.
