This talk presents a randomized approach to concurrency testing, addressing the challenges of verifying and testing software in the face of nondeterministic thread interleavings. It distinguishes verification (proving a program correct) from testing, which aims to find as many bugs as possible with the available resources, and proposes a probabilistic model for finding bugs in concurrent programs with provably high probability. The talk introduces the notion of bug depth and effective sampling algorithms that raise the likelihood of uncovering concurrency bugs of small depth.
A Randomized Algorithm for Concurrency Testing Madan Musuvathi Research in Software Engineering Microsoft Research
The Concurrency Testing Problem • A closed program = program + test harness • Test harness encodes both the concurrency scenario and the inputs • The only nondeterminism is the thread interleavings
Verification vs Testing • Verification: • Prove that the program is correct (free of bugs) • With the minimum amount of resources • Testing: ??
Verification vs Testing • Verification: • Prove that the program is correct (free of bugs) • With the minimum amount of resources • Testing: • Given a certain amount of resources • How close to a proof can you get? • Maximize the number of bugs that you can find • In the limit: Verification == Testing
Testing is more important than Verification • Undecidability argument • There are always going to be programs large enough and properties complex enough for which verification cannot be done • Economic argument • If the cost of a bug is less than the cost of finding the bug (or proving its absence) • You are better off shipping buggy software • Engineering argument • Make software only as reliable as the weakest link in the entire system
Providing Probabilistic Guarantees • Problem we would like to solve: • Given a program, prove that it does not do something wrong with probability > 95% • Problem we can hope to solve: • Given a program that contains a bug, design a testing algorithm that finds the bug with probability > 95% • Prove optimality: no testing algorithm can do better
Cuzz: Concurrency Fuzzing • Disciplined randomization of schedules • Probabilistic guarantees • Every run finds a bug with some (reasonably large) probability • Repeat runs to increase the chance of finding a bug • Scalable • In the no. of threads and program size • Effective • Bugs in IE, Firefox, Office Communicator, Outlook, … • Bugs found in the first few runs
Problem Formulation • P is a class of programs • B is a class of bugs • Given P and B, you design a testing algorithm T • Given T, the adversary picks a program p in P containing a bug b in B • Given p, T generates an input in constant time • Prove that T finds b in p with a probability X(p,B)
In our case • P is a class of closed terminating concurrent programs • B is a class of bugs • Given P and B, you design a testing algorithm T • Given T, the adversary picks a program p in P containing a bug b in B • Given p, T generates an interleaving in constant time • Prove that T finds b in p with a probability X(p,B)
Useful parameters • For a closed terminating concurrent program • (Fancy way of saying, a program combined with a concurrency test) • n : maximum number of threads • k : maximum number of instructions executed
What is a “Bug” – first attempt • A bug is defined as a particular buggy interleaving • No algorithm can find the bug with probability greater than 1/n^k • n threads (~ tens), k instructions (~ millions), n^k schedules
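To make the 1/n^k figure concrete, here is a toy Python sketch (illustrative, not from the talk) that enumerates scheduler decision sequences for tiny n and k, assuming the scheduler's only nondeterminism is which of the n threads runs at each of the k steps:

```python
from itertools import product

# With n threads and k scheduling points, the scheduler makes one of n
# choices at each point, giving n**k possible decision sequences in all.
n, k = 2, 4   # toy values; in practice n ~ tens, k ~ millions
schedules = list(product(range(n), repeat=k))
assert len(schedules) == n ** k   # 16 for these toy values

# A uniform sampler therefore hits one specific buggy schedule with
# probability 1/n**k per run.
print(1 / n ** k)   # 0.0625
```

With realistic n and k this probability is astronomically small, which is exactly the adversarial-bug pessimism the next slides push back on.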
A Deterministic Algorithm • Provides no guarantees • n threads (~ tens), k instructions (~ millions), n^k schedules
Randomized Algorithm • Samples the schedule space with some probability distribution • Adversary picks the schedule that is the least probable • Probability of finding the bug <= 1/n^k • n threads (~ tens), k instructions (~ millions), n^k schedules
Randomized Algorithm • 1/n^k is a mighty small number • Hard to design algorithms that find the bug even with probability == 1/n^k • n threads (~ tens), k instructions (~ millions), n^k schedules
A Good Research Trick • When you can’t solve a problem, change the problem definition
Bugs are not adversarial • Usually, if there is one interleaving that finds the bug, there are many interleavings that find the same bug • This is not true for program inputs • The set of interleavings that find the bug shares the same root cause • The root causes of real bugs are not complicated • Smart people make stupid mistakes
Classifying Bugs • Classify concurrency bugs based on a suitable “depth” metric • Adversary can choose any bug, but within a given depth • Testing algorithm provides better guarantees for bugs with a smaller depth • Even if the worst-case probability is less than 1/n^k • We want real bugs to have small depth • We want to be able to design effective sampling algorithms for finding bugs of a particular depth
Our Bug Depth Definition • Bug Depth = number of ordering constraints sufficient to find the bug • Best explained through examples
A Bug of Depth 1 • Bug Depth = no. of ordering constraints sufficient to find the bug
Parent: A: … B: fork (child); C: p = malloc(); D: … E: …
Child: F: … G: do_init(); H: p->f ++; I: … J: …
Possible schedules: A B C D E F G H I J • A B F G H C D E I J • A B F G C D E H I J • A B F G C H D E I J • A B F G H I J C D E • …
A Bug of Depth 2 • Bug Depth = no. of ordering constraints sufficient to find the bug
Parent: A: … B: p = malloc(); C: fork (child); D: … E: if (p != NULL) F: p->f ++; G: …
Child: H: … I: p = NULL; J: …
Possible schedules: A B C D E F G H I J • A B C D E H I J F G • A B C H I D E G J • A B C D H E F I J G • A B C H D E I J F G • …
Another Bug of Depth 2 • Bug Depth = no. of ordering constraints sufficient to find the bug Parent Child A: … B: Lock (A); C: … D: Lock (B); E: … F: … G: Lock (B); H: … I: Lock (A); J: …
Hypothesis • Most concurrency bugs in practice have a very small depth • What has been empirically validated : • There are lots of bugs of small depths in real programs
Defining a Bug • A schedule is a sequence of (dynamic) instructions • S = set of schedules of a closed program • A concurrency bug B is a strict subset of S
Ordering Constraints • A schedule satisfies an ordering constraint (a, b) if instruction a occurs before instruction b in the schedule
The schedule A B F G H C D E I J satisfies (H, C)
Parent: A: … B: fork (child); C: p = malloc(); D: … E: …
Child: F: … G: do_init(); H: p->f ++; I: … J: …
Depth of a Bug • S(c1, c2, …, cn) = set of schedules that satisfy the ordering constraints c1, c2, …, cn • A bug B is of depth ≤ d if there exist constraints c1, c2, …, cd such that S(c1, c2, …, cd) ⊆ B
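These definitions are easy to operationalize. A small Python sketch (illustrative, not part of the talk): a schedule satisfies a constraint set if every pair (a, b) is ordered in it, and the depth-1 race from the earlier slide is covered by the single constraint (H, C):

```python
def satisfies(schedule, constraints):
    """True iff, for every ordering constraint (a, b), a occurs before b."""
    pos = {instr: i for i, instr in enumerate(schedule)}
    return all(pos[a] < pos[b] for a, b in constraints)

# The depth-1 bug from the slides is covered by one constraint: the child's
# H (p->f++) must precede the parent's C (p = malloc()).
bug = [("H", "C")]
assert satisfies(list("ABFGHCDEIJ"), bug)       # a buggy schedule
assert not satisfies(list("ABCDEFGHIJ"), bug)   # the bug-free schedule
```

A bug of depth ≤ d is then a bug B whose schedules include every schedule satisfying some d-element constraint set.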
A Bug of Depth 1 • Bug Depth = no. of ordering constraints sufficient to find the bug
Parent: A: … B: fork (child); C: p = malloc(); D: … E: …
Child: F: … G: do_init(); H: p->f ++; I: … J: …
Possible schedules: A B C D E F G H I J • A B F G H C D E I J • A B F G C D E H I J • A B F G C H D E I J • A B F G H I J C D E • …
What is the Depth of this Bug
Parent: A: … B: p = malloc(); C: fork (child); D: allocated = 1; E: p = null;
Child: F: … G: if (allocated) H: p->f++; I: … J: …
Any buggy interleaving satisfies (D, G) && (E, H) • Bug depth <= 2
What is the Depth of this Bug
Parent: A: … B: p = malloc(); C: fork (child); D: allocated = 1; E: p = null;
Child: F: … G: if (allocated) H: p->f++; I: … J: …
Any interleaving that satisfies (E, G) is buggy • Bug depth == 1 • Even though there are buggy interleavings that don’t satisfy (E, G)
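This distinction (the constraint set only needs to be covered by the bug, not to cover every buggy interleaving) can be checked exhaustively on the toy program. A hypothetical Python simulation, with fork modeled by allowing the child's ops only after the parent's C:

```python
from itertools import combinations

PARENT = ["B", "C", "D", "E"]   # B: p = malloc()  C: fork(child)  D: allocated = 1  E: p = null
CHILD = ["G", "H"]              # G: if (allocated)  H: p->f++

def interleavings():
    n = len(PARENT) + len(CHILD)
    for child_pos in combinations(range(n), len(CHILD)):
        order, pi, ci = [], 0, 0
        for i in range(n):
            if i in child_pos:
                order.append(("c", CHILD[ci])); ci += 1
            else:
                order.append(("p", PARENT[pi])); pi += 1
        # the child only exists after the parent's fork at C
        if order.index(("c", "G")) > order.index(("p", "C")):
            yield [op for _, op in order]

def run(schedule):
    p, allocated, branch = None, 0, False
    for op in schedule:
        if op == "B":   p = {"f": 0}
        elif op == "D": allocated = 1
        elif op == "E": p = None
        elif op == "G": branch = bool(allocated)
        elif op == "H" and branch:
            if p is None:
                return "crash"        # null dereference: the bug
            p["f"] += 1
    return "ok"

before = lambda s, a, b: s.index(a) < s.index(b)
# Every schedule satisfying (E, G) crashes, so the constraint covers the bug:
assert all(run(s) == "crash" for s in interleavings() if before(s, "E", "G"))
# ... yet some crashing schedules do not satisfy (E, G); the depth is still 1.
assert any(run(s) == "crash" for s in interleavings() if not before(s, "E", "G"))
```

For instance, B C D G E H crashes (the check passes, then p is nulled before the dereference) even though E does not precede G.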
Let’s look at a more complicated bug
void AddToCache() {
  // ...
  A: x &= ~(FLAG_NOT_DELETED);
  B: x |= FLAG_CACHED;
  MemoryBarrier();
  // ...
}
AddToCache();
assert( x & FLAG_CACHED );
The bit operations are not atomic
void AddToCache() {
  A1: t = x & ~(FLAG_NOT_DELETED);
  A2: x = t;
  B1: u = x | FLAG_CACHED;
  B2: x = u;
}
AddToCache();
assert( x & FLAG_CACHED );
The bug • Two threads run AddToCache() concurrently:
Thread 1:
  A1: t = x & ~(FLAG_NOT_DELETED);
  A2: x = t;
  B1: u = x | FLAG_CACHED;
  B2: x = u;
  assert( x & FLAG_CACHED );
Thread 2:
  A1: t = x & ~(FLAG_NOT_DELETED);
  A2: x = t;
  B1: u = x | FLAG_CACHED;
  B2: x = u;
  assert( x & FLAG_CACHED );
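The lost update is easy to reproduce by brute force. A Python sketch (the flag bit values are arbitrary placeholders) that enumerates all interleavings of the two threads' split bit operations and evaluates each thread's assert at the point it executes:

```python
from itertools import combinations

ND, CACHED = 1, 2   # placeholder bit values for the two flags

def run(t0_slots):
    """Simulate one interleaving of two threads, each executing the split
    operations A1, A2, B1, B2 and then its own assert.  Returns True iff
    both asserts see FLAG_CACHED set in x."""
    ops = ["A1", "A2", "B1", "B2", "ASSERT"]
    x = ND                      # shared variable: not deleted, not cached
    locs = [{"t": 0, "u": 0}, {"t": 0, "u": 0}]
    pc = [0, 0]
    ok = True
    for i in range(2 * len(ops)):
        tid = 0 if i in t0_slots else 1
        op, l = ops[pc[tid]], locs[tid]
        pc[tid] += 1
        if op == "A1":   l["t"] = x & ~ND
        elif op == "A2": x = l["t"]
        elif op == "B1": l["u"] = x | CACHED
        elif op == "B2": x = l["u"]
        else:            ok = ok and bool(x & CACHED)
    return ok

# Enumerate every interleaving: choose thread 0's five slots among the 10 steps.
failures = [s for s in combinations(range(10), 5) if not run(s)]
assert failures   # some interleavings violate an assert: the bug is real
```

One failing interleaving: thread 2 reads x (A1), thread 1 runs to completion setting FLAG_CACHED, then thread 2's stale A2 write wipes the flag before thread 1's assert runs.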
Cuzz Guarantee • Given a program that creates at most n threads and executes at most k instructions • Cuzz finds every bug of depth d with probability >= 1/(n·k^(d-1)) in every run of the program
A Bug of Depth 1 • Bug Depth = no. of ordering constraints sufficient to find the bug • Probability of bug >= 1/n • n: no. of threads (~ tens)
Parent: A: … B: fork (child); C: p = malloc(); D: … E: …
Child: F: … G: do_init(); H: p->f ++; I: … J: …
Possible schedules: A B C D E F G H I J • A B F G H C D E I J • A B F G C D E H I J • A B F G C H D E I J • A B F G H I J C D E • …
A Bug of Depth 2 • Bug Depth = no. of ordering constraints sufficient to find the bug • Probability of bug >= 1/(n·k) • n: no. of threads (~ tens) • k: no. of instructions (~ millions)
Parent: A: … B: p = malloc(); C: fork (child); D: … E: if (p != NULL) F: p->f ++; G: …
Child: H: … I: p = NULL; J: …
Possible schedules: A B C D E F G H I J • A B C D E H I J F G • A B C H I D E G J • A B C D H E F I J G • A B C H D E I J F G • …
Another Bug of Depth 2 • Bug Depth = no. of ordering constraints sufficient to find the bug • Probability of bug >= 1/(n·k) • n: no. of threads (~ tens) • k: no. of instructions (~ millions)
Parent: A: … B: Lock (A); C: … D: Lock (B); E: …
Child: F: … G: Lock (B); H: … I: Lock (A); J: …
Cuzz Algorithm
Inputs:
  n: estimated bound on the number of threads
  k: estimated bound on the number of steps
  d: target bug depth

// 1. Assign random priorities >= d to threads
for t in [1…n] do priority[t] = d + rand();

// 2. Choose d-1 priority-lowering points at random
for i in [1…d) do lowering[i] = rand() % k;

steps = 0;
while (some thread enabled) {
  // 3. Honor thread priorities
  let t be the highest-priority enabled thread;
  schedule t for one step;
  steps++;
  // 4. At the i-th lowering point, set the running thread’s priority to i
  if (steps == lowering[i] for some i)
    priority[t] = i;
}
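A runnable Python model of this scheduler (a sketch, not the production Cuzz tool; thread creation is approximated with a starts_after dependency) lets us check the depth-1 guarantee empirically: with n = 2 threads, the race from the earlier slide is found in about half of the runs, matching the 1/n bound:

```python
import random

def cuzz_schedule(threads, d, rng, starts_after=None):
    """One simulated run of the randomized priority scheduler.

    threads: dict of thread id -> list of op labels (executed in order).
    d: target bug depth; d-1 priority-lowering points are chosen.
    starts_after: optional dict child -> op that must execute first (fork).
    """
    starts_after = starts_after or {}
    k = sum(len(ops) for ops in threads.values())
    # 1. random initial priorities, all above the lowering values 1..d-1
    priority = {t: d + rng.random() for t in threads}
    # 2. d-1 random lowering points (step number -> priority value)
    lowering = {rng.randrange(1, k + 1): i for i in range(1, d)}
    pc = {t: 0 for t in threads}
    executed, schedule, steps = set(), [], 0

    def enabled(t):
        if pc[t] >= len(threads[t]):
            return False
        dep = starts_after.get(t)
        return dep is None or dep in executed

    while any(enabled(t) for t in threads):
        # 3. run the highest-priority enabled thread for one step
        t = max((u for u in threads if enabled(u)), key=lambda u: priority[u])
        op = threads[t][pc[t]]
        schedule.append(op)
        executed.add(op)
        pc[t] += 1
        steps += 1
        # 4. at the i-th lowering point, drop this thread's priority to i
        if steps in lowering:
            priority[t] = lowering[steps]
    return schedule

# Depth-1 race from the slides: the bug fires when the child's H (p->f++)
# runs before the parent's C (p = malloc()).  B forks the child.
parent = ["A", "B", "C", "D", "E"]
child = ["F", "G", "H", "I", "J"]
rng = random.Random(0)
runs, hits = 2000, 0
for _ in range(runs):
    s = cuzz_schedule({"parent": parent, "child": child}, d=1,
                      rng=rng, starts_after={"child": "B"})
    if s.index("H") < s.index("C"):
        hits += 1
print(hits / runs)   # roughly 0.5, i.e. at least the guaranteed 1/n = 1/2
```

With d = 1 no lowering points are chosen, so the run is decided entirely by the initial priority coin flip between the two threads.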
A Bug of Depth 1 • Found when the child gets a higher priority than the parent (prob = 1/2)
Parent (pri = 1): fork (child); p = malloc();
Child (pri = 2): do_init(); p->f ++;
Resulting schedule: fork (child); do_init(); p->f ++; p = malloc(); (the bug fires)
A Bug of Depth 2 • Found when the parent starts with a higher priority and a lowering point falls right after the branch condition (prob = 1/2 · 1/5 = 1/10)
Parent (pri = 3): p = malloc(); fork (child); if (p != NULL) [lowering point: pri = 1] p->f ++;
Child (pri = 2): p = NULL;
Resulting schedule: p = malloc(); fork (child); if (p != NULL); p = NULL; p->f ++; (the bug fires)
In Practice, Cuzz Beats its Bound • Cuzz performs far better than the theoretical bound • The worst-case bound is based on a conservative analysis • We employ various optimizations • Programs have LOTS of bugs • The probability of finding any of the bugs is (roughly) the sum of the probabilities of finding each • The buggy code is executed LOTS of times
For Some of our Benchmarks • Probability increases with n, stays the same with k • In contrast, the worst-case bound is 1/(n·k^(d-1))
Dimension Theory • Any partial order G can be expressed as the intersection of a set of total orders • This set is called a realizer of G
[diagram: a partial order on {a, b, c, d, e} shown as the intersection of two total orders]
Property of Realizers • For any unordered pair a and b, a realizer contains two total orders that satisfy (a, b) and (b, a)
Dimension of a Partial Order • Dimension of G is the size of the smallest realizer of G • Dimension is 2 for this example
Why is it called “dimension” • You can encode a partial order of dimension d as points in a d-dimensional space
[diagram: the example partial order embedded as points in the plane]
Why is it relevant for us • P = set of all partial orders, B = set of all bugs of depth 1 • If you can uniformly sample the smallest realizer of a partial order p • Probability of finding any bug of depth 1 >= 1/Dimension(p)
All this is good, but • Finding the dimension of a partial order is NP-complete • Real programs are not static partial orders
Width of a Partial Order • Width of a partial order G is the minimum number of total orders needed to cover G • Width corresponds to the number of “threads” in G • For all G, Dimension(G) <= Width(G)
[diagram: the example partial order covered by two chains]
Cuzz Algorithm • Cuzz is an online randomized algorithm for uniformly sampling a realizer of size Width(G) • Assign random priorities to “threads” and topologically sort based on the priorities
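The sampling step can be sketched in Python: model each thread as a chain, give every chain a random priority, and topologically sort, always emitting the highest-priority chain whose next node has all its predecessors emitted. Over repeated samples, both orderings of any unordered pair show up, which is exactly the realizer property (the chains and edges below are a made-up toy example, not the figure from the talk):

```python
import random

def sample_extension(chains, edges, rng):
    """Sample a linear extension of the partial order given by per-thread
    chains plus cross-thread edges (a, b) meaning a must precede b,
    using random chain priorities and a topological sort."""
    prio = {c: rng.random() for c in chains}
    pc = {c: 0 for c in chains}
    done, out = set(), []
    total = sum(len(v) for v in chains.values())
    while len(out) < total:
        # chains whose next node has all predecessors already emitted
        ready = [c for c in chains
                 if pc[c] < len(chains[c])
                 and all(a in done for (a, b) in edges if b == chains[c][pc[c]])]
        c = max(ready, key=lambda r: prio[r])
        node = chains[c][pc[c]]
        pc[c] += 1
        done.add(node)
        out.append(node)
    return out

chains = {"t1": ["a", "b", "c"], "t2": ["d", "e"]}
edges = [("a", "d")]   # a before d; the pairs b/e, c/d, etc. are unordered
rng = random.Random(1)
samples = {tuple(sample_extension(chains, edges, rng)) for _ in range(200)}

# every sample respects the partial order: a precedes d
assert all(s.index("a") < s.index("d") for s in samples)
# both orderings of the unordered pair (b, e) occur across samples
assert any(s.index("b") < s.index("e") for s in samples)
assert any(s.index("e") < s.index("b") for s in samples)
```

This is the depth-1 case; the d-realizer extension below additionally randomizes d-1 priority-lowering points, as in the Cuzz algorithm slide.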
Extension to Larger Depths • Note: a realizer of G covers all possible orderings of any unordered pair • We define a d-realizer of G as a set of total orders that covers all possible orderings of d unordered pairs • The d-dimension of G is the size of the smallest d-realizer of G • Theorem: d-Dimension(G) <= Dimension(G) · k^(d-1), where k is the number of nodes in G • Cuzz is an online algorithm for uniformly sampling over a d-realizer of G