This talk presents a randomized approach to concurrency testing, addressing the challenges of verifying and testing software in the face of nondeterministic thread interleavings. It distinguishes verification (proving a program correct) from testing, which aims to find as many bugs as possible with the available resources, and proposes a probabilistic model for finding bugs in concurrent programs with provably high probability. The talk introduces the notion of bug depth and effective sampling algorithms that raise the likelihood of uncovering concurrency bugs of small depth.
A Randomized Algorithm for Concurrency Testing Madan Musuvathi Research in Software Engineering Microsoft Research
The Concurrency Testing Problem • A closed program = program + test harness • Test harness encodes both the concurrency scenario and the inputs • The only nondeterminism is the thread interleavings
Verification vs Testing • Verification: • Prove that the program is correct (free of bugs) • With the minimum amount of resources • Testing: ??
Verification vs Testing • Verification: • Prove that the program is correct (free of bugs) • With the minimum amount of resources • Testing: • Given a certain amount of resources • How close to a proof can you get? • Maximize the number of bugs that you can find • In the limit: Verification == Testing
Testing is more important than Verification • Undecidability argument • There are always going to be programs large enough and properties complex enough for which verification cannot be done • Economic argument • If the cost of a bug is less than the cost of finding the bug (or proving its absence) • You are better off shipping buggy software • Engineering argument • Make software only as reliable as the weakest link in the entire system
Providing Probabilistic Guarantees • Problem we would like to solve: • Given a program, prove that it does not do something wrong with probability > 95% • Problem we can hope to solve: • Given a program that contains a bug, design a testing algorithm that finds the bug with probability > 95% • Prove optimality: no testing algorithm can do better
Cuzz: Concurrency Fuzzing • Disciplined randomization of schedules • Probabilistic guarantees • Every run finds a bug with some (reasonably large) probability • Repeat runs to increase the chance of finding a bug • Scalable • In the no. of threads and program size • Effective • Bugs in IE, Firefox, Office Communicator, Outlook, … • Bugs found in the first few runs
Problem Formulation • P is a class of programs • B is a class of bugs • Given P and B, you design a testing algorithm T • Given T, the adversary picks a program p in P containing a bug b in B • Given p, T generates an input in constant time • Prove that T finds b in p with a probability X(p,B)
In our case • P is a class of closed terminating concurrent programs • B is a class of bugs • Given P and B, you design a testing algorithm T • Given T, the adversary picks a program p in P containing a bug b in B • Given p, T generates an interleaving in constant time • Prove that T finds b in p with a probability X(p,B)
Useful parameters • For a closed terminating concurrent program • (Fancy way of saying, a program combined with a concurrency test) • n : maximum number of threads • k : maximum number of instructions executed
What is a “Bug” – first attempt • A bug is defined as a particular buggy interleaving • No algorithm can find the bug with probability greater than 1/n^k • n threads (~ tens), k instructions (~ millions), n^k schedules
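To make the 1/n^k figure concrete, here is a toy Python sketch (illustrative, not from the talk) that enumerates scheduler decision sequences for tiny n and k, assuming the scheduler's only nondeterminism is which of the n threads runs at each of the k steps:

```python
from itertools import product

# With n threads and k scheduling points, the scheduler makes one of n
# choices at each point, giving n**k possible decision sequences in all.
n, k = 2, 4   # toy values; in practice n ~ tens, k ~ millions
schedules = list(product(range(n), repeat=k))
assert len(schedules) == n ** k   # 16 for these toy values

# A uniform sampler therefore hits one specific buggy schedule with
# probability 1/n**k per run.
print(1 / n ** k)   # 0.0625
```

With realistic n and k this probability is astronomically small, which is exactly the adversarial-bug pessimism the next slides push back on.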
A Deterministic Algorithm • Provides no guarantees • n threads (~ tens), k instructions (~ millions), n^k schedules
Randomized Algorithm • Samples the schedule space with some probability distribution • Adversary picks the schedule that is the least probable • Probability of finding the bug <= 1/n^k • n threads (~ tens), k instructions (~ millions), n^k schedules
Randomized Algorithm • 1/n^k is a mighty small number • Hard to design algorithms that find the bug even with probability == 1/n^k • n threads (~ tens), k instructions (~ millions), n^k schedules
A Good Research Trick • When you can’t solve a problem, change the problem definition
Bugs are not adversarial • Usually, if there is one interleaving that finds the bug, there are many interleavings that find the same bug • This is not true for program inputs • The set of interleavings that find the bug shares the same root cause • The root causes of real bugs are not complicated • Smart people make stupid mistakes
Classifying Bugs • Classify concurrency bugs based on a suitable “depth” metric • Adversary can choose any bug, but within a given depth • Testing algorithm provides better guarantees for bugs with a smaller depth • Even if the worst-case probability is less than 1/n^k • We want real bugs to have small depth • We want to be able to design effective sampling algorithms for finding bugs of a particular depth
Our Bug Depth Definition • Bug Depth = number of ordering constraints sufficient to find the bug • Best explained through examples
A Bug of Depth 1 • Bug Depth = no. of ordering constraints sufficient to find the bug
Parent: A: … B: fork (child); C: p = malloc(); D: … E: …
Child: F: … G: do_init(); H: p->f ++; I: … J: …
Possible schedules: A B C D E F G H I J • A B F G H C D E I J • A B F G C D E H I J • A B F G C H D E I J • A B F G H I J C D E • …
A Bug of Depth 2 • Bug Depth = no. of ordering constraints sufficient to find the bug
Parent: A: … B: p = malloc(); C: fork (child); D: … E: if (p != NULL) F: p->f ++; G: …
Child: H: … I: p = NULL; J: …
Possible schedules: A B C D E F G H I J • A B C D E H I J F G • A B C H I D E G J • A B C D H E F I J G • A B C H D E I J F G • …
Another Bug of Depth 2 • Bug Depth = no. of ordering constraints sufficient to find the bug Parent Child A: … B: Lock (A); C: … D: Lock (B); E: … F: … G: Lock (B); H: … I: Lock (A); J: …
Hypothesis • Most concurrency bugs in practice have a very small depth • What has been empirically validated : • There are lots of bugs of small depths in real programs
Defining a Bug • A schedule is a sequence of (dynamic) instructions • S = set of schedules of a closed program • A concurrency bug B is a strict subset of S
Ordering Constraints • A schedule satisfies an ordering constraint (a, b) if instruction a occurs before instruction b in the schedule
The schedule A B F G H C D E I J satisfies (H, C)
Parent: A: … B: fork (child); C: p = malloc(); D: … E: …
Child: F: … G: do_init(); H: p->f ++; I: … J: …
Depth of a Bug • S(c1, c2, …, cn) = set of schedules that satisfy the ordering constraints c1, c2, …, cn • A bug B is of depth ≤ d if there exist constraints c1, c2, …, cd such that S(c1, c2, …, cd) ⊆ B
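These definitions are easy to operationalize. A small Python sketch (illustrative, not part of the talk): a schedule satisfies a constraint set if every pair (a, b) is ordered in it, and the depth-1 race from the earlier slide is covered by the single constraint (H, C):

```python
def satisfies(schedule, constraints):
    """True iff, for every ordering constraint (a, b), a occurs before b."""
    pos = {instr: i for i, instr in enumerate(schedule)}
    return all(pos[a] < pos[b] for a, b in constraints)

# The depth-1 bug from the slides is covered by one constraint: the child's
# H (p->f++) must precede the parent's C (p = malloc()).
bug = [("H", "C")]
assert satisfies(list("ABFGHCDEIJ"), bug)       # a buggy schedule
assert not satisfies(list("ABCDEFGHIJ"), bug)   # the bug-free schedule
```

A bug of depth ≤ d is then a bug B whose schedules include every schedule satisfying some d-element constraint set.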
A Bug of Depth 1 • Bug Depth = no. of ordering constraints sufficient to find the bug
Parent: A: … B: fork (child); C: p = malloc(); D: … E: …
Child: F: … G: do_init(); H: p->f ++; I: … J: …
Possible schedules: A B C D E F G H I J • A B F G H C D E I J • A B F G C D E H I J • A B F G C H D E I J • A B F G H I J C D E • …
What is the Depth of this Bug
Parent: A: … B: p = malloc(); C: fork (child); D: allocated = 1; E: p = null;
Child: F: … G: if (allocated) H: p->f++; I: … J: …
Any buggy interleaving satisfies (D, G) && (E, H) • Bug depth <= 2
What is the Depth of this Bug
Parent: A: … B: p = malloc(); C: fork (child); D: allocated = 1; E: p = null;
Child: F: … G: if (allocated) H: p->f++; I: … J: …
Any interleaving that satisfies (E, G) is buggy • Bug depth == 1 • Even though there are buggy interleavings that don’t satisfy (E, G)
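This distinction (the constraint set only needs to be covered by the bug, not to cover every buggy interleaving) can be checked exhaustively on the toy program. A hypothetical Python simulation, with fork modeled by allowing the child's ops only after the parent's C:

```python
from itertools import combinations

PARENT = ["B", "C", "D", "E"]   # B: p = malloc()  C: fork(child)  D: allocated = 1  E: p = null
CHILD = ["G", "H"]              # G: if (allocated)  H: p->f++

def interleavings():
    n = len(PARENT) + len(CHILD)
    for child_pos in combinations(range(n), len(CHILD)):
        order, pi, ci = [], 0, 0
        for i in range(n):
            if i in child_pos:
                order.append(("c", CHILD[ci])); ci += 1
            else:
                order.append(("p", PARENT[pi])); pi += 1
        # the child only exists after the parent's fork at C
        if order.index(("c", "G")) > order.index(("p", "C")):
            yield [op for _, op in order]

def run(schedule):
    p, allocated, branch = None, 0, False
    for op in schedule:
        if op == "B":   p = {"f": 0}
        elif op == "D": allocated = 1
        elif op == "E": p = None
        elif op == "G": branch = bool(allocated)
        elif op == "H" and branch:
            if p is None:
                return "crash"        # null dereference: the bug
            p["f"] += 1
    return "ok"

before = lambda s, a, b: s.index(a) < s.index(b)
# Every schedule satisfying (E, G) crashes, so the constraint covers the bug:
assert all(run(s) == "crash" for s in interleavings() if before(s, "E", "G"))
# ... yet some crashing schedules do not satisfy (E, G); the depth is still 1.
assert any(run(s) == "crash" for s in interleavings() if not before(s, "E", "G"))
```

For instance, B C D G E H crashes (the check passes, then p is nulled before the dereference) even though E does not precede G.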
Let’s look at a more complicated bug
void AddToCache() {
  // ...
  A: x &= ~(FLAG_NOT_DELETED);
  B: x |= FLAG_CACHED;
  MemoryBarrier();
  // ...
}
AddToCache();
assert( x & FLAG_CACHED );
The bit operations are not atomic
void AddToCache() {
  A1: t = x & ~(FLAG_NOT_DELETED);
  A2: x = t;
  B1: u = x | FLAG_CACHED;
  B2: x = u;
}
AddToCache();
assert( x & FLAG_CACHED );
The bug • Two threads run AddToCache() concurrently:
Thread 1:
  A1: t = x & ~(FLAG_NOT_DELETED);
  A2: x = t;
  B1: u = x | FLAG_CACHED;
  B2: x = u;
  assert( x & FLAG_CACHED );
Thread 2:
  A1: t = x & ~(FLAG_NOT_DELETED);
  A2: x = t;
  B1: u = x | FLAG_CACHED;
  B2: x = u;
  assert( x & FLAG_CACHED );
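The lost update is easy to reproduce by brute force. A Python sketch (the flag bit values are arbitrary placeholders) that enumerates all interleavings of the two threads' split bit operations and evaluates each thread's assert at the point it executes:

```python
from itertools import combinations

ND, CACHED = 1, 2   # placeholder bit values for the two flags

def run(t0_slots):
    """Simulate one interleaving of two threads, each executing the split
    operations A1, A2, B1, B2 and then its own assert.  Returns True iff
    both asserts see FLAG_CACHED set in x."""
    ops = ["A1", "A2", "B1", "B2", "ASSERT"]
    x = ND                      # shared variable: not deleted, not cached
    locs = [{"t": 0, "u": 0}, {"t": 0, "u": 0}]
    pc = [0, 0]
    ok = True
    for i in range(2 * len(ops)):
        tid = 0 if i in t0_slots else 1
        op, l = ops[pc[tid]], locs[tid]
        pc[tid] += 1
        if op == "A1":   l["t"] = x & ~ND
        elif op == "A2": x = l["t"]
        elif op == "B1": l["u"] = x | CACHED
        elif op == "B2": x = l["u"]
        else:            ok = ok and bool(x & CACHED)
    return ok

# Enumerate every interleaving: choose thread 0's five slots among the 10 steps.
failures = [s for s in combinations(range(10), 5) if not run(s)]
assert failures   # some interleavings violate an assert: the bug is real
```

One failing interleaving: thread 2 reads x (A1), thread 1 runs to completion setting FLAG_CACHED, then thread 2's stale A2 write wipes the flag before thread 1's assert runs.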
Cuzz Guarantee • Given a program that creates at most n threads and executes at most k instructions • Cuzz finds every bug of depth d with probability >= 1/(n·k^(d-1)) in every run of the program
A Bug of Depth 1 • Bug Depth = no. of ordering constraints sufficient to find the bug • Probability of bug >= 1/n • n: no. of threads (~ tens)
Parent: A: … B: fork (child); C: p = malloc(); D: … E: …
Child: F: … G: do_init(); H: p->f ++; I: … J: …
Possible schedules: A B C D E F G H I J • A B F G H C D E I J • A B F G C D E H I J • A B F G C H D E I J • A B F G H I J C D E • …
A Bug of Depth 2 • Bug Depth = no. of ordering constraints sufficient to find the bug • Probability of bug >= 1/(n·k) • n: no. of threads (~ tens) • k: no. of instructions (~ millions)
Parent: A: … B: p = malloc(); C: fork (child); D: … E: if (p != NULL) F: p->f ++; G: …
Child: H: … I: p = NULL; J: …
Possible schedules: A B C D E F G H I J • A B C D E H I J F G • A B C H I D E G J • A B C D H E F I J G • A B C H D E I J F G • …
Another Bug of Depth 2 • Bug Depth = no. of ordering constraints sufficient to find the bug • Probability of bug >= 1/(n·k) • n: no. of threads (~ tens) • k: no. of instructions (~ millions)
Parent: A: … B: Lock (A); C: … D: Lock (B); E: …
Child: F: … G: Lock (B); H: … I: Lock (A); J: …
Cuzz Algorithm
Inputs:
  n: estimated bound on the number of threads
  k: estimated bound on the number of steps
  d: target bug depth

// 1. Assign random priorities >= d to threads
for t in [1…n] do priority[t] = d + rand();

// 2. Choose d-1 priority-lowering points at random
for i in [1…d) do lowering[i] = rand() % k;

steps = 0;
while (some thread enabled) {
  // 3. Honor thread priorities
  let t be the highest-priority enabled thread;
  schedule t for one step;
  steps++;
  // 4. At the i-th lowering point, set the running thread’s priority to i
  if (steps == lowering[i] for some i)
    priority[t] = i;
}
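A runnable Python model of this scheduler (a sketch, not the production Cuzz tool; thread creation is approximated with a starts_after dependency) lets us check the depth-1 guarantee empirically: with n = 2 threads, the race from the earlier slide is found in about half of the runs, matching the 1/n bound:

```python
import random

def cuzz_schedule(threads, d, rng, starts_after=None):
    """One simulated run of the randomized priority scheduler.

    threads: dict of thread id -> list of op labels (executed in order).
    d: target bug depth; d-1 priority-lowering points are chosen.
    starts_after: optional dict child -> op that must execute first (fork).
    """
    starts_after = starts_after or {}
    k = sum(len(ops) for ops in threads.values())
    # 1. random initial priorities, all above the lowering values 1..d-1
    priority = {t: d + rng.random() for t in threads}
    # 2. d-1 random lowering points (step number -> priority value)
    lowering = {rng.randrange(1, k + 1): i for i in range(1, d)}
    pc = {t: 0 for t in threads}
    executed, schedule, steps = set(), [], 0

    def enabled(t):
        if pc[t] >= len(threads[t]):
            return False
        dep = starts_after.get(t)
        return dep is None or dep in executed

    while any(enabled(t) for t in threads):
        # 3. run the highest-priority enabled thread for one step
        t = max((u for u in threads if enabled(u)), key=lambda u: priority[u])
        op = threads[t][pc[t]]
        schedule.append(op)
        executed.add(op)
        pc[t] += 1
        steps += 1
        # 4. at the i-th lowering point, drop this thread's priority to i
        if steps in lowering:
            priority[t] = lowering[steps]
    return schedule

# Depth-1 race from the slides: the bug fires when the child's H (p->f++)
# runs before the parent's C (p = malloc()).  B forks the child.
parent = ["A", "B", "C", "D", "E"]
child = ["F", "G", "H", "I", "J"]
rng = random.Random(0)
runs, hits = 2000, 0
for _ in range(runs):
    s = cuzz_schedule({"parent": parent, "child": child}, d=1,
                      rng=rng, starts_after={"child": "B"})
    if s.index("H") < s.index("C"):
        hits += 1
print(hits / runs)   # roughly 0.5, i.e. at least the guaranteed 1/n = 1/2
```

With d = 1 no lowering points are chosen, so the run is decided entirely by the initial priority coin flip between the two threads.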
A Bug of Depth 1 • Found when the child gets a higher priority than the parent (prob = 1/2)
Parent (pri = 1): fork (child); p = malloc();
Child (pri = 2): do_init(); p->f ++;
Resulting schedule: fork (child); do_init(); p->f ++; p = malloc(); (the bug fires)
A Bug of Depth 2 • Found when the parent starts with a higher priority and a lowering point falls right after the branch condition (prob = 1/2 · 1/5 = 1/10)
Parent (pri = 3): p = malloc(); fork (child); if (p != NULL) [lowering point: pri = 1] p->f ++;
Child (pri = 2): p = NULL;
Resulting schedule: p = malloc(); fork (child); if (p != NULL); p = NULL; p->f ++; (the bug fires)
In Practice, Cuzz Beats its Bound • Cuzz performs far better than the theoretical bound • The worst-case bound is based on a conservative analysis • We employ various optimizations • Programs have LOTS of bugs • The probability of finding any of the bugs is (roughly) the sum of the probabilities of finding each • The buggy code is executed LOTS of times
For Some of our Benchmarks • Probability increases with n, stays the same with k • In contrast, the worst-case bound is 1/(n·k^(d-1))
Dimension Theory • Any partial order G can be expressed as the intersection of a set of total orders • This set is called a realizer of G
[diagram: a partial order on {a, b, c, d, e} shown as the intersection of two total orders]
Property of Realizers • For any unordered pair a and b, a realizer contains two total orders that satisfy (a, b) and (b, a)
Dimension of a Partial Order • Dimension of G is the size of the smallest realizer of G • Dimension is 2 for this example
Why is it called “dimension” • You can encode a partial order of dimension d as points in a d-dimensional space
[diagram: the example partial order embedded as points in the plane]
Why is it relevant for us • P = set of all partial orders, B = set of all bugs of depth 1 • If you can uniformly sample the smallest realizer of a partial order p • Probability of finding any bug of depth 1 >= 1/Dimension(p)
All this is good, but • Finding the dimension of a partial order is NP-complete • Real programs are not static partial orders
Width of a Partial Order • Width of a partial order G is the minimum number of total orders needed to cover G • Width corresponds to the number of “threads” in G • For all G, Dimension(G) <= Width(G)
[diagram: the example partial order covered by two chains]
Cuzz Algorithm • Cuzz is an online randomized algorithm for uniformly sampling a realizer of size Width(G) • Assign random priorities to “threads” and topologically sort based on the priorities
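The sampling step can be sketched in Python: model each thread as a chain, give every chain a random priority, and topologically sort, always emitting the highest-priority chain whose next node has all its predecessors emitted. Over repeated samples, both orderings of any unordered pair show up, which is exactly the realizer property (the chains and edges below are a made-up toy example, not the figure from the talk):

```python
import random

def sample_extension(chains, edges, rng):
    """Sample a linear extension of the partial order given by per-thread
    chains plus cross-thread edges (a, b) meaning a must precede b,
    using random chain priorities and a topological sort."""
    prio = {c: rng.random() for c in chains}
    pc = {c: 0 for c in chains}
    done, out = set(), []
    total = sum(len(v) for v in chains.values())
    while len(out) < total:
        # chains whose next node has all predecessors already emitted
        ready = [c for c in chains
                 if pc[c] < len(chains[c])
                 and all(a in done for (a, b) in edges if b == chains[c][pc[c]])]
        c = max(ready, key=lambda r: prio[r])
        node = chains[c][pc[c]]
        pc[c] += 1
        done.add(node)
        out.append(node)
    return out

chains = {"t1": ["a", "b", "c"], "t2": ["d", "e"]}
edges = [("a", "d")]   # a before d; the pairs b/e, c/d, etc. are unordered
rng = random.Random(1)
samples = {tuple(sample_extension(chains, edges, rng)) for _ in range(200)}

# every sample respects the partial order: a precedes d
assert all(s.index("a") < s.index("d") for s in samples)
# both orderings of the unordered pair (b, e) occur across samples
assert any(s.index("b") < s.index("e") for s in samples)
assert any(s.index("e") < s.index("b") for s in samples)
```

This is the depth-1 case; the d-realizer extension below additionally randomizes d-1 priority-lowering points, as in the Cuzz algorithm slide.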
Extension to Larger Depths • Note: a realizer of G covers all possible orderings of any unordered pair • We define a d-realizer of G as a set of total orders that covers all possible orderings of d unordered pairs • The d-dimension of G is the size of the smallest d-realizer of G • Theorem: d-Dimension(G) <= Dimension(G) · k^(d-1), where k is the number of nodes in G • Cuzz is an online algorithm for uniformly sampling over a d-realizer of G