
On-the-Fly Data-Race Detection in Multithreaded Programs



Presentation Transcript


  1. On-the-Fly Data-Race Detection in Multithreaded Programs Prepared by Eli Pozniansky under the supervision of Prof. Assaf Schuster

  2. Table of Contents • What is a Data-Race? • Why Are Data-Races Undesired? • How Can Data-Races Be Prevented? • Can Data-Races Be Easily Detected? • Feasible and Apparent Data-Races • Complexity of Data-Race Detection • NP and Co-NP • Program Execution Model & Ordering Relations • Complexity of Computing Ordering Relations • Proof of NP/Co-NP Hardness

  3. Table of Contents – Cont. • So How Can Data-Races Be Detected? • Lamport’s Happens-Before Approximation • Approaches to Detection of Apparent Data-Races: • Static Methods • Dynamic Methods: • Post-Mortem Methods • On-The-Fly Methods

  4. Table of Contents – Cont. • Closer Look at Dynamic Methods: • DJIT+ • Local Time Frames • Vector Time Frames • Logging Mechanism • Data-Race Detection Using Vector Time Frames • Which Accesses to Check? • Which Time Frames to Check? • Access History & Algorithm • Coherency • Results

  5. Table of Contents – Cont. • Lockset • Locking Discipline • The Basic Algorithm & Explanation • Which Accesses to Check? • Improving Locking Discipline • Initialization • Read-Sharing • Barriers • False Alarms • Results • Combining DJIT+ and Lockset • Summary • References

  6. What is a Data-Race? • Concurrent accesses to a shared location by two or more threads, where at least one access is a write. • Example (variable X is global and shared): Thread 1: X=1; Z=2. Thread 2: T=Y; T=X. • Usually indicative of a bug!

  7. Why Are Data-Races Undesired? • Programs with data-races: • Usually demonstrate unexpected and even non-deterministic behavior. • The outcome might depend on the specific execution order (a.k.a. the threads’ interleaving). • Re-executing may not always produce the same results/same data-races. • Thus, they are hard to debug, and it is hard to write correct programs.

  8. Why Are Data-Races Undesired? – Example • Machine code for ‘X++’. • First interleaving: Thread 1: 1. reg1 ← X; 2. incr reg1; 3. X ← reg1. Thread 2: 4. reg2 ← X; 5. incr reg2; 6. X ← reg2. • Second interleaving: Thread 1: 1. reg1 ← X; 2. incr reg1; … 6. X ← reg1. Thread 2: 3. reg2 ← X; 4. incr reg2; 5. X ← reg2. • At the beginning: X=0. At the end: X=1 or X=2? Depends on the scheduling order.
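The two interleavings above can be replayed deterministically. This sketch (not from the slides) models the registers explicitly and executes the six machine steps in a chosen order, showing how the second interleaving loses an increment:

```python
def run(schedule):
    """Execute the six machine steps of two concurrent 'X++'s in the given order.
    Each step is (op, register) with op in {'load', 'incr', 'store'}."""
    X = 0
    regs = {"reg1": 0, "reg2": 0}
    for op, r in schedule:
        if op == "load":      # regN <- X
            regs[r] = X
        elif op == "incr":    # incr regN
            regs[r] += 1
        elif op == "store":   # X <- regN
            X = regs[r]
    return X

# First interleaving: thread 1 finishes before thread 2 starts.
first = [("load", "reg1"), ("incr", "reg1"), ("store", "reg1"),
         ("load", "reg2"), ("incr", "reg2"), ("store", "reg2")]

# Second interleaving: both threads load X=0 before either stores.
second = [("load", "reg1"), ("incr", "reg1"),
          ("load", "reg2"), ("incr", "reg2"),
          ("store", "reg2"), ("store", "reg1")]

print(run(first))   # 2
print(run(second))  # 1 -- one increment is lost
```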

  9. Execution Order (figure: threads T1 and T2 on a time axis) • Each thread has a different execution speed. • The speed may change over time. • For an external observer of the time axis, instructions appear in execution order. • Any such order is legal. • The execution order for a single thread is called program order.

  10. How Can Data-Races Be Prevented? • Explicit synchronization between threads: • Locks • Critical Sections • Barriers • Mutexes • Semaphores • Monitors • Events • Etc. • Example: Thread 1: Lock(m); X++; Unlock(m). Thread 2: Lock(m); T=X; Unlock(m).

  11. Synchronization – “Bad” Bank Account Example • Thread 1: Deposit( amount ) { balance += amount; } • Thread 2: Withdraw( amount ) { if (balance < amount) print( “Error” ); else balance -= amount; } • ‘Deposit’ and ‘Withdraw’ are not “atomic”!!! • What is the final balance after a series of concurrent deposits and withdrawals?

  12. Synchronization – “Good” Bank Account Example • Thread 1: Deposit( amount ) { Lock( m ); balance += amount; Unlock( m ); } • Thread 2: Withdraw( amount ) { Lock( m ); if (balance < amount) print( “Error” ); else balance -= amount; Unlock( m ); } • The locked bodies are critical sections. Since critical sections can never execute concurrently, this version exhibits no data-races.
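A minimal runnable sketch of the “good” version, using Python’s threading module (the class and thread counts are illustrative; the `with self.lock:` blocks play the role of the slide’s Lock(m)/Unlock(m) pairs). With the lock held around every update, no increments are lost even under heavy concurrency:

```python
import threading

class Account:
    """Bank account whose critical sections are guarded by one lock."""
    def __init__(self):
        self.balance = 0
        self.lock = threading.Lock()   # plays the role of mutex m

    def deposit(self, amount):
        with self.lock:                # Lock(m) ... Unlock(m)
            self.balance += amount

    def withdraw(self, amount):
        with self.lock:
            if self.balance < amount:
                print("Error")
            else:
                self.balance -= amount

acct = Account()
# Two depositor threads, 10000 deposits of 1 each.
threads = [threading.Thread(target=lambda: [acct.deposit(1) for _ in range(10000)])
           for _ in range(2)]
for t in threads: t.start()
for t in threads: t.join()
print(acct.balance)  # 20000 -- no lost updates
```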

  13. Is This Enough? • Theoretically – YES • Practically – NO • What if the programmer accidentally forgets to place correct synchronization? • How can all such data-race bugs be detected in a large program? • How can redundant synchronization be eliminated?

  14. Can Data-Races Be Easily Detected? – No! • The problem of deciding whether a given program contains potential data-races (called feasible) is NP-hard [Netzer&Miller 1990] • Input size = # instructions performed • Even for 2 threads only • Even with no loops/recursion • Lots of execution orders: (#threads)^(thread_length · #threads) • Also, all possible inputs should be tested • Side effects of the detection code can eliminate all data-races, e.g. instrumentation that wraps accesses a and b in lock(m) … unlock(m) pairs serializes them

  15. Feasible Data-Races • Based on the possible behavior of the program (i.e. the semantics of the program’s computation). • The actual (!) data-races that can possibly happen in some program execution. • Require a full analysis of the program’s semantics to determine whether the execution could have allowed accesses to the same shared variable to execute concurrently.

  16. Apparent Data-Races • Approximations of the feasible data-races • Based only on the behavior of the program’s explicit synchronization (and not on the program’s semantics) • Important, since data-races are usually the result of improper synchronization • Easier to locate • Less accurate • Exist iff at least one feasible data-race exists • Exhaustively locating all apparent data-races is still NP-hard (and, in fact, undecidable)

  17. Apparent Data-Races – Cont. • Accesses a and b to the same shared variable in some execution are ordered if there is a chain of corresponding explicit synchronization events between them. • a and b are said to have potentially executed concurrently if no explicit synchronization prevented them from doing so. • Example – initially: grades = oldDatabase; updated = false. Thread T.A.: grades := newDatabase; updated := true. Thread Lecturer: while (updated == false); X := grades.gradeOf(lecturersSon).

  18. Feasible vs. Apparent • Execution ([F=false] initially): Thread 1: X++; F=true. Thread 2: if (F==true) X– –. Race 1 is on ‘F’, race 2 is on ‘X’. • Apparent data-races in the execution above – 1 & 2. • Feasible data-races – 1 only!!! – No feasible execution exists in which ‘X– –’ is performed before ‘X++’ (since ‘F’ is false at the start). • Protecting ‘F’ only will protect ‘X’ as well.

  19. Feasible vs. Apparent • Execution ([F=false] initially): Thread 1: X++; Lock( m ); F=true; Unlock( m ). Thread 2: Lock( m ); T=F; Unlock( m ); if (T==true) X– –. • No feasible or apparent data-races exist under any execution order!!! • ‘F’ is protected by a lock. ‘X++’ and ‘X– –’ are always ordered and properly synchronized. • Either there is a synchronization chain of Unlock(m)–Lock(m) between ‘X++’ and ‘X– –’, or only ‘X++’ executes.

  20. Complexity ofData-Race Detection • Exactly locating the feasible data-races is an NP-hard problem. • The apparent races, which are simpler to locate, must be detected for debugging. Apparent data-races exist if and only if at least one feasible data-race exists somewhere in the execution. The problem of exhaustively locating all apparent data-races is still NP-hard.

  21. Reminder: NP and Co-NP • There is a set of NP problems for which: • No polynomial solution is known. • There is an exponential solution. • A problem is NP-hard if there is a polynomial reduction from every problem in NP to this problem. • A problem is NP-complete if it is NP-hard and it resides in NP. • Intuitively – if the answer for the problem can only be ‘yes’/‘no’, we can either answer ‘yes’ and stop, or never stop (at least not in polynomial time).

  22. Reminder: NP and Co-NP – Cont. • The set of Co-NP problems is complementary to the set of NP problems. • A problem is Co-NP-hard if, intuitively, we can only answer ‘no’. • A problem that is in both NP and Co-NP is not thereby known to be in P (whether NP ∩ Co-NP = P is an open question). • The problem of checking whether a boolean formula is satisfiable is NP-complete. • Answer ‘yes’ if a satisfying assignment for the variables was found. • The same problem for non-satisfiability is Co-NP-complete.

  23. Why is Data-Race Detection NP-Hard? • Question: How can we know that in a program P two accesses, a and b, to the same shared variable are concurrent? • Answer: We must check all execution orders of P and see. • If we discover an execution order in which a and b are concurrent, we can report a data-race and stop. • Otherwise we should continue checking.

  24. Program Execution Model • Consider a class of multi-threaded programs that synchronize by counting semaphores. • A program execution is described by a collection of events and two relations over the events. • Synchronization event – an instance of some synchronization operation (e.g. signal, wait). • Computation event – an instance of a group of statements in the same thread, none of which are synchronization operations (e.g. x=x+1).

  25. Program Execution Model – Events’ Relations • Temporal ordering relation – a T→ b means that a completes before b begins (i.e. the last action of a can affect the first action of b). • Shared-data dependence relation – a D→ b means that a accesses a shared variable that b later accesses, and at least one of the accesses is a modification of the variable. • Indicates when one event causally affects another.

  26. Program Execution Model – Program Execution • Program execution P – a triple <E, T→, D→>, where E is a finite set of events, and T→ and D→ are the above relations, satisfying the following axioms: • A1: T→ is an irreflexive partial order (a T↛ a). • A2: If a T→ b T↮ c T→ d then a T→ d. • A3: If a D→ b then b T↛ a. • Notes: • a ↛ b is shorthand for ¬(a→b). • a ↮ b is shorthand for ¬(a→b) ⋀ ¬(b→a). • Notice that A1 and A2 imply the transitivity of T→.

  27. Program Execution Model – Feasible Program Execution • A feasible program execution for P – an execution of the program that: • performs exactly the same events as P, but • may exhibit a different temporal ordering. • Definition: P’=<E’, T’→, D’→> is a feasible program execution for P=<E, T→, D→> (i.e. potentially occurred) if: • F1: E’=E (exactly the same events), and • F2: P’ satisfies axioms A1–A3 of the model, and • F3: a D→ b ⇒ a D’→ b (the same data dependencies). • Note: any execution with the same shared-data dependencies as P will execute exactly the same events as P.

  28. Program Execution Model – Ordering Relations • Given a program execution, P=<E, T→, D→>, and the set, F(P), of feasible program executions for P, six ordering relations are defined (the slide’s table is not reproduced in the transcript): the must-have and could-have variants of happened-before (MHB, CHB), concurrent-with (MCW, CCW) and ordered-with (MOW, COW). • They summarize the temporal orderings present in the feasible program executions.

  29. Program Execution Model –Ordering Relations - Explanation • The must-have relations describe orderings that are guaranteed to be present in all feasible program executions in F(P). • The could-have relations describe orderings that could potentially occur in at least one of the feasible program executions in F(P). • The happened-before relations show events that execute in a specific order. • The concurrent-with relations show events that execute concurrently. • The ordered-with relations show events that execute in either order but not concurrently.

  30. Complexity of Computing Ordering Relations • The problem of computing any of the must-have ordering relations (MHB, MCW, MOW) is Co-NP-hard. • The problem of computing any of the could-have relations (CHB, CCW, COW) is NP-hard. • Theorem 1: Given a program execution, P=<E,T→,D→>, that uses counting semaphores, the problem of deciding whether a MHB→ b, a MCW↔ b or a MOW↔ b (any of the must-have orderings) is Co-NP-hard.

  31. Proof of Theorem 1 – Notes • The proof is a reduction from 3CNFSAT such that a boolean formula is not satisfiable iff a MHB→ b for two events, a and b, defined in the reduction. • The problem of checking whether a 3CNFSAT formula is not satisfiable is Co-NP-complete. • The presented proof is only for the must-have-happened-before (MHB) relation. • Proofs for the other relations are analogous. • The proof can also be extended to programs that use binary semaphores, event-style synchronization and other synchronization primitives (and even a single counting semaphore).

  32. Proof of Theorem 1 – 3CNFSAT • An instance of 3CNFSAT is given by: • A set of n variables, V={X1, X2, …, Xn}. • A boolean formula B consisting of a conjunction of m clauses, B=C1⋀C2⋀…⋀Cm. • Each clause Cj=(L1⋁L2⋁L3) is a disjunction of three literals. • Each literal Lk is a variable from V or its negation – Lk=Xi or Lk=¬Xi. • Example: B=(X1⋁X2⋁¬X3) ⋀ (¬X2⋁¬X5⋁X6) ⋀ (X1⋁X4⋁¬X5)
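For intuition about the reduction’s target question, the example formula can be checked by brute force over all assignments (a throwaway sketch, not part of the construction; clause encoding is the usual signed-integer convention). Since B is satisfiable, the reduction would yield a program in which a MHB→ b does not hold:

```python
from itertools import product

# B = (X1 v X2 v ~X3) ^ (~X2 v ~X5 v X6) ^ (X1 v X4 v ~X5)
# Clauses as literal lists: positive i means Xi, negative i means ~Xi.
clauses = [[1, 2, -3], [-2, -5, 6], [1, 4, -5]]

def satisfiable(clauses, n):
    """Try all 2^n truth assignments; True iff some assignment satisfies B."""
    for bits in product([False, True], repeat=n):
        value = {i + 1: bits[i] for i in range(n)}
        if all(any(value[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False

print(satisfiable(clauses, 6))  # True -- e.g. the all-False assignment works
```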

  33. Proof of Theorem 1 –Idea of the Proof • Given an instance of 3CNFSAT formula, B, we construct a program consisting of 3n+3m+2 threads which use 3n+m+1 semaphores (assumed to be initialized to 0). • The execution of this program simulates a nondeterministic evaluation of B. • Semaphores are used to represent the truth values of each variable and clause. • The execution exhibits certain orderings iff B is not satisfiable.

  34. Proof of Theorem 1 – The Construction per Variable • For each variable, Xi, the following three threads are constructed: • Thread 1: wait( Ai ); signal( Xi ); … ; signal( Xi ) • Thread 2: wait( Ai ); signal( not-Xi ); … ; signal( not-Xi ) • Thread 3: signal( Ai ); wait( Pass2 ); signal( Ai ) • “…” indicates as many signal(Xi) (or signal(not-Xi)) operations as the number of occurrences of the literal Xi (or ¬Xi) in the formula B.

  35. Proof of Theorem 1 – The Construction per Variable • The semaphores Xi and not-Xi are used to represent the truth value of the variable Xi. • Signaling the semaphore Xi (or not-Xi) represents the assignment of True (or False) to the variable Xi. • The assignment is accomplished by allowing either signal(Xi) or signal(not-Xi) to proceed, but not both (due to the concurrent wait(Ai) operations in the first two threads).

  36. Proof of Theorem 1 – The Construction per Clause • For each clause, Cj, the following three threads are constructed: • Thread 1: wait( L1 ); signal( Cj ) • Thread 2: wait( L2 ); signal( Cj ) • Thread 3: wait( L3 ); signal( Cj ) • L1, L2 and L3 are the semaphores corresponding to the literals in clause Cj (i.e. Xi or not-Xi). • The semaphore Cj represents the truth value of the clause Cj. It is signaled iff the truth assignment to the variables causes the clause Cj to evaluate to True.

  37. Proof of Theorem 1 – Explanation of Construction • The first 3n threads operate in two passes: • The first pass is a non-deterministic guessing phase in which: • Each variable used in the boolean formula B is assigned a unique truth value. • Only one of the Xi and not-Xi semaphores is signaled. • The second pass (which begins after the semaphore Pass2 is signaled) is used to ensure that the program doesn’t deadlock: • The semaphore operations that were not allowed to execute during the first pass are allowed to proceed.

  38. Proof of Theorem 1 – The Final Construction • Two additional threads are created: • Thread A: wait( C1 ); … ; wait( Cm ); b: skip • Thread B: a: skip; signal( Pass2 ); … ; signal( Pass2 ) • There are n ‘signal(Pass2)’ operations – one for each variable. • There are m ‘wait(Cj)’ operations – one for each clause.

  39. Proof of Theorem 1 – Putting It All Together • Event b is reached only after the semaphore Cj, for each clause j, has been signaled. • The program contains no conditional statements or shared variables. • Every execution of the program executes the same events and exhibits the same shared-data dependencies (i.e. none). • Claim: for any execution, a MHB→ b iff B is not satisfiable.

  40. Proof of Theorem 1 –Proving the “if” Part • Assume that B is not satisfiable. • Then there is always some clause, Cj, that is not satisfied by the truth values guessed during the first pass. Thus, no signal(Cj) operation is performed during the first pass. • Event b can’t execute until this signal(Cj) operation is performed, which can then only be done during the second pass. • The second pass doesn’t occur until after event a executes, so event a must precede event b. • Therefore, a MHB→ b.

  41. Proof of Theorem 1 –Proving the “only if” Part • Assume that a MHB→ b. • This means that there is no execution in which b either precedes a or executes concurrently with a. • Assume by way of contradiction that B is satisfiable. • Then some truth assignment can be guessed during the first pass that satisfies all of the clauses. • Event b can then execute before event a, contradicting the assumption. • Therefore, B is not satisfiable.

  42. Complexity of Computing Ordering Relations – Cont. • Since a MHB→ b iff B is not satisfiable, the problem of deciding a MHB→ b is Co-NP-hard. • By similar reductions, programs can be constructed such that the non-satisfiability of B can be determined from the MCW or MOW relations. The problem of deciding these relations is therefore also Co-NP-hard. • Theorem 2: Given a program execution, P=<E,T→,D→>, that uses counting semaphores, the problem of deciding whether a CHB→ b, a CCW↔ b or a COW↔ b (any of the could-have orderings) is NP-hard. • Proof by similar reductions …

  43. Complexity of Race Detection – Conditions, Loops and Input • The presented model is too simplistic. • What if “if” and “while” statements are used? What if user input is allowed? • (The slide’s code example is not reproduced in the transcript.) In it, if the input Y≥0 there is a data-race; otherwise the race is not possible, since line [1] is never reached.

  44. Complexity of Race Detection – “NP-Harder”? • The proof above does not use conditional statements, loops or outside input. • The problem of data-race detection is much harder than deciding an NP-complete problem. • Intuitively – there is not even an exponential solution, since it is not known whether the program will halt. • Thus, in the general case, it is undecidable.

  45. So How Can Data-Races Be Detected? – Approximations • Deciding whether a CHB→ b or a CCW↔ b would reveal the feasible data-races. • Since this is an intractable problem, the temporal ordering relation T→ should be approximated, and apparent data-races located instead. • Recall that apparent data-races exist if and only if at least one feasible race exists. • Yet, it remains a hard problem to locate all apparent data-races.

  46. Approximation Example – Lamport’s Happens-Before • The happens-before partial order, denoted a hb→ b, is defined for access events (reads, writes, releases and acquires) that happen in a specific execution, as follows: • Program Order: a and b are events performed by the same thread, with a preceding b. • Release and Acquire: a is a release of some sync’ object S and b is a corresponding acquire. • Transitivity: a hb→ c and c hb→ b ⇒ a hb→ b. • Shared accesses a and b are concurrent, a hb↮ b, if neither a hb→ b nor b hb→ a holds.
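DJIT+ (shown later) approximates this relation with vector clocks. The following sketch shows how hb-ordering and concurrency between two timestamped events would be decided; the events and clock values are made up for illustration and are not from the slides:

```python
def hb(u, v):
    """Vector-clock happens-before: u hb-> v iff u <= v pointwise and u != v."""
    return all(a <= b for a, b in zip(u, v)) and u != v

def concurrent(u, v):
    """Neither ordering holds -> the events are concurrent
    (a potential race if they touch the same variable and one is a write)."""
    return not hb(u, v) and not hb(v, u)

# Hypothetical timestamps of three accesses in a 2-thread run:
a = (1, 0)   # thread 1's first access
b = (2, 1)   # thread 2's access after acquiring a lock released by thread 1
c = (0, 1)   # thread 2's access before any synchronization

print(hb(a, b))          # True  -- ordered by a release/acquire chain
print(concurrent(a, c))  # True  -- no sync chain: candidate data-race
```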

  47. Approaches to Detection of Apparent Data-Races – Static • There are two main approaches to the detection of apparent data-races (sometimes a combination of both is used): • Static – perform a compile-time analysis of the code. – Too conservative: • Can’t know or understand the semantics of the program. • Results in excessive false alarms that hide the real data-races. + Tests the program globally: • Sees the whole code of the tested program. • Can warn about all possible errors in all possible executions.

  48. Approaches to Detection of Apparent Data-Races – Dynamic • Dynamic – use a tracing mechanism to detect whether a particular execution actually exhibited data-races. + Detects only those apparent data-races that actually occur during a feasible execution. – Tests the program locally: • Considers only one specific execution path of the program at a time. • Post-Mortem Methods – after the execution terminates, analyze the trace of the run and warn about possible data-races that were found. • On-The-Fly Methods – buffer partial trace information in memory, analyze it and detect races as they occur.

  49. Approaches to Detection of Apparent Data-Races • No “silver bullet” exists. • Accuracy is of great importance (especially in large programs). • There is always a tradeoff between the number of false negatives (undetected races) and false positives (false alarms). • The space and time overheads imposed by the techniques are significant as well.

  50. Closer Look at Dynamic Methods • We show two dynamic methods for on-the-fly detection of apparent data-races in multi-threaded programs with locks and barriers: • DJIT+ – based on Lamport’s happens-before partial order relation and Mattern’s virtual time (vector clocks). Implemented in the Millipede and MultiRace systems. • Lockset – based on a locking discipline and lockset refinement. Implemented in the Eraser tool and the MultiRace system.
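To preview the Lockset side: its core step intersects, per shared variable, a candidate lockset with the locks held at each access, warning when the set becomes empty. This is a simplified sketch of that Eraser-style refinement (the trace, lock and variable names are illustrative):

```python
def lockset_check(accesses, all_locks):
    """accesses: list of (variable, set-of-locks-held) in program order.
    Returns the variables whose candidate lockset was refined to empty."""
    candidates = {}   # C(v): locks that protected v on every access so far
    warnings = []
    for var, held in accesses:
        C = candidates.setdefault(var, set(all_locks))
        C &= held                       # refinement: C(v) := C(v) ∩ locks_held
        if not C and var not in warnings:
            warnings.append(var)        # no common lock protects var
    return warnings

trace = [("x", {"m"}), ("x", {"m"}),     # x always accessed under m: fine
         ("y", {"m"}), ("y", {"n"})]     # y under two different locks: warn
print(lockset_check(trace, {"m", "n"}))  # ['y']
```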
