
  1. CS 265: Dynamic Data Race Detection
  Koushik Sen, UC Berkeley

  2. Race Conditions

  class Ref {
    int i;
    void inc() {
      int t = i + 1;
      i = t;
    }
  }

  (Courtesy Cormac Flanagan)

  3. Race Conditions

  class Ref {
    int i;
    void inc() {
      int t = i + 1;
      i = t;
    }
  }

  Ref x = new Ref(0);
  parallel {
    x.inc(); // two calls happen
    x.inc(); // in parallel
  }
  assert x.i == 2;

  • A race condition occurs if
    • two threads access a shared variable at the same time without synchronization
    • at least one of those accesses is a write

  4. Race Conditions

  class Ref {
    int i;
    void inc() {
      int t = i + 1;
      i = t;
    }
  }

  Ref x = new Ref(0);
  parallel {
    x.inc(); // two calls happen
    x.inc(); // in parallel
  }
  assert x.i == 2;

  A racy interleaving of the two inc() calls:
    t1: RD(i)
    t2: RD(i)
    t1: WR(i)
    t2: WR(i)

  5. Data-race

  int x = 0, y = 0;
  Thread 1: y = 1; r1 = x;
  Thread 2: x = 1; r2 = y;

  • Is r1 = r2 = 0 possible after the execution?

  6. Data-race

  int x = 0, y = 0;
  Thread 1: y = 1; r1 = x;
  Thread 2: x = 1; r2 = y;

  • Is r1 = r2 = 0 possible after the execution?
  • Possible in most programming languages
    • Reordering of events by the compiler
    • Memory model

  7. Data race free -> Seq consistency
  • Memory models for programming languages
    • Specify what exactly the compiler will ensure
  • Consensus: data-race-free programs -> sequentially consistent
  • Java memory model [Manson et al., POPL05]
    • A complex model for programs with races
    • Bugs/unclear implementations
  • C++ (new version) [Boehm, Adve, PLDI08]
    • No semantics for programs with races!

  8. Lock-Based Synchronization

  class Ref {
    int i; // guarded by this
    void inc() {
      synchronized (this) {
        int t = i + 1;
        i = t;
      }
    }
  }

  Ref x = new Ref(0);
  parallel {
    x.inc(); // two calls happen
    x.inc(); // in parallel
  }
  assert x.i == 2;

  • Field guarded by a lock
  • Lock acquired before accessing the field
  • Ensures race freedom

  9. Dynamic Race Detection • Happens Before [Dinning and Schonberg 1991] • Lockset: • Eraser [Savage et al. 1997] • Precise Lockset [Choi et al. 2002] • Hybrid [O'Callahan and Choi 2003]

  10. Dynamic Race Detection
  • Advantages
    • Precise knowledge of the execution
    • No false positives [Happens-Before]
      • Unless you try to predict data races [Lockset]
  • Disadvantages
    • Produce false negatives, because they only consider a subset of possible program executions

  11. What are we going to analyze?
  • A trace representing an actual execution of a program
  • A trace is a sequence of events:
    • MEM(m,a,t): thread t accessed memory location m, where the access a ∈ {RD, WR}
      • m can be o.f, C.f, a[i]
    • ACQ(l,t): thread t acquires lock l
      • Ignore re-acquires of locks. l can be o.
    • REL(l,t): thread t releases lock l
    • SND(g,t): thread t sends message g
    • RCV(g,t): thread t receives message g
      • If t1 calls t2.start(), then generate SND(g,t1) and RCV(g,t2)
      • If t1 calls t2.join(), then generate SND(g,t2) and RCV(g,t1)
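The event alphabet above can be written down as a small Java type. This is a sketch for illustration; the class, enum, and field names are ours, not the lecture's:

```java
// A minimal trace-event type for the alphabet above:
// MEM(m,a,t), ACQ(l,t), REL(l,t), SND(g,t), RCV(g,t).
public class Event {
    public enum Kind { MEM, ACQ, REL, SND, RCV }
    public enum Access { RD, WR, NONE }

    public final Kind kind;
    public final String target;   // memory location m, lock l, or message g
    public final Access access;   // RD/WR for MEM events, NONE otherwise
    public final String thread;   // thread t

    public Event(Kind kind, String target, Access access, String thread) {
        this.kind = kind; this.target = target;
        this.access = access; this.thread = thread;
    }

    @Override public String toString() {
        return kind == Kind.MEM
            ? kind + "(" + target + "," + access + "," + thread + ")"
            : kind + "(" + target + "," + thread + ")";
    }

    public static void main(String[] args) {
        Event e = new Event(Kind.MEM, "4365.i", Access.RD, "t1");
        System.out.println(e);   // MEM(4365.i,RD,t1)
    }
}
```

All the detectors that follow consume a list of such events.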

  12. How to generate a trace? Instrument the program.

  Original program:

  class Ref {
    int i; // guarded by this
    void inc() {
      synchronized (this) {
        int t = i + 1;
        i = t;
      }
    }
  }

  Instrumented program:

  class Ref {
    int i; // guarded by this
    void inc() {
      print("ACQ(" + id(this) + "," + thisThread + ")");
      synchronized (this) {
        print("MEM(" + id(this) + ".i,RD," + thisThread + ")");
        int t = i + 1;
        print("MEM(" + id(this) + ".i,WR," + thisThread + ")");
        i = t;
      }
      print("REL(" + id(this) + "," + thisThread + ")");
    }
  }

  Ref x = new Ref(0);
  parallel {
    x.inc(); // two calls happen
    x.inc(); // in parallel
  }
  assert x.i == 2;

  13. Sample Trace

  Instrumented program:

  class Ref {
    int i; // guarded by this
    void inc() {
      print("ACQ(" + id(this) + "," + thisThread + ")");
      synchronized (this) {
        print("MEM(" + id(this) + ".i,RD," + thisThread + ")");
        print("MEM(" + id(this) + ".i,WR," + thisThread + ")");
        i = i + 1;
      }
      print("REL(" + id(this) + "," + thisThread + ")");
    }
  }

  Ref x = new Ref(0);
  parallel {
    x.inc(); // two calls happen
    x.inc(); // in parallel
  }
  assert x.i == 2;

  Sample trace:
  ACQ(4365,t1)
  MEM(4365.i,RD,t1)
  MEM(4365.i,WR,t1)
  REL(4365,t1)
  ACQ(4365,t2)
  MEM(4365.i,RD,t2)
  MEM(4365.i,WR,t2)
  REL(4365,t2)

  14. Compute Locks Held by a Thread

  L(t) = locks held by thread t. How do we compute L(t)?

  Sample trace             Locks held afterwards
  (initially)              L(t1)={}, L(t2)={}
  ACQ(4365,t1)             L(t1)={4365}, L(t2)={}
  MEM(4365.i,RD,t1)        L(t1)={4365}, L(t2)={}
  MEM(4365.i,WR,t1)        L(t1)={4365}, L(t2)={}
  REL(4365,t1)             L(t1)={}, L(t2)={}
  ACQ(4365,t2)             L(t1)={}, L(t2)={4365}
  MEM(4365.i,RD,t2)        L(t1)={}, L(t2)={4365}
  MEM(4365.i,WR,t2)        L(t1)={}, L(t2)={4365}
  REL(4365,t2)             L(t1)={}, L(t2)={}
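The bookkeeping in the table above is just one map update per ACQ/REL event. A minimal sketch in Java (the class name and method API are ours):

```java
import java.util.*;

// Replays ACQ(l,t) / REL(l,t) events from a trace and maintains
// L(t), the set of locks currently held by each thread.
public class LocksHeld {
    private final Map<String, Set<String>> held = new HashMap<>();

    public void acquire(String lock, String thread) {      // ACQ(l,t)
        held.computeIfAbsent(thread, t -> new HashSet<>()).add(lock);
    }

    public void release(String lock, String thread) {      // REL(l,t)
        held.getOrDefault(thread, new HashSet<>()).remove(lock);
    }

    public Set<String> locks(String thread) {              // L(t)
        return held.getOrDefault(thread, Collections.emptySet());
    }

    public static void main(String[] args) {
        LocksHeld L = new LocksHeld();
        L.acquire("4365", "t1");                // ACQ(4365,t1)
        System.out.println(L.locks("t1"));      // [4365]
        L.release("4365", "t1");                // REL(4365,t1)
        System.out.println(L.locks("t1"));      // []
    }
}
```

Both lockset algorithms below consult L(t) at every MEM event.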

  15. Let us now analyze a trace • Instrument Program • Run Program => A Trace File • Analyze Trace File

  16. Happens-before relation
  • [Dinning and Schonberg 1991]
  • Idea: infer a happens-before relation ≺ between events in a trace
  • We say e1 ≺ e2
    • if e1 and e2 are events from the same thread and e1 appears before e2 in the trace
    • if e1 = SND(g,t) and e2 = RCV(g,t')
    • if there is an e' such that e1 ≺ e' and e' ≺ e2
  • REL(l,t) followed by ACQ(l,t') generates SND(g,t) and RCV(g,t')
  • We say e1 and e2 are in a race if
    • e1 and e2 are not related by ≺,
    • e1 and e2 are from different threads,
    • e1 and e2 access the same memory location and one of the accesses is a write

  17. Happens-before: example 1

  Thread 1: x := x + 1; ACQ(mutex); v := v + 1; REL(mutex)
  Thread 2: ACQ(mutex); v := v + 1; REL(mutex); x := x + 1

  In this interleaving (Thread 1 runs first), any two accesses of shared variables are in the happens-before relation, so no race is reported.

  18. Happens-before: example 2

  Thread 1: ACQ(mutex); v := v + 1; REL(mutex); x := x + 1
  Thread 2: x := x + 1; ACQ(mutex); v := v + 1; REL(mutex)

  Here the two accesses of x are unordered by happens-before. Therefore, only this second execution reveals the existing data race!!

  19. Eraser Lockset
  • Savage, Burrows, Nelson, Sobalvarro, Anderson
  • Assume a database D storing tuples (m,L) where:
    • m is a memory location
    • L is a set of locks that protect m
  • Initially D contains a tuple (m,U) for each memory location m, where U is the universal set

  20. How it works?
  • For an event MEM(m,a,t), generate the tuple (m, L(t))
  • Let (m, L') be the tuple present in D
  • Report a race over memory location m if L(t) ∩ L' = ∅
  • Replace (m, L') by (m, L(t) ∩ L') in D
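The update rule above can be sketched in a few lines of Java. This is our illustration, not Eraser's actual implementation; the class name and API are invented, and the initial tuple (m,U) is represented implicitly by treating the first access as the initialization:

```java
import java.util.*;

// Eraser's core rule: D maps each location m to the candidate set of
// locks believed to protect it; on MEM(m,a,t), intersect with L(t)
// and warn when the intersection becomes empty.
public class EraserLockset {
    private final Map<String, Set<String>> D = new HashMap<>();

    // Returns true if a race is reported on m for this access.
    public boolean access(String m, Set<String> locksHeld) {
        Set<String> Lprime = D.get(m);
        if (Lprime == null) {                    // first access: (m,U) becomes (m,L(t))
            D.put(m, new HashSet<>(locksHeld));
            return false;
        }
        Lprime.retainAll(locksHeld);             // replace (m,L') by (m, L(t) ∩ L')
        return Lprime.isEmpty();                 // empty candidate set: warning
    }

    public static void main(String[] args) {
        EraserLockset e = new EraserLockset();
        // Example 2 from the slides: v protected by mutex1, then mutex2.
        System.out.println(e.access("v", Set.of("mutex1")));  // false
        System.out.println(e.access("v", Set.of("mutex2")));  // true: warning
    }
}
```

Note the detector keeps a single shrinking lockset per location, which is exactly what produces the false alarm on the three-thread example later in the lecture.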

  21. Eraser: Example 1

  Thread 1: ACQ(mutex); v := v + 1; REL(mutex)
  Thread 2: ACQ(mutex); v := v + 1; REL(mutex)

  At Thread 1's access: L(t1) = {mutex}, so (v,{mutex}) ∈ D
  At Thread 2's access: L(t2) = {mutex}, so (v,{mutex}) ∈ D; no warning

  22. Eraser: Example 2

  Thread 1: ACQ(mutex1); v := v + 1; REL(mutex1)
  Thread 2: ACQ(mutex2); v := v + 1; REL(mutex2)

  At Thread 1's access: L(t1) = {mutex1}, so (v,{mutex1}) ∈ D
  At Thread 2's access: L(t2) = {mutex2}, and {mutex1} ∩ {mutex2} = ∅, so (v,{}) ∈ D. Warning!!

  23. Lockset

  (State-machine diagram: a location is in the Shared-exclusive state, where its lockset is tracked; a read/write by any thread moves it to Shared-read/write, where the lockset is still tracked; an empty lockset means race condition!)

  24. Extending Lockset (Thread Local Data)

  (State-machine diagram: a location starts in the Thread Local state on the first thread's read/write, with no lockset tracking; a second thread's read/write moves it to Shared-exclusive, where the lockset is tracked; a read/write by any thread then moves it to Shared-read/write, still tracking the lockset; an empty lockset means race condition!)

  25. Extending Lockset (Read Shared Data)

  (State-machine diagram: a location starts Thread Local on the first thread's read/write; a second thread's read moves it to Read Shared, where any thread may read; a second thread's write moves it to Shared-exclusive, where the lockset is tracked; a write by any thread from Read Shared, or a read/write by any thread from Shared-exclusive, moves it to Shared-read/write, where the lockset is tracked and an empty lockset means race condition!)

  26. Eraser: Problem

  Thread 1: ACQ(L1,L2); v := v + 1; REL(L2,L1)
  Thread 2: ACQ(L2,L3); v := v + 1; REL(L3,L2)
  Thread 3: ACQ(L3,L1); v := v + 1; REL(L1,L3)

  The locksets are T1 = {L1,L2}, T2 = {L2,L3}, T3 = {L1,L3}. Their overall intersection is empty, so Eraser warns on v: a false alarm, since every pair of threads shares a lock.

  27. Precise Lockset
  • Choi, Lee, Loginov, O'Callahan, Sarkar, Sridharan
  • Assume a database D storing tuples (m,t,L,a) where:
    • m is a memory location
    • t is a thread accessing m
    • L is the set of locks held by t while accessing m
    • a is the type of access (read or write)
  • Initially D is empty

  28. How it works?
  • For an event MEM(m,a,t), generate the tuple (m, a, L(t), t)
  • If there is a tuple (m', a', L', t') in D such that
    • m = m',
    • (a = WR) ∨ (a' = WR),
    • L(t) ∩ L' = ∅, and
    • t ≠ t'
  • Report a race over memory location m
  • Add (m, a, L(t), t) to D
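The pairwise check above, in contrast to Eraser's single shrinking lockset, can be sketched as follows (a sketch in our own names; the listed optimizations are omitted):

```java
import java.util.*;

// Precise Lockset: keep every tuple (m, a, L, t); on a new access,
// report a race when some stored tuple has the same location, at
// least one write, a disjoint lockset, and a different thread.
public class PreciseLockset {
    record Tuple(String m, String a, Set<String> L, String t) {}
    private final List<Tuple> D = new ArrayList<>();

    public boolean access(String m, String a, Set<String> L, String t) {
        boolean race = false;
        for (Tuple old : D) {
            if (old.m().equals(m)                       // m = m'
                    && (a.equals("WR") || old.a().equals("WR"))
                    && Collections.disjoint(L, old.L()) // L(t) ∩ L' = ∅
                    && !old.t().equals(t)) {            // t ≠ t'
                race = true;                            // report race over m
            }
        }
        D.add(new Tuple(m, a, new HashSet<>(L), t));
        return race;
    }

    public static void main(String[] args) {
        PreciseLockset p = new PreciseLockset();
        p.access("v", "WR", Set.of("m1", "m2"), "t1");
        // Pairwise lockset overlap (shared lock m2): no race reported.
        System.out.println(p.access("v", "WR", Set.of("m2", "m3"), "t2")); // false
        // An unprotected write races with the earlier writes.
        System.out.println(p.access("v", "WR", Set.of(), "t3"));           // true
    }
}
```

Because the check is per pair of accesses, the three-thread example that fools Eraser produces no warning here.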

  29. Optimizations
  • Stop adding tuples on m once a race on m is detected
  • Do not add (m,a,L,t) to D if (m,a,L',t) is already in D and L' ⊆ L
  • Many more …

  30. Precise Lockset

  Thread 1: x := x + 1; ACQ(mutex); v := v + 1; REL(mutex)
  Thread 2: ACQ(mutex); v := v + 1; REL(mutex); x := x + 1

  D fills up as:
  (x,RD,{},t1), (x,WR,{},t1), (v,RD,{mutex},t1), (v,WR,{mutex},t1),
  (v,RD,{mutex},t2), (v,WR,{mutex},t2), (x,RD,{},t2) (conflict detected!), (x,WR,{},t2)

  31. Precise Lockset

  Thread 1: ACQ(m1,m2); v := v + 1; REL(m2,m1)
  Thread 2: ACQ(m2,m3); v := v + 1; REL(m3,m2)
  Thread 3: ACQ(m3,m1); v := v + 1; REL(m1,m3)

  D: (v,RD,{m1,m2},t1), (v,WR,{m1,m2},t1), (v,RD,{m2,m3},t2), (v,WR,{m2,m3},t2),
     (v,RD,{m1,m3},t3), (v,WR,{m1,m3},t3)

  No conflicts detected! Every pair of locksets overlaps.

  32. Precise Lockset: Not so Precise

  Thread 1: x := x + 1; t2.start()
  Thread 2: x := x + 1

  Precise Lockset gives a warning, but there is no warning with Happens-Before: t2.start() orders Thread 1's access before Thread 2's.

  33. Hybrid Dynamic Data Race Detection
  • Relax Happens-Before
    • No happens-before relation between REL(l,t) and a subsequent ACQ(l,t')
  • Maintain Precise Lockset along with the Relaxed Happens-Before relation

  34. Hybrid
  • O'Callahan and Choi
  • Assume a database D storing tuples (m,t,L,a,e) where:
    • m is a memory location
    • t is a thread accessing m
    • L is the set of locks held by t while accessing m
    • a is the type of access (read or write)
    • e is the event associated with the access
  • Initially D is empty

  35. How it works?
  • For an event e = MEM(m,a,t), generate the tuple (m, a, L(t), t, e)
  • If there is a tuple (m', a', L', t', e') in D such that
    • m = m',
    • (a = WR) ∨ (a' = WR),
    • L(t) ∩ L' = ∅,
    • t ≠ t', and
    • e and e' are not related by the happens-before relation, i.e., ¬(e ≺ e') ∧ ¬(e' ≺ e)
  • Report a race over memory location m
  • Add (m, a, L(t), t, e) to D

  36. Hybrid Dynamic Data Race Detection

  Thread 1: x := x + 1 (events e1 = RD, e2 = WR); t2.start()
  Thread 2: x := x + 1 (events e3 = RD, e4 = WR)

  D: (x,RD,{},t1,e1), (x,WR,{},t1,e2), (x,RD,{},t2,e3), (x,WR,{},t2,e4)

  Precise Lockset detects a data race, but e2 ≺ e3. Therefore, the hybrid technique gives no warning.

  37. Distributed Computation

  (Space-time diagram: processes p1, p2, p3; events e11 … e43; messages m1 … m4 between processes; physical time on the horizontal axis.)

  • A set of processes: {p1, p2, …, pn}
  • No shared memory
  • Communication through messages
  • Each process executes a sequence of events
    • send(m): sends a message with content m
    • receive(m): receives a message with content m
    • Internal event: changes the local state of a process
  • The ith event from process pj is denoted by eij

  38. Distributed Computation as a Partial Order
  • A distributed computation defines a partial order on the events
  • e → e' if
    • e and e' are events from the same process and e executes before e'
    • e is the send of a message and e' is the receive of the same message
    • there is an e'' such that e → e'' and e'' → e'

  39. Distributed Computation as a Partial Order
  • Problem: an external process (the observer) wants to infer the partial order of the computation for debugging
  • No global clock
  • At each event a process can send a message to the observer to inform it about the event
  • Message delay is unbounded

  40. Can we infer the partial order?
  • From the observation (e.g., the observer sees the sequence e12, e13, e11, e21, e23, e43, e33, e31, e32, e22):
  • Can we associate a suitable value with every event such that
    V(e) < V(e') ⇔ e → e'?
  • We need the notion of a (logical) clock

  41. Lamport's Logical Time
  • All processes use a counter (clock) with an initial value of zero
  • The counter is incremented by 1 and assigned to each event, as its timestamp
  • A send (message) event carries its timestamp
  • For a receive (message) event the counter is updated by
    max(receiver-counter, message-timestamp) + 1
  • Send the counter value along with the event to the observer
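The three update rules above fit in a few lines of Java (a sketch; the class name and method names are ours):

```java
// Lamport's logical clock: internal and send events tick the counter;
// a receive takes max(receiver counter, message timestamp) + 1.
public class LamportClock {
    private int counter = 0;

    public int localEvent() { return ++counter; }
    public int send()       { return ++counter; }  // this timestamp travels with the message

    public int receive(int msgTimestamp) {
        counter = Math.max(counter, msgTimestamp) + 1;
        return counter;
    }

    public static void main(String[] args) {
        LamportClock p1 = new LamportClock(), p2 = new LamportClock();
        int ts = p1.send();                 // p1's first event, message stamped 1
        p2.localEvent();                    // p2's first event, timestamp 1
        System.out.println(p2.receive(ts)); // max(1,1) + 1 = 2
    }
}
```

The receive rule is what guarantees e → e' ⇒ C(e) < C(e'); the next slides show why the converse fails.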

  42. Example

  (Space-time diagram: each event of p1, p2, p3 annotated with its Lamport timestamp.)

  43. Example

  (Space-time diagram, as before, with Lamport timestamps.)

  • Problem with Lamport's logical clock:
    • e → e' ⇒ C(e) < C(e')
    • but C(e) < C(e') does not imply e → e'


  45. Vector Clock
  • Vector clock: a map from processes to naturals, V: P → N
  • Associate a vector clock Vp with every process p
  • Update the vector clock on every event as follows:
    • Internal event at p:
      • Vp(p) := Vp(p) + 1
    • Send message from p:
      • Vp(p) := Vp(p) + 1
      • Send Vp with the message
    • Receive message m at p:
      • Vp(i) := max(Vp(i), Vm(i)) for all i ∈ P, where Vm is the vector clock sent with the message m
      • Vp(p) := Vp(p) + 1
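The update rules above can be sketched directly, indexing the n processes as 0 … n-1 (the class and its API are our illustration):

```java
import java.util.*;

// Vector-clock updates: tick the own component on every event;
// on receive, take the componentwise max with the message's clock,
// then tick the own component.
public class VectorClock {
    final int[] v;    // Vp
    final int me;     // this process's index in P

    VectorClock(int n, int me) { this.v = new int[n]; this.me = me; }

    void internal()  { v[me]++; }
    int[] send()     { v[me]++; return v.clone(); }  // Vp travels with the message

    void receive(int[] vm) {
        for (int i = 0; i < v.length; i++) v[i] = Math.max(v[i], vm[i]);
        v[me]++;
    }

    public static void main(String[] args) {
        VectorClock p1 = new VectorClock(3, 0), p2 = new VectorClock(3, 1);
        int[] m = p1.send();   // p1: (1,0,0)
        p2.internal();         // p2: (0,1,0)
        p2.receive(m);         // p2: max((0,1,0),(1,0,0)) then tick -> (1,2,0)
        System.out.println(Arrays.toString(p2.v));  // [1, 2, 0]
    }
}
```

Each component Vp(i) records how far p's knowledge of process i's history extends, which is the intuition on the next slide.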

  46. Example

  (Space-time diagram: each event annotated with its vector clock; for instance, p1's events carry (1,0,0), (2,0,0), (3,0,0).)

  V = (a,b,c) means V(p1)=a, V(p2)=b, and V(p3)=c

  47. Intuitive Meaning of a Vector Clock • If Vp = (a,b,c) after some event then • p is affected by the ath event from p1 • p is affected by the bth event from p2 • p is affected by the cth event from p3

  48. Comparing Vector Clocks
  • V ≤ V' iff for all p ∈ P, V(p) ≤ V'(p)
  • V = V' iff for all p ∈ P, V(p) = V'(p)
  • V < V' iff V ≤ V' and V ≠ V'
  • Theorem: Ve < Ve' iff e → e'
  • Send each event along with its vector clock to the observer
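The comparison above, with vector clocks as int arrays as in the earlier sketch, is a componentwise check (names are ours):

```java
import java.util.*;

// V < V' iff V ≤ V' componentwise and V ≠ V'; by the theorem on the
// slide, the observer can then decide e -> e' from Ve < Ve', and two
// events with incomparable clocks are concurrent.
public class VcCompare {
    static boolean leq(int[] a, int[] b) {          // V ≤ V'
        for (int i = 0; i < a.length; i++) if (a[i] > b[i]) return false;
        return true;
    }

    static boolean lt(int[] a, int[] b) {           // V < V'
        return leq(a, b) && !Arrays.equals(a, b);
    }

    public static void main(String[] args) {
        int[] e = {2, 0, 0}, e2 = {2, 1, 4}, e3 = {0, 1, 0};
        System.out.println(lt(e, e2));  // true: e happened before e2
        System.out.println(lt(e, e3));  // false
        System.out.println(lt(e3, e));  // false: e and e3 are concurrent
    }
}
```

Two events whose clocks are unordered in both directions are exactly the candidate pairs for a data race under the happens-before view.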

  49. Definition of Data Race
  • Traditional Definition (Netzer and Miller 1992)

  Thread 1: x = 1
  Thread 2: if (x == 1) …

  50. Definition of Data Race
  • Traditional Definition (Netzer and Miller 1992)

  Thread 1: x = 1; send(m)
  Thread 2: receive(m); if (x == 1) …

  (With the message ordering the two accesses, the race is eliminated.)