Concurrency Checking with CHESS: Learning from Experience

Tom Ball, Sebastian Burckhardt, Chris Dern, Madan Musuvathi, Shaz Qadeer

Outline
• What is CHESS?
  • a testing tool, plus
  • a test methodology (concurrency unit tests)
  • a platform for research and teaching
• CHESS design decisions
• Lessons from the CHESS user forum and champions

What is CHESS?
• CHESS is a user-mode scheduler
• Controls all scheduling nondeterminism
• “Hijacks” scheduling control from the OS
• Guarantees:
  • Every run takes a different thread schedule
  • The schedule of every run can be reproduced

Concurrency Unit Tests
“Generally, in our test environment, we want to test what we call scenarios. A scenario might be a specific feature or API usage. In my case I am trying to test the scenario of a user canceling a command execution on a different thread.”
Steve Hale, Microsoft

A Concurrency Unit Test Pattern: Fork-Join
    void ForkJoinTest() {
        var t1 = new Thread(() => { S1 });
        var t2 = new Thread(() => { S2 });
        t1.Start();
        t2.Start();
        t1.Join();
        t2.Join();
        Debug.Assert(...);
    }
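The same pattern can be sketched as a runnable Python analogue (Python is used here only to keep the sketch self-contained; the slide's code is C#). The bodies passed as `s1` and `s2`, the `add` helper, and the final check are hypothetical stand-ins for S1, S2, and the elided assertion.

```python
import threading

def fork_join_test(s1, s2, check):
    """Fork two threads, join both, then check the final state."""
    t1 = threading.Thread(target=s1)
    t2 = threading.Thread(target=s2)
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    assert check(), "post-join assertion failed"

# Hypothetical scenario: two threads each add one item to a shared, locked list.
items = []
lock = threading.Lock()

def add(value):
    with lock:
        items.append(value)

fork_join_test(lambda: add("a"), lambda: add("b"),
               lambda: sorted(items) == ["a", "b"])
```

The test is short and uses two threads, which is what keeps the number of schedules small enough to enumerate.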

Concurrency Unit Tests
• Small scope hypothesis
  • For most bugs, there exists a short-running scenario with only a few threads that can find it
• Unit tests provide
  • Better coverage of schedules
  • Easier debugging, regression, etc.

CHESS as Research/Teaching Platform
http://research.microsoft.com/chess/
• Source code release
  • chesstool.codeplex.com
• Courseware with CHESS
  • Practical Parallel and Concurrent Programming
  • coming this fall!
• Preemption bounding [PLDI07]
  • speed search for bugs
  • simple counterexamples
• Fair stateless exploration [PLDI08]
  • scales to large programs
• Architecture [OSDI08]
  • Tasks and SyncVars
  • API wrappers
• Store buffer simulation [CAV08]
• Preemption sealing [TACAS10]
  • orthogonal to preemption bounding
  • where (not) to search for bugs
• Best-first search [PPoPP10]
• Automatic linearizability checking [PLDI10]
• More features
  • Data race detection
  • Partial order reduction
  • More monitors…

CHESS Design Decisions
• Stateless state space exploration
• No change to underlying scheduler
• Ability to enumerate all/only feasible schedules
• Schedule points = synchronization points, with race detection used to make up the difference
• Serialize concurrent behavior
• Suite of search/reduction strategies
  • preemption bounding, sealing
  • best-first search
• Monitor API to easily add new checking capability

Stateless model checking [Verisoft]
• Given a program with an acyclic state space
  • Systematically enumerate all paths
• Don’t capture program states
  • Not necessary for termination
  • Precisely capturing states is hard and expensive
• At the cost of potentially revisiting states
  • Partial-order reduction alleviates redundant exploration
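A minimal sketch of stateless exploration, in Python, on a toy model rather than CHESS itself. The assumptions: a program is a factory (`make_program`, hypothetical) returning fresh shared state plus one list of atomic steps per thread, and the explorer stores only schedules (sequences of thread ids), re-executing from the initial state each time instead of saving program states.

```python
def make_program():
    """Fresh instance of a toy program: two threads, two steps each."""
    state = {"log": []}

    def step(name):
        return lambda s: s["log"].append(name)

    threads = [[step("a1"), step("a2")], [step("b1"), step("b2")]]
    return state, threads

def replay(make, schedule):
    """Re-execute from scratch along `schedule`; return state and runnable threads."""
    state, threads = make()
    pcs = [0] * len(threads)
    for tid in schedule:
        threads[tid][pcs[tid]](state)
        pcs[tid] += 1
    enabled = [t for t in range(len(threads)) if pcs[t] < len(threads[t])]
    return state, enabled

def explore(make):
    """Depth-first enumeration of all complete schedules.

    Only schedule prefixes are kept on the stack, never program states.
    Termination relies on every thread having finitely many steps.
    """
    complete = []
    stack = [()]
    while stack:
        schedule = stack.pop()
        state, enabled = replay(make, schedule)
        if not enabled:
            complete.append((schedule, state))
        for tid in enabled:
            stack.append(schedule + (tid,))
    return complete

all_runs = explore(make_program)
```

Re-execution makes each prefix cost a full replay, which is the price paid for not capturing states.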

CHESS architecture
[Diagram: Unmanaged Program / Win32 Wrappers / Windows; Managed Program / .NET Wrappers / CLR; CHESS Scheduler; CHESS Exploration Engine]
• Capture scheduling nondeterminism
• Drive the program along an interleaving of choice

Running Example
    Thread 1                Thread 2
    Lock(l);                Lock(l);
    bal += x;               t = bal;
    Unlock(l);              Unlock(l);
                            Lock(l);
                            bal = t - y;
                            Unlock(l);

Introduce Schedule() points
• Instrument calls to the CHESS scheduler
• Each call is a potential preemption point
    Thread 1                    Thread 2
    Schedule(); Lock(l);        Schedule(); Lock(l);
    bal += x;                   t = bal;
    Schedule(); Unlock(l);      Schedule(); Unlock(l);
                                Schedule(); Lock(l);
                                bal = t - y;
                                Schedule(); Unlock(l);
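To see what the preemption points expose, the running example can be modeled in Python at the granularity of critical sections. This is a simplification, not CHESS code: a preemption between Lock and Unlock cannot let the other thread touch `bal`, so each locked section is treated as one atomic step. The initial balance and the values of `x` and `y` are assumed for illustration.

```python
def schedules(remaining, prefix=()):
    """Yield every interleaving; remaining[t] is the number of steps thread t has left."""
    if not any(remaining):
        yield prefix
        return
    for tid, left in enumerate(remaining):
        if left:
            rest = remaining[:tid] + (left - 1,) + remaining[tid + 1:]
            yield from schedules(rest, prefix + (tid,))

def run(schedule, x=10, y=30):
    """Execute the running example along one schedule and return the final balance."""
    state = {"bal": 100, "t": None}

    def deposit(s):        # Thread 1: Lock; bal += x; Unlock
        s["bal"] += x

    def read(s):           # Thread 2: Lock; t = bal; Unlock
        s["t"] = s["bal"]

    def write(s):          # Thread 2: Lock; bal = t - y; Unlock
        s["bal"] = s["t"] - y

    threads = [[deposit], [read, write]]
    pcs = [0, 0]
    for tid in schedule:
        threads[tid][pcs[tid]](state)
        pcs[tid] += 1
    return state["bal"]

# Map each of the three interleavings to its final balance.
outcomes = {s: run(s) for s in schedules((1, 2))}
```

Two schedules end at 80; the one that runs the deposit between Thread 2's read and write ends at 70, because the deposit is overwritten.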

First-cut solution: Random sleeps
• Introduce random sleep at schedule points
• Does not introduce new behaviors
  • Sleep models a possible preemption at each location
  • Sleeping for a finite amount guarantees starvation-freedom
    Thread 1                        Thread 2
    Sleep(rand()); Lock(l);         Sleep(rand()); Lock(l);
    bal += x;                       t = bal;
    Sleep(rand()); Unlock(l);       Sleep(rand()); Unlock(l);
                                    Sleep(rand()); Lock(l);
                                    bal = t - y;
                                    Sleep(rand()); Unlock(l);
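A sketch of the random-sleep idea using real Python threads. The sleep bound, the starting balance, and the values of `x` and `y` are assumptions chosen for illustration. Which outcome each run produces is left to chance, which is exactly the weakness of this first cut: nothing guarantees that the buggy schedule is ever hit, or that a failing run can be reproduced.

```python
import random
import threading
import time

def run_once(x=10, y=30, max_sleep=0.002):
    """One run of the running example with a random sleep at each schedule point."""
    acct = {"bal": 100}
    lock = threading.Lock()

    def nap():
        time.sleep(random.uniform(0, max_sleep))

    def deposit():
        nap()
        with lock:
            acct["bal"] += x
            nap()                  # sleep before the unlock

    def withdraw():
        nap()
        with lock:
            t = acct["bal"]
            nap()
        nap()
        with lock:
            acct["bal"] = t - y
            nap()

    workers = [threading.Thread(target=deposit), threading.Thread(target=withdraw)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return acct["bal"]

balances = [run_once() for _ in range(20)]
```

Every run ends at either 80 (correct) or 70 (lost deposit); the sleeps change how often each occurs but add no new behaviors.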

Improvement 1: Capture the “happens-before” graph
• Delays that result in the same “happens-before” graph are equivalent
• Avoid exploring equivalent interleavings
[Figure: the running example with Schedule() points, annotated with two Sleep(5) delays]
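One way to make the equivalence concrete, as a Python sketch under simplifying assumptions: each step touches exactly one named variable, and any two accesses to the same variable are treated as conflicting (true for lock operations, conservative for reads). Two schedules then have the same happens-before graph when every variable sees its accesses in the same order. The toy threads and variable names are hypothetical.

```python
from collections import defaultdict

def schedules(remaining, prefix=()):
    """Yield every interleaving; remaining[t] is the number of steps thread t has left."""
    if not any(remaining):
        yield prefix
        return
    for tid, left in enumerate(remaining):
        if left:
            rest = remaining[:tid] + (left - 1,) + remaining[tid + 1:]
            yield from schedules(rest, prefix + (tid,))

def hb_signature(schedule, threads):
    """Per-variable access order; together with program order it fixes the graph."""
    pcs = [0] * len(threads)
    order = defaultdict(list)
    for tid in schedule:
        var = threads[tid][pcs[tid]]
        order[var].append((tid, pcs[tid]))
        pcs[tid] += 1
    return frozenset((var, tuple(accesses)) for var, accesses in order.items())

# Toy program: each thread touches a private variable, then the shared lock "l".
threads = [["a", "l"], ["b", "l"]]

classes = defaultdict(list)
for s in schedules(tuple(len(t) for t in threads)):
    classes[hb_signature(s, threads)].append(s)
```

Here six interleavings collapse into two classes, one per order of the two accesses to `l`, so exploring one representative of each class is enough.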

Improvement 2: Understand synchronization semantics
• Avoid exploring delays that are impossible
• Identify when threads can make progress
• CHESS maintains a run queue and a wait queue
  • Mimics OS scheduler state
[Figure: the running example with Schedule() points]
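A sketch of how lock semantics prune the search, again as a Python model rather than CHESS's implementation. Each thread is a list of `lock`, `unlock`, and `op` steps; a thread whose next step is a `lock` on a held lock goes to the wait queue and is never scheduled. The step encoding, the starting balance, and the values of `x` and `y` are assumptions.

```python
def make_program(x=10, y=30):
    state = {"bal": 100, "t": None}

    def add(s):
        s["bal"] += x

    def read(s):
        s["t"] = s["bal"]

    def write(s):
        s["bal"] = s["t"] - y

    t1 = [("lock", "l"), ("op", add), ("unlock", "l")]
    t2 = [("lock", "l"), ("op", read), ("unlock", "l"),
          ("lock", "l"), ("op", write), ("unlock", "l")]
    return state, [t1, t2]

def replay(schedule):
    """Re-run along `schedule`; return state plus the run queue and wait queue."""
    state, threads = make_program()
    pcs = [0] * len(threads)
    owner = {}                          # lock name -> thread currently holding it
    for tid in schedule:
        kind, arg = threads[tid][pcs[tid]]
        if kind == "lock":
            owner[arg] = tid
        elif kind == "unlock":
            del owner[arg]
        else:
            arg(state)
        pcs[tid] += 1
    run_queue, wait_queue = [], []
    for tid, steps in enumerate(threads):
        if pcs[tid] == len(steps):
            continue                    # thread has finished
        kind, arg = steps[pcs[tid]]
        blocked = kind == "lock" and arg in owner
        (wait_queue if blocked else run_queue).append(tid)
    return state, run_queue, wait_queue

def feasible_schedules():
    """Enumerate only schedules in which every scheduled thread could really run."""
    done, stack = [], [()]
    while stack:
        schedule = stack.pop()
        state, run_queue, wait_queue = replay(schedule)
        if not run_queue:               # finished (a non-empty wait queue would mean deadlock)
            done.append((schedule, state["bal"]))
        stack.extend(schedule + (tid,) for tid in run_queue)
    return done

runs = feasible_schedules()
```

Ignoring the lock, the nine steps have 84 interleavings; with the wait queue only three remain.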

Emulate execution on a uniprocessor
• Enable only one thread at a time
• Linearizes a partial-order into a total-order
• Controls the order of data-races
    Thread 1                    Thread 2
                                Schedule(); Lock(l);
                                t = bal;
                                Schedule(); Unlock(l);
    Schedule(); Lock(l);
    bal += x;
    Schedule(); Unlock(l);
                                Schedule(); Lock(l);
                                bal = t - y;
                                Schedule(); Unlock(l);
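Serialization can be sketched with real Python threads and semaphores: every worker waits for its turn, runs until its next schedule point, and hands control back. `run_serialized`, the `schedule_point` callback, and the demo values are hypothetical names for this sketch, not the CHESS API, and the schedule passed in is assumed to be complete (one entry per segment of each thread).

```python
import threading

def run_serialized(bodies, schedule):
    """Run each body on its own thread, but let only one thread execute at a time.

    Each entry of `schedule` runs that thread up to its next schedule point or to
    completion. An incomplete schedule would leave join() blocked forever.
    """
    turn = [threading.Semaphore(0) for _ in bodies]   # per-thread "you may run"
    back = threading.Semaphore(0)                     # "control is back with the scheduler"

    def worker(tid, body):
        turn[tid].acquire()              # wait to be scheduled for the first time

        def schedule_point():
            back.release()               # give control back to the scheduler
            turn[tid].acquire()          # wait until picked again

        body(schedule_point)
        back.release()                   # thread finished

    threads = [threading.Thread(target=worker, args=(tid, body))
               for tid, body in enumerate(bodies)]
    for th in threads:
        th.start()
    for tid in schedule:
        turn[tid].release()
        back.acquire()
    for th in threads:
        th.join()

def demo(schedule, x=10, y=30):
    acct = {"bal": 100}

    def deposit(schedule_point):
        acct["bal"] += x

    def withdraw(schedule_point):
        t = acct["bal"]
        schedule_point()
        acct["bal"] = t - y

    run_serialized([deposit, withdraw], schedule)
    return acct["bal"]
```

Calling `demo((1, 0, 1))` forces the interleaving shown on the slide and loses the deposit every time, while `demo((0, 1, 1))` gives the serial result; the same schedule always yields the same outcome.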

CHESS modes: speed vs coverage
• Fast-mode
  • Introduce schedule points before synchronizations, volatile accesses, and interlocked operations
  • Finds many bugs in practice
• Data-race mode
  • Repeat
    • Find data races
    • Introduce schedule points before racing memory accesses
  • Captures all sequentially consistent (SC) executions
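The "find data races" step can be illustrated with a small vector-clock detector over a recorded trace. This is a generic happens-before race check written in Python, not CHESS's detector; it models only program order and lock release-to-acquire edges (no fork/join edges), and the event encoding is an assumption.

```python
def find_races(trace):
    """Return index pairs (i, j) of accesses in `trace` that race.

    Each event is (tid, kind, name) with kind in {"acq", "rel", "rd", "wr"}.
    """
    clock = {}        # tid -> vector clock (dict: tid -> int)
    released = {}     # lock name -> releasing thread's clock at the last release
    accesses = {}     # variable -> list of (index, tid, kind, epoch)
    races = []
    for j, (tid, kind, name) in enumerate(trace):
        c = clock.setdefault(tid, {tid: 1})
        if kind == "acq":
            # Join with the clock of the last release of this lock.
            for u, n in released.get(name, {}).items():
                c[u] = max(c.get(u, 0), n)
        elif kind == "rel":
            released[name] = dict(c)
            c[tid] += 1
        else:
            for i, u, k, epoch in accesses.get(name, []):
                conflicting = u != tid and "wr" in (k, kind)
                if conflicting and epoch > c.get(u, 0):
                    races.append((i, j))     # earlier access is not ordered before this one
            accesses.setdefault(name, []).append((j, tid, kind, c[tid]))
    return races

locked = [(0, "acq", "l"), (0, "wr", "bal"), (0, "rel", "l"),
          (1, "acq", "l"), (1, "rd", "bal"), (1, "rel", "l")]
unlocked = [(0, "rd", "x"), (0, "wr", "x"), (1, "rd", "x"), (1, "wr", "x")]

# Trace positions that would receive a new schedule point in the next iteration.
racing_positions = sorted({i for pair in find_races(unlocked) for i in pair})
```

The locked trace yields no races, so fast mode already covers it; in the unlocked trace every access races, so all four positions become schedule points before the search is repeated.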

Capture all sources of nondeterminism? No.
• Scheduling nondeterminism? Yes
• Timing nondeterminism? Yes
  • Controls when and in what order the timers fire
• Nondeterministic system calls? Mostly
  • CHESS uses precise abstractions for many system calls
• Input nondeterminism? No
  • Rely on users to provide inputs
    • Program inputs, files read, packets received,…
  • Good tradeoff in the short term
    • But can’t find race-conditions on error handling code

CHESS architecture
[Diagram: Unmanaged Program / Win32 Wrappers / Windows; Managed Program / .NET Wrappers / CLR; CHESS Scheduler; CHESS Exploration Engine]

CHESS wrappers
• Translate Win32/.NET synchronizations
  • Into CHESS scheduler abstractions
    • Tasks: schedulable entities
      • Threads, threadpool work items, async. callbacks, timer functions
    • SyncVars: resources used by tasks
  • Generate happens-before edges during execution
• Executable specification for complex APIs
  • Most time consuming and error-prone part of CHESS
  • Enables CHESS to handle multiple platforms
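The wrapper idea can be sketched as follows, in Python and with entirely hypothetical names (`RecordingScheduler`, `sync_var_access`, `WrappedLock`); the real wrappers target Win32 and .NET APIs and are not shown in the slides. The wrapper exposes the same operations as the underlying primitive but first tells the scheduler which task is about to touch which sync variable.

```python
import threading

class RecordingScheduler:
    """Hypothetical stand-in for the scheduler interface a wrapper talks to."""

    def __init__(self):
        self.events = []

    def sync_var_access(self, task, sync_var, op):
        # A real scheduler could block or switch tasks here; this one only records.
        self.events.append((task, sync_var, op))

class WrappedLock:
    """Same operations as a lock, but each one is announced to the scheduler first."""

    def __init__(self, name, scheduler):
        self._lock = threading.Lock()
        self._name = name
        self._scheduler = scheduler

    def acquire(self, task):
        self._scheduler.sync_var_access(task, self._name, "acquire")
        self._lock.acquire()

    def release(self, task):
        self._scheduler.sync_var_access(task, self._name, "release")
        self._lock.release()

scheduler = RecordingScheduler()
l = WrappedLock("l", scheduler)
l.acquire("task1")
l.release("task1")
```

The recorded (task, sync variable, operation) triples are the raw material from which happens-before edges can be derived.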

Learning from Experience: User forum, Champions
http://msdn.microsoft.com/en-us/devlabs/cc950526.aspx
http://social.msdn.microsoft.com/Forums/en-US/chess/threads/

“CHESS Doesn’t Scale”
• Hmm… we just ran CHESS on the Singularity operating system (and found bugs in the bootup/shutdown sequence)
• What they usually mean:
  • “CHESS isn’t very effective on a long-running test”
  • “There are a lot of possible schedules!”
• Time for enumerative model checking
  • (Time to execute one test) × (# schedules)
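A back-of-the-envelope sketch of that product, in Python, for a simplified model: n threads with k non-blocking steps each, so the unbounded schedule count is the multinomial (nk)! / (k!)^n. The second function counts schedules with at most a given number of preemptions, where a preemption is a switch away from a thread that still has steps left. Function names and the per-test time are assumptions.

```python
from functools import lru_cache
from math import factorial

def total_schedules(n_threads, steps):
    """All interleavings of n_threads threads with `steps` atomic steps each."""
    return factorial(n_threads * steps) // factorial(steps) ** n_threads

def bounded_schedules(n_threads, steps, bound):
    """Interleavings that use at most `bound` preemptions."""
    @lru_cache(maxsize=None)
    def count(remaining, current, budget):
        if not any(remaining):
            return 1
        total = 0
        for tid, left in enumerate(remaining):
            if not left:
                continue
            # Switching is free when the current thread has finished.
            preempts = current is not None and tid != current and remaining[current] > 0
            if preempts and budget == 0:
                continue
            rest = remaining[:tid] + (left - 1,) + remaining[tid + 1:]
            total += count(rest, tid, budget - preempts)
        return total

    return count((steps,) * n_threads, None, bound)

def estimated_seconds(seconds_per_test, n_schedules):
    """(Time to execute one test) x (# schedules)."""
    return seconds_per_test * n_schedules
```

For two threads of ten steps each, `total_schedules(2, 10)` is 184756, while `bounded_schedules(2, 10, 2)` is far smaller; shortening the test and bounding preemptions attack the two factors of the product separately.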

Find lots of bugs with 2 preemptions

“CHESS Isn’t Push Button”: Concurrency Unit Tests
“The more I look at CHESS the more I realize that I could use some general guidance on how to author test code that will actually help CHESS reveal concurrency bugs.”
Daniel Stolt

Challenge -> Opportunity: New “Push button” concurrency tools
• Cuzz [ASPLOS 2010]: Concurrency Fuzzing
  • Attach to any running executable
  • Find concurrency bugs faster through smart fuzzing
• Lineup [PLDI 2010]: Automatic Linearizability Checking
  • Generate “thread-safety” tests for a class automatically
  • Use sequential behavior as oracle for concurrent behavior
  • CHESS underneath

“CHESS Doesn’t Find This Bug”
    void ForkJoinTest() {
        int x = 0;
        var t1 = new Thread(() => { x=x+1; });
        var t2 = new Thread(() => { x=x+1; });
        t1.Start();
        t2.Start();
        t1.Join();
        t2.Join();
        Debug.Assert(x==2);
    }
• RTFM is not helpful
• Instead, generate helpful warning messages
  • “Warning: running CHESS without race detection can miss bugs”
• Or, turn race detection on for a few executions.
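Why the bug is missed without race detection can be shown with a small Python model (a sketch, not CHESS itself). Without schedule points at the racing accesses, each `x = x + 1` executes as one uninterrupted step; with them, the read and the write become separate steps that other threads can interleave with. The `split_accesses` flag is a hypothetical name for that choice.

```python
def schedules(remaining, prefix=()):
    """Yield every interleaving; remaining[t] is the number of steps thread t has left."""
    if not any(remaining):
        yield prefix
        return
    for tid, left in enumerate(remaining):
        if left:
            rest = remaining[:tid] + (left - 1,) + remaining[tid + 1:]
            yield from schedules(rest, prefix + (tid,))

def make_thread(tid, split_accesses):
    def increment(s):                 # x = x + 1 as one uninterrupted step
        s["x"] = s["x"] + 1

    def read(s):                      # racing read, now its own step
        s["tmp", tid] = s["x"]

    def write(s):                     # racing write, now its own step
        s["x"] = s["tmp", tid] + 1

    return [read, write] if split_accesses else [increment]

def final_values(split_accesses):
    """Set of final values of x over all interleavings of the two threads."""
    results = set()
    lengths = tuple(len(make_thread(t, split_accesses)) for t in range(2))
    for schedule in schedules(lengths):
        state = {"x": 0}
        threads = [make_thread(t, split_accesses) for t in range(2)]
        pcs = [0, 0]
        for tid in schedule:
            threads[tid][pcs[tid]](state)
            pcs[tid] += 1
        results.add(state["x"])
    return results
```

`final_values(False)` contains only 2, so the assertion never fails; `final_values(True)` also contains 1, which is the lost update the user expected CHESS to report.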

“CHESS Can’t Avoid Finding Bugs”
“Solution is working and found two bug with CHESS. To get the second bug, I had to fix first bug first”
“That liveness bug is such a minor performance problem that I won’t fix it.”

Playing CHESS with George

“CHESS is Confusing Me”: RunTest is Not Idempotent

The Nondeterminism Saga: static data, lazily initialized
• If replay of p.E fails, yielding p.F, then try again and see if p.F replays
• Report lost coverage
[Figure: schedule prefix p with branches E and F]

Nondeterminism Junkie: Too much information
“Why does this test pass instead of say ‘Detected nondeterminism’ outside the control of CHESS?”

!?! “Is this good behavior for CHESS to return three different results for the same code?”

“CHESS Time Isn’t Real Time”: It’s a feature, not a bug.
“The call to WaitOne(60000, false) immediately returns false, which isn’t correct. If I use WaitOne() or WaitOne(Timeout.Infinite, false) instead of WaitOne(60000, false), the WaitHandle waits till the Event is set, returns true and everything goes fine. But waiting without a timeout isn't an option in my case.”
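The behavior in the quote is what one would expect if a finite timeout is treated as a scheduling choice rather than a wall-clock duration. The Python toy below rests on that assumption: a waiter and a setter are scheduled in both orders, a timed wait that is scheduled before the setter may time out at once, and an untimed wait simply leaves the waiter disabled until the event is set. `INFINITE` and `explore_wait` are hypothetical names.

```python
INFINITE = None

def explore_wait(timeout_ms):
    """Possible return values of a modeled WaitOne across both thread orders."""
    results = set()
    for first in ("waiter", "setter"):
        if first == "setter":
            # The event is already set when the waiter runs.
            results.add(True)
        elif timeout_ms is INFINITE:
            # The waiter is blocked, so only the setter is enabled; the wait then succeeds.
            results.add(True)
        else:
            # Logical time: the timer is allowed to fire before the setter ever runs,
            # regardless of how large timeout_ms is.
            results.add(False)
    return results
```

Under this model `explore_wait(60000)` includes False while `explore_wait(INFINITE)` does not, so a test that passes only when the timeout never fires depends on real time and will fail in some explored schedule.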

The expected: “I can’t play CHESS on”
• x64
• Multi-process programs
• Message passing, distributed systems
• The Boost library
• .NET without the CLR Profiler
• Java
• Unix
• …

Learning from Experience: Forums, Champions
Chris Dern, Steve Hale, Ram Natarajan, Roy Tan

“Congratulations CHESS team!!!!! I have proven outside of CHESS that the issue it is finding in our product on the 106th thread schedule looks like a valid product bug!! I wrote a quick application to launch my CHESS test outside of CHESS and by freezing/thawing threads I was able to reproduce the issue independently. This is incredibly exciting!!! Many thanks for your patience, perseverance, and CHESS bug fixes as I’ve struggled to understand CHESS.”
Steve Hale, Microsoft, 2/12/2009

More Great Quotes Like This…

BORING!

Learning By Flailing… With PFX

• PLINQ
• Parallel.For
• TaskScheduler
• Task
• ConcurrentBag
• BlockingCollection
• ConcurrentDictionary
• Barrier
• SemaphoreSlim
• ManualResetEventSlim

“As the true value of a test is in its ability to find bugs, let’s take a look at how our CHESS tests did. Over the development cycle to date, the CHESS test found seven bugs, and was used to reproduce another seven for a total of 14, out of the 276 high priority bugs over the same time. While only 14 bugs against 276 appear sadly anemic, it’s important to dig a bit deeper. If we address each of the issues raised, would we find more bugs?” Chris Dern, PFX_CHESS_Review_Final.docx

“Early on the adoption of CHESS, we made a fatal mistake. Perhaps it was wishful thinking on our part, or perhaps we believed too much in the marketing hype and didn’t read the fine print. We believed early on that CHESS was a turnkey solution capable of using existing tests and test approaches and ‘finding the bugs’.” C. Dern

“The schedule for any product group is always under attack. Over the life cycle of a product, features are in constant flux, with managers always balancing risk and reward. In the face of this pressure, any untried tool, methodology, or approach faces an uphill battle.” C. Dern

“For tool developers, it’s important that once you engage with a customer you help find then drive to some level of success. Finding a single bug is a priceless commodity when arguing to continue the time investment in a specific tool. Take small bites, set modest goals and drive to success. Perfect is the enemy of good, or at least good enough right now.” C. Dern

Dern’s DO’s and DON’Ts
Do not expect that CHESS will ‘magically’ find your bugs. CHESS is a tool, mainly focused on enumerating schedules for a given bound. While it can find specific types of concurrency bugs, e.g. deadlocks, for ‘free’, the value and benefit of CHESS comes with deliberate tests.
