Property-Based Testing: A Silver Bullet?

John Hughes, December 2009

Software testing: most famous quote
• "Program testing can be used to show the presence of bugs, but never to show their absence!"
• E. W. Dijkstra

$60 billion

$240 billion

50%

Money spent on testing ≈ Cost of remaining errors

Testing in Practice?
• Human effort?
• Test automation

Large-Scale Test Automation
• Nightly runs provide rapid feedback
• New test cases added for each error found
[Diagram: software under test (1.5 MLOC Erlang, 2 MLOC C++); test server running automated test cases (700 KLOC Erlang); report of test case failures]

Typical Large Projects
[Diagram labels: Test team, Design team]

Bug Detection Rate

Developer Testing
• Why wait until system testing to use test automation?
• Why not automate developers’ own testing?
• Unit testing: one module in isolation
• A key element of agile development methods such as XP

Claims for Unit Testing
• Immediate discovery of errors
  • Bug fixing is cheap!
• Confidence in refactoring
  • Cleaner code!
• TDD: write tests first, then just enough code to make them pass
  • KISS! No wasted effort!
• Tests serve as a specification
  • So keep test code clean and elegant!
  • Not too many… one test for each thing!

TDD with HUnit in Haskell
• Problem: implement a key-value store
    -- Type signatures
    empty  :: Store k v
    store  :: Ord k => k -> v -> Store k v -> Store k v
    find   :: Ord k => k -> Store k v -> Maybe v
    remove :: Ord k => k -> Store k v -> Store k v

Step 1: Tests for find
    -- A test case is a definition; (~:) attaches a name to a test case.
    testFindEmpty = "find empty" ~:
      find 1 empty @?= (Nothing :: Maybe Int)
    -- An assertion (@?=): equality where the left side is unknown and the right side is the "expected" value.
    testFind1 = "find with one element" ~:
      find 1 (store 1 2 empty) @?= Just 2
    -- Several assertions and IO actions can be combined in one test case.
    testFind2 = "find with two elements" ~: do
      let s = store 1 2 (store 3 4 empty)
      find 1 s @?= Just 2
      find 3 s @?= Just 4
      find 5 s @?= Nothing

HUnit Glue
    import Test.HUnit
    main = runTestTT findTests
    findTests = "find tests" ~: [testFindEmpty, testFind1, testFind2]

Step 2: Run the tests
    import Test.HUnit
    main = runTestTT findTests
    findTests = "find tests" ~: [testFindEmpty, testFind1, testFind2]
    data Store k v = Store
    find   = undefined
    store  = undefined
    remove = undefined
    empty  = undefined
• The run prints a message from each failing test, then a summary of the test results:
    *Main> main
    ### Error in:   find tests:0:find empty
    Prelude.undefined
    ### Error in:   find tests:1:find with one element
    Prelude.undefined
    ### Error in:   find tests:2:find with two elements
    Prelude.undefined
    Cases: 3  Tried: 3  Errors: 3  Failures: 0
    Counts {cases = 3, tried = 3, errors = 3, failures = 0}
    *Main>

Step 3: Write just enough code
• Ordered binary trees
• Don’t write remove yet
    data Store k v = Nil | Node k v (Store k v) (Store k v)
      deriving (Eq, Show)
    find k Nil = Nothing
    find k (Node k' v l r)
      | k == k' = Just v
      | k <  k' = find k l
      | k >  k' = find k r
    store k v Nil = Node k v Nil Nil
    store k v (Node k' v' l r)
      | k' <= k = Node k' v' (store k v l) r
      | k' >  k = Node k' v' l (store k v r)
    empty = Nil
    remove = undefined

Step 4: Repeat the tests
    *Main> main
    ### Failure in: find tests:2:find with two elements
    expected: Just 2
     but got: Nothing
    Cases: 3  Tried: 3  Errors: 0  Failures: 1
    Counts {cases = 3, tried = 3, errors = 0, failures = 1}
• The failing test:
    testFind2 = "find with two elements" ~: do
      let s = store 1 2 (store 3 4 empty)
      find 1 s @?= Just 2
      find 3 s @?= Just 4
      find 5 s @?= Nothing

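Step 5: Debug the code
• The guards in store compare the keys the wrong way round, so a smaller key is stored to the right where find never looks
    store k v Nil = Node k v Nil Nil
    store k v (Node k' v' l r)
      | k' <= k = Node k' v' (store k v l) r   -- corrected to: k <= k'
      | k' >  k = Node k' v' l (store k v r)   -- corrected to: k >  k'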
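
The corrected guards can be checked by hand against the three find tests. The following is a minimal sketch, assuming the Step 3 definitions with the fix applied; `findChecks` is a hypothetical stand-in for the HUnit cases, written as plain Booleans so that it loads without the HUnit package.

```haskell
-- Ordered binary tree from Step 3, with the Step 5 fix applied to store.
data Store k v = Nil | Node k v (Store k v) (Store k v)
  deriving (Eq, Show)

empty :: Store k v
empty = Nil

find :: Ord k => k -> Store k v -> Maybe v
find _ Nil = Nothing
find k (Node k' v l r)
  | k == k'   = Just v
  | k <  k'   = find k l
  | otherwise = find k r

store :: Ord k => k -> v -> Store k v -> Store k v
store k v Nil = Node k v Nil Nil
store k v (Node k' v' l r)
  | k <= k'   = Node k' v' (store k v l) r   -- smaller or equal keys go left
  | otherwise = Node k' v' l (store k v r)   -- larger keys go right

-- Hypothetical stand-ins for testFindEmpty, testFind1 and testFind2.
findChecks :: [Bool]
findChecks =
  [ find 1 (empty :: Store Int Int) == Nothing
  , find 1 (store 1 2 empty :: Store Int Int) == Just 2
  , find 1 s == Just 2
  , find 3 s == Just 4
  , find 5 s == Nothing
  ]
  where
    s = store 1 2 (store 3 4 empty) :: Store Int Int
```

Loading this in GHCi and evaluating `and findChecks` should give `True`; swapping the guards back to the Step 3 version makes the third check fail, matching the failure reported in Step 4.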

Step 6: Rerun the tests
• All the tests pass, so now we write more tests!
    *Main> main
    Cases: 3  Tried: 3  Errors: 0  Failures: 0
    Counts {cases = 3, tried = 3, errors = 0, failures = 0}

Next Iteration: tests for remove
    removeTests = "remove tests" ~:
      [testRemoveEmpty, testRemove1, testRemove2]
    testRemoveEmpty = "remove empty" ~:
      remove 1 empty @?= (empty :: Store Int Int)
    testRemove1 = "remove with one element" ~:
      remove 1 (store 1 2 empty) @?= empty
    testRemove2 = "remove with two elements" ~: do
      let s = store 1 2 (store 3 4 empty)
      remove 1 s @?= store 3 4 empty
      remove 3 s @?= store 1 2 empty
      remove 5 s @?= s

Run the tests
    main = runTestTT allTests
    allTests = "all tests" ~: [findTests, removeTests]
    *Main> main
    ### Error in:   all tests:1:remove tests:0:remove empty
    Prelude.undefined
    ### Error in:   all tests:1:remove tests:1:remove with one element
    Prelude.undefined
    ### Error in:   all tests:1:remove tests:2:remove with two elements
    Prelude.undefined
    Cases: 6  Tried: 6  Errors: 3  Failures: 0
    Counts {cases = 6, tried = 6, errors = 3, failures = 0}

Implementation of remove
[Diagram: the removed node (k,v) is replaced by the leftmost node (nk,nv) of its right subtree]

Code for remove
    remove k Nil = Nil
    remove k (Node k' v l r)
      | k == k' = case r of
          Nil -> l
          _   -> let (nk,nv) = leftmost r
                 in Node nk nv l (remove nk r)
      | k <  k' = Node k' v (remove k l) r
      | k >  k' = Node k' v l (remove k r)
    leftmost (Node k v Nil _) = (k,v)
    leftmost (Node _ _ l _)   = leftmost l

Last step: rerun the tests
• No failures, so we’re done!
    *Main> main
    Cases: 6  Tried: 6  Errors: 0  Failures: 0
    Counts {cases = 6, tried = 6, errors = 0, failures = 0}
• …or are we???

Test Coverage
• All tests pass, but how good are our tests?
• Source code coverage tools tell us how much code we tested
• When tests pass, check coverage!

Using Haskell Program Coverage
    C:\Users\John Hughes\Desktop> ghc -fhpc Store.hs --make
    C:\Users\John Hughes\Desktop> Store.exe
    Cases: 6  Tried: 6  Errors: 0  Failures: 0
    C:\Users\John Hughes\Desktop> hpc markup Store.exe
    Writing: Main.hs.html…

Marked-up source code
• Highlighted: conditions which were always true
• Highlighted: code which was never executed!

Just… one… more… test…
    testRemoveNonEmptyRightBranch = "remove with non-empty right branch" ~:
      remove 1 (store 3 4 (store 1 2 empty)) @?= store 3 4 empty

But…
• This last test has nothing to do with a specification
• It cannot be written "first"
• Test cases written just to get coverage are often bad test cases
• Many, many tests are needed: boring!
• Does TDD really cut the mustard?

Which Unit Tests to Write?
• "You should test things that might break" (Kent Beck)
• Not too few, not too many
• Partition the cases into classes with similar behaviour
• Write one test per partition

Example: insertion into an ordered list
• Partitions:
  • Empty list/non-empty list
  • Insert at beginning/middle/end
    • Test boundary values and middle values
  • Element already present/not present

Partition tests
    insertEmpty   = "insert empty"   ~: insert 1 []    @?= [1]
    insertStart   = "insert start"   ~: insert 1 [2,4] @?= [1,2,4]
    insertMid     = "insert mid"     ~: insert 3 [2,4] @?= [2,3,4]
    insertEnd     = "insert end"     ~: insert 5 [2,4] @?= [2,4,5]
    insertPresent = "insert present" ~: insert 1 [1]   @?= [1,1]
• insertNonEmpty covered by other cases
• insertAbsent covered by other cases
• Note: expected values play a major rôle!

Sum or Product of Partitions?
• Given several ways to partition inputs, should we
  • Write one test for each partition?
  • Write one test for each combination of partitions?
    • E.g. Non-empty/Beginning/Present, Non-empty/Beginning/Absent, …
• (Can be smart and cover all pairs of partitions, or all triples…)
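
The difference between the two choices can be made concrete by counting. The following is a rough sketch for the ordered-list example, with three hypothetical enumeration types (`Shape`, `Position`, `Presence`) standing in for the three ways of partitioning the inputs; none of these names come from the talk.

```haskell
-- Hypothetical encodings of the three partitionings of inputs to insert.
data Shape    = Empty | NonEmpty         deriving (Show, Eq, Enum, Bounded)
data Position = Beginning | Middle | End deriving (Show, Eq, Enum, Bounded)
data Presence = Present | Absent         deriving (Show, Eq, Enum, Bounded)

-- Every value of a small enumeration type.
allValues :: (Bounded a, Enum a) => [a]
allValues = [minBound .. maxBound]

-- "Sum": one test per partition class, 2 + 3 + 2 classes in total.
sumCount :: Int
sumCount =
  length (allValues :: [Shape])
    + length (allValues :: [Position])
    + length (allValues :: [Presence])

-- "Product": one test per combination of classes, 2 * 3 * 2 in total.
combinations :: [(Shape, Position, Presence)]
combinations = [ (s, p, e) | s <- allValues, p <- allValues, e <- allValues ]
```

Here `sumCount` is 7 and `length combinations` is 12. The product grows multiplicatively with each extra way of partitioning, which is why hand-writing one test per combination quickly becomes impractical.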

Property Based Testing
• Generate test cases instead of inventing them
  • Automate the boring bit!
  • Reduce size of test code
• Focus on properties true in all cases, not single tests
  • A true specification
• Minimize failing test cases to speed debugging

Generating Stores
• How to generate, how to shrink
    instance (Ord k, Arbitrary k, Arbitrary v) => Arbitrary (Store k v) where
      arbitrary = do
        (k,v,s) <- arbitrary
        elements [empty, store k v s, remove k s]
      shrink Nil = []
      shrink (Node k v l r) =
        [l,r] ++
        [Node k v l' r | l' <- shrink l] ++
        [Node k v l r' | r' <- shrink r] ++
        [Node k v' l r | v' <- shrink v]

Model-based testing
• What does a store represent?
• A set of key-value pairs!
    -- Sorted so we can compare them with ==
    model s = List.sort (contents s)
    contents Nil = []
    contents (Node k v l r) = (k,v) : contents l ++ contents r
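
The point of sorting is that two trees holding the same pairs in different shapes should have the same model. The following is a small sketch of that idea, repeating the slide's `model` and `contents` with type signatures added; `treeA` and `treeB` are hypothetical example values.

```haskell
import Data.List (sort)

data Store k v = Nil | Node k v (Store k v) (Store k v)
  deriving (Eq, Show)

-- All key-value pairs in the tree, in no particular order.
contents :: Store k v -> [(k, v)]
contents Nil            = []
contents (Node k v l r) = (k, v) : contents l ++ contents r

-- The abstract model: the pairs in sorted order, comparable with (==).
model :: (Ord k, Ord v) => Store k v -> [(k, v)]
model = sort . contents

-- Two hypothetical trees with the same pairs but different shapes.
treeA, treeB :: Store Int Int
treeA = Node 3 4 (Node 1 2 Nil Nil) Nil
treeB = Node 1 2 Nil (Node 3 4 Nil Nil)
```

With these definitions `treeA == treeB` is `False`, while `model treeA == model treeB` is `True`; both models are `[(1,2),(3,4)]`.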

Properties: Agreement with the model
    prop_find k s =
      find k s == lookup k (model s)
      where types = s :: Store Int Int
    prop_store k v s =
      model (store k v s) == List.insert (k,v) (model s)
      where types = s :: Store Int Int
    prop_remove k s =
      case find k s of
        Just v  -> model (remove k s) == model s List.\\ [(k,v)]
        Nothing -> remove k s == s
      where types = s :: Store Int Int

Testing the Properties
    *Main> quickCheckWith stdArgs{maxSuccess=10000} prop_find
    *** Failed! Falsifiable (after 95 tests and 1 shrink):
    1
    Node 1 1 (Node 1 (-1) Nil Nil) Nil
• We forgot to consider duplicate keys!
    prop_find k s =
      case [v | (k',v) <- model s, k==k'] of
        [] -> find k s == Nothing
        vs -> find k s `elem` map Just vs
      where types = s :: Store Int Int
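
The duplicate-key problem can also be reproduced without random generation. The following sketch checks both versions of the property over every store built from at most two `store` calls with keys and values drawn from {0, 1}. This exhaustive enumeration is only a stand-in for QuickCheck, and `storesUpTo`, `propFindNaive`, `propFindRevised`, `naiveCounterexamples` and `revisedHolds` are hypothetical names, not part of the talk.

```haskell
import Data.List (sort)

data Store k v = Nil | Node k v (Store k v) (Store k v)
  deriving (Eq, Show)

find :: Ord k => k -> Store k v -> Maybe v
find _ Nil = Nothing
find k (Node k' v l r)
  | k == k'   = Just v
  | k <  k'   = find k l
  | otherwise = find k r

-- store with the corrected guards; duplicate keys are kept, not overwritten.
store :: Ord k => k -> v -> Store k v -> Store k v
store k v Nil = Node k v Nil Nil
store k v (Node k' v' l r)
  | k <= k'   = Node k' v' (store k v l) r
  | otherwise = Node k' v' l (store k v r)

model :: (Ord k, Ord v) => Store k v -> [(k, v)]
model = sort . contents
  where
    contents Nil            = []
    contents (Node k v l r) = (k, v) : contents l ++ contents r

-- Every store reachable with at most n calls to store (assumed small scope).
storesUpTo :: Int -> [Store Int Int]
storesUpTo n
  | n <= 0    = [Nil]
  | otherwise = Nil : [ store k v s | s <- storesUpTo (n - 1), k <- [0, 1], v <- [0, 1] ]

-- First version of the property: find must agree with lookup on the model.
propFindNaive :: Int -> Store Int Int -> Bool
propFindNaive k s = find k s == lookup k (model s)

-- Revised version: find may return any value stored under the key.
propFindRevised :: Int -> Store Int Int -> Bool
propFindRevised k s =
  case [ v | (k', v) <- model s, k == k' ] of
    [] -> find k s == Nothing
    vs -> find k s `elem` map Just vs

naiveCounterexamples :: [(Int, Store Int Int)]
naiveCounterexamples =
  [ (k, s) | s <- storesUpTo 2, k <- [0, 1], not (propFindNaive k s) ]

revisedHolds :: Bool
revisedHolds = and [ propFindRevised k s | s <- storesUpTo 2, k <- [0, 1] ]
```

Within this small scope the first version has counterexamples, for instance key 1 with `Node 1 1 (Node 1 0 Nil Nil) Nil`, where `find` returns the root's value but `lookup` returns the smallest one; the revised version holds for every enumerated store.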

Testing remove
    *Main> quickCheckWith stdArgs{maxSuccess=10000} prop_remove
    +++ OK, passed 10000 tests.
    *Main> quickCheckWith stdArgs{maxSuccess=10000} prop_remove
    *** Failed! Falsifiable (after 2 tests):
    0
    Node 0 1 Nil (Node 1 1 (Node 1 0 Nil Nil) Nil)
    *Main> let s = Node 0 1 Nil (Node 1 1 (Node 1 0 Nil Nil) Nil)
    *Main> find 0 s
    Just 1
    *Main> remove 0 s
    Node 1 0 Nil (Node 1 0 Nil Nil)
• We’re not removing the duplicate key…???

The Bug
    remove k Nil = Nil
    remove k (Node k' v l r)
      | k == k' = case r of
          Nil -> l
          _   -> let (nk,nv) = leftmost r
                 in Node nk nv l (remove nk r)   -- removes nk with the wrong value
      | k <  k' = Node k' v (remove k l) r
      | k >  k' = Node k' v l (remove k r)
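
The talk does not show how the bug was repaired. One possible repair, offered here only as a sketch, is to detach the leftmost node of the right subtree in a single pass instead of looking it up and then removing it by key; `removeLeftmost` is a hypothetical helper introduced for this illustration.

```haskell
data Store k v = Nil | Node k v (Store k v) (Store k v)
  deriving (Eq, Show)

-- Detach the leftmost node: returns its pair and the tree without that node.
removeLeftmost :: Store k v -> Maybe ((k, v), Store k v)
removeLeftmost Nil = Nothing
removeLeftmost (Node k v l r) =
  case removeLeftmost l of
    Nothing       -> Just ((k, v), r)            -- no left child: this node is leftmost
    Just (kv, l') -> Just (kv, Node k v l' r)

-- remove that promotes exactly the node it detaches (assumed fix, not the author's).
remove :: Ord k => k -> Store k v -> Store k v
remove _ Nil = Nil
remove k (Node k' v l r)
  | k < k'    = Node k' v (remove k l) r
  | k > k'    = Node k' v l (remove k r)
  | otherwise =
      case removeLeftmost r of
        Nothing              -> l
        Just ((nk, nv), r')  -> Node nk nv l r'

-- The counterexample reported by QuickCheck on the previous slide.
counterexample :: Store Int Int
counterexample = Node 0 1 Nil (Node 1 1 (Node 1 0 Nil Nil) Nil)
```

With this version `remove 0 counterexample` evaluates to `Node 1 0 Nil (Node 1 1 Nil Nil)`, so both pairs with key 1 survive. An alternative design would be to make `store` overwrite an existing key, so that duplicates never arise in the first place.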

How are we doing for coverage?

HUnit vs QuickCheck
• QuickCheck
  • Finds more bugs
  • With better coverage
  • In less time
  • With less test code
  • And a clearer specification

3G Radio Base Station
[Diagram labels: Setup, OK, Setup, OK, Reject]

Media Proxy
• Multimedia IP-telephony (IMS)
• Connects calls across a firewall
• Test adding and removing callers from a call
[Diagram labels: Add, Add, Sub, Add, Sub, Add, Sub, Call Full]

Property Based Testing is Great!
• Improves quality!
  • Finds more bugs, achieves better coverage
• Reduces cost!
  • Less test code, shrinking speeds diagnosis
• And it’s actually fun!
  • "Please can I write some tests today?"

How do we know?
• Case studies in industry
  + Real software development
  + Professional software developers
  - Unrepeatable
  - Difficult to control
• Experiments in universities
  + Focus on a single question
  + Carefully controlled
  - Student volunteers
  - Unrealistically small

Test Driven Development
• Case studies
  • YES, quality is improved
  • NO, cost is not reduced (costs rise about 20%)
• Experiments
  • YES, code is developed faster
  • NO, quality is not improved (it drops)

Property Based Testing
• Case studies
  + Property-based testing does increase quality
  + Property-based testing does reduce cost
• PBT during system testing actually reduces the quality of conventional unit testing done earlier

Our Experiment
• Hypothesis:
  • Property-based testing is more effective than conventional unit testing
• Effective?
  • Quality (better quality in the same time)
    • Number of bugs, number of tests failed, subjective judgement
  • Test quality
    • Code coverage, subjective judgement
  • Design quality
    • Size of code, size of test code, subjective judgement
