Presentation Transcript


  1. Improving Software Dependability via Cooperative Testing and Analysis. Tao Xie, Department of Computer Science, North Carolina State University

  2. Software Dependability Matters • Loss of Money: Software faults cost the U.S. economy about $59.5 billion each year (0.6% of GDP) [NIST 02] • Loss of Life: Faulty medical devices caused 30,000 deaths and 600,000 injuries (1985-2005), with likely 8% due to software faults [FDA 06] • …

  3. Improving Software Dependability: Titles of Major Conference Pubs (2005-Present)

  4. Improving Software Dependability: Major Conference Pubs (2005-Present) [Slide figure: publication venues grouped into Testing & Analysis, Analytics, Reliability, Security/Privacy, and Performance; in total 10 ICSE, 7 FSE, 3 ISSTA, 9 ASE, and 3 OOPSLA/ECOOP papers, plus SIGMETRICS 08 and WWW papers]

  5. Improving Software Dependability: Major Conference Pubs (2005-Present) • Artifacts Under Analysis: • DB apps • GUI apps • Web/SOA apps • Mobile apps • Cloud apps • Analytics systems • AC/Firewall policies • API docs • Bug reports • Requirements docs • Execution traces • … [Slide figure: the same publication map as the previous slide, grouped into Testing & Analysis, Analytics, Reliability, Security/Privacy, and Performance]

  6. Impact/Leadership: Software Testing • We produce; others use • We lead; others follow [Slide figure: keynotes (2010), tutorials, programming contests, and calls for proposals]

  7. Redundant Test Detection for Parasoft Jtest: Rostra identified 90% of the tests generated by Parasoft Jtest 4.5 as redundant. Parasoft fixed the issue in later versions after seeing our results. [ASE 2004]

  8. Fitnex: Path-Exploration Strategy in Pex (Pex released since 2008) • Download counts in the initial 20 months of release: Academic: 17,366 • Industrial: 13,022 • Total: 30,388 • Pex detected various bugs (including a serious bug) in a core .NET component (already extensively tested over 5 years by 40 testers), used by thousands of developers and millions of end users. “It has saved me two major bugs (not caught by normal unit tests) that would have taken at least a week to track down and fix normally plus a few smaller issues so I'm a big proponent of Pex.”

  9. Coding Duel Games for Pex for Fun (released since 2010) www.pexforfun.com • 1,129,019 clicked 'Ask Pex!' • “I used to love the first person shooters and the satisfaction of blowing away a whole team of Noobies playing Rainbow Six, but this is far more fun.” • “I’m afraid I’ll have to constrain myself to spend just an hour or so a day on this really exciting stuff, as I’m really stuffed with work.” • “It really got me *excited*. The part that got me most is about spreading interest in teaching CS: I do think that it’s REALLY great for teaching | learning!”

  10. Access Control Policy Tool (ACPT), released since 2009: beta release being beta-tested with >130 users/organizations. “There are many valuable features in the ACPT and we hope to recommend it to our vendors to verify and validate the policies they author.” Beta-users: NSA, MITRE, DISA, NOAA, SAIC, DNI, Pacific Northwest National Lab, Fermi Lab, BAE Systems, Lockheed Martin, Raytheon, Boeing, SMI, VA government, Johns Hopkins U., … “ACPT provides all the adequate functionality for the verification of access control policies against static constraints.”

  11. Impact/Leadership: Software Testing • We produce; others use • We lead; others follow [Slide figure: scalable unit test generation and testing + human factors, alongside keynotes (2010), tutorials, programming contests, and calls for proposals]

  12. Impact/Leadership: Software Analytics • We produce; others use • We lead; others follow [Slide figure: XIAO and StackMine on a 2007-2014 timeline, with 7 years of ICSE tutorials]

  13. StackMine: Performance Debugging in the Large [Slide figure: StackMine]

  14. XIAO: Code Clone Detection for Security + Refactoring • XIAO available in Visual Studio 2012 • XIAO Clone Search integrated into the workflow of the Microsoft Security Response Center (MSRC): “run [XIAO] for every MSRC case to find any instance of the vulnerable code in any shipping product. This system is the one that found several of the copies of CVE-2011-3402 that we are now addressing with MS12-034.” (MS Security Research & Defense blog, 2012) • Searching for similar snippets so a bug is fixed once • Finding refactoring opportunities

  15. Impact/Leadership: Software Analytics • We produce; others use • We lead; others follow • Software Analytics: insightful and actionable information for software practitioners [Slide figure: XIAO and StackMine on a 2007-2014 timeline, with 7 years of ICSE tutorials]

  16. Cooperative Testing and Analysis [Slide figure: the same publication map as slide 4, grouped into Testing & Analysis, Analytics, Reliability, Security/Privacy, and Performance]

  17. Global Trend: Replace Human or Get Human Out of the Loop • IBM Watson as Jeopardy! player • Google’s driverless car • Microsoft's instant voice translation tool

  18. Unit Test Generation: Replace Human or Get Human Out of the Loop
  Class Under Test:
  00: class Graph { …
  03:   public void AddVertex(Vertex v) {
  04:     vertices.Add(v);
  05:   }
  06:   public Edge AddEdge(Vertex v1, Vertex v2) { …
  15:   }
  16: }
  Manual test generation: tedious, missing special/corner cases, …
  Generated unit tests:
  void test1() {
    Graph ag = new Graph();
    Vertex v1 = new Vertex(0);
  }
  void test2() {
    Graph ag = new Graph();
    Vertex v1 = new Vertex(0);
    ag.AddEdge(v1, v1);
  }
  …

  19. State-of-the-Art/Practice Test Generation Tools
  Running Symbolic PathFinder ...
  ======================================================
  results
  no errors detected
  ======================================================
  statistics
  elapsed time: 0:00:02
  states: new=4, visited=0, backtracked=4, end=2
  search: maxDepth=3, constraints=0
  choice generators: thread=1, data=2
  heap: gc=3, new=271, free=22
  instructions: 2875
  max memory: 81MB
  loaded code: classes=71, methods=884
  …

  20. Challenges Faced by Test Generation Tools • Example: Dynamic Symbolic Execution (DSE) / concolic testing [Godefroid et al. 05] [Sen et al. 05] [Tillmann et al. 08] • Instruments code to explore feasible paths • Challenge: path explosion, e.g., when desirable receiver or argument objects are not generated • Total block coverage achieved is 50%, lowest coverage 16% • object-creation problems (OCP): 65% • external-method call problems (EMCP): 27%
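
  To make the path-explosion challenge concrete, here is a small illustrative C# method (mine, not the talk's): each input-dependent branch roughly doubles the number of feasible paths a DSE tool such as Pex must consider, so loops and longer method sequences blow up quickly.

    // Illustrative only (not from the talk): three independent input-dependent
    // branches give up to 2^3 = 8 feasible paths. A DSE/concolic tool records
    // the path condition of one concrete run, then negates branch conditions
    // one at a time to drive execution down the remaining paths.
    public static class PathExplosionDemo
    {
        public static int Classify(int a, int b, int c)
        {
            int score = 0;
            if (a > 0) score += 1;        // branch 1
            if (b % 2 == 0) score += 2;   // branch 2
            if (c > a + b) score += 4;    // branch 3
            return score;
        }
    }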

  21. Example Object-Creation Problem [OOPSLA 11]
  00: class Graph { …
  03:   public void AddVertex(Vertex v) {
  04:     vertices.Add(v); // B1 }
  06:   public Edge AddEdge(Vertex v1, Vertex v2) {
  07:     if (!vertices.Contains(v1))
  08:       throw new VNotFoundException("");
  09:     // B2
  10:     if (!vertices.Contains(v2))
  11:       throw new VNotFoundException("");
  12:     // B3
  14:     Edge e = new Edge(v1, v2);
  15:     edges.Add(e); } }
  // DFS: DepthFirstSearch
  18: class DFSAlgorithm { …
  23:   public void Compute(Vertex s) { ...
  24:     if (graph.GetEdges().Size() > 0) { // B4
  25:       isComputed = true;
  26:       foreach (Edge e in graph.GetEdges()) {
  27:         ... // B5
  28:       }
  29:     } } }
  • A graph example from the QuickGraph library
  • Includes two classes: Graph and DFSAlgorithm
  • Graph: AddVertex, AddEdge (requires both vertices to be in the graph)

  22. Example Object-Creation Problem [OOPSLA 11]
  • Test target: cover the true branch (B4) of Line 24
  • Desired object state: graph should include at least one edge
  • Target sequence:
    Graph ag = new Graph();
    Vertex v1 = new Vertex(0);
    Vertex v2 = new Vertex(1);
    ag.AddVertex(v1);
    ag.AddVertex(v2);
    ag.AddEdge(v1, v2);
    DFSAlgorithm algo = new DFSAlgorithm(ag);
    algo.Compute(v1);
  (Graph / DFSAlgorithm code as on the previous slide.)

  23. Challenges Faced by Test Generation Tools • Example: Dynamic Symbolic Execution (DSE) / concolic testing [Godefroid et al. 05] [Sen et al. 05] [Tillmann et al. 08] • Instruments code to explore feasible paths • Challenge: path explosion • Typically DSE instruments or explores only methods of the project under test; third-party/external API methods (network, I/O, …) involve too many paths or are uninstrumentable • Total block coverage achieved is 50%, lowest coverage 16% • object-creation problems (OCP): 65% • external-method call problems (EMCP): 27%

  24. Example External-Method Call Problems (EMCP)
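
  The slide's actual example figure is not in the transcript; the hypothetical C# fragment below sketches the kind of situation an EMCP names: a branch whose outcome is data-dependent on the return value of an uninstrumented external API, which the test-generation tool can neither explore nor control.

    using System.IO;

    // Hypothetical example (not the slide's figure): the branch below depends on
    // the return value of File.ReadAllText, an external I/O method that DSE
    // typically does not instrument. The tool cannot synthesize a file whose
    // contents satisfy the condition, so the true case tends to stay uncovered.
    public static class EmcpDemo
    {
        public static bool IsAdminConfig(string path)
        {
            string content = File.ReadAllText(path);   // external-method call
            return content.Contains("admin=true");     // depends on external result
        }
    }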

  25. Challenges Faced by Test Generation Tools • Example: Dynamic Symbolic Execution (DSE) / concolic testing [Godefroid et al. 05] [Sen et al. 05] [Tillmann et al. 08] • Instruments code to explore feasible paths • Challenge: path explosion • Total block coverage achieved is 50%, lowest coverage 16% • object-creation problems (OCP): 65% • external-method call problems (EMCP): 27%

  26. What to Do Next? 2010 Dagstuhl Seminar 10111 Practical Software Testing: Tool Automation and Human Factors

  27. Conventional Wisdom: Improve Automation Capability @NCSU ASE • Tackling object-creation problems: Seeker [OOPSLA 11], MSeqGen [ESEC/FSE 09], Covana [ICSE 2011], OCAT [ISSTA 10], Evacon [ASE 08], Symclat [ASE 06] • Still not good enough (at least for now)! Seeker (52%) > Pex/DSE (41%) > Randoop/random (26%) • Tackling external-method call problems: DBApp Testing [ESEC/FSE 11], [ASE 11]; CloudApp Testing [IEEE Soft 12] • These deal with only common environment APIs

  28. Unconventional Wisdom: Bring the Human into the Loop • Ironies of Automation: “The increased interest in human factors among engineers reflects the irony that the more advanced a control system is, so the more crucial may be the contribution of the human operator.” Lisanne Bainbridge, "Ironies of Automation", Automatica 1983 • Malaysia Airlines Flight 124 @ 2005

  29. Example Object-Creation Problem (OCP)
  • Test target: cover the true branch (B4) of Line 24
  • Desired object state: graph should include at least one edge
  • Target sequence:
    Graph ag = new Graph();
    Vertex v1 = new Vertex(0);
    Vertex v2 = new Vertex(1);
    ag.AddVertex(v1);
    ag.AddVertex(v2);
    ag.AddEdge(v1, v2);
    DFSAlgorithm algo = new DFSAlgorithm(ag);
    algo.Compute(v1);
  (Graph / DFSAlgorithm code as on slide 21.)

  30. Human Can Help! Object-Creation Problems (OCP): tackle object-creation problems with factory methods (see the sketch below).
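
  As a concrete illustration (a minimal sketch, not code from the talk; the [PexFactoryMethod] attribute follows Pex's documented factory-method convention, so treat the exact usage as an assumption): a human writes the tricky construction sequence once, and the tool then uses the factory whenever it needs a Graph that already contains an edge.

    // Sketch of a hand-written factory method for the Graph example above
    // (Graph/Vertex are the types from the QuickGraph example on slide 21).
    using Microsoft.Pex.Framework;

    public static class GraphFactory
    {
        [PexFactoryMethod(typeof(Graph))]
        public static Graph CreateGraphWithOneEdge()
        {
            Graph g = new Graph();
            Vertex v1 = new Vertex(0);
            Vertex v2 = new Vertex(1);
            g.AddVertex(v1);      // both vertices must be in the graph
            g.AddVertex(v2);      // before AddEdge succeeds
            g.AddEdge(v1, v2);    // now GetEdges().Size() > 0, so B4 is reachable
            return g;
        }
    }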

  31. Human Can Help! External-Method Call Problems (EMCP): tackle external-method call problems with mock methods or method instrumentation, e.g., mocking System.IO.File.ReadAllText (see the sketch below).
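
  A minimal sketch of the mock-method idea (my illustration, not the talk's code): route the external call through a test-controllable delegate so that tests, or a test generator, can supply the "file contents" directly. Pex's Moles framework achieves the same effect by detouring the real File.ReadAllText; the hand-rolled seam below just shows the principle.

    using System;
    using System.IO;

    public static class FileReader
    {
        // Test seam: production code reads the real file; a test can swap in
        // a delegate that returns whatever contents it needs.
        public static Func<string, string> ReadAllText = File.ReadAllText;
    }

    public class ConfigChecker
    {
        public bool IsAdminConfig(string path)
        {
            string content = FileReader.ReadAllText(path);  // mockable call
            return content.Contains("admin=true");
        }
    }

    // In a test: FileReader.ReadAllText = p => "admin=true";
    // both branches of IsAdminConfig then become reachable without touching disk.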

  32. Automation in Software Testing 2010 Dagstuhl Seminar 10111 Practical Software Testing: Tool Automation and Human Factors

  33. Automation in Software Testing + Human Factors. Dagstuhl Seminar 10111: Practical Software Testing: Tool Automation and Human Factors

  34. Cooperative Software Testing and Analysis • Human-Assisted Computing: driver: tool, helper: human; e.g., Covana [ICSE 2011] • Human-Centric Computing: driver: human, helper: tool; e.g., Pex for Fun [ICSE 2013 SEE] • Interfaces are important. Contents are important too!

  35. Example Problems Faced by Tools [Slide figure: symptoms and (likely) causes; object-creation problems (OCP) and external-method call problems (EMCP), with all non-primitive program inputs/fields and all executed external-method calls as candidates]

  36. Technical Challenges • Causal analysis: tracing between symptoms and (likely) causes • Reduce cost of human consumption • reduction of #(likely) causes • diagnosis of each cause • Solution construction: fixing suspected causes • Reduce cost of human contribution • measurement of solution goodness • Inner iteration of human-tool cooperation!

  37. Black-Box Systematic Debugging Not Feasible
  Given symptom s:
    foreach (c in LikelyCauses) {
      Fix(c);
      if (IsObserved(s)) RelevantCauses.add(c);
    }
  [Slide figure: the symptoms/causes map from slide 35 — object-creation problems (OCP), external-method call problems (EMCP), (likely) causes]
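
  Spelled out as compilable code (an illustrative sketch only: the fix/observe helpers are hypothetical delegates, and it assumes a candidate counts as relevant when fixing it makes the symptom, i.e., the uncovered statement, disappear), the black-box loop needs one manual fix plus one full test-generation run per candidate cause, which is why it does not scale to the hundreds of OCP/EMCP candidates reported above.

    using System;
    using System.Collections.Generic;

    // Illustrative sketch: applyFix stands in for "manually write a factory or
    // mock for c", symptomStillObserved for "re-run test generation and check
    // whether the statement is still uncovered".
    // Cost = #candidates x (manual fix effort + one full tool run).
    public static class BlackBoxDebugging
    {
        public static List<TCause> FindRelevantCauses<TCause>(
            IEnumerable<TCause> likelyCauses,
            Action<TCause> applyFix,
            Action<TCause> undoFix,
            Func<bool> symptomStillObserved)
        {
            var relevant = new List<TCause>();
            foreach (var c in likelyCauses)
            {
                applyFix(c);                   // expensive manual step
                if (!symptomStillObserved())   // expensive full tool run
                    relevant.Add(c);           // fixing c removed the symptom
                undoFix(c);                    // restore state for the next trial
            }
            return relevant;
        }
    }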

  38. White-Box Causal Analysis: Covana [ICSE 11] • Goal: precisely identify the problems (causes) that prevent a tool from covering a statement (symptom) • Insight: a partially-covered conditional has a data dependency on a real problem. (Example code from xUnit.)

  39. EMCP with Data Dependency on Program Inputs [Inputs → EMCP] • Consider only EMCPs whose arguments have data dependencies on program inputs • Fixing such problem candidates facilitates test-generation tools. (Slide shows data dependencies in example code from xUnit.)

  40. Symptom with Data Dependency on EMCP [EMCP → Symptom] • Symptom expression: return(File.Exists) == true • Element of EMCP candidate: return(File.Exists) • Partially-covered conditionals have data dependencies on EMCP candidates; the conditional in Line 1 has a data dependency on File.Exists.
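
  The xUnit code referenced on this slide is not in the transcript; the hypothetical fragment below has the same shape, showing how the symptom expression return(File.Exists) == true arises from a partially-covered conditional that data-depends on the File.Exists call.

    using System.IO;

    // Hypothetical fragment in the shape of the slide's example: the conditional
    // in "Line 1" is data-dependent on return(File.Exists), an EMCP candidate,
    // so only the false branch is covered; that uncovered true branch is the
    // symptom Covana traces back to the external-method call.
    public static class EmcpSymptomDemo
    {
        public static string LoadReport(string path)
        {
            if (File.Exists(path))               // "Line 1": depends on return(File.Exists)
                return File.ReadAllText(path);   // not covered (symptom)
            return string.Empty;                 // covered
        }
    }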

  41. Example of an EMCP Being Filtered [EMCP !→ Symptom] (example code from xUnit)

  42. Tool Architecture of Covana [Slide figure: architecture diagram; inputs are the program and generated test inputs; forward symbolic execution produces runtime events, coverage, and runtime information; problem candidate identification yields problem candidates [Inputs → EMCP], and data dependence analysis yields identified problems [EMCP → Symptom]]

  43. Evaluation – Subjects and Setup • Subjects: • xUnit: unit testing framework for .NET, 223 classes and interfaces with 11.4 KLOC • QuickGraph: C# graph library, 165 classes and interfaces with 8.3 KLOC • Evaluation setup: • Apply Pex to generate tests for the program under test • Feed the program and generated tests to Covana • Compare the baseline solution and Covana

  44. Evaluation – Research Questions • RQ1: How effective is Covana in identifying the two main types of problems, EMCPs and OCPs? • RQ2: How effective is Covana in pruning irrelevant problem candidates of EMCPs and OCPs?

  45. Evaluation – RQ1: Problem Identification • Covana identifies: • 43 EMCPs with only 1 false positive and 2 false negatives • 155 OCPs with 20 false positives and 30 false negatives

  46. Evaluation – RQ2: Irrelevant-Problem-Candidate Pruning • Covana prunes: • 97% (1,567 of 1,610) EMCP candidates with 1 false positive and 2 false negatives • 66% (296 of 451) OCP candidates with 20 false positives and 30 false negatives

  47. Cooperative Software Testing and Analysis • Human-Assisted Computing: driver: tool, helper: human; e.g., Covana [ICSE 2011] • Human-Centric Computing: driver: human, helper: tool; e.g., Pex for Fun [ICSE 2013 SEE] • Interfaces are important. Contents are important too!

  48. Coding Duel Games for Pex for Fun (released since 2010) www.pexforfun.com • 1,129,019 clicked 'Ask Pex!' • “I used to love the first person shooters and the satisfaction of blowing away a whole team of Noobies playing Rainbow Six, but this is far more fun.” • “I’m afraid I’ll have to constrain myself to spend just an hour or so a day on this really exciting stuff, as I’m really stuffed with work.” • “It really got me *excited*. The part that got me most is about spreading interest in teaching CS: I do think that it’s REALLY great for teaching | learning!”

  49. Behind the Scenes of Pex for Fun: behavior of Secret Impl == Player Impl?
  Player Implementation:
  class Player {
    public static int Puzzle(int x) {
      return x;
    }
  }
  Secret Implementation:
  class Secret {
    public static int Puzzle(int x) {
      if (x <= 0) return 1;
      return x * Puzzle(x - 1);
    }
  }
  class Test {
    public static void Driver(int x) {
      if (Secret.Puzzle(x) != Player.Puzzle(x))
        throw new Exception("Mismatch");
    }
  }

  50. Coding Duel Competition @ ICSE 2011: http://pexforfun.com/icse2011
