
Cooperative Testing and Analysis:

Human-Tool, Tool-Tool, and Human-Human Cooperations to Get the Job Done. Tao Xie, North Carolina State University, Raleigh, NC, USA.




Presentation Transcript


  1. Cooperative Testing and Analysis: Human-Tool, Tool-Tool, and Human-Human Cooperations to Get the Job Done. Tao Xie, North Carolina State University, Raleigh, NC, USA

  2. Turing Test: Tell Machine and Human Apart

  3. Human vs. Machine: Machine Better Than Human? IBM's Deep Blue defeated chess champion Garry Kasparov in 1997; IBM Watson defeated top human Jeopardy! players in 2011.

  4. CAPTCHA: Human is Better "Completely Automated Public Turing test to tell Computers and Humans Apart"

  5. Human-Computer Interaction: iPad; movie "Minority Report"; CNN News

  6. Human-Centric Software Engineering

  7. Automation in Software Testing. 2010 Dagstuhl Seminar 10111, Practical Software Testing: Tool Automation and Human Factors. http://www.dagstuhl.de/programm/kalender/semhp/?semnr=1011

  8. Automation in Software Testing: Human Factors. 2010 Dagstuhl Seminar 10111, Practical Software Testing: Tool Automation and Human Factors. http://www.dagstuhl.de/programm/kalender/semhp/?semnr=1011

  9. Automated Test Generation • Recent advanced technique: Dynamic Symbolic Execution/Concolic Testing • Instrument code to explore feasible paths • Example tool: Pex from Microsoft Research (for .NET programs). Patrice Godefroid, Nils Klarlund, and Koushik Sen. DART: Directed Automated Random Testing. In Proc. PLDI 2005. Koushik Sen, Darko Marinov, and Gul Agha. CUTE: A Concolic Unit Testing Engine for C. In Proc. ESEC/FSE 2005. Nikolai Tillmann and Jonathan de Halleux. Pex - White Box Test Generation for .NET. In Proc. TAP 2008.

  10. Dynamic Symbolic Execution. Loop: choose next path, solve the constraints, execute and monitor. Code to generate inputs for:

void CoverMe(int[] a) {
    if (a == null) return;
    if (a.Length > 0)
        if (a[0] == 1234567890)
            throw new Exception("bug");
}

Exploration (each iteration negates one condition of the last observed path):

Data          | Observed constraints                        | Constraints to solve (negated condition)
null          | a==null                                     | a!=null
{}            | a!=null && !(a.Length>0)                    | a!=null && a.Length>0
{0}           | a!=null && a.Length>0 && a[0]!=1234567890   | a!=null && a.Length>0 && a[0]==1234567890
{1234567890}  | a!=null && a.Length>0 && a[0]==1234567890   | Done: there is no path left.
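The CoverMe example can be made concrete with a Python transliteration of the slide's C# code (an illustrative sketch only; Pex itself targets .NET), together with the four inputs the DSE loop derives, one per feasible path:

```python
# Python transliteration of the slide's C# CoverMe (illustrative sketch only;
# Pex operates on .NET code). Each return value names the path taken.

def cover_me(a):
    if a is None:
        return "a == null"
    if len(a) > 0:
        if a[0] == 1234567890:
            raise Exception("bug")      # the deep path DSE eventually reaches
        return "a[0] != 1234567890"
    return "a.Length == 0"

# The DSE loop derives these inputs by negating one branch condition at a
# time: null, {}, {0}, {1234567890}.
dse_inputs = [None, [], [0], [1234567890]]
```

Running cover_me on the first three inputs covers the three non-failing paths; the last input triggers the planted exception, which is exactly the path a purely random tester is unlikely to hit.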

  11. Automating Test Generation @ NCSU ASE • Method sequences: MSeqGen/Seeker [Thummalapenta et al. OOPSLA 11, ESEC/FSE 09], Covana [Xiao et al. ICSE 11], OCAT [Jaygarl et al. ISSTA 10], Evacon [Inkumsah et al. ASE 08], Symclat [d'Amorim et al. ASE 06] • Environments (e.g., db, file systems, network, …): DBApp Testing [Taneja et al. ESEC/FSE 11], [Pan et al. ASE 11], CloudApp Testing [Zhang et al. IEEE Soft 12] • Loops: Fitnex [Xie et al. DSN 09] • Code evolution: eXpress [Taneja et al. ISSTA 11]

  12. Pex on MSDN DevLabs: Incubation Project for Visual Studio • Download counts (20 months, Feb. 2008 - Oct. 2009): Academic: 17,366; DevLabs: 13,022; Total: 30,388. http://research.microsoft.com/projects/pex/

  13. Open-Source Pex Extensions: http://pexase.codeplex.com/ Publications: http://research.microsoft.com/en-us/projects/pex/community.aspx#publications

  14. State-of-the-Art/Practice Testing Tools. Sample tool output:

Running Symbolic PathFinder ...
====================================================== results
no errors detected
====================================================== statistics
elapsed time: 0:00:02
states: new=4, visited=0, backtracked=4, end=2
search: maxDepth=3, constraints=0
choice generators: thread=1, data=2
heap: gc=3, new=271, free=22
instructions: 2875
max memory: 81MB
loaded code: classes=71, methods=884

  15. Challenges Faced by Test Generation Tools • Example: Dynamic Symbolic Execution/Concolic Testing • Instrument code to explore feasible paths • Challenge: path explosion. Total block coverage achieved is 50%, lowest coverage 16%. • Object-creation problems (OCP): 65% • External-method-call problems (EMCP): 27%

  16. Example Object-Creation Problem [Thummalapenta et al. OOPSLA 11]. A graph example from the QuickGraph library; includes two classes, Graph and DFSAlgorithm. Graph.AddEdge requires both vertices to be in the graph.

00: class Graph : IVEListGraph { ...
03:   public void AddVertex (IVertex v) {
04:     vertices.Add(v); // B1
      }
06:   public Edge AddEdge (IVertex v1, IVertex v2) {
07:     if (!vertices.Contains(v1))
08:       throw new VNotFoundException("");
09:     // B2
10:     if (!vertices.Contains(v2))
11:       throw new VNotFoundException("");
12:     // B3
14:     Edge e = new Edge(v1, v2);
15:     edges.Add(e);
      } }
    // DFS: DepthFirstSearch
18: class DFSAlgorithm { ...
23:   public void Compute (IVertex s) { ...
24:     if (graph.GetEdges().Size() > 0) { // B4
25:       isComputed = true;
26:       foreach (Edge e in graph.GetEdges()) {
27:         ... // B5
28:       }
29:     } } }

  17. Example Object-Creation Problem [Thummalapenta et al. OOPSLA 11] (code as on the previous slide) • Test target: cover the true branch (B4) of Line 24 • Desired object state: graph should include at least one edge • Target sequence:

Graph ag = new Graph();
Vertex v1 = new Vertex(0);
Vertex v2 = new Vertex(1);
ag.AddVertex(v1);
ag.AddVertex(v2);
ag.AddEdge(v1, v2);
DFSAlgorithm algo = new DFSAlgorithm(ag);
algo.Compute(v1);

  18. Challenges Faced by Test Generation Tools • Example: Dynamic Symbolic Execution/Concolic Testing (Pex) • Instrument code to explore feasible paths • Challenge: path explosion. Total block coverage achieved is 50%, lowest coverage 16%. • Object-creation problems (OCP): 65% • External-method-call problems (EMCP): 27%

  19. Example External-Method-Call Problems (EMCP) • Example 1: File.Exists has data dependencies on program input; a subsequent branch at Line 1 uses the return value of File.Exists. • Example 2: Path.GetFullPath has data dependencies on program input; Path.GetFullPath throws exceptions. • Example 3: String.Format does not cause any problem.

  20. Human Can Help! Object-Creation Problems (OCP): tackle object-creation problems with factory methods.
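The factory-method idea can be sketched in Python with hypothetical Graph/Vertex classes mirroring the earlier QuickGraph example: the human writes one method encoding the domain knowledge the tool lacks, and the tool calls it to obtain the hard-to-reach object state.

```python
# Hypothetical Python mirror of the QuickGraph example from the earlier slides.

class Vertex:
    pass

class Graph:
    def __init__(self):
        self.vertices = []
        self.edges = []

    def add_vertex(self, v):
        self.vertices.append(v)

    def add_edge(self, v1, v2):
        # Mirrors AddEdge: both vertices must already be in the graph.
        if v1 not in self.vertices or v2 not in self.vertices:
            raise ValueError("vertex not found")
        self.edges.append((v1, v2))

def create_graph_with_edge():
    """Factory method a human supplies: encodes the required call sequence
    (AddVertex twice, then AddEdge) that the tool failed to discover."""
    g = Graph()
    v1, v2 = Vertex(), Vertex()
    g.add_vertex(v1)
    g.add_vertex(v2)
    g.add_edge(v1, v2)
    return g
```

A tool would then invoke the factory whenever a Graph with at least one edge is needed, e.g., to cover branch B4 in the earlier example.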

  21. Human Can Help! External-Method-Call Problems (EMCP): tackle external-method-call problems with mock methods or method instrumentation, e.g., mocking System.IO.File.ReadAllText.
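The mock-method idea can be sketched in Python with the standard unittest.mock, standing in for Pex-style method instrumentation (the config-reading function here is hypothetical): the external call is replaced so both branch outcomes become reachable without touching the file system.

```python
import os
from unittest import mock

def read_config(path):
    # Branch depends on an external-method call (an EMCP): the test
    # generator cannot solve constraints over the real file system.
    if os.path.exists(path):
        return "found"
    return "missing"

# Mocking makes both branches reachable deterministically.
with mock.patch("os.path.exists", return_value=True):
    assert read_config("any/path") == "found"
with mock.patch("os.path.exists", return_value=False):
    assert read_config("any/path") == "missing"
```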

  22. State-of-the-Art/Practice Testing Tools: tools typically don't communicate the challenges they face, preventing cooperation between tools and users. (Sample Symbolic PathFinder output as on Slide 14.)

  23. Bigger Picture • Machine is better at task set A: mechanical, tedious, repetitive tasks, … (e.g., solving constraints along a long path) • Human is better at task set B: intelligence, human intent, abstraction, domain knowledge, … (e.g., local reasoning after a loop, recognizing naming semantics) • Getting the job done = A ∪ B

  24. Cooperation Between Human and Machine • Human-Assisted Computing (driver: tool; helper: human), e.g., Covana [Xiao et al. ICSE 2011] • Human-Centric Computing (driver: human; helper: tool), e.g., Coding Duels @ Pex for Fun. Interfaces are important. Contents are important too!

  25. Human-Assisted Computing • Motivation: tools are often not powerful enough; humans are good at some aspects that tools are not • Key questions: What difficulties does the tool face? How to communicate the info to the user to get help? How does the user help the tool based on the info? • Iterations form a feedback loop

  26. Human-Assisted Computing (same content as Slide 25)

  27. Difficulties Faced by Automated Structural-Test-Generation Tools: external-method-call problems (EMCP) and object-creation problems (OCP)

  28. Existing Solution of Problem Identification • Existing solution: identify all executed external-method calls; report all object types of program inputs and fields • Limitations: the number is often high; some identified problems are irrelevant for achieving higher structural coverage

  29. DSE Challenges - Preliminary Study. Reported EMCPs: 44 vs. real EMCPs: 0; reported OCPs: 18 vs. real OCPs: 5.

  30. Proposed Approach: Covana [Xiao et al. ICSE 11] • Goal: precisely identify problems faced by tools when achieving structural coverage • Insight: partially-covered statements have data dependencies on real problem candidates. Xusheng Xiao, Tao Xie, Nikolai Tillmann, and Jonathan de Halleux. Precise Identification of Problems for Structural Test Generation. In Proc. ICSE 2011.

  31. Overview of Covana. Pipeline: the program and generated test inputs go through forward symbolic execution, which produces runtime events and runtime information (including coverage); problem candidate identification yields problem candidates, and data dependence analysis over them yields the identified problems.

  32. Problem Candidate Identification: external-method calls whose arguments have data dependencies on program inputs.

  33. Data Dependence Analysis. Partially-covered branch statements have data dependencies on EMCP candidates via return values. Example: the branch statement at Line 1 has a data dependency on File.Exists at Line 1; symbolic expression: return(File.Exists) == true; element of the EMCP candidate: return(File.Exists).
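Covana's pruning rule can be caricatured in a few lines of Python (all names here are illustrative, not Covana's actual data structures): an external-method-call candidate is kept only if some partially covered branch has a data dependency on its return value.

```python
def prune_emcp_candidates(candidates, branches, depends_on):
    """candidates: names of external-method calls observed at runtime.
    branches: list of (branch_id, fully_covered) pairs.
    depends_on(branch_id, call): True if the branch condition has a data
    dependency on the call's return value.
    Returns only the candidates that block coverage (the 'real' problems)."""
    return [
        call for call in candidates
        if any(not covered and depends_on(bid, call)
               for bid, covered in branches)
    ]

# Mirroring the earlier example: the branch on File.Exists is only partially
# covered, while String.Format feeds no uncovered branch.
branches = [("line1-branch", False), ("other-branch", True)]
deps = {("line1-branch", "File.Exists")}
real = prune_emcp_candidates(
    ["File.Exists", "String.Format"], branches,
    lambda bid, call: (bid, call) in deps)
```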

  34. Evaluation – Subjects and Setup • Subjects: • xUnit: unit testing framework for .NET • 223 classes and interfaces with 11.4 KLOC • QuickGraph: C# graph library • 165 classes and interfaces with 8.3 KLOC • Evaluation setup: • Apply Pex to generate tests for program under test • Feed the program and generated tests to Covana • Compare existing solution and Covana

  35. Evaluation – Research Questions • RQ1: How effective is Covana in identifying the two main types of problems, EMCPs and OCPs? • RQ2: How effective is Covana in pruning irrelevant problem candidates of EMCPs and OCPs?

  36. Evaluation - RQ1: Problem Identification • Covana identifies 43 EMCPs with only 1 false positive and 2 false negatives, and 155 OCPs with 20 false positives and 30 false negatives.

  37. Evaluation - RQ2: Irrelevant-Problem-Candidate Pruning • Covana prunes 97% (1,567 of 1,610) of EMCP candidates with 1 false positive and 2 false negatives, and 66% (296 of 451) of OCP candidates with 20 false positives and 30 false negatives.

  38. Cooperation Between Human and Machine • Human-Assisted Computing (driver: tool; helper: human), e.g., Covana [Xiao et al. ICSE 2011] • Human-Centric Computing (driver: human; helper: tool), e.g., Coding Duels @ Pex for Fun. Interfaces are important. Contents are important too!

  39. Microsoft Research Pex for Fun: Teaching and Learning CS via Social Gaming. www.pexforfun.com - 1,126,136 clicked 'Ask Pex!'. Contributed the concept of Coding Duel games as the major game type of Pex for Fun since Summer 2010. N. Tillmann, J. de Halleux, T. Xie, S. Gulwani, and J. Bishop. Teaching and Learning Programming and Software Engineering via Interactive Gaming. In Proc. ICSE 2013, Software Engineering Education (SEE), 2013.

  40. Behind the Scenes of Pex for Fun. Winning condition: behavior of Secret Impl == Player Impl.

Player implementation:
class Player {
  public static int Puzzle(int x) {
    return x;
  }
}

Secret implementation:
class Secret {
  public static int Puzzle(int x) {
    if (x <= 0) return 1;
    return x * Puzzle(x - 1);
  }
}

Test driver:
class Test {
  public static void Driver(int x) {
    if (Secret.Puzzle(x) != Player.Puzzle(x))
      throw new Exception("Mismatch");
  }
}
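A Python sketch of the duel driver (the factorial-style secret and the identity-function player are taken from the slide; the sketch replaces Pex's automated input generation with direct calls):

```python
def secret_puzzle(x):
    # Hidden reference implementation from the slide (factorial-style).
    if x <= 0:
        return 1
    return x * secret_puzzle(x - 1)

def player_puzzle(x):
    # The player's first attempt from the slide.
    return x

def driver(x):
    # Pex searches for an x where the two implementations disagree.
    if secret_puzzle(x) != player_puzzle(x):
        raise AssertionError("Mismatch")
```

driver(1) passes, but an input such as x = 3 exposes the mismatch (6 vs. 3); the player wins once no distinguishing input can be found.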

  41. Human-Centric Computing • Coding duels at http://www.pexforfun.com/ • Brain exercising/learning while having fun • Fun: iterative, adaptive/personalized, with a win criterion • Skills: abstraction/generalization, debugging, problem solving

  42. Coding Duel Competition @ ICSE 2011: http://pexforfun.com/icse2011

  43. Coding Duels for Automatic Grading @ NCSU CSC 510, especially valuable in Massive Open Online Courses (MOOCs): http://pexforfun.com/gradsofteng

  44. Human-Human Cooperation: Pex for Fun (Crowdsourcing) • Everyone can contribute, over the Internet: • Coding duels • Duel solutions

class Secret {
  public static int Puzzle(int x) {
    if (x <= 0) return 1;
    return x * Puzzle(x - 1);
  }
}

  45. Access Control Policy (ACP) • An ACP includes rules to control which principals have access to which resources • A policy rule includes four elements, e.g., for "The Health Care Personnel (HCP) does not have the ability to edit the patient's account.": • subject - HCP • action - edit • resource - patient's account • effect - deny
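The four-element rule structure can be written down as a small Python record with a lookup (the field names come from the slide; the check function and its deny-by-default behavior are assumptions for this sketch, not part of any specific ACP standard):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AcpRule:
    subject: str    # who, e.g., "HCP"
    action: str     # what operation, e.g., "edit"
    resource: str   # on what, e.g., "patient.account"
    effect: str     # "permit" or "deny"

def check(rules, subject, action, resource, default="deny"):
    """Return the effect of the first matching rule, else the default."""
    for r in rules:
        if (r.subject, r.action, r.resource) == (subject, action, resource):
            return r.effect
    return default

# The rule from the slide's example sentence.
hcp_rule = AcpRule("HCP", "edit", "patient.account", "deny")
```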

  46. Objectives • How to ensure correct specification of ACPs? ACPs may be complex/error-prone to specify and are often written in natural language (NL) • How to ensure correct enforcement of ACPs? There is a gap between ACPs (domain concepts) and the system implementation (programming concepts); NL functional requirements bridge the gap, and the implementation must conform to them.

  47. NCSU/NIST Access Control Policy Test Tool (ACPT) • Model Construction: specify and combine access control (AC) models (e.g., Multi-Level, RBAC) • Model Verification: verify AC models against given properties • Implementation Testing: test AC implementations with NIST ACTS • XACML Synthesis. http://csrc.nist.gov/groups/SNS/acpt/index.html - ~130 organizations/users: DISA, DOE Fermi Lab, SAIC, NOAA, Rosssampson Corporation, Johns Hopkins U, Inventure Enterprises, …

  48. ACP in NL Documents • In practice, ACPs are often written in natural language (NL), especially in legacy systems • Supposed to be written in non-functional requirements (e.g., security requirements), but often buried inside functional requirements. Example (UC1 of iTrust use cases, http://agile.csc.ncsu.edu/iTrust/wiki/doku.php): "… Patient MID should be the number assigned when the patient is added to the system and cannot be edited. The HCP does not have the ability to edit the patient's security question and password. …"

  49. Example Extraction of ACPs. From "The Health Care Personnel (HCP) does not have the ability to edit the patient's account.", ACP extraction yields the access control policy: subject - HCP; action - edit; resource - patient.account; effect - deny.

  50. Functional Requirements - Use Cases • Scenario-based functional requirements: a use case is a sequence of action steps describing how principals access different resources to achieve some functionality • Resource access information, e.g., for "The patient views the access log.": • subject - patient • action - view • resource - access log
