

## Automated Test Data Generation


**Automated Test Data Generation**
Maili Markvardt

**Outline**
• Introduction
• Test data generation problem
• Black-box approach
• White-box approach

**Introduction**
• The importance of testing is growing as software becomes mission-critical in our everyday life
• Software errors are more costly than ever
• Testing can be automated
  • Test execution automation
  • Test generation automation
  • Test data generation automation

**Problem: example**
• The user inputs three sides of a triangle (a, b, c). Which type is the triangle?
• Requirements:
  • IF a<=0 || b<=0 || c<=0 -> input incorrect
  • IF p*(p-a)*(p-b)*(p-c) < 0 -> sides do not form a triangle (p is presumably the semi-perimeter (a+b+c)/2, as in Heron's formula)
  • IF a==b || a==c || b==c -> isosceles
  • IF a==b && b==c -> equilateral
  • Other -> scalene
• What strategy? -> what data?

**Input validation automation**
• The concept “side of a triangle”: equivalence partitions and boundary values
  • Normal: ]0; ∞[
  • Erroneous: ]-∞; 0[, missing values
  • Boundary values: {0}
• For testing the input validation functionality, pick a random value from each equivalence partition for each side:
  • P(-1, 2, 3), P(1, -2, 3), P(1, 2, -3)
• Same with boundary values
  • P(0, 1, 2), P(1, 0, 2), P(1, 2, 0)
• Input validation with “normal” values
  • P(1, 2, 3)

**What about other requirements?**
• If input values depend on each other and that dependency affects the output, random values cannot be used: random generation may never produce the needed values
  • IF p*(p-a)*(p-b)*(p-c) < 0 -> sides do not form a triangle
• We must use specification-based (black-box) or program-based (white-box) test data generation

**Black-Box approach**
• Generating test data from formal specifications (e.g. Z notation)
• Classification Tree Method (CTM)
• ...

**Classification Tree Method [4]**
• Based on the equivalence partitioning method: input and output properties are divided into equivalence partitions
• Equivalence partitions are combined into test cases
• The goal: a minimal but sufficient number of test cases

[4] Dai, Z. R., Deussen, P. H.
Automatic Test Data Generation for TTCN-3 Using Classification Tree Method (2005)

**CTM**
• Equivalence partitions form a tree structure
• Input dependencies are not resolved

**White-Box approach**
• White-box test data generation is based on program structure
• Test data generation problem: for program P and path u, find an input x ∈ S such that P(x) traverses path u, where S is the set of all input values
• Remember: the white-box approach is based on a program formalisation (graph, FSM, …)

**Test data generator structure [2]**
[2] Edvardsson, J. Contributions to Program- and Specification-based test data generation. (2002). www.ida.liu.se/~joned/papers/joned_lic.pdf

**Possible strategies (adequacy criteria)**
• Statement coverage
• Branch coverage
• Condition coverage
• Multiple-condition coverage
• Path coverage
• …

**Numerous methods for Constraint generator & Constraint solver**
• Symbolic Execution
• Actual Execution
• Symbolic/Actual Execution hybrid
• Simulated Annealing
• Iterative Relaxation Technique
• Chaining Approach
• Genetic Algorithms
• MEA-Graph Planning
• ...

**Symbolic Execution [2]**
• Popular static method for finding path constraints
• Path constraints are rewritten using input variables
• Not suitable for programs using pointers and arrays
• Not suitable for programs using precompiled units
• Example: read(a,b); c=a+b; d=a-b; e=c*d; if (e>5) {...} => path constraint a*a - b*b > 5

**Actual Execution [2]**
• The program is executed several times
• On every execution: check whether or not the desired path is executed
• If the desired path is not executed, the program is re-executed with slightly modified input values
• The program is re-executed until the desired path is traversed or a user-defined limit (time, execution count) is exceeded
• Solves some problems of symbolic execution, since the values of variables are available

**Actual Execution**
• For each path condition, an objective function Fi is found:
  • Fi(x) {< | <= | =} 0
• If Fi(x) {< | <= | =} 0 holds, then the current path is executed
• F(x) = Σ Fi(x), if a branch consists of several conditions
• How to minimize the objective function so that Fi(x)=0
• In other words, what input values are needed to execute the desired path?

**Simulated Annealing**
• Simulated annealing: a generic probabilistic meta-algorithm for finding a good approximation to the global optimum of a given function in a large search space
• Analogy from metallurgy: the process of annealing is used for reducing defects in a material
  • Metal is heated: atoms start to move
  • Metal is cooled down slowly: greater probability that atoms find a “suitable” place

**Simulated Annealing**
• Goal: minimize the objective function -> the desired path is executed
• Find a “random” solution for the objective function
• Compare this solution with the current solution of the objective function
• Decide whether or not the “random” solution is better than the current solution

**Simulated Annealing**
• If the “random” solution gives a better value (closer to 0) for the objective function, the “random” solution is always chosen (probability is 1)
• If the “random” solution is not better than the current solution, it is “sometimes” chosen, depending on the “temperature”
• The value of the “temperature” is decreased
  • In the beginning the “temperature” is high -> almost every solution is chosen
  • As the temperature is lowered, the probability of choosing a worse solution decreases until it is 0

**Simulated Annealing: properties**
• Choosing worse solutions in the beginning lowers the probability of getting stuck in a local optimum (a drawback of gradient descent / hill climbing / greedy algorithms)
• It can be shown that the probability of finding the global optimum approaches 1 as the annealing schedule is extended
• This guarantee is of little use in practice, since finding the global optimum with sufficient probability by annealing usually takes more time than a full search of the whole search space

**Simulated Annealing**
• Parameters for successful simulated annealing: art rather than science
• How to find a “random” solution: how to minimize the number of iterations needed to find the optimum?
• How to determine whether or not the “worse” solution is picked?
• Annealing schedule: from what “temperature” to start and how is the “temperature” lowered?

**Genetic algorithms**
• Imitates the process of natural selection
  • Evaluation
  • Selection
  • Recombination and mutation
• Start with a random set of solutions: the population
• Solutions are evaluated for their fitness: the ability to generate good offspring
• Chosen (good) solutions are recombined and mutated to generate a new generation of solutions

**GA for test data generation [5]**
• The algorithm is driven by the control dependency graph of the program
  • Graph nodes = program statements
  • Graph edges = control dependencies between program statements
• Goal: find data for executing a certain node (program statement)
• Node X is post-dominated by node Y if every directed path from X to the end of the program includes node Y

[5] Pargas, R., Harrold, M.J., Peck, R.R. Test-Data generation Using genetic Algorithms (1999)

**GA for test data generation**
• Node Y is control dependent on node X only if
  • there exists a directed path from X to Y, and all nodes on this path (except X and Y) are post-dominated by Y, and
  • X is not post-dominated by Y
• Control dependency predicate path (CDPP): the predicates that must be satisfied on an acyclic path from the initial node to some other node X

**GA for TDG: algorithm**
• A solution is a set of test data
• Start with a random set
• Evaluate the fitness of the data
  • Execute the program with the data, mark the predicates on the executed path
  • Compare the found set of predicates with the CDPP of the desired node
  • The more of the CDPP the data manages to satisfy, the better the data is

**GA for TDG**
• The best solutions are chosen, recombined and mutated to generate a new generation of solutions
• Non-typical application of GA: several possible solutions, depending on the test goal
• e.g. find data for executing nodes A, C: more than one test may be needed if A and C are mutually exclusive!

**GA: example**

    int i, j, k;
    1: read i, j, k;
    2: if (i<j) {
    3:   if (j<k) {
    4:     i=k;
         } else {
    5:     k=i;
         }
       }
    6: print i, j, k;

Node 5 is the test goal, CDG path {ET, 2T, 3F}

**GA: example**
• Random population
• Fitness f{2, 2, 0, 0}
• Probability of choosing pi = fi/Σfj ({0.5, 0.5, 0, 0})
• One solution can be chosen more than once

**GA: example**
• New population {(1, 6, 9), (0, 1, 4), (0, 1, 4), (0, 1, 4)}
• Recombination (one-point crossover): the first n values come from one parent and the rest from the other parent
  • N=2: {(1, 6, 4), (0, 1, 9)}
• Mutation: the value in a random position is replaced with a random number
  • (0, 6, 4), (5, 1, 4)

**Summary**
• Numerous methods
  • Black-box
  • White-box
• Choice of method depends on
  • knowledge and preference of the tester,
  • technology of the SUT

**References**
• [1] Edvardsson, J. A survey on Automatic Test Data Generation. (1999). [WWW] www.ida.liu.se/~joned/papers/class_atdg.pdf
• [2] Edvardsson, J. Contributions to Program- and Specification-based test data generation. (2002). [WWW] www.ida.liu.se/~joned/papers/joned_lic.pdf
• [3] Gupta, N., Mathur, A., Soffa, M.L. Automated Test Data Generation Using an Iterative Relaxation Method (1999)

**References**
• [4] Dai, Z. R., Deussen, P. H. Automatic Test Data Generation for TTCN-3 Using Classification Tree Method (2005)
• [5] Pargas, R., Harrold, M.J., Peck, R.R. Test-Data generation Using genetic Algorithms (1999)
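
**Sketch: triangle requirements and input validation data**

The requirements on the "Problem: example" slide and the test data on the "Input validation automation" slide can be illustrated with a short Python sketch. Several things here are assumptions rather than part of the slides: the function names `classify_triangle` and `one_bad_side`, the reading of p as the semi-perimeter (a+b+c)/2, and the decision to test for equilateral before isosceles (in the slide's order an equilateral triangle would already match the isosceles rule).

```python
def classify_triangle(a, b, c):
    """Classify a triangle following the requirements listed on the slide."""
    if a <= 0 or b <= 0 or c <= 0:
        return "input incorrect"
    p = (a + b + c) / 2  # assumption: p is the semi-perimeter (Heron's formula)
    if p * (p - a) * (p - b) * (p - c) < 0:
        return "not a triangle"
    # Assumption: checked before isosceles, otherwise this case is unreachable.
    if a == b and b == c:
        return "equilateral"
    if a == b or a == c or b == c:
        return "isosceles"
    return "scalene"


def one_bad_side(bad, normal=(1, 2, 3)):
    """Put the value `bad` into each position of a normal triple in turn."""
    return [
        tuple(bad if i == pos else v for i, v in enumerate(normal))
        for pos in range(3)
    ]


if __name__ == "__main__":
    # One representative of the erroneous partition (-1) and the boundary value (0).
    for case in one_bad_side(-1) + one_bad_side(0):
        print(case, classify_triangle(*case))
```

Note that the dependent requirement (sides not forming a triangle) is not exercised by this per-side data, which is the point made on the "What about other requirements?" slide.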
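
**Sketch: simulated annealing on an objective function**

The "Actual Execution" and "Simulated Annealing" slides describe minimizing F(x) = Σ Fi(x) until the desired path is executed. The following is a minimal Python sketch of that idea for the path {2T, 3F} of the GA example program. The concrete branch distances, the neighbour move (changing one input by 1), the acceptance rule exp(-delta/temperature) and the geometric cooling schedule are all assumptions chosen for illustration; the slides do not specify them.

```python
import math
import random


def objective(x):
    """Distance of input x = (i, j, k) from the path {2T, 3F}.

    Assumed branch distances: 0 when a condition holds, otherwise how far
    the integers are from making it hold.
    """
    i, j, k = x
    f1 = max(0, i - j + 1)  # condition 2T: i < j
    f2 = max(0, k - j)      # condition 3F: not (j < k), i.e. k <= j
    return f1 + f2          # F(x) = sum of Fi(x)


def anneal(start, rng, temperature=10.0, cooling=0.95, steps=2000):
    """Simulated annealing over integer triples; returns the best input seen."""
    current = best = tuple(start)
    for _ in range(steps):
        if objective(best) == 0:
            break  # the desired path would be executed
        # "Random" solution: change one coordinate of the current solution by 1.
        pos = rng.randrange(3)
        candidate = list(current)
        candidate[pos] += rng.choice((-1, 1))
        candidate = tuple(candidate)
        delta = objective(candidate) - objective(current)
        # Solutions that are not worse are always taken; worse ones are taken
        # with a probability that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = candidate
        if objective(current) < objective(best):
            best = current
        temperature = max(temperature * cooling, 1e-9)
    return best


if __name__ == "__main__":
    found = anneal((1, 6, 9), random.Random(0))
    print(found, objective(found))
```

An objective value of 0 means both path conditions hold, so the returned input drives execution to node 5. The search is probabilistic, so it is not guaranteed to reach 0 within the step limit.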
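
**Sketch: selection, crossover and mutation from the GA example**

The three GA steps in the "GA: example" slides (fitness-proportional choice, one-point crossover, mutation) can be sketched in Python as follows. The function names, the value range used for mutation and the placeholder population labels are hypothetical; the fitness values {2, 2, 0, 0} and the parents (1, 6, 9) and (0, 1, 4) are taken from the slides. Fitness evaluation itself (comparing executed predicates with the CDPP) is not modelled here.

```python
import random


def selection_probabilities(fitness):
    """p_i = f_i / sum(f_j); assumes at least one non-zero fitness value."""
    total = sum(fitness)
    return [f / total for f in fitness]


def select(population, fitness, rng):
    """Draw a new population of the same size; a solution may be drawn repeatedly."""
    return rng.choices(population, weights=fitness, k=len(population))


def crossover(parent_a, parent_b, n):
    """One-point crossover: the first n values from one parent, the rest from the other."""
    return parent_a[:n] + parent_b[n:], parent_b[:n] + parent_a[n:]


def mutate(solution, rng, low=0, high=9):
    """Replace the value at a random position with a random number in [low, high]."""
    pos = rng.randrange(len(solution))
    mutated = list(solution)
    mutated[pos] = rng.randint(low, high)
    return tuple(mutated)


if __name__ == "__main__":
    rng = random.Random(0)
    print(selection_probabilities([2, 2, 0, 0]))
    # Placeholder labels: the slides do not list the initial random population.
    print(select(["s1", "s2", "s3", "s4"], [2, 2, 0, 0], rng))
    children = crossover((1, 6, 9), (0, 1, 4), 2)
    print(children)
    print([mutate(child, rng) for child in children])
```

With n = 2 the crossover reproduces the children (1, 6, 4) and (0, 1, 9) shown on the slide; the mutation results depend on the random generator and will generally differ from the slide's values.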
