
Verification and Validation





Presentation Transcript


  1. Verification and Validation John Morris Computer Science/Electrical and Computer Engineering A hard day’s work ensuring that some Japanese colleagues understand why Auckland is The City of Sails!

  2. Terms • Validation • Ensuring that the specification is correct • Determine that the software to be built is actually what the user wants! • Verification • Ensuring that the software runs correctly

  3. Validation or Verification? • Validation • Building the right software • Make sure it’s what the user wants • Verification • Building the software right • Make sure it works • An accurate, complete specification is essential!

  4. Specifications • Functional • Define actions and operations of system eg • Each transaction shall be stored in a database • GST at the current rate is applied to invoice • Can be verified by software tests • Apply an input data set • Compare output state to expected state • Expected state is defined in specifications
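The “apply an input data set, compare output state to expected state” idea can be sketched in a few lines. This is a Python sketch with a hypothetical `invoice_total_cents` function and an assumed GST rate parameter (the slide names no rate); the expected values come from the specification, not from the code.

```python
# Hypothetical function under test: applies GST at a given rate to an
# invoice amount, working in integer cents to avoid rounding surprises.
def invoice_total_cents(net_cents, gst_rate_percent):
    gst = (net_cents * gst_rate_percent + 50) // 100   # round to nearest cent
    return net_cents + gst

# Each test applies an input data set and compares the output state to
# the expected state defined in the specification.
tests = [
    # (net cents, rate %, expected total cents)
    (10000, 15, 11500),
    (0,     15, 0),
    (999,   15, 1149),   # GST on 999 cents at 15% is 149.85 -> 150 cents
]
results = [invoice_total_cents(net, rate) == want for net, rate, want in tests]
print(all(results))   # True when every output matches its specification
```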

  5. Specifications • Functional • Define actions and operations of system • Can be verified by software tests • Non-functional • Performance eg • Searches will take <2 seconds • Messages will be compressed by 60% • Usability eg • A trained monkey shall be able to run this software • Require special tests

  7. Testing • Aim • Locate and repair defects • Axiom • No matter how much testing you do, you can’t be sure that there isn’t an error waiting to bite you! • Testing only reveals the presence of defects, it never proves their absence!!

  8. Testing The alternative? • Formal verification • Uses formal logic to prove that software is correct • Currently: • Prohibitively expensive • Little automated support • Mainly manual techniques • Error prone • Only feasible when cost of failure is extreme • Usually when failure leads to loss of life • Aircraft and spacecraft control • Medical systems • Nuclear plants

  9. Testing - Motivation • Definitely the least glamorous part of software development! • Possibly the most expensive! • If not carried out thoroughly! • Estimates of the economic cost of software failure produce astronomic numbers • US: $59.5 billion in 2002 • http://www.nist.gov/public_affairs/releases/n02-10.htm • ~10% of projects are abandoned entirely • Including some very large ones

  10. Famous software failures • July 28, 1962 Mariner I space probe • A bug in the flight software for the Mariner 1 causes the rocket to divert from its intended path on launch. Mission control destroys the rocket over the Atlantic Ocean. The investigation into the accident discovers that a formula written on paper in pencil was improperly transcribed into computer code, causing the computer to miscalculate the rocket's trajectory.

  11. Famous software failures • 1982 -- Soviet gas pipeline. • Operatives working for the Central Intelligence Agency allegedly plant a bug in a Canadian computer system purchased to control the trans-Siberian gas pipeline. The Soviets had obtained the system as part of a wide-ranging effort to covertly purchase or steal sensitive U.S. technology. The CIA reportedly found out about the program and decided to make it backfire with equipment that would pass Soviet inspection and then fail once in operation. The resulting event is reportedly the largest non-nuclear explosion in the planet's history.

  12. Famous software failures • 1985-1987 -- Therac-25 medical accelerator • Based upon a previous design, the Therac-25 was an "improved" therapy system that could deliver two different kinds of radiation: either a low-power electron beam or X-rays. The Therac-25's X-rays were generated by smashing high-power electrons into a metal target positioned between the electron gun and the patient. A second "improvement" was the replacement of the older Therac-20's electromechanical safety interlocks with software control, a decision made because software was perceived to be more reliable. • What engineers didn't know was that both the 20 and the 25 were built upon an operating system that had been kludged together by a programmer with no formal training. Because of a subtle bug called a "race condition," a quick-fingered typist could accidentally configure the Therac-25 so the electron beam would fire in high-power mode but with the metal X-ray target out of position. At least five patients die; others are seriously injured.

  13. Famous software failures • June 4, 1996 -- Ariane 5 Flight 501 • Working code for the Ariane 4 rocket is reused in the Ariane 5, but the Ariane 5's faster engines trigger a bug in an arithmetic routine inside the rocket's flight computer. The error is in the code that converts a 64-bit floating-point number to a 16-bit signed integer. The faster engines cause the 64-bit numbers to be larger in the Ariane 5 than in the Ariane 4, triggering an overflow condition that results in the flight computer crashing. • First, Flight 501's backup computer crashes, followed 0.05 seconds later by a crash of the primary computer. As a result of these crashed computers, the rocket's primary processor overpowers the rocket's engines and causes the rocket to disintegrate 40 seconds after launch. • More stories • http://www.wired.com/software/coolapps/news/2005/11/69355 or • ‘Software testing failures’ in Google!

  14. Approach • Coding’s finished • Run a few tests • System passes • Release • Result: Disaster • Inadequate design or poor coding produced many time bombs in the system!

  15. Approach • Coding’s finished • Run a few tests • System passes • Release • Here’s the problem .. • Errors are inevitable (we’re human!) • Testing did not reveal them • Passing a few tests was assumed to mean that the system was error-free • See the first axiom!!

  16. Why testing is hard • Let’s take a trivial example • Test the addition operation on a 32-bit machine c = a + b • How many tests needed?

  17. Why testing is hard • Trivial example • Test the addition operation on a 32-bit machine: c = a + b • How many tests needed? • Naïve strategy • But simple and easily understood! • How many values for a? 2^32 • How many values for b? 2^32 • Total possible input combinations? 2^32 x 2^32 = 2^64 • Assume: • One addition test/10 instructions = 3x10^8 tests/sec

  18. Why testing is hard (2) • Total possible input combinations? 2^32 x 2^32 = 2^64 • Assume: • 3GHz machine • One addition test/~10 cycles = 3x10^8 tests/sec • Time = 2^64 / 3x10^8 ≈ 1.8x10^19 / 3x10^8 ≈ 6x10^10 sec ≈ 2000 years!! • Clearly need a smarter technique!!
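The time estimate is easy to check directly. A back-of-envelope sketch in Python, using the slide's assumed rate of 3x10^8 tests per second:

```python
# Check the exhaustive-adder estimate: 2^64 input pairs at 3e8 tests/sec.
tests = 2 ** 64                        # all (a, b) pairs of 32-bit values
rate = 3e8                             # tests per second (3 GHz / ~10 cycles)
seconds = tests / rate
years = seconds / (365.25 * 24 * 3600)
print(f"about {seconds:.1e} seconds, i.e. roughly {years:.0f} years")
```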

  19. Testing strategies • Exhaustive testing - Try all possible inputs • Naïve • Simple (easy to implement) • Easy to justify and • Argue for completeness! • Works for very small input sets only!! • For inputs ai, i = 0, …, n-1 • If Ai = {ai0, ai1, …, ai(k-1)} is the set of all possible values of ai • and |Ai| = k is the cardinality of Ai • then • Tests required = ∏ |Ai| • Clearly only useful when all |Ai| are small!!

  20. Exhaustive Testing • Inefficient, naive? • Never forget the KISS principle • An automated test system can do a very large number of tests in a reasonable time • and do them while you’re designing the next test! • Analysis needed is trivial whereas • Analysis for a more efficient test regime may be quite complex and error-prone • It’s easy to convince someone that an exhaustively tested system is reliable
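For small input sets, exhaustive testing really is this simple. A sketch using a hypothetical 8-bit adder model, where all 2^8 x 2^8 = 65,536 input pairs can be checked in well under a second:

```python
from itertools import product

# Exhaustive test of a (hypothetical) 8-bit adder model: every possible
# (a, b) pair is generated and checked against the expected behaviour.
def add8(a, b):
    return (a + b) & 0xFF              # 8-bit addition with wraparound

failures = [(a, b) for a, b in product(range(256), repeat=2)
            if add8(a, b) != (a + b) % 256]
print(len(failures))                   # 0: every possible input pair passed
```

The analysis needed is trivial, exactly as the slide argues: there are no equivalence classes to justify, just a loop over every input.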

  21. Efficient testing • Many tests are redundant • In the adder example, most tests are equivalent • They don’t exercise any new part of the underlying circuit! • For example, you might argue that • all additions of +ve numbers without overflow are equivalent • Addition of 0 to a +ve number is the same for all +ve numbers • Similarly for 0 + -ve number etc • This divides the tests into equivalence classes • Only one representative of each class need be tested!

  22. Equivalence Classes • Key concept: Only one representative of each class needs to be tested! • All other tests of inputs in the same equivalence class just repeat the first one! • Dramatic reduction in total number of tests • No loss of ‘coverage’ or satisfaction that tests are complete

  23. Adder example • Clearly, we’ve achieved a dramatic reduction in the number of required tests! • Disclaimer: A more careful analysis would look at the circuitry needed to implement an adder!

  24. Equivalence classes – formal definition • A set of equivalence classes is a partition of a set such that • Each element of the set is a member of exactly one equivalence class • For a set, S, and a set of equivalence classes, Ci • ∪ Ci = S • Ci ∩ Cj = ∅ (the null set) unless i = j

  25. Equivalence classes – formal definition • A set of equivalence classes is a partition of a set such that • The elements of an equivalence class, C, are classified by an equivalence relation, ~ • If a ∈ C and b ∈ C, then a ~ b • The equivalence relation is • Reflexive: a ~ a • Symmetric: if a ~ b then b ~ a • Transitive: if a ~ b and b ~ c, then a ~ c • A representative of each class is an arbitrary member of the class • They’re all ‘equivalent’ – so choose any one!
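The two partition properties can be checked mechanically. A Python sketch (the helper name is ours), applied to the admission-price classes used later in the deck:

```python
# Mechanical check of the two partition properties: the classes must
# cover S, and no element may appear in more than one class.
def is_partition(S, classes):
    union = set().union(*classes) if classes else set()
    if union != set(S):
        return False                         # U Ci must equal S
    # If the classes are pairwise disjoint, no element is counted twice:
    return sum(len(c) for c in classes) == len(union)

ages = set(range(30))
classes = [{a for a in ages if a < 6},
           {a for a in ages if 6 <= a < 16},
           {a for a in ages if a >= 16}]
print(is_partition(ages, classes))           # the age classes partition S
```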

  26. Equivalence classes – verification • Equivalence relation • In the verification context, the elements of the set are the sets of input values for a function under test, eg we are verifying a function f( int a, int b ) • The 2-tuples (1,1), (1,2), (1,3), … (and many more!) are the elements of the set of all possible inputs for f • The equivalence relation is “behaves the same way under testing” • One common interpretation of this is: • “follows the same path through the code”

  27. Equivalence classes – verification • Equivalence relation • Consider this function: int max( int a, int b ) { if ( a > b ) return a; else return b; } • There are two paths through this code, so the inputs fall into two classes: those for which a > b, and the rest • This implies that we have only two tests to make: • (a=5, b=3) and • (a=4, b=6)
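The same function and its two-test regime, sketched in Python (the slide's original is C-style):

```python
# The slide's max function, with one test per equivalence class.
def max_of(a, b):
    if a > b:        # path 1: inputs where a > b
        return a
    else:            # path 2: the rest
        return b

# One representative from each class suffices:
print(max_of(5, 3))   # exercises the a > b path
print(max_of(4, 6))   # exercises the a <= b path
```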

  28. Black Box and White Box Verification • There are two scenarios for developing equivalence classes • Black Box • Specification is available but no code • Equivalence classes are derived from rules in the specification, eg admission price: if age < 6, then free; if age < 16, then 50%; else full price • This would lead to 3 equivalence classes: age < 6; age ≥ 6 ∧ age < 16; age ≥ 16
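A hypothetical implementation of this specification, with one representative test per class (the full price of 10.0 is our assumption; the slide names no amount):

```python
# Hypothetical implementation of the admission-price specification.
def admission_price(age, full_price=10.0):
    if age < 6:
        return 0.0                 # free
    elif age < 16:
        return full_price * 0.5    # 50%
    else:
        return full_price          # full price

# One representative per equivalence class:
for age in (3, 9, 29):
    print(age, admission_price(age))
```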

  29. Black Box and White Box Verification • Black Box • Specification is available but no code • White Box • Code is available and can be analyzed • Equivalence classes are derived from rules in the specification and the code

  30. White Box Verification • White Box • Equivalence classes are derived from rules in the specification and the code • These are not always the same, eg a database stored on a disc • Specification might say: if record exists, then return it • Black Box testing gives two equivalence classes: • Record exists and • Record does not exist

  31. White Box Verification • White Box • However, the code reveals that an m-way tree (matched to disc block size for efficiency) is used • Many additional classes • Disc block full • Block split needed • Only one record • Record at start of block • Record in middle of block • Record at end of block • Record in root block • Record in leaf • ….

  32. Generating the Equivalence Classes • Specification: admission price: if age < 6, then free; if age < 16, then 50%; else full price • This would lead to 3 equivalence classes: age < 6; age ≥ 6 ∧ age < 16; age ≥ 16 • Choose representatives 3, 9 and 29 (or many other sets) • (number line: 3, 5, 6, 9, 15, 16, 29)

  33. Generating the Equivalence Classes • Formally, choosing representatives 3, 9 and 29 is sufficient • However, an experienced tester knows that a very common error is writing < for ≤ or > for ≥, or vice versa • So include the class limits too! • (number line: 3, 5, 6, 9, 15, 16, 29)
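Why the class limits matter can be shown with a deliberately buggy version that writes <= where the specification says < (both functions are our sketches). Only the boundary age 6 exposes the bug; the interior representatives all pass:

```python
# Correct rule vs a version with a classic off-by-one comparison bug.
def price_correct(age):
    return 0 if age < 6 else (5 if age < 16 else 10)

def price_buggy(age):
    return 0 if age <= 6 else (5 if age < 16 else 10)   # <= instead of <

interior = [3, 9, 29]              # representatives away from the limits
boundaries = [5, 6, 15, 16]        # the class limits on each side

# Interior representatives alone miss the bug...
print([a for a in interior if price_correct(a) != price_buggy(a)])
# ...a boundary representative exposes it.
print([a for a in boundaries if price_correct(a) != price_buggy(a)])
```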

  34. Generating the Equivalence Classes • Other special cases • Nulls • Identity under addition: x + 0 = x • Unity • Identity under multiplication: x × 1 = x • Range maxima and minima • May have (or need!) special code • Illegal values • Should raise exceptions or return errors • Read the specification to determine behaviour! • (number line: -5, -1, 0, 1, 3, 5, 6, 9, 15, 16, 29, 999)

  35. Generating the Equivalence Classes • Illegal values • Should raise exceptions or return errors • Read the specification to determine behaviour! • Particularly important! • Typical commercial code probably has as much code handling illegal or unexpected input as ‘working’ code! • Treat every possible exception as an output!

  36. Generating the Equivalence Classes • Other special cases • This caused the set of representatives to expand from 3 to 12 • Some are not really needed, eg the code processes 1 in just the same way as 3 • However, this is a small price to pay for robust software! • The cost of proving that a unity is not needed is more than the cost of testing it! • Experienced testers routinely include these special cases!

  37. Generating the Equivalence Classes Outputs • Find equivalence classes that cover outputs also! • Same general rules apply as for inputs • One representative of each class plus • Boundaries • Null output eg No items in a report – does the user want a confirming report anyway? • Just one output eg Reports often have header and trailer sections - are these correctly generated for a short (<1 page) report? • Never neglect the null case! • It’s very easy to neglect at specification stage • Required behaviour may be ‘obvious’ • No need to write it down! • It will require coding • Experienced programmers know that it’s a very common source of error!
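The null-output case from the slide can be made concrete with a hypothetical report generator: even a zero-item report should get its header and trailer sections.

```python
# Hypothetical report generator for testing the null-output class.
def make_report(items):
    lines = ["REPORT HEADER"]
    lines += [f"  {item}" for item in items]
    lines.append(f"END OF REPORT ({len(items)} items)")
    return lines

empty = make_report([])            # the null case: no items at all
single = make_report(["widget"])   # the just-one-output case

print(len(empty))    # header and trailer only
print(single[1])     # the single item line
```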

  38. Coverage in White Box Testing • Black Box testing will not usually cover all the special cases required to test data structures • Often, the functional goals of the specification could be met by one of several data structures • Specification may deliberately not prescribe the data structure used • Allows developers to choose one meeting performance goals • Permits substitution of an alternative with better performance (against the non-functional specifications) • Coverage defines the degree to which white box testing covers the code • Measurement of completeness of testing

  39. Coverage in White Box Testing • Usually, at least some white box coverage goals will have been met by executing test cases designed using black-box strategies • How would you know if this were the case or not? • In simple modules, which don’t use internal data structures, black box classes may be adequate • This is not the general case though! • Various coverage criteria exist: • Every statement executed at least once • Every branch taken in true and false directions • Every path through the code

  40. Coverage in White Box Testing • Coverage criteria • Logic coverage • Statement: each statement executed at least once • Branch: each branch traversed (and every entry point taken) at least once • Condition: each condition True at least once and False at least once • Branch/Condition: both Branch and Condition coverage • Compound Condition: all combinations of condition values at every branch statement covered (and every entry point taken) • Path: all program paths traversed at least once
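The gap between Branch and Compound Condition coverage shows up with a single compound test like `a and b`. A Python sketch (not tied to any particular coverage tool): two cases flip the whole branch both ways, but compound-condition coverage needs every combination of the individual condition values.

```python
# Branch vs compound-condition coverage on the compound test `a and b`.
def f(a, b):
    if a and b:
        return "both"
    return "not both"

# Two cases make the whole branch True once and False once...
branch_tests = [(True, True), (False, True)]
branch_outcomes = {bool(a and b) for a, b in branch_tests}

# ...but compound-condition coverage needs every (a, b) combination.
compound_tests = [(a, b) for a in (True, False) for b in (True, False)]

print(branch_outcomes == {True, False})   # branch coverage achieved
print(len(compound_tests))                # 4 combinations required
```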

  41. Pseudocode and Control Flow Graphs
  input(Y)
  if (Y <= 0) then
      Y = -Y
  end
  while (Y > 0) do
      input(X)
      Y = Y - 1
  end
  (control flow graph: statements are the “nodes”, control transfers the “edges”)

  42. Statement Coverage • Statement Coverage requires that each statement is executed at least once • Simplest form of logic coverage • Also known as Node Coverage • What is the minimum number of test cases required to achieve statement coverage for the program segment given next?

  43. Pseudocode and Control Flow Graphs
  input(Y)
  if (Y <= 0) then
      Y = -Y
  end
  while (Y > 0) do
      input(X)
      Y = Y - 1
  end
  (control flow graph: statements are the “nodes”, control transfers the “edges”)
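One way to answer the minimum-test-case question is to translate the pseudocode to Python and instrument it to record which statements execute (the node labels here are ours). A single test with Y ≤ 0 reaches every statement, so statement coverage needs just one test case; branch coverage needs a second case taking the if's false edge.

```python
# The pseudocode above, instrumented to record which statements execute.
def run(y, xs):
    executed = {"input(Y)"}
    if y <= 0:
        executed.add("Y = -Y")
        y = -y
    xs = iter(xs)
    while y > 0:
        executed.add("input(X); Y = Y-1")
        next(xs)                  # input(X)
        y -= 1                    # Y = Y-1
    return executed

all_nodes = {"input(Y)", "Y = -Y", "input(X); Y = Y-1"}

# One test case with Y <= 0 reaches every statement:
print(run(-2, [1, 2]) == all_nodes)
# Branch coverage needs a second case for the if's false edge:
print(run(3, [1, 2, 3]) == all_nodes - {"Y = -Y"})
```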

  44. Branch coverage • Branch Coverage requires that each branch will have been traversed, and that every program entry point will have been taken, at least once • Also known as Edge Coverage

  45. Branch Coverage – Entry points • Why include “…and that every program entry point will have been taken, at least once”? • Not common in HLLs (eg Java) now • Common in scripting languages • Any language that allows a goto and a statement label!

  46. Procedure – Module Verification Steps • Obtain precise specification • Should include definitions of exception or illegal input behaviour • For each input of module • Determine equivalence classes (inc special cases) • Choose representatives • Determine expected outputs • Repeat for outputs • Many output equivalence classes are probably covered by input equivalence classes • Build test table

  47. Procedure – Module Verification Steps • Write test program • Tests in tables are usually the best approach • Easily maintained • Test programs need to be retained and run when any change is made • To make sure that something that worked isn’t broken now!! • Tables are easily augmented • When you discover the case that you didn’t test for!
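A minimal table-driven harness might look like this (Python sketch; the price rule is the admission-price example from the earlier slides, and the function name is ours). Adding the case you forgot to test for is just one more row in the table:

```python
# Minimal table-driven test harness: each row is (inputs, expected output).
def price(age):                    # hypothetical function under test
    return 0 if age < 6 else (5 if age < 16 else 10)

test_table = [
    # (age, expected price): representatives plus the class limits
    (3, 0), (5, 0), (6, 5), (9, 5), (15, 5), (16, 10), (29, 10),
]

failures = [(age, price(age), want)
            for age, want in test_table if price(age) != want]
print(f"{len(test_table) - len(failures)}/{len(test_table)} tests passed")
```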
