
Software quality assurance: a broader perspective



Presentation Transcript


  1. Software quality assurance: a broader perspective • Testing is not the only aspect of software quality assurance • Some terminology • Validation -- assuring that the software meets the user's actual needs • Are we building the right product? • Verification -- assuring that the software conforms to its specification and behaves correctly • Are we building the product right? • Inspections and testing are examples of validation • Program verification is an example of verification • The general goal is not to let faults become defects • A fault that appears in released software is called a defect

  2. What is software quality? • The answer depends on who you are • A user • Functionality - functional requirements must be implemented correctly • Portability - the system can be used with different hardware and OSes • Performance - performance requirements must be satisfied • Efficiency - the system uses the hardware and system resources efficiently • Reliability - low defect-count • Robustness - high resistance to user mistakes • A developer • Understandability - design and implementation are easy to understand and well-documented • Testability - easy to test • Modifiability - easy to modify

  3. Cost of software faults and defects • Cost to the user • Unavailable functionality • Unsatisfactory performance • Depending on the nature of the system, financial loss or lost earnings • Cost to the developer/maintainer • Processing bug reports • Bug fixes or re-engineering • Reinstall support • Potential lost future business

  4. How easy is it to remove a fault or defect? • Depends on where the fault occurs • The earlier in the lifecycle a fault is introduced, the more difficult it is to fix • Depends on the stage at which the fault is noticed • The later in the lifecycle the fault is noticed, the more difficult it is to find and fix • Fixing a fault at the design stage is 3-6 times more expensive than fixing it at the requirements stage • Fixing a defect is 40-1000 times more expensive than fixing a fault at the requirements stage • Fixing defects costs more than fixing faults
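The multipliers above can be turned into a back-of-envelope cost estimate. In this sketch the ranges come from the slide, while the 1-unit baseline cost is an assumption chosen purely for illustration:

```python
# Cost ranges from the slide; the 1-unit baseline is an illustrative assumption.
requirements_fix = 1.0                 # baseline: fix at the requirements stage
design_multiplier = (3, 6)             # design-stage fix: 3-6x baseline
defect_multiplier = (40, 1000)         # post-release defect fix: 40-1000x baseline

design_cost = tuple(m * requirements_fix for m in design_multiplier)
defect_cost = tuple(m * requirements_fix for m in defect_multiplier)

print(f"design-stage fix: {design_cost[0]:g}-{design_cost[1]:g} units")
print(f"released-defect fix: {defect_cost[0]:g}-{defect_cost[1]:g} units")
```

Even at the low ends of both ranges, a defect that escapes to the field costs an order of magnitude more than a fault caught during design.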

  5. What does software quality assurance involve? • Testing • Software inspections and reviews • Requirements specifications • Design specifications • Code • Test suites • Documentation • Formal verification • Lightweight automated validation and verification techniques • Next lecture • Process improvements based on project experience

  6. Correctness • a product is functionally correct if it satisfies all the functional requirement specifications • correctness is a mathematical property • requires a specification of intent • a product is behaviorally correct if it satisfies all the specified behavioral requirements • it is difficult to prove poorly-quantified qualities such as user-friendliness

  7. Reliability • measures the dependability of a product • the probability that a product will perform as expected • sometimes stated as a property of time, e.g., mean time to failure • Reliability vs. Correctness • reliability is relative, while functional correctness is absolute • given a "correct" specification, a correct product is reliable, but not necessarily vice versa

  8. Robustness • behaves "reasonably" even in circumstances that were not expected • making a system robust more than doubles development costs • a system that is correct may not be robust, and vice versa

  9. Why do software inspections? • Code reads (and analysis-document and design-document reads) -- they are so boring. Isn't testing enough? • No. • Although testing plans have to be created throughout software development, testing cannot start until implementation • A test that exhibits a failure has to be investigated -> a labor-intensive debugging process • Faults can be identified and eliminated early in development through informal code inspections • Several studies report that code reviews can be cheaper and more effective than testing

  10. Reviews, Inspections, and Walkthroughs • Formal reviews • author or one reviewer leads a presentation of the product • review is driven by presentation, issues raised • Walkthroughs • usually informal reviews of source code • step-by-step, line-by-line review • Inspections • list of criteria drive review • properties not limited to error correction

  11. Review methods • Fagan inspections • formal, multi-stage process • significant background & preparation • led by moderator • Active design reviews • also called "phased inspections" • several brief reviews rather than one large review • Cleanroom • formal review process • plus statistically based testing

  12. Fagan Inspections • 3-5 participants • 5 stage process with significant preparation

  13. Fagan Inspections participants (3 to 5 people) • MODERATOR - responsible for organizing, scheduling, distributing materials, and leading the session • AUTHOR - responsible for explaining the product • SCRIBE - responsible for recording bugs found • PLANNER or DESIGNER - author from a previous step in the software lifecycle • USER REPRESENTATIVE - to relate the product to what the user wants • PEERS OF THE AUTHOR - perhaps more experienced, perhaps less • APPRENTICE - an observer who is there mostly to learn

  14. Fagan Inspection Process • Planning • Done by the author(s): prepare the documents and an overview that explains their content to the inspectors • Done by the moderator: gather materials and ensure that they meet entry criteria; arrange for participants, assign them roles, and ensure their training; arrange the meeting

  15. Fagan Inspection Process (cont.) • Preparation • Participants study material • Inspection • Find/Report faults (Do not discuss alternative solutions) • Rework • Author fixes all faults • Follow-Up • Team certifies faults fixed and no new faults introduced

  16. Fagan Inspection -- general guidelines • Distribute material ahead of time • Use a written checklist of what should be considered • e.g., functional testing guidelines • Criticize the product, not the author

  17. People Resource versus Schedule • [Figure: staffing level plotted against the schedule (requirements planning, design, coding, testing, ship), with and without inspections]

  18. Some Experimental Results • Using software inspections has repeatedly been shown to be cost effective • Increases front-end costs • ~15% increase to development cost • Decreases overall cost • Productivity numbers for the Fagan method • Number of source code statements that can be covered per hour of overview: • ~500 • Number of source code statements participants can read through per hour of preparation: • ~125 • Number of source code statements that can be inspected per hour of meeting: • ~90-125
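The quoted rates give a quick way to budget an inspection. In this sketch the rates are the ones from the slide; the 1000-statement module size is a hypothetical example, and the meeting rate is taken from the middle of the quoted 90-125 range:

```python
# Fagan productivity rates from the slide; module size is hypothetical.
loc = 1000              # statements in the module to be inspected (assumed)
overview_rate = 500     # statements covered per hour of overview
prep_rate = 125         # statements read per hour of preparation
meeting_rate = 100      # statements inspected per hour of meeting (~90-125)

hours = {
    "overview": loc / overview_rate,
    "preparation": loc / prep_rate,
    "meeting": loc / meeting_rate,
}
print(hours)  # {'overview': 2.0, 'preparation': 8.0, 'meeting': 10.0}
```

Note that preparation and meeting hours are per participant, so a 4-person inspection of this module consumes roughly 70-80 person-hours -- the "~15% increase to development cost" cited above.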

  19. An IBM study • doubled the number of lines of code produced per person • some of this due to the inspection process • reduced faults by 2/3 • found 60-90% of the faults • found faults close to when they were introduced • helps reduce cost

  20. Why are inspections effective? • knowing the product will be scrutinized causes developers to produce a better product • having others scrutinize a product increases the probability that faults will be found • walkthroughs and reviews are not as formal as inspections, but appear to also be effective • hard to get empirical results

  21. What are the deficiencies? • focus on error detection • what about other "ilities" -- usability, portability, etc? • not applied consistently/rigorously • inspection shows statistical improvement • but cannot ensure quality • human intensive and often makes ineffective use of human resources • e.g., skilled software engineer reviewing coding standards, spelling, etc. • No automated support

  22. Experimental Evaluation • There have been many studies that have demonstrated the effectiveness of inspections • Recent studies trying to determine what aspects of inspections are effective • Provide insight into • Ways to improve the process • Ways to reduce the cost • E.g., Understanding the Sources of Variation in Software Inspections, Adam A. Porter, Harvey Siy, Audris Mockus, Lawrence G. Votta

  23. The Lucent study • Lucent compiler project for the 5ESS telephone switching system • 55K new lines; 10K reused lines • 6 developers; 5 other professionals • At least 5 yrs. experience • Inspection training • Modify process • Measure effect on number of defects • Measure effect on inspection interval • e.g., from the start of an inspection to its end • People effort

  24. Hypotheses • Large teams ==> • No increase in defects found • Increase interval • Multiple-session inspections ==> • Increase in defects found • Increase interval • Correcting defects between sessions ==> • Increase in defects found • Increase interval • Terminated this process early since it was too costly

  25. Results from the experiment • Team size did not impact effectiveness • Can use a small team w/o jeopardizing effectiveness • Number of sessions did not impact effectiveness • Can use one session • Repairs between sessions did not improve defect detection but did increase the interval • Conclusion: use single-session inspections with small teams

  26. Static Analysis • Attempt to determine information about a system without executing it • test-data independent • With testing, we only know how a program works on the executed test data • With static analysis, we know certain facts about the system (for all test data!) • Generally refers to automated analysis methods

  27. Major static analysis approaches • Dependence analysis • Data Flow Analysis • Symbolic Execution • Formal verification • Static concurrency analysis • Reachability analysis • Flow equations • Data flow analysis

  28. Some types of information that can be computed automatically • Unreachable code • Unused variable definitions • Uninitialized variables • Constant variable values • Pointer aliasing • Side-effects!
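Several of the properties above can be computed with a short traversal of the program's syntax tree. As a toy illustration (not LINT or any production tool), this sketch uses Python's standard `ast` module to flag variables that are assigned but never used, without running the analyzed code:

```python
import ast

def unused_variables(source: str) -> set:
    """Toy static check: variables assigned (Store) but never read (Load)."""
    tree = ast.parse(source)
    assigned, used = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                assigned.add(node.id)      # definition site
            elif isinstance(node.ctx, ast.Load):
                used.add(node.id)          # use site
    return assigned - used                 # defined but never used

code = """
x = 1
y = 2
print(x)
"""
print(unused_variables(code))  # {'y'}
```

The result holds for every possible execution of the analyzed snippet -- exactly the "for all test data" guarantee that distinguishes static analysis from testing. A real tool would additionally track scopes and control flow.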

  29. Why do many software engineers claim that C sucks? • Because it leaves programmers with too much freedom to make mistakes • No array bounds checks • Pointer arithmetic allowed • No type checks • … • Later-generation programming languages restrict many "dangerous" things at compile time • What do I do if I have to use C? • Use LINT • A static analyzer that does "sanity checks"

  30. How can we extend quality assurance into maintenance? • Capture and replay • E.g., the JRapture tool • A. Podgurski and J. Steven, Case Western Reserve University • In-field executions of Java programs are captured and saved, so that error traces are preserved • Has to be very efficient so as not to incur significant overhead!
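The capture-and-replay idea can be sketched in a few lines. This is an illustration of the general technique, not JRapture itself; the `capture` decorator and the `divide` function are hypothetical names invented for the example:

```python
import functools

TRACE = []  # recorded (function, inputs, output) triples from "in-field" runs

def capture(fn):
    """Record every call's inputs and output so a run can be replayed later."""
    @functools.wraps(fn)
    def wrapper(*args):
        result = fn(*args)
        TRACE.append((fn.__name__, args, result))
        return result
    return wrapper

@capture
def divide(a, b):          # hypothetical function under observation
    return a / b

divide(6, 3)               # an "in-field" execution gets captured

# Replay: feed the recorded stimulus back in and compare responses.
name, args, recorded = TRACE[0]
assert divide(*args) == recorded
```

A real capture tool must record at the boundary of the whole program (inputs, GUI events, file and network reads) and keep the recording overhead low enough for production use, as the slide notes.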

  31. SQA group • SQA stands for Software Quality Assurance • May include developers or be independent from project development activities • An SQA group is responsible for quality assurance activities: • Planning - the target level of quality, evaluations, and reviews to be done, procedures for error reporting and tracking, documentation to be produced, and type of feedback • Oversight - make sure that the process and the plan include adequate quality assurance activities and that these activities are followed during the project • Analysis - review the ongoing quality assurance activities, possibly take measurements • Reporting and record keeping - record quality concerns and deviations from the plan, report to upper management

  32. Software Development Today • Roles: Decision Maker, Programmer, Tester • Why do we have this structure?

  33. Typical Scenario (1) • Programmer: "I'm done." • Tester: "It doesn't compile, @#$% it!" • Decision Maker: "OK, calm down. We'll slip the schedule. Try again."

  34. Typical Scenario (2) • Programmer: "I'm done." • Tester: "It doesn't install!" • Decision Maker: "Now remember, we're all in this together. Try again."

  35. Typical Scenario (3) • Programmer: "I'm done." • Tester: "It does the wrong thing in half the tests." • Programmer: "No, half of your tests are wrong!" • Decision Maker: "Let's have a meeting to straighten out the spec."

  36. Typical Scenario (4) • Programmer: "I'm done." • Tester: "It still fails some tests we agreed on." • Decision Maker: "Try again, but please hurry up!"

  37. Typical Scenario (5) • Programmer: "I'm done." • Tester: "Yes, it's done!" • Decision Maker: "Oops, the world has changed. Here's the new spec."

  38. Cleanroom: S/W development process • Mills, Harlan D., Michael Dyer, and Richard C. Linger • Originally proposed by H. Mills in the early 1980s • H. Mills had previously proposed the chief programmer team concept

  39. Major contributions • Incremental development plan • Instead of a pure waterfall model • Incrementally develop subsystems • Use formal models during specification and design • Structured specifications • State machine models • Developers use informal verification instead of testing • Independent, statistically based testing • Based on usage scenarios derived from state machine models

  40. Cleanroom

  41. Cleanroom

  42. Cleanroom

  43. Black box • stimulus history -> response • The black box is a functional mapping of all possible stimulus histories to all possible responses. • The black box mapping may be expressed in symbolic notation or in the natural language of the problem domain. • S = the set of all possible system stimuli • S* = the set of all possible stimulus histories • R = the set of all possible system responses • BB: S* -> R
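The mapping BB: S* -> R can be made concrete with a tiny example. This sketch models a hypothetical counter system (the stimuli "inc"/"dec" and the counter itself are invented for illustration); the key point is that the response is a pure function of the entire stimulus history:

```python
# Black-box view (BB: S* -> R) of a hypothetical counter system.
def black_box(history: list) -> int:
    # The response depends only on the full stimulus history --
    # there is no hidden state anywhere.
    return history.count("inc") - history.count("dec")

print(black_box(["inc", "inc", "dec"]))  # 1
```

Because the argument is the whole history, two runs with the same history are guaranteed the same response, which is what makes the black box a well-defined mathematical function.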

  44. State Box • current stimulus, old state -> response, new state • state is the encapsulation of stimulus history • one identifies the stimuli that need to be saved and invents state variables to hold them; the black-box transition is then represented as • s(current), old state -> response, new state • S = the set of all possible system stimuli • T = the set of all state data • R = the set of all possible system responses • SB: S x T -> R x T
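The state-box refinement of the same hypothetical counter replaces the stored history with a single state variable. The function below is invented for illustration; note that it reproduces the black-box mapping step by step:

```python
# State-box view (SB: S x T -> R x T) of the same hypothetical counter:
# the integer state encapsulates the stimulus history.
def state_box(stimulus: str, state: int):
    step = {"inc": 1, "dec": -1}.get(stimulus, 0)
    new_state = state + step
    return new_state, new_state      # (response, new state)

state, response = 0, 0
for s in ["inc", "inc", "dec"]:      # replay a stimulus history one step at a time
    response, state = state_box(s, state)
print(response)  # 1 -- agrees with the black-box mapping on this history
```

Verifying that the state box agrees with the black box on every stimulus history is exactly the kind of correctness question the Cleanroom verification step asks.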

  45. Clear Box • The clear box is the full procedural design, specifying both data flow and control flow. New black boxes may be created to encapsulate lower-level functions. • CB: S x T -> R x T, with implementation of state update and response production.

  46. Verification • ensure that a software design is a correct implementation of its specification • team verification of correctness takes the place of individual unit testing • benefits • intellectual control of the process • motivates developers to deliver error-free code • verification is a form of peer review • each person assumes responsibility for and derives a sense of ownership in the evolving product • every person must agree that the work is correct before it is accepted -> successes are ultimately team successes, and failures are team failures

  47. Verification • team applies a set of correctness questions • correctness is established by group consensus if it is obvious • by formal proof techniques if it is not.

  48. Cleanroom

  49. Usage specification
