
Part III: Execution-Based Verification and Validation


Presentation Transcript


  1. Part III: Execution-Based Verification and Validation Katerina Goseva-Popstojanova Lane Department of Computer Science and Electrical Engineering West Virginia University, Morgantown, WV katerina@csee.wvu.edu www.csee.wvu.edu/~katerina

  2. Outline • Introduction • Definitions, objectives, and limitations • Testing principles • Testing criteria • Testing techniques • Black box testing • White box testing • Fault-based testing • Fault injection • Mutation testing

  3. Outline • Testing levels • Unit testing • Integration testing (top-down, bottom-up, sandwich) • Regression testing • Validation testing • Acceptance testing • Alpha and beta testing • Non-functional testing • Configuration testing • Recovery testing • Security testing • Stress testing • Performance testing

  4. Configuration testing • Many programs work under a wide range of hardware configurations and operating environments • Configuration testing is concerned with checking the program’s compatibility with as many configurations of hardware and system software as possible

  5. Configuration testing steps • Analyze the market • Which devices (printers, video cards, etc.) must the program work with? How can you get them? • Analyze the device • How does it work? How will this affect your testing? Which of its features does the program use? • Analyze the way the software can drive the device • How can you identify a group of devices that share the same characteristics? • Does this type of device interact with other devices? • Test the device with a small sample of other devices

  6. Configuration testing steps • Save time • Test only one device per group until you eliminate the errors. Then test each device in the group. • Improve efficiency • Consider automation. Organize the lab effectively. Plan precisely and keep careful records. • Share your experience • Organize and share your test results so the next project will plan and test more efficiently
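
A minimal sketch of how the one-device-per-group strategy can be automated; it assumes a pytest-style runner, and the device groups and the print_test_page driver call are hypothetical.

import pytest

# Hypothetical grouping of devices that share the same driver characteristics
DEVICE_GROUPS = {
    "pcl_laser": ["laserjet_4", "laserjet_5", "okidata_ol400"],
    "postscript": ["apple_lw_ii", "qms_ps810"],
}

def print_test_page(device):
    """Hypothetical stand-in for the driver call the application actually makes."""
    pass

# Save time: test only one representative device per group first
@pytest.mark.parametrize("group,devices", DEVICE_GROUPS.items())
def test_representative_device(group, devices):
    print_test_page(devices[0])

# Once the representative passes, test every device in every group
@pytest.mark.parametrize("device",
                         [d for devices in DEVICE_GROUPS.values() for d in devices])
def test_every_device(device):
    print_test_page(device)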

  7. Recovery testing • Many computer-based systems must recover from faults and resume processing within a prespecified time • Recovery testing forces the software to fail in a variety of ways and verifies that recovery is properly performed • Recovery that requires human intervention • Automatic recovery

  8. Recovery testing • Systems with automatic recovery must have • Methods for detecting failures and malfunctions • Removal of the failed component • Switchover and initialization of the standby component • Records of system states that must be preserved despite the failure
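
A minimal sketch of an automatic-recovery test along the lines of these bullets; the cluster object and its methods (snapshot_state, kill, primary, standby_is_active, failed_component_removed) are hypothetical, as is the 30-second deadline.

import time

RECOVERY_DEADLINE_S = 30  # hypothetical prespecified recovery time

def test_automatic_recovery(cluster):
    state_before = cluster.snapshot_state()   # system state that must survive the failure
    cluster.kill(cluster.primary())           # force the component to fail
    start = time.monotonic()

    # Failure detection and switchover to the standby must happen without human help
    while not cluster.standby_is_active():
        assert time.monotonic() - start < RECOVERY_DEADLINE_S, "recovery took too long"
        time.sleep(0.5)

    assert cluster.failed_component_removed()        # failed component taken out of service
    assert cluster.snapshot_state() == state_before  # preserved records of system state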

  9. Security testing • Security testing attempts to establish a sufficient degree of confidence that the system is secure • Associating integrity and availability with respect to authorized actions, together with confidentiality, leads to security • availability - readiness for usage • integrity - data and programs are modified or destroyed only in a specified and authorized manner • confidentiality - sensitive information is not disclosed to unauthorized recipients

  10. Security testing approaches, in increasing order of complexity and realism • Whiteboard - interactive analysis of hypotheses • Automated / Semi-automated - may simulate a human attacker or defender • Interactive - actual human attacker or defender (team) • Cyberwar - dynamic interaction between human attacker and defender

  11. Security testing - Penetration testing • Traditionally security testing is performed using penetration testing • attempt to break into an installed system by exploiting well-known vulnerabilities • The Red Team is a model adversary • Differs from a real adversary • Attempts to limit actual damages • Property destruction, information disclosure, etc. • Discloses all tools, techniques, and methods • Cooperates in the goals of the experiment

  12. Penetration testing - Adversary models

  13. Penetration testing - Why Red Team? • Better identification and understanding of vulnerabilities • Understand adversary adaptation to defenses • Understand adversary response to security response • Evaluate system information assurance

  14. Penetration testing - Limitations • There is no simple procedure to identify the appropriate test cases • Error prediction depends on the tester’s skills, experience, and familiarity with the system • There is no well-defined criterion for when to stop testing

  15. Security testing - Fault injection • Deliberate insertion of faults into the system to determine its response • Well known in the testing of fault-tolerant systems • A secure program is one that tolerates injected faults without any security violation • Capability of • automating testing • quantifying the quality of testing

  16. Security testing - Fault injection • Understanding the nature of security faults provides a basis for the application of fault injection • Requires the selection of a fault model • Selection of location for fault injection

  17. Security testing – Fault injection • Simulates security flaws by perturbing the internal states • Source code must be examined by hand for candidate locations for fault injection • Identifies portions of software code that can result in security violations • Simulates security flaws by perturbing the input that the application receives from the environment • Test adequacy criterion • fault coverage • interaction coverage
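
A minimal sketch of fault injection by perturbing the inputs an application receives from its environment; run_application and security_policy_violated are hypothetical stubs standing in for the system under test and for the security-policy check.

import random

def run_application(data: bytes) -> str:
    """Hypothetical stand-in for invoking the application under test."""
    return "ok"

def security_policy_violated(result: str) -> bool:
    """Hypothetical check, e.g., unauthorized access or information disclosure."""
    return False

def perturb(data: bytes, n_flips: int = 3) -> bytes:
    """Flip a few bytes to simulate a corrupted or hostile input."""
    buf = bytearray(data)
    for _ in range(n_flips):
        buf[random.randrange(len(buf))] ^= 0xFF
    return bytes(buf)

def inject_input_faults(baseline: bytes, trials: int = 1000) -> float:
    violations = 0
    for _ in range(trials):
        if security_policy_violated(run_application(perturb(baseline))):
            violations += 1
    # A secure program tolerates the injected faults without any security violation
    return violations / trials

print(inject_input_faults(b"GET /index.html HTTP/1.0\r\n\r\n"))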

  18. Stress testing • Stress testing is testing with a high workload, to the point where one, several, or all resources are simultaneously saturated • The intention of a stress test is to “break” the system, i.e., to force a crash

  19. Stress testing • Stress testing does the following • Distorts the normal order of processing, especially processing that occurs at different priority levels • Forces the exercise of all system limits, thresholds, or other controls designed to deal with overload conditions • Increases the number of simultaneous actions • Forces race conditions • Depletes resource pools in extraordinary and unthought-of sequences
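
A minimal stress-test sketch that fires many simultaneous requests to saturate the system and surface race conditions and exhausted resource pools; the target URL and the request counts are hypothetical.

from concurrent.futures import ThreadPoolExecutor
import urllib.request

TARGET = "http://localhost:8080/checkout"   # hypothetical system under test

def one_request(_):
    try:
        with urllib.request.urlopen(TARGET, timeout=5) as resp:
            return resp.status
    except Exception as exc:                 # timeouts, connection resets, HTTP errors
        return type(exc).__name__

def stress(simultaneous=500, total=10_000):
    with ThreadPoolExecutor(max_workers=simultaneous) as pool:
        outcomes = list(pool.map(one_request, range(total)))
    failures = [o for o in outcomes if o != 200]
    print(f"{len(failures)} of {total} requests failed under stress")

if __name__ == "__main__":
    stress()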

  20. Stress testing • Benefits • Faults caught by stress testing tend to be subtle • Faults caught by stress testing are often design flaws that may have implications in many areas • When to stress test • Whenever possible, early and repeatedly • As a part of systems acceptance test

  21. Performance testing • Objectives • Show that the system meets specified performance objectives • Determine the factors in hardware or software that limit system performance • Tune the system • Project the system’s future load-handling capacity

  22. Performance testing • Objectives can be met by • Analytical modeling • Simulation • Measurements on the real system with simulated or real workload

  23. Performance testing • Performance testing presumes a robust, working, and stable system • Faults that have an impact on the system’s function have been removed • Extreme example - if a fault crashes the system, no rational performance testing can be done • Faults that affect performance could range from poor design to poor implementation

  24. Examples of performance failures • NASA delayed the launch of a satellite for eight months because the Flight Operations Segment software had unacceptable response times for developing satellite schedules, and poor performance in analyzing satellite status and telemetry data • The Web sites of several online brokerage houses could not scale to meet an unusually large number of hits following a stock market dip on October 27, 1997; customers experienced long delays in using the sites.

  25. Performance measures • Performance is an indicator of how well a software system or component meets its requirements for timeliness • Timeliness is measured in terms of response time or throughput • Response time is the time required to respond to a request; it may be the time required for a single transaction, or the end-to-end time for a user task • Throughput of a system is the number of requests that can be processed in some specified time interval
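
A small sketch of computing both timeliness measures from raw (start, end) request timestamps in seconds; the sample data are made up for illustration.

# Hypothetical measured requests as (start, end) timestamps in seconds
requests = [(0.00, 0.12), (0.05, 0.31), (0.40, 0.55), (0.90, 1.85)]

response_times = [end - start for start, end in requests]
mean_response = sum(response_times) / len(response_times)

window = max(end for _, end in requests) - min(start for start, _ in requests)
throughput = len(requests) / window          # requests processed per unit of time

print(f"mean response time: {mean_response:.2f} s, throughput: {throughput:.2f} req/s")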

  26. Responsiveness and scalability • Responsiveness is the ability of a system to meet its objectives for response time or throughput • Responsiveness has both an objective and a subjective component • Scalability is the ability of a system to continue to meet its response time or throughput objectives as the demand for software functions increases • [Figure: response time vs. number of requests per unit of time] The change from linear to exponential increase is usually due to some resource in the system nearing 100% utilization; eventually resource requirements exceed computer and network capacity
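
The knee in that curve can be illustrated with the standard single-queue (M/M/1) approximation R = S / (1 - U), where S is the service time and U the utilization; this formula is a textbook model used here for illustration, not something stated on the slide.

S = 0.010   # 10 ms service time (hypothetical)
for U in (0.50, 0.80, 0.90, 0.95, 0.99):
    R = S / (1 - U)
    print(f"utilization {U:4.0%} -> response time {R * 1000:6.1f} ms")
# Response time grows gently at moderate utilization and explodes as U -> 100%,
# which is the change from near-linear to exponential-looking growth.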

  27. Performance testing • Prerequisites • Clear statement of performance objectives • Workload to drive the experiment • Controlled experimental process or testbed • Instrumentation to gather performance related data • Analytical tools to process and interpret the data

  28. Performance testing • Problems with performance objectives • There is no statement of performance objectives, or a statement is so vague that it cannot be reduced to a quantitative measure • There is a clear quantitative statement of objectives, but it cannot be measured in practice • Excessive resources and effort • Excessive experiment duration • There is a clear quantitative statement of objectives, but the objectives are unachievable at reasonable costs

  29. Performance testing • Performance objectives depend on the domain; acceptable response time could be • A few milliseconds for an antiaircraft missile control • A few tens of milliseconds for a nuclear reactor control • A few seconds of delay in getting a telephone dial tone • Half a minute to answer a DB query

  30. Complications and variations • There is more than one type of workload • Probability distribution for different workloads • Different objective for each type of workload • Example: the response time at 4 messages per second shall be less than 2 seconds, and the response time at 8 messages per second shall be less than 8 seconds (see the sketch after this slide) • Performance may be intertwined with a quantitative reliability/availability specification • Different workload-response time relations are allowed under different hardware/software failure conditions
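
A small sketch of encoding per-workload objectives like the example above and checking them against measured data; the measured values are hypothetical.

# Objectives from the example: <= 2 s at 4 msg/s, <= 8 s at 8 msg/s
objectives = {4: 2.0, 8: 8.0}                # {workload in msg/s: max response time in s}

# Hypothetical response times measured at each workload level
measured = {4: [0.8, 1.4, 1.9], 8: [3.2, 6.5, 7.1]}

for rate, limit in objectives.items():
    worst = max(measured[rate])
    status = "PASS" if worst <= limit else "FAIL"
    print(f"{rate} msg/s: worst response {worst:.1f} s (limit {limit:.1f} s) {status}")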

  31. Complications and variations • Analysis and measurement under time varying workload • Consider different situations – peak hour, average hour, peak day, etc.

  32. Data collection tools • A data collection tool must be an observer of the system under study • Its activity should not significantly affect the operation of the system being measured, that is, degrade its performance • Acceptable overhead for measurement activities is up to 5% • Implementation approach • Hardware • Software • Hybrid (combination of hardware and software monitors)

  33. Data collection tools: hardware monitors • Detects events within a computer system by sensing predefined signals • Electronic probes can sense the state of hardware components such as registers, memory locations, I/O channels • Advantages • External to the system, do not consume resources • Portable, do not depend on the operating system • Disadvantages • Do not access software-related information • Cannot be used for performance measurements related to applications or workloads

  34. Data collection tools: software monitors • Set of routines embedded in the software of the system with the aim of recording status and events • Advantages • Can record any information available to programs and the operating system • Easy to install and use • Great flexibility • Disadvantages • Introduce overheads - use the same resources they are measuring and may interfere significantly with the system • Dependent on the operating system and/or programming language

  35. Data collection tools: software monitors • System-level measurements – system-wide resource usage statistics • Usually provided by software monitors that run at the operating system level • Examples of measures: global CPU utilization, global disk utilization, total number of physical I/O operations, page fault rate, total traffic through a router • Program-level measurements – program-related statistics • Examples of measures: elapsed time, CPU time, number of I/O operations per execution, physical memory usage, application traffic (packets per second per application)
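
A brief sketch of collecting a few system-level and program-level measures from a script, assuming the third-party psutil package is installed; the specific measures shown are just examples.

import psutil

# System-level: system-wide resource usage statistics
print("global CPU utilization:", psutil.cpu_percent(interval=1.0), "%")
print("disk I/O counters:", psutil.disk_io_counters())

# Program-level: statistics for this process only
proc = psutil.Process()
print("process CPU times:", proc.cpu_times())
print("physical memory usage (RSS):", proc.memory_info().rss, "bytes")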

  36. Measurement mode • Event mode – information is collected at the occurrence of specific events • Upon detection of an event, the special code calls an appropriate routine that generates a record containing information such as date, time, type of event, and event-related data • Information corresponding to the occurrence of events is recorded in buffers, and later transferred to disk • Overheads depend on the events selected, the event rate, and the data collected • Large overheads due to very high event rates are the major shortcoming of event mode measurement tools

  37. Measurement mode • Event mode • Logging • Record all the information concerning an event (start and stop time, system state, register contents, etc.) • Event counts • Count the events over a specified period • Event durations • Accumulate event durations for specified events (dividing by corresponding event counts gives us mean event duration)
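
A minimal sketch of an event-mode software monitor keeping event counts and accumulated durations, with mean duration obtained by dividing one by the other; the instrumented "event" here is just a sleep used as a stand-in.

import time
from collections import defaultdict

counts = defaultdict(int)
durations = defaultdict(float)

def record_event(name, start, stop):
    counts[name] += 1                  # event counts
    durations[name] += stop - start    # accumulated event durations

# Instrumented code path: the monitored event is a dummy 10 ms sleep
for _ in range(5):
    t0 = time.perf_counter()
    time.sleep(0.01)
    record_event("disk_read", t0, time.perf_counter())

for name in counts:
    print(name, "count:", counts[name],
          "mean duration:", durations[name] / counts[name], "s")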

  38. Measurement mode • Sampling mode – information is collected at predefined time instants specified at the start of the monitoring session • Sampling is driven by timer interrupts based on a hardware clock • Usually less detailed observation than event mode • Overheads depend on the number of variables measured at each sampling point and the size of the sampling interval • Since we are able to specify both factors, we can also control the overheads of the sampling monitors • There is a tradeoff between the low overhead and the high accuracy of the measurement results
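
A minimal sketch of a sampling-mode monitor: a timer-driven thread records a couple of variables at a fixed interval, so overhead can be controlled through the interval length and the number of variables sampled; the sampled variables here are arbitrary examples.

import threading
import time

samples = []

def sampler(interval_s, stop_event):
    while not stop_event.wait(interval_s):     # wake up once per sampling interval
        samples.append((time.time(), threading.active_count()))

stop = threading.Event()
threading.Thread(target=sampler, args=(0.1, stop), daemon=True).start()

time.sleep(1.0)                                # ... the measured workload runs here ...
stop.set()
print(f"collected {len(samples)} samples at 0.1 s intervals")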

  39. Workload generation • The best way to stress and performance test a given system is to run the actual workload and measure the results • Alternative – performance benchmarks • Clear performance objectives and workloads that are measurable and repeatable • Enable comparative studies of products • Vendors, developers, and users run benchmarks to accurately test new systems, pinpoint performance problems, or assess the impact of modifications to a system
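
A minimal sketch of a measurable, repeatable workload generator that issues requests at a fixed target rate; send_request is a hypothetical stand-in for the real client call against the system under test.

import time

def send_request(i):
    """Hypothetical stand-in for one request against the system under test."""
    pass

def generate(rate_per_s=50, duration_s=10):
    interval = 1.0 / rate_per_s
    deadline = time.monotonic() + duration_s
    issued = 0
    while time.monotonic() < deadline:
        next_tick = time.monotonic() + interval
        send_request(issued)
        issued += 1
        time.sleep(max(0.0, next_tick - time.monotonic()))
    print(f"issued {issued} requests in {duration_s} s (target {rate_per_s}/s)")

if __name__ == "__main__":
    generate()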

  40. Performance benchmarks • Before using benchmark results, one must understand the workload, the system under study, the tests, the measurements, and the results • To avoid pitfalls, ask the following questions • Did the system under test have a configuration similar to the actual configuration (hardware, network, operating system, software, and workload)? • How representative of my workload are the benchmark tests? • For example, if you are developing a new graphical application, transaction processing benchmark results are useless, because the two workloads are very different • What is the benchmark version used? What new features are included in the latest releases?

  41. Performance benchmarks • Two consortia offer benchmarks that are commonly used for comparing different computer systems • Standard Performance Evaluation Corporation (SPEC) http://www.spec.org SPEC is an organization of computer industry vendors that develops standardized performance tests, i.e., benchmarks, and publishes reviewed results • Transaction Processing Performance Council (TPC) http://www.tpc.org TPC is a nonprofit organization that defines transaction processing and database benchmarks

  42. Stress and performance testing – QA tasks • Include workload generation as a major budget item • Select workload generation methods; start workload generation development at the same time as software development • Plan software instrumentation in support of performance testing as a part of system design; develop, publish, and discuss embedded software instrumentation as early as possible

  43. Stress and performance testing – QA tasks • Tie down workload statistics and parameters as early as possible in written form • Start stress testing as early as possible; subject the system to stress whenever possible • Include stress test as a part of the formal system acceptance test • Accept no performance criteria that cannot be measured and validated with the allocated personnel, schedule, and budget

  44. Stress and performance testing – QA tasks • Plan performance testing as a specific, recognizable activity to be done on a testbed, and if necessary in the field • Be sure to allow enough time for the system to stabilize before attempting field performance testing • Run performance tests intermixed with other system tests to detect faults that are performance-related
