Software Fault Tolerance (SWFT) Software Testing

Constantin Sârbu
Dept. of Computer Science, TU Darmstadt, Germany
Dependable Embedded Systems & SW Group
www.deeds.informatik.tu-darmstadt.de

Fault Removal: Software Testing
• So far:
  • Verification & Validation
  • Testing techniques
    • Static vs. dynamic
    • Black-box vs. white-box
  • Testing of dependable systems with
    • Modeling
    • Fault injection (FI / SWIFI)
    • Some existing tools for fault injection
• Last time: Testing (SWIFI) of operating systems
  • WHERE: Error propagation in OSs [Johansson’05]
  • WHAT: Error selection for testing [Johansson’07]
  • WHEN: Injection trigger selection [Johansson’07]
• Today: Profiling the OS extensions (drivers)
  • State definition
  • State changes at runtime
  • Behavior-driven test prioritization

Reminder: SWIFI
• General SW
  • Manipulate bits in memory locations, registers, buses etc.
    • Emulation of HW faults
  • Change text segment of processes
    • Emulation of SW faults (bugs, defects)
    • Dynamic: e.g., op-code switch during operation
    • Static: change source code and recompile (a.k.a. mutation)
• What is different in OSs?
  • OSs act as a mediator between HW and user SW applications
  • Kernel mode: low accessibility
  • A failure of the OS often means failure of the whole system
  • Source code is often not available
  • Add-on kernel extensions are written by parties other than the OS producer -> lack of experience
  • Etc.
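The bit-manipulation idea behind SWIFI can be illustrated with a minimal sketch. It flips a single bit in a byte buffer, which stands in for a memory location or register; the function name `flip_bit` and the sample buffer are assumptions made for illustration, not part of any SWIFI tool mentioned in the lecture.

```python
def flip_bit(data: bytes, byte_index: int, bit_index: int) -> bytes:
    """Return a copy of `data` with exactly one bit inverted (a single bit-flip fault)."""
    if not 0 <= bit_index < 8:
        raise ValueError("bit_index must be in 0..7")
    corrupted = bytearray(data)            # work on a copy, keep the original intact
    corrupted[byte_index] ^= 1 << bit_index  # XOR with a one-bit mask inverts that bit
    return bytes(corrupted)


# Hypothetical "memory" content; byte 1 is 0x0F before the injection.
original = bytes([0x00, 0x0F, 0xFF])
faulty = flip_bit(original, byte_index=1, bit_index=4)
print(faulty.hex())  # 001fff
```

Injecting the same fault twice restores the original value, which is a handy sanity check when building an injector.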

OS Robustness Testing Efforts at DEEDS
• Our research topics presented today:
  • “Improving Robustness Testing of COTS OS Extensions” (ISAS’06)
    • What is the state of an OS (component)?
    • Efforts to define the state of drivers
    • How does a driver behave at runtime?
  • Current research: Test prioritization!
    • Minimize testing effort based on behavior patterns
• Bachelor/Master/Diplom/PhD thesis opportunities! http://www.deeds.informatik.tu-darmstadt.de/aja/

On Robustness Testing of OS Device Drivers (Preliminaries)

Outlook
• Introduction
• System model
• Windows Driver Model (WDM) Structures
• Robustness Testing Approach
• Benefits

About drivers
• What is a driver?
  • a collection of functions for controlling HW, generally provided by a HW supplier
  • generally implemented as a Dynamic Link Library (DLL, SYS file)
• What does a driver do?
  • Imports other libraries
  • Exports its own functions (like public methods)
  • Communicates with HW
→ Windows drivers!

Different Driver Communication Models
[Figure: DRIVER.DLL, X.DLL and Y.DLL located in the kernel memory space of the operating system]

Why Drivers?
• Many OS extensions (70% of Linux code, more than 35 000 different drivers for Win XP)
• Extensions are responsible for most OS failures (85% in Win XP)
• Extensions reside in kernel address space, interfering with it with almost no restriction
• They are written by programmers less experienced in OS kernel architecture
• The kernel is the same, but the set of loaded extensions is almost unique to each machine

Robustness testing using SWIFI
• Limitations of previous techniques:
  • simplified error model (bit-flips, parameter errors)
  • huge number of test cases
• Need a new technique able to:
  • treat drivers as black-boxes;
  • use more relevant error models;
  • increase injection resolution;
  • reduce testing time;
  • increase functional coverage.

System & I/O Model
[Figure: layers from top to bottom: Application 1 … Application n (user space), System Services, I/O Manager and other facilities, Drivers (kernel space), Physical Hardware. Example request travelling down the layers: the application issues “read file f”, System Services issue “read(hnd, size, buf)”, the I/O Manager sends “IRP_MJ_READ” to the driver, and the hardware executes “spin HDD plates, move head h, start read…”]

WDM (Windows Driver Model) Basics
• A framework for device drivers that operate under MS Windows 98/ME/2K/XP and Server 2003
• Designed for forward compatibility across Windows versions
• Feature-set partitioning: a WDM driver must implement some standard routines, the rest is application dependent
• Provides not only protocols, but also tools and documentation (Driver Development Kit, DDK)

Driver lifecycle
[Figure: Driver not loaded → OS loads driver code in virtual memory → OS creates an empty DRIVER_OBJECT → OS runs DriverEntry function of the driver → Driver READY ⇄ Driver WORKING (edges labeled IRP and Status)]
• WDM important entities:
  • DRIVER_OBJECT
  • IRP (MJ, MN, IOCTL)

WDM Structures: 1. DRIVER_OBJECT
WDM.H:
    typedef struct _DRIVER_OBJECT {
        CSHORT Type;
        CSHORT Size;
        ...
    } DRIVER_OBJECT, *PDRIVER_OBJECT;

WDM Structures: 2. IRP & I/O Stack Location
[Figure: IRP structure, showing the status of an operation and the IRP-dependent I/O stack location]
• ~28 major function IRPs:
  • 3 have minor IRPs
  • 2 have HW-specific control codes

FSM Idea
[Figure: driver lifecycle (Driver not loaded → OS loads driver code in virtual memory → OS creates an empty DRIVER_OBJECT → OS runs DriverEntry function of the driver → Driver READY ⇄ Driver WORKING), annotated with IRPs such as CREATE, CLOSE, READ, WRITE and DEVICE_CONTROL, and with minor (MN) codes]
• Serial.sys:
  • 9 MJ
  • 2 MN (POWER)
  • 37 (DEVICE_CONTROL)
  • 4 (INTERNAL_DEVICE_CONTROL)
  • Total: 52 “modes”

Current focus
• Finding a minimal graph to describe driver functionality
• Determining the sub-graph(s) with maximum impact on the system’s robustness
• Finding proper fault model(s): sequences, etc.
• Is this approach scalable/portable?

Improvements over traditional testing techniques
• Benefits:
  • error model based on the actual communication level
  • good functionality coverage
  • can be used as an aid for fault-injection techniques
  • can be used for smart placement of EDMs & ECMs

Serial driver example

Improving Robustness Testing of COTS OS Extensions
Constantin Sârbu, Andréas Johansson, Falk Fraikin and Neeraj Suri
Department of Computer Science, TU Darmstadt, Germany
Presented at ISSRE 2006

Outline
• System Model and MS Windows Driver Model
• Driver Mode and Operational Profile
• Coverage Metrics for Testing
• Case Study: The Serial Driver (Windows XP)

Why Driver Testing?
• OS extensions (drivers)
  • COTS components enhancing the OS’s adaptability
  • collection of subroutines for controlling HW
  • reside in kernel space
• Strong OS robustness impact
  • rapidly developed software → high defect density
  • ~70% of Linux kernel code, > 35 000 different drivers for Windows XP*
  • often unrestricted interference with OS
  • major cause of OS failures (~85% of SW-related failures in Windows XP*)
→ Test the drivers better!
* Improving the Reliability of Commodity Operating Systems, M. M. Swift et al., SOSP 2003

Driver Testing
• Testing of drivers (by developers and users): crucial but difficult
  • limited access: located in kernel space
  • driver users have limited access to source code
  • set of loaded drivers is different across installations
• Common driver testing philosophies:
  • Microsoft approach: Driver Reliability Signature (DRS) program
    • DDK (Driver Development Kit) and HCT (Hardware Compatibility Test)
    • based on “fault checklists”
    • tests available to driver developers
  • SWIFI (SW Implemented Fault Injection)
    • inject artificial faults and observe outcome
    • reboot system to inject into the same “state”
    • not considering driver’s operational state
  • Multiple at varied design/abstraction levels … (functional, behavioral…)
→ How good is a testing method?

State Space and Operational Profile
[Figure: the driver’s state space, containing the operational profile (dependent on workload), the region covered by Microsoft tests (fault checklists), and SWIFI faults (which tend to cluster*)]
→ We need testing methods matching the operational profile
* An Empirical Investigation of Software Fault Distribution, K. H. Möller and D. Paulish, SMS 1993

System Model
• (Currently) MS Windows XP SP2 as case study
• The set of applications is known
• Drivers interact with the rest of the system via the I/O Manager
• Windows Driver Model (WDM) specifies the communication interface between I/O Manager and the drivers
[Figure: USER SPACE: Application 1 … Application p; System Services; KERNEL SPACE: I/O Manager, other OS facilities, Driver 1 … Driver m; HW SPACE: hardware layer]

Windows Driver Model (WDM)
• Unified interface between OS kernel and drivers
• I/O Request Packet (IRP)
  • communication medium between I/O Manager and drivers
  • I/O Manager builds an IRP and passes it to a driver
  • driver executes the associated code and returns the result using the same IRP instance
• Each driver
  • contains a set of procedures, each one executed when a particular request is received
  • publishes a list of entry points to the respective procedures
• A driver can execute several IRP requests concurrently
[Figure: DRIVER receives an I/O request (IRP), runs the I/O request handler, and returns result & status]
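The request/handler/result cycle described on this slide can be modeled with a toy dispatch table. This is only a user-space sketch under stated assumptions: the classes `Irp` and `ToyDriver`, the function `io_manager_send`, and the status strings are hypothetical and do not correspond to real WDM structures or NTSTATUS values; only the IRP_MJ_READ name (plus the analogous IRP_MJ_WRITE and IRP_MJ_POWER) follows WDM naming.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Irp:
    """Toy stand-in for an I/O Request Packet: request in, result and status out."""
    major: str
    payload: bytes = b""
    status: Optional[str] = None
    result: Optional[bytes] = None


class ToyDriver:
    """A driver publishes entry points, one procedure per supported request."""

    def __init__(self) -> None:
        self._buffer = b""
        self.dispatch: Dict[str, Callable[[Irp], None]] = {
            "IRP_MJ_WRITE": self._write,
            "IRP_MJ_READ": self._read,
        }

    def _write(self, irp: Irp) -> None:
        self._buffer += irp.payload
        irp.status = "SUCCESS"          # hypothetical status string

    def _read(self, irp: Irp) -> None:
        irp.result = self._buffer
        irp.status = "SUCCESS"


def io_manager_send(driver: ToyDriver, major: str, payload: bytes = b"") -> Irp:
    """Build an IRP, hand it to the matching entry point, return the same instance."""
    irp = Irp(major, payload)
    handler = driver.dispatch.get(major)
    if handler is None:
        irp.status = "NOT_SUPPORTED"    # hypothetical status string
    else:
        handler(irp)
    return irp


driver = ToyDriver()
io_manager_send(driver, "IRP_MJ_WRITE", b"hello")
print(io_manager_send(driver, "IRP_MJ_READ").result)   # b'hello'
print(io_manager_send(driver, "IRP_MJ_POWER").status)  # NOT_SUPPORTED
```

The point of the sketch is that result and status travel back in the same `Irp` instance that carried the request, mirroring the bullet above.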

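Driver Mode
• At time t, the mode of a driver is a tuple of predicates, each assigned to one of the n IRPs the driver supports:
  $M_D = \langle P(IRP_1), P(IRP_2), P(IRP_3), \ldots, P(IRP_n) \rangle$
  where
  $P(IRP_i) = \begin{cases} 1, & \text{if performing the functionality triggered by } IRP_i \\ 0, & \text{otherwise} \end{cases}$
• Example: a driver supporting 4 distinct IRPs (IRP1 … IRP4) is idle in mode M_D = < 0 0 0 0 >; by the definition above, while it handles only IRP2 its mode is < 0 1 0 0 >
[Figure: DRIVER receiving IRP2, running the I/O request handler, and returning result & status]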
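As a minimal sketch of the mode definition, the tuple can be computed from the set of IRPs currently being processed. The function name `driver_mode` and the IRP labels are assumptions for illustration only.

```python
from typing import Iterable, Sequence, Tuple


def driver_mode(supported: Sequence[str], active: Iterable[str]) -> Tuple[int, ...]:
    """Return the mode tuple: 1 for each supported IRP currently being processed, else 0."""
    active_set = set(active)
    unknown = active_set - set(supported)
    if unknown:
        raise ValueError(f"unsupported IRPs: {sorted(unknown)}")
    return tuple(1 if irp in active_set else 0 for irp in supported)


# A driver supporting 4 distinct IRPs (labels are placeholders).
IRPS = ("IRP1", "IRP2", "IRP3", "IRP4")
print(driver_mode(IRPS, []))                 # (0, 0, 0, 0)
print(driver_mode(IRPS, ["IRP2"]))           # (0, 1, 0, 0)
print(driver_mode(IRPS, ["IRP1", "IRP2"]))   # (1, 1, 0, 0)
```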

Driver’s State Space
• The driver’s state space is represented by the set of all possible driver modes
• The operational profile is defined by the set of visited modes
• Total number of modes: $N = 2^n$
• Total number of transitions: $T = n \cdot N = n \cdot 2^n$
• Assumption: at any instant of time, only one IRP can be received or finished by the driver
[Figure: mode graph for n = 4, with modes 0000 … 1111 arranged in levels by number of active IRPs (0 to 4) and connected by bidirectional edges]
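The two counts can be checked by brute-force enumeration. The sketch below assumes modes are written as bit strings and that, per the single-IRP assumption on the slide, a transition changes exactly one bit; the helper names `modes` and `transitions` are illustrative.

```python
from itertools import product
from typing import List, Tuple


def modes(n: int) -> List[str]:
    """All 2^n driver modes for n supported IRPs, as bit strings."""
    return ["".join(bits) for bits in product("01", repeat=n)]


def transitions(n: int) -> List[Tuple[str, str]]:
    """All directed transitions: one IRP starts (0 -> 1) or finishes (1 -> 0)."""
    flip = {"0": "1", "1": "0"}
    return [
        (mode, mode[:i] + flip[mode[i]] + mode[i + 1:])
        for mode in modes(n)
        for i in range(n)
    ]


for n in (4, 10):
    # Enumerated counts agree with N = 2^n and T = n * 2^n.
    print(n, len(modes(n)), len(transitions(n)))
# 4 16 64
# 10 1024 10240
```

Each bidirectional edge in the mode graph is counted as two directed transitions here, which is what makes the total n·2^n rather than half of that.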

Testing Coverage Metrics
• An ideal testing technique should test 100% of the operational profile!
1. Mode Coverage: every visited mode is tested
   $MC = |\text{tested modes} \cap \text{visited modes}| \,/\, |\text{operational-profile modes}|$
2. Transition Coverage: for every visited mode, all outgoing traversed transitions are tested
   $TC = |\text{tested transitions} \cap \text{traversed transitions}| \,/\, |\text{operational-profile transitions}|$
3. Path Coverage: traverse all the paths between two visited modes, over any number of hops
[Figure: mode graph for n = 4 (modes 0000 … 1111)]
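Mode and transition coverage are both ratios of a set intersection to the size of the operational profile, so one helper covers both. This is a minimal sketch; the function name `coverage` and the sample sets are made up for illustration and are not measured data.

```python
from typing import Hashable, Iterable


def coverage(tested: Iterable[Hashable], profile: Iterable[Hashable]) -> float:
    """|tested ∩ profile| / |profile|, usable for MC (modes) and TC (transitions)."""
    profile_set = set(profile)
    if not profile_set:
        raise ValueError("operational profile must not be empty")
    return len(set(tested) & profile_set) / len(profile_set)


# Hypothetical operational profile of a 4-IRP driver.
visited_modes = {"0000", "1000", "0100", "1100"}
traversed = {("0000", "1000"), ("1000", "0000"), ("1000", "1100")}

# Hypothetical test campaign; "0010" was tested but never visited, so it adds nothing.
tested_modes = {"0000", "1000", "0010"}
tested_transitions = {("0000", "1000")}

print(coverage(tested_modes, visited_modes))    # 0.5
print(coverage(tested_transitions, traversed))  # 0.3333333333333333
```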

Case Study: The Serial Driver (Windows XP)
• Experimental setups:
  • Pentium 4 @ 2.8 GHz
  • serial modem (external)
  • cables (serial, loopback)
  • various benchmark software
• How much of the mode graph is actually visited?
• Serial driver:
  • serial.sys, provided together with Windows XP Professional SP2
  • digitally signed by Microsoft
  • passed the reliability and stress tests included in HCT (Hardware Compatibility Tests) and DDK (Driver Development Kit)
[Figure: USER SPACE: workload application; KERNEL SPACE: I/O Manager and serial driver, monitored by IrpTracker, which produces logs; HW SPACE: serial port connected to a communication party (56k modem, 2nd computer, or loopback cable)]

Experiment 1 – Driver-Usage Pattern
• Used a commercial modem benchmark as workload
  • get / set serial port settings
  • send / receive data
  • traffic is verified for completeness and correctness
• Assumed that:
  • the generated load is representative of the normal operational mode of the driver
  • the sequence of IRPs is repeatable
• Small operational profile: only 7% of modes and 1.8% of transitions were visited -> consistent!

Experiment 2 – Aggregated Workload
• Workload: a set of 7 applications that generated a total of:
  • 107456 requests (10 distinct)
  • a total of 1024 modes / 10240 transitions
• Operational profile: only 1.66% of modes and 0.34% of transitions were visited -> indicates where to focus testing
• Observations:
  • some modes are visited much more frequently than others
  • only modes located on the first levels (out of 11 levels) are visited
  • existence of loops!

Discussion
• Operational profile
  • only a very small number of modes are actually visited under a given workload
  • it indicates the modes and transitions with high likelihood to be reached in the field → test those preferentially!
  • not many IRPs were executed concurrently
  • a short IRP sequence suffices to bring the driver into the desired mode
• IRP sequences
  • generating those can be problematic (receipt and return occur non-deterministically)
• Wave Testing: first test visited modes, then their one-hop neighbors by trying to traverse new edges
• Limitations
  • cannot deal with parallel processing of several IRPs of the same type
  • assumes sequential start/finish of IRPs (no jump over one level)
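The wave-testing order mentioned above can be sketched as two sets: the visited modes first, then every mode one bit-flip away that is not yet in the profile. The function names and the sample profile are assumptions for illustration; how the neighbors are actually reached (which IRP sequence to send) is not covered here.

```python
from typing import Iterable, List, Set, Tuple


def one_hop_neighbors(mode: str) -> Set[str]:
    """Modes reachable by starting or finishing exactly one IRP."""
    flip = {"0": "1", "1": "0"}
    return {mode[:i] + flip[mode[i]] + mode[i + 1:] for i in range(len(mode))}


def wave_test_order(visited: Iterable[str]) -> Tuple[List[str], List[str]]:
    """First wave: visited modes. Second wave: their unvisited one-hop neighbors."""
    visited_set = set(visited)
    neighbors = {nb for mode in visited_set for nb in one_hop_neighbors(mode)}
    return sorted(visited_set), sorted(neighbors - visited_set)


# Hypothetical operational profile of a driver with 2 IRPs.
first_wave, second_wave = wave_test_order({"00", "10"})
print(first_wave)   # ['00', '10']
print(second_wave)  # ['01', '11']
```

Further waves could be produced by feeding the union of both waves back into `wave_test_order`, at the cost of a quickly growing test set.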

Conclusions & Future Work
• Our contribution provides “means” to identify relevant locations for focused/effective testing (& for black-box SW!)
• Requires no modifications of the OS or driver source code
• Assists the debugging process (we have information about which subroutine is running at a given moment)
• Future work
  • Representative set/classes of drivers, OSs
  • Build operational graphs complementing MS testing tools (is Microsoft testing enough?)
  • Application profiling (build behavioral patterns for driver usage)

Current Work @ DEEDS
• Issues:
  • IrpTracker dependence
  • Create a general WDM monitoring approach
• Solution:
  • Filter driver that writes to a log file
• Offline:
  • Parse log file
  • Calculate probabilities to visit a mode or traverse a transition
  • Use the probabilities to guide testing
    • E.g.: test the most visited modes
    • E.g.: test the least visited modes (robustness!)
[Figure: previous setup (workload application, I/O Manager, serial driver, serial port, communication party: 56k modem, 2nd computer, or loopback cable) monitored by IrpTracker, next to the generalized setup in which a WDM driver and its HW peripheral are monitored and logs are produced]
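The offline steps (parse the log, then estimate visit and traversal probabilities) can be sketched as follows. The log format is an assumption made purely for this example: one event per line, `START <i>` or `END <i>`, where `<i>` is the zero-based index of the IRP. The real filter-driver log format is not given in the slides, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def replay_log(lines: Sequence[str], n: int) -> List[str]:
    """Replay START/END events and return the sequence of modes the driver went through."""
    mode = [0] * n
    visited = ["0" * n]                      # the driver starts idle
    for line in lines:
        event, index = line.split()
        if event not in ("START", "END"):
            raise ValueError(f"unknown event: {event}")
        mode[int(index)] = 1 if event == "START" else 0
        visited.append("".join(str(bit) for bit in mode))
    return visited


def visit_probabilities(
    visited: Sequence[str],
) -> Tuple[Dict[str, float], Dict[Tuple[str, str], float]]:
    """Relative frequencies of mode visits and of transition traversals."""
    mode_counts = Counter(visited)
    transition_counts = Counter(zip(visited, visited[1:]))
    mode_p = {m: c / len(visited) for m, c in mode_counts.items()}
    total = sum(transition_counts.values())
    transition_p = {t: c / total for t, c in transition_counts.items()} if total else {}
    return mode_p, transition_p


# Hypothetical log of a driver with 2 IRPs.
log = ["START 0", "END 0", "START 1", "START 0", "END 0", "END 1"]
sequence = replay_log(log, n=2)
print(sequence)  # ['00', '10', '00', '01', '11', '01', '00']
mode_p, transition_p = visit_probabilities(sequence)
print(max(mode_p, key=mode_p.get))  # 00
```

Sorting `mode_p` in descending or ascending order gives the two prioritizations named on the slide: most visited modes first, or least visited modes first.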

Example Output

Current Work @ DEEDS
• Ranking of modes and transitions based on occurrence and temporal properties
• Metrics for modes:
  • MOW = # of current mode sojourns / total mode visits
  • MTW = time spent in current mode / total experiment duration
  • MCW = λ·MOW + (1 − λ)·MTW
• Metrics for transitions:
  • TOW = # of current transition traversals / total traversals
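A minimal sketch of the mode metrics, assuming the profiling run is available as a list of (mode, sojourn duration) pairs. That input shape, the function name `mode_weights`, and the sample numbers are assumptions for illustration; λ is the weighting parameter from the MCW formula.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence, Tuple


def mode_weights(
    sojourns: Sequence[Tuple[str, float]], lam: float = 0.5
) -> Dict[str, Dict[str, float]]:
    """Compute MOW, MTW and MCW = lam*MOW + (1 - lam)*MTW for every mode."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    total_time = sum(duration for _, duration in sojourns)
    if not sojourns or total_time <= 0:
        raise ValueError("need at least one sojourn with positive total duration")
    visits = Counter(mode for mode, _ in sojourns)
    time_in_mode: Dict[str, float] = defaultdict(float)
    for mode, duration in sojourns:
        time_in_mode[mode] += duration
    weights = {}
    for mode, count in visits.items():
        mow = count / len(sojourns)              # share of all mode visits
        mtw = time_in_mode[mode] / total_time    # share of the experiment duration
        weights[mode] = {"MOW": mow, "MTW": mtw, "MCW": lam * mow + (1 - lam) * mtw}
    return weights


# Hypothetical run: idle 6 s, one IRP active 2 s, idle again 2 s.
weights = mode_weights([("00", 6.0), ("10", 2.0), ("00", 2.0)], lam=0.5)
print(round(weights["00"]["MCW"], 4))  # 0.7333
print(round(weights["10"]["MCW"], 4))  # 0.2667
```

With λ = 1 the ranking depends only on how often a mode is entered, with λ = 0 only on how long the driver stays in it. TOW can be computed the same way as MOW, counting transition traversals instead of mode sojourns.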

Current Work @ DEEDS
• Also, the mode changes can be visualized
• Useful for
  • Revealing sojourn patterns
  • Identifying IRP sequences leading to a certain mode

SWOSFT Labs
• The labs are intended to be part of the project presented today
• Programming inside the kernel is tedious, time consuming and frustrating (meet the BSoD!), though the outcomes are rewarding
• Access to our lab, tools, books, existing code and experience!
• Possibility to continue the work along the same line in our group as a Bachelor/Master/Diplom/PhD thesis
• Porting the Filter from XP (Windows Driver Model) to Vista (Windows Driver Foundation)
  • WDF kernel-mode drivers are ~ the same as WDM
  • Possible issues
    • Kernel function libraries / APIs not the same
    • The mechanism to load a Filter onto an existing driver is different
• Writing to a file from kernel space
  • Partly implemented but unstable
  • Possible issues
    • Ordering of events in the log file
    • Make sure no event is lost
