Application of Metamorphic Testing to Supervised Classifiers

Application of Metamorphic Testing to Supervised Classifiers Xiaoyuan Xie, Tsong Yueh Chen Swinburne University of Technology Christian Murphy, Gail Kaiser Columbia University Joshua Ho University of Sydney Baowen Xu Nanjing University

Background • Many applications in the field of scientific computing depend on machine learning (ML) algorithms • ML applications often do not have test oracles that indicate whether the output is correct for arbitrary input • Applications without test oracles are called “non-testable programs”

Problem Statement • Oracles may exist for a limited subset of the input domain, and gross errors (e.g. crashes) can be detected with certain inputs or techniques • However, it is difficult to detect subtle (computational) errors for arbitrary inputs

Testing ML Applications • There has been much research into applying ML techniques to software testing, but not the other way around • Reusable real-world data sets and frameworks are available for checking that an ML algorithm predicts well, but not for checking that an implementation works correctly

Observation • If there is no oracle in the general case, we cannot know the expected relationship between a particular input and its output • However, it may be possible to know relationships between a set of inputs and the corresponding set of outputs • “Metamorphic Testing” [Chen et al. ’98] is such an approach

Metamorphic Testing • An approach for creating follow-on test cases based on previous test cases • If input x produces output f(x), then the function’s “metamorphic properties” are used to guide a transformation function t, which is applied to produce a new test case input, t(x) • We can then predict the expected value of f(t(x)) based on the value of f(x) obtained from the actual execution
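
The source/follow-on scheme above can be sketched as a small harness. This is a minimal illustration under our own assumptions: the helper names `metamorphic_test` and `same_value` are hypothetical, and the property sin(π − x) = sin(x) is the classic textbook example, not one taken from these slides. Note that the harness never needs the "correct" value of sin(x); it only checks that the two outputs agree with the relation.

```python
import math
import random
from typing import Callable, TypeVar

X = TypeVar("X")
Y = TypeVar("Y")


def metamorphic_test(
    f: Callable[[X], Y],
    x: X,
    t: Callable[[X], X],
    relation: Callable[[Y, Y], bool],
) -> bool:
    """Run f on the source input x and on the follow-on input t(x),
    then check whether the two outputs satisfy the metamorphic relation."""
    source_output = f(x)
    follow_on_output = f(t(x))
    return relation(source_output, follow_on_output)


def same_value(a: float, b: float) -> bool:
    # Tolerant comparison: pi - x is itself rounded in floating point.
    return math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)


# Property used for illustration: sin(pi - x) == sin(x) for every real x.
random.seed(0)
inputs = [random.uniform(-10.0, 10.0) for _ in range(100)]
results = [metamorphic_test(math.sin, x, lambda v: math.pi - v, same_value) for x in inputs]
print(all(results))  # True
```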

Metamorphic Testing without an Oracle • When a test oracle exists, we can know whether f(t(x)) is correct • Because we have an oracle for f(x) • So if f(t(x)) is as expected, then it is correct • When there is no test oracle, f(x) acts as a “pseudo-oracle” for f(t(x)) • If f(t(x)) is as expected, it is not necessarily correct • However, if f(t(x)) is not as expected, either f(x) or f(t(x)) (or both) is wrong

Metamorphic Testing Example • Consider a program that reads a text file of test scores for students in a class, and computes the averages and the standard deviation of the averages • If we permute the values in the text file, the results should stay the same • If we multiply each score by 10, the final results should all be multiplied by 10 as well • These metamorphic properties can be used to create a “pseudo-oracle” for the application
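
A minimal sketch of the two test-score properties, assuming the file has already been parsed into one list of scores per student and that "standard deviation" means the population standard deviation (both are our assumptions). `buggy_summarize` is a hypothetical faulty variant: the scaling relation exposes it even though we never compute the correct answer, which is the pseudo-oracle idea in action. The permutation relation alone would not catch this particular fault, so different relations reveal different defects.

```python
import math
import random
import statistics


def summarize(scores):
    """Each student's average, plus the population standard deviation
    of those averages."""
    averages = [sum(row) / len(row) for row in scores]
    return averages, statistics.pstdev(averages)


def buggy_summarize(scores):
    """Hypothetical faulty version: adds a spurious constant to the deviation."""
    averages, sd = summarize(scores)
    return averages, sd + 1.0


def close(a, b):
    # Reordering float additions can change the last bits, so compare with a tolerance.
    return math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9)


def check_permutation(scores, impl=summarize):
    """MR 1: shuffling the scores within each row leaves every result unchanged."""
    shuffled = [random.sample(row, len(row)) for row in scores]
    (avg1, sd1), (avg2, sd2) = impl(scores), impl(shuffled)
    return all(map(close, avg1, avg2)) and close(sd1, sd2)


def check_scaling(scores, impl=summarize, k=10.0):
    """MR 2: multiplying every score by k multiplies every result by k."""
    scaled = [[k * s for s in row] for row in scores]
    (avg1, sd1), (avg2, sd2) = impl(scores), impl(scaled)
    return all(close(k * a, b) for a, b in zip(avg1, avg2)) and close(k * sd1, sd2)


random.seed(1)
scores = [[random.uniform(0, 100) for _ in range(5)] for _ in range(30)]
print(check_permutation(scores), check_scaling(scores))  # True True
print(check_scaling(scores, impl=buggy_summarize))  # False
```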

Approach • To apply Metamorphic Testing to such ML applications, we first enumerate the metamorphic relations based on the expected behaviors of a given machine learning algorithm • We then utilize these relations to conduct metamorphic testing on the implementation

Verification & Validation • Which metamorphic properties are necessary may differ between problems in the domain • Properties that are necessary can be used for verification: “Is the implementation of the algorithm correct?” • Other properties can be used for validation: “Is the algorithm appropriate for solving this problem?”

Research Questions • What are the metamorphic properties of supervised ML classification algorithms? • Which can be used for verification? • Which can be used for validation? • Can metamorphic testing detect defects in real-world ML applications?

Machine Learning Fundamentals • Data sets consist of a number of samples, each of which has attributes and a label • In the first phase (“training”), a model is generated that attempts to generalize how attributes relate to the label • In the second phase, the model is applied to a previously unseen data set (“testing” data) with unknown labels to produce a classification of each sample

Algorithms Investigated • k-Nearest Neighbors (kNN) • Samples in the testing data are classified by using Euclidean distance to find the k nearest samples in the training data • Classification is then done by majority rule • Naïve Bayes Classifier (NBC) • For a given sample in the testing data, computes the probability of that sample belonging to each class, assuming conditional independence between the attributes • Chooses the class that is most likely
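
The kNN description above (Euclidean distance, k nearest training samples, majority rule) can be sketched in a few lines. This is a toy illustration, not Weka's implementation; the function name, the default k = 3 and the tie-breaking rule (first label met among the neighbours) are our assumptions.

```python
import math
from collections import Counter


def knn_classify(training, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples.

    `training` is a list of (attribute_vector, label) pairs. Ties in the vote
    go to the label met first among the neighbours (Counter.most_common keeps
    first-encountered order for equal counts)."""
    neighbours = sorted(training, key=lambda sample: math.dist(sample[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


training = [
    ([0.0, 0.0], "a"), ([0.0, 1.0], "a"), ([1.0, 0.0], "a"),
    ([5.0, 5.0], "b"), ([5.0, 6.0], "b"), ([6.0, 5.0], "b"),
]
print(knn_classify(training, [0.5, 0.5]))  # a
print(knn_classify(training, [5.5, 5.5]))  # b
```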

Metamorphic Relations • We identified 11 properties that we would expect all classification algorithms to have • Affine transformation of attributes • Permutation of labels or attributes • Addition of informative or uninformative attributes • Addition of classes by duplicating or re-labeling samples • Removal of classes or samples
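
Several of these relations can be checked mechanically against a toy kNN. A minimal sketch, assuming that "affine transformation" means the same map a·x + b (a > 0) applied to every attribute and that an "uninformative" attribute has the same value for all samples; the data, labels and helper names are hypothetical. Integer attribute values keep the arithmetic exact, so a failure here could only come from the classifier logic, not from rounding.

```python
import random
from collections import Counter


def knn_classify(training, query, k=3):
    """Minimal kNN: squared Euclidean distance, majority vote, ties go to
    the label met first among the nearest neighbours."""
    def dist2(vector):
        return sum((x - y) ** 2 for x, y in zip(vector, query))
    nearest = sorted(training, key=lambda sample: dist2(sample[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def predictions(training, tests):
    return [knn_classify(training, query) for query in tests]


def transform(training, tests, f):
    """Apply the attribute transformation f to every training and test vector."""
    return [(f(x), y) for x, y in training], [f(x) for x in tests]


random.seed(42)
labels = ["A", "B", "C"]
# Integer-valued attributes keep the arithmetic exact, so no MR can fail
# merely because of floating-point rounding.
training = [([random.randint(0, 100) for _ in range(4)], random.choice(labels))
            for _ in range(30)]
tests = [[random.randint(0, 100) for _ in range(4)] for _ in range(10)]
baseline = predictions(training, tests)

# MR: the same affine map 2x + 3 applied to every attribute.
assert predictions(*transform(training, tests, lambda v: [2 * x + 3 for x in v])) == baseline

# MR: the same permutation applied to the attribute columns.
order = [2, 0, 3, 1]
assert predictions(*transform(training, tests, lambda v: [v[i] for i in order])) == baseline

# MR: adding an uninformative attribute (same value for every sample).
assert predictions(*transform(training, tests, lambda v: v + [7])) == baseline

# MR: renaming the labels through a bijection renames the predictions.
rename = {"A": "B", "B": "C", "C": "A"}
relabelled = [(x, rename[y]) for x, y in training]
assert predictions(relabelled, tests) == [rename[p] for p in baseline]
print("all four relations hold")
```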

Experimental Setup • Applied the approach to implementations in the Weka 3.5.7 toolkit • Initial test cases: • Randomly generated values • Four attributes (“columns”) • 20-50 samples (“rows”) • Metamorphic relations were applied to create 20-300 follow-on test cases
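
Weka reads data sets in its ARFF text format, so the initial test cases described above can be produced by a generator along these lines. This is a minimal sketch: the value range 0 to 100, the label set {A, B, C}, the relation name and all function names are our assumptions, since the slides state only "randomly generated values", four attributes and 20-50 rows. The follow-on case shown applies one relation (a column permutation) to the source data.

```python
import random


def random_dataset(rng, n_attributes=4, min_rows=20, max_rows=50, labels=("A", "B", "C")):
    """Initial test case: random attribute values and labels, 20-50 rows."""
    n_rows = rng.randint(min_rows, max_rows)
    return [([round(rng.uniform(0, 100), 2) for _ in range(n_attributes)], rng.choice(labels))
            for _ in range(n_rows)]


def to_arff(rows, labels=("A", "B", "C"), relation="random_test"):
    """Serialize (attributes, label) rows as ARFF text for loading into Weka."""
    n_attributes = len(rows[0][0])
    lines = [f"@RELATION {relation}", ""]
    lines += [f"@ATTRIBUTE a{i} NUMERIC" for i in range(n_attributes)]
    lines.append("@ATTRIBUTE class {" + ",".join(labels) + "}")
    lines += ["", "@DATA"]
    lines += [",".join(str(v) for v in x) + f",{y}" for x, y in rows]
    return "\n".join(lines) + "\n"


def permute_attributes(rows, order):
    """Follow-on test case: the same column permutation applied to every row."""
    return [([x[i] for i in order], y) for x, y in rows]


rng = random.Random(7)
source = random_dataset(rng)
follow_on = permute_attributes(source, [3, 1, 0, 2])
print(to_arff(source).splitlines()[:8])
```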

Results • [Results tables: k-Nearest Neighbors; Naïve Bayes Classifier]

Analysis: kNN • No necessary properties were violated • Issues related to validation: • Labels that are non-existent in the training data have a non-zero chance of being selected in classification • If two labels are equally likely, the “first” one that is listed is chosen
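
The tie-breaking issue can be illustrated with a hypothetical "first listed wins" rule. This sketch is not Weka's code; it only shows why such a rule makes the result depend on the order in which labels are declared, so permuting the labels can change the prediction whenever two classes are equally likely.

```python
def argmax_first(labels, scores):
    """Return the label with the highest score; on a tie, the label declared
    first wins (max() returns the first maximal element it encounters)."""
    return max(labels, key=lambda label: scores[label])


scores = {"A": 0.5, "B": 0.5, "C": 0.0}  # a two-way tie between A and B
print(argmax_first(["A", "B", "C"], scores))  # A
print(argmax_first(["B", "A", "C"], scores))  # B
```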

Analysis: Naïve Bayes • Four necessary properties were violated, indicating defects in the implementation • Loss of precision related to the use of the “double” datatype in Java • Laplace Accuracy is used to determine probabilities; thus, labels that did not appear in the training data have non-zero probability
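
To see how smoothing gives an absent label non-zero probability, here is a minimal categorical naive Bayes sketch. We read "Laplace Accuracy" as add-one (Laplace) smoothing; that reading, the categorical attributes and the `laplace_prior` switch are our assumptions and not Weka's `NaiveBayes` code. With smoothing on the class prior, a declared label that never occurs in training still receives some posterior probability; without it, that label's probability is exactly zero.

```python
from collections import Counter, defaultdict


def train_nbc(rows, labels, laplace_prior=True):
    """Categorical naive Bayes sketch.

    `rows` is a list of (attribute_tuple, label). Conditional probabilities
    always use add-one (Laplace) smoothing; the class prior uses it only
    when `laplace_prior` is True. Returns a function mapping an attribute
    tuple to a normalized posterior over `labels`."""
    n = len(rows)
    class_counts = Counter(y for _, y in rows)
    n_values = [len({x[j] for x, _ in rows}) for j in range(len(rows[0][0]))]
    cond = defaultdict(Counter)  # (label, attribute index) -> value counts
    for x, y in rows:
        for j, v in enumerate(x):
            cond[(y, j)][v] += 1
    extra = 1 if laplace_prior else 0

    def posterior(x):
        joint = {}
        for y in labels:
            p = (class_counts[y] + extra) / (n + extra * len(labels))
            for j, v in enumerate(x):
                p *= (cond[(y, j)][v] + 1) / (class_counts[y] + n_values[j])
            joint[y] = p
        total = sum(joint.values())
        return {y: p / total for y, p in joint.items()}

    return posterior


rows = [(("sunny", "hot"), "no"), (("rainy", "cool"), "yes"), (("sunny", "cool"), "yes")]
labels = ["yes", "no", "maybe"]  # "maybe" never occurs in the training rows
with_prior = train_nbc(rows, labels, laplace_prior=True)
without_prior = train_nbc(rows, labels, laplace_prior=False)
print(with_prior(("sunny", "hot"))["maybe"] > 0)  # True
print(without_prior(("sunny", "hot"))["maybe"])   # 0.0
```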

Suggestions • We suggest using the “BigDecimal” class instead of the “double” datatype • Laplace Accuracy is appropriate for the attributes but not for the labels • Use of Laplace Accuracy should be configurable as an option
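
The precision issue behind the "double" suggestion can be shown with Python's `decimal` module, used here as a stand-in for Java's `BigDecimal` (an analogy of ours, not the authors' code). Binary floating-point addition is not associative, so reordering the same terms, as a permutation relation does, can change the last digits of a result; decimal arithmetic on these decimal inputs is exact. Division can still require rounding in either library, so exactness is not guaranteed for every computation.

```python
from decimal import Decimal

# Binary floating point (like Java's double): addition is not associative.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(left, right, left == right)  # 0.6000000000000001 0.6 False

# Decimal arithmetic on the same decimal literals is exact here.
da, db, dc = Decimal("0.1"), Decimal("0.2"), Decimal("0.3")
print((da + db) + dc == da + (db + dc))  # True
```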

Future Work • Apply the testing approach to other domains that depend on ML, such as scientific computing • Further investigation of testing “non-testable programs” • Measure the effectiveness of the approach in empirical studies

Summary • Metamorphic testing is easy to implement and automate • We were able to devise fault-revealing properties even with just a basic understanding of the ML algorithms • Metamorphic testing can be used for both verification and validation

Related Work • Applying MT to non-testable programs in other domains • General properties for use in MT