Business Intelligence Technologies – Data Mining



  1. Business Intelligence Technologies – Data Mining Lecture 6 Neural Networks

  2. Agenda • Artificial Neural Networks (ANN) • Case Discussion • Model Evaluation • Software Demo • Exercise

  3. The Metaphor [Diagram: input attributes 1 … m feed into a node, which takes the sum of the weighted input values and applies a transfer function to produce the output]

  4. What Neural Nets Do • Neural nets learn complex functions Y = f(X) from data. • The form of the function is not known in advance.

  5. Components of Neural Nets • Neural Nets are composed of • Nodes, and • Arcs • Each arc specifies a weight. • Each node (other than the input nodes) contains a Transfer Function which converts its inputs to outputs. The input to a node is the weighted sum of the inputs from its arcs.

  6. Inside a Node • Each node contains a transfer function which converts the sum of the weighted inputs to an output

  7. A Simple NN Here is a simple neural network with no hidden layers and a threshold-based transfer function: inputs x1, x2, …, xn arrive with weights w1, w2, …, wn, and the output is y = 1 if the weighted sum w1x1 + w2x2 + … + wnxn exceeds the threshold T, and y = 0 otherwise.
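As an illustration, here is a minimal Python sketch of this threshold unit (the input values, weights, and threshold below are arbitrary assumptions, not values from the lecture):

```python
# A single threshold unit: outputs 1 if the weighted sum of inputs exceeds T, else 0.
def threshold_unit(inputs, weights, T):
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum > T else 0

# Illustrative values: weighted sum = 0.4*1.0 + 0.3*0.5 + 0.9*0.2 = 0.73 > 0.5, so y = 1.
print(threshold_unit([1.0, 0.5, 0.2], [0.4, 0.3, 0.9], T=0.5))
```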

  8. Structure of Neural Nets • Nodes are arranged into one input layer, zero or more hidden layers, and one output layer. • In the input layer, there is one input node for each attribute. • The number of hidden nodes and layers is configurable. • In the output layer, there is one output node for each category (class) being predicted, where the output at each node is typically the probability of the item being in that class. For 2-class problems, one output node is sufficient. Neural nets may also predict continuous numeric outputs, in which case there is only one output node. [Diagram: Input Layer → Hidden Layer → Output Layer]

  9. Examples: Different Structures

  10. Feed-Forward Neural Net • Feed the example into the net: the value for each attribute of the example goes into the relevant input node of the net • Multiply the value flowing through each arc (x) by the weight (w) on each arc. • Sum the weighted values flowing into each node • Obtain the output (y) from each node by applying the transfer function (f) to the weighted sum of inputs to the node • Continue to feed the output through the net. • This process is known as feed-forward.
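To make these steps concrete, here is a minimal Python sketch of a feed-forward pass through one hidden layer; the sigmoid transfer function and all numeric values are illustrative assumptions, not specifics from the lecture:

```python
import math

def sigmoid(z):
    # A common transfer function: squashes any weighted sum into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def feed_forward(x, hidden_weights, output_weights):
    # Each hidden node: weighted sum of its inputs, then the transfer function.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in hidden_weights]
    # Output node: weighted sum of hidden outputs, then the transfer function again.
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)))

# Illustrative net: 2 inputs, 2 hidden nodes, 1 output.
print(feed_forward([0.5, 1.0],
                   hidden_weights=[[0.1, 0.8], [0.4, 0.6]],
                   output_weights=[0.3, 0.9]))
```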

  11. Neural Network Training • Training is the process of setting the best weights on the arcs connecting all the nodes in the network • The goal is to use the training set to calculate weights such that the output of the network is as close to the desired output as possible for as many of the training examples as possible • Back propagation has been used since the 1980s to adjust the weights (there are also other methods): • It calculates the error by taking the difference between the calculated result and the actual result • The error is fed back through the network and the weights are adjusted to minimize the error

  12. How a Net Learns Its Weights • Training begins by assigning each arc a small random positive weight. • Feed training examples into the net, one by one. (Each training example is marked with the actual output.) • Compute the output for each example by feeding each attribute value into the relevant input node, multiplying by the appropriate arc weights, summing the weighted inputs, and applying transfer functions to obtain outputs at each level. • Compare the final output of the net to the actual output for each example. • Adjust the weights by a small fraction so as to minimize the error. The error may be the simple squared difference between actual and calculated output, or some other more complex error function. • Continue feeding examples into the net and adjusting weights (usually feeding the training set multiple times). See later for how to decide when to stop. A simplified sketch of this loop follows below.
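Below is a hedged sketch of this loop for a single sigmoid unit, using squared error and a plain gradient-descent weight update; full back propagation applies the same chain-rule update layer by layer, and the learning rate, epoch count, and toy data are illustrative assumptions:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_unit(examples, n_inputs, lr=0.5, epochs=2000):
    # Begin with small random positive weights, as described above.
    # One extra weight acts as a bias, fed by a constant input of 1.
    w = [random.uniform(0.01, 0.1) for _ in range(n_inputs + 1)]
    for _ in range(epochs):                  # feed the training set multiple times
        for x, target in examples:
            xb = list(x) + [1.0]             # append the constant bias input
            y = sigmoid(sum(wi * xi for wi, xi in zip(w, xb)))
            error = y - target               # calculated output minus actual output
            # Gradient of the squared error through the sigmoid: error * y * (1 - y) * x_i
            for i in range(len(w)):
                w[i] -= lr * error * y * (1 - y) * xb[i]
    return w

# Illustrative: learn an OR-like function of two inputs.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
print(train_unit(data, n_inputs=2))
```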

  13. Neural Nets Advantages • Can model highly non-linear and complex spaces accurately • Handle noisy data well • A trained network works just like a math function: computing outputs is quick, so the neural net can easily be embedded into any decision analysis tool • Incremental: can simply add new examples to continue learning on new data

  14. Problems with Nets • Over-training → overfitting. Training should be stopped when the accuracy on the test set starts decreasing markedly. • Comprehensibility / transparency: while you can read mathematical functions from the neural net, comprehensible rules cannot be directly read off a neural net. It is difficult to verify the plausibility of the model produced by the neural net, as its predictions have low explainability. • Input values need to be numeric (because they need to be multiplied by a weight). • Convergence to a solution is not guaranteed, and training time can be high. Typically a net will stop training after a set number of iterations over the training set, after a set time, when the weights start to converge to fixed values, or when the error rate on test data starts increasing.

  15. Overtraining • If you let a neural net run for too long, it can memorize the training data, meaning it doesn’t generalize well to new data. • The usual resolution to this is to continually test the performance of the net against hold-out data (test data) while it is being trained. Training is stopped when the accuracy on the test set starts decreasing markedly.
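The loop below is a hedged pseudocode-style sketch of that idea; train_one_epoch, accuracy_on, get_weights, and set_weights are hypothetical placeholder functions, not a real library API:

```python
def train_with_early_stopping(net, train_data, test_data, max_epochs=500, patience=5):
    # Keep the weights from the best epoch; stop once hold-out accuracy
    # has failed to improve for `patience` consecutive epochs.
    best_acc, best_weights, bad_epochs = 0.0, None, 0
    for epoch in range(max_epochs):
        train_one_epoch(net, train_data)     # hypothetical: one pass of weight updates
        acc = accuracy_on(net, test_data)    # hypothetical: accuracy on hold-out data
        if acc > best_acc:
            best_acc, best_weights, bad_epochs = acc, net.get_weights(), 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break                        # accuracy started decreasing markedly
    net.set_weights(best_weights)            # roll back to the best-performing weights
    return net
```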

  16. Example Applications • Major advantage: learning complex functions • Hedge fund pricing models (because of NNs' ability to fit complex functions, they suit financial applications very well) • ALVINN learnt to keep a car on the road by watching people drive • Speech and face recognition

  17. An Example: Application for Targeting Decisions Problem: "Explaining" customers' purchase decisions by means of explanatory variables, or predictors. Process: • Learning – calibrate a NN model (configuration, weights) using a training sample drawn from a previous campaign • Scoring – apply the resulting network to a new set of observations, and calculate a "NN score" for each customer • Decision – typically, the larger the NN score, the "better" the customer. Thus, one can sort the customers in descending order of their scores and apply a cutoff point to separate targets from non-targets (a sketch of this step follows below).
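A minimal sketch of the scoring-and-cutoff decision step; the 20% cutoff and the toy data are illustrative assumptions:

```python
def select_targets(customers, scores, cutoff_fraction=0.20):
    # Sort customers in descending order of their NN score,
    # then keep only the fraction above the cutoff point as targets.
    ranked = sorted(zip(scores, customers), reverse=True)
    n_targets = int(len(ranked) * cutoff_fraction)
    return [customer for _, customer in ranked[:n_targets]]

# Illustrative: target the top 20% of five scored customers.
print(select_targets(["A", "B", "C", "D", "E"], [0.9, 0.2, 0.7, 0.4, 0.1]))  # ['A']
```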

  18. Then the NN would look like …

  19. Case Discussion • Neural Fair Value • What are the inputs and outputs of the neural network? • How is the neural network trained? How is the trained network used for prediction? • Describe the entire process of the Neural Fair Value model. • Why do neural networks work for stock selection? Would a decision tree, KNN, or traditional regression work? • Can the model be improved by manipulating the time periods used for training? • SOM

  20. Agenda • Artificial Neural Networks (ANN) • Case Discussion • Model Evaluation • Software Demo • Exercise

  21. Actual Vs. Predicted Output

  22. Is measuring accuracy on training data a good performance indicator? • Using the same set of examples for training as well as for evaluation results in an overoptimistic evaluation of model performance. • Need to test performance on data not "seen" by the modeling algorithm, i.e., data that was not used for model building.

  23. Data Partition • Randomly partition data into a training set and a test set. • Training set – data used to train/build the model: estimate parameters (e.g., for a linear regression), build a decision tree, build an artificial neural network, etc. • Test set – a set of examples not used for model induction; the model's performance is evaluated on unseen data. Also referred to as out-of-sample data. • Generalization error: the model's error on the test data.
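A minimal sketch of such a random partition, assuming scikit-learn is available (train_test_split is its standard splitter; the toy data and the 70/30 ratio are illustrative):

```python
from sklearn.model_selection import train_test_split

# X holds the attribute values, y the class labels (toy data for illustration).
X = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]]
y = [0, 1, 0, 1, 0, 1]

# Hold out 30% of the examples as unseen test data; fix the seed for repeatability.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```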

  24. Training, Validation & Test Sets When training multiple model types, out of which one is selected to be used for prediction: • Training Set – build the models • Validation Set – compare the models' performances • Test Set – evaluate the chosen model's error

  25. Model Performance Evaluation: Classification Models Classification models predict what class an instance (example) belongs to, e.g., good vs. bad credit risk (Credit), response vs. no response to a direct marketing campaign, etc. Evaluation measure 1: Classification Accuracy Rate – the proportion of accurate classifications of examples in the test set. E.g., the model predicts the correct class for 70% of test examples.

  26. Classification Accuracy Rate • Classification Accuracy Rate: S/N = the proportion of examples accurately classified by the model • S – number of examples accurately classified by the model • N – total number of examples
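The ratio is straightforward to compute; a short sketch:

```python
def accuracy_rate(actual, predicted):
    # S/N: the fraction of examples whose predicted class matches the actual class.
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

print(accuracy_rate([1, 0, 1, 1], [1, 0, 0, 1]))  # 3 of 4 correct -> 0.75
```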

  27. A Question • Assume a model accurately classifies 90% of instances in the test set • Is it a good model?

  28. Consider the following… • The response rate for a mailing campaign is 1% (99% do not respond, 1% respond) • We build a classification model to predict whether or not a customer will respond. • The model's classification accuracy rate is 99% • How good is our model?

  29. Classification Accuracy Rate • After examining the examples the model misclassified: • The model always predicts that a customer will not respond (always recommends not to mail) • The model misclassifies all respondents • Conclusion: we need to examine the type of errors made by the model, not just the overall proportion of errors.

  30. Confusion Matrix for Classification Accuracy Rate
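The matrix itself did not survive the transcript; as a stand-in, here is a sketch that tallies a 2x2 confusion matrix for the mailing example above (the ten-customer data set is invented for illustration):

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    # Tally (actual, predicted) pairs; for a 2-class problem this gives TP, FN, FP, TN.
    return Counter(zip(actual, predicted))

# Illustrative mailing data: the model predicts "no response" (0) for everyone.
actual    = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]   # one respondent in ten
predicted = [0] * 10
cm = confusion_matrix(actual, predicted)
print(cm[(1, 0)])  # respondents misclassified as non-responders -> 1
print(cm[(0, 0)])  # non-responders correctly classified -> 9
# Accuracy is 9/10 = 90%, yet the model misses every single respondent.
```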

  31. Evaluating Numerical Prediction Assume we build a model to estimate the $ amount spent on the next catalog offer. Actual amounts: 80, 140, 175, 168, 120, 189. Predicted amounts: 83, 131.1, 178, 166, 117, 198.

  32. Evaluating Numerical Prediction: Mean Squared Error (MSE) • a1, a2, …, an – actual amounts spent • p1, p2, …, pn – predicted amounts • Error_i = (p_i − a_i) • Mean Squared Error: MSE = [(p1 − a1)² + (p2 − a2)² + … + (pn − an)²]/n • MSE = [(83 − 80)² + (131.1 − 140)² + (178 − 175)² + (166 − 168)² + (117 − 120)² + (198 − 189)²]/6 ≈ 31.87 • Root Mean Squared Error: RMSE = √MSE ≈ 5.65

  33. Evaluating Numerical Prediction: Mean Absolute Error (MAE) • Does not assign higher weights to large errors; all sizes of error are weighted equally • MAE = (|p1 − a1| + |p2 − a2| + … + |pn − an|)/n • MAE = (|83 − 80| + |131.1 − 140| + |178 − 175| + |166 − 168| + |117 − 120| + |198 − 189|)/6 = (3 + 8.9 + 3 + 2 + 3 + 9)/6 ≈ 4.82
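A short sketch that reproduces the MSE, RMSE, and MAE figures from the two slides above, using the catalog amounts:

```python
import math

actual    = [80, 140, 175, 168, 120, 189]
predicted = [83, 131.1, 178, 166, 117, 198]
n = len(actual)

mse = sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n
mae = sum(abs(p - a) for p, a in zip(predicted, actual)) / n

print(round(mse, 2))             # -> 31.87
print(round(math.sqrt(mse), 2))  # -> 5.65  (RMSE)
print(round(mae, 2))             # -> 4.82  (MAE)
```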

  34. Exercise • Compare multiple classification models (tree, KNN, ANN) • SAS: HMEQ data set • WEKA: bank data set
