Chapter 4: Predictive Modeling


Objectives
• Explain the concepts of predictive modeling.
• Illustrate the modeling essentials of a predictive model.
• Explain the importance of data partitioning.

Catalog Case Study
Analysis Goal: A mail-order catalog retailer wants to save money on mailing and increase revenue by targeting mailed catalogs to customers most likely to purchase in the future.
Data set: CATALOG2010
Number of rows: 48,356
Number of columns: 98
Contents: sales figures summarized across departments and quarterly totals for 5.5 years of sales
Targets: RESPOND (binary), ORDERSIZE (continuous)

Where You’ve Been, Where You’re Going…
• With basic descriptive modeling techniques (RFM), you identified customers who might be profitable.
• Sophisticated predictive modeling techniques can produce risk scores for current customers, profitable prospects from outside the customer database, cross-sell and up-sell lists, and much more.
• Scoring techniques based on predictive models can be implemented in real-time data collection systems, automating the process of fact-based decision making.

Descriptive Modeling Tells You about Now
[Figure: Past Behavior, Fact-Based Reports, Current State of the Customer]
Descriptive statistics inform you about your sample. This information is important for reacting to things that have happened in the past.

From Descriptive to Predictive Modeling
[Figure: Past Behavior, Fact-Based Predictions, Strategy]
Predictive modeling techniques, paired with scoring and good model management, enable you to use your data about the past and the present to make good decisions for the future.

Predictive Modeling Terminology
[Figure: a training data set whose columns are inputs and a target]
The variables are called inputs and targets. The observations in a training data set are known as training cases.

Predictive Model
[Figure: a training data set with inputs and a target]
Predictive model: a concise representation of the input and target association.

Predictive Model
[Figure: inputs passed through the predictive model to produce predictions]
Predictions: output of the predictive model given a set of input measurements.

Modeling Essentials
• Determine type of prediction.
• Select useful inputs.
• Optimize complexity.

Three Prediction Types
[Figure: inputs lead to a prediction, which can be a decision, a ranking, or an estimate]

Decision Predictions
A predictive model uses input measurements to make the best decision for each case.
[Figure: example decisions for five cases: primary, secondary, tertiary, primary, secondary]

Ranking Predictions
A predictive model uses input measurements to optimally rank each case.
[Figure: example ranking scores for five cases: 720, 520, 580, 470, 630]

Estimate Predictions
A predictive model uses input measurements to optimally estimate the target value.
[Figure: example estimates for five cases: 0.65, 0.33, 0.54, 0.28, 0.75]
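The three prediction types can all come from one model. A minimal sketch, assuming a model that already produces estimates: sorting the estimates gives a ranking, and comparing each estimate with a cutoff gives a decision. The estimate values are the ones on the slide; the 0.5 cutoff and the function names `rank_cases` and `decide` are hypothetical choices for illustration.

```python
def rank_cases(estimates):
    """Return case indices ordered from highest to lowest estimate."""
    return sorted(range(len(estimates)), key=lambda i: estimates[i], reverse=True)


def decide(estimates, cutoff=0.5):
    """Return a yes/no decision per case; the 0.5 cutoff is an ASSUMPTION."""
    return [est >= cutoff for est in estimates]


estimates = [0.65, 0.33, 0.54, 0.28, 0.75]  # estimate values shown on the slide
print(rank_cases(estimates))  # [4, 0, 2, 1, 3]
print(decide(estimates))      # [True, False, True, False, True]
```

In practice the cutoff would be chosen from business costs (for example, the cost of mailing a catalog versus the profit from a response) rather than fixed at 0.5.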

Idea Exchange
• Think of two or three business problems that would require each of the three types of prediction.
• What would require a decision? How would you obtain information to help you in making a decision based on a model score?
• What would require a ranking? How would you use this ranking information?
• What would require an estimate? Would you estimate a continuous quantity, a count, a proportion, or some other quantity?

Modeling Essentials – Predict Review
• Determine type of prediction: decide, rank, and estimate.
• Select useful inputs.
• Optimize complexity.


Input Reduction Strategies
• Redundancy
• Irrelevancy
[Figure: panels for inputs x1 and x2 (redundancy) and inputs x3 and x4 (irrelevancy), with a vertical scale from 0.40 to 0.70]

Input Reduction – Redundancy
[Figure: redundancy panel for inputs x1 and x2]
Input x2 carries much the same information as input x1, so keeping both adds little to the model.
Example: x1 is household income and x2 is home value.
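One simple way to screen for redundancy is to measure how strongly pairs of inputs move together. The sketch below, an illustration rather than the course's procedure, computes the Pearson correlation for every pair of inputs and flags pairs whose absolute correlation meets a threshold. The data values, the 0.9 threshold, and the helper names `pearson` and `redundant_pairs` are hypothetical.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def redundant_pairs(inputs, threshold=0.9):
    """Return (input_a, input_b, r) for each pair with |r| >= threshold (ASSUMED cutoff)."""
    flagged = []
    for a, b in combinations(sorted(inputs), 2):
        r = pearson(inputs[a], inputs[b])
        if abs(r) >= threshold:
            flagged.append((a, b, r))
    return flagged


# Hypothetical values for five customers: dollars in thousands, age in years.
inputs = {
    "income": [40, 55, 70, 85, 120],
    "home_value": [150, 210, 260, 330, 450],
    "age": [52, 23, 41, 35, 30],
}
for a, b, r in redundant_pairs(inputs):
    print(f"{a} and {b}: r = {r:.3f}")
```

Correlation only detects linear redundancy; two inputs can share information in nonlinear ways that this screen misses. Deciding which member of a flagged pair to keep is a separate judgment.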

Input Reduction – Irrelevancy
[Figure: irrelevancy panel for inputs x3 and x4]
Predictions change with input x4 but much less with input x3, so x3 is a candidate for removal.
Example: The target is response to a direct mail solicitation, x3 is religious affiliation, and x4 is response to previous solicitations.
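Irrelevancy can be checked in a similar spirit by asking how much the target changes across the values of an input. For a binary target such as a mail response, a minimal sketch computes the response rate at each level of a categorical input and reports the spread between the highest and lowest rate. The eight records and the function name `rate_spread` are hypothetical, chosen so that x3 shows no spread and x4 shows a clear one.

```python
from collections import defaultdict


def rate_spread(levels, responses):
    """Highest minus lowest response rate across the levels of one input."""
    counts = defaultdict(lambda: [0, 0])  # level -> [responders, cases]
    for level, responded in zip(levels, responses):
        counts[level][0] += responded
        counts[level][1] += 1
    rates = [hits / total for hits, total in counts.values()]
    return max(rates) - min(rates)


# Hypothetical mailing history for eight customers (1 = responded).
response = [1, 0, 1, 0, 1, 0, 0, 1]
x3 = ["A", "A", "B", "B", "A", "A", "B", "B"]              # e.g., an affiliation code
x4 = ["yes", "no", "yes", "no", "yes", "no", "yes", "no"]  # responded before?
print(rate_spread(x3, response))  # 0.0
print(rate_spread(x4, response))  # 0.5
```

A one-input screen like this ignores interactions: an input that looks irrelevant on its own can still matter in combination with other inputs.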

Modeling Essentials – Select Review
• Determine type of prediction: decide, rank, and estimate.
• Select useful inputs: eradicate redundancies and irrelevancies.
• Optimize complexity.


Data Partitioning
[Figure: training data and validation data, each with inputs and a target]
Partition available data into training and validation sets. The model is fit on the training data set, and model performance is evaluated on the validation data set.
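A minimal sketch of a simple random partition, assuming rows can be shuffled independently: shuffle once with a fixed seed so the split is reproducible, then cut the list at the training fraction. The 70/30 split, the seed, and the function name `partition` are assumptions for illustration; only the row count comes from CATALOG2010.

```python
import random


def partition(cases, train_fraction=0.7, seed=12345):
    """Randomly split cases into (training, validation) lists.

    A fixed seed makes the split reproducible. The 70/30 default is an
    ASSUMPTION for illustration, not a recommended setting.
    """
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


row_ids = list(range(48356))  # stand-ins for the 48,356 CATALOG2010 rows
train, valid = partition(row_ids)
print(len(train), len(valid))  # 33849 14507
```

For a binary target such as RESPOND, stratifying the split by target value keeps the response rate similar in both sets; this sketch does not stratify.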

Predictive Model Sequence
[Figure: models 1 through 5, fit to the training data, arranged by increasing model complexity]
Create a sequence of models with increasing complexity.

Model Performance Assessment
[Figure: validation assessment of models 1 through 5, plotted against model complexity]
Rate model performance using validation data.

Model Selection
[Figure: validation assessment plotted against model complexity for models 1 through 5]
Select the simplest model with the highest validation assessment.
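The selection rule can be written as a small function. A sketch, assuming each model in the sequence has already been scored on the validation data with a measure where higher is better (for example, accuracy): find the best validation score, then return the least complex model that reaches it. The scores are hypothetical, and the optional `tolerance` parameter, which lets a simpler model win when it is within a small margin of the best, is an added assumption rather than part of the slide.

```python
def select_model(assessments, tolerance=0.0):
    """Return the least complex model whose validation score is within
    `tolerance` of the best. `assessments` maps complexity -> score,
    where a higher score is better."""
    best = max(assessments.values())
    for complexity in sorted(assessments):  # simplest model first
        if assessments[complexity] >= best - tolerance:
            return complexity


# Hypothetical validation assessments for models 1 (simplest) through 5.
validation = {1: 0.61, 2: 0.68, 3: 0.72, 4: 0.72, 5: 0.69}
print(select_model(validation))                  # 3
print(select_model(validation, tolerance=0.05))  # 2
```

Models 3 and 4 tie on validation assessment, so the rule picks model 3, the simpler of the two.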

4.01 Multiple Choice Poll
The best model is the
• simplest model with the best performance on the training data.
• simplest model with the best performance on the validation data.
• most complex model with the best performance on the training data.
• most complex model with the best performance on the validation data.

4.01 Multiple Choice Poll – Correct Answer
The best model is the simplest model with the best performance on the validation data.

Modeling Essentials – Optimize Review
• Determine type of prediction: decide, rank, and estimate.
• Select useful inputs: eradicate redundancies and irrelevancies.
• Optimize complexity: tune models with validation data.


Objectives
• Explain the concept of decision trees.
• Illustrate the modeling essentials of decision trees.
• Construct a decision tree predictive model in SAS Enterprise Miner.

Modeling Essentials – Decision Trees
• Determine type of prediction: prediction rules.
• Select useful inputs: split search.
• Optimize complexity: pruning.

Simple Prediction Illustration
Predict dot color for each x1 and x2.
[Figure: training data plotted as colored dots on the unit square, with x1 and x2 each ranging from 0.0 to 1.0]

Decision Tree Prediction Rules
[Figure: a decision tree with a root node split on x2 (<0.63 versus ≥0.63), two interior nodes split on x1 (<0.52 versus ≥0.52, and <0.51 versus ≥0.51), and four leaf nodes with estimates of 40%, 55%, 60%, and 70%; the same partitions are drawn over the training data on the x1 by x2 unit square]

Decision Tree Prediction Rules
Predict: (a new case, plotted as a point on the x1 by x2 unit square)
[Figure: the decision tree from the previous slide alongside the training data]

Decision Tree Prediction Rules
Predict: (the new case)
Decision = (the dot color shown in the figure), Estimate = 0.70
[Figure: the new case follows the splits on x2 and x1 down to the leaf whose estimate is 70%]
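Prediction with a fitted tree is a walk from the root node to a leaf. The sketch below encodes the tree in the figure with hypothetical `Leaf` and `Split` classes. The split variables and split points (x2 at 0.63, x1 at 0.52 and 0.51) come from the slide, but which leaf holds which estimate is an assumption, because the extracted figure does not preserve that layout. The 0.5 decision cutoff is also assumed.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    estimate: float  # share of training cases in this leaf with the target event


@dataclass
class Split:
    variable: str     # input tested at this node
    threshold: float  # value < threshold goes left; otherwise right
    left: "Node"
    right: "Node"


Node = Union[Leaf, Split]


def predict(node: Node, case: dict) -> float:
    """Follow the prediction rules from the root node down to a leaf."""
    while isinstance(node, Split):
        node = node.left if case[node.variable] < node.threshold else node.right
    return node.estimate


# Split variables and points follow the slide; leaf estimate positions are ASSUMED.
tree = Split("x2", 0.63,
             left=Split("x1", 0.52, Leaf(0.55), Leaf(0.70)),
             right=Split("x1", 0.51, Leaf(0.40), Leaf(0.60)))

case = {"x1": 0.80, "x2": 0.30}
estimate = predict(tree, case)
decision = estimate >= 0.5  # ASSUMED cutoff: predict the event when estimate >= 0.5
print(estimate, decision)  # 0.7 True
```

The estimate is the leaf's training proportion, and the decision is derived from it, which mirrors the slide's "Decision = …, Estimate = 0.70" pairing.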


Decision Tree Split Search
Calculate the logworth of every partition on input x1.
[Figure: training data on the x1 by x2 unit square, with a candidate partition of x1 into left and right and its classification matrix]

Decision Tree Split Search
Select the partition with the maximum logworth.
[Figure: the best x1 partition is at x1 = 0.52, with max logworth(x1) = 0.95; classification matrix cells of 47%, 53%, 42%, and 58%]
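Logworth is the negative base-10 logarithm of a split's p-value, so a larger logworth means stronger evidence that the partition separates the target classes. A minimal sketch for a binary target and one numeric input, assuming an unadjusted Pearson chi-square test on each candidate 2x2 table (branch by target class): try every partition of the form `x1 < t`, compute its logworth, and keep the maximum. The data and the helper names `logworth` and `best_split` are hypothetical. The software used in the course may adjust the p-value (for example, for the number of candidate splits), which this sketch does not do.

```python
import math


def logworth(table):
    """Return -log10(p) for a Pearson chi-square test on a 2x2 table.

    table = [[a, b], [c, d]]: rows are the left/right branches,
    columns are target classes 0 and 1. No p-value adjustment is applied.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # an empty branch or a single target class gives no evidence
    chi2 = n * (a * d - b * c) ** 2 / denom
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return -math.log10(p) if p > 0 else math.inf


def best_split(values, targets):
    """Try every partition 'value < t' of one input; return (max logworth, t)."""
    best = (-math.inf, None)
    for t in sorted(set(values))[1:]:  # the smallest value would leave the left branch empty
        table = [[0, 0], [0, 0]]
        for v, y in zip(values, targets):
            table[0 if v < t else 1][y] += 1
        lw = logworth(table)
        if lw > best[0]:
            best = (lw, t)
    return best


# Hypothetical training cases: one input x1 and a 0/1 target.
x1 = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
y = [0, 0, 0, 1, 1, 1, 1, 0]
lw, t = best_split(x1, y)
print(f"best partition: x1 < {t}, logworth = {lw:.2f}")
```

Running the same search on x2 and comparing the two maxima is the next step on the slides.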

Decision Tree Split Search
Repeat for input x2.
[Figure: the best x1 partition (max logworth(x1) = 0.95) is retained while partitions on x2 are evaluated]

Decision Tree Split Search
[Figure: the best x2 partition, top versus bottom, is at x2 = 0.63, with max logworth(x2) = 4.92 and classification matrix cells of 46%, 54%, 35%, and 65%; the x1 result, max logworth(x1) = 0.95, is shown alongside]

Decision Tree Split Search
Compare partition logworth ratings.
[Figure: max logworth(x1) = 0.95 versus max logworth(x2) = 4.92]

Decision Tree Split Search
Create a partition rule from the best partition across all inputs. Because max logworth(x2) = 4.92 exceeds max logworth(x1) = 0.95, the rule splits on x2 at 0.63.
[Figure: the root rule x2 < 0.63 versus x2 ≥ 0.63, drawn as a horizontal line at x2 = 0.63]

Decision Tree Split Search
Repeat the process in each subset.
[Figure: the root rule on x2 divides the training data into two subsets]

Decision Tree Split Search
[Figure: within one subset from the root split, the best x1 partition is at x1 = 0.52 with max logworth(x1) = 5.72 (classification matrix cells of 39%, 61%, 55%, and 45%), and the best x2 partition, top versus bottom, has max logworth(x2) = -2.01 (cells of 62%, 38%, 55%, and 45%); comparing the two ratings, the x1 partition at 0.52 becomes the rule for this subset]
