Discrete Choice Modeling

William Greene, Stern School of Business
IFS at UCL, February 11-13, 2004
http://cemmap.ifs.org.uk/resources/files/resources_greene_discrete.shtml

Part 3 Modeling Binary Choice

A Model for Binary Choice
• Yes or No decision (Buy/Not buy)
• Example: choose to fly or not to fly to a destination when there are alternatives.
• Model: Net utility of flying
  Ufly = α + β1Cost + β2Time + γIncome + ε
  Choose to fly if net utility is positive
• Data: X = [1, cost, terminal time], Z = [income]
  y = 1 if choose fly (Ufly > 0), 0 if not.

What Can Be Learned from the Data? (A Sample of Consumers, i = 1,…,N)
• Are the attributes “relevant?”
• Predicting behavior
  • Individual
  • Aggregate
• Analyze changes in behavior when attributes change

Application
• 210 Commuters Between Sydney and Melbourne
• Available modes = Air, Train, Bus, Car
• Observed:
  • Choice
  • Attributes: Cost, terminal time, other
  • Characteristics: Household income
• First application: Fly or other

Binary Choice Data

Choose Air   Gen.Cost   Term Time   Income
1.0000       86.000     25.000      70.000
.00000       67.000     69.000      60.000
.00000       77.000     64.000      20.000
.00000       69.000     69.000      15.000
.00000       77.000     64.000      30.000
.00000       71.000     64.000      26.000
.00000       58.000     64.000      35.000
.00000       71.000     69.000      12.000
.00000       100.00     64.000      70.000
1.0000       158.00     30.000      50.000
1.0000       136.00     45.000      40.000
1.0000       103.00     30.000      70.000
.00000       77.000     69.000      10.000
1.0000       197.00     45.000      26.000
.00000       129.00     64.000      50.000
.00000       123.00     64.000      70.000

An Econometric Model
• Choose to fly iff Ufly > 0
• Ufly = α + β1Cost + β2Time + γIncome + ε
• Ufly > 0 ⇔ ε > -(α + β1Cost + β2Time + γIncome)
• Probability model: For any person observed by the analyst,
  Prob(fly) = Prob[ε > -(α + β1Cost + β2Time + γIncome)]
• Note the relationship between the unobserved ε and the outcome
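The step from the inequality to a usable probability can be written out as a short worked equation. This is a sketch under two labeled assumptions that are not on the slide: F denotes the cumulative distribution function of the unobserved term, and w is shorthand introduced here for the index function.

```latex
\begin{aligned}
w &= \alpha + \beta_1\,\mathrm{Cost} + \beta_2\,\mathrm{Time} + \gamma\,\mathrm{Income} \\
\mathrm{Prob}(\mathrm{fly}) &= \mathrm{Prob}[\varepsilon > -w] = 1 - F(-w) \\
1 - F(-w) &= F(w) \quad \text{if the density of } \varepsilon \text{ is symmetric about zero}
\end{aligned}
```

The normal and logistic distributions are symmetric, so for those two the probability reduces to F(w). An asymmetric distribution such as the extreme value does not permit the last step.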

+1Cost + 2TTime + Income

Econometrics
• How to estimate α, β1, β2, γ?
• It’s not regression
• The technique of maximum likelihood
• Prob[y=1] = Prob[ε > -(α + β1Cost + β2Time + γIncome)]
  Prob[y=0] = 1 - Prob[y=1]
• Requires a model for the probability
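To make the maximum likelihood idea concrete, here is a minimal sketch of a binary choice log-likelihood. It assumes a logistic distribution for the unobserved term, uses only the first four rows of the data listing (not the full sample of 210), and evaluates the function at the logit estimates reported in the estimated-models table. The function names are illustrative, not from the slides.

```python
import math

# First four rows of the data listing: (choose_air, gen_cost, term_time, income).
# Assumption: a tiny subset used only to show the mechanics.
rows = [
    (1, 86.0, 25.0, 70.0),
    (0, 67.0, 69.0, 60.0),
    (0, 77.0, 64.0, 20.0),
    (0, 69.0, 69.0, 15.0),
]

def logistic_cdf(w):
    """Logistic CDF, F(w) = 1 / (1 + exp(-w))."""
    return 1.0 / (1.0 + math.exp(-w))

def log_likelihood(params, data):
    """Sum over observations of log Prob[y=1] if y=1, else log Prob[y=0]."""
    alpha, beta_cost, beta_time, gamma = params
    total = 0.0
    for y, cost, time, income in data:
        index = alpha + beta_cost * cost + beta_time * time + gamma * income
        p = logistic_cdf(index)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

# All-zero parameters give Prob = 0.5 for everyone.
ll_zero = log_likelihood((0.0, 0.0, 0.0, 0.0), rows)
# Reported logit estimates: constant, GC, TTME, HINC.
ll_est = log_likelihood((1.78458, 0.0214688, -0.098467, 0.0223234), rows)

print(f"log L at zeros:     {ll_zero:.4f}")
print(f"log L at estimates: {ll_est:.4f}")
```

Maximum likelihood searches for the parameter values that make this sum as large as possible. Because only four rows are used, the values printed here are not comparable to the Log-L figures reported for the full sample.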

Completing the Model: F(ε)
• The distribution
  • Normal: PROBIT, natural for behavior
  • Logistic: LOGIT, allows “thicker tails”
  • Gompertz: EXTREME VALUE, asymmetric, underlies the basic logit model for multiple choice
• Does it matter?
  • Yes, large difference in estimates
  • Not much, quantities of interest are more stable.

Estimated Binary Choice Models

                 LOGIT                  PROBIT                EXTREME VALUE
Variable   Estimate    t-ratio    Estimate     t-ratio    Estimate     t-ratio
Constant   1.78458     1.40591    0.438772     0.702406   1.45189      1.34775
GC         0.0214688   3.15342    0.012563     3.41314    0.0177719    3.14153
TTME       -0.098467   -5.9612    -0.0477826   -6.65089   -0.0868632   -5.91658
HINC       0.0223234   2.16781    0.0144224    2.51264    0.0176815    2.02876
Log-L      -80.9658               -84.0917                -76.5422
Log-L(0)   -123.757               -123.757                -123.757
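The logit and probit coefficients differ by roughly a factor of two, yet the implied probabilities can be close. The sketch below illustrates this for one traveler, using the first row of the data listing (Gen.Cost 86, Term Time 25, Income 70) as an illustrative choice. The extreme value model is left out because the slides do not state which form of its distribution function was used.

```python
import math

def logistic_cdf(w):
    """Logistic CDF used by the logit model."""
    return 1.0 / (1.0 + math.exp(-w))

def normal_cdf(w):
    """Standard normal CDF used by the probit model, via the error function."""
    return 0.5 * (1.0 + math.erf(w / math.sqrt(2.0)))

# Estimates copied from the table: (Constant, GC, TTME, HINC).
logit_coefs = (1.78458, 0.0214688, -0.098467, 0.0223234)
probit_coefs = (0.438772, 0.012563, -0.0477826, 0.0144224)

def index(coefs, gc, ttme, hinc):
    """Linear index: constant + b_gc*GC + b_ttme*TTME + b_hinc*HINC."""
    const, b_gc, b_ttme, b_hinc = coefs
    return const + b_gc * gc + b_ttme * ttme + b_hinc * hinc

# Illustrative traveler: first row of the data listing.
gc, ttme, hinc = 86.0, 25.0, 70.0

p_logit = logistic_cdf(index(logit_coefs, gc, ttme, hinc))
p_probit = normal_cdf(index(probit_coefs, gc, ttme, hinc))

print(f"logit  Prob(fly) = {p_logit:.4f}")
print(f"probit Prob(fly) = {p_probit:.4f}")
```

One traveler is only an illustration; the general point in the slides is that quantities of interest such as predicted probabilities tend to be more stable across distributions than the raw coefficients.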

Effect on predicted probability of an increase in income:
α + β1Cost + β2Time + γ(Income+1) (γ is positive)
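A small numerical sketch of this effect, assuming the logit model and its reported estimates. The traveler (Gen.Cost 129, Term Time 64, Income 50) is taken from the data listing as an illustrative case; the comparison with the derivative formula, income coefficient times P(1 - P), is a standard logit result added here for context.

```python
import math

def logistic_cdf(w):
    """Logistic CDF, F(w) = 1 / (1 + exp(-w))."""
    return 1.0 / (1.0 + math.exp(-w))

# Reported logit estimates: constant, GC, TTME, HINC.
const, b_gc, b_ttme, b_hinc = 1.78458, 0.0214688, -0.098467, 0.0223234

def prob_fly(gc, ttme, hinc):
    """Predicted probability of choosing air under the logit model."""
    return logistic_cdf(const + b_gc * gc + b_ttme * ttme + b_hinc * hinc)

# Illustrative traveler from the data listing.
gc, ttme, hinc = 129.0, 64.0, 50.0

p_base = prob_fly(gc, ttme, hinc)
p_plus = prob_fly(gc, ttme, hinc + 1.0)   # income raised by one unit
change = p_plus - p_base

# Derivative-based approximation for the logit model.
approx = b_hinc * p_base * (1.0 - p_base)

print(f"P(income)      = {p_base:.4f}")
print(f"P(income + 1)  = {p_plus:.4f}")
print(f"change         = {change:.5f}")
print(f"approximation  = {approx:.5f}")
```

Because the income coefficient is positive, the index shifts to the right and the predicted probability rises; the size of the rise depends on where the traveler starts on the curve.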

How Well Does the Model Fit?
• There is no R squared
• “Fit measures” computed from log L
  • “Pseudo R squared” = 1 - logL/logL0
  • Others… - these do not measure fit.
• Direct assessment of the effectiveness of the model at predicting the outcome

Fit Measures for Binary Choice • Likelihood Ratio Index • Bounded by 0 and 1 • Rises when the model is expanded • Cramer (and others)
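As a worked example, the likelihood ratio index can be computed from the Log-L and Log-L(0) values reported in the estimated-models table. The sketch assumes the usual definition, 1 - logL/logL(0); the function name is illustrative.

```python
# Log-likelihood values copied from the estimated-models table.
log_l0 = -123.757
log_l = {
    "logit": -80.9658,
    "probit": -84.0917,
    "extreme value": -76.5422,
}

def likelihood_ratio_index(log_l_model, log_l_null):
    """Likelihood ratio index: 1 - logL / logL(0)."""
    return 1.0 - log_l_model / log_l_null

lri = {name: likelihood_ratio_index(value, log_l0) for name, value in log_l.items()}

for name, value in lri.items():
    print(f"{name:14s} LRI = {value:.4f}")
```

Since both log-likelihoods are negative and the fitted model cannot do worse than the constant-only model, the ratio lies between 0 and 1 and so does the index.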

Predicting the Outcome • Predicted probabilities P = F(a + b1Cost + b2Time + cIncome) • Predicting outcomes • Predict y=1 if P is large • Use 0.5 for “large” (more likely than not) • Count successes and failures

Individual Predictions from a Logit Model

Observation   Observed Y   Predicted Y   Residual   x(i)b     Pr[Y=1]
 81           .00000       .00000         .0000     -3.3944   .0325
 85           .00000       .00000         .0000     -2.1901   .1006
 89           1.0000       .00000        1.0000     -2.6766   .0644
 93           1.0000       1.0000         .0000       .8113   .6924
 97           1.0000       1.0000         .0000      2.6845   .9361
101           1.0000       1.0000         .0000      2.4457   .9202
105           1.0000       .00000        1.0000     -3.2204   .0384
109           1.0000       1.0000         .0000       .0311   .5078
113           .00000       .00000         .0000     -2.1704   .1024
117           .00000       .00000         .0000     -3.3729   .0332
445           .00000       1.0000       -1.0000       .0295   .5074

Note two types of errors and two types of successes.
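The table can be reproduced from the x(i)b column alone. The sketch below assumes the logistic CDF and the 0.5 rule, recomputes Pr[Y=1], the predicted outcome, and the residual for the eleven listed observations, and counts the two kinds of successes and the two kinds of errors. Variable names are illustrative.

```python
import math

# (observation, observed y, x(i)b) copied from the table.
table = [
    (81, 0, -3.3944), (85, 0, -2.1901), (89, 1, -2.6766), (93, 1, 0.8113),
    (97, 1, 2.6845), (101, 1, 2.4457), (105, 1, -3.2204), (109, 1, 0.0311),
    (113, 0, -2.1704), (117, 0, -3.3729), (445, 0, 0.0295),
]

def logistic_cdf(w):
    """Logistic CDF, F(w) = 1 / (1 + exp(-w))."""
    return 1.0 / (1.0 + math.exp(-w))

threshold = 0.5
results = []
counts = {"true 1": 0, "true 0": 0, "missed 1": 0, "false 1": 0}

for obs, y, xb in table:
    p = logistic_cdf(xb)
    y_hat = 1 if p > threshold else 0
    residual = y - y_hat
    results.append((obs, y, y_hat, residual, p))
    if y == 1 and y_hat == 1:
        counts["true 1"] += 1      # success: actual 1 predicted as 1
    elif y == 0 and y_hat == 0:
        counts["true 0"] += 1      # success: actual 0 predicted as 0
    elif y == 1 and y_hat == 0:
        counts["missed 1"] += 1    # error: actual 1 predicted as 0
    else:
        counts["false 1"] += 1     # error: actual 0 predicted as 1

for obs, y, y_hat, residual, p in results:
    print(f"{obs:4d}  y={y}  predicted={y_hat}  residual={residual:2d}  Pr[Y=1]={p:.4f}")
print(counts)
```

Observations 109 and 445 have nearly identical probabilities just above 0.5, yet one is a success and the other an error, which shows how sensitive individual predictions are near the threshold.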

Predictions in Binary Choice Predict y = 1 if P > P* Success depends on the assumed P*

ROC Curve
• Plot % y=1 correctly predicted vs. % y=1 incorrectly predicted
• The 45° line is no fit. Curvature implies fit.
• Area under the curve compares models
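A minimal sketch of how an ROC curve and its area can be computed by sweeping the threshold P*. It uses only the eleven observations listed in the individual-predictions table (observed Y and Pr[Y=1]), so the result illustrates the mechanics and is not the curve for the full sample. Function names are illustrative.

```python
# (observed y, predicted probability) for the eleven listed observations.
pairs = [
    (0, .0325), (0, .1006), (1, .0644), (1, .6924), (1, .9361), (1, .9202),
    (1, .0384), (1, .5078), (0, .1024), (0, .0332), (0, .5074),
]
y_true = [y for y, _ in pairs]
scores = [p for _, p in pairs]

def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) for each threshold."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted as 1
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def area_under_curve(points):
    """Trapezoid rule over consecutive ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

points = roc_points(y_true, scores)
auc = area_under_curve(points)

for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
print(f"AUC = {auc:.3f}")
```

An area of 0.5 corresponds to the 45° line; values closer to 1 indicate that actual 1s tend to receive higher predicted probabilities than actual 0s.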

Aggregate Predictions
Frequencies of actual & predicted outcomes
Predicted outcome has maximum probability.
Threshold value for predicting Y=1 = .5000

          Predicted
------  ----------  +  -----
Actual    0     1   |  Total
------  ----------  +  -----
  0     151     1   |   152
  1      20    38   |    58
------  ----------  +  -----
Total   171    39   |   210

Analyzing Predictions
Frequencies of actual & predicted outcomes
Predicted outcome has maximum probability.
Threshold value for predicting Y=1 is P* = .5000. (This table can be computed with any P*.)

               Predicted
------  ----------------------  +  -----
Actual      0            1      |  Total
------  ----------------------  +  -----
  0     N(a0,p0)     N(a0,p1)   |  N(a0)
  1     N(a1,p0)     N(a1,p1)   |  N(a1)
------  ----------------------  +  -----
Total   N(p0)        N(p1)      |  N

Analyzing Predictions - Success
• Sensitivity = % actual 1s correctly predicted = 100N(a1,p1)/N(a1) % [100(38/58)=65.5%]
• Specificity = % actual 0s correctly predicted = 100N(a0,p0)/N(a0) % [100(151/152)=99.3%]
• Positive predictive value = % predicted 1s that were actual 1s = 100N(a1,p1)/N(p1) % [100(38/39)=97.4%]
• Negative predictive value = % predicted 0s that were actual 0s = 100N(a0,p0)/N(p0) % [100(151/171)=88.3%]
• Correct prediction = % actual 1s and 0s correctly predicted = 100[N(a1,p1)+N(a0,p0)]/N % [100(151+38)/210=90.0%]

Analyzing Predictions - Failures
• False positive for true negative = % actual 0s predicted as 1s = 100N(a0,p1)/N(a0) % [100(1/152)=0.658%]
• False negative for true positive = % actual 1s predicted as 0s = 100N(a1,p0)/N(a1) % [100(20/58)=34.5%]
• False positive for predicted positive = % predicted 1s that were actual 0s = 100N(a0,p1)/N(p1) % [100(1/39)=2.56%]
• False negative for predicted negative = % predicted 0s that were actual 1s = 100N(a1,p0)/N(p0) % [100(20/171)=11.7%]
• False predictions = % actual 1s and 0s incorrectly predicted = 100[N(a0,p1)+N(a1,p0)]/N % [100(1+20)/210=10.0%]
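The success and failure measures all come from the same four cell counts, so they can be computed together. The sketch below uses the counts from the aggregate predictions table at P* = 0.5 (151, 1, 20, 38); the function and key names are illustrative.

```python
def prediction_summary(n_a0p0, n_a0p1, n_a1p0, n_a1p1):
    """Percentage measures from a 2x2 table of actual (a) vs. predicted (p) outcomes."""
    n_a0 = n_a0p0 + n_a0p1      # actual 0s
    n_a1 = n_a1p0 + n_a1p1      # actual 1s
    n_p0 = n_a0p0 + n_a1p0      # predicted 0s
    n_p1 = n_a0p1 + n_a1p1      # predicted 1s
    n = n_a0 + n_a1
    return {
        "sensitivity": 100.0 * n_a1p1 / n_a1,
        "specificity": 100.0 * n_a0p0 / n_a0,
        "positive predictive value": 100.0 * n_a1p1 / n_p1,
        "negative predictive value": 100.0 * n_a0p0 / n_p0,
        "correct prediction": 100.0 * (n_a1p1 + n_a0p0) / n,
        "false positive for true negative": 100.0 * n_a0p1 / n_a0,
        "false negative for true positive": 100.0 * n_a1p0 / n_a1,
        "false positive for predicted positive": 100.0 * n_a0p1 / n_p1,
        "false negative for predicted negative": 100.0 * n_a1p0 / n_p0,
        "false predictions": 100.0 * (n_a0p1 + n_a1p0) / n,
    }

# Cell counts from the aggregate predictions table, threshold P* = 0.5.
summary = prediction_summary(n_a0p0=151, n_a0p1=1, n_a1p0=20, n_a1p1=38)

for name, value in summary.items():
    print(f"{name:40s} {value:6.2f}%")
```

Each failure measure is 100 minus the matching success measure, for example sensitivity (65.5%) and false negative for true positive (34.5%).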

Aggregate Prediction is a Useful Way to Assess the Importance of a Variable
Frequencies of actual & predicted outcomes.
Predicted outcome has maximum probability.
Threshold value for predicting Y=1 = .5000

Model fit without TTME

          Predicted
------  ----------  +  -----
Actual    0     1   |  Total
------  ----------  +  -----
  0     145     7   |   152
  1      48    10   |    58
------  ----------  +  -----
Total   193    17   |   210

Model fit with TTME

          Predicted
------  ----------  +  -----
Actual    0     1   |  Total
------  ----------  +  -----
  0     151     1   |   152
  1      20    38   |    58
------  ----------  +  -----
Total   171    39   |   210
