# Discrete Choice Modeling

##### Presentation Transcript

1. William Greene, Stern School of Business. IFS at UCL, February 11-13, 2004. Discrete Choice Modeling. http://cemmap.ifs.org.uk/resources/files/resources_greene_discrete.shtml

2. Part 3: Modeling Binary Choice

3. A Model for Binary Choice
   - Yes or No decision (Buy/Not buy)
   - Example: choose to fly or not to fly to a destination when there are alternatives.
   - Model: net utility of flying, $U_{fly} = \alpha + \beta_1 \text{Cost} + \beta_2 \text{Time} + \gamma\,\text{Income} + \varepsilon$. Choose to fly if net utility is positive.
   - Data: X = [1, cost, terminal time], Z = [income], y = 1 if choose fly ($U_{fly} > 0$), 0 if not.

4. What Can Be Learned from the Data? (A Sample of Consumers, i = 1,…,N)
   - Are the attributes “relevant”?
   - Predicting behavior
     - Individual
     - Aggregate
   - Analyze changes in behavior when attributes change

5. Application
   - 210 commuters between Sydney and Melbourne
   - Available modes = Air, Train, Bus, Car
   - Observed:
     - Choice
     - Attributes: Cost, terminal time, other
     - Characteristics: Household income
   - First application: Fly or other

6. Binary Choice Data

   | Choose Air | Gen. Cost | Term. Time | Income |
   |---|---|---|---|
   | 1 | 86 | 25 | 70 |
   | 0 | 67 | 69 | 60 |
   | 0 | 77 | 64 | 20 |
   | 0 | 69 | 69 | 15 |
   | 0 | 77 | 64 | 30 |
   | 0 | 71 | 64 | 26 |
   | 0 | 58 | 64 | 35 |
   | 0 | 71 | 69 | 12 |
   | 0 | 100 | 64 | 70 |
   | 1 | 158 | 30 | 50 |
   | 1 | 136 | 45 | 40 |
   | 1 | 103 | 30 | 70 |
   | 0 | 77 | 69 | 10 |
   | 1 | 197 | 45 | 26 |
   | 0 | 129 | 64 | 50 |
   | 0 | 123 | 64 | 70 |

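7. An Econometric Model
   - Choose to fly iff $U_{fly} > 0$
   - $U_{fly} = \alpha + \beta_1 \text{Cost} + \beta_2 \text{Time} + \gamma\,\text{Income} + \varepsilon$
   - $U_{fly} > 0 \iff \varepsilon > -(\alpha + \beta_1 \text{Cost} + \beta_2 \text{Time} + \gamma\,\text{Income})$
   - Probability model: for any person observed by the analyst, $\text{Prob(fly)} = \text{Prob}[\varepsilon > -(\alpha + \beta_1 \text{Cost} + \beta_2 \text{Time} + \gamma\,\text{Income})]$
   - Note the relationship between the unobserved $\varepsilon$ and the outcome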
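
The step from the utility inequality to a probability can be written out as a short worked equation. This is a sketch under two assumptions not stated on this slide: $F$ denotes the CDF of $\varepsilon$, and $x'\beta$ is shorthand for the index $\alpha + \beta_1 \text{Cost} + \beta_2 \text{Time} + \gamma\,\text{Income}$.

```latex
\begin{aligned}
\text{Prob(fly)} &= \text{Prob}[\varepsilon > -x'\beta] \\
                 &= 1 - F(-x'\beta) \\
                 &= F(x'\beta) \quad \text{when the density of } \varepsilon \text{ is symmetric about zero.}
\end{aligned}
```

The last line applies to symmetric distributions such as the normal and the logistic. For an asymmetric distribution only the second line holds in general.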

8. [Figure; axis label: $\alpha + \beta_1 \text{Cost} + \beta_2 \text{Time} + \gamma\,\text{Income}$]

9. Econometrics
   - How to estimate $\alpha, \beta_1, \beta_2, \gamma$?
   - It’s not regression
   - The technique of maximum likelihood
   - $\text{Prob}[y=1] = \text{Prob}[\varepsilon > -(\alpha + \beta_1 \text{Cost} + \beta_2 \text{Time} + \gamma\,\text{Income})]$, $\text{Prob}[y=0] = 1 - \text{Prob}[y=1]$
   - Requires a model for the probability
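
The criterion that maximum likelihood works with can be sketched in a few lines. The code below is a minimal illustration, not the estimator used for the slides: it only evaluates the binary choice log likelihood, the sum of log Prob[y=1] over the 1s and log Prob[y=0] over the 0s, at given coefficients. The function names are hypothetical, the four rows are copied from the Binary Choice Data slide, and the logit coefficients are the ones reported later in the deck.

```python
import math

def logistic_cdf(z):
    """Logistic CDF, the F() behind the logit model."""
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(y, X, coefs, cdf):
    """Sum of log P_i for y_i = 1 and log(1 - P_i) for y_i = 0, with P_i = cdf(x_i'b)."""
    total = 0.0
    for yi, xi in zip(y, X):
        index = sum(b * x for b, x in zip(coefs, xi))
        p = cdf(index)
        total += math.log(p) if yi == 1 else math.log(1.0 - p)
    return total

# Four rows from the Binary Choice Data slide: [1, cost, terminal time, income]
y = [1, 0, 0, 1]
X = [[1, 86, 25, 70], [1, 67, 69, 60], [1, 77, 64, 20], [1, 158, 30, 50]]

# With all coefficients at zero every probability is 0.5
ll_zero = log_likelihood(y, X, [0.0, 0.0, 0.0, 0.0], logistic_cdf)

# Logit estimates reported in the deck: Constant, GC, TTME, HINC
logit_estimates = [1.78458, 0.0214688, -0.098467, 0.0223234]
ll_est = log_likelihood(y, X, logit_estimates, logistic_cdf)

print(ll_zero, ll_est)
```

An optimizer would search over the coefficients for the values that make this sum as large as possible. Swapping in a different `cdf` gives the criterion for a different model.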

10. Completing the Model: $F(\varepsilon)$
    - The distribution
      - Normal: PROBIT, natural for behavior
      - Logistic: LOGIT, allows “thicker tails”
      - Gompertz: EXTREME VALUE, asymmetric, underlies the basic logit model for multiple choice
    - Does it matter?
      - Yes, large difference in estimates
      - Not much, quantities of interest are more stable.

11. Estimated Binary Choice Model

    Binomial Probit Model, Maximum Likelihood Estimates

    | Item | Value |
    |---|---|
    | Model estimated | Jan 20, 2004 at 04:08:11PM |
    | Dependent variable | MODE |
    | Weighting variable | None |
    | Number of observations | 210 |
    | Iterations completed | 6 |
    | Log likelihood function | -84.09172 |
    | Restricted log likelihood | -123.7570 |
    | Chi squared | 79.33066 |
    | Degrees of freedom | 3 |
    | Prob[ChiSqd > value] | .0000000 |
    | Hosmer-Lemeshow chi-squared | 46.96547 |
    | P-value | .00000 with deg.fr. = 8 |

    Index function for probability:

    | Variable | Coefficient | Standard Error | b/St.Er. | P[\|Z\|>z] | Mean of X |
    |---|---|---|---|---|---|
    | Constant | .43877183 | .62467004 | .702 | .4824 | |
    | GC | .01256304 | .00368079 | 3.413 | .0006 | 102.647619 |
    | TTME | -.04778261 | .00718440 | -6.651 | .0000 | 61.0095238 |
    | HINC | .01442242 | .00573994 | 2.513 | .0120 | 34.5476190 |

12. Estimated Binary Choice Models

    | Variable | Logit estimate | Logit t-ratio | Probit estimate | Probit t-ratio | Extreme value estimate | Extreme value t-ratio |
    |---|---|---|---|---|---|---|
    | Constant | 1.78458 | 1.40591 | 0.438772 | 0.702406 | 1.45189 | 1.34775 |
    | GC | 0.0214688 | 3.15342 | 0.012563 | 3.41314 | 0.0177719 | 3.14153 |
    | TTME | -0.098467 | -5.9612 | -0.0477826 | -6.65089 | -0.0868632 | -5.91658 |
    | HINC | 0.0223234 | 2.16781 | 0.0144224 | 2.51264 | 0.0176815 | 2.02876 |
    | Log-L | -80.9658 | | -84.0917 | | -76.5422 | |
    | Log-L(0) | -123.757 | | -123.757 | | -123.757 | |
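
The earlier point that estimates differ a lot across distributions while quantities of interest are more stable can be checked with the numbers above. The sketch below compares the logit and probit coefficients and the predicted probability of flying for one traveler, the first row of the Binary Choice Data slide. The helper names are hypothetical. The extreme value model is left out because the slides do not state its functional form.

```python
import math

def logit_prob(z):
    """Logistic CDF."""
    return 1.0 / (1.0 + math.exp(-z))

def probit_prob(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Estimates from the table above, ordered as Constant, GC, TTME, HINC
LOGIT = [1.78458, 0.0214688, -0.098467, 0.0223234]
PROBIT = [0.438772, 0.012563, -0.0477826, 0.0144224]

def index(coefs, gc, ttme, hinc):
    """Linear index: constant + b_GC*GC + b_TTME*TTME + b_HINC*HINC."""
    return coefs[0] + coefs[1] * gc + coefs[2] * ttme + coefs[3] * hinc

# First row of the Binary Choice Data slide: cost 86, terminal time 25, income 70
traveler = {"gc": 86.0, "ttme": 25.0, "hinc": 70.0}

p_logit = logit_prob(index(LOGIT, **traveler))
p_probit = probit_prob(index(PROBIT, **traveler))

# Slope coefficients differ by a large factor across the two models
slope_ratios = [l / p for l, p in zip(LOGIT[1:], PROBIT[1:])]

print("slope ratios logit/probit:", [round(r, 2) for r in slope_ratios])
print("P(fly) logit: %.3f  probit: %.3f" % (p_logit, p_probit))
```

The slope coefficients differ by well over 50 percent, while the two predicted probabilities for this traveler are within a few percentage points of each other.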

13. Effect on predicted probability of an increase in income: the index becomes $\alpha + \beta_1 \text{Cost} + \beta_2 \text{Time} + \gamma(\text{Income}+1)$ ($\gamma$ is positive)
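
A small numerical sketch of this effect, using the probit coefficients and the "Mean of X" column from the estimation output. Evaluating at the sample means is an assumption made here for illustration; the slide does not say at which point the effect is computed.

```python
import math

def normal_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# Probit estimates and sample means from the estimation output
coefs = {"Constant": 0.43877183, "GC": 0.01256304,
         "TTME": -0.04778261, "HINC": 0.01442242}
means = {"GC": 102.647619, "TTME": 61.0095238, "HINC": 34.5476190}

# Index at the means, then with income raised by one unit
z0 = coefs["Constant"] + sum(coefs[k] * means[k] for k in means)
z1 = z0 + coefs["HINC"]

# Change in predicted probability from the one-unit increase
discrete_change = normal_cdf(z1) - normal_cdf(z0)

# Derivative-based approximation: density at the index times the coefficient
derivative = normal_pdf(z0) * coefs["HINC"]

print("P at means: %.4f" % normal_cdf(z0))
print("discrete change: %.5f  derivative: %.5f" % (discrete_change, derivative))
```

Because the income coefficient is positive, the index shifts right and the predicted probability rises. The size of the rise depends on where along the distribution the index sits, which is why the coefficient alone is not the effect.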

14. How Well Does the Model Fit?
    - There is no R squared
    - “Fit measures” computed from log L
      - “Pseudo R squared” $= 1 - \log L / \log L_0$
      - Others… - these do not measure fit.
    - Direct assessment of the effectiveness of the model at predicting the outcome

15. Fit Measures for Binary Choice
    - Likelihood Ratio Index
      - Bounded by 0 and 1
      - Rises when the model is expanded
    - Cramer (and others)

16. Fit Measures for the Probit Model

    Fit Measures for Binomial Choice Model (Probit model for variable MODE)

    | Quantity | Value |
    |---|---|
    | Proportions | P0 = .723810, P1 = .276190 |
    | N | 210 (N0 = 152, N1 = 58) |
    | LogL | -84.09172 |
    | LogL0 | -123.7570 |
    | Estrella = 1-(L/L0)^(-2L0/n) | .36583 |

    Pseudo R-squared measures:

    | Efron | McFadden | Ben./Lerman | Cramer | Veall/Zim. | Rsqrd_ML |
    |---|---|---|---|---|---|
    | .45620 | .32051 | .75897 | .40834 | .50682 | .31461 |

    Information criteria:

    | Akaike I.C. | Schwarz I.C. |
    |---|---|
    | .83897 | 189.57187 |
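
Several of the reported measures depend only on the two log likelihoods, the sample size, and the number of parameters, so they can be recomputed as a check. In the sketch below the Estrella formula is the one printed in the output. The formulas for Rsqrd_ML, Veall/Zimmermann, and the two information criteria are standard definitions assumed here because they reproduce the printed values; the slides do not state them. Efron, Ben./Lerman, and Cramer need the individual predicted probabilities and are not computed.

```python
import math

n, k = 210, 4                 # observations; parameters (constant, GC, TTME, HINC)
n0, n1 = 152, 58              # counts of y = 0 and y = 1
logl, logl0 = -84.09172, -123.7570

# Restricted log likelihood of a constant-only model, from the sample proportions
logl0_check = n1 * math.log(n1 / n) + n0 * math.log(n0 / n)

# Likelihood ratio statistic for the three slope coefficients
lr = 2.0 * (logl - logl0)

mcfadden = 1.0 - logl / logl0                                # likelihood ratio index
estrella = 1.0 - (logl / logl0) ** (-2.0 * logl0 / n)        # formula from the output
rsq_ml = 1.0 - math.exp(-lr / n)                             # assumed definition
veall_zimmermann = (lr / (lr + n)) * ((n - 2.0 * logl0) / (-2.0 * logl0))  # assumed

# Assumed definitions; note the output scales Akaike by n but not Schwarz
akaike = (-2.0 * logl + 2.0 * k) / n
schwarz = -2.0 * logl + k * math.log(n)

for name, value in [("LogL0 check", logl0_check), ("Chi squared", lr),
                    ("McFadden", mcfadden), ("Estrella", estrella),
                    ("Rsqrd_ML", rsq_ml), ("Veall/Zim.", veall_zimmermann),
                    ("Akaike", akaike), ("Schwarz", schwarz)]:
    print("%-12s %.5f" % (name, value))
```

The chi squared value differs from the printed 79.33066 in the fourth decimal because LogL0 is printed to only four decimals.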

17. Predicting the Outcome
    - Predicted probabilities $P = F(a + b_1 \text{Cost} + b_2 \text{Time} + c\,\text{Income})$
    - Predicting outcomes
      - Predict y=1 if P is large
      - Use 0.5 for “large” (more likely than not)
    - Count successes and failures

18. Individual Predictions from a Logit Model

    | Observation | Observed Y | Predicted Y | Residual | x(i)b | Pr[Y=1] |
    |---|---|---|---|---|---|
    | 81 | 0 | 0 | 0 | -3.3944 | .0325 |
    | 85 | 0 | 0 | 0 | -2.1901 | .1006 |
    | 89 | 1 | 0 | 1 | -2.6766 | .0644 |
    | 93 | 1 | 1 | 0 | .8113 | .6924 |
    | 97 | 1 | 1 | 0 | 2.6845 | .9361 |
    | 101 | 1 | 1 | 0 | 2.4457 | .9202 |
    | 105 | 1 | 0 | 1 | -3.2204 | .0384 |
    | 109 | 1 | 1 | 0 | .0311 | .5078 |
    | 113 | 0 | 0 | 0 | -2.1704 | .1024 |
    | 117 | 0 | 0 | 0 | -3.3729 | .0332 |
    | 445 | 0 | 1 | -1 | .0295 | .5074 |

    Note two types of errors and two types of successes.
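
The columns of this listing can be rebuilt from the index x(i)b alone. The sketch below applies the logistic CDF to each index, predicts 1 when the probability exceeds 0.5, and takes the residual as observed minus predicted. The variable names are hypothetical; the observation numbers, outcomes, and index values are copied from the listing.

```python
import math

def logit_prob(z):
    """Logistic CDF: Pr[Y=1] for a logit model with index z."""
    return 1.0 / (1.0 + math.exp(-z))

# (observation, observed Y, x(i)b) copied from the listing
rows = [(81, 0, -3.3944), (85, 0, -2.1901), (89, 1, -2.6766), (93, 1, 0.8113),
        (97, 1, 2.6845), (101, 1, 2.4457), (105, 1, -3.2204), (109, 1, 0.0311),
        (113, 0, -2.1704), (117, 0, -3.3729), (445, 0, 0.0295)]

results = []
for obs, y, xb in rows:
    p = logit_prob(xb)
    y_hat = 1 if p > 0.5 else 0          # "more likely than not" rule
    results.append((obs, y, y_hat, y - y_hat, p))

for obs, y, y_hat, resid, p in results:
    print("%4d  y=%d  predicted=%d  residual=%2d  Pr[Y=1]=%.4f" % (obs, y, y_hat, resid, p))
```

Observations 109 and 445 have nearly the same probability, just above 0.5, yet one is a success and the other an error, which shows how sensitive the count is to cases near the threshold.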

19. Predictions in Binary Choice
    - Predict y = 1 if P > P*
    - Success depends on the assumed P*

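20. ROC Curve
    - Plot the % of actual 1s correctly predicted (sensitivity) against the % of actual 0s incorrectly predicted as 1 (1 - specificity) as the threshold P* varies
    - The 45° line is no fit. Curvature implies fit.
    - Area under the curve compares models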
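
A minimal sketch of how the curve and its area can be computed. It sweeps the threshold over the observed probabilities, records the false positive rate and true positive rate at each one, and integrates with the trapezoid rule. The function names are hypothetical. The data are the 11 observations from the individual predictions listing, so the resulting area describes only that handful of cases, not the full 210-observation sample.

```python
def roc_points(y, p):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    points = [(0.0, 0.0)]                      # threshold above every probability
    for t in sorted(set(p), reverse=True):     # lower the threshold step by step
        tp = sum(1 for yi, pi in zip(y, p) if pi >= t and yi == 1)
        fp = sum(1 for yi, pi in zip(y, p) if pi >= t and yi == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def area_under_curve(points):
    """Trapezoid rule over consecutive ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Observed Y and Pr[Y=1] for the 11 listed observations
y = [0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0]
p = [.0325, .1006, .0644, .6924, .9361, .9202, .0384, .5078, .1024, .0332, .5074]

points = roc_points(y, p)
auc = area_under_curve(points)
print("AUC for the listed observations: %.3f" % auc)
```

An area of 0.5 corresponds to the 45° line and an area of 1 to perfect separation of the 1s from the 0s.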

21. Aggregate Predictions

    Frequencies of actual and predicted outcomes. Predicted outcome has maximum probability. Threshold value for predicting Y=1 = .5000

    | Actual | Predicted 0 | Predicted 1 | Total |
    |---|---|---|---|
    | 0 | 151 | 1 | 152 |
    | 1 | 20 | 38 | 58 |
    | Total | 171 | 39 | 210 |

22. Analyzing Predictions

    Frequencies of actual and predicted outcomes. Predicted outcome has maximum probability. Threshold value for predicting Y=1 is P* = .5000. (This table can be computed with any P*.)

    | Actual | Predicted 0 | Predicted 1 | Total |
    |---|---|---|---|
    | 0 | N(a0,p0) | N(a0,p1) | N(a0) |
    | 1 | N(a1,p0) | N(a1,p1) | N(a1) |
    | Total | N(p0) | N(p1) | N |

23. Analyzing Predictions - Success
    - Sensitivity = % actual 1s correctly predicted = 100N(a1,p1)/N(a1) % [100(38/58) = 65.5%]
    - Specificity = % actual 0s correctly predicted = 100N(a0,p0)/N(a0) % [100(151/152) = 99.3%]
    - Positive predictive value = % predicted 1s that were actual 1s = 100N(a1,p1)/N(p1) % [100(38/39) = 97.4%]
    - Negative predictive value = % predicted 0s that were actual 0s = 100N(a0,p0)/N(p0) % [100(151/171) = 88.3%]
    - Correct prediction = % actual 1s and 0s correctly predicted = 100[N(a1,p1)+N(a0,p0)]/N % [100(151+38)/210 = 90.0%]

24. Analyzing Predictions - Failures
    - False positive for true negative = % actual 0s predicted as 1s = 100N(a0,p1)/N(a0) % [100(1/152) = 0.658%]
    - False negative for true positive = % actual 1s predicted as 0s = 100N(a1,p0)/N(a1) % [100(20/58) = 34.5%]
    - False positive for predicted positive = % predicted 1s that were actual 0s = 100N(a0,p1)/N(p1) % [100(1/39) = 2.56%]
    - False negative for predicted negative = % predicted 0s that were actual 1s = 100N(a1,p0)/N(p0) % [100(20/171) = 11.7%]
    - False predictions = % actual 1s and 0s incorrectly predicted = 100[N(a0,p1)+N(a1,p0)]/N % [100(1+20)/210 = 10.0%]
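
The success and failure percentages all come from the four cells of the aggregate prediction table, so they can be recomputed together. This is a small sketch with hypothetical variable names, using the counts from the table with threshold 0.5.

```python
# Cell counts N(actual, predicted) from the aggregate prediction table
n_a0_p0, n_a0_p1 = 151, 1     # actual 0s predicted as 0 and as 1
n_a1_p0, n_a1_p1 = 20, 38     # actual 1s predicted as 0 and as 1

n_a0 = n_a0_p0 + n_a0_p1      # actual 0s
n_a1 = n_a1_p0 + n_a1_p1      # actual 1s
n_p0 = n_a0_p0 + n_a1_p0      # predicted 0s
n_p1 = n_a0_p1 + n_a1_p1      # predicted 1s
n = n_a0 + n_a1

measures = {
    "sensitivity": 100.0 * n_a1_p1 / n_a1,
    "specificity": 100.0 * n_a0_p0 / n_a0,
    "positive predictive value": 100.0 * n_a1_p1 / n_p1,
    "negative predictive value": 100.0 * n_a0_p0 / n_p0,
    "correct prediction": 100.0 * (n_a1_p1 + n_a0_p0) / n,
    "false positive for true negative": 100.0 * n_a0_p1 / n_a0,
    "false negative for true positive": 100.0 * n_a1_p0 / n_a1,
    "false positive for predicted positive": 100.0 * n_a0_p1 / n_p1,
    "false negative for predicted negative": 100.0 * n_a1_p0 / n_p0,
    "false predictions": 100.0 * (n_a0_p1 + n_a1_p0) / n,
}

for name, value in measures.items():
    print("%-40s %6.2f%%" % (name, value))
```

Each failure measure is 100 minus the matching success measure, for example sensitivity and the false negative rate for true positives sum to 100.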

25. Aggregate Prediction is a Useful Way to Assess the Importance of a Variable

    Frequencies of actual and predicted outcomes. Predicted outcome has maximum probability. Threshold value for predicting Y=1 = .5000

    Model fit without TTME:

    | Actual | Predicted 0 | Predicted 1 | Total |
    |---|---|---|---|
    | 0 | 145 | 7 | 152 |
    | 1 | 48 | 10 | 58 |
    | Total | 193 | 17 | 210 |

    Model fit with TTME:

    | Actual | Predicted 0 | Predicted 1 | Total |
    |---|---|---|---|
    | 0 | 151 | 1 | 152 |
    | 1 | 20 | 38 | 58 |
    | Total | 171 | 39 | 210 |