Multinomial Logit & Ordered Probit


Multinomial Logit
• Used when the outcome categories cannot be ordered. An example is choice of holiday: (i) beach, (ii) mountain, (iii) culture. Each individual goes on just one holiday.
• We will examine this within the context of insurance data. The exact meaning does not matter; just treat it like the holiday data. For a clue about the data, type:
    describe
    summ *ins*
    label list insure

    use http://www.stata-press.com/data/r11/sysdsn1.dta, clear
There are 3 options: those who prepay, those who are not insured and those who are covered by an indemnity. Create a dummy variable for each site:
    generate site1 = site==1
    generate site2 = site==2
    generate site3 = site==3
Now type:
    mlogit insure age male nonwhite site2 site3

Note that there are two equations: one to explain those who opt for ‘prepaid’ and a second for those who opt for ‘uninsure’.

But there are three choices, so why two equations? If you know the determinants of two of the choices, the third follows by default.
• The omitted choice can also be viewed as the default (base) choice against which the other two are being compared.
• Here the default case is the first, indemnity. Could we change it? YES.
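The reason two equations are enough can be written out. This is the standard multinomial logit form, with outcome 1 (indemnity) as the base; the symbols x and β_j are notation assumed here and do not appear on the slides.

```latex
% Multinomial logit with three outcomes and outcome 1 as the base (\beta_1 = 0)
\begin{align*}
P(y = 1 \mid x) &= \frac{1}{1 + \exp(x\beta_2) + \exp(x\beta_3)} \\
P(y = j \mid x) &= \frac{\exp(x\beta_j)}{1 + \exp(x\beta_2) + \exp(x\beta_3)}, \qquad j = 2, 3
\end{align*}
```

Only β_2 and β_3 are estimated. The three probabilities must sum to one, so the base outcome's probability is fixed once the other two are known.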

    mlogit insure age male nonwhite site2 site3, base(2)
This will change the default case to the second option.

Data also comes from:
    use http://www.stata-press.com/data/r11/sysdsn1.dta
    mlogit insure age male nonwhite

Clear, set memory and load data:
    clear
    set mem 100000
    use "http://staff.bath.ac.uk/hssjrh/oprob.dta"

Describe pers

The variable relates to a person’s situation and how it has changed over the last five years. • Let us look at it. • Type: tab2 pers pers

The most common response was improved, but for over half of the sample this was not the case

Ordered probit
• We use this when we have discrete data and when it is ordered. In this case:
• 1 best (improved)
• 2 next best (stayed about the same)
• 3 worst (got worse)
The ordering is clear.

Change in personal situation
• Assume an underlying, continuous variable relating to changes in the individual’s personal situation.
• If this underlying variable is to the left of μ1 we classify the variable as ‘1’: the individual’s position has improved.
• If it lies between μ1 and μ2 we classify the variable as ‘2’: the individual’s position has stayed the same.
• If it is to the right of μ2 we classify the variable as ‘3’: the individual’s position has got worse.

You might say: surely ‘stay the same’ is one specific value (perhaps 0), with anything to the left of it counting as improved and anything to the right as got worse.
• But it is common to assume a range of values that represent too small a change to count as either ‘improved’ or ‘got worse’. The bounds of this range are μ1 and μ2.
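The classification rule can be written as probabilities. This is the textbook ordered probit form under the usual assumption of a standard normal error; the symbols y*, x, β, ε and Φ are notation assumed here, and only μ1 and μ2 come from the slides.

```latex
% Latent variable: y^* = x\beta + \varepsilon, \quad \varepsilon \sim N(0,1)
% Observed: y = 1 if y^* < \mu_1; \; y = 2 if \mu_1 \le y^* \le \mu_2; \; y = 3 if y^* > \mu_2
\begin{align*}
P(y = 1 \mid x) &= \Phi(\mu_1 - x\beta) \\
P(y = 2 \mid x) &= \Phi(\mu_2 - x\beta) - \Phi(\mu_1 - x\beta) \\
P(y = 3 \mid x) &= 1 - \Phi(\mu_2 - x\beta)
\end{align*}
```

Here Φ is the standard normal distribution function. A larger xβ moves probability away from ‘1’ (improved) and towards ‘3’ (got worse).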

Do the estimation.
• Simply use oprobit rather than regress:
    oprobit persi lgnipc male age agesq rlaw estonia village town selfemp marrd educ2 unemp manual if age<98 & age>17 & persi<4
• This regresses persi on a set of right hand side variables. (Note that we do not have to write the full name of the dependent variable, as it is the only variable in the data set that begins with persi.)

    if age<98 & age>17 & persi<4
This limits the regression to individuals older than 17 and under 98, and also cuts out those who answered ‘don’t know’ (coded 4) for persi.

The results

The summary output shows the number of observations, the log likelihood and the likelihood ratio statistic. The pseudo R2 is exactly that, a pseudo measure of fit, and we may cover it in the lectures later. It is rarely very high in ordered probit.

Remember: the lower the dependent variable (persi...), the better the person has done (1 for improved, 3 for got worse). So a negative coefficient indicates that as that variable increases, the person tends to have been doing better. The self-employed have been doing better, as, perhaps surprisingly, have people in Estonia. Those in countries with a good rule of law have done better, and so have those in richer countries (lgnipc: log gross national income per capita).

Married people and educated people have been doing better but the unemployed and manual workers worse.

Impact of age
• The impact of age is thus 0.0513*AGE - 0.0322*AGE*AGE/100
• The second term is 0.0322*AGE*AGE/100 because this is how age squared was calculated
• So the impact is:
    AGE    IMPACT
    25     1.0812
    40     1.5368
    55     1.8474
    70     2.0132
As people get older the probability of things getting worse increases. WHY?
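The age impacts can be checked with a few lines of Stata. This is a minimal sketch using only the two coefficients quoted above. The ages 25, 40, 55 and 70 are the ones that reproduce the four impact values listed; the turning-point line is an addition that follows from the quadratic form.

```stata
* Age effect on the latent index, using the coefficients quoted above.
* agesq was built as age*age/100, hence the division by 100.
foreach a of numlist 25 40 55 70 {
    display "age `a': " %6.4f (0.0513*`a' - 0.0322*`a'*`a'/100)
}

* Age at which the quadratic peaks: b_age / (2 * b_agesq/100)
display "turning point: " %5.1f (0.0513/(2*0.0322/100))
```

The turning point comes out close to age 80, so over most of the age range in the sample the age effect is still rising, though at a decreasing rate.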

And finally These are the estimates of μ1 and μ2

If for an individual the predicted value from the regression is less than -0.6564, they would be predicted to be categorised as ‘1’: position improved.
• If for an individual the predicted value from the regression is greater than 0.3096, they would be predicted to be categorised as ‘3’: position has got worse.

And if the predicted value lies between these two values, the predicted category is ‘2’: no change.
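The cutpoint rule, and the category probabilities behind it, can be sketched as follows. The two cutpoints are the estimates quoted above. The scalar xb0 is a hypothetical linear prediction chosen purely for illustration, and the scalar names are not from the slides.

```stata
* Cutpoints as reported above; xb0 is a hypothetical linear prediction.
scalar mu1 = -0.6564
scalar mu2 =  0.3096
scalar xb0 = -0.2

* Ordered probit category probabilities; normal() is the standard normal CDF.
scalar p1 = normal(mu1 - xb0)
scalar p2 = normal(mu2 - xb0) - normal(mu1 - xb0)
scalar p3 = 1 - normal(mu2 - xb0)
display "P(improved) = " %5.3f p1 "  P(same) = " %5.3f p2 "  P(worse) = " %5.3f p3

* Category implied by the cutpoint rule described in the text
display "predicted category: " 1 + (xb0 > mu1) + (xb0 > mu2)
```

The rule assigns a single category, while the three probabilities show how close the call is between them.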

Let us calculate some examples. First do the regression and store the coefficient vector as cy:
    oprobit persi lgnipc male age agesq rlaw estonia village town selfemp marrd educ2 unemp manual if age<98 & age>17 & persi<4
    matrix cy = e(b)

The regressors enter cy in the order given in the oprobit command: lgnipc, male, age, agesq, rlaw, estonia, village, town, selfemp, marrd, educ2, unemp, manual.
• cy[1,1] is the coefficient on lgnipc. The average value for this is 3.0.
• cy[1,2] is the coefficient on male. Let us code this as 1, i.e. we are predicting for a man.
• The other characteristics are: 50 years old, country with the highest level of rule of law (5), etc.
• Then calculate:
    scalar py50 = cy[1,1]*3.0 + cy[1,2]*1 + cy[1,3]*50 + cy[1,4]*50*50/100 + cy[1,5]*5 + cy[1,6]*0 + cy[1,7]*1 + cy[1,8]*0 + cy[1,9]*0 + cy[1,10]*1 + cy[1,11]*4 + cy[1,12]*0 + cy[1,13]*0
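The same linear prediction can be obtained with a matrix product instead of a long sum, which is less error-prone when trying several sets of characteristics. This is a sketch that assumes the oprobit command has just been run and cy = e(b) holds the 13 slope coefficients followed by the two cutpoints. The matrix names x50, b13 and xb50 are illustrative.

```stata
* Covariate values in the same order as the regressors
* (agesq was built as age*age/100, so 50*50/100 = 25).
matrix x50 = (3.0, 1, 50, 25, 5, 0, 1, 0, 0, 1, 4, 0, 0)

* Keep the 13 slope coefficients; the cutpoints sit at the end of e(b).
matrix b13 = cy[1, 1..13]

* 1x13 times 13x1 gives the 1x1 linear prediction.
matrix xb50 = x50 * b13'
display "linear prediction at age 50: " xb50[1,1]
```

Changing the third and fourth entries to 30 and 9 gives the prediction for the same person aged 30.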

This lies between -0.6564 and 0.3096, the two critical values, and hence this person would be predicted to be ‘no change’.
• Now let us try the same person, but aged 30:
    scalar py30 = cy[1,1]*3.0 + cy[1,2]*1 + cy[1,3]*30 + cy[1,4]*30*30/100 + cy[1,5]*5 + cy[1,6]*0 + cy[1,7]*1 + cy[1,8]*0 + cy[1,9]*0 + cy[1,10]*1 + cy[1,11]*4 + cy[1,12]*0 + cy[1,13]*0

This is less than the lower critical value of -0.6564 hence this person would be predicted to have improved.

No one has ever analysed this before and there may be a paper in it.
• That people’s situation gets worse as they age is not surprising once they reach, say, 50. But these results suggest it is so for those aged 30 vis-à-vis 20, just as much as for those aged 60 vis-à-vis 50.
• Perhaps we should try a spline on this, just to check that the quadratic form on age is not misleading.
• And why do educated people fare better?
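One way to carry out the spline check is a piecewise-linear spline in age. This is a sketch only: the knots at 30 and 50 and the new variable names age_lo, age_mid and age_hi are illustrative choices, and the regressor names are read from the oprobit command used earlier.

```stata
* Linear spline in age with knots at 30 and 50 (knot positions are illustrative).
mkspline age_lo 30 age_mid 50 age_hi = age

* Replace age and agesq with the three spline segments (/// needs a do-file).
oprobit persi lgnipc male age_lo age_mid age_hi rlaw estonia village town ///
    selfemp marrd educ2 unemp manual if age<98 & age>17 & persi<4

* Test whether the age slope is the same in all three segments.
test (age_lo = age_mid) (age_mid = age_hi)
```

Each spline coefficient is the slope of the latent index within its age segment, so the 20 to 30 and 50 to 60 ranges can be compared directly rather than through the quadratic.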

Ordered Logit ‘by hand’
    program myologit
        args lnf xb a1 a2
        * outcome 1: below the first cutpoint
        quietly replace `lnf' = ln(1/(1+exp(-`a1' + `xb'))) if $ML_y1 == 1
        * outcome 2: between the two cutpoints
        quietly replace `lnf' = ln(1/(1+exp(-`a2' + `xb')) - 1/(1+exp(-`a1' + `xb'))) if $ML_y1 == 2
        * outcome 3: above the second cutpoint
        quietly replace `lnf' = ln(1 - 1/(1+exp(-`a2' + `xb'))) if $ML_y1 == 3
    end

    * specify the method (lf) and the name of your evaluator (myologit),
    * followed by the equation(s) in parentheses and then the cutpoints.
    ml model lf myologit (xb: insure = age male nonwhite) /a1 /a2
    ml check
    ml search
    ml maximize, iterate(50)
    ologit insure age male nonwhite
    oprobit insure age male nonwhite

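This does not converge and gives no second cut-off point. A likely reason is that the xb equation includes a constant, which cannot be identified separately from the two cutpoints a1 and a2; adding the nocons option to the xb equation is the usual remedy. The coefficients per se are nevertheless the same as if we use the ologit command: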

    ologit insure age male nonwhite
See also: http://www.ats.ucla.edu/stat/stata/code/ml_maximize.htm
