Understanding Probit and Logit Models for Dichotomous Data Analysis

Section 3 Probit and Logit Models

Dichotomous Data
• Suppose the data are discrete, with only two possible outcomes
• Examples
  • Graduate from high school or not
  • Patient dies or not
  • Working or not
  • Smoker or not
• In the data, yi = 1 if yes and yi = 0 if no

How to model the data generating process?
• There are only two outcomes
• Research question: what factors affect whether the event occurs?
• To answer it, we model the probability that the outcome occurs
  • Pr(Yi=1) when yi=1, or
  • Pr(Yi=0) = 1 - Pr(Yi=1) when yi=0

Think of the problem from an MLE perspective
• Likelihood for the i'th observation:
  $L_i = \Pr(Y_i=1)^{y_i}\,[1 - \Pr(Y_i=1)]^{1-y_i}$
• When yi=1, the only relevant part is Pr(Yi=1)
• When yi=0, the only relevant part is [1 - Pr(Yi=1)]

• Log likelihood: $\ln L = \sum_i \ln L_i = \sum_i \{ y_i \ln[\Pr(y_i=1)] + (1-y_i)\ln[\Pr(y_i=0)] \}$
• Notice that up to this point the model is generic. The log likelihood function will be determined by the assumptions about how we model Pr(yi=1)
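To make the "generic" point concrete, here is a minimal sketch of the binary-outcome log likelihood that takes the model probabilities Pr(yi=1) as given input, so any model (logit, probit, or something else) can be plugged in later. The function name `binary_log_likelihood` and the three example observations are hypothetical.

```python
import math

def binary_log_likelihood(y, p):
    """ln L = sum_i { y_i ln p_i + (1 - y_i) ln(1 - p_i) }, with p_i = Pr(y_i = 1).

    The probabilities p_i can come from any model; nothing here assumes logit or probit.
    """
    total = 0.0
    for yi, pi in zip(y, p):
        if yi == 1:
            total += math.log(pi)          # only Pr(y_i = 1) matters when y_i = 1
        elif yi == 0:
            total += math.log(1.0 - pi)    # only 1 - Pr(y_i = 1) matters when y_i = 0
        else:
            raise ValueError("outcomes must be 0 or 1")
    return total

# Hypothetical example: three observations and their model probabilities
y = [1, 0, 1]
p = [0.8, 0.3, 0.6]
print(binary_log_likelihood(y, p))
```

Maximum likelihood then amounts to choosing the parameters that generate the p_i so this sum is as large as possible.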

Modeling the probability
• There is some process (biological, social, decision theoretic, etc.) that determines the outcome y
• Some of the variables affecting it are observed, some are not
• This requires that we model how these factors affect the probabilities
• Model from a 'latent variable' perspective

Consider a woman's decision to work
• yi* = the person's net benefit from working
• Two components of yi*
  • Characteristics that we can measure
    • Education, age, income of spouse, price of child care
  • Some we cannot measure
    • How much you like spending time with your kids
    • How much you like/hate your job

We aggregate these two components into one equation
• $y_i^* = \beta_0 + x_{1i}\beta_1 + x_{2i}\beta_2 + \dots + x_{ki}\beta_k + \varepsilon_i = x_i\beta + \varepsilon_i$
• xiβ: measurable characteristics, but with uncertain weights
• εi: random unmeasured characteristics
• Decision rule: the person will work if yi* > 0 (if net benefits are positive)
  • yi = 1 if yi* > 0
  • yi = 0 if yi* ≤ 0

• yi = 1 if yi* > 0
  • $y_i^* = x_i\beta + \varepsilon_i > 0$ if and only if $\varepsilon_i > -x_i\beta$
• yi = 0 if yi* ≤ 0
  • $y_i^* = x_i\beta + \varepsilon_i \le 0$ if and only if $\varepsilon_i \le -x_i\beta$
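The latent variable story can be simulated directly. This is a minimal sketch under hypothetical assumptions: one observed covariate, made-up weights β0 = -0.5 and β1 = 1.0, and a standard logistic error (the assumption the logit uses). It checks that "work if yi* > 0" is the same rule as "εi > -xiβ", and that the share who work matches the average of 1 - F(-xiβ).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical design: a constant and one observed covariate x1
x1 = rng.normal(size=n)
beta0, beta1 = -0.5, 1.0              # hypothetical "true" weights
xb = beta0 + beta1 * x1               # measurable part, x_i * beta
eps = rng.logistic(size=n)            # unmeasured part, assumed standard logistic
y_star = xb + eps                     # latent net benefit of working
y = (y_star > 0).astype(int)          # decision rule: work iff y* > 0

# The same rule written in terms of the error term: y = 1 iff eps > -xb
same_rule = np.array_equal(y, (eps > -xb).astype(int))

# Pr(y=1 | x) = 1 - F(-xb) = F(xb); its average should match the observed share
model_share = (1.0 / (1.0 + np.exp(-xb))).mean()
print(same_rule, round(y.mean(), 3), round(model_share, 3))
```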

Suppose xiβ is 'big'
• High wages
• Low husband's income
• Low cost of child care
• We would expect this person to work, unless there is some unmeasured 'variable' that counteracts this

Suppose a mom really likes spending time with her kids, or she hates her job
• Then the unmeasured net benefit of working, εi, is large and negative
• If we observe her working, εi must not have been too negative, since
  • yi = 1 if εi > -xiβ

Consider the opposite: suppose we observe someone NOT working
• Then εi must not have been large, since
  • yi = 0 if εi ≤ -xiβ

Logit
• Recall yi = 1 if εi > -xiβ
• If εi has a logistic distribution with CDF F, then
  • $\Pr(\varepsilon_i > -x_i\beta) = 1 - F(-x_i\beta)$
• The logistic is also a symmetric distribution, so
  • $1 - F(-x_i\beta) = F(x_i\beta) = \frac{\exp(x_i\beta)}{1 + \exp(x_i\beta)}$

When εi has a logistic distribution
• $\Pr(y_i = 1) = \frac{\exp(x_i\beta)}{1 + \exp(x_i\beta)}$
• $\Pr(y_i = 0) = \frac{1}{1 + \exp(x_i\beta)}$
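A minimal sketch of the two logit probabilities and the symmetry step 1 - F(-xiβ) = F(xiβ). The index value 0.7 is hypothetical; the CDF is written in a numerically stable form so that large |xiβ| does not overflow `math.exp`.

```python
import math

def logistic_cdf(z):
    """Standard logistic CDF F(z) = exp(z) / (1 + exp(z)), computed without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def logit_probs(xb):
    """Return (Pr(y=1), Pr(y=0)) for a logit with index x_i * beta = xb."""
    p1 = logistic_cdf(xb)    # exp(xb) / (1 + exp(xb))
    p0 = 1.0 - p1            # 1 / (1 + exp(xb))
    return p1, p0

xb = 0.7                     # hypothetical value of x_i * b
p1, p0 = logit_probs(xb)
# Symmetry: 1 - F(-xb) should equal F(xb)
print(p1, 1.0 - logistic_cdf(-xb), p0)
```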

Example: Workplace smoking bans
• Smoking supplements to the 1991 and 1993 National Health Interview Surveys
• All respondents were asked whether they currently smoke
• Workers were asked about workplace tobacco policies
• Sample: workers
• Key variables: current smoking and whether the worker faced a workplace ban

• Data: workplace1.dta
• Sample program: workplace1.doc
• Results: workplace1.log

Description of variables in data

  . desc;
                storage  display    value
  variable name   type   format     label      variable label
  ------------------------------------------------------------------------
  smoker          byte   %9.0g                 is current smoking
  worka           byte   %9.0g                 has workplace smoking bans
  age             byte   %9.0g                 age in years
  male            byte   %9.0g                 male
  black           byte   %9.0g                 black
  hispanic        byte   %9.0g                 hispanic
  incomel         float  %9.0g                 log income
  hsgrad          byte   %9.0g                 is hs graduate
  somecol         byte   %9.0g                 has some college
  college         float  %9.0g
  ------------------------------------------------------------------------

Summary statistics

  . sum;

      Variable |       Obs        Mean    Std. Dev.       Min        Max
  -------------+--------------------------------------------------------
        smoker |     16258      .25163     .433963          0          1
         worka |     16258    .6851396    .4644745          0          1
           age |     16258    38.54742    11.96189         18         87
          male |     16258    .3947595     .488814          0          1
         black |     16258    .1119449    .3153083          0          1
  -------------+--------------------------------------------------------
      hispanic |     16258    .0607086    .2388023          0          1
       incomel |     16258    10.42097    .7624525   6.214608   11.22524
        hsgrad |     16258    .3355271    .4721889          0          1
       somecol |     16258    .2685447    .4432161          0          1
       college |     16258    .3293763    .4700012          0          1

Running a probit

  probit smoker age incomel male black hispanic hsgrad somecol college worka;

• The first variable after 'probit' is the discrete outcome; the remaining variables are the independent variables
• A constant is included by default

Running a logit

  logit smoker age incomel male black hispanic hsgrad somecol college worka;

• Same as probit; just change the first word

Running a linear probability model

  reg smoker age incomel male black hispanic hsgrad somecol college worka, robust;

• Simple regression (OLS)
• The default standard errors are incorrect, because the errors of a linear probability model are heteroskedastic
• The robust option produces standard errors that are valid under an arbitrary form of heteroskedasticity
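Why heteroskedasticity is built in: with a 0/1 outcome and Pr(y=1|x) = p = xβ, the error variance is p(1 - p), which changes with x. The sketch below fits OLS on simulated data and computes a heteroskedasticity-robust sandwich variance. It uses the HC1 variant (scaled by n/(n - k)), which is the form commonly associated with Stata's robust option; treat that correspondence, the function name `ols_robust`, and the simulated design as assumptions of this sketch.

```python
import numpy as np

def ols_robust(X, y):
    """OLS coefficients with HC1 robust and conventional standard errors.

    X must already contain a column of ones for the constant.
    """
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ (X.T @ y)
    e = y - X @ b                                        # residuals
    meat = (X * (e ** 2)[:, None]).T @ X                 # sum_i e_i^2 x_i x_i'
    V_robust = XtX_inv @ meat @ XtX_inv * (n / (n - k))  # HC1 sandwich
    V_conv = XtX_inv * (e @ e) / (n - k)                 # assumes constant variance
    return b, np.sqrt(np.diag(V_robust)), np.sqrt(np.diag(V_conv))

# Hypothetical data where Pr(y=1|x) = 0.1 + 0.6x is linear in x
rng = np.random.default_rng(2)
n = 5_000
x = rng.random(n)
p = 0.1 + 0.6 * x
y = (rng.random(n) < p).astype(float)   # Var(y|x) = p(1-p) varies with x
X = np.column_stack([np.ones(n), x])

b, se_robust, se_conv = ols_robust(X, y)
print(np.round(b, 3), np.round(se_robust, 4), np.round(se_conv, 4))
```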

Probit Results

  Probit estimates                                  Number of obs   =      16258
                                                    LR chi2(9)      =     819.44
                                                    Prob > chi2     =     0.0000
  Log likelihood = -8761.7208                       Pseudo R2       =     0.0447
  ------------------------------------------------------------------------------
        smoker |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
  -------------+----------------------------------------------------------------
           age |  -.0012684   .0009316    -1.36   0.173    -.0030943    .0005574
       incomel |   -.092812   .0151496    -6.13   0.000    -.1225047   -.0631193
          male |   .0533213   .0229297     2.33   0.020     .0083799    .0982627
         black |  -.1060518    .034918    -3.04   0.002      -.17449   -.0376137
      hispanic |  -.2281468   .0475128    -4.80   0.000    -.3212701   -.1350235
        hsgrad |  -.1748765   .0436392    -4.01   0.000    -.2604078   -.0893453
       somecol |   -.363869   .0451757    -8.05   0.000    -.4524118   -.2753262
       college |  -.7689528   .0466418   -16.49   0.000     -.860369   -.6775366
         worka |  -.2093287   .0231425    -9.05   0.000    -.2546873   -.1639702
         _cons |    .870543    .154056     5.65   0.000     .5685989    1.172487
  ------------------------------------------------------------------------------
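For intuition about what the `probit` command is maximizing, here is a minimal probit MLE written from scratch with NumPy. It is a sketch, not Stata's algorithm: it uses Fisher scoring (the expected information matrix), so its standard errors may differ slightly in finite samples from software that uses the observed Hessian. The data are simulated with hypothetical coefficients; the function names are illustrative.

```python
import math
import numpy as np

_erfc = np.vectorize(math.erfc)   # elementwise erfc using only the standard library

def norm_cdf(z):
    """Standard normal CDF, Phi(z) = 0.5 * erfc(-z / sqrt(2))."""
    return 0.5 * _erfc(-np.asarray(z) / math.sqrt(2.0))

def norm_pdf(z):
    """Standard normal density phi(z)."""
    return np.exp(-0.5 * np.asarray(z) ** 2) / math.sqrt(2.0 * math.pi)

def probit_pieces(X, y, b):
    """Return Phi(Xb), the score vector, and the expected information at b."""
    xb = X @ b
    F = np.clip(norm_cdf(xb), 1e-12, 1 - 1e-12)
    f = norm_pdf(xb)
    w = f / (F * (1.0 - F))
    score = X.T @ (w * (y - F))                # gradient of ln L
    info = (X * (w * f)[:, None]).T @ X        # sum_i phi^2 / (Phi(1-Phi)) x_i x_i'
    return F, score, info

def probit_mle(X, y, tol=1e-10, max_iter=100):
    """Fit Pr(y=1) = Phi(Xb) by Fisher scoring; X must include a column of ones."""
    b = np.zeros(X.shape[1])
    for _ in range(max_iter):
        _, score, info = probit_pieces(X, y, b)
        step = np.linalg.solve(info, score)
        b = b + step
        if np.max(np.abs(step)) < tol:
            break
    F, _, info = probit_pieces(X, y, b)        # re-evaluate at the final estimate
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    loglik = float(np.sum(y * np.log(F) + (1 - y) * np.log(1.0 - F)))
    return b, se, loglik

# Simulated (hypothetical) data: constant, one continuous and one dummy covariate
rng = np.random.default_rng(1)
n = 20_000
x1 = rng.normal(size=n)
d = (rng.random(n) < 0.4).astype(float)
X = np.column_stack([np.ones(n), x1, d])
true_b = np.array([-0.5, 0.8, -0.3])
y = (X @ true_b + rng.normal(size=n) > 0).astype(float)

b, se, ll = probit_mle(X, y)
print(np.round(b, 3), np.round(se, 3), round(ll, 1))
```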

How to measure fit?
• Regression (OLS)
  • Minimize the sum of squared errors
  • Or, equivalently, maximize R2
  • The model is designed to maximize predictive capacity
• Not the case with probit/logit
  • MLE models pick distribution parameters so as to best describe the data generating process
  • They may or may not 'predict' the outcome well

Pseudo R2
• LLk: log likelihood with all variables
• LL1: log likelihood with only a constant
• 0 > LLk > LL1, so |LLk| < |LL1|
• Pseudo R2 = 1 - LLk/LL1 = 1 - |LLk|/|LL1|
• Bounded between 0 and 1
• It is not interpreted like the R2 from a regression
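The reported Pseudo R2 can be reproduced from the probit output itself. The constant-only log likelihood LL1 is not printed, but the likelihood-ratio statistic is LR = 2(LLk - LL1), so LL1 = LLk - LR/2. This sketch uses the numbers from the probit results (LLk = -8761.7208, LR chi2(9) = 819.44); the formula is McFadden's pseudo R2.

```python
def mcfadden_pseudo_r2(ll_k, ll_1):
    """Pseudo R2 = 1 - LL_k / LL_1, where both log likelihoods are negative."""
    return 1.0 - ll_k / ll_1

ll_k = -8761.7208            # log likelihood with all variables (probit output)
lr_chi2 = 819.44             # LR chi2(9) from the same output
ll_1 = ll_k - lr_chi2 / 2.0  # LR = 2 (LL_k - LL_1)  =>  LL_1 = LL_k - LR / 2

print(round(ll_1, 4), round(mcfadden_pseudo_r2(ll_k, ll_1), 4))
```

This recovers LL1 of about -9171.44 and a pseudo R2 of 0.0447, matching the output.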

Predicting Y
• Let b be the estimated value of β
• For any candidate vector xi, we can predict the probability Pi
  • Pi = Φ(xi b)
• Once you have Pi, pick a threshold value T, and predict
  • Yp = 1 if Pi > T
  • Yp = 0 if Pi ≤ T
• Then compare Yp with the actual outcome and compute the fraction correctly predicted
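The threshold rule and the fraction correctly predicted take only a few lines. In this sketch the six outcomes and predicted probabilities are hypothetical; in practice the Pi would come from Φ(xi b).

```python
def classify(probs, threshold=0.5):
    """Y_p = 1 if P_i > T, else 0."""
    return [1 if p > threshold else 0 for p in probs]

def fraction_correct(y, probs, threshold=0.5):
    """Share of observations where the predicted Y_p equals the actual y."""
    yp = classify(probs, threshold)
    return sum(int(a == b) for a, b in zip(y, yp)) / len(y)

# Hypothetical outcomes and predicted probabilities
y = [0, 0, 1, 0, 1, 0]
probs = [0.10, 0.30, 0.45, 0.20, 0.60, 0.55]

print(fraction_correct(y, probs, 0.5), fraction_correct(y, probs, 0.4))
```

Changing T changes which observations are called 1, so the fraction correct depends on the threshold as well as on the model.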

Question: what value to pick for T?
• Can pick .5
  • Intuitive: more likely to engage in the activity than not to engage in it
  • However, when ȳ (the sample mean of y) is small, this criterion does a poor job of predicting yi = 1
  • Likewise, when ȳ is close to 1, this criterion does a poor job of predicting yi = 0

  * predict probability of smoking;
  predict pred_prob_smoke;
  * get detailed descriptive data about predicted prob;
  sum pred_prob, detail;
  * predict binary outcome with 50% cutoff;
  gen pred_smoke1 = pred_prob_smoke >= .5;
  label variable pred_smoke1 "predicted smoking, 50% cutoff";
  * compare actual values;
  tab smoker pred_smoke1, row col cell;

  . sum pred_prob, detail;

                              Pr(smoker)
  -------------------------------------------------------------
        Percentiles      Smallest
   1%     .0959301       .0615221
   5%     .1155022       .0622963
  10%     .1237434       .0633929       Obs               16258
  25%     .1620851       .0733495       Sum of Wgt.       16258

  50%     .2569962                      Mean           .2516653
                          Largest       Std. Dev.      .0960007
  75%     .3187975       .5619798
  90%     .3795704       .5655878       Variance       .0092161
  95%     .4039573       .5684112       Skewness       .1520254
  99%     .4672697       .6203823       Kurtosis       2.149247

Notice two things
• The sample mean of the predicted probabilities (.2517) is close to the sample mean of the outcome (.2516)
• More than 99% of the predicted probabilities are less than .5
  • So we should predict few smokers if we use a 50% cutoff

Each cell lists the frequency, then the row percentage, the column percentage, and the cell percentage (share of all observations).

             | predicted smoking,
  is current |     50% cutoff
     smoking |         0          1 |     Total
  -----------+----------------------+----------
           0 |    12,153         14 |    12,167
             |     99.88       0.12 |    100.00
             |     74.93      35.90 |     74.84
             |     74.75       0.09 |     74.84
  -----------+----------------------+----------
           1 |     4,066         25 |     4,091
             |     99.39       0.61 |    100.00
             |     25.07      64.10 |     25.16
             |     25.01       0.15 |     25.16
  -----------+----------------------+----------
       Total |    16,219         39 |    16,258
             |     99.76       0.24 |    100.00
             |    100.00     100.00 |    100.00
             |     99.76       0.24 |    100.00

Check the on-diagonal elements
• The last number in each cell of the 2x2 table is the percentage of all observations in that cell
• The model correctly predicts 74.75 + 0.15 = 74.90% of the observations
• But it correctly predicts only 25 of the 4,091 smokers (0.61%)

Do not be amazed by the 75 percent correct prediction
• If you said everyone has a ȳ chance of smoking (the case of no covariates), you would be correct a fraction max(ȳ, 1 - ȳ) of the time

In this case, 25.16% smoke
• If everyone had the same chance of smoking, we would assign everyone Pr(y=1) = .2516
• With a 50% cutoff we would then predict that no one smokes, and we would be correct for the 1 - .2516 = 74.84% of people who do not smoke
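The comparison between the model and the no-covariate benchmark is simple arithmetic on the cross-tabulation above. This sketch copies the four cell counts from that table and computes both hit rates.

```python
# Cell counts from the cross-tabulation (rows: actual smoking, columns: predicted)
n00, n01 = 12_153, 14    # non-smokers predicted 0 / predicted 1
n10, n11 = 4_066, 25     # smokers predicted 0 / predicted 1
n = n00 + n01 + n10 + n11

model_correct = (n00 + n11) / n          # on-diagonal share for the probit model
ybar = (n10 + n11) / n                   # sample smoking rate
baseline_correct = max(ybar, 1 - ybar)   # no-covariate rule: predict the majority outcome
smokers_caught = n11 / (n10 + n11)       # share of actual smokers predicted to smoke

print(f"{model_correct:.4f} {baseline_correct:.4f} {ybar:.4f} {smokers_caught:.4f}")
```

The model's hit rate (about 0.7490) is only slightly above the benchmark (about 0.7484).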

Key points about prediction
• MLE models are not designed to maximize prediction
• We should not be surprised that they do not predict well
• In this case, the model is not a particularly good predictor

Translating coefficients in probit: Continuous Covariates
• $\Pr(y_i=1) = \Phi(\beta_0 + x_{1i}\beta_1 + x_{2i}\beta_2 + \dots + x_{ki}\beta_k)$
• Suppose that x1i is a continuous variable
• $d\Pr(y_i=1)/dx_{1i} = ?$
• What is the change in the probability of an event given a change in x1i?

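Marginal Effect
• $\frac{d\Pr(y_i=1)}{dx_{1i}} = \beta_1\,\phi(\beta_0 + x_{1i}\beta_1 + x_{2i}\beta_2 + \dots + x_{ki}\beta_k)$, where φ is the standard normal density
• Notice two things: the marginal effect depends on all the other parameters, and it depends on the values of x at which it is evaluated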
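A quick way to convince yourself of the formula β1·φ(xiβ) is to compare it with a numerical derivative of Φ(xiβ). The coefficients and x values below are hypothetical; `norm_cdf` and `norm_pdf` are written with the standard library so the sketch has no dependencies.

```python
import math

def norm_cdf(z):
    """Standard normal CDF Phi(z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def norm_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def index(beta, x):
    """b0 + x1 b1 + ... + xk bk; x excludes the constant."""
    return beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))

def probit_marginal_effect(beta, x, j):
    """d Pr(y=1) / d x_j = beta_j * phi(x b) for continuous x_j (j is 1-based)."""
    return beta[j] * norm_pdf(index(beta, x))

beta = [0.3, -0.5, 0.2]   # hypothetical b0, b1, b2
x = [1.2, 0.4]            # hypothetical values of x1, x2

analytic = probit_marginal_effect(beta, x, 1)
h = 1e-6                  # central finite difference in x1
numeric = (norm_cdf(index(beta, [x[0] + h, x[1]]))
           - norm_cdf(index(beta, [x[0] - h, x[1]]))) / (2 * h)
print(analytic, numeric)
```

Re-running with different x values changes the marginal effect even though β is fixed, which is the second point on the slide.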

Translating Coefficients: Discrete Covariates
• $\Pr(y_i=1) = \Phi(\beta_0 + x_{1i}\beta_1 + x_{2i}\beta_2 + \dots + x_{ki}\beta_k)$
• Suppose that x2i is a dummy variable (1 if yes, 0 if no)
• A marginal effect makes no sense here: x2i cannot change by a small amount, it is either 1 or 0
• Redefine the effect of interest: compare outcomes with and without x2i

• $y_1 = \Pr(y_i=1 \mid x_{2i}=1) = \Phi(\beta_0 + x_{1i}\beta_1 + \beta_2 + x_{3i}\beta_3 + \dots)$
• $y_0 = \Pr(y_i=1 \mid x_{2i}=0) = \Phi(\beta_0 + x_{1i}\beta_1 + x_{3i}\beta_3 + \dots)$
• Marginal effect = y1 - y0: the difference in probabilities with and without x2i
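A minimal sketch of the discrete change y1 - y0 for a probit dummy. The value of the rest of the index (everything except β2·x2i) and the dummy coefficient are hypothetical; the derivative-style formula is printed alongside only for comparison.

```python
import math

def norm_cdf(z):
    """Standard normal CDF Phi(z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def discrete_change(rest_of_index, beta_dummy):
    """y1 - y0 = Phi(rest + b2) - Phi(rest), holding the other covariates fixed.

    rest_of_index = b0 + x1 b1 + x3 b3 + ... (every term except the dummy's).
    """
    y1 = norm_cdf(rest_of_index + beta_dummy)   # Pr(y=1 | x2 = 1)
    y0 = norm_cdf(rest_of_index)                # Pr(y=1 | x2 = 0)
    return y1 - y0

rest = -0.4    # hypothetical value of b0 + x1 b1 + x3 b3 + ...
b2 = 0.25      # hypothetical dummy coefficient

change = discrete_change(rest, b2)
derivative_style = b2 * math.exp(-0.5 * rest ** 2) / math.sqrt(2 * math.pi)
print(round(change, 4), round(derivative_style, 4))
```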

In STATA
• For marginal effects of continuous variables, STATA evaluates them at the sample means of the X's
• For the change in probability from a dummy (0/1) covariate, STATA also holds the other X's at their sample means

STATA command for Marginal Effects

  mfx compute;

• Must be run after the estimation command, while its estimates are still active

Marginal effects after probit

        y  = Pr(smoker) (predict)
           =  .24093439
  ------------------------------------------------------------------------------
   variable |      dy/dx    Std. Err.      z    P>|z|  [    95% C.I.   ]      X
  ----------+-------------------------------------------------------------------
        age |  -.0003951      .00029   -1.36   0.173  -.000964   .000174  38.5474
    incomel |  -.0289139      .00472   -6.13   0.000   -.03816  -.019668   10.421
      male* |   .0166757       .0072    2.32   0.021   .002568   .030783   .39476
     black* |  -.0320621      .01023   -3.13   0.002  -.052111  -.012013  .111945
  hispanic* |  -.0658551      .01259   -5.23   0.000  -.090536  -.041174  .060709
    hsgrad* |   -.053335      .01302   -4.10   0.000   -.07885   -.02782  .335527
   somecol* |  -.1062358      .01228   -8.65   0.000  -.130308  -.082164  .268545
   college* |  -.2149199      .01146  -18.76   0.000  -.237378  -.192462  .329376
     worka* |  -.0668959      .00756   -8.84   0.000   -.08172  -.052072   .68514
  ------------------------------------------------------------------------------
  (*) dy/dx is for discrete change of dummy variable from 0 to 1
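The point estimates in this table can be reproduced by hand from the probit coefficients and the sample means reported earlier: evaluate xb at the means, use β·φ(x̄b) for continuous variables, and Φ(index with dummy = 1) - Φ(index with dummy = 0) for dummies, holding everything else at its mean. This sketch copies those published numbers; it does not reproduce the standard errors.

```python
import math

def norm_cdf(z):
    """Standard normal CDF Phi(z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def norm_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# Probit coefficients and sample means copied from the output above
coef = {"age": -.0012684, "incomel": -.092812, "male": .0533213, "black": -.1060518,
        "hispanic": -.2281468, "hsgrad": -.1748765, "somecol": -.363869,
        "college": -.7689528, "worka": -.2093287}
cons = .870543
means = {"age": 38.54742, "incomel": 10.42097, "male": .3947595, "black": .1119449,
         "hispanic": .0607086, "hsgrad": .3355271, "somecol": .2685447,
         "college": .3293763, "worka": .6851396}
dummies = {"male", "black", "hispanic", "hsgrad", "somecol", "college", "worka"}

xb_bar = cons + sum(coef[v] * means[v] for v in coef)
p_bar = norm_cdf(xb_bar)               # predicted Pr(smoker) at the means

effects = {}
for v in coef:
    if v in dummies:
        base = xb_bar - coef[v] * means[v]           # index with this dummy set to 0
        effects[v] = norm_cdf(base + coef[v]) - norm_cdf(base)
    else:
        effects[v] = coef[v] * norm_pdf(xb_bar)      # beta * phi(x_bar b)

print(round(p_bar, 4))
for v, me in effects.items():
    print(f"{v:9s} {me: .4f}")
```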

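Interpret results
• A 10% increase in income raises log income by about 0.1, so it reduces the probability of smoking by about 0.1 × 2.9 ≈ 0.3 percentage points (a one-unit increase in log income reduces it by 2.9 percentage points)
• A 10-year increase in age decreases the smoking rate by 0.4 percentage points
• Those with a college degree are 21.5 percentage points less likely to smoke than those without a high school diploma (the omitted education category)
• Those who face a workplace smoking ban have a 6.7 percentage point lower probability of smoking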

Do not confuse percentage point and percent differences
• A 6.7 percentage point drop is about 28 percent of the baseline smoking probability of 24 percent (the predicted probability at the sample means)
• Blacks have smoking rates that are 3.2 percentage points lower than others, which is 13 percent of that baseline
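The percentage point versus percent distinction is just a division by the baseline. This sketch uses the predicted probability at the means from the mfx output (y = .24093439, rounded to 0.2409) as the baseline and the worka and black marginal effects, rounded to four decimals.

```python
def relative_change(pp_change, baseline):
    """Express a change in probability (percentage points / 100) as a share of the baseline."""
    return pp_change / baseline

baseline = 0.2409   # predicted Pr(smoker) at the sample means
for name, me in [("worka", -0.0669), ("black", -0.0321)]:
    print(f"{name}: {100 * me:.1f} percentage points = "
          f"{100 * relative_change(me, baseline):.0f} percent of the baseline")
```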

Comparing Marginal Effects

When will results differ?
• The normal and logistic CDFs look
  • Similar near the midpoint of the distribution
  • Different in the tails
• You obtain more observations in the tails of the distribution when
  • Sample sizes are large
  • ȳ approaches 1 or 0
• These situations will produce larger differences between probit and logit estimates
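The "similar in the middle, different in the tails" claim can be checked numerically. Because the standard logistic has variance π²/3 rather than 1, this sketch rescales the logistic to unit variance before comparing it with the standard normal; that rescaling is an assumption made only so the two curves are on the same scale.

```python
import math

def norm_cdf(z):
    """Standard normal CDF Phi(z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def logistic_cdf(z):
    """Standard logistic CDF."""
    return 1.0 / (1.0 + math.exp(-z))

SCALE = math.pi / math.sqrt(3.0)   # makes the logistic have unit variance

rows = []
for z in [-3.0, -2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 3.0]:
    p_norm = norm_cdf(z)
    p_logit = logistic_cdf(SCALE * z)
    rows.append((z, p_norm, p_logit, p_logit / p_norm))
    print(f"z={z:+.1f}  normal={p_norm:.4f}  logistic={p_logit:.4f}  ratio={p_logit / p_norm:.2f}")
```

Near z = 0 the ratio is close to 1; at z = -3 the logistic puts several times as much probability in the lower tail as the normal.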

Some nice properties of the Logit
• Outcome, y = 1 or 0
• Treatment, x = 1 or 0
• Other covariates, z
• Context:
  • y = whether a baby is born with low birth weight
  • x = whether the mom smoked during pregnancy

Risk ratio
• RR = Prob(y=1|x=1) / Prob(y=1|x=0)
• The relative probability of an event when x is and is not present
• How much does smoking elevate the chance that your child will be a low-weight birth?

Let Yyx be the probability that Y = y (1 or 0) given X = x (1 or 0)
• Think of the risk ratio the following way
  • Y11 is the probability Y=1 when X=1
  • Y10 is the probability Y=1 when X=0
  • Y11 = RR × Y10

Odds Ratio
• OR = A/B = [Y11/Y01] / [Y10/Y00]
• A = Pr(Y=1|X=1)/Pr(Y=0|X=1) = odds of Y occurring if you are a smoker
• B = Pr(Y=1|X=0)/Pr(Y=0|X=0) = odds of Y occurring if you are not a smoker
• What are the relative odds of Y happening if you do or do not experience X?
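A minimal sketch computing both measures from the two conditional probabilities, using Y01 = 1 - Y11 and Y00 = 1 - Y10. The low-birth-weight probabilities below are hypothetical.

```python
def risk_ratio(y11, y10):
    """RR = Pr(Y=1|X=1) / Pr(Y=1|X=0)."""
    return y11 / y10

def odds_ratio(y11, y10):
    """OR = [Y11 / Y01] / [Y10 / Y00], with Y01 = 1 - Y11 and Y00 = 1 - Y10."""
    y01, y00 = 1.0 - y11, 1.0 - y10
    return (y11 / y01) / (y10 / y00)

# Hypothetical probabilities of a low-weight birth
y11 = 0.12   # mother smoked
y10 = 0.06   # mother did not smoke

rr = risk_ratio(y11, y10)
or_ = odds_ratio(y11, y10)
print(round(rr, 3), round(or_, 3))
```

With a rare outcome like this one, the odds ratio is close to the risk ratio; the two diverge as the probabilities grow.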

Suppose Pr(Yi = 1) = F(β0 + β1Xi + β2Zi) and F is the logistic function
• One can show that
  • $OR = \exp(\beta_1) = e^{\beta_1}$
• This number is typically reported by most statistical packages
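The missing step: under the logit, Pr(Y=1)/Pr(Y=0) = exp(β0 + β1X + β2Z), so the odds with X=1 divided by the odds with X=0 is exp(β1), whatever the value of Z. This sketch checks that numerically for hypothetical coefficients and several values of Z.

```python
import math

def logit_prob(b0, b1, b2, x, z):
    """Pr(Y=1) = F(b0 + b1 x + b2 z) with F the logistic CDF."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x + b2 * z)))

def odds(p):
    """Odds p / (1 - p)."""
    return p / (1.0 - p)

b0, b1, b2 = -2.0, 0.6, 0.3   # hypothetical coefficients

ors = []
for z in [0.0, 1.0, 5.0]:
    or_z = odds(logit_prob(b0, b1, b2, 1, z)) / odds(logit_prob(b0, b1, b2, 0, z))
    ors.append(or_z)
    print(z, round(or_z, 6), round(math.exp(b1), 6))
```

The same is not true of the risk ratio, which changes with Z under the logit.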
