
VI. Logistic Regression



  1. VI. Logistic Regression

  2. An event occurs or doesn’t. • A category applies to an observation or doesn’t. • A student passes or fails. • A patient survives or dies. • A candidate wins or loses. • A person is poor or not poor. • A person is a citizen or not.

  3. These are examples of categorical data. • They are also examples of binary discrete phenomena. • Binary discrete phenomena usually take the form of a dichotomous indicator, or dummy, variable. • It’s best to code binary discrete phenomena 0/1 so that the mean of the dummy variable equals the proportion of cases with a value of 1, & can be interpreted as a probability: e.g., mean of female=.545 (=sample’s probability of being female).
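
The 0/1-coding point can be checked with a few lines of Python (used here only for illustration alongside the Stata session; the counts are hypothetical, chosen to reproduce the .545 figure):

```python
# With 0/1 coding, the mean of a dummy variable equals the sample
# proportion of 1s, so it can be read directly as a probability.
# Hypothetical sample: 109 females out of 200 -> proportion .545.
female = [1] * 109 + [0] * 91

mean_female = sum(female) / len(female)
print(mean_female)  # 0.545
```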

  4. Here’s what we’re going to figure out how to interpret:

  5. . logit hsci read math female, or nolog

Logit estimates                                   Number of obs   =        200
                                                  LR chi2(3)      =      60.07
                                                  Prob > chi2     =     0.0000
Log likelihood = -79.013272                       Pseudo R2       =     0.2754

------------------------------------------------------------------------------
        hsci | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   1.073376   .0274368     2.77   0.006     1.020926    1.128521
        math |    1.10315   .0316097     3.43   0.001     1.042904    1.166877
      female |   .3380283   .1389325    -2.64   0.008     .1510434    .7564918
------------------------------------------------------------------------------

  6. OLS regression encounters serious problems in dealing with a binary dependent variable: • OLS’s predicted values can extend to positive or negative infinity, but binary probabilities & proportions can’t exceed 1 or fall below 0. • OLS is premised on linearity, but with a binary dependent variable the effects of an explanatory variable are non-linear at the binary variable’s lower & upper levels.

  7. (3) OLS is also premised on additivity, but with a binary outcome variable an explanatory variable’s effect depends on the relative effects of the other variables: if, say, one explanatory variable pushes the probability of the binary outcome variable near 0 or near 1, then the other explanatory variables can’t have much influence. (4) And because a binary outcome variable has just two values, it violates the OLS assumptions of normality &, more important, constant variance (homoskedasticity) of residuals.

  8. What to do? A logit transformation is an advantageous way of representing the S-shaped curve of a binary outcome variable’s y/x distribution. • A probit transformation, which has somewhat thinner tails, is also commonly used. • A complementary log-log transformation or a scobit transformation is often used if the binary outcome variable is highly skewed. • There are other such transformations as well (see, e.g., Long & Freese; & the Stata manuals).

  9. A logit transformation changes probabilities into logged odds, which eliminate the binary proportion (or probability) ceiling of 1. • Odds express the likelihood of occurrence relative to the likelihood of non-occurrence (i.e. odds are the ratio of the proportions of the two possible outcomes [see Moore & McCabe, chap. 15, pages 40-42]): • odds = event’s probability/(1 – event’s probability) • probability = event’s odds/(1 + event’s odds)
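
The two formulas can be sketched in Python (again purely illustrative; the deck itself works in Stata), using the honors-math proportion from slide 14:

```python
def prob_to_odds(p):
    # odds = event's probability / (1 - event's probability)
    return p / (1 - p)

def odds_to_prob(odds):
    # probability = event's odds / (1 + event's odds)
    return odds / (1 + odds)

p = 0.245                            # proportion in honors math (slide 14)
odds = prob_to_odds(p)
print(round(odds, 4))                # 0.3245
print(round(odds_to_prob(odds), 3))  # 0.245, back where we started
```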

  10. To repeat, a logit transformation changes probabilities into logged odds, which eliminate the binary proportion (or probability) ceiling of 1. • Logged odds are also known as logits (i.e. the natural log of the odds). • Why not stick with odds rather than logged odds? Because logged odds (i.e. logits) also eliminate the binary proportion (or probability) floor of 0.

  11. So, on balance, the logit transformation eliminates the outcome variable’s proportion (or probability) ceiling of 1 & floor of 0. • Thus an explanatory variable’s coefficient can extend to positive or negative infinity.

  12. Note: larger sample size is even more important for logistic regression than for OLS regression.

  13. So that we get a feel for what’s going on, & can do the exercises in Moore & McCabe, chap. 15, let’s compute some odds & logged odds (i.e. logits).

  14. Display the odds of being in honors math: • . tab hmath • (>=60) Freq. Percent Cum. • 0 151 75.50 75.50 • 1 49 24.50 100.00 • Total 200 100.00 • . display .245/(1 - .245) = .3245 • Interpretation? The event occurs .3245 times per each time it does not occur. • That is, there are 32.45 occurrences per 100 non-occurrences.

  15. Display the logged odds of being in honors math: • . display ln(.3245) = -1.1255 • Display the odds of not being in honors math: • . di .755/(1 - .755) = 3.083 • Display the logged odds of not being in honors math: • . di ln(3.083) = 1.126
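
The slide-14/15 arithmetic can be redone as a Python check (values agree with the slides up to rounding):

```python
import math

p = 0.245                        # proportion in honors math
odds_in = p / (1 - p)            # ~0.3245
odds_out = (1 - p) / p           # ~3.082 (slide rounds to 3.083)
logit_in = math.log(odds_in)     # ~ -1.1255
logit_out = math.log(odds_out)   # ~  1.1255

# the two logits are exact mirror images of each other
print(round(logit_in, 4), round(logit_out, 4))
```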

  16. Although we’ll never have to do the following—thankfully, software will do it for us automatically—let’s make a variable that combines the odds of being in honors math versus not being in honors math. • . gen ohmath = .3245 if hmath==1 • . replace ohmath = 3.083 if hmath==0 • . tab ohmath

  17. Let’s transform ohmath into another variable that represents logged odds: • . gen lohmath = ln(ohmath) • . su ohmath lohmath • And how could we display lohmath not as logged odds but rather as odds (i.e. as ohmath)? • . display exp(lohmath)

  18. From the standpoint of regression analysis, why should we have transformed the variable into logged odds? • That is, what are the advantages of doing so?

  19. Overall, a logit transformation of a binary outcome variable linearizes the non-linear relationship of X with the probability of Y. • It does so as the logit transformation: • eliminates the binary variable’s probability ceiling of 1 & floor of 0; & • is symmetric around the mid-range probability of 0.5, so that probabilities below this value have negative logits (i.e. logged odds) while those above this value have positive logits (i.e. logged odds).
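
The symmetry claim is easy to verify numerically; a small Python sketch:

```python
import math

def logit(p):
    # logged odds (logit) of a probability p
    return math.log(p / (1 - p))

print(logit(0.5))   # 0.0: the mid-range probability maps to logit 0
print(logit(0.3))   # negative logit for a probability below .5
print(logit(0.7))   # positive logit, the mirror image of logit(0.3)
```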

  20. Let’s summarize: • the effect of X on the probability of binary Y is non-linear; but • the effect of X on logit-transformed binary Y is linear. • We call the latter either logit or logistic regression: they’re the same thing. • Let’s fit a model using hsb2.dta:

  21. . logit hsci read math female, nolog

Logit estimates                                   Number of obs   =        200
                                                  LR chi2(3)      =      60.07
                                                  Prob > chi2     =     0.0000
Log likelihood = -79.013272                       Pseudo R2       =     0.2754

------------------------------------------------------------------------------
        hsci |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .0708092   .0255612     2.77   0.006     .0207102    .1209082
        math |     .09817    .028654     3.43   0.001     .0420091    .1543308
      female |  -1.084626   .4110086    -2.64   0.008    -1.890188   -.2790636
       _cons |  -9.990621    1.60694    -6.22   0.000    -13.14017   -6.841076
------------------------------------------------------------------------------
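
With the coefficient estimates above, a predicted probability can be computed by hand: form the linear predictor (logged odds), then apply the inverse logit. A Python sketch (the read=60, math=60 values are arbitrary illustration, not from the slides):

```python
import math

# coefficients from the fitted logit model above
b_cons, b_read, b_math, b_female = -9.990621, 0.0708092, 0.09817, -1.084626

def pr_hsci(read, math_score, female):
    # linear predictor (logged odds), then inverse logit -> probability
    xb = b_cons + b_read * read + b_math * math_score + b_female * female
    return 1 / (1 + math.exp(-xb))

# e.g., a male with read=60, math=60
print(round(pr_hsci(60, 60, 0), 3))
```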

  22. How, though, do we interpret logit coefficients in meaningful terms? • That is, how do we interpret ‘logged odds’? • The gain in parsimony via the logit transformation is mitigated by the loss in interpretability: the metric of logged odds (i.e. logits) is not intrinsically meaningful to us.

  23. An alternative, more comprehensible approach is to express regression coefficients not as logged odds but rather as odds: • odds = event’s probability/(1 – event’s probability)

  24. The odds are obtained by taking the exponent, or anti-log, of the logged odds (i.e. the logit coefficient): • . odds of honors math: di exp(-1.1255) = .325 • . odds of not honors math: di exp(1.126) = 3.083 • Review: interpretation?

  25. What are the odds of being in honors math? • odds: a ratio of probabilities • = event’s probability/(1 - event’s probability)

  26. What are the odds of being in honors math versus the odds of not being in honors math? This is called an odds ratio. • odds ratio: a ratio of odds • odds ratio = .325/3.083 = .105 • Interpretation? The odds of being in honors math are .105 those of not being in honors math.
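
The odds-ratio arithmetic from this slide, as a quick Python check:

```python
odds_in = 0.3245    # odds of being in honors math (slide 24)
odds_out = 3.083    # odds of not being in honors math

odds_ratio = odds_in / odds_out
print(round(odds_ratio, 3))   # 0.105
```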

  27. Via logit or logistic regression, Stata gives us slope coefficients as odds, instead of logged odds, in any of the following ways: • (1) logit hsci read math female, or nolog • (2) logistic hsci read math female, nolog • (3) quietly logit hsci read math female • listcoef, factor help

  28. But expressing slope coefficients as odds, instead of logged odds, causes a complication: the equation determining the odds is not additive but rather multiplicative.

  29. . logit hsci read math female, or nolog

Logit estimates                                   Number of obs   =        200
                                                  LR chi2(3)      =      60.07
                                                  Prob > chi2     =     0.0000
Log likelihood = -79.013272                       Pseudo R2       =     0.2754

------------------------------------------------------------------------------
        hsci | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   1.073376   .0274368     2.77   0.006     1.020926    1.128521
        math |    1.10315   .0316097     3.43   0.001     1.042904    1.166877
      female |   .3380283   .1389325    -2.64   0.008     .1510434    .7564918
------------------------------------------------------------------------------

  30. For every 1-unit increase in reading score, the odds of being in honors science increase by the multiple of 1.07 on average, holding the other variables constant. • For every 1-unit increase in math score, the odds of being in honors science increase by the multiple of 1.10 on average, holding the other variables constant. • The odds of being in honors science are lower for females than males by a multiple of .338 on average, holding the other variables constant.
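
The link between the coefficient table and the odds-ratio table is just exponentiation: each odds ratio on slide 29 is exp() of the corresponding logit coefficient on slide 21. A Python check:

```python
import math

# logit coefficients (slide 21) -> odds ratios (slide 29)
coefs = {'read': 0.0708092, 'math': 0.09817, 'female': -1.084626}
odds_ratios = {name: math.exp(b) for name, b in coefs.items()}

for name, oratio in odds_ratios.items():
    print(name, round(oratio, 4))   # read 1.0734, math 1.1032, female 0.338
```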

  31. . logit hsci read math female, or nolog

Logit estimates                                   Number of obs   =        200
                                                  LR chi2(3)      =      60.07
                                                  Prob > chi2     =     0.0000
Log likelihood = -79.013272                       Pseudo R2       =     0.2754

------------------------------------------------------------------------------
        hsci | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   1.073376   .0274368     2.77   0.006     1.020926    1.128521
        math |    1.10315   .0316097     3.43   0.001     1.042904    1.166877
      female |   .3380283   .1389325    -2.64   0.008     .1510434    .7564918
------------------------------------------------------------------------------

  32. The Metrics • logit = 0 is the equivalent of odds=1 & the equivalent of probability = .5

  33. Regarding odds ratios: an odds ratio of .5, which indicates a negative effect, is of the same magnitude as a positive-effect odds ratio of 2.0. • Here’s a helpful way to disentangle this complication after estimating a logit (i.e. logistic) model: • . listcoef, reverse help • This reverses the outcome variable.

  34. . listcoef, reverse help

[So that the coefficients refer to the odds of not being in honors science.]

logit (N=200): Factor Change in Odds

Odds of: 0 vs 1

----------------------------------------------------------------------
        hsci |        b         z     P>|z|     e^b    e^bStdX    SDofX
-------------+--------------------------------------------------------
        read |  0.07081     2.770     0.006   0.9316   0.4838   10.2529
        math |  0.09817     3.426     0.001   0.9065   0.3986    9.3684
      female | -1.08463    -2.639     0.008   2.9583   1.7185    0.4992
----------------------------------------------------------------------

b = raw coefficient
z = z-score for test of b=0
P>|z| = p-value for z-test
e^b = exp(b) = factor change in odds for unit increase in X
e^bStdX = exp(b*SD of X) = change in odds for SD increase in X
SDofX = standard deviation of X
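
The reversed factors are just the reciprocals of the original odds ratios, i.e. exp(-b). A Python check against the e^b column above:

```python
import math

# raw logit coefficients (b column of the listcoef output)
b = {'read': 0.07081, 'math': 0.09817, 'female': -1.08463}

# reversing the outcome flips the sign of each coefficient
reversed_factors = {name: math.exp(-coef) for name, coef in b.items()}

for name, factor in reversed_factors.items():
    print(name, round(factor, 4))   # read 0.9316, math 0.9065, female 2.9583
```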

  35. listcoef, reverse related the explanatory variables to the odds of not being in hsci.

  36. An easier interpretation than odds ratios is percentage change in odds: • . quietly logit hsci read math female • . listcoef, percent help • ---------------------------------------------------------------------- • hsci | b z P>|z| % %StdX SDofX • -------------+-------------------------------------------------------- • read | 0.07081 2.770 0.006 7.3 106.7 10.2529 • math | 0.09817 3.426 0.001 10.3 150.9 9.3684 • female | -1.08463 -2.639 0.008 -66.2 -41.8 0.4992 • ---------------------------------------------------------------------- • P>|z| = p-value for z-test • % = percent change in odds for unit increase in X • %StdX = percent change in odds for SD increase in X • SDofX = standard deviation of X

  37. For every 1-unit increase in reading score, the odds of being in honors science increase by 7.3% on average, holding the other variables constant. • For every 1-unit increase in math score, the odds of being in honors science increase by 10.3% on average, holding the other variables constant. • The odds of being in honors science are lower by 66.2% on average for females than males, holding the other variables constant.
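
The percent figures come directly from the odds ratios: percent change = 100 * (exp(b) - 1). A Python check using the coefficients from the listcoef output:

```python
import math

def pct_change_in_odds(b):
    # percent change in the odds for a 1-unit increase in X
    return 100 * (math.exp(b) - 1)

print(round(pct_change_in_odds(0.0708092), 1))   # read:    7.3
print(round(pct_change_in_odds(0.09817), 1))     # math:   10.3
print(round(pct_change_in_odds(-1.084626), 1))   # female: -66.2
```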

  38. The percentage interpretation, then, eliminates the multiplicative aspect of the model.

  39. Alternatives to know about are ‘relative risk’ & ‘relative risk ratio’. • See, e.g., Utts, chap. 12. • And see the downloadable command ‘relrisk’: • . logit hsci read math female, nolog • . relrisk

  40. Pseudo-R2: this is not equivalent to OLS R2. • Many specialists (e.g., Pampel) recommend not reporting pseudo-R2. • Its metric is different from OLS R2—typically it’s much lower than OLS R2—but readers (including many academic & policy specialists) are not aware of this difference.

  41. In the logistic equation we could have specified the options robust &/or cluster (as in OLS regression). • Specifying ‘cluster’ automatically invokes robust standard errors.

  42. Before we proceed, keep in mind the following: • odds=1 is the equivalent of: • logit=0 • probability=0.5
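
The three equivalences take one line each to confirm in Python:

```python
import math

odds = 1.0
print(math.log(odds))       # logit = 0 when odds = 1
print(odds / (1 + odds))    # probability = .5 when odds = 1
```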

  43. Besides logits (i.e. logged odds) & odds, we can also interpret the relationships from the perspective of probabilities. • Recall, though, that the relationship between X & the probability of binary Y is non-linear. • Thus the effect of X on binary Y has to be identified at particular X-values or at particular sets of values for the X’s; & we compare the effects across particular X-levels.

  44. . logit hsci read math female, or nolog • . prvalue, r(mean) delta brief • logit: Predictions for hsci • Pr(y=1|x): 0.1525 95% ci: (0.0998,0.2259) • Pr(y=0|x): 0.8475 95% ci: (0.7741,0.9002) • prvalue, like the rest of the pr-commands, can only be used after estimating a regression model. • It summarizes the sample’s 1/0 probabilities for the specified binary y-variable, holding the model’s other variables at their means (though medians can be specified alternatively).

  45. . prvalue, x(female=0) r(mean) delta brief

logit: Predictions for hsci
Pr(y=1|x): 0.2453 95% ci: (0.1568,0.3622)
Pr(y=0|x): 0.7547 95% ci: (0.6378,0.8432)

. prvalue, x(female=1) r(mean) delta brief

logit: Predictions for hsci
Pr(y=1|x): 0.0990 95% ci: (0.0526,0.1784)
Pr(y=0|x): 0.9010 95% ci: (0.8216,0.9474)

  46. Comparing males across particular scores: • . prvalue, x(read=40 math=40 female=0) delta b save • . prvalue, x(read=60 math=60 female=0) delta b dif • How else could we compare the estimated probabilities of being in honors science? By comparing female=0 versus female=1 at particular reading & math scores. Note: ‘b’ – ‘brief’ (i.e. display only the model’s most relevant values)

  47. Comparing males & females at particular scores: • . prvalue, x(read=40 math=40 female=0) delta b save • . prvalue, x(read=40 math=40 female=1) delta b dif • Or try prtab to see, e.g., how female versus male estimated probabilities vary across the range of math scores, holding reading scores constant at 40 (prtab, however, does not provide confidence intervals): • . prtab math female, x(read=40) brief

  48. Or try prchange to see, e.g., how female estimated probabilities vary as female math scores increase from 40 to 60 (which, however, does not provide a confidence interval): • . prchange math, x(female=1 math=40) fromto delta(20) uncentered brief

  49. A problem with prtab & prchange is that they don’t give confidence intervals, which prvalue delta does provide. • Here’s a different way of making predictions—for logged odds, odds, or probabilities—that gives confidence intervals: • . adjust math=40, by(female) ci • . adjust math=40, by(female) exp ci • . adjust math=40, by(female) pr ci • Note: the first variant can be used to obtain predicted coefficients with OLS regression as well.

  50. Remember: we can examine the relationship of X with binary Y via: • logits (i.e. logged odds) • odds; or • probabilities • What are the differences in functional forms & interpretations?
