
Logistic regression


Presentation Transcript


  1. Logistic regression: a quick intro

  2. Why Logistic Regression?
  • Big idea: the dependent variable is a dichotomy (though the approach extends to more than two categories, i.e. multinomial logistic regression)
  • Why would we use it?
    • It is one thing to use a t-test (or a multivariate counterpart) to show that groups differ; however, the research goal may be to predict group membership
    • Clinical/medical context: schizophrenic or not, clinically depressed or not, cancer or not
    • Social/cognitive context: vote yes or no, preference for A over B, graduate or not
  • Things to cover:
    • Relationship to typical multiple regression
    • Interpretation of fit
    • Interpretation of coefficients

  3. Questions
  • Can the cases be accurately classified given a set of predictors?
  • Can the solution generalize to predicting new cases?
    • Comparison of the equation with predictors plus intercept to a model with just the intercept
  • What is the relative importance of each predictor?
    • How does each variable affect the outcome? Does a predictor make the solution better, worse, or have no effect?
  • Are there interactions among predictors?
    • Does adding interactions among predictors (continuous or categorical) significantly improve the model?
  • Can the parameters be accurately estimated?
  • What is the strength of association between the outcome variable and a set of predictors?

  4. Multiple regression approach
  • With MR, we used a method that minimizes the squared deviations from our predicted values
  • We can't really pull that off with a dichotomous variable:
    • There are only two outcome values to produce residuals
    • The normality and homoscedasticity assumptions can't be met
    • While it could produce what are essentially predicted probabilities of belonging to a particular group, those probabilities are not bounded by zero and one (see the sketch below)
  • Logistic regression will allow us to go about the prediction/explanation process in a similar manner, but without these problems
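
To make the boundedness problem concrete, here is a minimal sketch on simulated data (everything in it is made up for illustration): an ordinary least squares fit to a 0/1 outcome happily produces fitted "probabilities" below 0 and above 1.

    import numpy as np

    # Simulated data: a 0/1 outcome driven by one predictor.
    rng = np.random.default_rng(0)
    x = rng.normal(size=200)
    true_p = 1 / (1 + np.exp(-(0.5 + 2.0 * x)))   # true probabilities
    y = rng.binomial(1, true_p)                   # dichotomous outcome

    # Ordinary least squares fit of y on x (the linear probability approach).
    X = np.column_stack([np.ones_like(x), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ b

    # The fitted values act like probabilities but are not bounded by 0 and 1.
    print(f"min fitted: {fitted.min():.3f}, max fitted: {fitted.max():.3f}")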

  5. Assumptions
  • The only "real" limitation of logistic regression is that the outcome must be discrete
  • If its distributional assumptions are met, discriminant function analysis may be more powerful, although it has been shown to overestimate the association when using discrete predictors
  • If the outcome is continuous, multiple regression is more powerful, given that its assumptions are met

  6. Assumptions
  • Ratio of cases to variables: using discrete predictors requires enough responses in every category to allow reasonable estimation of the parameters/predictive power
  • Linearity in the logit: the IVs should have a linear relationship with the logit form of the DV
  • There is no assumption that the predictors are linearly related to each other

  7. Assumptions
  • Absence of collinearity among predictors
  • No outliers
  • Independence of errors
  • Categories are assumed to be mutually exclusive

  8. Coefficients
  • In interpreting coefficients we are now thinking about a particular case's tendency toward some outcome
  • The problem with probabilities is that they are non-linear: going from .10 to .20 doubles the probability, but going from .80 to .90 increases it only somewhat
  • With logistic regression we start to think about the odds
    • Odds are just an alternative way of expressing the likelihood (probability) of an event
    • Probability is the expected number of occurrences of the event divided by the total number of possible outcomes
    • Odds are the expected number of occurrences of the event divided by the expected number of non-occurrences
    • Odds express the likelihood of occurrence relative to the likelihood of non-occurrence

  9. Odds
  • Let's begin with probability. Say the probability of success is .8, thus p = .8
  • Then the probability of failure is q = 1 - p = .2
  • The odds of success are defined as odds(success) = p/q = .8/.2 = 4; that is, the odds of success are 4 to 1
  • We can also define the odds of failure: odds(failure) = q/p = .2/.8 = .25; that is, the odds of failure are 1 to 4
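
The same arithmetic as a minimal Python sketch (the numbers mirror the slide):

    p = 0.8               # probability of success
    q = 1 - p             # probability of failure

    odds_success = p / q  # .8 / .2 = 4: odds of success are 4 to 1
    odds_failure = q / p  # .2 / .8 = .25: odds of failure are 1 to 4

    print(round(odds_success, 2), round(odds_failure, 2))   # 4.0 0.25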

  10. Odds Ratio
  • Next, let's compute the odds ratio: OR = odds(success)/odds(failure) = 4/.25 = 16
  • The interpretation of this odds ratio is that the odds of success are 16 times greater than the odds of failure
  • Had we formed the odds ratio the other way around, with the odds of failure in the numerator, we would have gotten OR = odds(failure)/odds(success) = .25/4 = .0625
  • Here the interpretation is that the odds of failure are one-sixteenth the odds of success
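
Continuing the sketch:

    odds_success, odds_failure = 4.0, 0.25      # from the previous slide

    # The odds ratio in both directions.
    print(odds_success / odds_failure)          # 16.0
    print(odds_failure / odds_success)          # 0.0625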

  11. Logit
  • The logit is the natural log (base e) of the odds, often called the log odds
  • The logit scale is linear
  • Logits are continuous and are centered on zero (kind of like z-scores):
    • p = 0.50, odds = 1, logit = 0
    • p = 0.70, odds = 2.33, logit = 0.85
    • p = 0.30, odds = 0.43, logit = -0.85
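
A quick check of those values in Python:

    import math

    for p in (0.50, 0.70, 0.30):
        odds = p / (1 - p)
        logit = math.log(odds)    # natural log of the odds
        print(f"p = {p:.2f}, odds = {odds:.2f}, logit = {logit:.2f}")

    # p = 0.50, odds = 1.00, logit = 0.00
    # p = 0.70, odds = 2.33, logit = 0.85
    # p = 0.30, odds = 0.43, logit = -0.85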

  12. Logit
  • So, conceptually, putting things in our standard regression form:
    • Log odds: ln(p / (1 - p)) = b0 + b1X
  • Now a one-unit change in X leads to a b1 change in the log odds
  • In terms of odds: p / (1 - p) = e^(b0 + b1X)
  • In terms of probability: p = e^(b0 + b1X) / (1 + e^(b0 + b1X))
  • Thus the logit, odds, and probability are different ways of expressing the same thing
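
Moving among the three forms in code; a minimal sketch with made-up coefficients:

    import math

    b0, b1 = -1.5, 0.8    # hypothetical coefficients
    x = 2.0               # a value of the predictor

    log_odds = b0 + b1 * x        # logit scale: linear in x
    odds = math.exp(log_odds)     # odds scale
    p = odds / (1 + odds)         # probability scale, bounded by 0 and 1

    print(f"log odds = {log_odds:.3f}, odds = {odds:.3f}, p = {p:.3f}")
    # log odds = 0.100, odds = 1.105, p = 0.525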

  13. Coefficients
  • The raw coefficients for our predictor variables in the output are the amount of increase in the log odds given a one-unit increase in that predictor
  • The coefficients are determined through an iterative process, maximum likelihood, that finds the coefficients that best match the data at hand
    • It starts with a set of coefficients (e.g. ordinary least squares estimates) and then alters them until there is almost no change in fit
  • Note that SPSS codes the outcome variable as 0 and 1 and predicts with respect to the 0 category
    • It might be more intuitive to switch the signs of the coefficients in your output
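
In practice the iterative maximum-likelihood fit is handled by software; here is a minimal sketch using the statsmodels library on simulated data (not the author's SPSS run):

    import numpy as np
    import statsmodels.api as sm

    # Simulated data: one predictor, binary outcome.
    rng = np.random.default_rng(1)
    x = rng.normal(size=500)
    y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 1.2 * x))))

    X = sm.add_constant(x)           # adds the intercept column
    result = sm.Logit(y, X).fit()    # iterative maximum-likelihood estimation
    print(result.params)             # log-odds coefficients, roughly [0.5, 1.2]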

  14. Coefficients
  • We also receive a different type of coefficient, expressed in odds
  • Anything above 1 suggests an increase in the odds of the event; anything below 1, a decrease in the odds
  • For example, if the value is 1.14, moving one unit on the independent variable increases the odds of the event by a factor of 1.14
  • Essentially it is the odds ratio for one value of X vs. the next value of X
  • More intuitively, it gives the percentage increase (or decrease) in the odds of being a member of such-and-such group with a one-unit increase in the predictor variable (a factor of 1.14 is a 14% increase in the odds)
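
Converting the raw log-odds coefficients to this odds form is just exponentiation; a small sketch with made-up coefficients:

    import math

    for b in (0.131, -0.223):        # hypothetical log-odds coefficients
        factor = math.exp(b)         # multiplicative change in the odds
        pct = (factor - 1) * 100     # percent change per one-unit increase
        print(f"b = {b:+.3f} -> odds factor {factor:.2f} ({pct:+.0f}% in the odds)")

    # b = +0.131 -> odds factor 1.14 (+14% in the odds)
    # b = -0.223 -> odds factor 0.80 (-20% in the odds)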

  15. Example
  • Example: predicting art museum visitation from education, age, income, and political views
  • GSS93 dataset
  • SPSS starts with "Block 0", which tests whether the intercept is a worthwhile predictor by itself
    • In other words, is just guessing one of the outcomes all the time going to be enough?
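
For readers working outside SPSS, a rough equivalent of this analysis might look like the sketch below; the file name and the column names (visit_art, educ, age, income, polviews) are assumptions for illustration, not the actual GSS93 layout:

    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("gss93.csv")    # hypothetical export of the GSS93 data

    # visit_art: 1 = visited an art museum, 0 = did not.
    model = smf.logit("visit_art ~ educ + age + income + polviews", data=df).fit()
    print(model.summary())           # coefficients, Wald tests, log-likelihood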

  16. Model fit
  • Goodness-of-fit statistics help you determine whether the model adequately describes the data
    • Here statistical significance is not desired
    • It is more like a badness-of-fit statistic really, and problematic since one can't accept the null hypothesis on the basis of non-significance
    • Perhaps best used descriptively
  • Pseudo R-squared statistics
    • In this dichotomous situation we will have trouble devising an R² in the usual sense

  17. Model fit
  • Cox & Snell's value would not reach 1.0 even for a perfect fit
  • Nagelkerke's is a rescaled version of Cox & Snell's that would
    • Probably preferred, though it may be a little optimistic (just like our regular R-squared)
  • The Hosmer and Lemeshow goodness-of-fit test suggests we're OK too
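
Both pseudo R-squared statistics are simple functions of the null and fitted log-likelihoods; a minimal sketch (the log-likelihood values here are made up):

    import math

    ll_null = -250.0   # hypothetical log-likelihood, intercept-only model
    ll_model = -210.0  # hypothetical log-likelihood, fitted model
    n = 400            # sample size

    # Cox & Snell: cannot reach 1.0 even for a perfect model.
    r2_cs = 1 - math.exp(2 * (ll_null - ll_model) / n)

    # Nagelkerke: Cox & Snell rescaled by its maximum attainable value.
    r2_nag = r2_cs / (1 - math.exp(2 * ll_null / n))

    print(f"Cox & Snell = {r2_cs:.3f}, Nagelkerke = {r2_nag:.3f}")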

  18. Coefficients
  • It would appear age is the only predictor that does not contribute statistically significantly
    • Note its odds ratio of 1.00: moving one unit in age says nothing about whether you will be more or less likely to go to the museum
  • Polviews (1 = extremely liberal, 7 = extremely conservative) isn't perhaps doing much either
    • More conservative respondents are less likely to go to the museum, but it is only a very small change
  • Education: more education, more likely to visit (more interest?)
  • Income: higher income, more likely to visit (more leisure?)

  19. Classification
  • The classification table gives a good sense of how well we are able to predict the outcome
  • 69% correct overall, compared to 58.7% if we just guessed "no" every time (Block 0)
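
A classification table is just a cross-tabulation of observed outcomes against predictions cut at .5; a sketch continuing the hypothetical GSS93 example above (names assumed):

    import pandas as pd

    # `model` and `df` come from the earlier hypothetical GSS93 sketch.
    pred = (model.predict(df) >= 0.5).astype(int)   # classify at the usual .5 cutoff
    actual = df["visit_art"]

    # Rows: observed outcome; columns: predicted outcome.
    print(pd.crosstab(actual, pred, rownames=["observed"], colnames=["predicted"]))
    print(f"overall % correct: {100 * (pred == actual).mean():.1f}")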

  20. Other measures regarding classification
  • The classification statistics from discriminant function analysis (DFA) would apply here as well
