
Logistic Regression



Presentation Transcript


  1. Logistic Regression Part I - Introduction

  2. Logistic Regression • Regression where the response variable is dichotomous (not continuous) • Examples • effect of concentration of drug on whether symptoms go away • effect of age on whether or not a patient survived treatment • effect of negative cognitions about SELF, WORLD, or Self-BLAME on whether a participant has PTSD

  3. Simple Linear Regression • Relationship between continuous response variable and continuous explanatory variable • Example • Effect of concentration of drug on reaction time • Effect of age of patient on number of years of post-operation survival

  4. Simple Linear Regression • RT (ms) = β0 + β1 × concentration (mg) • β0 is the value of RT when concentration is 0 • β1 is the change in RT associated with a 1 mg change in concentration • E.g. RT = 400 + 50 × concentration
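The example line RT = 400 + 50 × concentration can be evaluated directly. The following is a minimal sketch; the function name predicted_rt is hypothetical, and the default coefficients are simply the example values from the slide.

```python
def predicted_rt(concentration_mg, b0=400.0, b1=50.0):
    """Predicted reaction time (ms) from the example line RT = b0 + b1 * concentration."""
    return b0 + b1 * concentration_mg

# b0 is the predicted RT at zero concentration.
print(predicted_rt(0))                     # 400.0
# b1 is the change in predicted RT per additional 1 mg.
print(predicted_rt(3) - predicted_rt(2))   # 50.0
```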

  5. Logistic Regression • What do we do when the response variable is not continuous, but dichotomous?

  6. [Figure: three panels plotting Probability of Disease, Odds of Disease, and Log(Odds) of Disease, each against Concentration]

  7. Odds • Odds are simply the ratio of the proportions for the two possible outcomes. • If p is the proportion for one outcome, then 1- p is the proportion for the second outcome.

  8. Odds (Example) • At concentration level 16 we observe 75 participants out of 100 showing no disease (healthy) • If p is the probability of being healthy, then p = 0.75 • Then 1 – p is the probability of not being healthy, and equals 0.25 • Odds of being healthy versus not healthy at concentration level 16 • p / (1 – p) = 0.75/0.25 = 3 • Means that a person is 3 times as likely to be healthy as not healthy at concentration level 16

  9. Logarithms • Logarithms are a way of expressing numbers as powers of a base • Example • 10^2 = 100 • 10 is called the “base” • The power, 2 in this case, is called the “exponent” • Therefore 10^2 = 100 means that log10(100) = 2

  10. Log Odds • Odds of being healthy after 16mg of drug is 3 • Log odds (using the natural log) is ln(3) = 1.1 • Let’s say that the odds of being healthy after 2mg of drug is 0.25 • Means that a person is four times as likely to be not healthy as healthy after 2mg of drug • Log odds is ln(0.25) = -1.39
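The odds and log-odds figures in slides 8 and 10 can be reproduced in a few lines. This is a minimal sketch; the helper names odds and log_odds are hypothetical, and the logarithm is the natural log, which is what gives 1.1 and -1.39.

```python
import math

def odds(p):
    """Odds of an outcome that has probability p: p / (1 - p)."""
    return p / (1.0 - p)

def log_odds(p):
    """Natural log of the odds."""
    return math.log(odds(p))

# Slide 8: 75 of 100 participants healthy at concentration level 16.
print(odds(0.75))                  # 3.0
print(round(log_odds(0.75), 2))    # 1.1

# Slide 10: odds of 0.25 at 2 mg.
print(round(math.log(0.25), 2))    # -1.39
```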

  11. Logistic Regression • With log-odds we can now look at the linear relationship between a dichotomous response and a continuous explanatory variable • log(p / (1 – p)) = β0 + β1X • Where, for example, p is the probability of being healthy at different levels of drug concentration, X
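Assuming the model has the usual form log(p / (1 – p)) = β0 + β1X, a predicted log-odds value can be turned back into a probability by inverting the log-odds transformation. The sketch below shows that inversion; the function names are hypothetical, and no fitted coefficients from the slides are used.

```python
import math

def probability_from_log_odds(log_odds_value):
    """Invert log(p / (1 - p)): p = 1 / (1 + exp(-log_odds))."""
    return 1.0 / (1.0 + math.exp(-log_odds_value))

def predicted_probability(b0, b1, x):
    """Probability implied by a simple logistic model with intercept b0 and slope b1."""
    return probability_from_log_odds(b0 + b1 * x)

# Log-odds of 0 corresponds to even odds, i.e. p = 0.5.
print(probability_from_log_odds(0.0))                     # 0.5
# Log-odds of ln(3) corresponds to odds of 3, i.e. p = 0.75 (up to rounding).
print(round(probability_from_log_odds(math.log(3)), 2))   # 0.75
```

Because the log-odds are linear in X, the probability itself follows an S-shaped curve in X and always stays between 0 and 1.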

  12. Example: Simple Logistic Regression • Look at the effect of drug concentration on probability of NOT having disease (i.e. being healthy) • Use SPSS to do the regression (we’ll all do this soon) • Get

  13. Looks Like

  14. Interpreting parameters (b0 and b1) in logistic regression is a little tricky • An increase of 1mg of concentration increases the log(odds) of being healthy by 0.106 • An increase of 1mg of concentration multiplies the odds of being healthy by e^0.106 = 1.11 • That is, increasing concentration by 1mg increases the odds of being healthy by a factor of 1.11
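The step from the slope on the log-odds scale to a multiplicative factor on the odds scale is just exponentiation. A minimal sketch using the slope of 0.106 reported on this slide (the variable names are hypothetical):

```python
import math

b1 = 0.106  # slope on the log(odds) scale, per 1 mg of concentration

# Adding b1 to the log-odds multiplies the odds by exp(b1).
odds_ratio_per_mg = math.exp(b1)
print(round(odds_ratio_per_mg, 2))   # 1.11

# The factor compounds: a 5 mg increase multiplies the odds by exp(5 * b1).
print(round(math.exp(5 * b1), 2))    # 1.7
```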

  15. Slope Parameter • Parameter β1 in general: • if positive then increasing X increases the odds of the outcome • if negative then increasing X decreases the odds of the outcome • the larger (in magnitude) β1 is, the larger the effect of X on p • Like simple linear regression, can test whether or not β1 is significantly different from 0.

  16. Let’s break to do simple Logistic Regression • Open XYZ.sav in SPSS • Fit logistic regression with • PTSD (Y/N) as response variable • Self-BLAME as explanatory variable • Is the effect of Self-BLAME significant? • Get parameter estimates • Write equation of model • What are the odds of having PTSD given a Self-BLAME score of 3? • Use the interpretation of the regression coefficient to work out the odds given a Self-BLAME score of 4.

  17. Logistic Regression Part II – Multiple Logistic Regression

  18. Multiple Linear Regression • Simple Linear Regression extended out to more than one explanatory variable • Example • Effect of both concentration and age on reaction time • Effect of age, number of previous operations, time in anaesthesia, cholesterol level, etc. on number of years of post-operation survival

  19. Multiple Linear Regression • RT (ms) = β0 + β1 × concentration (mg) + β2 × age + β3 × gender (0=male, 1=female) • β0 is the value of RT when all explanatory variables are 0 • β1 is the change in RT associated with a 1 mg change in concentration, holding age and gender fixed • β2 is the change in RT associated with a 1 year change in age, holding concentration and gender fixed • β3 is the change in RT associated with going from male to female, holding concentration and age fixed

  20. Multiple Logistic Regression • Look at the effect of drug concentration, age and gender on probability of NOT having disease • log(p / (1 – p)) = β0 + β1X1 + β2X2 + β3X3 • Where p is the probability of not having the disease, X1 is the concentration of drug (mg), X2 is age (years), and X3 is gender (0 for males, 1 for females)

  21. Again, use SPSS to fit logistic model • Increasing concentration increases odds of not having the disease (again, being healthy) • Increasing age decreases odds of being healthy • “Increasing” gender (from male to female) increases odds of being healthy • In particular, each additional year of age multiplies the odds of being healthy by 0.95 • Going from M to F multiplies the odds by 1.001

  22. Was it worth adding the factors? • When we add parameters we make our model more complicated. • We really want this addition to be “worth it” • In other words, adding age and gender should improve our explanation of disease • But what constitutes an improvement?

  23. Was it worth adding the factors? • Quality (badness) of model fit is given by -2logL • If we want to see whether it was worth adding parameters, we can compare the quality of fit of the simple and the more complex model • Quality of model fit follows a chi-square (χ2) distribution with degrees of freedom (df) equal to the number of parameters in the model • The difference in quality of fit between the two models also follows a χ2 distribution, with df equal to the difference in the number of parameters between the two models

  24. Was it worth adding these factors? • Simple logistic regression model has overall χ2 of 45.7 • This multiple logistic regression model with 2 extra parameters has χ2 of 40.02 • Test whether the change in χ2 = 45.7 - 40.02 = 5.68 is a significant improvement • Critical χ2 for 2 df (at α = .05) is 5.99 • Our χ2 is smaller and so NO, not worth it

  25. BUT… • It doesn’t look like gender is having much of an effect • Check SPSS output and see that Wald χ2 for Gender is 0.527, which has p = .47 • Perhaps it wasn’t worth adding both parameters, but it will be worth just adding Age • Age has Wald χ2 = 4.33, p = .03 • When we only add Age, change in χ2 = 5.5 and we test against χ2 with df of 1, which has p = .02
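The two model comparisons in slides 24 and 25 can be checked without statistical tables, because the upper-tail probability of a χ2 distribution has a simple closed form for 1 and 2 degrees of freedom. This is a minimal sketch using only the standard library; the function name chi2_upper_tail is hypothetical, and for other df values a statistics package would be needed.

```python
import math

def chi2_upper_tail(x, df):
    """P(chi-square with df degrees of freedom > x), closed forms for df = 1 or 2."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only supports df = 1 or df = 2")

# Slide 24: adding age and gender (2 extra parameters).
change_both = 45.7 - 40.02
p_both = chi2_upper_tail(change_both, df=2)
print(round(p_both, 3))   # 0.058, above .05, so not a significant improvement

# Slide 25: adding age only (1 extra parameter), change in chi-square of 5.5.
p_age = chi2_upper_tail(5.5, df=1)
print(round(p_age, 2))    # 0.02, below .05, so worth adding
```

Comparing p to .05 is equivalent to comparing the change in χ2 to the critical value (5.99 for 2 df).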

  26. Logistic Regression Model Building • What if we have a whole host of possible explanatory variables? • We want to build a model which predicts whether a person will have a disease given a set of explanatory variables • SAME approaches as multiple linear regression • Forward selection • Backward elimination • Stepwise • All subsets • Hierarchical

  27. How to know if a model is good • All about having a model which does a good job of appropriately classifying participants as having disease or not • In particular, model predicts how many people have disease and how many people don’t have the disease • The model can be • Correct in two ways • Correctly categorise a person who has a disease as having a disease • Correctly say no disease when no disease • Incorrect in two ways • Incorrectly categorise a person who has a disease as not having a disease • Incorrectly say disease when no disease

  28. Accuracy of model • Proportion of correct classifications • Number of correct disease participants plus number of correct no disease participants divided by number of participants in total

  29. Sensitivity of model • Proportion of ‘successes’ correctly identified • Number of correct no disease participants divided by total number of no disease participants

  30. Specificity of model • Proportion of ‘failures’ correctly identified • Number of correct disease participants divided by total number of disease participants
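The three measures in slides 28 to 30 can be computed from the four cells of a classification table. The sketch below follows the slides' convention that a 'success' is a no disease participant; the function name and all counts are hypothetical and chosen only for illustration, not taken from any data set in this presentation.

```python
def classification_summary(correct_no_disease, wrong_no_disease,
                           correct_disease, wrong_disease):
    """Accuracy, sensitivity and specificity, with 'no disease' as the success.

    correct_no_disease: no disease participants classified as no disease
    wrong_no_disease:   no disease participants classified as disease
    correct_disease:    disease participants classified as disease
    wrong_disease:      disease participants classified as no disease
    """
    total = (correct_no_disease + wrong_no_disease
             + correct_disease + wrong_disease)
    accuracy = (correct_no_disease + correct_disease) / total
    sensitivity = correct_no_disease / (correct_no_disease + wrong_no_disease)
    specificity = correct_disease / (correct_disease + wrong_disease)
    return accuracy, sensitivity, specificity

# Hypothetical counts for 100 participants (50 no disease, 50 disease).
accuracy, sensitivity, specificity = classification_summary(40, 10, 35, 15)
print(accuracy, sensitivity, specificity)   # 0.75 0.8 0.7
```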

  31. Now…a real example • Startup, Makgekgenene and Webster (2007) looked at whether or not the subscales of the Posttraumatic Cognitions Inventory (PTCI) are good predictors of Posttraumatic Stress Disorder (PTSD) • Subscales are • Negative Cognitions About SELF • Negative Cognitions about the WORLD • Self-BLAME

  32. Descriptive Results • PTSD participants showed higher scores than non-PTSD participants on all three subscales

  33. Multiple Logistic Regression • Response variable: • whether or not the participant has PTSD • Explanatory variables: • Negative Cognitions About SELF • Negative Cognitions about the WORLD • Self-BLAME

  34. Let’s do the Logistic Regression • Open XYZ.sav in SPSS • Run the appropriate regression • What are the parameter estimates for our three explanatory variables? • Which of these are significant (at α = .05)? • What are the odds ratios for those that are significant? • Anything unusual?

  35. Self-BLAME • Self-BLAME has a negative coefficient (an odds ratio below 1). • This means that increasing self-blame decreases the odds of having PTSD • This is surprising, especially since participants with PTSD showed higher Self-BLAME scores • What’s going on?

  36. Self-BLAME and SELF scales • Startup et al. (2007) explain this by stating that Self-BLAME is made up of both behavioural and characterological questions • SELF, however, may also tap into characterological aspects of self-blame • Behavioural self-blame can be considered adaptive. It may help avoid PTSD • Characterological self-blame, however, may be detrimental, and lead to PTSD

  37. Suppressor Effect • The relationship between SELF and PTSD is strong, and accounts for the detrimental side of self-blame. This includes the effect of characterological self-blame. • The variation in PTSD that is left for Self-BLAME to account for is the adaptive (behavioural) aspect of the relationship between the Self-BLAME scores and PTSD. • The detrimental aspect of Self-BLAME scores has been suppressed (already accounted for by SELF). The adaptive aspect of Self-BLAME can now come out.

  38. Homework (haha) • Evaluate the model by looking at • Accuracy of model’s predictions • Sensitivity of model’s predictions • Specificity of model’s predictions
