Logistic regression

eeloise

Presentation Transcript


  1. Logistic regression

  2. Recall the simple linear regression model: $y = \beta_0 + \beta_1 x + \varepsilon$, where we are trying to predict a continuous dependent variable $y$ from a continuous independent variable $x$. This model can be extended to the multiple linear regression model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon$. Here we are trying to predict a continuous dependent variable $y$ from several continuous independent variables $x_1, x_2, \dots, x_p$.

  3. Now suppose the dependent variable $y$ is binary: it takes on two values, “Success” (1) or “Failure” (0). We are interested in predicting $y$ from a continuous independent variable $x$. This is the situation in which logistic regression is used.

  4. Example: We are interested in how the success ($y$) of a new antibiotic cream in curing “acne problems” depends on the amount ($x$) that is applied daily. The values of $y$ are 1 (Success) or 0 (Failure). The values of $x$ range over a continuum.

  5. The Logistic Regression Model. Let $p$ denote $P[y = 1] = P[\text{Success}]$. This quantity will increase with the value of $x$. The ratio $\frac{p}{1-p}$ is called the odds ratio. This quantity will also increase with the value of $x$, ranging from zero to infinity. The quantity $\ln\left(\frac{p}{1-p}\right)$ is called the log odds ratio.

  6. Example: odds ratio, log odds ratio. Suppose a die is rolled: Success = “roll a six”, $p = 1/6$. The odds ratio is $\frac{1/6}{5/6} = \frac{1}{5} = 0.2$. The log odds ratio is $\ln(0.2) \approx -1.609$.
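A minimal Python sketch of the two definitions, reproducing the die example. The helper names `odds` and `log_odds` are illustrative only; they do not come from the slides.

```python
import math

def odds(p: float) -> float:
    """Odds ratio as defined in the slides: p / (1 - p), for 0 <= p < 1."""
    return p / (1.0 - p)

def log_odds(p: float) -> float:
    """Log odds ratio: natural log of p / (1 - p), for 0 < p < 1."""
    return math.log(odds(p))

p_six = 1 / 6                      # P[Success] = P[roll a six]
print(round(odds(p_six), 4))       # 0.2
print(round(log_odds(p_six), 4))   # -1.6094
```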

  7. The Logistic Regression Model assumes the log odds ratio is linearly related to $x$, i.e.: $\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x$. In terms of the odds ratio: $\frac{p}{1-p} = e^{\beta_0 + \beta_1 x}$.

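  8. The Logistic Regression Model: solving for $p$ in terms of $x$: $p = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$ or, equivalently, $p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$.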
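The closed form for $p$ can be checked numerically. This sketch assumes hypothetical parameter values $\beta_0 = -3$ and $\beta_1 = 1.5$ (not estimates from any data in these slides) and confirms that taking the log odds of $p$ recovers the linear predictor $\beta_0 + \beta_1 x$. The function names are illustrative.

```python
import math

def logistic_p(x: float, b0: float, b1: float) -> float:
    """P[y = 1] under the logistic model: 1 / (1 + exp(-(b0 + b1*x)))."""
    eta = b0 + b1 * x                  # linear predictor = log odds ratio
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    # Algebraically identical form e^eta / (1 + e^eta); avoids overflow
    # in math.exp when eta is very negative.
    z = math.exp(eta)
    return z / (1.0 + z)

def logit(p: float) -> float:
    """Log odds ratio ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

b0, b1 = -3.0, 1.5                     # hypothetical values, for illustration only
for x in (0.0, 2.0, 4.0):
    p = logistic_p(x, b0, b1)
    # logit(p) should reproduce b0 + b1*x (up to rounding)
    print(x, round(p, 4), round(logit(p), 4))
```

The two branches compute the same $p$; splitting on the sign of the linear predictor only keeps `math.exp` from overflowing for extreme values of $x$.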

  9. Interpretation of the parameter $\beta_0$ (determines the intercept): at $x = 0$, $p = \frac{e^{\beta_0}}{1 + e^{\beta_0}}$. [Figure: p plotted against x]

  10. Interpretation of the parameter $\beta_1$ (determines when $p$ is 0.50, along with $\beta_0$): $p = 0.50$ when $\beta_0 + \beta_1 x = 0$, i.e. when $x = -\beta_0/\beta_1$.

  11. Also, when $x = -\beta_0/\beta_1$, $\frac{dp}{dx} = \beta_1 p(1-p) = \frac{\beta_1}{4}$ is the rate of increase in $p$ with respect to $x$ when $p = 0.50$.
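Why $\beta_1/4$: differentiating $p = 1/(1 + e^{-(\beta_0 + \beta_1 x)})$ gives $\frac{dp}{dx} = \beta_1 p(1-p)$, and at $p = 0.50$ this is $\beta_1 \times 0.25$. The sketch below checks this with a central finite difference, using the same hypothetical values $\beta_0 = -3$, $\beta_1 = 1.5$.

```python
import math

def logistic_p(x, b0, b1):
    """P[y = 1] = 1 / (1 + exp(-(b0 + b1*x)))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

b0, b1 = -3.0, 1.5                     # hypothetical values
x50 = -b0 / b1                         # x at which p = 0.50
h = 1e-6                               # step for the central difference
slope = (logistic_p(x50 + h, b0, b1) - logistic_p(x50 - h, b0, b1)) / (2 * h)

print(x50, logistic_p(x50, b0, b1))    # 2.0 0.5
print(slope, b1 / 4)                   # numerical slope is approximately b1/4
```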

  12. Interpretation of the parameter $\beta_1$ (determines the slope when $p$ is 0.50). [Figure: p plotted against x]

  13. The data. For each case, the data will consist of: • a value for $x$, the continuous independent variable; • a value for $y$ (1 or 0) (Success or Failure). Total of $n = 250$ cases.

  14. Estimation of the parameters. The parameters are estimated by maximum likelihood estimation, which requires a statistical package such as SPSS.
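SPSS's own fitting routine is not shown in these slides. As a rough illustration of what maximum likelihood estimation involves, the following sketch maximizes the log-likelihood of the one-predictor model by Newton-Raphson iteration in plain Python. It runs on simulated data (250 cases generated from hypothetical values $\beta_0 = -3$, $\beta_1 = 1.5$), not on the acne-cream data, and the name `fit_logistic` is illustrative.

```python
import math
import random

def fit_logistic(xs, ys, iters=25, tol=1e-10):
    """Maximum likelihood estimates for ln(p/(1-p)) = b0 + b1*x via Newton-Raphson.
    Returns (b0, b1, se0, se1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = 0.0                  # score (gradient of the log-likelihood)
        i00 = i01 = i11 = 0.0          # information matrix (negative Hessian)
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            i00 += w
            i01 += w * x
            i11 += w * x * x
        det = i00 * i11 - i01 * i01
        # Newton step = (information)^-1 * score, using the explicit 2x2 inverse
        d0 = (i11 * g0 - i01 * g1) / det
        d1 = (-i01 * g0 + i00 * g1) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    # Standard errors: square roots of the diagonal of the inverse information,
    # taken from the final iteration (essentially at the MLE once converged).
    se0 = math.sqrt(i11 / det)
    se1 = math.sqrt(i00 / det)
    return b0, b1, se0, se1

random.seed(1)
true_b0, true_b1 = -3.0, 1.5           # hypothetical values used to simulate data
xs = [random.uniform(0.0, 4.0) for _ in range(250)]
ys = [1 if random.random() < 1.0 / (1.0 + math.exp(-(true_b0 + true_b1 * x))) else 0
      for x in xs]

b0, b1, se0, se1 = fit_logistic(xs, ys)
print(f"b0 = {b0:.3f} (S.E. {se0:.3f}), b1 = {b1:.3f} (S.E. {se1:.3f})")
```

At convergence the score is essentially zero, and the standard errors come from the inverse of the information matrix, which is the usual source of an "S.E." column in logistic regression output.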

  15. Using SPSS to perform Logistic regression Open the data file:

  16. Choose from the menu: Analyze -> Regression -> Binary Logistic

  17. The following dialogue box appears. Select the dependent variable ($y$) and the independent variable ($x$) (covariate). Press OK.

  18. Here is the output: the estimates and their standard errors (S.E.).

  19. The parameter Estimates

  20. Interpretation of the parameter $\beta_0$ (determines the intercept). Interpretation of the parameter $\beta_1$ (determines when $p$ is 0.50, along with $\beta_0$).

  21. Another interpretation of the parameter $\beta_1$: $\beta_1/4$ is the rate of increase in $p$ with respect to $x$ when $p = 0.50$.

  22. The Multiple Logistic Regression model

  23. Here we attempt to predict the outcome of a binary response variable $Y$ from several independent variables $X_1, X_2, \dots$
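In the multiple model the log odds ratio is assumed to be linear in all the predictors, $\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k$. A small sketch of the resulting probability; the function name is illustrative, and the coefficients in the usage lines are made up (they are not the BPD estimates).

```python
import math

def multiple_logistic_p(x, beta):
    """P[Y = 1] when ln(p/(1-p)) = beta[0] + beta[1]*x[0] + ... + beta[k]*x[k-1]."""
    if len(beta) != len(x) + 1:
        raise ValueError("need one coefficient per predictor plus an intercept")
    eta = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))   # linear predictor
    return 1.0 / (1.0 + math.exp(-eta))

beta = [-1.0, 0.5, -0.25]              # hypothetical coefficients, not fitted
print(multiple_logistic_p([2.0, 4.0], beta))   # eta = -1 + 1 - 1 = -1, so p = 1/(1 + e)
```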

  24. Multiple Logistic Regression: an example. In this example we are interested in determining the risk that infants born prematurely develop BPD (bronchopulmonary dysplasia). More specifically, we are interested in developing a predictive model that will determine the probability of developing BPD from $X_1$ = gestational age and $X_2$ = birth weight.

  25. For $n = 223$ infants in the prenatal ward, the following measurements were determined: • $X_1$ = gestational age (weeks), • $X_2$ = birth weight (grams), and • $Y$ = presence of BPD.

  26. The data

  27. The results

  28. Graph: Showing Risk of BPD vs GA and BrthWt

  29. Discrete Multivariate Analysis: Analysis of Multivariate Categorical Data

  30. Example 1. In this study we examine $n = 1237$ individuals, measuring $X$ = systolic blood pressure and $Y$ = serum cholesterol.

  31. Example 2 The following data was taken from a study of parole success involving 5587 parolees in Ohio between 1965 and 1972 (a ten percent sample of all parolees during this period).

  32. The study involved a dichotomous response $Y$: • Success (no major parole violation) or • Failure (returned to prison either as a technical violator or with a new conviction), based on a one-year follow-up. The predictors of parole success included: • type of committed offense (Person offense or Other offense), • Age (25 or Older or Under 25), • Prior Record (No prior sentence or Prior Sentence), and • Drug or Alcohol Dependency (No drug or Alcohol dependency or Drug and/or Alcohol dependency).

  33. The data were randomly split into two parts. The counts for each part are displayed in the table, with those for the second part in parentheses. • The second part of the data was set aside for a validation study of the model to be fitted in the first part.

  34. Table

  35. Analysis of a Two-way Frequency Table:

  36. Frequency Distribution (Serum Cholesterol and Systolic Blood Pressure)

  37. Joint and Marginal Distributions (Serum Cholesterol and Systolic Blood Pressure). The marginal distributions allow you to look at the effect of one variable, ignoring the other. The joint distribution allows you to look at the two variables simultaneously.

  38. Conditional Distributions (Systolic Blood Pressure given Serum Cholesterol). The conditional distribution allows you to look at the effect of one variable when the other variable is held fixed or known.
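The joint, marginal and conditional distributions can all be computed directly from a table of counts. The sketch below uses a small hypothetical 2 × 3 table (not the cholesterol and blood-pressure data): joint proportions divide every cell by the grand total, marginal proportions divide the row or column totals by the grand total, and conditional distributions rescale each row (or column) so that it sums to one.

```python
# Hypothetical 2 x 3 table of counts: rows are levels of X, columns are levels of Y.
# These are NOT the serum cholesterol / blood pressure counts from the slides.
counts = [
    [20, 30, 10],
    [10, 20, 10],
]

N = sum(sum(row) for row in counts)                  # grand total
row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]

joint = [[c / N for c in row] for row in counts]     # P[X = i, Y = j]
row_marginal = [r / N for r in row_totals]           # P[X = i], ignoring Y
col_marginal = [c / N for c in col_totals]           # P[Y = j], ignoring X

# Conditional distribution of Y given X = i: row i rescaled to sum to 1
y_given_x = [[c / row_totals[i] for c in row] for i, row in enumerate(counts)]
# Conditional distribution of X given Y = j: x_given_y[j][i] = P[X = i | Y = j]
x_given_y = [[counts[i][j] / col_totals[j] for i in range(len(counts))]
             for j in range(len(col_totals))]

print(col_marginal)    # [0.3, 0.5, 0.2]
print(x_given_y[1])    # [0.6, 0.4]
```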

  39. Conditional Distributions (Serum Cholesterol given Systolic Blood Pressure)

  40. GRAPH: Conditional distributions of Systolic Blood Pressure given Serum Cholesterol

  41. Notation: Let $x_{ij}$ denote the frequency (number of cases) where $X$ (row variable) is $i$ and $Y$ (column variable) is $j$.

  42. Different Models. The Multinomial Model: here the total number of cases $N$ is fixed and the $x_{ij}$ follow a multinomial distribution with parameters $p_{ij}$.

  43. The Product Multinomial Model: here the row (or column) totals $R_i$ are fixed, and for a given row $i$, the $x_{ij}$ follow a multinomial distribution with parameters $p_{j|i}$.

  44. The Poisson Model: in this case we observe over a fixed period of time, and all counts in the table (including row, column and overall totals) follow a Poisson distribution. Let $m_{ij}$ denote the mean of $x_{ij}$.

  45. Independence

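  46. Multinomial Model: if $X$ and $Y$ are independent, then $p_{ij} = p_{i\cdot}\, p_{\cdot j}$, and the estimated expected frequency in cell $(i, j)$ in the case of independence is $\hat{E}_{ij} = N \hat{p}_{i\cdot} \hat{p}_{\cdot j} = \frac{R_i C_j}{N}$, where $R_i$ and $C_j$ are the row and column totals.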

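  47. The same can be shown for the other two models, the Product Multinomial model and the Poisson model; namely, the estimated expected frequency in cell $(i, j)$ in the case of independence is $\hat{E}_{ij} = \frac{R_i C_j}{N}$ (with $R_i$ and $C_j$ the row and column totals). Standardized residuals are defined for each cell: $r_{ij} = \frac{x_{ij} - \hat{E}_{ij}}{\sqrt{\hat{E}_{ij}}}$.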
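A sketch of the expected frequencies $R_i C_j / N$ and the per-cell residuals $(x_{ij} - \hat{E}_{ij})/\sqrt{\hat{E}_{ij}}$ for a hypothetical 2 × 3 table. This residual formula is assumed here (it is the common Pearson form); some texts use an adjusted residual with a different denominator. Function names are illustrative.

```python
import math

def expected_counts(counts):
    """Estimated expected frequencies under independence: R_i * C_j / N."""
    n = sum(sum(row) for row in counts)
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    return [[r * c / n for c in col_totals] for r in row_totals]

def standardized_residuals(counts):
    """Cell residuals (x_ij - E_ij) / sqrt(E_ij) (assumed Pearson form)."""
    expected = expected_counts(counts)
    return [[(x - e) / math.sqrt(e) for x, e in zip(obs_row, exp_row)]
            for obs_row, exp_row in zip(counts, expected)]

counts = [[20, 30, 10], [10, 20, 10]]   # hypothetical counts
print(expected_counts(counts))          # [[18.0, 30.0, 12.0], [12.0, 20.0, 8.0]]
print([[round(r, 3) for r in row] for row in standardized_residuals(counts)])
```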

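  48. The Chi-Square Statistic: $\chi^2 = \sum_i \sum_j \frac{(x_{ij} - \hat{E}_{ij})^2}{\hat{E}_{ij}}$, where $\hat{E}_{ij} = R_i C_j / N$ is the estimated expected frequency under independence. The Chi-Square test for independence: reject $H_0$: independence if $\chi^2 > \chi^2_{\alpha}$ with (number of rows $- 1$)(number of columns $- 1$) degrees of freedom.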
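The statistic and test can be sketched the same way. A 2 × 3 table has $(2-1)(3-1) = 2$ degrees of freedom, and for 2 degrees of freedom the chi-square upper-tail probability has the closed form $e^{-\chi^2/2}$, so the example below needs no statistics library; for other table sizes you would take the p-value and critical value from a chi-square distribution function in a statistics package. The table is hypothetical.

```python
import math

def chi_square_independence(counts):
    """Pearson chi-square statistic for independence and its degrees of freedom."""
    n = sum(sum(row) for row in counts)
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    chi2 = 0.0
    for i, row in enumerate(counts):
        for j, x in enumerate(row):
            e = row_totals[i] * col_totals[j] / n    # expected count under H0
            chi2 += (x - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df

counts = [[20, 30, 10], [10, 20, 10]]   # hypothetical 2 x 3 table
chi2, df = chi_square_independence(counts)
alpha = 0.05

# Closed forms valid ONLY for df = 2: P[chi-square_2 > x] = exp(-x / 2),
# so the critical value at level alpha is -2 * ln(alpha).
assert df == 2
p_value = math.exp(-chi2 / 2)
critical = -2 * math.log(alpha)

print(f"chi2 = {chi2:.3f}, df = {df}, p = {p_value:.4f}")
print("reject H0: independence" if chi2 > critical else "do not reject H0")
```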

  49. Table: expected frequencies, observed frequencies, standardized residuals. $\chi^2 = 20.85$ ($p = 0.0133$)
