Multilevel Modeling-Logistic


  1. Multilevel Modeling-Logistic Raul Cruz-Cano, HLTH653 Spring 2013

  2. Schedule
     • 3/18/2013 = Spring Break
     • 3/25/2013 = Longitudinal Analysis
     • 4/1/2013 = Midterm (Exercises 1-5, not Longitudinal)

  3. Introduction
     • Just as with linear regression, logistic regression allows you to look at the effect of multiple predictors on an outcome.
     • Consider the following example: 15- and 16-year-old adolescents were asked if they have ever had sexual intercourse.
     • The outcome of interest is intercourse.
     • The predictors are race (white and black) and gender (male and female).
     Example from Agresti, A. Categorical Data Analysis, 2nd ed. 2002.

  4. Here is a table of the data:
                          Intercourse
     Race    Gender      Yes     No
     White   Male         43    134
     White   Female       26    149
     Black   Male         29     23
     Black   Female       22     36

  5. Data Set Intercourse
         DATA intercourse;
           INPUT white male intercourse count;
           DATALINES;
         1 1 1 43
         1 1 0 134
         1 0 1 26
         1 0 0 149
         0 1 1 29
         0 1 0 23
         0 0 1 22
         0 0 0 36
         ;
         RUN;

  6. SAS:
         PROC LOGISTIC DATA = intercourse descending;
           weight count;
           MODEL intercourse = white male / rsquare lackfit;
         RUN;
     • “descending” models the probability that intercourse = 1 (yes) rather than = 0 (no).
     • “rsquare” requests the generalized R2 value from SAS; it is read in the same spirit as the R2 from linear regression, although it is likelihood-based rather than a literal proportion of variance.
     • “lackfit” requests the Hosmer and Lemeshow Goodness-of-Fit Test. This tells you if the model you have created is a good fit for the data.

  7. SAS Output: R2

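  8. Interpreting the R2 value
     The R2 value is 0.9907. Read like the R2 from linear regression, this says that 99.07% of the variability in our outcome (intercourse) is explained by including gender and race in our model. Treat the figure with caution: PROC LOGISTIC reports a likelihood-based generalized R2, which is not a literal share of variance, and because the counts enter through WEIGHT rather than FREQ the value appears to be computed as if there were only 8 observations.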
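For reference, the statistic requested by “rsquare” is usually written as a likelihood-based generalized R2, together with a max-rescaled version. The notation below is standard rather than taken from the slides: L(0) is the likelihood of the intercept-only model, L(β̂) is the likelihood of the fitted model, and n is the sample size.

```latex
R^2 = 1 - \left\{ \frac{L(0)}{L(\hat{\beta})} \right\}^{2/n},
\qquad
R^2_{\max} = 1 - \{ L(0) \}^{2/n},
\qquad
\tilde{R}^2 = \frac{R^2}{R^2_{\max}}
```

Because n appears in the exponent, the same likelihood ratio gives a very different R2 depending on how many observations are counted.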

  9. PROC LOGISTIC Output
     Controlling for race, the odds of having had intercourse are 1.911 times as high for males as for females.
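To see where an odds ratio like 1.911 comes from, the main-effects model can be refit outside SAS. The following is a minimal Python sketch, not the PROC LOGISTIC implementation: it runs Newton-Raphson on the grouped binomial likelihood using the counts from the data step. The function names (`solve`, `fit_logistic`) and the fixed iteration count are choices made for this illustration.

```python
import math

# Grouped data from the DATA step: (white, male, yes_count, no_count)
CELLS = [
    (1, 1, 43, 134),
    (1, 0, 26, 149),
    (0, 1, 29, 23),
    (0, 0, 22, 36),
]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_logistic(cells, iterations=25):
    """Fit logit(p) = b0 + b1*white + b2*male by Newton-Raphson."""
    beta = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        score = [0.0] * 3
        info = [[0.0] * 3 for _ in range(3)]
        for white, male, yes, no in cells:
            x = (1.0, float(white), float(male))
            n = yes + no
            eta = sum(b * xi for b, xi in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-eta))
            for i in range(3):
                score[i] += x[i] * (yes - n * p)          # gradient of log-likelihood
                for j in range(3):
                    info[i][j] += x[i] * x[j] * n * p * (1.0 - p)  # information matrix
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


beta = fit_logistic(CELLS)
print("odds ratio, male vs female:", round(math.exp(beta[2]), 3))
print("odds ratio, white vs black:", round(math.exp(beta[1]), 3))
```

The male-versus-female odds ratio from this fit should land very close to the 1.911 reported on the slide, since both come from maximizing the same likelihood.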

  10. Hosmer and Lemeshow GOF Test

  11. H-L GOF Test
     The Hosmer and Lemeshow Goodness-of-Fit Test tests the hypotheses:
     H0: the model is a good fit, vs. Ha: the model is NOT a good fit
     With this test, we want to FAIL to reject the null hypothesis, because that means our model is a good fit (this is different from most of the hypothesis testing you have seen).
     Look for a p-value > 0.10 in the H-L GOF test. This indicates the model is a good fit.
     In this case, the p-value = 0.2419, so we do NOT reject the null hypothesis, and we conclude the model is a good fit.

  12. Model Selection in SAS
     • Often, if you have multiple predictors and interactions in your model, SAS can systematically select significant predictors using forward selection, backwards selection, or stepwise selection.
     • In forward selection, SAS starts with no predictors in the model. It then selects the predictor with the smallest p-value and adds it to the model. It then selects the predictor with the smallest p-value among the remaining variables and adds it to the model. It continues doing this until no remaining predictor has a p-value less than 0.05.
     • In backwards selection, SAS starts with all of the predictors in the model and eliminates the non-significant predictors one at a time, refitting the model after each elimination. It stops once all the predictors remaining in the model are statistically significant.

  13. Forward Selection in SAS
     We will let SAS select a model for us out of the three predictors: white, male, white*male. Type the following code into SAS:
         PROC LOGISTIC DATA = intercourse descending;
           weight count;
           MODEL intercourse = white male white*male / selection = forward lackfit;
         RUN;

  14. Output from Forward Selection: “white” is added to the model

  15. “male” is added to the model

  16. No more predictors are found to be statistically significant

  17. The Final Model:

  18. Hosmer and Lemeshow GOF Test: The model is a good fit

  19. Multilevel Modeling (refresher)
     • Multi-level modeling takes into account the hierarchical structure of the data (e.g. decedents clustered within occupations as in our data).
     • Such data structure is subject to intra-class correlation, whereby individuals within the same group are more alike than individuals across groups.
     • Analysis that ignores this intra-class correlation may underestimate the standard error of the regression coefficient of the aggregate risk factor, leading to overestimation of the significance of the risk factor.
     • To illustrate the above point, we conducted our analysis using two approaches.
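A rough feel for how much a standard error can be understated comes from the Kish design effect, 1 + (m − 1)ρ, which applies to equal-sized clusters of size m with intra-class correlation ρ. The Python sketch below is an illustration under that simplifying assumption, not the analysis from the slides; the cluster size, ICC, and naive standard error are hypothetical values.

```python
import math


def design_effect(cluster_size, icc):
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * rho."""
    return 1.0 + (cluster_size - 1) * icc


def corrected_se(naive_se, cluster_size, icc):
    """Inflate a standard error that was computed as if observations were independent."""
    return naive_se * math.sqrt(design_effect(cluster_size, icc))


# Hypothetical inputs: 20 individuals per group, ICC of 0.05, naive SE of 0.10
deff = design_effect(cluster_size=20, icc=0.05)
se = corrected_se(naive_se=0.10, cluster_size=20, icc=0.05)
print(round(deff, 2), round(se, 4))
```

Even a small intra-class correlation nearly doubles the variance here, which is why a cluster-level risk factor can look more significant than it should when clustering is ignored.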

  20. 1st Approach
     • Fit a multiple logistic regression model on the combined data with PROC LOGISTIC.
     • The dependent variable is death from injury (yes/no).
     • The risk factor of interest is exposure to hazardous equipment at work (high/low).
     • Confounders included are gender, race (white/black/other), age (continuous, centered) and a quadratic term for age.
     • This model ignores the hierarchical structure of the data, and treats aggregate exposure as if it were measured at the individual level.
     The model is expressed by the following equation
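Written out, a logistic model with the predictors listed above could take the following form. This is a sketch consistent with the slide text, not a copy of the slide's own equation: the coefficient names β0 through β6 and the coding of race as two indicator variables (black, other, with white as reference) are assumptions made for illustration. Exposure carries only the occupation index i because it is an aggregate measure.

```latex
\operatorname{logit}(p_{ij})
  = \log\frac{p_{ij}}{1 - p_{ij}}
  = \beta_0
  + \beta_1\,\text{exposure}_{i}
  + \beta_2\,\text{gender}_{ij}
  + \beta_3\,\text{black}_{ij}
  + \beta_4\,\text{other}_{ij}
  + \beta_5\,\text{age}_{ij}
  + \beta_6\,\text{age}_{ij}^{2}
```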

  21. 1st Approach
     • p_ij is the expected probability of death from injury for the jth individual of the ith occupation, conditional on the predictor variables.
         proc logistic data=noms.combined descending;
           class exposure gender race;
           model injury = exposure gender race age age*age;
         run;

  22. Multilevel Example
     • Allison, 2006
     • The sample consists of 1151 girls from the National Longitudinal Survey of Youth who were interviewed annually for nine years, beginning in 1979. For this example, we use five yearly observations per girl.
     • The response variable POV has a value of 1 if the girl’s household was in poverty (as defined by U.S. federal standards) in a given year, otherwise 0.
     • The predictor variables are:
       • AGE: Age in years at the first interview
       • BLACK: 1 if respondent is black, otherwise 0
       • MOTHER: 1 if respondent currently had at least one child, otherwise 0
       • SPOUSE: 1 if respondent is currently living with a spouse, otherwise 0
       • INSCHOOL: 1 if respondent is currently enrolled in school, otherwise 0
       • HOURS: Hours worked during the week of the survey

  23. Multilevel Example
     • 5755 observations, five for each of the 1151 girls
     • The CLASS statement declares YEAR to be a categorical variable, with the highest year (year 5) being the reference category.
     • The STRATA statement says that each girl is a separate stratum, which has the consequence of grouping together the five observations for each girl in the process of constructing the likelihood function.
         PROC LOGISTIC DATA=teenyrs5 DESC;
           CLASS year;
           MODEL pov = year mother spouse inschool hours;
           STRATA id;
         RUN;
     In PROC LOGISTIC there is no CLUSTER, just CLASS and STRATA.

  24. Multilevel Example
     • In the “Analysis of Maximum Likelihood Estimates” panel, we see that motherhood and school enrollment increase the risk of poverty while living with a husband and working more hours reduce the risk.
     • The last panel gives the odds ratios.
     • We see that motherhood increases the odds of poverty by an estimated 79 percent.
     • Living with a husband cuts the odds approximately in half.
     • Each additional hour of employment per week reduces the odds by about 2 percent.
     • Keep in mind that these estimates control for all stable characteristics of the girls, including such things as race, intelligence, place of birth and parents’ education.
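The percentages above are odds ratios restated as percent changes, and per-unit effects such as the one for hours compound multiplicatively. A small Python sketch of that arithmetic follows; the odds ratios in it are rounded values read off the slide text (1.79, about 0.5, about 0.98), not the exact SAS output, and the function names are illustrative.

```python
def percent_change(odds_ratio):
    """Convert an odds ratio into a percent change in the odds."""
    return (odds_ratio - 1.0) * 100.0


def compound(odds_ratio_per_unit, units):
    """Odds ratio for a change of several units in a continuous predictor."""
    return odds_ratio_per_unit ** units


# Approximate odds ratios taken from the slide text (assumed roundings)
ODDS_RATIOS = {"mother": 1.79, "spouse": 0.50, "hours": 0.98}

for name, ratio in ODDS_RATIOS.items():
    print(name, round(percent_change(ratio), 1), "percent change in odds")

# Working 20 more hours per week multiplies the odds by 0.98 ** 20
twenty_hours = compound(ODDS_RATIOS["hours"], 20)
print("20 extra hours:", round(twenty_hours, 3))
```

Under these rounded values, 20 extra hours per week corresponds to roughly a one-third reduction in the odds of poverty, not 40 percent, because the 2 percent reductions multiply rather than add.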

  25. Multilevel Example
     • The next model, for example, includes the interaction between MOTHER and BLACK.
         PROC LOGISTIC DATA=teenyrs5 DESC;
           CLASS year;
           MODEL pov = year mother spouse inschool hours mother*black;
           STRATA id;
         RUN;

  26. Multilevel Example
     • The interaction is statistically significant at the .05 level.
     • For nonblack girls, the effect of motherhood is to increase the odds of poverty by a factor of exp(.9821) = 2.67.
     • For black girls, on the other hand, the effect of motherhood is to increase the odds of poverty by a factor of exp(.9821 - .5989) = 1.47.
     • Thus, motherhood has a larger effect on poverty status among nonblack girls than among black girls.
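The two factors quoted above can be checked directly from the coefficients. In the Python sketch below, 0.9821 is taken as the MOTHER coefficient and -0.5989 as the MOTHER*BLACK coefficient, as implied by the slide; the function name is illustrative.

```python
import math

B_MOTHER = 0.9821         # coefficient for MOTHER, from the slide
B_MOTHER_BLACK = -0.5989  # coefficient for MOTHER*BLACK, from the slide


def motherhood_odds_ratio(black):
    """Odds ratio for motherhood when BLACK equals 0 or 1."""
    return math.exp(B_MOTHER + B_MOTHER_BLACK * black)


nonblack = motherhood_odds_ratio(0)
black = motherhood_odds_ratio(1)
print(round(nonblack, 2), round(black, 2))

# The ratio of the two odds ratios is exp of the interaction coefficient
print(round(black / nonblack, 3))
```

The interaction coefficient is the log of the ratio between the two group-specific odds ratios, so a negative value means a smaller motherhood effect among black girls.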

  27. SAS Weighted Example
     • A random sample of 300 students from each of the four classes: freshman, sophomore, junior, and senior.
         /* Define the formats first so the DATA steps below can use them */
         proc format;
           value Design 1='A' 2='B' 3='C';
           value Rating 1='dislike very much'
                        2='dislike'
                        3='neutral'
                        4='like'
                        5='like very much';
           value Class 1='Freshman' 2='Sophomore'
                       3='Junior' 4='Senior';
         run;

         /* One row per Class x Design x Rating cell, with its count */
         data WebSurvey;
           format Class Class. Design Design. Rating Rating.;
           do Class=1 to 4;
             do Design=1 to 3;
               do Rating=1 to 5;
                 input Count @@;
                 output;
               end;
             end;
           end;
           datalines;
         10 34 35 16 15   8 21 23 26 22   5 10 24 30 21
          1 14 25 23 37  11 14 20 34 21  16 19 30 23 12
         19 12 26 18 25  11 14 24 33 18  10 18 32 23 17
          8 15 35 30 12  15 22 34  9 20   2 34 30 18 16
         ;
         run;

         /* Sampling weight = class enrollment / students sampled per class */
         data WebSurvey;
           set WebSurvey;
           if Class=1 then Weight=3734/300;
           if Class=2 then Weight=3565/300;
           if Class=3 then Weight=3903/300;
           if Class=4 then Weight=4196/300;
         run;

         /* Population totals per stratum, used by PROC SURVEYLOGISTIC */
         data Enrollment;
           format Class Class.;
           input Class _TOTAL_;
           datalines;
         1 3734
         2 3565
         3 3903
         4 4196
         ;
         run;
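The weights in the second DATA step are simply each class's enrollment divided by the 300 students sampled from it, so that every sampled student stands in for that many students in the population. A minimal Python sketch of the same arithmetic, using the enrollment totals from the slide (the names `ENROLLMENT` and `sampling_weights` are illustrative):

```python
SAMPLE_PER_CLASS = 300

# Enrollment totals by class (1=Freshman ... 4=Senior), from the Enrollment data set
ENROLLMENT = {1: 3734, 2: 3565, 3: 3903, 4: 4196}


def sampling_weights(enrollment, n_sampled):
    """Weight for each stratum = population size / number sampled."""
    return {cls: total / n_sampled for cls, total in enrollment.items()}


weights = sampling_weights(ENROLLMENT, SAMPLE_PER_CLASS)
for cls, weight in weights.items():
    print(cls, round(weight, 4))

# Summing the weights over all sampled students recovers the population size
weighted_total = sum(w * SAMPLE_PER_CLASS for w in weights.values())
print(round(weighted_total))
```

Because the weighted sample adds up to the full enrollment, estimates computed with these weights describe the whole student body and not only the 1200 respondents.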

  28. PROC Logistic
         proc logistic data=WebSurvey;
           freq Count;
           class Design;
           model Rating (ref='neutral') = Design;
           weight Weight;
         run;

  29. PROC surveylogistic
     If you want “better” results:
         proc surveylogistic data=WebSurvey total=Enrollment;
           freq Count;
           class Design;
           model Rating (ref='neutral') = Design;
           stratum Class;
           weight Weight;
         run;
     For the ratings of Design B vs. Design C, compare the point estimate and the 95% confidence interval from the two procedures.

  30. More to come…
     • There are also mixed effects logistic models, which will be studied later.

  31. References
     • Paul D. Allison (2006). Fixed Effects Regression Methods In SAS. SUGI 31 Proceedings, Paper 184-31.
     • Jia Li, Toni Alterman, James A. Deddens (2006). Analysis of Large Hierarchical Data with Multilevel Logistic Modeling Using PROC GLIMMIX In SAS. SUGI 31 Proceedings, Paper 151-31.
     • David L. Cassell (2006). Wait Wait, Don't Tell Me… You're Using the Wrong Proc! SUGI 31 Proceedings, Paper 193-31.
