
Model Selections and Comparisons

Model Selections and Comparisons (Categorical Data Analysis, Ch 9.2). Yumi Kubo, Alvin Hsieh. Survey data: collected in 1992 by the Wright State University School of Medicine and United Health Services in Dayton, Ohio, from 2276 students in their last year of high school (nonurban area).


Presentation Transcript


  1. Model Selections and Comparisons (Categorical Data Analysis, Ch 9.2). Yumi Kubo, Alvin Hsieh. [Figure labels: Model 1, Model 2]

  2. Survey Data
     • Collected in 1992 by the Wright State University School of Medicine and United Health Services in Dayton, Ohio
     • 2276 students in their last year of high school (nonurban area)
     • We add more dimensions to the example in Section 8.2.4
     • Variables: Alcohol (A), Cigarette (C), Marijuana (M)
     • Added variables: Gender (G), Race (R)

  3. Association Graphs (Definitions)
     • association graph: a set of vertices, one per variable, joined by edges
     • edge: a conditional association between two variables
     • path: a sequence of edges leading from one variable to another
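To make the three definitions concrete, here is a minimal sketch in base R. It stores a graph as an adjacency matrix and checks for a path with a breadth-first search. The edge list (AC, AM, CM, AG, AR, GM, GR) is chosen only as an illustration, and `has_path()` and its `removed` argument are hypothetical helpers, not part of the original program.

```r
# Vertices are the five survey variables; each two-letter string is one edge.
# The edge list is an illustrative assumption.
vars  <- c("A", "C", "M", "G", "R")
edges <- c("AC", "AM", "CM", "AG", "AR", "GM", "GR")

# Symmetric adjacency matrix: adj[i, j] is TRUE when i and j share an edge.
adj <- matrix(FALSE, 5, 5, dimnames = list(vars, vars))
for (e in edges) {
  v <- strsplit(e, "")[[1]]
  adj[v[1], v[2]] <- TRUE
  adj[v[2], v[1]] <- TRUE
}

# has_path(): breadth-first search from `from`; vertices listed in `removed`
# are treated as blocked, so no path may pass through them.
has_path <- function(adj, from, to, removed = character(0)) {
  keep    <- setdiff(rownames(adj), removed)
  visited <- from
  queue   <- from
  while (length(queue) > 0) {
    current <- queue[1]
    queue   <- queue[-1]
    nbrs    <- keep[adj[current, keep]]
    new     <- setdiff(nbrs, visited)
    visited <- c(visited, new)
    queue   <- c(queue, new)
  }
  to %in% visited
}

has_path(adj, "C", "R")                         # is there any path from C to R?
has_path(adj, "C", "R", removed = c("A", "M"))  # and when A and M are blocked?
```

Blocking a set of vertices is how separation is checked: when every path between two variables runs through the blocked set, the graph implies they are conditionally independent given that set.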

  4. Association Graphs (Saturated) [Figure: association graph with vertices M, G, R, C, A; annotations mark a variable (vertex), a path, and a conditional association (edge)]

  5. Association Graphs (Reduced) [Figure: association graph with vertices M, G, R, C, A]

  6. Data Set

     Marijuana use (yes / no), by race, gender, alcohol use, and cigarette use

                               Race = White               Race = Other
                             Female       Male          Female       Male
     Alcohol  Cigarette     yes   no    yes   no       yes   no    yes   no
     yes      yes           405  268    453  228        23   23     30   19
     yes      no             13  218     28  201         2   19      1   18
     no       yes             1   17      1   17         0    1      1    8
     no       no              1  117      1  133         0   12      0   17

  7. SAS Program Too large to place here: Go to survey.sas

  8. R Program
     Original code (modified below): http://math.cl.uh.edu/~thompsonla/RCode.txt

       # Cell counts; cigarette varies fastest, then alcohol, marijuana, gender, race
       survey <- data.frame(
         expand.grid(cigarette = c("Yes", "No"), alcohol = c("Yes", "No"),
                     marijuana = c("Yes", "No"), gender = c("female", "male"),
                     race = c("white", "other")),
         count = c(405, 13, 1, 1, 268, 218, 17, 117, 453, 28, 1, 1, 228, 201, 17, 133,
                   23, 2, 0, 0, 23, 19, 1, 12, 30, 1, 1, 0, 19, 18, 8, 17))

       library(MASS)  # provides stepAIC()

       # mutual independence + GR
       fit.GR <- glm(count ~ . + gender * race, data = survey, family = poisson)
       # homogeneous association
       fit.homog.assoc <- glm(count ~ .^2, data = survey, family = poisson)
       # all three-factor terms
       fit.3fact <- glm(count ~ .^3, data = survey, family = poisson)

       # Backward elimination by AIC; main effects and GR are always kept
       summary(res <- stepAIC(fit.homog.assoc,
         scope = list(lower = ~ cigarette + alcohol + marijuana + gender * race),
         direction = "backward"))

       # Continue by hand: drop MR, then GM
       fit.AC.AM.CM.AG.AR.GM.GR.MR <- res
       fit.AC.AM.CM.AG.AR.GM.GR <- update(fit.AC.AM.CM.AG.AR.GM.GR.MR, ~ . - marijuana:race)
       fit.AC.AM.CM.AG.AR.GR <- update(fit.AC.AM.CM.AG.AR.GM.GR, ~ . - marijuana:gender)

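  9. R Program (P-values)
     Each call tests the increase in G2 between two successive models, on 1 degree of freedom:

       1 - pchisq(15.8 - 15.3, 1)
       1 - pchisq(16.7 - 15.8, 1)
       1 - pchisq(19.9 - 16.7, 1)
       1 - pchisq(28.8 - 19.9, 1)
       1 - pchisq(40.3 - 28.8, 1)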
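The five calls can also be written as one vectorised computation. The sketch below reuses only the G2 values that appear in the calls above; the names `G2`, `delta_G2`, and `p_values` are introduced here for illustration.

```r
# G2 values taken from the pchisq() calls above, in the order they appear.
G2 <- c(15.3, 15.8, 16.7, 19.9, 28.8, 40.3)

# Increase in G2 at each step; every step is tested on 1 df.
delta_G2 <- diff(G2)

# Upper-tail probability, equivalent to 1 - pchisq(x, 1).
p_values <- pchisq(delta_G2, df = 1, lower.tail = FALSE)

data.frame(delta_G2 = delta_G2, p_value = signif(p_values, 3))
```

Using `lower.tail = FALSE` avoids the subtraction from 1, which can lose precision when the P-value is very small.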

  10. Model Selection
      • Select an alpha level (0.05 by default)
      • Look at the P-value for each candidate model
      • Use (in R): 1-pchisq(G2, df)
      • Stop removing terms once a P-value falls below the alpha chosen in the first step
      • Model 1: G+R+A+C+M+GR
      • Model 2: G+R+A+C+M+GR+(all pairs)
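The steps above can be sketched as a loop. This is a minimal illustration, not the presenters' program: it starts from Model 2, uses `drop1()` with likelihood-ratio tests, and at each pass removes the two-factor term with the largest P-value until every remaining candidate falls below alpha. Always keeping GR, and the names `alpha`, `keep`, `cand`, and `worst`, are assumptions made for the example.

```r
# Survey counts, laid out as in the R program slide.
survey <- data.frame(
  expand.grid(cigarette = c("Yes", "No"), alcohol = c("Yes", "No"),
              marijuana = c("Yes", "No"), gender = c("female", "male"),
              race = c("white", "other")),
  count = c(405, 13, 1, 1, 268, 218, 17, 117, 453, 28, 1, 1, 228, 201, 17, 133,
            23, 2, 0, 0, 23, 19, 1, 12, 30, 1, 1, 0, 19, 18, 8, 17))

alpha <- 0.05
keep  <- "gender:race"                                    # assumed: GR is never dropped
fit   <- glm(count ~ .^2, data = survey, family = poisson)  # Model 2

repeat {
  # One likelihood-ratio test per term that can be dropped.
  tab  <- as.data.frame(drop1(fit, test = "Chisq"))
  # Only two-factor terms other than GR are candidates for removal.
  cand <- tab[grepl(":", rownames(tab), fixed = TRUE) & rownames(tab) != keep, ]
  if (nrow(cand) == 0 || max(cand[["Pr(>Chi)"]]) < alpha) break
  worst <- rownames(cand)[which.max(cand[["Pr(>Chi)"]])]
  fit   <- update(fit, as.formula(paste(". ~ . -", worst)))
}

formula(fit)                                    # terms that survived
c(G2 = deviance(fit), df = df.residual(fit))    # fit of the selected model
```

Selecting on the largest P-value is one possible rule; with every test on 1 df it is the same as removing the term with the smallest increase in G2.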

  11. Model Selection (Continued)
      • Model 3: G+R+A+C+M+GR+(all pairs)+(all 3-factor terms)
      • Model 4g: Model 2 without CR (lowest change in G2)
      • Model 5: Model 4g without CG (lowest change in G2)
      • Model 6: Model 5 without MR (lowest change in G2)
      • Model 7: Model 6 without GM (lowest change in G2)
      • Consider: A+C+M+AC+AM+CM
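One possible reading of the "Consider" line is that the A, C, M associations can be examined in the three-way table obtained by collapsing over gender and race. The sketch below assumes that reading. It fits Model 6 (AC, AM, CM, AG, AR, GM, GR) to the full table and (AC, AM, CM) to the collapsed table, then puts the AC and CM conditional odds ratios side by side. The names `fit6`, `acm`, `fit_acm`, and `shared` are introduced here for illustration.

```r
# Survey counts, laid out as in the R program slide.
survey <- data.frame(
  expand.grid(cigarette = c("Yes", "No"), alcohol = c("Yes", "No"),
              marijuana = c("Yes", "No"), gender = c("female", "male"),
              race = c("white", "other")),
  count = c(405, 13, 1, 1, 268, 218, 17, 117, 453, 28, 1, 1, 228, 201, 17, 133,
            23, 2, 0, 0, 23, 19, 1, 12, 30, 1, 1, 0, 19, 18, 8, 17))

# Model 6 on the full five-way table.
fit6 <- glm(count ~ cigarette + alcohol + marijuana + gender + race +
              cigarette:alcohol + alcohol:marijuana + cigarette:marijuana +
              alcohol:gender + alcohol:race + marijuana:gender + gender:race,
            data = survey, family = poisson)

# Collapse over gender and race, then fit A+C+M+AC+AM+CM.
acm <- as.data.frame(xtabs(count ~ cigarette + alcohol + marijuana, data = survey),
                     responseName = "count")
fit_acm <- glm(count ~ (cigarette + alcohol + marijuana)^2,
               data = acm, family = poisson)

# Conditional odds ratios for AC and CM under each fit.
shared <- c("cigaretteNo:alcoholNo", "cigaretteNo:marijuanaNo")
round(exp(cbind(five_way  = coef(fit6)[shared],
                collapsed = coef(fit_acm)[shared])), 2)
```

In Model 6, C is joined only to A and M, so every path from C to G or R passes through A or M. That is the condition under which collapsing over G and R is expected to leave the AC and CM associations unchanged.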

  12. Goodness-of-Fit Tests (Table 9.2)
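The G2, df, and P-value columns of a goodness-of-fit table can be recomputed from the fitted models. The following is a sketch for Models 1 to 3 only. The row labels and the names `fits` and `gof` are chosen here for illustration and are not taken from Table 9.2.

```r
# Survey counts, laid out as in the R program slide.
survey <- data.frame(
  expand.grid(cigarette = c("Yes", "No"), alcohol = c("Yes", "No"),
              marijuana = c("Yes", "No"), gender = c("female", "male"),
              race = c("white", "other")),
  count = c(405, 13, 1, 1, 268, 218, 17, 117, 453, 28, 1, 1, 228, 201, 17, 133,
            23, 2, 0, 0, 23, 19, 1, 12, 30, 1, 1, 0, 19, 18, 8, 17))

fits <- list(
  "1: mutual independence + GR" =
    glm(count ~ . + gender * race, data = survey, family = poisson),
  "2: homogeneous association" =
    glm(count ~ .^2, data = survey, family = poisson),
  "3: all three-factor terms" =
    glm(count ~ .^3, data = survey, family = poisson))

# G2 is the residual deviance of a Poisson loglinear model.
gof <- data.frame(G2 = sapply(fits, deviance),
                  df = sapply(fits, df.residual))
gof$p_value <- pchisq(gof$G2, gof$df, lower.tail = FALSE)
print(round(gof, 3))
```

A large P-value here means the model is not rejected as a description of the table, which is the opposite reading from the term-removal tests, where a small P-value argues for keeping a term.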

  13. Thank You! Any Questions???
