This overview delves into Generalized Linear Models (GLMs), a broad class of regression models applicable to both continuous and binary response variables. GLMs encompass popular methods such as logistic and linear regression. Key components include the random and systematic components, linked through various mathematical relationships. Model selection is driven by the outcome variable's distribution, and fitting is typically achieved through maximum likelihood methods. Considerations for cluster sampling and its implications on analysis are discussed, preparing participants for next week’s data analysis inquiries.
Week 7: Generalized linear models • Overview • Questions from last week • What are generalized linear models? • Discussion of the 3 articles
Generalized linear models (GLM) • Broad term that encompasses many types of regression models • Logistic and linear regression are the most common types of GLMs • Handles continuous, binary, and count response variables
Components of GLMs • Random component: the outcome or response variable; can be binary (yes/no), continuous, or a count or rate • Systematic component: the exposure or explanatory variables; can be binary, continuous, or categorical; includes interaction terms
Components of GLMs • Link: specifies how the outcome and explanatory variables are linked; for continuous variables it is usually a direct or identity link; for binary variables it is usually a logit link; for counts and rates that follow a Poisson distribution it is usually a log link (which gives a loglinear model)
Back to the mathematical model • Y’ (known as Y prime) is the predicted value of the outcome variable (the random or outcome component) • A is the intercept (constant) • β1 is the coefficient for X1, estimated through regression • X1 is the value of the first exposure variable (systematic component) • Y’ = A + β1X1 + β2X2 + β3X3 • The link function tells you how the two are related (for example, an identity, logit, or log relationship)
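To make the role of the link concrete, here is a minimal sketch in plain Python. The function names (`linear_predictor`, `predict`) and the coefficient values in the usage note are made up for illustration; they are not taken from the course material. The right-hand side A + β1X1 + β2X2 + ... is computed the same way for every model, and the inverse of the link turns it into a prediction on the scale of the outcome.

```python
import math

def linear_predictor(intercept, coefs, xs):
    """Systematic component: A + b1*X1 + b2*X2 + ..."""
    return intercept + sum(b * x for b, x in zip(coefs, xs))

# Inverse link functions: map the linear predictor back to the outcome scale.
INVERSE_LINKS = {
    "identity": lambda eta: eta,                        # continuous outcome
    "logit": lambda eta: 1.0 / (1.0 + math.exp(-eta)),  # probability of a binary outcome
    "log": lambda eta: math.exp(eta),                   # expected count or rate
}

def predict(intercept, coefs, xs, link="identity"):
    """Predicted value of the outcome (Y') for one set of exposure values."""
    eta = linear_predictor(intercept, coefs, xs)
    return INVERSE_LINKS[link](eta)
```

For example, `predict(1.0, [2.0, 3.0], [1.0, 1.0])` gives the usual linear regression prediction, while the same call with `link="logit"` returns a probability between 0 and 1.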
How do we know which model to use? • The model selection depends mostly on the distribution of the outcome variable • For continuous variables we use linear regression with the identity link • For binary variables we use logistic regression with the logit link • For count data and rates we use Poisson regression with the log link
Maximum likelihood model fitting • Most Poisson regression models, like logistic regression models, are fitted by maximum likelihood estimation • The log-likelihood (LL) is calculated based on predicted and actual outcomes • A good model has a NON-significant -2LL when compared with a perfect model, meaning the predicted outcomes do not differ significantly from the actual outcomes • A goodness-of-fit chi-square is calculated (usually comparing a constant-only model to the one you created): chi-square = (-2LL in null model) - (-2LL in your model), with df = number of exposure variables • A good model has a significant goodness-of-fit chi-square, meaning the exposure variables improve on the constant-only model
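The chi-square comparison above can be sketched in plain Python for a binary outcome. This is an illustration under stated assumptions, not the output of any particular package: the helper names are hypothetical, the log-likelihood shown is the Bernoulli form used in logistic regression, and `chi2_sf` is a small hand-written upper-tail chi-square probability for whole-number df.

```python
import math

def log_likelihood(actual, predicted):
    """Bernoulli log-likelihood of 0/1 outcomes given predicted probabilities."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(actual, predicted)
    )

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square value with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed forms for df = 1 and df = 2, then step up two df at a time.
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q

def likelihood_ratio_test(ll_null, ll_model, n_exposures):
    """Chi-square = (-2LL null) - (-2LL model), df = number of exposure variables."""
    chi_square = (-2.0 * ll_null) - (-2.0 * ll_model)
    return chi_square, chi2_sf(chi_square, n_exposures)

# Hypothetical data: eight participants and one exposure variable.
actual = [0, 0, 0, 1, 0, 1, 1, 1]
null_predicted = [0.5] * 8  # constant-only model predicts the overall proportion
model_predicted = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]  # made-up fitted values

ll_null = log_likelihood(actual, null_predicted)
ll_model = log_likelihood(actual, model_predicted)
chi_square, p_value = likelihood_ratio_test(ll_null, ll_model, n_exposures=1)
```

A small `p_value` here says the model with the exposure variable fits significantly better than the constant-only model.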
Cluster sampling • Sometimes we need to recruit research participants in groups or clusters • Examples include schools, hospitals, communities • Can be very efficient BUT people in one cluster may be more like each other than people in other clusters, so the sample is then not independent
Cluster sampling con’t • We need to adjust our confidence intervals to reflect the non-independent nature of the sample • Calculate an intra-class correlation coefficient (ICC) • Need to take this into account when calculating sample size and designing studies • SPSS can’t do analyses that adjust for cluster sampling
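One common way to carry the within-cluster correlation coefficient (ICC) into sample size and confidence interval calculations is the design effect, 1 + (m - 1) × ICC, where m is the number of participants per cluster. The slides do not name this formula, so treat the sketch below as one standard approach rather than the course's prescribed method. It assumes equal cluster sizes, and the function names and numbers are hypothetical.

```python
import math

def design_effect(cluster_size, icc):
    """Variance inflation from cluster sampling, assuming equal cluster sizes."""
    return 1.0 + (cluster_size - 1) * icc

def effective_sample_size(n, cluster_size, icc):
    """Number of independent participants the clustered sample is worth."""
    return n / design_effect(cluster_size, icc)

def adjusted_ci(estimate, se_independent, cluster_size, icc, z=1.96):
    """Widen a confidence interval that was computed as if observations were independent."""
    se = se_independent * math.sqrt(design_effect(cluster_size, icc))
    return estimate - z * se, estimate + z * se

# Hypothetical study: 400 students recruited from 20 schools of 20 each, ICC = 0.05.
deff = design_effect(20, 0.05)
n_eff = effective_sample_size(400, 20, 0.05)
low, high = adjusted_ci(0.30, 0.023, 20, 0.05)
```

Even a small ICC matters when clusters are large, because the design effect grows with the number of participants per cluster.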
For next week • Think about questions/issues related to data analysis