Week 7: Generalized linear models

Overview
• Questions from last week
• What are generalized linear models?
• Discussion of the 3 articles

Generalized linear models (GLM)
• Broad term that encompasses many types of regression models
• Linear and logistic regression are the most common types of GLMs
• Handles continuous, binary, and count or rate response variables

Components of GLMs
• Random component: the outcome or response variable
  - can be binary (yes/no)
  - continuous
  - count or rate
• Systematic component: the exposure or explanatory variables
  - can be binary, continuous, or categorical
  - can include interaction terms

Components of GLMs
• Link: specifies how the outcome and explanatory variables are linked
  - for continuous outcomes it is usually a direct (identity) link
  - for binary outcomes it is usually a logit link
  - for counts and rates that follow a Poisson distribution it is usually a log link (the resulting model is often called loglinear)
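The three links named above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names are made up for this example and do not come from any statistics package.

```python
import math

# Link functions map the expected outcome onto the scale of the linear predictor.
def identity_link(mean):
    """Identity link: used for continuous outcomes (linear regression)."""
    return mean

def logit_link(probability):
    """Logit link: log odds, used for binary outcomes (logistic regression)."""
    return math.log(probability / (1.0 - probability))

def log_link(rate):
    """Log link: used for counts and rates (Poisson regression)."""
    return math.log(rate)

# Inverse links map the linear predictor back to the outcome scale.
def inverse_identity(eta):
    return eta

def inverse_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def inverse_log(eta):
    return math.exp(eta)

# A probability of 0.5 corresponds to log odds of 0.
print(logit_link(0.5))
# Applying a link and then its inverse returns the starting value.
print(inverse_logit(logit_link(0.25)))
```

The inverse links are what turn a fitted linear predictor into a predicted mean, probability, or rate.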

Back to the mathematical model
• Y’ = A + β1X1 + β2X2 + β3X3
• Y’ (known as Y prime) is the predicted value of the outcome variable (the random component)
• A is the constant (intercept)
• β1 is the coefficient for X1, estimated through regression
• X1 is the value of the first exposure variable (systematic component)
• The link function tells you how the two are related (for example, a linear or log relationship)
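Written out, the same right-hand side is reused under each link; only the scale on which Y’ is modeled changes. The symbol η (the linear predictor) is introduced here for convenience and is not in the slide notation.

```latex
\begin{align*}
\eta &= A + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 \\
Y' &= \eta && \text{identity link (linear regression)} \\
\ln\!\left(\frac{Y'}{1 - Y'}\right) &= \eta && \text{logit link (logistic regression, } Y' \text{ a probability)} \\
\ln(Y') &= \eta && \text{log link (Poisson regression, } Y' \text{ a count or rate)}
\end{align*}
```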

How do we know which model to use?
• Model selection depends mostly on the distribution of the outcome variable
• For continuous outcomes we use linear regression with the identity link
• For binary outcomes we use logistic regression with the logit link
• For count data and rates we use Poisson regression with the log link

Maximum likelihood model fitting
• Most Poisson regression models, like logistic regression models, are fitted by maximum likelihood estimation
• The log-likelihood (LL) is calculated from the predicted and actual outcomes
• A good model has a NON-significant LL (-2LL, the deviance, measures lack of fit, so smaller is better)
• A goodness-of-fit chi-square is calculated (usually comparing a constant-only model to the one you created)
  - chi-square = (-2LL of the null model) - (-2LL of your model)
  - df = number of exposure variables
• A good model has a significant goodness-of-fit chi-square, meaning it predicts better than the constant-only model
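The chi-square comparison can be worked through by hand on a small example. The sketch below uses logistic regression with one exposure variable, fitted by Newton-Raphson in plain Python. The data are invented for illustration, and the p-value shortcut with `math.erfc` is valid only for df = 1.

```python
import math

# Hypothetical toy data: one exposure variable x and a binary outcome y.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 1, 0, 1]

def neg2_log_likelihood(outcomes, probabilities):
    """-2LL for 0/1 outcomes given predicted probabilities."""
    ll = sum(
        yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
        for yi, pi in zip(outcomes, probabilities)
    )
    return -2.0 * ll

def fit_logistic(xs, ys, iterations=25):
    """Maximum likelihood fit of logit(p) = a + b*x by Newton-Raphson."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in xs]
        # Gradient of the log-likelihood with respect to a and b.
        g0 = sum(yi - pi for yi, pi in zip(ys, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(xs, ys, p))
        # Entries of the information matrix (negative Hessian).
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, xs))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, xs))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system explicitly.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Constant-only (null) model: every person gets the overall proportion.
p_null = sum(y) / len(y)
neg2ll_null = neg2_log_likelihood(y, [p_null] * len(y))

# Model with the exposure variable.
a, b = fit_logistic(x, y)
p_model = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
neg2ll_model = neg2_log_likelihood(y, p_model)

# Chi-square = -2LL(null) - (-2LL(model)), df = number of exposure variables.
chi_square = max(neg2ll_null - neg2ll_model, 0.0)
df = 1
p_value = math.erfc(math.sqrt(chi_square / 2.0))  # chi-square tail area, df = 1 only

print(f"-2LL null = {neg2ll_null:.3f}, -2LL model = {neg2ll_model:.3f}")
print(f"chi-square = {chi_square:.3f}, df = {df}, p = {p_value:.3f}")
```

With more than one exposure variable, a statistics package would normally do both the fitting and the chi-square lookup; the arithmetic of the comparison stays the same.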

Cluster sampling
• Sometimes we need to recruit research participants in groups or clusters
• Examples include schools, hospitals, communities
• Can be very efficient, BUT people in one cluster may be more like each other than like people in other clusters, so the sample is not independent

Cluster sampling con’t
• We need to adjust our confidence intervals to reflect the non-independent nature of the sample
• Calculate an intraclass correlation coefficient (ICC)
• Need to take this into account when calculating sample size and designing studies
• SPSS can’t do cluster-adjusted analyses
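One common way to carry the ICC into a sample size calculation is the design effect, 1 + (m - 1) × ICC, where m is the average cluster size. The slides do not name this formula, so treat the sketch below as one standard approach rather than the method used in class. The numbers in the usage lines are invented.

```python
import math

def design_effect(cluster_size, icc):
    """Variance inflation from cluster sampling: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1) * icc

def inflated_sample_size(n_independent, cluster_size, icc):
    """Participants needed under clustering to match n independent participants."""
    needed = n_independent * design_effect(cluster_size, icc)
    # Round first so tiny floating-point error does not push ceil up by one.
    return math.ceil(round(needed, 9))

def effective_sample_size(n_clustered, cluster_size, icc):
    """Number of independent participants a clustered sample is worth."""
    return n_clustered / design_effect(cluster_size, icc)

# Hypothetical study: 200 participants needed under simple random sampling,
# recruited in clusters of 21 with an assumed ICC of 0.05.
print(design_effect(21, 0.05))
print(inflated_sample_size(200, 21, 0.05))
print(effective_sample_size(400, 21, 0.05))
```

Even a small ICC matters when clusters are large, because the inflation grows with cluster size.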

For next week
• Think about questions/issues related to data analysis
