BIOL 582 Lecture Set 4 Model Selection and Comparison
Overview
• This set is more lecture/presentation-like than recent topics
• Why compare alternative models?
• Model comparison with likelihood ratio tests (nested models)
• Model selection with information theory (nested or non-nested models)
• Stepwise regression (we will cover this in more depth in R)
• Prerequisites, limitations, advantages, and disadvantages
For further information on these approaches, see:
• Burnham and Anderson (2002) Model selection and multimodel inference.
• Bedrick and Tsai (1994) Model selection for multivariate regression in small samples. Biometrics 50:226-231.
Why model selection/comparison?
• Often (especially in observational studies), it is unknown which ‘explanatory’ variables are important sources of variation in the measured response(s).
• We can always ‘dump’ any or all explanatory variables into a model, but there are some problems with this.
• Consider the following plot:
[Figure: % variation explained (y-axis) vs. number of model parameters, k (x-axis); a = ‘adjusted’]
• Increasing model parameters (k) results in:
  • Reduction of explained information relative to the model “size”
  • Potential loss of statistical power (i.e., eventually model df > error df)
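The cost of adding parameters can be illustrated with a small calculation. This is a minimal sketch that assumes the ‘adjusted’ curve in the plot is the usual adjusted R², with k counted as the number of predictors excluding the intercept; the R² values below are made up for illustration and do not come from the lecture.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k predictors (intercept not counted).

    Assumed form: 1 - (1 - R^2) * (n - 1) / (n - k - 1).
    """
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical values: three extra predictors raise R^2 only slightly.
n = 20
small_model = adjusted_r2(0.50, n, k=2)
large_model = adjusted_r2(0.52, n, k=5)

print(round(small_model, 4), round(large_model, 4))
```

In this made-up case the raw R² goes up with the extra predictors, but the adjusted value goes down, which is the pattern the plot is meant to convey.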
Why model selection/comparison?
• The goal of model selection/comparison approaches is to evaluate two or more models with different sets of parameters to determine if one is ‘better’ (i.e., explains more variation in the response, given certain criteria).
• Likelihood ratio tests: statistical comparison of nested models
• Information theoretic indices: rank models using the parsimony principle
• Stepwise (regression) procedure: iterative process to add or remove model parameters
• Cross-validation: measures robustness of a particular model
  We will not cover cross-validation here. It involves using one portion of the data to estimate parameters, then applying those estimates to the remaining portion of the data, from which error is estimated. This can be repeated over many random splits of the data. A robust model will not be subject to large fluctuations in error.
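Although cross-validation is not covered in this set, the procedure described above can be sketched briefly. This is a minimal illustration under stated assumptions: a single predictor, a straight-line model, a 70/30 split, and simulated data. The function names (`fit_line`, `cv_errors`) and all settings are hypothetical choices, not part of the lecture.

```python
import random


def fit_line(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def cv_errors(xs, ys, n_splits=100, train_frac=0.7, seed=1):
    """Mean squared prediction error on the held-out portion, per random split."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    n_train = int(train_frac * len(xs))
    errors = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        # Estimate parameters from the training portion only
        b0, b1 = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # Estimate error from the portion that was not used for fitting
        mse = sum((ys[i] - (b0 + b1 * xs[i])) ** 2 for i in test) / len(test)
        errors.append(mse)
    return errors


# Simulated (hypothetical) data: a linear trend plus noise
noise = random.Random(0)
x = [float(i) for i in range(30)]
y = [1.0 + 2.0 * xi + noise.gauss(0, 1) for xi in x]

errs = cv_errors(x, y)
print(min(errs), max(errs))
```

A narrow spread between the smallest and largest held-out errors is what the slide calls a robust model.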
Likelihood ratio tests
• Although not explicitly stated, we have performed many likelihood ratio tests this semester.
• Likelihood ratio tests involve statistically comparing one model (e.g., full model) with an alternative model that is nested within the full model (e.g., reduced model).
• E.g., ANOVA with F
Let SST be the total SS for the data, which can be partitioned into SSM (the SS for predicted values) and SSE (the SS for residuals). In ANOVA, we can use F or a permutation test to determine if an effect is significant. The reduced model is the full model with this effect removed. Thus, the difference in response variation explained by the full and reduced models is the variation due to the effect.
For nested models,
$$\mathrm{SST} = \mathrm{SSM}_{\mathrm{Full}} + \mathrm{SSE}_{\mathrm{Full}} = \mathrm{SSM}_{\mathrm{Reduced}} + \mathrm{SSE}_{\mathrm{Reduced}}$$
A difference in models is thus
$$\Delta\mathrm{SSM} = H = \mathrm{SSM}_{\mathrm{Full}} - \mathrm{SSM}_{\mathrm{Reduced}} = \mathrm{SSE}_{\mathrm{Reduced}} - \mathrm{SSE}_{\mathrm{Full}}$$
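The full-versus-reduced comparison can be sketched numerically. The example below is a minimal illustration, not the course's own procedure: it assumes the reduced model is intercept-only (k = 1) and the full model adds one slope (k = 2), computes F = (ΔSSM/Δk) / (SSE_Full/(n − k_Full)), and evaluates it by shuffling the response. The data and function names are hypothetical.

```python
import random


def sse_mean(ys):
    """SSE of the intercept-only (reduced) model; this is also SST."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)


def sse_line(xs, ys):
    """SSE of the straight-line (full) model, using SSE = SST - SSM."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sse_mean(ys) - sxy ** 2 / sxx


def f_stat(xs, ys):
    """F for the one added parameter (delta k = 1, k_full = 2)."""
    n = len(ys)
    sse_full = sse_line(xs, ys)
    delta_ssm = sse_mean(ys) - sse_full  # = SSE_reduced - SSE_full
    return (delta_ssm / 1) / (sse_full / (n - 2))


def permutation_p(xs, ys, n_perm=999, seed=1):
    """Proportion of F values (observed included) at least as large as observed."""
    rng = random.Random(seed)
    observed = f_stat(xs, ys)
    shuffled = list(ys)
    count = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if f_stat(xs, shuffled) >= observed:
            count += 1
    return observed, count / (n_perm + 1)


# Hypothetical data with a clear linear trend
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]

f_obs, p_val = permutation_p(x, y)
print(f_obs, p_val)
```

Shuffling the whole response is appropriate here only because the reduced model contains nothing but an intercept; reduced models with other terms call for a different permutation scheme.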
Likelihood ratio tests
• Model likelihood: the likelihood of the model parameters (β), given the data (Y).
• The right side of the equation is the multivariate normal probability density function (Σ is the error covariance matrix). It is maximized when the residuals are zero (i.e., the exponent is 0).
• For univariate response data, Σ is simply SSE/n.
• The latter part of the expression is a constant.
• The smaller the error, the larger the likelihood of the model.
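The equation that this slide annotates is not reproduced in the text. A standard form consistent with the annotations is given here as a sketch. The notation is assumed rather than taken from the slide: n observations, p response variables, design matrix X, and residuals E = Y − Xβ.

```latex
\[
L(\beta \mid Y) \;=\; \frac{1}{(2\pi)^{np/2}\,|\Sigma|^{n/2}}
\exp\!\left[-\tfrac{1}{2}\,\mathrm{tr}\!\left(\Sigma^{-1}E^{T}E\right)\right],
\qquad E = Y - X\beta
\]
\[
\log \hat{L} \;=\; -\tfrac{n}{2}\log|\hat{\Sigma}| \;-\; \tfrac{np}{2}\left(1 + \log 2\pi\right),
\qquad \hat{\Sigma} = \tfrac{1}{n}E^{T}E
\]
```

The second line follows from substituting the maximum-likelihood estimate of Σ, which makes the trace equal to np. Its last term does not depend on the model, so for models fitted to the same data only the determinant of the error covariance matters: smaller error gives larger likelihood.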
Likelihood ratio tests
• Consider the likelihoods of the full and reduced models.
• The ratio of these likelihoods is a likelihood ratio test (LRT) statistic.*
* Note: it is common convention to multiply the log of this ratio by -2, which gives an LRT statistic that approximately follows a chi-square distribution with Δk df, where k_i is the number of model parameters for the ith model.
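For a univariate response, the LRT reduces to a short calculation. This sketch assumes the error variance of each model is estimated as SSE/n, so that −2 log(L_Reduced / L_Full) = n log(SSE_Reduced / SSE_Full). The SSE values are hypothetical, and the chi-square tail probability is written only for Δk = 1, using the identity P(χ²₁ > x) = erfc(√(x/2)).

```python
import math


def lrt_univariate(sse_reduced, sse_full, n):
    """-2 log(L_reduced / L_full), assuming variance estimates of SSE/n."""
    return n * math.log(sse_reduced / sse_full)


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


# Hypothetical SSE values for two nested models that differ by one parameter
n = 30
stat = lrt_univariate(sse_reduced=120.0, sse_full=100.0, n=n)
p_value = chi2_sf_1df(stat)

print(stat, p_value)
```

For Δk > 1, a general chi-square tail function (for example from a statistics library) would be needed in place of `chi2_sf_1df`.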
Likelihood ratio tests: Example
• Head shape variation of (live!) prairie rattlesnakes (Smith and Collyer 2008).
• Many studies of viper morphology find significant sexual dimorphism in head size and measures of head shape. The putative hypothesis for sexual dimorphism is ‘ecological niche divergence’ because of different reproductive constraints.
• Female vipers are live-bearers, and thus have long gestation periods. It is therefore believed that selection favors larger female body sizes (for gestation of young) and larger and wider heads (for acquisition of large prey). Whereas females are rather sedentary during gestation, males are on the move. Thus selection would favor smaller bodies and smaller heads in males (i.e., they acquire smaller prey more often).
• PROBLEM 1: Most studies test sexual dimorphism but do not test variation due to sexual dimorphism against other sources of variation.
• PROBLEM 2: Most previous studies considered head shape to be important for prey acquisition, but none really measured ‘shape’ as a response (i.e., they used head size as a surrogate).
Likelihood ratio tests: Example
• Smith and Collyer, 2008
• 107 adult snakes from three regions in North and South Dakota were captured and photographed
• 6 anatomical landmarks and 7 sliding semi-landmarks (dorsal view of snake heads) were collected, giving p = 15 shape variables
Likelihood ratio tests: Example
• Smith and Collyer, 2008
• Models considered:
  • FULL: Shape = Sex + Region + Sex × Region + SVL
  • Sex-reduced (nested within FULL): Shape = Region + SVL
  • Region-reduced (nested within Sex-reduced): Shape = SVL
  • SVL-reduced (nested within Sex-reduced): Shape = Region
• LRTs:
  • Full vs. Sex-reduced = Test of sexual dimorphism
  • Sex-reduced vs. Region-reduced = Test of regional variation
  • Sex-reduced vs. SVL-reduced = Test of shape/size variation
Likelihood ratio tests: Example
• Results (table not reproduced here)
• Conclusions:
  • Sex is not an important source of head shape variation, but Region and SVL certainly are.
  • When an M-ANOVA was performed (with type III SS), sexual dimorphism was significant.
  • LRTs demonstrate that adding parameters for Sex and Sex × Region does not offer a significant improvement over a simpler model!
Likelihood ratio tests: Example
[Figure: 2-d plane containing the three regional head shapes (East, West, Central) in the p = 15 multivariate shape space. Labels: Hibernacula = sandy mammal burrows (large holes); Hibernacula = rocky buttes (small, narrow holes)]
Hence, regional variation shows an important biological signal. If sexual dimorphism were tested in the classical sense (e.g., M-ANOVA with type III SS), one might conclude it was meaningful. However, it is less meaningful than locally adapted head shape divergence.
Likelihood ratio tests
• Advantages
  • Provides a statistical assessment of model difference, based on the importance of the parameters that differ between models. ELEGANT!
  • Can be performed on any level of hierarchical structure (e.g., a model with parameters A, B, C, D, E, F can be compared to one with parameters A and E, one with just F, or one with B, C, and D in the same manner).
• Disadvantages/Limitations
  • The models compared must be nested.
  • As long as the n:kp ratio is large, significant differences between models are likely, even for small effects.
Information Theoretic model selection
• “Information Theoretic” is actually an unfortunate description of these approaches:
  • Information Theory is an applied math discipline that has evolved from the classic work of Shannon [Shannon, C.E. (1948), A Mathematical Theory of Communication, Bell System Technical Journal, 27, pp. 379–423 & 623–656] with the goal of enabling as much data as possible to be reliably stored on a medium or communicated over a channel.
  • Information Theoretic model selection (ITMS) is a method of choosing a model that explains the most information with the fewest parameters, based on the Principle of Parsimony.
  • ITMS involves creating an index and ranking “candidate” models; the criteria for the index and model rankings involve some arbitrary choices.
Information Theoretic model selection
• Principle of Parsimony
[Figure: bias and variance plotted against the number of parameters (k), from few to many; models with few parameters are under-fit, models with many are over-fit, and parsimonious models lie in between]
This illustration is hypothetical: to measure sampling bias and variance of parameters, true population parameters would need to be known.
Information Theoretic model selection
• Principle of Parsimony
  • In general, the goal is to explain the most information possible with the fewest parameters needed.
  • Akaike (1973, 1974) developed an “information criterion” for penalizing model likelihoods by a function of the number of parameters used to describe the models. This information criterion is now commonly referred to as Akaike’s information criterion (AIC), but several variants exist.
  • The most general form of AIC is given by Bedrick and Tsai (1994).
  • However, because most users apply AIC to univariate response data only, it is generally written in a simplified form.
Information Theoretic model selection
• Why? The general form of AIC has the following parts:
  • Model likelihood part (larger model likelihoods are more negative)
  • Dimensions of the Σ matrix (increasing the number of response variables geometrically increases the number of covariances among the variables)
  • Dimensions of the β matrix (a larger number means more parameters in the model)
  • The sum of the two dimension terms is K; 2 times this sum is the “parameter penalty” for the model likelihood
• So, small values of AIC imply a better model than large values
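The AIC formulas themselves are not reproduced in the text of these slides. A form consistent with the annotations above (a likelihood part, a term for the dimensions of Σ, a term for the dimensions of β, and a penalty of 2 times their sum) is sketched below. The notation is assumed: n observations, p response variables, and k model parameters per response, so that β is k × p.

```latex
\[
\mathrm{AIC} \;=\; n\log|\hat{\Sigma}| \;+\; 2\left[\,pk + \tfrac{1}{2}\,p(p+1)\right]
\]
\[
\text{Univariate case } (p = 1,\ \hat{\Sigma} = \mathrm{SSE}/n):\qquad
\mathrm{AIC} \;=\; n\log\!\left(\frac{\mathrm{SSE}}{n}\right) + 2(k+1)
\]
```

Here pk counts the elements of β and p(p+1)/2 counts the unique variances and covariances in Σ. The term n log|Σ̂| equals −2 log L̂ only up to an additive constant that is the same for every model fitted to the same data, so it does not affect rankings.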
Information Theoretic model selection
• It is important to remember that the parameter penalty is arbitrary.
• There are several variants, including:
  • AICc (second-order AIC), where K is the bracketed portion of the parameter penalty of AIC
  • QAIC (quasi-likelihood AIC), where c is a dispersion index equal to χ²/df (i.e., it corrects for over-dispersion)
  • BIC (Bayesian information criterion)
  • Plus many others (see Burnham and Anderson 2002)
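The variants differ only in how the likelihood is penalized. The sketch below uses the commonly cited forms (AIC = −2 log L + 2K; AICc = AIC + 2K(K+1)/(n − K − 1); QAIC = −2 log L / c + 2K; BIC = −2 log L + K log n). These are assumed here because the slide's own equations are not reproduced; the univariate Gaussian likelihood and the input values are hypothetical illustrations.

```python
import math


def neg2_log_lik(sse, n):
    """-2 log-likelihood of a univariate Gaussian model with variance SSE/n."""
    return n * math.log(sse / n) + n * (1.0 + math.log(2.0 * math.pi))


def aic(neg2ll, K):
    return neg2ll + 2 * K


def aicc(neg2ll, K, n):
    """Second-order AIC: adds a small-sample correction to the penalty."""
    return aic(neg2ll, K) + 2 * K * (K + 1) / (n - K - 1)


def qaic(neg2ll, K, c):
    """Quasi-likelihood AIC: c is the dispersion index (chi-square / df)."""
    return neg2ll / c + 2 * K


def bic(neg2ll, K, n):
    return neg2ll + K * math.log(n)


# Hypothetical model: n = 30 observations, SSE = 100, K = 3
# (two regression coefficients plus the error variance)
n, K = 30, 3
ll = neg2_log_lik(100.0, n)
print(aic(ll, K), aicc(ll, K, n), qaic(ll, K, c=1.5), bic(ll, K, n))
```

Because each index applies a different penalty, the same set of candidate models can be ranked differently by different indices, which is the sense in which the penalty is arbitrary.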
Non-nested models: ITMS Example
• ITMS is not an improvement over LRT for nested models, but can be helpful for evaluating non-nested models, where LRTs are not possible.
• Example: Collyer et al., 2007.
• A field study was performed to address White Sands pupfish (Cyprinodon tularosa) body shape divergence when introduced from a saline river environment to brackish man-made ponds (refuges).
• One problem: in the new environments, fish grew rapidly (much more so than the wild saline river population). Also, the growth was mostly because of increased tail growth (pond pupfish had smaller heads, equal body sizes, but much larger tails than pupfish from the source population).
• Another problem: it is known that shape allometry (covariation of body shape and body size) is prevalent in this system!
• Research question: Is it better to model divergence in body shape using a model with: a) no shape allometry, b) general shape/size allometry, or c) regional shape allometries (i.e., calculated separately for head, body, and tail growth)?
Non-nested models: ITMS Example
• Size = centroid size (CS), the square root of the summed squared distances from landmarks to their centroid
• CS was measured in total and separately for the head (H), body (B), and tail (T)
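The definition of centroid size translates directly into a few lines of code. This is a minimal sketch; the function name and the square landmark configuration are hypothetical examples, not data from the study.

```python
import math


def centroid_size(landmarks):
    """Square root of the summed squared distances from landmarks to their centroid."""
    n = len(landmarks)
    dims = len(landmarks[0])
    centroid = [sum(pt[d] for pt in landmarks) / n for d in range(dims)]
    return math.sqrt(
        sum((pt[d] - centroid[d]) ** 2 for pt in landmarks for d in range(dims))
    )


# Hypothetical configuration: the four corners of a 2 x 2 square.
# Each corner lies at squared distance 2 from the centroid (1, 1),
# so CS = sqrt(4 * 2) = sqrt(8).
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
print(centroid_size(square))
```

Regional centroid sizes (H, B, T) would be obtained by applying the same function to the subset of landmarks in each region.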
Non-nested models: ITMS Example
• Models of body shape variation:
  1. Population differences only
  2. Population differences and shape allometry (using overall body size)
  3. Population differences and regional shape allometries (using separate measures of H, B, and T size)
• Models 2 and 3 are not nested; $\Delta\mathrm{AIC}_i = \mathrm{AIC}_i - \min_j(\mathrm{AIC}_j)$
• Conclusion: A model with regional allometries explains much more shape variation despite the additional parameters needed for this model!
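Ranking candidate models by ΔAIC is a simple subtraction. In this sketch the AIC scores are invented purely to show the mechanics; they are NOT the values reported by Collyer et al. (2007), and the model names are placeholders.

```python
def delta_aic(aic_by_model):
    """Difference between each model's AIC and the smallest AIC in the set."""
    best = min(aic_by_model.values())
    return {name: score - best for name, score in aic_by_model.items()}


# Hypothetical AIC scores (made up for illustration only)
scores = {"model_1": 210.0, "model_2": 204.5, "model_3": 198.0}

deltas = delta_aic(scores)
ranked = sorted(deltas, key=deltas.get)  # best-supported model first

print(ranked)
print(deltas)
```

The model with ΔAIC = 0 is the best supported in the candidate set; nothing in the calculation requires the models to be nested.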
Stepwise (Regression) Procedure
• This is a topic that deserves more theoretical discussion than we will give here.
• Stepwise procedures (often called stepwise regression, as they generally apply to linear models) involve using an algorithm to add variables to, or delete variables from, a model.
• There are multiple ways to do this:
  • Forward stepwise regression involves starting with just an intercept and adding independent variables, if they are valid, until the best model is found.
  • Backward stepwise regression involves starting with all independent variables and removing variables, if they are invalid, until the best model is found.
  • Forward and backward variable selection can produce different models, depending on the criterion used for adding or subtracting variables. Some algorithms use a combination of both.
Stepwise (Regression) Procedure
• Selection criteria
  • P to add or remove. This criterion performs an LRT between the current model and the next model (either adding or subtracting a parameter). If P is below some threshold (often 0.25), the algorithm makes a decision about the parameter, then moves to the next. Once the algorithm finishes (i.e., when no more changes are possible), the resulting model is returned.
  • It should be noted that preliminary decisions can be overruled. For example, for a model that already has variables A and B, the selection criterion might choose not to include variable C. But after variables D and G have been added, it might choose to include C. Stepwise procedures may perform many scans to allow this possibility.
Stepwise (Regression) Procedure
• Selection criteria
  • AIC. This criterion calculates AIC scores for competing models. Models with higher AIC scores are dropped; models with lower scores are kept.
  • It should be noted that the P-based method depends on an arbitrary choice of P. The AIC method does not, but AIC is itself an arbitrary criterion. In either case, arbitrary decisions have to be made. The AIC method does not involve a strict null hypothesis and therefore has some appeal.
• Irrespective of whether one uses P or AIC, models must be nested! (Including all independent variables defines one large model, and every candidate is a subset of it.)
• Examples are left for exercises using R.
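The AIC-based forward procedure can be sketched in a few dozen lines. This is an illustrative sketch only, not the algorithm used by R or any other package: it assumes a univariate response, the simplified AIC = n log(SSE/n) + 2(k + 1), ordinary least squares solved through the normal equations, and simulated data. All function and variable names are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def sse(columns, y):
    """Residual SS for y regressed on an intercept plus the given columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    beta = solve(xtx, xty)
    return sum((y[i] - sum(X[i][j] * beta[j] for j in range(k))) ** 2 for i in range(n))


def aic(columns, y):
    n = len(y)
    k = len(columns) + 1  # coefficients, including the intercept
    return n * math.log(sse(columns, y) / n) + 2 * (k + 1)


def forward_stepwise(predictors, y):
    """Start from the intercept; add the variable that lowers AIC most, until none does."""
    selected = []
    best = aic([], y)
    while True:
        candidates = [name for name in predictors if name not in selected]
        scores = {
            name: aic([predictors[s] for s in selected + [name]], y)
            for name in candidates
        }
        if not scores:
            break
        name = min(scores, key=scores.get)
        if scores[name] >= best:
            break  # no candidate improves on the current model
        selected.append(name)
        best = scores[name]
    return selected, best


# Simulated (hypothetical) data: y depends on x1; x2 is unrelated noise
rng = random.Random(0)
x1 = [float(i) for i in range(20)]
x2 = [rng.gauss(0, 1) for _ in range(20)]
y = [3.0 + 2.0 * a + rng.gauss(0, 1) for a in x1]

chosen, final_aic = forward_stepwise({"x1": x1, "x2": x2}, y)
print(chosen, final_aic)
```

A backward version would start from all predictors and drop the one whose removal lowers AIC most. As the slides note, the two directions need not arrive at the same model.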
Final thoughts
• Model selection is a good idea when it is not clear what potential sources of variation exist (i.e., it may not be needed for an experimental study where treatments are controlled).
• Model selection might put you in the right ballpark, but it does not make inferences for you.
• One should never abandon, e.g., biological reasoning and trust the outcome of naïve model selection (i.e., if you know that a certain parameter needs to be in the model, perhaps organism size, then neither choose a candidate model that lacks this parameter, nor trust an index ranking that excludes it as important).
• Model selection is a tool in a large toolbox of statistical and analytical procedures. It is a tool that should probably be used early, followed by more rigorous methods. It SHOULD NOT be the focus of the data analysis.