General Linear Models; Generalized Linear Models

Presentation Transcript


  1. General Linear Models; Generalized Linear Models. Hal Whitehead, BIOL4062/5062

  2. Transformations • Analysis of Covariance • General Linear Models • Generalized Linear Models • Non-Linear Models

  3. Common Transformations • Logarithmic: X’ = Log(X) • Most common; used in morphometrics and allometry • Square root: X’ = √X • For counts (Poisson distributed) • Use X’ = √(X+0.5) if counts include zeros • Arcsine-square root: X’ = arcsine(√X) • For proportions (or percentages/100) • Box-Cox • General transformation
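
The transformations listed on this slide can be sketched in a few lines of Python with NumPy. This is only an illustration: the function names are hypothetical, the logarithm is assumed to be the natural log (the slide does not state a base), and Box-Cox is left out because it requires estimating an extra parameter.

```python
import numpy as np

def log_transform(x):
    """Logarithmic transform X' = log(X); assumes natural log and X > 0."""
    return np.log(np.asarray(x, dtype=float))

def sqrt_transform(counts):
    """Square-root transform for counts; uses sqrt(X + 0.5) if any count is zero."""
    counts = np.asarray(counts, dtype=float)
    offset = 0.5 if np.any(counts == 0) else 0.0
    return np.sqrt(counts + offset)

def arcsine_sqrt_transform(p):
    """Arcsine-square-root transform for proportions in [0, 1] (result in radians)."""
    return np.arcsin(np.sqrt(np.asarray(p, dtype=float)))

# Illustrative values only.
print(sqrt_transform([0, 4, 9]))
print(arcsine_sqrt_transform([0.0, 0.5, 1.0]))
```

Percentages would need to be divided by 100 before being passed to `arcsine_sqrt_transform`, as the slide notes.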

  4. Regression and ANOVA • Multiple regression: Y = β0 + β1·X1 + β2·X2 + β3·X3 + … + Error {X’s are continuous variables} • ANOVA: Y = γ0 + γ1(Z1) + γ2(Z2) + γ3(Z3) + … + Error {Z’s are categorical variables, defining groups}

  5. Analysis of Covariance (mixture of ANOVA and regression) • Y = β0 + β1·X1 + β2·X2 + … + γ1(Z1) + γ2(Z2) + … + Error {X’s are continuous variables} {Z’s are categorical variables, defining groups} • Important assumption: Parallelism: β’s the same for all groups • Estimate β’s and γ’s using least squares
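
A minimal sketch of how such a model can be estimated by least squares, assuming one continuous X and one categorical Z coded with 0/1 indicator columns. The function name `fit_ancova`, the choice of the last group as the reference level, and the synthetic data are all assumptions made for illustration; they are not taken from the lecture.

```python
import numpy as np

def fit_ancova(x, groups, y):
    """Least-squares fit of Y = b0 + b1*X + gamma(group) + error.

    The last group (in sorted order) is the reference level, so its gamma is 0.
    A single slope b1 is shared by all groups (the parallelism assumption).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    levels = sorted(set(groups.tolist()))
    # One 0/1 indicator column per non-reference group.
    dummies = [(groups == g).astype(float) for g in levels[:-1]]
    X = np.column_stack([np.ones_like(x), x] + dummies)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    gammas = dict(zip(levels[:-1], coef[2:]))
    gammas[levels[-1]] = 0.0
    return coef[0], coef[1], gammas

# Synthetic, noise-free data: two parallel lines with slope 2, offset by 3.
x = np.array([0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0])
g = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
y = 1.0 + 2.0 * x + np.where(g == "a", 3.0, 0.0)
b0, b1, gammas = fit_ancova(x, g, y)
print(b0, b1, gammas)
```

Because the data are noise-free and the lines are parallel, the fit recovers the constant, the common slope, and the group offset used to generate them.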

  6. Analysis of Covariance • Data: • Catch rates of sperm whales (per whaling day) from logbooks of Yankee whalers off the Galapagos Islands, 1830-1850 • Questions: • Was there a significant change in catch rate over this period? • Was there a significant seasonal pattern?

  7. Analysis of Covariance • Model: Catch(m,t) = β0 + β1·t + γ(m) + Error • t = 1830-1850 [continuous] • m = Jan-Feb, Mar-Apr, …, Nov-Dec [categorical]

  8. Analysis of Covariance • Model: Catch(m,t) = β0 + β1·t + γ(m) + Error • Parameter estimates:
       β0 = 4.528 [constant]
       β1 = -0.002 [change/yr]
       γ(Jan-Feb) = 0.016
       γ(Mar-Apr) = 0.013
       γ(May-Jun) = -0.038
       γ(Jul-Aug) = -0.020
       γ(Sep-Oct) = 0.000
       γ(Nov-Dec) = 0.000

  9. Analysis of Covariance • Model: Catch(m,t) = β0 + β1·t + γ(m) + Error • Analysis of Variance Table:
       Source   SS      df   MS      F-ratio   P
       YEAR     0.014    1   0.014   3.653     0.061
       MONTH    0.034    5   0.007   1.782     0.131
       Error    0.220   57   0.004
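
As a rough check on how the columns of this table relate, each mean square is the sum of squares divided by its degrees of freedom, and each F-ratio is an effect mean square divided by the error mean square. The arithmetic below uses the rounded values printed in the table, so it only approximately reproduces the reported F-ratios of 3.653 and 1.782, which were presumably computed from unrounded sums of squares.

```latex
\[
MS = \frac{SS}{df}, \qquad F = \frac{MS_{\text{effect}}}{MS_{\text{Error}}}
\]
\[
MS_{\text{Error}} = \frac{0.220}{57} \approx 0.0039, \qquad
F_{\text{YEAR}} \approx \frac{0.014 / 1}{0.0039} \approx 3.6, \qquad
F_{\text{MONTH}} \approx \frac{0.034 / 5}{0.0039} \approx 1.8
\]
```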

  10. Analysis of Covariance Durbin-Watson D Statistic: 1.923 First Order Autocorrelation: 0.034
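
For reference, the Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals, and it is approximately 2(1 − r), where r is the first-order autocorrelation. With r = 0.034 this gives about 1.93, in line with the reported 1.923; values near 2 suggest little first-order autocorrelation. A minimal sketch, with hypothetical function names and made-up residuals:

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson D: sum of squared successive differences over sum of squares."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def lag1_autocorrelation(residuals):
    """First-order autocorrelation of the (mean-centred) residuals."""
    e = np.asarray(residuals, dtype=float)
    e = e - e.mean()
    return np.sum(e[1:] * e[:-1]) / np.sum(e ** 2)

# Made-up residuals that alternate in sign (strong negative autocorrelation).
example = [1.0, -1.0, 1.0, -1.0]
print(durbin_watson(example), lag1_autocorrelation(example))
```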

  11. General Linear Model: Analysis of Covariance plus Interactions • Y = β0 + β1·X1 + β2·X2 + … + γ1(Z1) + γ2(Z2) + … + β12·X1·X2 + … + γ12(Z1, Z2) + … + α12(Z1)·X1 + … + Error {X’s are continuous variables} {Z’s are categorical variables, defining groups}

  12. Characteristics of General Linear Models • The response Y has a normal distribution with vector mean μ and variance σ². • A coefficient vector (b=[β’s, γ’s, α’s]) defines a linear combination of the predictors (X’s). • The model equates the two as: μ = X·b
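
The matrix form μ = X·b can be made concrete with a small design matrix. The sketch below assumes one continuous predictor X1, one two-level factor Z1 coded 0/1, and their interaction; the data and coefficient values are invented purely for illustration.

```python
import numpy as np

# Hypothetical data: one continuous predictor X1 and one two-level factor Z1.
x1 = np.array([0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0])
z1 = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])  # 0/1 group indicator

# Columns: constant, X1, Z1, and the Z1-by-X1 interaction (a group-specific slope).
X = np.column_stack([np.ones_like(x1), x1, z1, z1 * x1])

b_true = np.array([1.0, 0.5, 2.0, -0.25])  # illustrative coefficients only
mu = X @ b_true                            # the model mean: mu = X . b

# With noise-free Y = mu, least squares recovers b.
b_hat, *_ = np.linalg.lstsq(X, mu, rcond=None)
print(b_hat)
```

Each row of X corresponds to one observation and each column to one coefficient, so adding an interaction term simply adds a column.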

  13. General Linear Models • Coefficients (β’s, γ’s, α’s), and fit of model (σ² or r²) estimated using least squares • Subsets of predictor variables may be selected using stepwise methods, etc. • Beware: • Collinearity • Empty or nearly-empty cells (combinations of categorical variables with few units)

  14. General Linear Model • Data: • Movements of sperm whales (displacement per 12-hr) off Galapagos Islands with year, clan, and shit rate • Questions: • Are movements of sperm whales affected by year, clan, shit rate or combinations of them?

  15. General Linear Model • Potential X variables: • Year (Categorical: 1987 and 1989) • Clan (Categorical: ‘Plus-one’ and ‘Regular’) • Shit-rate (Continuous, Arcsine-Square root transform) • Year*Clan • Year*Shit-rate • Clan*Shit-rate

  16. General Linear Model • X variables selected by stepwise selection (P-to-enter = 0.15 / P-to-remove = 0.15)
       Backward          Forward
       Year              Year
       Clan              Clan
       Shit-rate         Shit-rate
       Year*Clan         Year*Clan
       Year*Shit-rate    Year*Shit-rate
       Clan*Shit-rate    Clan*Shit-rate

  17. General Linear Model • Backward: Y = c + Clan + Year*Clan • Forward: Y = c + Shit-rate*Clan

  18. General Linear Model: Why two “best models”? • Backward: Y = c + Clan + Year*Clan • Forward: Y = c + Shit-rate*Clan • [Figure: panels for 1987 and 1989]

  19. General Linear Model: Which is “best”? • Backward: Y = c + Clan + Year*Clan • Forward: Y = c + Shit-rate*Clan • [Figure: panels for 1987 and 1989, annotated r² = 0.347 (1 d.f.) and r² = 0.264 (2 d.f.)]

  20. General Linear Models • The response Y has a normal distribution with vector mean μ and variance σ². • A coefficient vector (b=[β’s, γ’s, α’s]) defines a linear combination of the predictors (X’s). • The model equates the two as: μ = X·b

  21. Generalized Linear Models • The response Y has a distribution that may be normal, binomial, Poisson, gamma, or inverse Gaussian, with parameters including a mean μ. • A coefficient vector (b=[β’s, γ’s, α’s]) defines a linear combination of the predictors (X’s). • A link function f defines the link between the two as: f(μ) = X·b

  22. Generalized Linear Models • Examine assumptions using residuals • Examine fit using “deviance”: • a generalization of the residual sum of squares • twice the difference between the log-likelihoods of the full (saturated) model and the model in question • fits of different models can be compared • Related to AIC
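
Written out, the deviance and its connection to model comparison and AIC look roughly as follows. The symbols are introduced here for illustration: ℓ is a log-likelihood, k the number of fitted parameters, and models 1 and 2 are nested with model 2 the larger.

```latex
\[
D = 2\,\bigl[\ell(\text{full model}) - \ell(\text{model in question})\bigr]
\]
\[
D_1 - D_2 = 2\,(\ell_2 - \ell_1) \;\sim\; \chi^2_{k_2 - k_1} \quad \text{(approximately, for nested models)}
\]
\[
\mathrm{AIC} = -2\,\ell(\text{model in question}) + 2k = D + 2k + \text{constant}
\]
```

The constant in the last line is −2ℓ(full model), which is the same for every model fitted to the same data, so ranking models by AIC amounts to ranking them by deviance plus a penalty of 2 per parameter.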

  23. Generalized Linear Models: can fit non-linear relationships using ‘link functions’ and can consider non-normal errors MATLAB: glmdemo

  24. Proportion of sexually-mature animals at different weights MATLAB: glmdemo

  25. Two problems with linear regression: 1) probabilities <0 and >1 2) clearly non-linear MATLAB: glmdemo

  26. Polynomial Regression better, but also: 1) probabilities <0 and >1 2) inflections are not real MATLAB: glmdemo

  27. Instead fit “logistic regression” using generalized linear model and binomial distribution: Y = 1/(1 + e^(β0+β1·X)) MATLAB: glmdemo
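
The slides use MATLAB's glmdemo for this fit. As a language-neutral illustration of what a binomial generalized linear model fit involves, here is a minimal sketch of iteratively reweighted least squares in Python with NumPy. It assumes the usual logit-link convention μ = 1/(1 + e^−(β0+β1·X)); under the form transcribed on the slide, the coefficient signs would simply be reversed. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def fit_logistic(x, successes, trials, n_iter=25):
    """Binomial GLM with logit link, fitted by iteratively reweighted least squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(successes, dtype=float)
    n = np.asarray(trials, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ b                          # linear predictor
        mu = 1.0 / (1.0 + np.exp(-eta))      # fitted proportions
        w = n * mu * (1.0 - mu)              # working weights
        z = eta + (y / n - mu) / (mu * (1.0 - mu))  # working response
        XtW = X.T * w
        b = np.linalg.solve(XtW @ X, XtW @ z)       # weighted least-squares step
    return b

# Synthetic data generated exactly from b0 = -4, b1 = 2 (no sampling noise).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
trials = np.full(5, 100.0)
successes = trials / (1.0 + np.exp(-(-4.0 + 2.0 * x)))
b_hat = fit_logistic(x, successes, trials)
print(b_hat)
```

Fitted proportions from this model always lie between 0 and 1, which addresses the first problem noted for linear and polynomial regression.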

  28. Compare two generalized linear models • Y = 1/(1 + e^(β0+β1·X)) • Y = 1/(1 + e^(β0+β1·X+β2·X·X)) • Difference in deviance = 0.70; P = 0.40 MATLAB: glmdemo
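
The reported P-value can be checked by hand. The two models differ by one parameter (β2), so the difference in deviance is compared with a chi-square distribution on 1 degree of freedom, whose upper tail can be written with the complementary error function. A small sketch using only the Python standard library (the function name is hypothetical):

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# Difference in deviance reported on the slide.
p_value = chi2_sf_1df(0.70)
print(round(p_value, 2))
```

The result agrees with the P = 0.40 on the slide, so the quadratic term does not significantly improve the fit.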

  29. Examine assumptions using residuals MATLAB: glmdemo

  30. Making predictions: MATLAB: glmdemo

  31. Non-linear models, e.g. Y = c + EXP(β0 + β1·X) + E; Y = β0 + β1·X·[X>XK] + E • More general than generalized linear models • But harder to fit: • iterative process • may not converge • non-unique solution • harder to compare
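
To illustrate why such models are harder to fit, consider the threshold model Y = β0 + β1·X·[X>XK] + E. For any fixed XK it is linear in β0 and β1, but XK itself must be searched for. The sketch below uses a simple grid search, which is just one possible approach and not necessarily the method used in the course; the function name, candidate grid, and data are all invented.

```python
import numpy as np

def fit_threshold_model(x, y, candidates):
    """Fit Y = b0 + b1*X*[X > XK] by least squares for each candidate XK."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best = None
    for xk in candidates:
        X = np.column_stack([np.ones_like(x), x * (x > xk)])
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ b) ** 2))
        if best is None or rss < best[0]:
            best = (rss, xk, b)   # keep the candidate with the smallest RSS
    return best

# Noise-free synthetic data with b0 = 1, b1 = 2 and a threshold between 4 and 5.
x = np.arange(10.0)
y = 1.0 + 2.0 * x * (x > 4.5)
rss, xk, b = fit_threshold_model(x, y, candidates=np.arange(0.5, 9.0, 1.0))
print(xk, b, rss)
```

Any XK between 4 and 5 fits these data equally well, a small example of the non-unique solutions mentioned on the slide.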

  32. Summary: Methods with One Dependent Variable (in order of increasing complexity) • Simple Linear Regression • One-way ANOVA • Multiple Linear Regression • Multi-way ANOVA • Analysis of Covariance • General Linear Model • Generalized Linear Model • Non-Linear Model
