Delve into the complexities of statistical analysis with concepts such as linear regression, ANOVA, and generalized linear models. Understand the assumptions, coding in R, and likelihood functions for various models like Poisson and logistic regression. Explore nonlinear least squares methods, variance structures, and mixed models for comprehensive data analysis.
General linear models
• Predictions are a linear function of a set of parameters.
• Includes:
  • Linear models
  • ANOVA
  • ANCOVA
• Assumptions:
  • Normally distributed, independent errors
  • Constant variance
• Not to be confused with generalized linear models!
• Distinction between factors (categorical predictors) and covariates (continuous predictors); see the sketch below.
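A minimal sketch of how these cases map onto R's lm(), assuming a continuous response Y, a continuous covariate X, and a factor f1 (all variable names hypothetical):
>lm(Y ~ X)        # linear regression: covariate only
>lm(Y ~ f1)       # one-way ANOVA: factor only
>lm(Y ~ f1 + X)   # ANCOVA: factor plus covariate, common slope
>lm(Y ~ f1 * X)   # ANCOVA with a separate slope for each factor level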
Linear regression
Standard R code:
>lm.reg<-lm(Y~X)
>summary(lm.reg)
>anova(lm.reg)
Likelihood R code:
>lmfun<-function(a, b, sigma) {
  Y.pred<-a+b*X
  -sum(dnorm(Y, mean=Y.pred, sd=sigma, log=TRUE))
}
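A minimal sketch of fitting this negative log-likelihood, assuming mle() from the stats4 package (the same function used in the spatial example later) and hypothetical starting values:
>library(stats4)
>fit<-mle(lmfun, start=list(a=0, b=1, sigma=1),
          method="L-BFGS-B", lower=c(-Inf, -Inf, 0.001))  # keep sigma positive
>summary(fit)   # estimates of a and b should match coef(lm.reg)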
Analysis of variance (ANOVA)
Standard R code:
>lm.onewayaov<-lm(Y~f1)
>summary(lm.onewayaov)
>anova(lm.onewayaov) # will give you an ANOVA table
Likelihood R code:
>aovfun<-function(a11, a12, sigma) {
  Y.pred<-c(a11,a12)[f1]
  -sum(dnorm(Y, mean=Y.pred, sd=sigma, log=TRUE))
}
Analysis of covariance (ANCOVA)
Standard R code:
>lm.anc<-lm(Y~f*X)
>summary(lm.anc)
>str(summary(lm.anc))
Likelihood R code:
>ancfun<-function(a11, a12, slope1, slope2, sigma) {
  Y.pred<-c(a11,a12)[f] + c(slope1,slope2)[f]*X
  -sum(dnorm(Y, mean=Y.pred, sd=sigma, log=TRUE))
}
Nonlinearity: Non-linear least squares
Uses numerical methods similar to those used in likelihood
Standard R code:
>nls1<-nls(y~a*x^b, start=list(a=1, b=1))
>summary(nls1)
>str(summary(nls1))
Likelihood R code:
>nlsfun<-function(a, b, sigma) {
  y.pred<-a*x^b
  -sum(dnorm(y, mean=y.pred, sd=sigma, log=TRUE))
}
Generalized linear models
• Assumptions:
  • Non-normally distributed errors (but still independent, and only certain kinds of non-normality: the exponential family).
  • Non-linear relationships are allowed, but only if they have a linearizing transformation (the link function).
• Linearizing transformations (link functions): each exponential-family distribution is typically paired with a specific link (see the sketch below).
  • Poisson: log link
  • Binomial: logit link
  • Gamma: inverse link
• Fit by iteratively reweighted least squares: estimate the variance associated with each point for each estimate of the parameter(s).
• Not to be confused with general linear models!
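A minimal sketch of what the link functions do, using R's built-in family objects (nothing here is specific to this course's data):
>poisson()$linkfun(10)      # log link: log(10)
>binomial()$linkfun(0.8)    # logit link: log(0.8/0.2)
>Gamma()$linkfun(2)         # inverse link: 1/2
# On the link scale the model is linear, e.g. for Poisson regression:
# log(E[Y]) = a + b*X, so E[Y] = exp(a + b*X)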
GLM: Poisson regression
Standard R code:
>glm.pois<-glm(Y~X, family=poisson)
>summary(glm.pois)
Likelihood R code:
>poisregfun<-function(a, b) {
  Y.pred<-exp(a+b*X)
  -sum(dpois(Y, lambda=Y.pred, log=TRUE))
}
GLM: Logistic regression
Standard R code:
>glm2<-glm(y~x, family=binomial)
>summary(glm2)
Likelihood R code:
>logregfun<-function(a, b, N) {
  p.pred<-exp(a+b*x)/(1+exp(a+b*x))
  -sum(dbinom(y, size=N, prob=p.pred, log=TRUE))
}
Generalized (non)linear least-squares models: Variance changes with a covariate or among groups
Standard R code:
>gls1<-gls(y~1, weights=varIdent(form=~1|f))   # gls() and varIdent() are in the nlme package
>summary(gls1)
Likelihood R code:
>vardifffun<-function(a, sd1, sd2) {
  sdval<-c(sd1,sd2)[f]
  -sum(dnorm(y, mean=a, sd=sdval, log=TRUE))
}
Complex error structures
• Error structures are not independent
• Complex likelihood functions
• Includes:
  • Time series analysis
  • Spatial correlation
  • Repeated measures analysis
[Diagram: the multivariate normal likelihood combines a vector of data, a vector of means (predictions), and a variance-covariance matrix, as sketched below.]
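A minimal sketch of that structure, using dmvnorm() from the mvtnorm package (the same function used in the spatial example below) with a tiny hypothetical data set:
>library(mvtnorm)
>y<-c(2.1, 3.8, 6.2)                                       # vector of data (hypothetical)
>pred<-c(2, 4, 6)                                          # vector of means (predictions)
>V<-matrix(c(1,0.5,0.25, 0.5,1,0.5, 0.25,0.5,1), nrow=3)   # variance-covariance matrix
>dmvnorm(y, mean=pred, sigma=V, log=TRUE)                  # one joint log-likelihood for all points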
Complex error structures
[Diagram: example variance-covariance matrices for the independent case (diagonal), increasing variance (unequal diagonal), and the general case (non-zero off-diagonal elements).]
Complex error structures
• The variance/covariance matrix is symmetric, so we need to specify at most n(n+1)/2 distinct parameters (n variances plus n(n-1)/2 covariances).
• The V/C matrix must also be positive definite; in practice this means all eigenvalues are positive (and, in particular, the diagonal values are positive).
• Select the elements of the matrix that define the error structure and ensure it remains positive definite.
• In the example that follows, correlation drops off with the number of steps between sites.
Complex error structures: An example (spatially-correlated errors)
R code:
>library(mvtnorm)   # for dmvnorm()
>library(stats4)    # for mle()
>rho<-0.5
>m<-matrix(nrow=5, ncol=5)
>m<-rho^(abs(row(m)-col(m)))           # correlation decays with distance between sites
#OR#
>m<-diag(5)
>m[abs(row(m)-col(m))==1]<-rho         # correlation only between adjacent sites
>mvlik<-function(a, b, rho) {
  pred.rad<-a+b*dbh
  n<-length(radius)
  m<-diag(n)                           # identity matrix with n rows, n columns
  m[abs(row(m)-col(m))==1]<-rho        # adjacent-site correlation
  -dmvnorm(radius, mean=pred.rad, sigma=m, log=TRUE)
}
>mle(mvlik, start=list(a=0.5, b=3, rho=0.5), method="L-BFGS-B", lower=0.001)
Mixed models & Generalized linear mixed models (GLMM)
• Samples within a group (block, site) are equally correlated with each other.
• Fixed effects: effects of covariates.
• Random effects: block, site, etc.
• GLMMs are generalized linear models with random effects (see the sketch below).
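One way to fit such a model in standard R is lme() from the nlme package (the same package that provides gls()); a minimal sketch with hypothetical variable names:
>library(nlme)
>lmm1<-lme(Y ~ X, random = ~1 | block, data=dat)   # random intercept for each block
>summary(lmm1)
# For GLMMs (non-normal errors plus random effects), glmer() in the lme4 package
# is one commonly used option, e.g. glmer(Y ~ X + (1|block), family=poisson, data=dat)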
Complex variance structures
• So how do you incorporate all potential sources of variance?
  • Block effects
  • Individual effects (repeated measures include both individual and temporal correlation)
  • Measurement vs. process error
  • ...