Week 12 of the Applied Statistics course covers advanced topics in multiple linear regression: quadratic and dummy-variable models, transformations, collinearity assessment using the Variance Inflation Factor (VIF), and model building with stepwise and best-subset regression. The homework assignment applies multiple regression analysis to linear models with multiple independent variables.
ENGR 610 Applied Statistics • Fall 2007 - Week 12 • Marshall University CITE • Jack Smith
Overview for Today • Review Multiple Linear Regression, Ch 13 (1-5) • Go over problem 13.62 • Multiple Linear Regression, Ch 13 (6-11) • Quadratic model • Dummy-variable model • Using transformations • Collinearity (VIF) • Model building • Stepwise regression • Best-subset regression with Cp statistic • Homework assignment
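Multiple Regression • Linear model with multiple independent variables • $Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_j X_{ji} + \dots + \beta_k X_{ki} + \varepsilon_i$ • $X_{ji}$ = value of independent variable $j$ for observation $i$ • $Y_i$ = observed value of dependent variable • $\beta_0$ = Y-intercept (Y when every $X_j = 0$) • $\beta_j$ = slope ($\Delta Y / \Delta X_j$, other variables held constant) • $\varepsilon_i$ = random error for observation $i$ • $Y_i' = b_0 + b_1 X_{1i} + \dots + b_k X_{ki}$ (predicted value) • The $b_j$'s are called the regression coefficients • $e_i = Y_i - Y_i'$ (residual) • Minimize $\sum e_i^2$ for the sample with respect to all $b_j$, $j = 0, \dots, k$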
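The least-squares fit described on this slide can be sketched with NumPy. This is a minimal illustration, not the course's PHStat workflow; the helper name `fit_ols` and the data are made up for the example.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit of y = b0 + b1*X1 + ... + bk*Xk.

    X has shape (n, k); y has shape (n,). Returns (b, y_hat, e), where
    b[0] is the intercept b0 and b[1:] are the slopes b1..bk.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])   # leading column of 1s for b0
    b, *_ = np.linalg.lstsq(A, y, rcond=None)   # minimizes the sum of e_i^2
    y_hat = A @ b                               # predicted values Yi'
    e = y - y_hat                               # residuals ei = Yi - Yi'
    return b, y_hat, e

# Made-up data generated exactly from Y = 1 + 2*X1 + 3*X2 (no random error),
# so the fit should recover those coefficients and leave zero residuals.
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 2]])
y = 1 + 2 * X[:, 0] + 3 * X[:, 1]
b, y_hat, e = fit_ols(X, y)
```

With real data the residuals `e` are nonzero and feed directly into the sums of squares and residual plots discussed next.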
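Partitioning of Variation • Total variation: $SST = \sum (Y_i - \bar{Y})^2$, where $\bar{Y}$ is the mean response • Regression variation: $SSR = \sum (Y_i' - \bar{Y})^2$ • Random variation: $SSE = \sum (Y_i - Y_i')^2$ • $SST = SSR + SSE$ • Coefficient of Multiple Determination: $R^2_{Y.12 \dots k} = SSR/SST$ • Standard Error of the Estimate: $S_{YX} = \sqrt{SSE/(n-k-1)}$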
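A short numerical check of the partition SST = SSR + SSE, using made-up data. The variable names are illustrative only.

```python
import numpy as np

# Made-up sample: n = 6 observations, k = 2 independent variables.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0],
              [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])
n, k = X.shape

A = np.column_stack([np.ones(n), X])          # design matrix with intercept
b, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ b

sst = np.sum((y - y.mean()) ** 2)             # total variation
ssr = np.sum((y_hat - y.mean()) ** 2)         # regression (explained) variation
sse = np.sum((y - y_hat) ** 2)                # random (unexplained) variation

r2 = ssr / sst                                # coefficient of multiple determination
s_yx = np.sqrt(sse / (n - k - 1))             # standard error of the estimate
```

The identity holds only when the model includes an intercept, which is why the design matrix carries a column of ones.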
Adjusted R2 • To account for sample size (n) and number of independent variables (k) for comparison purposes • $R^2_{adj} = 1 - (1 - R^2)\,\frac{n-1}{n-k-1}$
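As a small worked example of the adjustment, the sketch below uses the standard adjusted R2 formula, 1 - (1 - R2)(n - 1)/(n - k - 1). The function name and the numbers are hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k independent variables."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Same R^2 and sample size, but more variables gives a lower adjusted value.
a2 = adjusted_r2(0.90, n=20, k=2)
a5 = adjusted_r2(0.90, n=20, k=5)
```

This penalty is what makes the adjusted value usable for comparing models with different numbers of variables.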
Residual Analysis • Plot residuals vs • Yi’ (predicted values) • X1, X2,…,Xk • Time (for autocorrelation) • Check for • Patterns • Outliers • Non-uniform distribution about mean • See Figs 12.18-19, p 597-8
F Test for Multiple Regression • $F = MSR / MSE$ • Reject $H_0$ if $F > F_U(\alpha, k, n-k-1)$ [or $p < \alpha$] • k = number of independent variables • One-Way ANOVA Summary
Alternate F-Test • Compared to $F_U(\alpha, k, n-k-1)$
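One common alternate form, assumed here to be the one meant, writes the F statistic in terms of R2: F = (R2/k) / ((1 - R2)/(n - k - 1)). The sketch below checks that it agrees with MSR/MSE, taking MSR = SSR/k and MSE = SSE/(n - k - 1). Function names and numbers are illustrative.

```python
def f_from_sums_of_squares(ssr, sse, n, k):
    """Overall F statistic as MSR / MSE."""
    msr = ssr / k                  # mean square due to regression
    mse = sse / (n - k - 1)        # mean square error
    return msr / mse

def f_from_r2(r2, n, k):
    """The same statistic written in terms of R^2 (assumed alternate form)."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

# Made-up values: SSR = 80, SSE = 20, n = 23 observations, k = 2 variables.
ssr, sse, n, k = 80.0, 20.0, 23, 2
f1 = f_from_sums_of_squares(ssr, sse, n, k)
f2 = f_from_r2(ssr / (ssr + sse), n, k)
```

Either value would then be compared with the tabulated critical value for k and n - k - 1 degrees of freedom.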
t Test for Slope • $H_0$: $\beta_j = 0$ • See output from PHStat • Critical t value based on chosen level of significance, $\alpha$, and $n-k-1$ degrees of freedom
Confidence and Prediction Intervals • Confidence Interval Estimate for the Slope • Confidence Interval Estimate for the Mean and Prediction Interval Estimate for Individual Response • Beyond the scope of this text
Partial F Tests • Significance test for contribution from individual independent variable • Measure of incremental improvement • All others already taken into account • $F_j = SSR(X_j \mid \{X_{i \ne j}\}) / MSE$ • $SSR(X_j \mid \{X_{i \ne j}\}) = SSR - SSR(\{X_{i \ne j}\})$ • Reject $H_0$ if $F_j > F_U(\alpha, 1, n-k-1)$ [or $p < \alpha$] • Note: $t^2(\alpha, n-k-1) = F_U(\alpha, 1, n-k-1)$ (two-tailed critical t)
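The partial F statistic can be sketched by fitting the model with and without the variable of interest. The helper names and data below are made up; MSE is taken from the full model.

```python
import numpy as np

def ssr_and_sse(X, y):
    """Return (SSR, SSE) for an OLS fit with intercept; X has shape (n, k)."""
    A = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    y_hat = A @ b
    return np.sum((y_hat - y.mean()) ** 2), np.sum((y - y_hat) ** 2)

def partial_f(X, y, j):
    """Partial F statistic for column j, given all the other columns."""
    n, k = X.shape
    ssr_full, sse_full = ssr_and_sse(X, y)
    ssr_without_j, _ = ssr_and_sse(np.delete(X, j, axis=1), y)
    mse = sse_full / (n - k - 1)                 # MSE of the full model
    return (ssr_full - ssr_without_j) / mse      # SSR(Xj | others) / MSE

# Made-up data with two independent variables and some noise.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0],
              [5.0, 6.0], [6.0, 5.0], [7.0, 9.0], [8.0, 7.0]])
y = np.array([3.2, 3.8, 7.1, 8.0, 10.9, 12.2, 16.1, 15.0])
f1 = partial_f(X, y, 0)   # contribution of X1 given X2
f2 = partial_f(X, y, 1)   # contribution of X2 given X1
```

Each value equals the square of the corresponding slope's t statistic, which is the relationship noted on the slide.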
Coefficients of Partial Determination • See PHStat output in Fig 13.10, p 637
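For reference, the coefficient of partial determination is usually written in terms of the same sums of squares used in the partial F test. The notation below is a sketch that follows the partial F slide, with "others" standing for all independent variables except $X_j$:

```latex
r^2_{Yj.(\text{others})}
  = \frac{SSR(X_j \mid \text{others})}
         {SST - SSR(\text{all variables}) + SSR(X_j \mid \text{others})}
```

Since $SST - SSR(\text{all variables}) = SSE$, this is the proportion of the variation left unexplained by the other variables that $X_j$ accounts for.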
Quadratic Curvilinear Regression Model • $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \varepsilon_i$ • Treat the $X^2$ term just like any other independent variable • Same $R^2$, F tests, t tests, etc. • Generally need linear term as well
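Treating the squared term as just another column can be sketched as follows, with made-up data generated from an exact quadratic.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 - 1.0 * x + 0.5 * x ** 2        # made-up exact quadratic, no error term

# Columns: intercept, linear term, squared term. The squared term is
# handled like any other independent variable.
A = np.column_stack([np.ones_like(x), x, x ** 2])
b, *_ = np.linalg.lstsq(A, y, rcond=None)
```

The fitted `b` holds the intercept, the linear slope, and the coefficient of the squared term, in that order.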
Dummy-Variable Models • Treatment of categorical variables • Each category represented by a dummy variable with value of 0 or 1 (a variable with c categories needs c - 1 dummies when the model has an intercept) • Treat added terms like any other terms • Often confounded with other variables, so model may need interaction terms • Add interaction term and perform partial F test and t test for added term
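A minimal sketch of a dummy-variable model with an interaction term. The variable names (`size`, `has_feature`) and the data are hypothetical; the two-level category is coded as a single 0/1 column.

```python
import numpy as np

size = np.array([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0])
has_feature = np.array(["no", "no", "no", "no", "yes", "yes", "yes", "yes"])
d = (has_feature == "yes").astype(float)     # dummy: 1 = yes, 0 = no

# Made-up response: the "yes" group has a higher intercept (+5)
# and a steeper slope (+1.5), which is what the interaction term captures.
y = 10 + 2 * size + 5 * d + 1.5 * size * d

# Columns: intercept, size, dummy, size-by-dummy interaction.
A = np.column_stack([np.ones_like(size), size, d, size * d])
b, *_ = np.linalg.lstsq(A, y, rcond=None)
```

Here `b[2]` is the shift in intercept for the "yes" group and `b[3]` is the shift in slope; a partial F or t test on `b[3]` decides whether the interaction term is worth keeping.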
Using Transformations • Square-root • Multiplicative - logY-logX model • Exponential - logY model • Others • Higher polynomials • Trigonometric functions • Inverse
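The multiplicative and exponential models become linear after taking logs, so they can be fitted with an ordinary straight-line regression. The sketch below assumes the forms Y = a·X^b and Y = a·e^(bX); the helper name and data are made up.

```python
import numpy as np

def fit_line(u, v):
    """Fit v = b0 + b1*u by least squares and return (b0, b1)."""
    A = np.column_stack([np.ones_like(u), u])
    (b0, b1), *_ = np.linalg.lstsq(A, v, rcond=None)
    return b0, b1

x = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
y_power = 3.0 * x ** 1.5           # multiplicative model, Y = a * X^b
y_exp = 2.0 * np.exp(0.3 * x)      # exponential model, Y = a * exp(b*X)

# Multiplicative: log Y = log a + b * log X  (logY-logX model)
b0, b_power = fit_line(np.log(x), np.log(y_power))
a_power = np.exp(b0)

# Exponential: log Y = log a + b * X  (logY model)
b0, b_exp = fit_line(x, np.log(y_exp))
a_exp = np.exp(b0)
```

Both transformations require positive Y values, and the multiplicative model also requires positive X values.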
Collinearity (VIF) • Test for linearly dependent variables • VIF - Variance Inflationary Factor • $VIF_j = 1/(1 - R_j^2)$ • $R_j^2$ = coefficient of multiple determination of variable $X_j$ with all other X variables • $VIF_j > 5$ suggests linear dependence ($R_j^2 > 0.8$) • Full treatment involves analysis of correlation (covariance) matrix, such as • Principal Component Analysis (PCA) • To determine dimensionality and orthogonal factors • Factor Analysis (FA) • To determine rotated factors
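The VIF calculation can be sketched by regressing each independent variable on all the others. The helper names and data are made up; in this example X2 is nearly a copy of X1, while X3 is uncorrelated with both.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit of y on the columns of X, with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ b
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

def vif(X):
    """VIF_j = 1 / (1 - R_j^2) for each column j of X."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        rj2 = r_squared(np.delete(X, j, axis=1), X[:, j])
        out.append(1.0 / (1.0 - rj2))   # blows up if X_j is an exact combination
    return np.array(out)

x1 = np.arange(1.0, 9.0)
x2 = x1 + 0.1 * np.array([1, -1, 1, -1, 1, -1, 1, -1])   # nearly collinear with x1
x3 = np.array([1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0])
v = vif(np.column_stack([x1, x2, x3]))
```

The first two entries of `v` come out far above the threshold of 5, while the third stays close to 1.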
Model Building • See flow chart in text, Fig 13.25 (p 663) • Stepwise regression • Add or delete one variable at a time • Use partial F and/or t tests (drop a variable when p > 0.05) • Best-subset regression • Start with model including all variables (number of variables < n/10) • Eliminate the variables with the highest VIF (VIF > 5) • Generate all models with remaining variables (T) • Select best models using $R^2$ and $C_p$ statistic • $C_p = \frac{(1 - R_k^2)(n - T)}{1 - R_T^2} - (n - 2(k+1))$ • Look for models with $C_p \le k+1$ • Evaluate each term using t test • Add interaction terms, transformed variables, and higher-order terms based on residual analysis
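A minimal sketch of best-subset selection with the Cp statistic. It assumes T counts the parameters of the full model including the intercept (the usual convention for this form of Cp) and that k is the number of independent variables in the candidate model. Function names and the randomly generated data are illustrative only.

```python
from itertools import combinations
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit of y on the columns of X, with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ b
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

def cp_statistic(r2_k, r2_T, n, k, T):
    """Cp = (1 - Rk^2)(n - T)/(1 - RT^2) - (n - 2(k + 1))."""
    return (1.0 - r2_k) * (n - T) / (1.0 - r2_T) - (n - 2 * (k + 1))

def best_subsets(X, y):
    """Return (columns, R^2, Cp) for every non-empty subset of columns."""
    n, p = X.shape
    T = p + 1                          # assumption: full-model parameters incl. intercept
    r2_T = r_squared(X, y)
    rows = []
    for k in range(1, p + 1):
        for cols in combinations(range(p), k):
            r2_k = r_squared(X[:, list(cols)], y)
            rows.append((cols, r2_k, cp_statistic(r2_k, r2_T, n, k, T)))
    return rows

# Made-up data: y depends on the first two columns; the third is irrelevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 3))
y = 1 + 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=12)
rows = best_subsets(X, y)
candidates = [r for r in rows if r[2] <= len(r[0]) + 1]   # models with Cp <= k + 1
```

Under this convention the full model always has Cp equal to k + 1, so it always passes the screen; the interesting candidates are the smaller models that also pass.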
Homework • Work and hand in Problem 13.63 • Fall break (Thanksgiving) - 11/22 • Review session - 11/29 (“dead” week) • “Linear Regression”, Ch 12-13 • Exam #3 • Linear regression (Ch 12-13) • Take-home • Due by 12/6 • Final grades due by 12/13