
Multiple Regression

Multiple Regression: predicting a response with multiple explanatory variables. Assumptions: the sample is representative; the error is random with a mean of zero; the independent variables are measured without error; the independent variables are linearly independent (no multicollinearity); the errors are uncorrelated.

tamah

Presentation Transcript


  1. Multiple Regression: predicting a response with multiple explanatory variables

  2. Assumptions
     • Sample is representative
     • Error is random with a mean of zero
     • Independent variables are measured without error
     • Independent variables are linearly independent (no multicollinearity)
     • Errors are uncorrelated
     • Variance is constant (homoscedasticity)
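A minimal sketch of how two of these assumptions might be checked in R. The data are simulated, and the names `x1`, `x2`, and `y` and all values are made up for illustration; nothing here comes from the presentation's data sets. The variance inflation factor is computed by hand from its usual definition rather than with an add-on package.

```r
# Simulated data: x2 is deliberately correlated with x1 (hypothetical values)
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.5)
y  <- 2 + 1.5 * x1 - 0.7 * x2 + rnorm(n)

fit <- lm(y ~ x1 + x2)

# Multicollinearity: variance inflation factor for x1, computed by hand
# as 1 / (1 - R^2) from regressing x1 on the other explanatory variable
r2_x1  <- summary(lm(x1 ~ x2))$r.squared
vif_x1 <- 1 / (1 - r2_x1)

# Residuals from a model with an intercept average to zero by construction,
# so this is a sanity check rather than a test of the assumption
res_mean <- mean(residuals(fit))

print(c(VIF_x1 = vif_x1, residual_mean = res_mean))
```

Constant variance and uncorrelated errors are usually judged from residual plots; calling `plot(fit)` on the fitted model produces the standard diagnostic plots.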

  3. Data/Distribution Issues
     • Consider outlier values: accurate estimates may require eliminating them or using robust approaches
     • Non-normal distributions may require transformation
     • Plot the response against each explanatory variable
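One possible way to screen for influential outliers is Cook's distance, sketched below on simulated data. The data, the injected outlier, and the 4/n cutoff are assumptions chosen for illustration; the slides do not say which outlier method was used.

```r
# Simulated data with one injected outlier (all values are made up)
set.seed(2)
x <- 1:20
y <- 3 + 2 * x + rnorm(20)
y[20] <- y[20] + 40          # contaminate the last observation

fit <- lm(y ~ x)

# Cook's distance measures how much each observation shifts the fitted model
d <- cooks.distance(fit)
keep <- d <= 4 / length(d)   # 4/n is a common rule of thumb, not a fixed rule
flagged <- which(!keep)

# Refit without the flagged points to see how much the coefficients move
fit_trim <- lm(y ~ x, subset = keep)
print(flagged)
print(rbind(all = coef(fit), trimmed = coef(fit_trim)))
```

Comparing the two rows of coefficients shows how strongly a single point can pull the slope, which is the reason for examining outliers before interpreting the model.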

  4. Modeling
     • We want a model that fits (predicts) the response variable with as few explanatory variables as possible
     • R² measures the proportion of variability in the response accounted for by the explanatory variables
     • Adjusted R² takes the number of explanatory variables into account
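The adjustment can be written out explicitly. With n observations and p explanatory variables, the usual definition of adjusted R² is shown below, followed by a check against the first Kalahari model reported later in the transcript (R² = 0.8001, p = 2, and 12 residual degrees of freedom, which implies n = 15).

```latex
\[
\bar{R}^2 \;=\; 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
\;=\; 1 - (1 - 0.8001)\,\frac{14}{12} \;\approx\; 0.7668
\]
```

Because the penalty factor grows with p, adding a variable raises adjusted R² only if it improves the fit by more than chance alone would be expected to.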

  5. Modeling Methods
     • The general approach is to include variables that are theoretically relevant to predicting the response
     • Gradually remove variables that are not significant, and test whether the difference between models is significant
     • Automatic stepwise methods
     • Forward and backward
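Both routes can be sketched in base R. The example below uses simulated data in which `x3` is pure noise; the variable names and values are made up. Note one swap: R's `step()` selects terms by AIC, not by the significance tests the slide describes, so it is only an approximation of that procedure.

```r
# Simulated data: y depends on x1 and x2 only; x3 is noise (made-up values)
set.seed(3)
n  <- 100
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y  <- 1 + 2 * x1 - 1.5 * x2 + rnorm(n)
dat <- data.frame(y, x1, x2, x3)

full <- lm(y ~ x1 + x2 + x3, data = dat)

# Manual route: F test for dropping each term, then remove one and compare
print(drop1(full, test = "F"))
reduced <- update(full, . ~ . - x3)
print(anova(reduced, full))

# Automatic route: backward elimination by AIC (not by p-values)
auto <- step(full, direction = "backward", trace = 0)
print(formula(auto))
```

The manual route keeps the analyst in control of which variable leaves the model at each step; the automatic route is faster but can retain or drop terms for reasons unrelated to theory.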

  6. A Simple Example
     • Kalahari data includes site area (LMS), the number of days the site was occupied, and the number of people who occupied it
     • Rcmdr: Statistics | Fit models | Linear Model

  7. Two models
     • Model 1: LMS ~ People + Days
     • Model 2: LMS ~ People * Days, which R expands to LMS ~ People + Days + People:Days
     • Check the significance of the slopes
     • Compare the models for a significant difference
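The relationship between the `*` and `:` operators in an R model formula can be checked without any data, by asking R which terms each formula expands to. This is a small illustration, not part of the original analysis.

```r
# Expand both formula spellings into their model terms (no data needed)
terms_star <- attr(terms(LMS ~ People * Days), "term.labels")
terms_long <- attr(terms(LMS ~ People + Days + People:Days), "term.labels")

print(terms_star)
print(identical(terms_star, terms_long))
```

Both spellings yield the two main effects plus the `People:Days` interaction, so Model 2 differs from Model 1 by exactly one term.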

  8. Model 1 output

     > LinearModel.1 <- lm(LMS ~ People + Days, data=Kalahari)
     > summary(LinearModel.1)

     Call:
     lm(formula = LMS ~ People + Days, data = Kalahari)

     Residuals:
         Min      1Q  Median      3Q     Max
     -84.067  -8.387   1.395  19.792  60.233

     Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
     (Intercept)  -94.968     37.051  -2.563   0.0249 *
     People        12.276      2.062   5.953 6.68e-05 ***
     Days           5.885      1.992   2.954   0.0121 *
     ---
     Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

     Residual standard error: 37.92 on 12 degrees of freedom
     Multiple R-squared:  0.8001,  Adjusted R-squared:  0.7668
     F-statistic: 24.02 on 2 and 12 DF,  p-value: 6.377e-05

  9. Model 2 output

     > LinearModel.2 <- lm(LMS ~ People*Days, data=Kalahari)
     > summary(LinearModel.2)

     Call:
     lm(formula = LMS ~ People * Days, data = Kalahari)

     Residuals:
         Min      1Q  Median      3Q     Max
     -85.921 -11.310   5.595  18.593  35.520

     Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
     (Intercept)  -5.1301    63.9905  -0.080    0.938
     People        6.3835     4.0219   1.587    0.141
     Days         -6.6859     7.7606  -0.862    0.407
     People:Days   0.8111     0.4862   1.668    0.123

     Residual standard error: 35.38 on 11 degrees of freedom
     Multiple R-squared:  0.8405,  Adjusted R-squared:  0.797
     F-statistic: 19.32 on 3 and 11 DF,  p-value: 0.0001083

  10. Model comparison

     > anova(LinearModel.1, LinearModel.2)
     Analysis of Variance Table

     Model 1: LMS ~ People + Days
     Model 2: LMS ~ People * Days
       Res.Df   RSS Df Sum of Sq      F Pr(>F)
     1     12 17252
     2     11 13768  1    3483.9 2.7834 0.1234
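The F statistic in this table can be reproduced from the residual sums of squares and degrees of freedom alone. The sketch below does that with a small helper; `nested_f()` is a hypothetical function written for this illustration, not something from R or the slides. Because the table prints rounded RSS values, the result agrees with the reported F only to about three decimal places.

```r
# Partial F test for two nested linear models, from reported summaries only.
# nested_f() is a hypothetical helper written for this illustration.
nested_f <- function(rss_small, df_small, rss_big, df_big) {
  # Drop in RSS per added term, relative to the larger model's error variance
  f <- ((rss_small - rss_big) / (df_small - df_big)) / (rss_big / df_big)
  p <- pf(f, df_small - df_big, df_big, lower.tail = FALSE)
  c(F = f, p = p)
}

# Values from the anova table above: RSS 17252 on 12 df vs 13768 on 11 df
kalahari <- nested_f(17252, 12, 13768, 11)
print(round(kalahari, 4))
```

With only one term added, this F is the square of the t value for `People:Days` in Model 2 (1.668² ≈ 2.78), so both tests reach the same conclusion: the interaction does not significantly improve the fit.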

  11. Darl Points
     • Create a subset of DartPoints containing only the Darl points
     • Model 1: Length ~ Width + Thickness
     • Model 2: Length ~ Width * Thickness
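A minimal sketch of the subsetting step, assuming the point type is stored in a column called `Name`. The data frame below is a toy stand-in so that the example runs on its own: its column names follow the model output in the transcript, but every value in it is invented and the real DartPoints data will differ.

```r
# Toy stand-in for DartPoints: column names and all values are assumptions
DartPoints <- data.frame(
  Name   = c("Darl", "Darl", "Ensor", "Darl", "Travis", "Darl", "Darl", "Darl"),
  Length = c(42.8, 40.5, 43.5, 37.5, 56.5, 40.3, 30.6, 44.1),
  Width  = c(15.8, 17.4, 20.1, 16.3, 21.2, 15.1, 17.1, 18.0),
  Thick  = c(5.8, 5.9, 7.2, 6.1, 7.9, 6.5, 5.6, 6.8)
)

# Keep only the Darl points, then fit the two models named on the slide
Darl  <- subset(DartPoints, Name == "Darl")
m_add <- lm(Length ~ Width + Thick, data = Darl)
m_int <- lm(Length ~ Width * Thick, data = Darl)

print(nrow(Darl))
print(anova(m_add, m_int))
```

`subset()` keeps the rows where the condition is true; the same result can be obtained with `DartPoints[DartPoints$Name == "Darl", ]`.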

  12. Model 1 output

     > LinearModel.4 <- lm(Length ~ Width + Thick, data=Darl)
     > summary(LinearModel.4)

     Call:
     lm(formula = Length ~ Width + Thick, data = Darl)

     Residuals:
        Min     1Q Median     3Q    Max
     -9.297 -3.214 -1.250  4.592  7.449

     Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
     (Intercept)    6.369      6.639   0.959   0.3470
     Width          1.178      0.453   2.601   0.0157 *
     Thick          2.219      1.023   2.168   0.0403 *
     ---
     Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

     Residual standard error: 4.652 on 24 degrees of freedom
     Multiple R-squared:  0.5418,  Adjusted R-squared:  0.5037
     F-statistic: 14.19 on 2 and 24 DF,  p-value: 8.554e-05

  13. Model 2 output

     > LinearModel.5 <- lm(Length ~ Width * Thick, data=Darl)
     > summary(LinearModel.5)

     Call:
     lm(formula = Length ~ Width * Thick, data = Darl)

     Residuals:
        Min     1Q Median     3Q    Max
     -9.905 -2.728 -1.568  4.212  7.153

     Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
     (Intercept) -30.4873    51.6259  -0.591    0.561
     Width         3.2605     2.9281   1.114    0.277
     Thick         7.8492     7.8883   0.995    0.330
     Width:Thick  -0.3135     0.4354  -0.720    0.479

     Residual standard error: 4.699 on 23 degrees of freedom
     Multiple R-squared:  0.5519,  Adjusted R-squared:  0.4935
     F-statistic: 9.444 on 3 and 23 DF,  p-value: 0.000296

  14. Model comparison

     > anova(LinearModel.4, LinearModel.5)
     Analysis of Variance Table

     Model 1: Length ~ Width + Thick
     Model 2: Length ~ Width * Thick
       Res.Df    RSS Df Sum of Sq      F Pr(>F)
     1     24 519.33
     2     23 507.88  1    11.447 0.5184 0.4788
