BA 201 Lecture 14 Multiple Regression Model
Topics • Developing the Multiple Linear Regression • Inferences on Population Regression Coefficients • Pitfalls in Multiple Regression and Ethical Issues
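The Multiple Regression Model Relationship between 1 dependent & 2 or more independent variables is a linear function • Population model: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$, where $\beta_0$ is the population Y-intercept, $\beta_1, \ldots, \beta_k$ are the population slopes, and $\varepsilon_i$ is the random error • Sample model: $Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki} + e_i$, where $Y_i$ is the dependent (response) variable, $X_{1i}, \ldots, X_{ki}$ are the independent (explanatory) variables, and $e_i$ is the residual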
Simple Linear Regression Model Revisited [Figure: plot of Y against X marking an observed value]
Population Multiple Regression Model Bivariate model (2 independent variables: $X_1$ and $X_2$): $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
Sample Multiple Regression Model Bivariate model, sample regression plane: $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$, with residual $e_i = Y_i - \hat{Y}_i$
Multiple Linear Regression Equation Too complicated by hand! Ouch!
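The coefficients that are too tedious to compute by hand come from solving the normal equations $(X'X)b = X'y$. The following is a minimal sketch in plain Python, assuming made-up, noise-free data generated from a hypothetical equation y = 10 + 2·x1 - 3·x2 (these numbers are not from the lecture). `fit_ols` is an illustrative helper name, not a description of what Excel or PHStat does internally.

```python
def fit_ols(rows, y):
    """Return [b0, b1, ..., bk] by solving the normal equations (X'X) b = X'y."""
    # Prepend a column of ones so the first coefficient is the intercept.
    X = [[1.0] + [float(v) for v in r] for r in rows]
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[a] * row[b] for row in X) for b in range(p)]
        + [sum(row[a] * yi for row, yi in zip(X, y))]
        for a in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [v - factor * w for v, w in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical data (not from the lecture): y is generated exactly from
# y = 10 + 2*x1 - 3*x2, so the fit should recover those coefficients.
rows = [(x1, x2) for x1 in (1, 2, 3, 4) for x2 in (0, 1, 5)]
y = [10 + 2 * x1 - 3 * x2 for x1, x2 in rows]

b0, b1, b2 = fit_ols(rows, y)
print(round(b0, 6), round(b1, 6), round(b2, 6))
```

Because the data contain no noise, the fitted values should match the generating coefficients up to floating-point error. With real data the same procedure returns the least-squares estimates.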
Multiple Regression Model: Example Develop a model for estimating heating oil used for a single-family home in the month of January based on average temperature (°F) and amount of insulation in inches.
Multiple Regression in PHStat • PHStat | Regression | Multiple Regression … • EXCEL spreadsheet for the heating oil example.
Sample Multiple Regression Equation: Example Excel Output • For each 1°F increase in temperature, the estimated average amount of heating oil used decreases by 5.437 gallons, holding insulation constant. • For each one-inch increase in insulation, the estimated average use of heating oil decreases by 20.012 gallons, holding temperature constant.
Interpretation of Estimated Coefficients • Slope ($b_i$) • The estimated average value of Y changes by $b_i$ for each 1-unit increase in $X_i$, holding all other variables constant (ceteris paribus) • Example: If $b_1 = -2$, then fuel oil usage (Y) is expected to decrease by an estimated 2 gallons for each 1-degree increase in temperature ($X_1$), given the inches of insulation ($X_2$) • Y-Intercept ($b_0$) • The estimated average value of Y when all $X_i = 0$
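A short sketch of the "holding all other variables constant" reading, using the two slopes reported for the heating oil example (-5.437 and -20.012). The function name `predicted_change` is illustrative. The intercept cancels when two predictions are differenced, so it is not needed here.

```python
# Slopes reported on the Excel output for the heating oil example.
B_TEMP = -5.437         # gallons per 1 degree F, insulation held constant
B_INSULATION = -20.012  # gallons per 1 inch, temperature held constant


def predicted_change(d_temp=0.0, d_insulation=0.0):
    """Change in estimated average oil use for the given changes in each X."""
    return B_TEMP * d_temp + B_INSULATION * d_insulation


warmer = predicted_change(d_temp=10)        # 10 degrees warmer, same insulation
thicker = predicted_change(d_insulation=2)  # 2 more inches, same temperature
print(f"{warmer:.2f} gallons, {thicker:.2f} gallons")
```

Under these slopes, a January that is 10°F warmer lowers the estimated average use by about 54.4 gallons for a home with unchanged insulation.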
Simple and Multiple Regression Compared • Coefficients in a simple regression pick up the impact of that variable plus the impacts of other variables that are correlated with it and the dependent variable but are excluded from the model. • Coefficients in a multiple regression net out the impacts of other variables in the equation. • Hence they are called the net regression coefficients. • They still pick up the effects of other variables that are excluded from the model but are correlated with the included variables and the dependent variable.
Simple and Multiple Regression Compared: Example • Two simple regressions: • Multiple Regression:
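The first bullet can be illustrated numerically. The sketch below uses made-up data (not the heating oil data) in which y depends exactly on two correlated variables, with assumed net coefficients 2 and -3. Regressing y on x1 alone gives a slope equal to the net coefficient plus the part of x2's effect that travels with x1.

```python
def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    """Sample covariance; cov(a, a) is the sample variance."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


# Hypothetical data: x2 moves with x1, and y depends on both.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
b0, b1, b2 = 10.0, 2.0, -3.0  # assumed net coefficients
y = [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)]

# Simple regression of y on x1 alone (x2 left out of the model).
simple_slope = cov(x1, y) / cov(x1, x1)

# Part of x2's effect that x1 picks up because the two are correlated.
spillover = b2 * cov(x1, x2) / cov(x1, x1)

print(f"net slope b1   = {b1:.3f}")
print(f"simple slope   = {simple_slope:.3f}")
print(f"b1 + spillover = {b1 + spillover:.3f}")
```

In this constructed case the simple slope is negative even though the net coefficient is positive, which is why the two kinds of coefficient should not be read the same way.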
Venn Diagrams and Explanatory Power of a Simple Regression [Figure: Venn diagram of two overlapping circles, Oil and Temp. Overlap: variation in Oil explained by Temp (equivalently, variation in Temp used in explaining variation in Oil). Rest of Oil: variation in Oil explained by the error term. Rest of Temp: variation in Temp not used in explaining variation in Oil.]
Venn Diagrams and Explanatory Power of a Simple Regression (continued) [Figure: Venn diagram of Oil and Temp]
Venn Diagrams and Explanatory Power of a Multiple Regression [Figure: Venn diagram of three overlapping circles, Oil, Temp, and Insulation. The region where Temp and Insulation overlap each other inside Oil is used in explaining the variation in Oil but NOT in the estimation of $b_1$ or $b_2$. The part of Oil outside both other circles is the variation NOT explained by Temp or Insulation.]
Coefficient of Multiple Determination • $r^2 = \frac{SSR}{SST}$ • Proportion of Total Variation in Y Explained by All X Variables Taken Together • Never Decreases When a New X Variable is Added to Model • Disadvantage When Comparing Models
Venn Diagrams and Explanatory Power of Regression [Figure: Venn diagram of Oil, Temp, and Insulation]
Adjusted Coefficient of Multiple Determination • $r^2_{adj} = 1 - (1 - r^2)\frac{n-1}{n-k-1}$ • Proportion of Variation in Y Explained by All X Variables, Adjusted for the Number of X Variables Used and the Sample Size • Penalizes Excessive Use of Independent Variables • Smaller than $r^2$ • Useful in Comparing among Models • Could Decrease If an Insignificant New X Variable Is Added to the Model
Coefficient of Multiple Determination Excel Output • Adjusted $r^2$ • reflects the number of explanatory variables and sample size • is smaller than $r^2$
Interpretation of Coefficient of Multiple Determination • 96.56% of the total variation in heating oil use can be explained by variation in temperature and in the amount of insulation • 95.99% of the total variation in heating oil use can be explained by variation in temperature and in the amount of insulation, after adjusting for the number of explanatory variables and sample size
Example: Adjusted $r^2$ Can Decrease • Adjusted $r^2$ decreases when k increases from 2 to 3
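The two percentages are linked by the usual adjustment formula, which can be checked with a few lines of Python. The sample size n = 15 is an inference, not a number stated on this slide: it follows from k = 2 and the 12 error degrees of freedom used in the lecture's F and t tests.

```python
def adjusted_r2(r2, n, k):
    """Adjusted r^2 for n observations and k explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


r2 = 0.9656  # coefficient of multiple determination from the slide
k = 2        # temperature and insulation
n = 15       # inferred from error degrees of freedom n - k - 1 = 12
adj = adjusted_r2(r2, n, k)
print(f"{adj:.4f}")
```

The result agrees with the reported 95.99% to four decimal places.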
Using the Model to Make Predictions Predict the amount of heating oil used for a home if the average temperature is 30°F and the insulation is 6 inches. The predicted heating oil used is 278.97 gallons.
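The prediction can be reproduced from the fitted equation. In the sketch below, the two slopes are the ones reported for the heating oil example, but the intercept is an assumption: it does not appear in this text, and 562.151 is simply a value consistent with the stated prediction of 278.97 gallons.

```python
B0 = 562.151            # ASSUMPTION: intercept not shown in this text; chosen to
                        # be consistent with the stated prediction of 278.97
B_TEMP = -5.437         # slope for average temperature (degrees F)
B_INSULATION = -20.012  # slope for insulation (inches)


def predict_oil(temp_f, insulation_in):
    """Estimated average January heating oil use, in gallons."""
    return B0 + B_TEMP * temp_f + B_INSULATION * insulation_in


print(round(predict_oil(30, 6), 2))
```

Predictions are only as reliable as the range of the data, so inputs far from the observed temperatures and insulation levels should be treated with caution.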
Predictions in PHStat • PHStat | Regression | Multiple Regression … • Check the “Confidence and Prediction Interval Estimate” box • EXCEL spreadsheet for the heating oil example.
Another Example • The Excel spreadsheet that contains the multiple regression result of regressing Mid-term scores on quiz scores and attendance score
Residual Plots • Residuals vs. $\hat{Y}$ • May need to transform Y variable • Residuals vs. $X_1$ • May need to transform $X_1$ variable • Residuals vs. $X_2$ • May need to transform $X_2$ variable • Residuals vs. Time • May have autocorrelation
Residual Plots: Example [Figure: two residual plots, one annotated "Maybe some non-linear relationship" and the other "No discernible pattern"]
Testing for Overall Significance • Shows if there is a Linear Relationship between all of the X Variables Together and Y • Shows if Y Depends Linearly on all of the X Variables Together as a Group • Use F Test Statistic • Hypotheses: • $H_0$: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$ (No linear relationship) • $H_1$: At least one $\beta_i \neq 0$ (At least one independent variable affects Y) • The Null Hypothesis is a Very Strong Statement • Almost Always Reject the Null Hypothesis
Testing for Overall Significance (continued) • Test Statistic: $F = \frac{MSR}{MSE} = \frac{SSR/k}{SSE/(n-k-1)}$ • where F has k numerator and (n-k-1) denominator degrees of freedom
Test for Overall Significance Excel Output: Heating Oil Example [Excel ANOVA output; callouts mark the p-value, the regression degrees of freedom k = 2 (the number of explanatory variables), and the total degrees of freedom n - 1]
Test for Overall Significance: Example Solution • $H_0$: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$ • $H_1$: At least one $\beta_i \neq 0$ • $\alpha = .05$ • df = 2 and 12 • Critical Value: F = 3.89 • Test Statistic: F = 168.47 (Excel Output) • Decision: Reject $H_0$ at $\alpha = 0.05$ • Conclusion: There is evidence that at least one independent variable affects Y
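As a rough cross-check, the overall F statistic can also be written in terms of $r^2$, since SSR/SST = $r^2$. The sketch below uses the rounded $r^2$ = 0.9656 reported earlier and n = 15 (inferred from 12 error degrees of freedom), so it lands near, not exactly on, the Excel value of 168.47.

```python
def overall_f(r2, n, k):
    """F = (r2 / k) / ((1 - r2) / (n - k - 1)), equivalent to MSR / MSE."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


F_CRITICAL = 3.89  # from the slide: alpha = 0.05 with 2 and 12 df

f_stat = overall_f(0.9656, n=15, k=2)
reject_h0 = f_stat > F_CRITICAL
print(f"F = {f_stat:.2f}, reject H0: {reject_h0}")
```

The small gap from 168.47 comes from using $r^2$ rounded to four decimals. The decision is unaffected because the statistic is far above the critical value.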
Test for Significance: Individual Variables • Shows if There is a Linear Relationship Between the Variable $X_i$ and Y while Holding the Effects of other X's Fixed • Shows if Y Depends Linearly on a Single $X_i$ Individually while Holding the Effects of other X's Fixed • Use t Test Statistic • Hypotheses: • $H_0$: $\beta_i = 0$ (No linear relationship) • $H_1$: $\beta_i \neq 0$ (Linear relationship between $X_i$ and Y)
t Test Statistic Excel Output: Example [Excel output; callouts mark the t test statistic for $X_1$ (Temperature) and the t test statistic for $X_2$ (Insulation)]
t Test: Example Solution Does temperature have a significant effect on monthly consumption of heating oil? Test at $\alpha = 0.05$. • $H_0$: $\beta_1 = 0$ • $H_1$: $\beta_1 \neq 0$ • df = 12 • Critical Values: t = ±2.1788 (.025 in each tail) • Test Statistic: t = -16.1699 • Decision: Reject $H_0$ at $\alpha = 0.05$ • Conclusion: There is evidence of a significant effect of temperature on oil consumption.
Confidence Interval Estimate for the Slope Provide the 95% confidence interval for the population slope $\beta_1$ (the effect of temperature on oil consumption). $-6.169 \leq \beta_1 \leq -4.704$ The estimated average consumption of oil is reduced by between 4.7 and 6.17 gallons for each increase of 1°F.
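The interval has the form $b_1 \pm t_{n-k-1} S_{b_1}$. The standard error $S_{b_1}$ is not given in this text, so the sketch below backs it out from the reported slope and t statistic (t = $b_1 / S_{b_1}$). That step is an assumption about how the numbers fit together, not a value read from the Excel output.

```python
b1 = -5.437        # slope for temperature from the Excel output
t_stat = -16.1699  # reported t statistic for temperature
t_crit = 2.1788    # two-tailed critical value, alpha = 0.05, df = 12

se_b1 = b1 / t_stat       # implied standard error, since t = b1 / se
margin = t_crit * se_b1   # half-width of the 95% interval
lower, upper = b1 - margin, b1 + margin

significant = abs(t_stat) > t_crit
print(f"{lower:.3f} <= beta1 <= {upper:.3f}, significant: {significant}")
```

The limits agree with the reported interval to within rounding. Because the interval excludes zero, it leads to the same conclusion as the t test at the 5% level.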
Additional Pitfalls and Ethical Issues • Failing to Understand that the Estimated Regression Coefficients Are Interpreted Holding All Other Independent Variables Constant • Failing to Evaluate Residual Plots for Each Independent Variable
Summary • Developed the Multiple Regression Model • Addressed Testing the Significance of the Multiple Regression Model • Discussed Inferences on Population Regression Coefficients • Addressed Pitfalls in Multiple Regression and Ethical Issues