732G21/732G28/732A35 Lecture 5
Extra sums of squares
The extra sum of squares is the difference between the SSE for a model with a certain set of predictors and the SSE for a model with the same predictors plus one or more additional predictors.
Consider the model $Y_i = \beta_0 + \beta_1 X_{i1} + \varepsilon_i$. Then we can define the extra sum of squares from adding X2 to the model as $SSR(X_2 \mid X_1) = SSE(X_1) - SSE(X_1, X_2)$.
Salary example

Regression Analysis: Salary (Y) versus Age (X1)

The regression equation is
Salary (Y) = 8.45 + 0.547 Age (X1)

Predictor   Coef    SE Coef  T     P
Constant    8.454   4.848    1.74  0.132
Age (X1)    0.5471  0.1099   4.98  0.003

S = 4.05592   R-Sq = 80.5%   R-Sq(adj) = 77.2%

Analysis of Variance
Source          DF  SS      MS      F      P
Regression       1  407.30  407.30  24.76  0.003
Residual Error   6   98.70   16.45
Total            7  506.00
Regression Analysis: Salary (Y) versus Age (X1), Highschool points (X2)

The regression equation is
Salary (Y) = 10.1 + 0.319 Age (X1) + 0.0805 Highschool points (X2)

Predictor               Coef     SE Coef  T     P
Constant                10.126   2.347    4.32  0.008
Age (X1)                0.31869  0.07225  4.41  0.007
Highschool points (X2)  0.08049  0.01746  4.61  0.006

S = 1.93941   R-Sq = 96.3%   R-Sq(adj) = 94.8%

Analysis of Variance
Source          DF  SS      MS      F      P
Regression       2  487.19  243.60  64.76  0.000
Residual Error   5   18.81    3.76
Total            7  506.00

Source                  DF  Seq SS
Age (X1)                 1  407.30
Highschool points (X2)   1   79.90
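As a worked check, the extra sum of squares from adding Highschool points (X2) to the model that already contains Age (X1) can be read off the two ANOVA tables above. The notation $SSR(X_2 \mid X_1)$ is the usual textbook notation and is assumed here; the numbers are taken directly from the Minitab output.

```latex
\begin{align*}
SSR(X_2 \mid X_1) &= SSE(X_1) - SSE(X_1, X_2) = 98.70 - 18.81 = 79.89 \\
                  &= SSR(X_1, X_2) - SSR(X_1) = 487.19 - 407.30 = 79.89
\end{align*}
```

Both routes agree, up to rounding, with the sequential sum of squares 79.90 that Minitab reports for Highschool points (X2).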
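Partial F-test
H0: βq = βq+1 = … = βp-1 = 0
Ha: not all β in H0 equal 0
Test statistic: $F^* = \frac{[SSE(R) - SSE(F)]/(p-q)}{SSE(F)/(n-p)}$, where SSE(R) is the SSE of the reduced model (without the predictors tested in H0) and SSE(F) is the SSE of the full model with p parameters.
Reject H0 if F* > F(1-α; p-q; n-p)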
Regression Analysis: Salary (Y) versus Age (X1), Highschool point, ...

The regression equation is
Salary (Y) = 7.13 + 0.393 Age (X1) + 0.0652 Highschool points (X2) + 2.73 Female/Male (X3)

Predictor               Coef     SE Coef  T     P
Constant                7.132    2.155    3.31  0.030
Age (X1)                0.39317  0.06201  6.34  0.003
Highschool points (X2)  0.06521  0.01441  4.52  0.011
Female/Male (X3)        2.732    1.185    2.31  0.082

S = 1.42101   R-Sq = 98.4%   R-Sq(adj) = 97.2%

Analysis of Variance
Source          DF  SS      MS      F      P
Regression       3  497.92  165.97  82.20  0.000
Residual Error   4    8.08    2.02
Total            7  506.00

Source                  DF  Seq SS
Age (X1)                 1  407.30
Highschool points (X2)   1   79.90
Female/Male (X3)         1   10.73
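The partial F-test can be sketched numerically with the SSE values from the three salary outputs. This is a minimal illustration, not part of the original slides: the helper name `partial_f` is hypothetical, the statistic is the standard one (drop in SSE per tested coefficient, divided by the MSE of the full model), and the two critical values are assumed to be read from a standard F table at α = 0.05.

```python
def partial_f(sse_reduced, sse_full, df_reduced, df_full):
    """Partial F statistic comparing a reduced model with a full model.

    df_reduced and df_full are the residual (error) degrees of freedom,
    so df_reduced - df_full is the number of coefficients tested in H0.
    """
    numerator = (sse_reduced - sse_full) / (df_reduced - df_full)
    denominator = sse_full / df_full  # MSE of the full model
    return numerator / denominator


# SSE and residual DF read from the ANOVA tables of the salary example
SSE_X1, DF_X1 = 98.70, 6        # Age only
SSE_X1_X2, DF_X1_X2 = 18.81, 5  # Age + Highschool points
SSE_FULL, DF_FULL = 8.08, 4     # Age + Highschool points + Female/Male

# Critical values F(0.95; 1, 4) and F(0.95; 2, 4), taken from a standard F table
F_CRIT_1_4 = 7.71
F_CRIT_2_4 = 6.94

# H0: beta3 = 0 (drop X3, keep X1 and X2)
f_x3 = partial_f(SSE_X1_X2, SSE_FULL, DF_X1_X2, DF_FULL)
# H0: beta2 = beta3 = 0 (drop X2 and X3, keep X1)
f_x2_x3 = partial_f(SSE_X1, SSE_FULL, DF_X1, DF_FULL)

print(f"F* for X3 given X1, X2: {f_x3:.2f}, reject H0: {f_x3 > F_CRIT_1_4}")
print(f"F* for X2, X3 given X1: {f_x2_x3:.2f}, reject H0: {f_x2_x3 > F_CRIT_2_4}")
```

For the single coefficient β3, F* is about 5.31, close to the square of the reported t-value (2.31² ≈ 5.34, the gap being rounding), and the test does not reject at the 5% level, in line with the reported p-value of 0.082. Dropping X2 and X3 together gives F* of about 22.43, which exceeds the critical value.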
Summary of tests of regression coefficients
Test whether a single βk = 0: t-test
Test whether all β = 0: F-test
Test whether a subset of the β = 0: Partial F-test
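Coefficient of partial determination
Tells us what proportion of the variation in Y left unexplained by the current model is explained when another predictor is added to the model.
Consider the model with X1 as the only predictor and add X2: $R^2_{Y2|1} = \frac{SSR(X_2 \mid X_1)}{SSE(X_1)} = \frac{SSE(X_1) - SSE(X_1, X_2)}{SSE(X_1)}$.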
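For the salary example, the coefficient of partial determination can be computed from the SSE values in the Minitab outputs. The sketch below assumes the usual definition (reduction in SSE divided by the SSE before the predictor was added); the function name `partial_r2` is hypothetical.

```python
def partial_r2(sse_without, sse_with):
    """Share of the previously unexplained variation (sse_without)
    that is explained once the extra predictor is in the model."""
    return (sse_without - sse_with) / sse_without


# SSE values read from the salary example outputs
SSE_X1 = 98.70        # Age only
SSE_X1_X2 = 18.81     # Age + Highschool points
SSE_X1_X2_X3 = 8.08   # Age + Highschool points + Female/Male

r2_y2_1 = partial_r2(SSE_X1, SSE_X1_X2)          # adding X2 to a model with X1
r2_y3_12 = partial_r2(SSE_X1_X2, SSE_X1_X2_X3)   # adding X3 to a model with X1, X2

print(f"R2(Y2|1)  = {r2_y2_1:.3f}")
print(f"R2(Y3|12) = {r2_y3_12:.3f}")
```

Adding Highschool points removes roughly 81% of the variation that Age alone left unexplained, even though R-Sq itself only rises from 80.5% to 96.3%.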
Multicollinearity
Multicollinearity is when we have high correlation among the predictors.
Multicollinearity causes the following problems:
Adding or deleting a predictor changes the estimates of the regression coefficients very much.
The standard errors of the regression coefficients become very large. Thus, conclusions from the model become more imprecise.
The estimated regression coefficients may be non-significant, although the predictors are highly correlated with Y.
When we interpret the regression coefficients, we interpret one at a time, keeping the others constant. If there is high correlation among the predictors, this is of course not possible, because if we change one of them, the others will change too (it is possible mathematically, but not logically).
Indications of the presence of multicollinearity
Large changes in the regression coefficients when a predictor is added or deleted.
Non-significant results in t-tests on the regression coefficients for variables that scatter plot matrices and correlation matrices (and logic) suggest are very important.
Estimated regression coefficients with a sign opposite to what we expect.
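Formal test of the presence of multicollinearity
Variance Inflation Factor (VIF): $VIF_k = \frac{1}{1 - R_k^2}$, where $R_k^2$ is the coefficient of determination when performing a regression of Xk versus the other X-variables in the model.
Consider a model with predictors X1, X2 and X3:
Regressing X1 on X2 and X3 gives $R_1^2$ and VIF1.
Regressing X2 on X1 and X3 gives $R_2^2$ and VIF2.
Regressing X3 on X1 and X2 gives $R_3^2$ and VIF3.
Decision rule: if the largest VIF is greater than 10 and the average of the VIFs is larger than one, we may have multicollinearity in the model.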
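The decision rule can be sketched as follows, assuming the standard definition VIF_k = 1/(1 - R_k²). The R_k² values below are hypothetical and chosen only for illustration, since the slides give no auxiliary regressions for the salary data; the function names are likewise made up for this sketch.

```python
def vif(r2_k):
    """Variance inflation factor from the R-squared of regressing
    X_k on the other predictors (assumes r2_k < 1)."""
    return 1.0 / (1.0 - r2_k)


def check_multicollinearity(r2_values, threshold=10.0):
    """Apply the decision rule from the slide: flag multicollinearity when
    the largest VIF exceeds the threshold and the mean VIF is above one."""
    vifs = [vif(r2) for r2 in r2_values]
    largest = max(vifs)
    mean_vif = sum(vifs) / len(vifs)
    flagged = largest > threshold and mean_vif > 1.0
    return vifs, flagged


# Hypothetical R_k^2 values for X1, X2, X3 (not from the salary example)
r2_hypothetical = [0.95, 0.92, 0.40]
vifs, flagged = check_multicollinearity(r2_hypothetical)

for k, value in enumerate(vifs, start=1):
    print(f"VIF{k} = {value:.2f}")
print("Possible multicollinearity:", flagged)
```

With these made-up inputs the largest VIF is about 20, so the rule flags the model. With weakly related predictors, for instance R_k² values of 0.1, 0.2 and 0.3, every VIF stays below 1.5 and nothing is flagged.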