Chapter 15 Multiple Regression
Regression
Multiple Regression Model: y = β0 + β1x1 + β2x2 + … + βpxp + ε
Multiple Regression Equation: E(y) = β0 + β1x1 + β2x2 + … + βpxp
Estimated Multiple Regression Equation: ŷ = b0 + b1x1 + b2x2 + … + bpxp
Car Data Continuing on for 397 observations
Multiple Regression, Example
Predicted MPG for a car weighing 4,000 lbs, built in 1980 (year entered as 80), with 6 cylinders:
MPG = -14.4 - 0.00652(4000) + 0.76(80) - 0.0741(6) = -14.4 - 26.08 + 60.8 - 0.4446 ≈ 19.88
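The arithmetic on this slide can be checked with a short Python sketch. The coefficients are the ones quoted in the example. The function name `predict_mpg` and the convention that the model year is entered as its last two digits (80 for 1980) are assumptions made for illustration.

```python
def predict_mpg(weight_lbs, year_two_digit, cylinders):
    """Evaluate the estimated regression equation quoted on the slide."""
    intercept = -14.4
    b_weight = -0.00652   # change in MPG per additional pound
    b_year = 0.76         # change in MPG per additional model year
    b_cylinders = -0.0741 # change in MPG per additional cylinder
    return (intercept
            + b_weight * weight_lbs
            + b_year * year_two_digit
            + b_cylinders * cylinders)

mpg = predict_mpg(4000, 80, 6)
print(round(mpg, 2))
```

Each coefficient is read holding the other variables constant, which is why the three terms can simply be added to the intercept.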
Sums of Squares SST = SSR + SSE
Multiple Coefficient of Determination
The share of the variation in the dependent variable explained by the estimated model: R² = SSR/SST
Multiple Correlation Coefficient
The correlation coefficient between the actual and predicted values of the dependent variable.
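A minimal sketch of these quantities, using only the Python standard library. The data are made up for illustration, and `solve`, `fit_ols`, and `correlation` are hypothetical helper names rather than functions from any package. The fit uses the normal equations, which is adequate for a tiny example but not how statistical software usually computes least squares.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry into the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_ols(xs, ys):
    """Return least-squares coefficients [b0, b1, ..., bp]."""
    rows = [[1.0] + list(x) for x in xs]  # leading 1.0 is the intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)

def correlation(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a)
                           * sum((y - mb) ** 2 for y in b))

# Made-up data: two independent variables, six observations.
xs = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
ys = [3.1, 3.9, 7.2, 7.8, 11.1, 11.9]

coefs = fit_ols(xs, ys)
predicted = [coefs[0] + sum(b * x for b, x in zip(coefs[1:], row)) for row in xs]

mean_y = sum(ys) / len(ys)
sst = sum((y - mean_y) ** 2 for y in ys)                # total sum of squares
ssr = sum((yh - mean_y) ** 2 for yh in predicted)       # explained by the model
sse = sum((y - yh) ** 2 for y, yh in zip(ys, predicted))  # left unexplained

r_squared = ssr / sst
multiple_r = correlation(ys, predicted)
print(f"SST={sst:.4f}  SSR+SSE={ssr + sse:.4f}  R2={r_squared:.4f}  R={multiple_r:.4f}")
```

For a least-squares fit that includes an intercept, SST = SSR + SSE holds up to rounding, and the squared correlation between actual and predicted values equals R².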
F Test for Overall Significance
H0: β1 = β2 = … = βp = 0
Ha: One or more of the parameters is not equal to zero
Test statistic: F = MSR/MSE
Reject H0 if F > Fα, or equivalently if p-value < α, where Fα is based on an F distribution with p and n - p - 1 degrees of freedom
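As a rough sketch of the test statistic, the function below uses the standard definitions MSR = SSR/p and MSE = SSE/(n - p - 1). The sums of squares, sample size, and function name are hypothetical. The critical value is passed in by hand, since looking it up requires an F table or a statistics library; 3.44 is the usual tabled value for α = 0.05 with 2 and 22 degrees of freedom.

```python
def f_statistic(ssr, sse, n, p):
    """F = MSR / MSE for a model with p independent variables and n observations."""
    msr = ssr / p              # mean square due to regression
    mse = sse / (n - p - 1)    # mean square error
    return msr / mse

# Hypothetical values, not taken from the car data.
f_value = f_statistic(ssr=600.0, sse=200.0, n=25, p=2)
f_critical = 3.44  # F table value for alpha = 0.05 with (2, 22) degrees of freedom

print(f_value)
print("reject H0" if f_value > f_critical else "do not reject H0")
```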
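t Test for Coefficients
H0: β1 = 0
Ha: β1 ≠ 0
Test statistic: t = b1 / s(b1), where s(b1) is the estimated standard error of b1
Reject H0 if t < -tα/2 or t > tα/2, or equivalently if p-value < α, where tα/2 is based on a t distribution with n - p - 1 degrees of freedom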
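The rejection rule can be sketched in a few lines. The coefficient, standard error, and sample size below are hypothetical, as are the function names. The critical value 2.086 is the tabled two-tailed value for α = 0.05 with 20 degrees of freedom and is supplied by hand.

```python
def t_statistic(b, s_b):
    """t = estimated coefficient divided by its estimated standard error."""
    return b / s_b

def degrees_of_freedom(n, p):
    """Degrees of freedom for the coefficient t test: n - p - 1."""
    return n - p - 1

def reject_two_tailed(t, t_critical):
    """Two-tailed rule: reject H0 if t < -t_critical or t > t_critical."""
    return t < -t_critical or t > t_critical

# Hypothetical values for illustration.
t_value = t_statistic(b=1.5, s_b=0.5)
df = degrees_of_freedom(n=23, p=2)
t_critical = 2.086  # t table value for alpha/2 = 0.025 with 20 degrees of freedom

print(t_value, df, reject_two_tailed(t_value, t_critical))
```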
Multicollinearity Multicollinearity occurs when two or more independent variables are highly correlated. When it is severe, the estimated coefficients are unreliable.
Multicollinearity
Two guidelines for identifying multicollinearity:
• The absolute value of the correlation coefficient between two independent variables exceeds 0.7.
• The correlation between an independent variable and some other independent variable is greater than that variable's correlation with the dependent variable.
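The two guidelines can be sketched as a simple screening function. The data are invented, and `pearson` and `multicollinearity_flags` are hypothetical helper names. Comparing absolute values in the second guideline is an assumption; the slide does not say whether signs should be ignored.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a)
                           * sum((y - mb) ** 2 for y in b))

def multicollinearity_flags(predictors, y, threshold=0.7):
    """Return (name_a, name_b, guideline) tuples for each pair that trips a guideline."""
    flags = []
    for (name_a, a), (name_b, b) in combinations(predictors.items(), 2):
        r_ab = abs(pearson(a, b))
        # Guideline 1: the pairwise correlation exceeds the threshold.
        if r_ab > threshold:
            flags.append((name_a, name_b, "guideline 1"))
        # Guideline 2: the pair is more correlated with each other than at
        # least one of them is with the dependent variable.
        if r_ab > min(abs(pearson(a, y)), abs(pearson(b, y))):
            flags.append((name_a, name_b, "guideline 2"))
    return flags

# Invented data for illustration only.
predictors = {
    "weight": [2000, 2500, 3000, 3500, 4000, 4500],
    "cylinders": [4, 4, 6, 6, 8, 8],
    "year": [82, 70, 78, 74, 80, 72],
}
mpg = [34, 26, 24, 19, 17, 12]

for flag in multicollinearity_flags(predictors, mpg):
    print(flag)
```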
Multicollinearity Table of correlation coefficients:
Multicollinearity R Square 0.708
Qualitative Variables and Regression
Quantitative variable: a variable that can be measured numerically (interval or ratio scale of measurement).
Qualitative variable: a variable where labels or names are used to identify some attribute (nominal or ordinal scale of measurement).
Qualitative Variables and Regression The effect of a qualitative variable can be estimated using a dummy variable. A dummy variable equals 0 or 1; it creates different y-intercepts for groups with different attributes.
Qualitative Variables and Regression Assume we estimate a regression model for the number of sick days an employee takes per year. A dummy variable is included that equals 1 if the individual smokes and 0 if they do not. Age is also included in the model.
Qualitative Variables and Regression Example of how data would be coded: Estimated model: Sick days taken = -1 +(3)Smoker + (.1)Age
Dummy Variables
Sick days taken = -1 + (3)Smoker + (0.1)Age
What is the y-intercept for nonsmokers? -1
What is the y-intercept for smokers? 2
What is the predicted number of sick days for a 40-year-old smoker? 6
What is the average difference in the number of sick days taken by smokers and nonsmokers? 3
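The answers above follow directly from plugging 0 or 1 into the estimated model. A small sketch, with `sick_days` as a hypothetical function name and the coefficients taken from the slide:

```python
def sick_days(smoker, age):
    """Estimated model from the slide: -1 + 3*Smoker + 0.1*Age."""
    smoker_dummy = 1 if smoker else 0
    return -1 + 3 * smoker_dummy + 0.1 * age

# Setting Age to 0 isolates the y-intercept for each group.
intercept_nonsmoker = sick_days(smoker=False, age=0)
intercept_smoker = sick_days(smoker=True, age=0)

print(intercept_nonsmoker, intercept_smoker)
print(sick_days(smoker=True, age=40))
# The smoker/nonsmoker gap is the dummy coefficient at any age.
print(sick_days(True, 40) - sick_days(False, 40))
```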
Dummy Variables If an attribute has three or more possible values you must include k-1 dummy variables in the model, where k is the number of possible values.
Dummy Variables
Suppose we have three job classifications: manager, operator, and secretary.
Operator dummy equals 1 if the person is an operator, 0 otherwise.
Secretary dummy equals 1 if the person is a secretary, 0 otherwise.
Manager is the omitted group (the choice of omitted group will not alter the predicted values).
Dummy Variables
Sick days taken = -1 + (1)Operator + (1.5)Secretary + (0.1)Age
What are the y-intercepts for each job classification? Managers = -1, Operators = 0, Secretaries = 0.5
What is the predicted number of sick days for a 40-year-old secretary? 4.5
What is the average difference in the number of sick days taken by operators and secretaries? 0.5
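One way to picture the k-1 coding is a small encoder that turns a job label into its dummy variables, with manager as the omitted group. The names `encode_job` and `predict_sick_days` and the dictionary keys are hypothetical; the coefficients are those in the estimated model on the slide.

```python
CATEGORIES = ["manager", "operator", "secretary"]
OMITTED = "manager"  # omitted group: all of its dummies are 0

def encode_job(job):
    """Return the k-1 dummy variables for a job classification."""
    if job not in CATEGORIES:
        raise ValueError(f"unknown job classification: {job}")
    return {f"{c}_dummy": int(job == c) for c in CATEGORIES if c != OMITTED}

def predict_sick_days(job, age):
    """Estimated model: -1 + 1*Operator + 1.5*Secretary + 0.1*Age."""
    d = encode_job(job)
    return -1 + 1 * d["operator_dummy"] + 1.5 * d["secretary_dummy"] + 0.1 * age

print(encode_job("manager"))
print(encode_job("secretary"))
print(predict_sick_days("secretary", 40))
# Difference between two non-omitted groups is the difference of their coefficients.
print(predict_sick_days("secretary", 40) - predict_sick_days("operator", 40))
```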
Dummy Variables
In some cases there will be multiple sets of dummy variables, such as:
Sick days taken = -1 + (3)Smoker + (1)Operator + (1.5)Secretary + (0.1)Age
Note that there are now 6 different intercepts:
Nonsmoker, Manager: -1 (omitted group)
Smoker, Manager: 2
Nonsmoker, Operator: 0
Smoker, Operator: 3
Nonsmoker, Secretary: 0.5
Smoker, Secretary: 3.5
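The six intercepts are every combination of the two smoking states and three job classifications. A short sketch that enumerates them, using the coefficients from the slide and hypothetical variable names:

```python
from itertools import product

BASE_INTERCEPT = -1.0  # nonsmoking manager (both omitted groups)
SMOKER_EFFECT = {False: 0.0, True: 3.0}
JOB_EFFECT = {"manager": 0.0, "operator": 1.0, "secretary": 1.5}

# Each group's intercept is the base plus the coefficients of its dummies.
intercepts = {
    (smoker, job): BASE_INTERCEPT + SMOKER_EFFECT[smoker] + JOB_EFFECT[job]
    for job, smoker in product(JOB_EFFECT, SMOKER_EFFECT)
}

for (smoker, job), value in intercepts.items():
    label = "Smoker" if smoker else "Nonsmoker"
    print(f"{label}, {job.capitalize()}: {value}")
```

The number of intercepts is the product of the group counts (2 × 3), even though only 1 + 2 = 3 dummy variables are estimated.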
Dummy Variables Note that when dummy variables are used we are assuming that the coefficients of the other variables are the same for all groups. In this example the increase in sick days used from aging a year is equal to 0.1 for all of the groups. If there is reason to believe the effect of an independent variable differs by group, you may want to estimate separate equations for each group.
Nonlinear Relationships Nonlinear relationships can be modeled by including a variable that is a nonlinear function of an independent variable. For example it is usually assumed that health care expenditures increase at an increasing rate as people age.
Nonlinear Relationships
In that case you might try including age squared in the model:
Health expend = 500 + (5)Age + (0.5)AgeSQ
Age   Health Expend
10    600
20    800
30    1100
40    1500
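The "increasing at an increasing rate" pattern shows up when the quadratic model is evaluated at evenly spaced ages. In the sketch below, the function name is hypothetical, and age 10 is inferred from the equation as the age that yields an expenditure of 600.

```python
def health_expend(age):
    """Estimated model from the slide: 500 + 5*Age + 0.5*Age^2."""
    return 500 + 5 * age + 0.5 * age ** 2

ages = (10, 20, 30, 40)
for age in ages:
    print(age, health_expend(age))

# Change in predicted expenditure over each successive 10-year step.
increments = [health_expend(b) - health_expend(a) for a, b in zip(ages, ages[1:])]
print(increments)
```

Each 10-year step adds more than the one before it, which is the behavior the squared term is meant to capture.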
Nonlinear Relationships If the dependent variable increases at a decreasing rate as the independent variable rises you might want to include the square root of the independent variable. If you are unsure of the nature of the relationship you can use dummy variables for different ranges of values of the independent variable.
Non-continuous Relationships If the relationship between the dependent variable and an independent variable is non-continuous, a slope dummy variable can be used to estimate two sets of coefficients for the independent variable. For example, if natural gas usage is not affected by temperature when the temperature rises above 60 degrees, we could have: Gas usage = b0 + b1(GT60) + b2(Temp) + b3(GT60)(Temp), where GT60 equals 1 if the temperature is above 60 degrees and 0 otherwise.
Non-continuous Relationships Note that at temperatures above 60 degrees the net effect of a 1-degree increase in temperature on gas usage is the sum of the coefficient on Temp and the coefficient on (GT60)(Temp): -0.866 + 0.810 = -0.056.
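The net effect can be sketched as a slope that switches when the dummy turns on. The two coefficient values come from the slide; assigning -0.866 to Temp and 0.810 to the (GT60)(Temp) term is an assumption based on the sign pattern, and the function names are hypothetical. The intercept and the GT60 coefficient are not given in the text, so only the slope is computed.

```python
B_TEMP = -0.866       # assumed coefficient on Temp
B_GT60_TEMP = 0.810   # assumed coefficient on the (GT60)(Temp) term

def gt60(temp):
    """Slope dummy: 1 if the temperature is above 60 degrees, else 0."""
    return 1 if temp > 60 else 0

def temp_slope(temp):
    """Change in predicted gas usage per 1-degree increase at this temperature."""
    return B_TEMP + B_GT60_TEMP * gt60(temp)

print(temp_slope(50))            # below 60: only the Temp coefficient applies
print(round(temp_slope(70), 3))  # above 60: the two coefficients are summed
```

A net slope close to zero above 60 degrees is consistent with the premise that gas usage barely responds to temperature in that range.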