Review Session: Linear Regression
Correlation • Pearson's r • Measures the strength and direction of the linear relationship between the x and y variables • Ranges from -1 to +1
Correlation printout in Minitab • Top number is the correlation • Bottom number is the p-value
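The slides read these two numbers off a Minitab printout; outside Minitab, the same pair can be reproduced in Python. A minimal sketch using scipy, with invented x and y data (not the course's dataset):

```python
import numpy as np
from scipy.stats import pearsonr

# Invented stand-ins for two Minitab columns
x = np.array([340, 400, 460, 520, 580, 640, 700])
y = np.array([2.2, 2.7, 2.9, 3.1, 3.3, 3.5, 3.8])

r, p_value = pearsonr(x, y)           # the printout's top and bottom numbers
print(f"correlation: {r:.3f}")        # strength and direction, -1 to +1
print(f"p-value:     {p_value:.3f}")
```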
Simple Linear Regression y = b0 + b1x1 + e
Simple Linear Regression: Making a Point Prediction y = b0 + b1x1 + e • GPA = 1.47 + 0.00323(GMAT) • For a person with a GMAT score of 400, what is the expected first-year GPA? • GPA = 1.47 + 0.00323(400) = 1.47 + 1.292 = 2.76
Simple Linear Regression y = b0 + b1x1 + e • GPA = 1.47 + 0.00323(GMAT) • What is the 95% CI for the GPA of a person with a GMAT score of 400? • GPA = 2.76, SE = 0.26 • 2.76 +/- 2(0.26) gives a 95% CI of (2.24, 3.28)
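A sketch of the same kind of prediction in Python with statsmodels. The df data below are invented stand-ins for the slide's GPA/GMAT dataset, so the fitted numbers will differ from 1.47 and 0.00323:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Invented GPA/GMAT data (not the slide's dataset)
df = pd.DataFrame({"GMAT": [340, 400, 460, 520, 580, 640, 700],
                   "GPA":  [2.2, 2.7, 2.9, 3.1, 3.3, 3.5, 3.8]})
model = smf.ols("GPA ~ GMAT", data=df).fit()

# Point prediction and 95% intervals at GMAT = 400
pred = model.get_prediction(pd.DataFrame({"GMAT": [400]}))
print(pred.summary_frame(alpha=0.05))
```

The slide's +/- 2(SE) is a rule of thumb; summary_frame uses the exact t multiplier, and it separates mean_ci_* (the CI for the average GPA at that score) from the wider obs_ci_* prediction interval for a single person.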
Coefficient CIs and Testing y = b0 + b1x1 + e • GPA = 1.47 + 0.00323(GMAT) • Find the 95% CI for each coefficient: • b0: 1.47 +/- 2(0.22) = 1.47 +/- 0.44 = (1.03, 1.91) • b1: 0.0032 +/- 2(0.0004) = 0.0032 +/- 0.0008 = (0.0026, 0.0040)
Coefficient Testing y = b0 + b1x1 + e • GPA = 1.47 + 0.00323(GMAT) • The p-value for each coefficient is the result of a hypothesis test: H0: b = 0 versus H1: b ≠ 0 • If the p-value is <= 0.05, reject H0 and keep the coefficient in the model
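The coefficient CIs and p-values come straight off a fitted statsmodels results object; a sketch, again with invented data:

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({"GMAT": [340, 400, 460, 520, 580, 640, 700],
                   "GPA":  [2.2, 2.7, 2.9, 3.1, 3.3, 3.5, 3.8]})
model = smf.ols("GPA ~ GMAT", data=df).fit()

print(model.conf_int(alpha=0.05))    # exact 95% CIs for b0 and b1
print(model.pvalues)                 # tests of H0: b = 0 vs H1: b != 0
# The slide's quick rule of thumb: coefficient +/- 2 standard errors
print(model.params - 2 * model.bse)
print(model.params + 2 * model.bse)
```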
R2 • r2 and R2 • In simple regression, r2 is the square of Pearson's r • Little r2 is used for simple regression • Big R2 is used for multiple regression
Sample R2 values [figure: four example scatterplots with fitted lines, at R2 = 0.80, 0.60, 0.30, and 0.20]
Regression ANOVA • H0: b1 = b2 = … = bk = 0 • Ha: at least one b ≠ 0 • The F-statistic, with degrees of freedom df1 and df2, determines the p-value • If p <= 0.05, at least one of the b's is not zero • If p > 0.05, it is possible that all of the b's are zero
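R2 and the overall F test are also attributes of a fitted statsmodels results object; a sketch with the same invented data:

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({"GMAT": [340, 400, 460, 520, 580, 640, 700],
                   "GPA":  [2.2, 2.7, 2.9, 3.1, 3.3, 3.5, 3.8]})
model = smf.ols("GPA ~ GMAT", data=df).fit()

print(model.rsquared)                  # R-squared
print(model.fvalue, model.f_pvalue)    # overall F-statistic and its p-value
print(model.df_model, model.df_resid)  # df1 and df2 for the F test
```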
Diagnostics - Residuals • Residuals = errors • Residuals should be normally distributed • Residuals should have a constant variance • Heteroscedasticity: the residual variance changes with the fitted values or with an independent variable, showing up as a pattern in the residual plot • Autocorrelation: each residual is correlated with the residuals before it, which is common in time-ordered data • Heteroscedasticity and autocorrelation indicate problems with the model • Homoscedasticity: constant residual variance, with no pattern in the residual plot • Use the four-in-one plot for these diagnostics
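Minitab's four-in-one residual plot can be approximated with matplotlib and scipy; a sketch, again on invented data:

```python
import matplotlib.pyplot as plt
import pandas as pd
import scipy.stats as stats
import statsmodels.formula.api as smf

df = pd.DataFrame({"GMAT": [340, 400, 460, 520, 580, 640, 700],
                   "GPA":  [2.2, 2.7, 2.9, 3.1, 3.3, 3.5, 3.8]})
model = smf.ols("GPA ~ GMAT", data=df).fit()

fig, ax = plt.subplots(2, 2, figsize=(8, 6))
stats.probplot(model.resid, plot=ax[0, 0])         # normal probability plot
ax[0, 1].scatter(model.fittedvalues, model.resid)  # vs fits: fans/curves = trouble
ax[0, 1].axhline(0, color="gray")
ax[1, 0].hist(model.resid)                         # histogram: roughly normal?
ax[1, 1].plot(model.resid.values, marker="o")      # vs order: look for drift
plt.tight_layout()
plt.show()
```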
Adding a Power Transformation • Each "bump" or "U" shape in a scatter plot indicates that an additional power may be involved. • 0 bumps: x • 1 bump: x2 • 2 bumps: x3 • The standard equation is y = b0 + b1x + b2x2 • Don't forget: check that b1 and b2 are each statistically significant, and that the model as a whole is also statistically significant.
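A sketch of fitting the quadratic model in statsmodels; the curve data are invented to show a single bump, and I(x**2) is the formula syntax for adding the squared term:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Invented data with a single "bump": y rises and then falls off
curve = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6, 7],
                      "y": [2.1, 3.8, 5.0, 5.7, 6.0, 5.9, 5.5]})

# I(x**2) adds the squared term to the model
quad = smf.ols("y ~ x + I(x**2)", data=curve).fit()
print(quad.pvalues)    # are b1 and b2 each statistically significant?
print(quad.f_pvalue)   # is the model as a whole significant?
```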
Categorical Variables • Occasionally it is necessary to add a categorical variable to a regression model. • Suppose that we have a car dealership, and we want to model the sale price based on the time on the lot and the salesperson (Tom, Dick, or Harry). • Time on the lot is a numeric variable. • Salesperson is a categorical variable.
Categorical Variables • Categorical variables are modeled in regression using 0/1 dummy (indicator) variables Example: y = b0 + b_time x_time + b_Tom x_Tom + b_Dick x_Dick
Categorical Variables Harry is the baseline category for the model. Tom's and Dick's performance will be gauged relative to Harry, but not to each other. Example: y = b0 + b_time x_time + b_Tom x_Tom + b_Dick x_Dick
Categorical Variables y = b0 + b_time x_time + b_Tom x_Tom + b_Dick x_Dick • Interpretation • Tom's average sale price is b_Tom more than Harry's, holding time on the lot constant • Dick's average sale price is b_Dick more than Harry's, holding time on the lot constant
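A sketch of the dealership model in statsmodels with invented data; Treatment(reference='Harry') forces Harry to be the baseline category, matching the slides:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Invented dealership data: sale price (in $1000s), days on lot, salesperson
cars = pd.DataFrame({
    "price":  [21.5, 19.8, 23.1, 20.4, 22.0, 18.9, 24.2, 19.5],
    "time":   [12, 30, 5, 25, 10, 40, 4, 35],
    "person": ["Tom", "Dick", "Harry", "Tom", "Harry", "Dick", "Tom", "Harry"],
})

# C(...) expands salesperson into 0/1 dummies; the reference level
# is the baseline category that the other coefficients are offsets from
fit = smf.ols("price ~ time + C(person, Treatment(reference='Harry'))",
              data=cars).fit()
print(fit.params)   # the Tom and Dick coefficients are offsets from Harry
```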
Multicollinearity • Multicollinearity: predictor variables are correlated with each other. • Multicollinearity causes instability in the estimation of the b's • P-values will be larger • Confidence in the b's decreases or disappears (magnitude and sign may differ from the expected values) • A small change in the data results in large variations in the coefficients • Read 11.11
VIF: Variance Inflation Factor • Measures the degree to which multicollinearity decreases the confidence in the estimate of a coefficient. • The larger the VIF, the bigger the multicollinearity problem. • If VIF > 10, there may be a problem • If VIF >= 15, there may be a serious problem
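statsmodels ships a VIF function; a sketch with invented predictors in which x1 and x2 are nearly duplicates of each other:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Invented predictors: x1 and x2 are nearly collinear
X = pd.DataFrame({"x1": [1, 2, 3, 4, 5, 6],
                  "x2": [1.1, 2.0, 3.2, 3.9, 5.1, 6.0],
                  "x3": [7, 3, 9, 1, 5, 8]})
X = sm.add_constant(X)   # compute VIFs with the constant included

for i, name in enumerate(X.columns):
    print(name, variance_inflation_factor(X.values, i))
# ignore the const row; x1 and x2 will show very large VIFs (well past 15)
```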
Model Selection • Start with everything. • Delete variables with high VIFs one at a time. • Then delete variables one at a time, removing the one with the largest p-value first. • Stop when all p-values are at or below 0.05.
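A rough sketch of the p-value elimination steps; backward_eliminate is a helper name invented here, and the VIF screening from the previous sketch is assumed to have been done first:

```python
import statsmodels.api as sm

def backward_eliminate(X, y, threshold=0.05):
    """Drop the predictor with the largest p-value, one at a time,
    until every remaining p-value is at or below the threshold."""
    X = sm.add_constant(X)
    while True:
        fit = sm.OLS(y, X).fit()
        pvals = fit.pvalues.drop("const")   # never drop the intercept
        worst = pvals.idxmax()
        if pvals[worst] <= threshold:
            return fit                      # all p-values significant: stop
        X = X.drop(columns=worst)
```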
Demand-Price Curve • The demand-price function is nonlinear: D = k * P^b1 • A log transformation makes it linear: ln(D) = ln(k) + b1 ln(P) • Run the regression on the transformed variables • Plug the fitted coefficients into the back-transformed equation: D = e^(b0) * P^b1, where b0 is the regression intercept and e^(b0) recovers k • Make your projections with this last equation.
Demand-Price Curve • Create a variable for the natural log of demand and the natural log of each independent variable • In Excel: =LN(demand), =LN(price), =LN(income), etc. • Run the regression on the transformed variables • Place the coefficients in the equation: d = e^(constant) * p^b1 * i^b2 • Simplify to d = k * p^b1 * i^b2 (note that e^(constant) = k) • If income is not included, the equation is just d = k * p^b1
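The same procedure in Python rather than Excel, as a sketch with invented price/demand numbers:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Invented price/demand observations
sales = pd.DataFrame({"price":  [2.0, 2.5, 3.0, 3.5, 4.0],
                      "demand": [980, 770, 640, 545, 480]})
sales["ln_d"] = np.log(sales["demand"])
sales["ln_p"] = np.log(sales["price"])

fit = smf.ols("ln_d ~ ln_p", data=sales).fit()
b0, b1 = fit.params                   # intercept = ln(k), slope = b1
k = np.exp(b0)
print(f"d = {k:.1f} * p^{b1:.3f}")    # the back-transformed demand curve
```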