
BCOR 1020 Business Statistics








  1. BCOR 1020 Business Statistics, Lecture 25 – April 22, 2008

  2. Overview Chapter 12 – Linear Regression • Ordinary Least Squares Formulas • Tests for Significance • Analysis of Variance: Overall Fit • Confidence and Prediction Intervals for Y • Example(s)

  3. Chapter 12 – Ordinary Least Squares Formulas Slope and Intercept: • The ordinary least squares method (OLS) estimates the slope and intercept of the regression line so that the residuals are small. • Recall that the residuals are the differences between observed y-values and the fitted y-values on the line… • The sum of the residuals = 0 for any line… • So, we consider the sum of the squared residuals (the SSE)…

  4. Chapter 12 – Ordinary Least Squares Formulas Slope and Intercept: • To find our OLS estimators, we need the values of b0 and b1 that minimize the SSE. • The OLS estimator for the slope is b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)², or equivalently b1 = r(sy/sx). • The OLS estimator for the intercept is b0 = ȳ − b1x̄. • These are computed by the regression function on your computer or calculator.
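The slope and intercept formulas above can be sketched in a few lines of Python; the dataset here is a small hypothetical example chosen for illustration, not one from the text.

```python
# Hypothetical toy data for illustration (not from the textbook).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS slope: b1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx

# OLS intercept: b0 = y_bar - b1 * x_bar
b0 = y_bar - b1 * x_bar

# As the slides note, the residuals from the fitted OLS line sum to zero.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
```

A regression function in Excel, MegaStat, or a calculator computes exactly these two quantities.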

  5. Chapter 12 – Ordinary Least Squares Formulas Example (Regression Output): • We will consider the dataset “ShipCost” from your text (12.19 on p.438) which considers the relationship between Number of Orders (X) and Shipping Costs (Y). • Using MegaStat we can generate a regression output (in handout)… • Demonstration in Excel…

  6. Chapter 12 – Ordinary Least Squares Formulas Example (Regression Output): [MegaStat regression output shown on slide]

  7. Chapter 12 – Ordinary Least Squares Formulas Assessing Fit: • We want to explain the total variation in Y around its mean: SST = Σ(yi − ȳ)² (Total Sum of Squares). • The regression sum of squares, SSR = Σ(ŷi − ȳ)², is the explained variation in Y.

  8. Chapter 12 – Ordinary Least Squares Formulas Assessing Fit: • The error sum of squares, SSE = Σ(yi − ŷi)², is the unexplained variation in Y. • If the fit is good, SSE will be relatively small compared to SST; a perfect fit is indicated by SSE = 0. • The magnitude of SSE depends on n and on the units of measurement.

  9. Chapter 12 – Ordinary Least Squares Formulas Coefficient of Determination: • R² = SSR/SST is a measure of relative fit based on a comparison of SSR and SST, with 0 ≤ R² ≤ 1. • Often expressed as a percent, an R² = 1 (i.e., 100%) indicates perfect fit. • In a bivariate regression, R² = (r)².
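A quick numerical check of R² = SSR/SST, and of the bivariate identity R² = r², on a small hypothetical dataset:

```python
import math

# Hypothetical toy data (not from the textbook).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Fit the OLS line, then compute SST and SSR.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]
sst = syy
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)

r_squared = ssr / sst              # coefficient of determination
r = sxy / math.sqrt(sxx * syy)     # sample correlation: r_squared == r**2
```

The same equality is behind the clicker question that follows: r = −0.72 gives R² = (−0.72)² = 0.5184.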

  10. Clickers Suppose you have found the regression model for a given set of bivariate data. If the correlation is r = -0.72, what is the coefficient of determination? (A) -0.5184 (B) 0.5184 (C) 0.7200 (D) 0.8485 (E) -0.8485

  11. Chapter 12 – Test for Significance Standard Error of Regression: • The standard error, syx = √(SSE/(n − 2)), is an overall measure of model fit. • If the fitted model’s predictions are perfect (SSE = 0), then syx = 0. Thus, a small syx indicates a better fit. • Used to construct confidence intervals. • Magnitude of syx depends on the units of measurement of Y and on data magnitude.
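The standard error of the regression is a one-liner once SSE is in hand; again the data are a hypothetical illustration:

```python
import math

# Hypothetical toy data (not from the textbook).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

# SSE, then the standard error of the regression: s_yx = sqrt(SSE / (n - 2))
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_yx = math.sqrt(sse / (n - 2))
```

Dividing by n − 2 rather than n reflects the two parameters (b0 and b1) estimated from the data.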

  12. Chapter 12 – Test for Significance Confidence Intervals for Slope and Intercept: • Standard error of the slope: sb1 = syx / √(Σ(xi − x̄)²). • Standard error of the intercept: sb0 = syx · √(Σxi² / (n · Σ(xi − x̄)²)). • Confidence interval for the true slope: b1 ± tα/2,n−2 · sb1. • Confidence interval for the true intercept: b0 ± tα/2,n−2 · sb0.
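A sketch of the standard errors and the slope confidence interval, continuing the same hypothetical toy data; the critical value t.025,3 = 3.182 is taken from a t table (n − 2 = 3 degrees of freedom here).

```python
import math

# Hypothetical toy data (not from the textbook).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_yx = math.sqrt(sse / (n - 2))

# Standard errors of the slope and intercept.
s_b1 = s_yx / math.sqrt(sxx)
s_b0 = s_yx * math.sqrt(sum(xi ** 2 for xi in x) / (n * sxx))

# 95% confidence interval for the true slope: b1 +/- t * s_b1.
t_crit = 3.182  # t_{.025, 3} from a t table
ci_slope = (b1 - t_crit * s_b1, b1 + t_crit * s_b1)
```

Note that with so few points the interval is wide; it is the formula, not the numbers, that matters here.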

  13. Chapter 12 – Test for Significance Hypothesis Tests: • If β1 = 0, then X cannot influence Y and the regression model collapses to a constant β0 plus random error. • The hypotheses to be tested are H0: β1 = 0 vs. H1: β1 ≠ 0 (and similarly H0: β0 = 0 vs. H1: β0 ≠ 0). • These are tested in the standard regression output in any statistics package like MegaStat.

  14. Chapter 12 – Test for Significance Hypothesis Tests: • A t test is used with ν = n − 2 degrees of freedom. • The test statistics for the slope and intercept are t = b1/sb1 and t = b0/sb0. • tn−2 is obtained from Appendix D or Excel for a given α. • Reject H0 if |t| > tα/2,n−2 or if p-value < α. • The p-value is provided in the regression output.
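The two t statistics are just the estimates divided by their standard errors; a sketch on the same hypothetical toy data:

```python
import math

# Hypothetical toy data (not from the textbook).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_yx = math.sqrt(sse / (n - 2))
s_b1 = s_yx / math.sqrt(sxx)
s_b0 = s_yx * math.sqrt(sum(xi ** 2 for xi in x) / (n * sxx))

# Test statistics for H0: beta1 = 0 and H0: beta0 = 0,
# each with n - 2 degrees of freedom.
t_b1 = b1 / s_b1
t_b0 = b0 / s_b0
```

These are the "t" column values a package like MegaStat prints next to each coefficient.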

  15. Chapter 12 – Test for Significance Example (Regression Output): • Let’s revisit the regression output from the dataset “ShipCost” from your text (12.19 on p.438) which considers the relationship between Number of Orders (X) and Shipping Costs (Y). • Go through tests for significance on b0 and b1.

  16. Chapter 12 – Analysis of Variance Decomposition of Variance: • The total variation in the dependent variable around its mean decomposes as SST (total variation around the mean) = SSR (variation explained by the regression) + SSE (unexplained or error variation). • In terms of sums of squares: Σ(yi − ȳ)² = Σ(ŷi − ȳ)² + Σ(yi − ŷi)².
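The decomposition SST = SSR + SSE can be verified numerically; the data are the same hypothetical toy set used above:

```python
# Hypothetical toy data (not from the textbook).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)           # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)       # explained by regression
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained

# Decomposition: SST = SSR + SSE holds exactly for the OLS fit.
```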

  17. Chapter 12 – Analysis of Variance F Statistic for Overall Fit: • For a bivariate regression, the F statistic is F = MSR/MSE = (SSR/1) / (SSE/(n − 2)), with 1 and n − 2 degrees of freedom. • For a given sample size, a larger F statistic indicates a better fit. • Reject H0 if F > F1,n−2 from Appendix F for a given significance level α or if p-value < α.
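A sketch of the F statistic on the same hypothetical toy data; in the bivariate case F also equals the square of the slope's t statistic, which is a useful cross-check against the regression output.

```python
import math

# Hypothetical toy data (not from the textbook).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

# F = MSR / MSE with 1 and n - 2 degrees of freedom.
msr = ssr / 1
mse = sse / (n - 2)
f_stat = msr / mse

# Cross-check: in bivariate regression, F = (t statistic for the slope)^2.
s_b1 = math.sqrt(mse) / math.sqrt(sxx)
t_b1 = b1 / s_b1
```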

  18. Chapter 12 – Analysis of Variance Example (Regression Output): • Let’s revisit the regression output from the dataset “ShipCost” from your text (12.19 on p.438) which considers the relationship between Number of Orders (X) and Shipping Costs (Y). • Go through the Analysis of Variance (ANOVA) to assess overall fit.

  19. Chapter 12 – Example Example (Exam Scores): • We will consider the dataset “ExamScores” from your text (Table 12.3 on p.434) which considers the relationship between Study Hours (X) and Exam Scores (Y). • Generate MegaStat regression output. • Output on Overhead…

  20. Clickers If a randomly selected student had studied 12 hours for this exam, what score would this model predict (to the nearest %)? (A) 51% (B) 61% (C) 73% (D) 82%

  21. Clickers Find the p-value on the hypothesis test… (A) 0.0012 (B) 0.0520 (C) 0.3940 (D) 1.9641

  22. Clickers Recall from Tuesday’s lecture, the critical value for testing whether the correlation is significant is given by rα = tα/2,n−2 / √(t²α/2,n−2 + n − 2). Compute the critical value and determine whether the correlation is significant using α = 10%. (A) Yes, r is significant. (B) No, r is not significant.

  23. Clickers – Work… • Since n = 10 and α = 10%, tα/2,n−2 = t.05,8 = 1.860, so the critical value is rα = 1.860/√(1.860² + 8) ≈ 0.549. • From the output, r = 0.628. Since |r| > rα, we can reject H0: ρ = 0 in favor of H1: ρ ≠ 0. • Or, using T = r√(n − 2)/√(1 − r²) = 0.628·√8/√(1 − 0.628²) ≈ 2.28: since |T| > tα/2,n−2 = t.05,8 = 1.860, we reach the same conclusion. The correlation is significant.
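The same work can be checked numerically with the figures from the slide (n = 10, r = 0.628, t.05,8 = 1.860); both routes reach the same conclusion.

```python
import math

# Figures from the slide: sample size, correlation, and t critical value.
n, r = 10, 0.628
t_crit = 1.860  # t_{.05, 8}, alpha = 10% two-tailed

# Critical value for r, from inverting the t test for a correlation.
r_crit = t_crit / math.sqrt(t_crit ** 2 + n - 2)

# Equivalent t statistic for testing H0: rho = 0.
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Either comparison gives the same answer.
significant = abs(r) > r_crit
```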

  24. Chapter 12 – Confidence & Prediction Intervals for Y How to Construct an Interval Estimate for Y • The regression line is an estimate of the conditional mean of Y. • An interval estimate is used to show a range of likely values around the point estimate. • Confidence interval for the conditional mean of Y: ŷ ± tα/2,n−2 · syx · √(1/n + (x − x̄)²/Σ(xi − x̄)²).

  25. Chapter 12 – Confidence & Prediction Intervals for Y How to Construct an Interval Estimate for Y • Prediction interval for individual values of Y: ŷ ± tα/2,n−2 · syx · √(1 + 1/n + (x − x̄)²/Σ(xi − x̄)²). • Prediction intervals are wider than confidence intervals because individual Y values vary more than the mean of Y.
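The extra "1 +" under the square root is the only difference between the two intervals, and it is what makes the prediction interval wider. A sketch on the same hypothetical toy data (t.025,3 = 3.182 from a t table):

```python
import math

# Hypothetical toy data (not from the textbook).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_yx = math.sqrt(sse / (n - 2))
t_crit = 3.182  # t_{.025, 3} from a t table

x_new = 4.0                     # point at which we estimate Y
y_hat = b0 + b1 * x_new         # point estimate from the regression line
leverage = 1 / n + (x_new - x_bar) ** 2 / sxx

half_ci = t_crit * s_yx * math.sqrt(leverage)      # conditional mean of Y
half_pi = t_crit * s_yx * math.sqrt(1 + leverage)  # individual value of Y
```

Both intervals are narrowest at x = x̄ and widen as x moves away from the mean; the prediction interval is always the wider of the two.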

  26. Chapter 12 – Confidence & Prediction Intervals for Y MegaStat’s Confidence and Prediction Intervals: [MegaStat confidence and prediction interval output shown on slide]
