The Simple Linear Regression Model

The Simple Linear Regression Model
• Simple linear regression model: $y = \beta_0 + \beta_1 x + \varepsilon$
• Simple linear regression equation: $E(y) = \beta_0 + \beta_1 x$
• Estimated simple linear regression equation: $\hat{y} = b_0 + b_1 x$

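Least-Squares Line (Best Prediction Line)
• Among all lines drawn through the data points of a scatter diagram, the one that minimizes the sum of squared prediction errors is called the least-squares line, and this method is called the least squares method (Least Square Method).
• What is the sum of squared errors? Let $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$ be $n$ data points, and let $\hat{y} = b_0 + b_1 x$ be a line used to predict $Y$ from $X$. When $X = x_i$, the difference $y_i - \hat{y}_i$ between the actually observed $y_i$ and the predicted value $\hat{y}_i = b_0 + b_1 x_i$ is called the prediction error. The sum of squared errors is defined as
$f(b_0, b_1) = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$
• The values of $b_0$ and $b_1$ that make $f$ a minimum are found with the calculus method for locating maxima and minima.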

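Least-Squares Line
• Setting $\partial f / \partial b_0 = 0$ and $\partial f / \partial b_1 = 0$ for $f(b_0, b_1) = \sum (y_i - b_0 - b_1 x_i)^2$ gives the system
$\sum y_i = n b_0 + b_1 \sum x_i, \qquad \sum x_i y_i = b_0 \sum x_i + b_1 \sum x_i^2$
• Solving this system of simultaneous equations gives
$b_1 = \dfrac{\sum x_i y_i - (\sum x_i)(\sum y_i)/n}{\sum x_i^2 - (\sum x_i)^2/n}, \qquad b_0 = \bar{y} - b_1 \bar{x}$
• Therefore the least-squares line is $\hat{y} = b_0 + b_1 x$.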
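A minimal Python sketch of the least-squares solution $b_1 = [\sum x_i y_i - (\sum x_i)(\sum y_i)/n] / [\sum x_i^2 - (\sum x_i)^2/n]$, $b_0 = \bar{y} - b_1 \bar{x}$, using only plain lists. The helper names `least_squares` and `sse` and the five-point data set are hypothetical, chosen only for illustration. The loop at the end checks numerically that moving either coefficient away from the solution increases the sum of squared errors $f$, which is what the calculus conditions guarantee.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = sum_x2 - sum_x ** 2 / n
    if denom == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    b1 = (sum_xy - sum_x * sum_y / n) / denom
    b0 = sum_y / n - b1 * sum_x / n  # b0 = y-bar - b1 * x-bar
    return b0, b1


def sse(b0, b1, xs, ys):
    """Sum of squared prediction errors f(b0, b1)."""
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, for illustration only.
xs = [0, 1, 2, 3, 4]
ys = [1.0, 2.9, 5.2, 7.1, 8.8]
b0, b1 = least_squares(xs, ys)
best = sse(b0, b1, xs, ys)

# Nudging either coefficient away from the solution can only increase f.
for d0, d1 in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    assert sse(b0 + d0, b1 + d1, xs, ys) > best

print(f"y-hat = {b0:.3f} + {b1:.3f}x, SSE = {best:.4f}")  # y-hat = 1.040 + 1.980x, SSE = 0.0960
```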

Example: Reed Auto Sales
• Simple Linear Regression
Reed Auto periodically has a special week-long sale. As part of the advertising campaign, Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 6 previous sales are shown below.

Number of TV Ads (x)    Number of Cars Sold (y)
        1                       14
        3                       24
        2                       18
        1                       17
        3                       27
        2                       22

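Example: Reed Auto Sales
• Slope for the estimated regression equation (with $n = 6$, $\sum x_i = 12$, $\sum y_i = 122$, $\sum x_i y_i = 264$, $\sum x_i^2 = 28$):
$b_1 = \dfrac{264 - (12)(122)/6}{28 - (12)^2/6} = \dfrac{20}{4} = 5$
• y-intercept for the estimated regression equation ($\bar{x} = 2$, $\bar{y} = 20.333$):
$b_0 = 20.333 - 5(2) = 10.333$
• Estimated regression equation: $\hat{y} = 10.333 + 5x$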
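The slope and intercept arithmetic can be checked with a short Python sketch on the six Reed Auto sales from the table; it computes the four sums directly, so the divisor is the sample size $n = 6$.

```python
# Reed Auto Sales data: x = number of TV ads, y = number of cars sold.
x = [1, 3, 2, 1, 3, 2]
y = [14, 24, 18, 17, 27, 22]
n = len(x)  # 6 sales in the sample

sum_x = sum(x)                                  # 12
sum_y = sum(y)                                  # 122
sum_xy = sum(xi * yi for xi, yi in zip(x, y))   # 264
sum_x2 = sum(xi ** 2 for xi in x)               # 28

b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
b0 = sum_y / n - b1 * sum_x / n  # y-bar - b1 * x-bar
print(f"b1 = {b1:.3f}, b0 = {b0:.3f}")  # b1 = 5.000, b0 = 10.333
```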

Example: Reed Auto Sales
• Scatter Diagram

The Coefficient of Determination
• Relationship among SST, SSR, and SSE: SST = SSR + SSE, that is,
$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$
• Coefficient of determination: $r^2 = \text{SSR}/\text{SST}$
where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error

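Coefficient of Determination
• Definition: $r^2 = \text{SSR}/\text{SST}$
• $r^2$ is the proportion of the variation in $Y$ that is explained by $X$.
• The larger $r^2$ is, the better the least-squares line fits the data.
• $1 - r^2$ is the proportion of the total variation (SST) that $X$ leaves unexplained.
Example: Reed Auto Sales
$r^2 = \text{SSR}/\text{SST} = 100/117.333 = .852273$
• This means that 85.2% of the variation in the number of cars sold can be explained by the number of TV ads (and 14.8% cannot).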
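A small Python sketch that recomputes the three sums of squares for the Reed Auto data, confirms SST = SSR + SSE, and reproduces $r^2$. It assumes the estimated equation $\hat{y} = 10.333 + 5x$, with $b_0$ kept unrounded as $31/3$.

```python
x = [1, 3, 2, 1, 3, 2]
y = [14, 24, 18, 17, 27, 22]
b0, b1 = 31 / 3, 5.0  # y-hat = 10.333... + 5x

y_bar = sum(y) / len(y)
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                  # total
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)              # due to regression
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))     # due to error
r2 = ssr / sst
print(f"SST={sst:.3f} SSR={ssr:.3f} SSE={sse:.3f} r^2={r2:.6f}")
# SST=117.333 SSR=100.000 SSE=17.333 r^2=0.852273
```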

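The Correlation Coefficient
• Sample correlation coefficient:
$r_{xy} = (\text{sign of } b_1)\sqrt{\text{coefficient of determination}} = (\text{sign of } b_1)\sqrt{r^2}$
where:
$b_1$ = the slope of the estimated regression equation $\hat{y} = b_0 + b_1 x$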

Example: Reed Auto Sales
• Sample correlation coefficient: the sign of $b_1$ in the equation $\hat{y} = 10.333 + 5x$ is “+”, so
$r_{xy} = +\sqrt{.852273} = +.923186$
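As a cross-check, the Python sketch below computes $r_{xy}$ two ways for the Reed Auto data: as the sign of $b_1$ times $\sqrt{r^2}$, and with the direct (Pearson) formula $S_{xy}/\sqrt{S_{xx} S_{yy}}$. The names `sxx`, `syy`, `sxy` are illustrative shorthands for the centered sums.

```python
import math

x = [1, 3, 2, 1, 3, 2]
y = [14, 24, 18, 17, 27, 22]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)  # equals SST
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = sxy / sxx
r2 = b1 * sxy / syy  # SSR / SST, since SSR = b1 * Sxy
r_from_slope = math.copysign(math.sqrt(r2), b1)
r_direct = sxy / math.sqrt(sxx * syy)
print(f"{r_from_slope:+.6f} {r_direct:+.6f}")  # +0.923186 +0.923186
```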

Model Assumptions
Assumptions about the error term $\varepsilon$:
• The error $\varepsilon$ is a random variable with a mean of zero.
• The variance of $\varepsilon$, denoted by $\sigma^2$, is the same for all values of the independent variable.
• The values of $\varepsilon$ are independent.
• The error $\varepsilon$ is a normally distributed random variable.
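To make the four assumptions concrete, here is a small Python simulation sketch that generates data exactly as the model describes: each $\varepsilon$ is drawn independently from a normal distribution with mean 0 and the same $\sigma$ at every $x$. The "true" values $\beta_0 = 10$, $\beta_1 = 5$, $\sigma = 2$, the seed, and the sample size are hypothetical choices for this illustration, not values from the source.

```python
import random

# Hypothetical true parameters, used only for this simulation.
BETA0, BETA1, SIGMA = 10.0, 5.0, 2.0

rng = random.Random(42)  # fixed seed so the run is reproducible
n = 2000
x = [rng.choice([1, 2, 3]) for _ in range(n)]
# Errors: mean 0, same variance SIGMA**2 for every x, independent, normal.
eps = [rng.gauss(0.0, SIGMA) for _ in range(n)]
y = [BETA0 + BETA1 * xi + ei for xi, ei in zip(x, eps)]

# Least-squares estimates from the simulated sample.
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")  # close to 10 and 5, but not exactly equal
```

With a large sample the estimates land near the true parameters; rerunning with a different seed shows how $b_0$ and $b_1$ vary from sample to sample.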

Testing for Significance
• To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of $\beta_1$ is zero.
• Two tests are commonly used: the t test and the F test.
• Both tests require an estimate of $\sigma^2$, the variance of $\varepsilon$ in the regression model.

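Testing for Significance
• An estimate of $\sigma^2$: the mean square error (MSE) provides the estimate of $\sigma^2$, and the notation $s^2$ is also used.
$s^2 = \text{MSE} = \text{SSE}/(n - 2)$
where:
$\text{SSE} = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - b_0 - b_1 x_i)^2$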

Testing for Significance
• An estimate of $\sigma$: to estimate $\sigma$ we take the square root of $s^2$:
$s = \sqrt{\text{MSE}} = \sqrt{\text{SSE}/(n - 2)}$
• The resulting $s$ is called the standard error of the estimate.
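A minimal Python sketch of $s^2 = \text{MSE} = \text{SSE}/(n - 2)$ and $s = \sqrt{\text{MSE}}$ for the Reed Auto data, assuming the estimated equation $\hat{y} = 10.333 + 5x$ (with $b_0 = 31/3$ unrounded). The divisor is $n - 2$ because two parameters, $\beta_0$ and $\beta_1$, are estimated.

```python
import math

x = [1, 3, 2, 1, 3, 2]
y = [14, 24, 18, 17, 27, 22]
b0, b1 = 31 / 3, 5.0  # y-hat = 10.333... + 5x
n = len(x)

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - 2)  # s^2, the estimate of sigma^2
s = math.sqrt(mse)   # standard error of the estimate
print(f"SSE = {sse:.3f}, s^2 = MSE = {mse:.3f}, s = {s:.4f}")
# SSE = 17.333, s^2 = MSE = 4.333, s = 2.0817
```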

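Testing for Significance: t Test
• Hypotheses:
$H_0\colon \beta_1 = 0 \qquad H_a\colon \beta_1 \neq 0$
• Test statistic:
$t = \dfrac{b_1}{s_{b_1}}, \qquad s_{b_1} = \dfrac{s}{\sqrt{\sum (x_i - \bar{x})^2}}$
• Rejection rule: reject $H_0$ if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$, where $t_{\alpha/2}$ is based on a t distribution with $n - 2$ degrees of freedom.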

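Example: Reed Auto Sales
• t test
• Hypotheses: $H_0\colon \beta_1 = 0 \qquad H_a\colon \beta_1 \neq 0$
• Rejection rule: for $\alpha = .05$ and d.f. = 4, $t_{.025} = 2.776$; reject $H_0$ if $t < -2.776$ or $t > 2.776$.
• Test statistic: $t = b_1 / s_{b_1} = 5/1.0408 = 4.804$
• Conclusion: since $4.804 > 2.776$, reject $H_0$.
• p-value: $2P(T > 4.804) = 0.0086 < 0.05$, so again reject $H_0$.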
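The t test for the Reed Auto data can be sketched in Python using $s_{b_1} = s / \sqrt{\sum (x_i - \bar{x})^2}$. The critical value 2.776 is the table value quoted in the example. For the p-value, the standard library has no t distribution, so this sketch swaps in a numerical method: Simpson's rule integration of the Student t density (helper names `t_pdf` and `two_sided_p` are hypothetical). A statistics package would normally be used instead.

```python
import math

x = [1, 3, 2, 1, 3, 2]
y = [14, 24, 18, 17, 27, 22]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
s = math.sqrt(sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2))

s_b1 = s / math.sqrt(sxx)  # estimated standard deviation of b1
t = b1 / s_b1
T_CRIT = 2.776             # t_.025 with 4 d.f., from the t table


def t_pdf(u, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + u * u / df) ** (-(df + 1) / 2)


def two_sided_p(t_stat, df, steps=2000):
    """2 * P(T > |t|), integrating the density over [0, |t|] by Simpson's rule."""
    a, b = 0.0, abs(t_stat)
    h = (b - a) / steps  # steps must be even for Simpson's rule
    total = t_pdf(a, df) + t_pdf(b, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(a + k * h, df)
    area_0_to_t = total * h / 3
    return 1 - 2 * area_0_to_t  # density is symmetric about 0


p = two_sided_p(t, n - 2)
reject = abs(t) > T_CRIT
print(f"s_b1 = {s_b1:.4f}, t = {t:.3f}, p = {p:.4f}, reject H0: {reject}")
# s_b1 = 1.0408, t = 4.804, p = 0.0086, reject H0: True
```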

Confidence Interval for $\beta_1$
• We can use a 95% confidence interval for $\beta_1$ to test the hypotheses just used in the t test.
• $H_0$ is rejected if the hypothesized value of $\beta_1$ is not included in the confidence interval for $\beta_1$.

Confidence Interval for $\beta_1$
• The form of a confidence interval for $\beta_1$ is
$b_1 \pm t_{\alpha/2}\, s_{b_1}$
where $b_1$ is the point estimate, $t_{\alpha/2}\, s_{b_1}$ is the margin of error, and $t_{\alpha/2}$ is the t value providing an area of $\alpha/2$ in the upper tail of a t distribution with $n - 2$ degrees of freedom.

Example: Reed Auto Sales
• Rejection rule: reject $H_0$ if 0 is not included in the confidence interval for $\beta_1$.
• 95% confidence interval for $\beta_1$:
$b_1 \pm t_{.025}\, s_{b_1} = 5 \pm 2.776(1.0408) = 5 \pm 2.89$, or 2.11 to 7.89
• Conclusion: 0 is not in the interval, so reject $H_0$.
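The interval $b_1 \pm t_{.025}\, s_{b_1}$ and the "is 0 inside?" decision can be sketched in Python as follows; the critical value 2.776 is the table value used in the example.

```python
import math

x = [1, 3, 2, 1, 3, 2]
y = [14, 24, 18, 17, 27, 22]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
s = math.sqrt(sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2))
s_b1 = s / math.sqrt(sxx)

T_CRIT = 2.776  # t_.025 with 4 d.f.
margin = T_CRIT * s_b1
lower, upper = b1 - margin, b1 + margin
print(f"95% CI for beta1: {lower:.2f} to {upper:.2f}")  # 2.11 to 7.89
# H0: beta1 = 0 is rejected when 0 lies outside the interval.
print("Reject H0" if not (lower <= 0 <= upper) else "Do not reject H0")  # Reject H0
```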

Testing for Significance: F Test
• Hypotheses:
$H_0\colon \beta_1 = 0 \qquad H_a\colon \beta_1 \neq 0$
• Test statistic: $F = \text{MSR}/\text{MSE}$
• Rejection rule: reject $H_0$ if $F > F_\alpha$, where $F_\alpha$ is based on an F distribution with 1 d.f. in the numerator and $n - 2$ d.f. in the denominator.

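Example: Reed Auto Sales
• F test
• Hypotheses: $H_0\colon \beta_1 = 0 \qquad H_a\colon \beta_1 \neq 0$
• Rejection rule: for $\alpha = .05$ and d.f. = 1, 4: $F_{.05} = 7.709$; reject $H_0$ if $F > 7.709$.
• Test statistic: with MSR = SSR/1 = 100 and MSE = SSE/(n − 2) = 17.333/4 = 4.333,
$F = \text{MSR}/\text{MSE} = 100/4.333 = 23.077$
• Conclusion: since $23.077 > 7.709$, we reject $H_0$.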
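A Python sketch of the F test for the Reed Auto data. It also checks a known fact about simple linear regression: with one independent variable, $F = t^2$, so the F test and the two-tailed t test reach the same conclusion. The critical value 7.709 is the table value from the example.

```python
x = [1, 3, 2, 1, 3, 2]
y = [14, 24, 18, 17, 27, 22]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sst = sum((yi - y_bar) ** 2 for yi in y)

b1 = sxy / sxx
ssr = b1 * sxy       # regression sum of squares
sse = sst - ssr      # from SST = SSR + SSE
msr = ssr / 1        # one independent variable: 1 numerator d.f.
mse = sse / (n - 2)  # n - 2 denominator d.f.
f_stat = msr / mse

t_squared = b1 ** 2 / (mse / sxx)  # (b1 / s_b1) ** 2
F_CRIT = 7.709                     # F_.05 with (1, 4) d.f.
print(f"F = {f_stat:.3f}, t^2 = {t_squared:.3f}, reject H0: {f_stat > F_CRIT}")
# F = 23.077, t^2 = 23.077, reject H0: True
```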

Some Cautions about the Interpretation of Significance Tests
• Rejecting $H_0\colon \beta_1 = 0$ and concluding that the relationship between $x$ and $y$ is significant does not enable us to conclude that a cause-and-effect relationship is present between $x$ and $y$.
• Just because we are able to reject $H_0\colon \beta_1 = 0$ and demonstrate statistical significance does not enable us to conclude that the relationship between $x$ and $y$ is linear; the true relationship may be nonlinear, with the linear equation serving only as an approximation over the observed range of $x$.

Using the Estimated Regression Equation for Estimation and Prediction
• Confidence interval estimate of $E(y_p)$:
$\hat{y}_p \pm t_{\alpha/2}\, s_{\hat{y}_p}$
• Prediction interval estimate of $y_p$:
$\hat{y}_p \pm t_{\alpha/2}\, s_{\text{ind}}$
where the confidence coefficient is $1 - \alpha$ and $t_{\alpha/2}$ is based on a t distribution with $n - 2$ d.f.
• $s_{\hat{y}_p}$ is the standard error of the estimate of $E(y_p)$; $s_{\text{ind}}$ is the standard error of an individual estimate of $y_p$.

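Standard Errors of the Estimates of $E(y_p)$ and $y_p$
$s_{\hat{y}_p} = s\sqrt{\dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$
$s_{\text{ind}} = s\sqrt{1 + \dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$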

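Variances of the Estimators of $E(y_p)$ and $y_p$
• Variance of $\bar{y}$: $\operatorname{Var}(\bar{y}) = \sigma^2 / n$
• Variance of $b_1$: $\operatorname{Var}(b_1) = \sigma^2 / \sum (x_i - \bar{x})^2$
• Variance of $\varepsilon$: $\operatorname{Var}(\varepsilon) = \sigma^2$
• Variance of the estimator $\hat{y}_p = \bar{y} + b_1 (x_p - \bar{x})$ of $E(y_p)$ (since $\bar{y}$ and $b_1$ are uncorrelated):
$\operatorname{Var}(\hat{y}_p) = \sigma^2 \left[\dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{\sum (x_i - \bar{x})^2}\right]$
• Variance of the estimator of an individual $y_p$, i.e. of the prediction error $y_p - \hat{y}_p$ (the new error $\varepsilon_p$ is independent of $\hat{y}_p$):
$\operatorname{Var}(y_p - \hat{y}_p) = \sigma^2 \left[1 + \dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{\sum (x_i - \bar{x})^2}\right]$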

Example: Reed Auto Sales
• Point estimation: if 3 TV ads are run prior to a sale, we expect the mean number of cars sold to be
$\hat{y}_p = 10.333 + 5(3) = 25.333$ cars
• Confidence interval for $E(y_p)$: the 95% confidence interval estimate of the mean number of cars sold when 3 TV ads are run is
$25.333 \pm 2.776(1.3437) = 25.333 \pm 3.730$, or 21.603 to 29.063 cars
• Prediction interval for $y_p$: the 95% prediction interval estimate of the number of cars sold in one particular week when 3 TV ads are run is
$25.333 \pm 2.776(2.4777) = 25.333 \pm 6.878$, or 18.455 to 32.211 cars
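Both intervals at $x_p = 3$ can be reproduced with a Python sketch, assuming the standard-error formulas $s_{\hat{y}_p} = s\sqrt{1/n + (x_p - \bar{x})^2 / \sum (x_i - \bar{x})^2}$ and $s_{\text{ind}} = s\sqrt{1 + 1/n + (x_p - \bar{x})^2 / \sum (x_i - \bar{x})^2}$, and the table value $t_{.025} = 2.776$ for 4 d.f. The variable `h` is an illustrative name for the shared term under the square root.

```python
import math

x = [1, 3, 2, 1, 3, 2]
y = [14, 24, 18, 17, 27, 22]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
s = math.sqrt(sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2))

T_CRIT = 2.776  # t_.025 with n - 2 = 4 d.f.
x_p = 3         # 3 TV ads
y_hat_p = b0 + b1 * x_p

h = 1 / n + (x_p - x_bar) ** 2 / sxx
s_y_hat = s * math.sqrt(h)   # standard error for the mean E(y_p)
s_ind = s * math.sqrt(1 + h)  # standard error for one individual y_p

ci = (y_hat_p - T_CRIT * s_y_hat, y_hat_p + T_CRIT * s_y_hat)
pi = (y_hat_p - T_CRIT * s_ind, y_hat_p + T_CRIT * s_ind)
print(f"y_hat_p = {y_hat_p:.3f}")                        # 25.333
print(f"95% CI for E(y_p): {ci[0]:.3f} to {ci[1]:.3f}")  # 21.603 to 29.063
print(f"95% PI for y_p:    {pi[0]:.3f} to {pi[1]:.3f}")  # 18.455 to 32.211
```

The prediction interval is always wider than the confidence interval because $s_{\text{ind}}$ adds the variance of a single new error $\varepsilon_p$ (the extra 1 under the square root).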
