Chapter 2 Simple Linear Regression

Presentation Transcript

  1. Chapter 2 Simple Linear Regression Ray-Bing Chen Institute of Statistics National University of Kaohsiung

  2. 2.1 Simple Linear Regression Model • $y = \beta_0 + \beta_1 x + \varepsilon$ • x: regressor variable • y: response variable • $\beta_0$: the intercept, unknown • $\beta_1$: the slope, unknown • $\varepsilon$: random error with $E(\varepsilon) = 0$ and $Var(\varepsilon) = \sigma^2$ (unknown) • The errors are uncorrelated.

  3. Given x, $E(y|x) = E(\beta_0 + \beta_1 x + \varepsilon) = \beta_0 + \beta_1 x$ and $Var(y|x) = Var(\beta_0 + \beta_1 x + \varepsilon) = \sigma^2$. • The responses are also uncorrelated. • Regression coefficients: $\beta_0, \beta_1$ • $\beta_1$: the change in $E(y|x)$ produced by a unit change in x • $\beta_0$: $E(y|x=0)$

  4. 2.2 Least-Squares Estimation of the Parameters 2.2.1 Estimation of $\beta_0$ and $\beta_1$ • n pairs: $(y_i, x_i)$, $i = 1, \dots, n$ • Method of least squares: minimize $S(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$

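  5. Least-squares normal equations: $n\hat\beta_0 + \hat\beta_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$ and $\hat\beta_0 \sum_{i=1}^{n} x_i + \hat\beta_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} y_i x_i$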

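  6. The least-squares estimators: $\hat\beta_1 = S_{xy}/S_{xx}$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$, where $S_{xx} = \sum_{i=1}^{n} (x_i - \bar x)^2$ and $S_{xy} = \sum_{i=1}^{n} y_i (x_i - \bar x)$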
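The closed-form estimators $\hat\beta_1 = S_{xy}/S_{xx}$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$ can be sketched in plain Python. This is a minimal illustration only: the helper name `least_squares_fit` is hypothetical, and the five data points are made up (they are not the rocket propellant data).

```python
def least_squares_fit(x, y):
    """Closed-form least-squares estimates (b0, b1) for the model y = b0 + b1*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("all x values are equal; the slope is not estimable")
    s_xy = sum(yi * (xi - x_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx            # slope: S_xy / S_xx
    b0 = y_bar - b1 * x_bar     # intercept: y_bar - b1 * x_bar
    return b0, b1


# Hypothetical illustration data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = least_squares_fit(x, y)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")  # b0 = 0.0500, b1 = 1.9900
```

Here $S_{xx} = 10$ and $S_{xy} = 19.9$, and the resulting estimates satisfy both normal equations.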

  7. The fitted simple regression model: $\hat y = \hat\beta_0 + \hat\beta_1 x$ • A point estimate of the mean of y for a particular x • Residual: $e_i = y_i - \hat y_i$, $i = 1, \dots, n$ • Residuals play an important role in investigating the adequacy of the fitted regression model and in detecting departures from the underlying assumptions.
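Once $\hat\beta_0$ and $\hat\beta_1$ are known, the fitted values $\hat y_i$ and residuals $e_i = y_i - \hat y_i$ follow directly. The sketch below hard-codes $\hat\beta_0 = 0.05$ and $\hat\beta_1 = 1.99$, which are the least-squares estimates for the small hypothetical data set used here; the function name `fitted_and_residuals` is illustrative only.

```python
def fitted_and_residuals(x, y, b0, b1):
    """Return fitted values y_hat_i = b0 + b1*x_i and residuals e_i = y_i - y_hat_i."""
    y_hat = [b0 + b1 * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, y_hat)]
    return y_hat, residuals


# Hypothetical data and the least-squares estimates for it
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
y_hat, residuals = fitted_and_residuals(x, y, b0=0.05, b1=1.99)
for xi, fi, ei in zip(x, y_hat, residuals):
    print(f"x = {xi:.1f}  y_hat = {fi:.2f}  e = {ei:+.2f}")
```

Inspecting (or plotting) the residuals against x is the usual first check for departures from the model assumptions.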

  8. Example 2.1: The Rocket Propellant Data • The shear strength is related to the age in weeks of the batch of sustainer propellant. • 20 observations • The scatter diagram suggests a strong relationship between shear strength (y) and propellant age (x). • Assumed model: $y = \beta_0 + \beta_1 x + \varepsilon$

  9. The least-squares fit:

  10. How well does this equation fit the data? • Is the model likely to be useful as a predictor? • Are any of the basic assumptions violated, and if so, how serious is this?

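  11. 2.2.2 Properties of the Least-Squares Estimators and the Fitted Regression Model • $\hat\beta_0$ and $\hat\beta_1$ are linear combinations of the $y_i$; for example, $\hat\beta_1 = \sum_{i=1}^{n} c_i y_i$ with $c_i = (x_i - \bar x)/S_{xx}$ • $\hat\beta_0$ and $\hat\beta_1$ are unbiased estimators of $\beta_0$ and $\beta_1$.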

  12. The Gauss-Markov Theorem: $\hat\beta_0$ and $\hat\beta_1$ are the best linear unbiased estimators (BLUE).

  13. Some useful properties: • The sum of the residuals in any regression model that contains an intercept $\beta_0$ is always 0, i.e. $\sum_{i=1}^{n} (y_i - \hat y_i) = \sum_{i=1}^{n} e_i = 0$ • The regression line always passes through the centroid of the data, $(\bar x, \bar y)$.
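Both properties can be checked numerically on any data set fitted with an intercept. This minimal sketch uses a hypothetical helper `residual_properties` and made-up data; it returns the residual sum and the gap between the fitted line at $\bar x$ and $\bar y$, both of which should be zero up to floating-point rounding.

```python
def residual_properties(x, y):
    """Fit by least squares; return (sum of residuals, y_hat(x_bar) - y_bar)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum(yi * (xi - x_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    residual_sum = sum(yi - (b0 + b1 * xi) for xi, yi in zip(x, y))
    centroid_gap = (b0 + b1 * x_bar) - y_bar   # line evaluated at the centroid
    return residual_sum, centroid_gap


# Hypothetical illustration data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
residual_sum, centroid_gap = residual_properties(x, y)
# Both values are zero up to floating-point rounding.
print(residual_sum, centroid_gap)
```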

  14. 2.2.3 Estimator of 2 • Residual sum of squares:

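  15. Since $E(SS_{Res}) = (n-2)\sigma^2$, the unbiased estimator of $\sigma^2$ is $\hat\sigma^2 = SS_{Res}/(n-2) = MS_{Res}$ • $MS_{Res}$ (also written MSE) is called the residual mean square. • This estimate is model-dependent. • Example 2.2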
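A minimal sketch of $SS_{Res}$ and $MS_{Res} = SS_{Res}/(n-2)$, assuming a hypothetical helper `residual_mean_square` and made-up data. The divisor is $n - 2$ because two parameters, $\beta_0$ and $\beta_1$, were estimated from the data.

```python
def residual_mean_square(x, y):
    """Return (SS_Res, MS_Res) for a simple linear regression fitted by least squares."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 observations for n - 2 > 0")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum(yi * (xi - x_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ms_res = ss_res / (n - 2)   # unbiased estimate of sigma^2
    return ss_res, ms_res


# Hypothetical illustration data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
ss_res, ms_res = residual_mean_square(x, y)
print(f"SS_Res = {ss_res:.4f}, MS_Res = {ms_res:.4f}")
```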

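  16. 2.2.4 An Alternate Form of the Model • The new regression model: $y_i = \beta_0' + \beta_1 (x_i - \bar x) + \varepsilon_i$, where $\beta_0' = \beta_0 + \beta_1 \bar x$ • Normal equations: $n\hat\beta_0' = \sum_{i=1}^{n} y_i$ and $\hat\beta_1 S_{xx} = S_{xy}$ • The least-squares estimators: $\hat\beta_0' = \bar y$ and $\hat\beta_1 = S_{xy}/S_{xx}$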

  17. Some advantages: • The normal equations are easier to solve. • $\hat\beta_0'$ and $\hat\beta_1$ are uncorrelated.
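Centering the regressor decouples the two normal equations, so each estimator comes from a one-line formula. A minimal sketch, assuming the hypothetical helper `fit_centered` and made-up data; the original intercept is recovered as $\hat\beta_0 = \hat\beta_0' - \hat\beta_1 \bar x$.

```python
def fit_centered(x, y):
    """Least squares for y_i = b0' + b1 (x_i - x_bar); returns (b0_prime, b1, x_bar)."""
    n = len(x)
    x_bar = sum(x) / n
    z = [xi - x_bar for xi in x]                        # centered regressor, sums to 0
    s_xx = sum(zi * zi for zi in z)
    b0_prime = sum(y) / n                               # from n * b0' = sum(y_i)
    b1 = sum(zi * yi for zi, yi in zip(z, y)) / s_xx    # from b1 * S_xx = S_xy
    return b0_prime, b1, x_bar


# Hypothetical illustration data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0_prime, b1, x_bar = fit_centered(x, y)
b0 = b0_prime - b1 * x_bar   # recover the intercept of the uncentered model
print(f"b0' = {b0_prime:.2f}, b1 = {b1:.2f}, b0 = {b0:.2f}")
```

The slope is unchanged by centering; only the intercept is reinterpreted as the mean response at $\bar x$.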

  18. 2.3 Hypothesis Testing on the Slope and Intercept • Assume the εi are normally distributed • $y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$ 2.3.1 Use of t-Tests • Test on the slope: • $H_0: \beta_1 = \beta_{10}$ vs. $H_1: \beta_1 \neq \beta_{10}$

  19. If 2 is known, under null hypothesis, • (n-2) MSE/2 follows a 2n-2 • If 2 is unknown, • Reject H0 if |t0| > t/2, n-2

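  20. Test on the intercept: • $H_0: \beta_0 = \beta_{00}$ vs. $H_1: \beta_0 \neq \beta_{00}$ • If $\sigma^2$ is unknown, use $t_0 = (\hat\beta_0 - \beta_{00})/\sqrt{MS_{Res}(1/n + \bar x^2/S_{xx})}$ • Reject $H_0$ if $|t_0| > t_{\alpha/2, n-2}$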
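The intercept test differs from the slope test only in the standard error, $se(\hat\beta_0) = \sqrt{MS_{Res}(1/n + \bar x^2/S_{xx})}$. A minimal sketch with a hypothetical helper `intercept_t_test`, made-up data, and the table value $t_{0.025,3} = 3.182$ assumed as the critical value.

```python
import math


def intercept_t_test(x, y, beta00=0.0):
    """t statistic for H0: beta0 = beta00 when sigma^2 is unknown."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum(yi * (xi - x_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    ms_res = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    se_b0 = math.sqrt(ms_res * (1.0 / n + x_bar ** 2 / s_xx))
    return (b0 - beta00) / se_b0


# Hypothetical illustration data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
t0 = intercept_t_test(x, y, beta00=0.0)
t_crit = 3.182   # t_{0.025, 3}, taken from a t table
reject = abs(t0) > t_crit
print(f"t0 = {t0:.3f}, reject H0: {reject}")
```

Here the intercept is not significantly different from zero, even though the slope clearly is.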

  21. 2.3.2 Testing Significance of Regression • $H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$ • Failing to reject $H_0$: there is no linear relationship between x and y.

  22. Rejecting $H_0$: x is of value in explaining the variability in y. • Test statistic $t_0 = \hat\beta_1/\sqrt{MS_{Res}/S_{xx}}$; reject $H_0$ if $|t_0| > t_{\alpha/2, n-2}$

  23. Example 2.3: The Rocket Propellant Data • Test the significance of regression • $MS_{Res} = 9244.59$ • The test statistic is $t_0 = \hat\beta_1/\sqrt{MS_{Res}/S_{xx}}$ • $t_{0.025, 18} = 2.101$ • Reject $H_0$

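  24. 2.3.3 The Analysis of Variance (ANOVA) • Use an analysis-of-variance approach to test the significance of regression • Fundamental identity: $\sum_{i=1}^{n} (y_i - \bar y)^2 = \sum_{i=1}^{n} (\hat y_i - \bar y)^2 + \sum_{i=1}^{n} (y_i - \hat y_i)^2$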

  25. $SS_T = \sum_{i=1}^{n} (y_i - \bar y)^2$: the corrected sum of squares of the observations; it measures the total variability in the observations. • $SS_{Res} = \sum_{i=1}^{n} (y_i - \hat y_i)^2$: the residual or error sum of squares, i.e. the residual variation left unexplained by the regression line. • $SS_R = \sum_{i=1}^{n} (\hat y_i - \bar y)^2$: the regression or model sum of squares, i.e. the amount of variability in the observations accounted for by the regression line. • $SS_T = SS_R + SS_{Res}$
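The decomposition $SS_T = SS_R + SS_{Res}$ can be verified directly from the three definitions. A minimal sketch, assuming a hypothetical helper `anova_sums_of_squares` and made-up data.

```python
def anova_sums_of_squares(x, y):
    """Return (SS_T, SS_R, SS_Res) for a least-squares straight-line fit."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum(yi * (xi - x_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    ss_t = sum((yi - y_bar) ** 2 for yi in y)                  # total (corrected)
    ss_r = sum((fi - y_bar) ** 2 for fi in y_hat)              # explained by the line
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # left unexplained
    return ss_t, ss_r, ss_res


# Hypothetical illustration data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
ss_t, ss_r, ss_res = anova_sums_of_squares(x, y)
print(f"SS_T = {ss_t:.3f}, SS_R = {ss_r:.3f}, SS_Res = {ss_res:.3f}")
```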

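  26. Degrees of freedom: • $df_T = n-1$ • $df_R = 1$ • $df_{Res} = n-2$ • $df_T = df_R + df_{Res}$ • Testing significance of regression by ANOVA: • $SS_{Res}/\sigma^2 = (n-2)MS_{Res}/\sigma^2 \sim \chi^2_{n-2}$ • If $H_0: \beta_1 = 0$ is true, $SS_R/\sigma^2 = MS_R/\sigma^2 \sim \chi^2_1$ • $SS_R$ and $SS_{Res}$ are independent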

  27. E(MSRes) = 2 • E(MSR) = 2 + 12 Sxx • Reject H0 if F0 > F/2,1, n-2 • If 1 0, F0 follows a noncentral F with 1 and n-2 degree of freedom and a noncentrality parameter

  28. Example 2.4: The Rocket Propellant Data

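  29. More About the t Test • The square of a t random variable with f degrees of freedom is an F random variable with 1 and f degrees of freedom. • Hence, for $H_0: \beta_1 = 0$, $t_0^2 = F_0$ and the t test and the ANOVA F test are equivalent.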
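For the straight-line model, $MS_R = SS_R = \hat\beta_1^2 S_{xx}$, so $t_0^2 = \hat\beta_1^2 S_{xx}/MS_{Res} = F_0$. A minimal numerical check of that identity, assuming a hypothetical helper `t_and_f` and made-up data.

```python
import math


def t_and_f(x, y):
    """Return (t0, F0) for H0: beta1 = 0 from the same least-squares fit."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum(yi * (xi - x_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    ms_res = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    ss_r = sum((b0 + b1 * xi - y_bar) ** 2 for xi in x)
    t0 = b1 / math.sqrt(ms_res / s_xx)
    f0 = ss_r / ms_res   # MS_R = SS_R because df_R = 1
    return t0, f0


# Hypothetical illustration data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
t0, f0 = t_and_f(x, y)
print(f"t0^2 = {t0 ** 2:.4f}, F0 = {f0:.4f}")
```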

  30. 2.4 Interval Estimation in Simple Linear Regression 2.4.1 Confidence Intervals on $\beta_0$, $\beta_1$ and $\sigma^2$ • Assume that the εi are normally and independently distributed

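  31. $100(1-\alpha)\%$ confidence intervals on $\beta_1$ and $\beta_0$: $\hat\beta_1 \pm t_{\alpha/2, n-2}\, se(\hat\beta_1)$ and $\hat\beta_0 \pm t_{\alpha/2, n-2}\, se(\hat\beta_0)$, where $se(\hat\beta_1) = \sqrt{MS_{Res}/S_{xx}}$ and $se(\hat\beta_0) = \sqrt{MS_{Res}(1/n + \bar x^2/S_{xx})}$ • Interpretation of the C.I. • Confidence interval for $\sigma^2$: $(n-2)MS_{Res}/\chi^2_{\alpha/2, n-2} \le \sigma^2 \le (n-2)MS_{Res}/\chi^2_{1-\alpha/2, n-2}$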
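A minimal sketch of the three intervals at $\alpha = 0.05$. The critical values are passed in as arguments and are assumed table values for $n - 2 = 3$ degrees of freedom: $t_{0.025,3} = 3.182$, $\chi^2_{0.025,3} = 9.348$ and $\chi^2_{0.975,3} = 0.216$ (upper-tail notation, as in the slide). The helper name `coefficient_and_variance_cis` and the data are hypothetical.

```python
import math


def coefficient_and_variance_cis(x, y, t_crit, chi2_upper, chi2_lower):
    """100(1 - alpha)% CIs for beta1, beta0 and sigma^2.

    t_crit     = t_{alpha/2, n-2}
    chi2_upper = chi^2_{alpha/2, n-2}    (the larger chi-square point)
    chi2_lower = chi^2_{1-alpha/2, n-2}  (the smaller chi-square point)
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum(yi * (xi - x_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    ms_res = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    se_b1 = math.sqrt(ms_res / s_xx)
    se_b0 = math.sqrt(ms_res * (1.0 / n + x_bar ** 2 / s_xx))
    ci_b1 = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
    ci_b0 = (b0 - t_crit * se_b0, b0 + t_crit * se_b0)
    # Dividing by the larger chi-square point gives the lower limit.
    ci_sigma2 = ((n - 2) * ms_res / chi2_upper, (n - 2) * ms_res / chi2_lower)
    return ci_b1, ci_b0, ci_sigma2


# Hypothetical illustration data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
ci_b1, ci_b0, ci_sigma2 = coefficient_and_variance_cis(
    x, y, t_crit=3.182, chi2_upper=9.348, chi2_lower=0.216)
print("beta1:", ci_b1, "beta0:", ci_b0, "sigma^2:", ci_sigma2)
```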

  32. Example 2.5: The Rocket Propellant Data

  33. 2.4.2 Interval Estimation of the Mean Response • Let $x_0$ be the level of the regressor variable for which we wish to estimate the mean response. • $x_0$ is assumed to lie within the range of the original data on x. • An unbiased estimator of $E(y|x_0)$ is $\hat\mu_{y|x_0} = \hat\beta_0 + \hat\beta_1 x_0$

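  34. $\hat\mu_{y|x_0}$ follows a normal distribution with mean $E(y|x_0)$ and variance $Var(\hat\mu_{y|x_0}) = \sigma^2 \left[1/n + (x_0 - \bar x)^2/S_{xx}\right]$.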

  35. A 100(1-)% confidence interval on the mean response at x0:

  36. Example 2.6: The Rocket Propellant Data

  37. The interval width is a minimum for $x_0 = \bar x$ and widens as $|x_0 - \bar x|$ increases. • Extrapolation
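The half-width $t_{\alpha/2,n-2}\sqrt{MS_{Res}[1/n + (x_0 - \bar x)^2/S_{xx}]}$ depends on $x_0$ only through $(x_0 - \bar x)^2$, so it is smallest at $\bar x$ and symmetric around it. This sketch tabulates it over a grid; the helper name, the data, the value $MS_{Res} = 0.107/3$ (computed beforehand for this hypothetical data) and $t_{0.025,3} = 3.182$ are all illustrative assumptions.

```python
import math


def mean_ci_half_width(x, ms_res, x0, t_crit):
    """Half-width of the CI on E(y|x0); depends on y only through MS_Res."""
    n = len(x)
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return t_crit * math.sqrt(ms_res * (1.0 / n + (x0 - x_bar) ** 2 / s_xx))


x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical regressor values, x_bar = 3
ms_res = 0.107 / 3              # MS_Res for the hypothetical data set
grid = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # 0 and 6 lie outside the data range
widths = {x0: mean_ci_half_width(x, ms_res, x0, t_crit=3.182) for x0 in grid}
narrowest = min(widths, key=widths.get)
for x0 in grid:
    print(f"x0 = {x0:.1f}  half-width = {widths[x0]:.3f}")
print("narrowest at x0 =", narrowest)
```

The widest intervals occur at the grid points outside the observed x range, which is one reason extrapolation calls for caution.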

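  38. 2.5 Prediction of New Observations • $\hat y_0 = \hat\beta_0 + \hat\beta_1 x_0$ is the point estimate of the new value of the response $y_0$ • $\psi = y_0 - \hat y_0$ follows a normal distribution with mean 0 and variance $Var(\psi) = \sigma^2 \left[1 + 1/n + (x_0 - \bar x)^2/S_{xx}\right]$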

  39. The 100(1-)% confidence interval on a future observation at x0 (a prediction interval for the future observation y0)

  40. Example 2.7:

  41. The 100(1-)% confidence interval on

  42. 2.6 Coefficient of Determination • The coefficient of determination: $R^2 = SS_R/SS_T = 1 - SS_{Res}/SS_T$ • The proportion of variation explained by the regressor x • $0 \le R^2 \le 1$
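A minimal sketch computing $R^2$ both as $SS_R/SS_T$ and as $1 - SS_{Res}/SS_T$; the two agree because $SS_T = SS_R + SS_{Res}$. The helper name `coefficient_of_determination` and the data are hypothetical.

```python
def coefficient_of_determination(x, y):
    """Return (R^2 as SS_R/SS_T, R^2 as 1 - SS_Res/SS_T)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum(yi * (xi - x_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    ss_t = sum((yi - y_bar) ** 2 for yi in y)
    ss_r = sum((fi - y_bar) ** 2 for fi in y_hat)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return ss_r / ss_t, 1.0 - ss_res / ss_t


# Hypothetical illustration data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r2, r2_alt = coefficient_of_determination(x, y)
print(f"R^2 = {r2:.4f}")
```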

  43. In Example 2.1, $R^2 = 0.9018$: 90.18% of the variability in strength is accounted for by the regression model. • $R^2$ never decreases, and usually increases, when terms are added to the model. • For a simple regression model, $E(R^2)$ increases (decreases) as $S_{xx}$ increases (decreases).

  44. $R^2$ does not measure the magnitude of the slope of the regression line; a large value of $R^2$ does not imply a steep slope. • $R^2$ does not measure the appropriateness of the linear model.