Chapter 2: Simple Linear Regression
Ray-Bing Chen, Institute of Statistics, National University of Kaohsiung
2.1 Simple Linear Regression Model
• y = β0 + β1x + ε
• x: regressor variable
• y: response variable
• β0: the intercept, unknown
• β1: the slope, unknown
• ε: random error with E(ε) = 0 and Var(ε) = σ² (unknown)
• The errors are uncorrelated.
Given x,
• E(y|x) = E(β0 + β1x + ε) = β0 + β1x
• Var(y|x) = Var(β0 + β1x + ε) = σ²
• Responses are also uncorrelated.
• Regression coefficients: β0, β1
• β1: the change in E(y|x) for a unit change in x
• β0: E(y|x = 0)
2.2 Least-Squares Estimation of the Parameters
2.2.1 Estimation of β0 and β1
• n pairs: (yi, xi), i = 1, …, n
• Method of least squares: minimize
  S(β0, β1) = Σi (yi − β0 − β1xi)²
• The resulting least-squares estimators are
  β̂1 = Sxy/Sxx and β̂0 = ȳ − β̂1x̄,
  where Sxx = Σi (xi − x̄)² and Sxy = Σi (xi − x̄)(yi − ȳ)
The fitted simple regression model: ŷ = β̂0 + β̂1x
• A point estimate of the mean of y for a particular x
• Residual: ei = yi − ŷi
• Residuals play an important role in investigating the adequacy of the fitted regression model and in detecting departures from the underlying assumptions!
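As a quick illustrative sketch, the formulas β̂1 = Sxy/Sxx and β̂0 = ȳ − β̂1x̄ can be applied directly. The numbers below are a small hypothetical data set chosen so the arithmetic is easy to check by hand; they are not the rocket propellant data.

```python
# Least-squares fit of y = b0 + b1*x on a small hypothetical data set.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

xbar = sum(x) / n                                              # x-bar = 3.0
ybar = sum(y) / n                                              # y-bar = 4.0
Sxx = sum((xi - xbar) ** 2 for xi in x)                        # Sxx = 10.0
Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))   # Sxy = 6.0

b1 = Sxy / Sxx           # slope estimate: 0.6
b0 = ybar - b1 * xbar    # intercept estimate: 2.2

fitted = [b0 + b1 * xi for xi in x]                 # fitted values yhat_i
resid = [yi - fi for yi, fi in zip(y, fitted)]      # residuals e_i = y_i - yhat_i
```

Note that the residuals sum to zero, as they must for any least-squares fit that includes an intercept.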
Example 2.1: The Rocket Propellant Data
• Shear strength is related to the age in weeks of the batch of sustainer propellant.
• 20 observations
• From the scatter diagram, there is a strong relationship between shear strength (y) and propellant age (x).
• Assumption: y = β0 + β1x + ε
• How well does this equation fit the data?
• Is the model likely to be useful as a predictor?
• Are any of the basic assumptions violated, and if so, how serious is this?
2.2.2 Properties of the Least-Squares Estimators and the Fitted Regression Model
• β̂0 and β̂1 are linear combinations of the yi.
• β̂0 and β̂1 are unbiased estimators of β0 and β1.
The Gauss–Markov Theorem: β̂0 and β̂1 are the best linear unbiased estimators (BLUE).
Some useful properties:
• The sum of the residuals in any regression model that contains an intercept β0 is always zero, i.e. Σi ei = 0.
• The regression line always passes through the centroid of the data, (x̄, ȳ).
2.2.3 Estimation of σ²
• Residual sum of squares: SSRes = Σi (yi − ŷi)² = Σi ei²
Since E(SSRes) = (n − 2)σ², the unbiased estimator of σ² is
  σ̂² = SSRes/(n − 2) = MSRes
• MSRes is called the residual mean square.
• This estimate is model-dependent.
• Example 2.2
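A minimal sketch of the estimator σ̂² = SSRes/(n − 2), again on a small hypothetical data set (not the propellant data):

```python
# Residual mean square MS_Res = SS_Res / (n - 2) on hypothetical toy data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# Least-squares coefficients via b1 = Sxy/Sxx, b0 = ybar - b1*xbar.
Sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / Sxx
b0 = ybar - b1 * xbar

SS_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # 2.4
MS_res = SS_res / (n - 2)   # unbiased estimate of sigma^2: 0.8
```

Dividing by n − 2 rather than n reflects the two estimated parameters, which is what makes MSRes unbiased for σ².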
2.2.4 An Alternate Form of the Model
• The new regression model: yi = β0′ + β1(xi − x̄) + εi, where β0′ = β0 + β1x̄
• Normal equations: nβ̂0′ = Σi yi and β̂1 Σi (xi − x̄)² = Σi yi(xi − x̄)
• The least-squares estimators: β̂0′ = ȳ and β̂1 = Sxy/Sxx
Some advantages:
• The normal equations are easier to solve.
• β̂0′ and β̂1 are uncorrelated.
2.3 Hypothesis Testing on the Slope and Intercept
• Assume the εi are normally distributed, so yi ~ N(β0 + β1xi, σ²)
2.3.1 Use of t-Tests
• Test on the slope: H0: β1 = β10 vs. H1: β1 ≠ β10
• If σ² is known, under the null hypothesis
  Z0 = (β̂1 − β10)/√(σ²/Sxx) ~ N(0, 1)
• (n − 2)MSRes/σ² follows a χ² distribution with n − 2 degrees of freedom
• If σ² is unknown,
  t0 = (β̂1 − β10)/√(MSRes/Sxx) ~ t(n − 2) under H0
• Reject H0 if |t0| > t(α/2, n − 2)
Test on the intercept: H0: β0 = β00 vs. H1: β0 ≠ β00
• If σ² is unknown,
  t0 = (β̂0 − β00)/√(MSRes(1/n + x̄²/Sxx)) ~ t(n − 2) under H0
• Reject H0 if |t0| > t(α/2, n − 2)
2.3.2 Testing Significance of Regression
• H0: β1 = 0 vs. H1: β1 ≠ 0
• Failing to reject H0 suggests there is no linear relationship between x and y.
• Rejecting H0 indicates that x is of value in explaining the variability in y.
• Reject H0 if |t0| > t(α/2, n − 2)
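The slope t-test above can be sketched end to end on a small hypothetical data set (not the propellant data); the critical value t(0.025, 3) = 3.182 is taken from a standard t-table for n − 2 = 3 degrees of freedom.

```python
import math

# t-test for significance of regression (H0: beta1 = 0) on hypothetical toy data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / Sxx
b0 = ybar - b1 * xbar
MS_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)

se_b1 = math.sqrt(MS_res / Sxx)   # estimated standard error of the slope
t0 = b1 / se_b1                   # test statistic, about 2.12

t_crit = 3.182                    # t(0.025, 3) from a t-table
reject = abs(t0) > t_crit         # False: with only 5 points, H0 is not rejected
```

With so few observations the critical value is large, so even a visible trend does not reach significance; this is why the propellant example, with n = 20, has much more power.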
Example 2.3: The Rocket Propellant Data
• Test significance of regression
• MSRes = 9244.59
• The test statistic is t0 = β̂1/√(MSRes/Sxx)
• t(0.025, 18) = 2.101
• Since |t0| > t(0.025, 18), reject H0.
2.3.3 The Analysis of Variance (ANOVA) • Use an analysis of variance approach to test significance of regression
• SST = Σi (yi − ȳ)²: the corrected sum of squares of the observations. It measures the total variability in the observations.
• SSRes = Σi (yi − ŷi)²: the residual or error sum of squares
  • The residual variation left unexplained by the regression line.
• SSR = β̂1Sxy: the regression or model sum of squares
  • The amount of variability in the observations accounted for by the regression line.
• The fundamental identity: SST = SSR + SSRes
The degrees of freedom:
• dfT = n − 1
• dfR = 1
• dfRes = n − 2
• dfT = dfR + dfRes
Testing significance of regression by ANOVA:
• SSRes/σ² = (n − 2)MSRes/σ² ~ χ²(n − 2)
• Under H0, SSR/σ² = MSR/σ² ~ χ²(1)
• SSR and SSRes are independent
• F0 = MSR/MSRes ~ F(1, n − 2) under H0
• E(MSRes) = σ²
• E(MSR) = σ² + β1²Sxx
• Reject H0 if F0 > F(α, 1, n − 2)
• If β1 ≠ 0, F0 follows a noncentral F distribution with 1 and n − 2 degrees of freedom and noncentrality parameter λ = β1²Sxx/σ²
More about the t test:
• The square of a t random variable with f degrees of freedom is an F random variable with 1 and f degrees of freedom, so t0² = F0 and the two tests of H0: β1 = 0 are equivalent.
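The ANOVA decomposition and the t²  = F equivalence can be checked numerically on a small hypothetical data set (not the propellant data):

```python
import math

# ANOVA F-test for significance of regression, and the identity F0 = t0^2.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = Sxy / Sxx
b0 = ybar - b1 * xbar

SS_T = sum((yi - ybar) ** 2 for yi in y)                          # total: 6.0
SS_R = b1 * Sxy                                                   # regression: 3.6
SS_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # residual: 2.4

MS_R = SS_R / 1            # df_R = 1
MS_res = SS_res / (n - 2)  # df_Res = n - 2
F0 = MS_R / MS_res         # 4.5

t0 = b1 / math.sqrt(MS_res / Sxx)
# F0 equals t0**2: the t-test and the ANOVA F-test agree in simple regression.
```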
2.4 Interval Estimation in Simple Linear Regression
2.4.1 Confidence Intervals on β0, β1, and σ²
• Assume that the εi are normally and independently distributed.
100(1 − α)% confidence intervals on β1 and β0 are given by
  β̂1 ± t(α/2, n − 2)√(MSRes/Sxx)
  β̂0 ± t(α/2, n − 2)√(MSRes(1/n + x̄²/Sxx))
• Interpretation of the C.I.
• Confidence interval for σ²:
  (n − 2)MSRes/χ²(α/2, n − 2) ≤ σ² ≤ (n − 2)MSRes/χ²(1 − α/2, n − 2)
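A sketch of the two coefficient intervals on a small hypothetical data set (not the propellant data), again using the table value t(0.025, 3) = 3.182:

```python
import math

# 95% confidence intervals for the slope and intercept on hypothetical toy data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / Sxx
b0 = ybar - b1 * xbar
MS_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)

t_crit = 3.182  # t(0.025, 3) from a t-table

se_b1 = math.sqrt(MS_res / Sxx)
ci_b1 = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)   # roughly (-0.30, 1.50)

se_b0 = math.sqrt(MS_res * (1 / n + xbar ** 2 / Sxx))
ci_b0 = (b0 - t_crit * se_b0, b0 + t_crit * se_b0)
```

The slope interval contains zero here, consistent with the earlier failure to reject H0: β1 = 0 on this tiny data set.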
2.4.2 Interval Estimation of the Mean Response
• Let x0 be the level of the regressor variable for which we wish to estimate the mean response, E(y|x0).
• x0 is in the range of the original data on x.
• An unbiased estimator of E(y|x0) is ŷ0 = β̂0 + β̂1x0
• A 100(1 − α)% confidence interval on the mean response at x0:
  ŷ0 ± t(α/2, n − 2)√(MSRes(1/n + (x0 − x̄)²/Sxx))
• The interval width is a minimum at x0 = x̄ and widens as |x0 − x̄| increases.
• Extrapolation: the interval becomes unreliable for x0 outside the range of the original data.
2.5 Prediction of New Observations
• ŷ0 = β̂0 + β̂1x0 is the point estimate of the new value of the response, y0.
• The prediction error y0 − ŷ0 follows a normal distribution with mean 0 and variance
  Var(y0 − ŷ0) = σ²(1 + 1/n + (x0 − x̄)²/Sxx)
The 100(1 − α)% confidence interval on a future observation at x0 (a prediction interval for the future observation y0):
  ŷ0 ± t(α/2, n − 2)√(MSRes(1 + 1/n + (x0 − x̄)²/Sxx))
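The contrast between the interval for the mean response and the prediction interval is easy to see numerically; the sketch below uses a small hypothetical data set (not the propellant data) and the table value t(0.025, 3) = 3.182.

```python
import math

# Confidence interval on the mean response vs. prediction interval at x0,
# on hypothetical toy data. The extra "1 +" in the PI variance makes it wider.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / Sxx
b0 = ybar - b1 * xbar
MS_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)

t_crit = 3.182  # t(0.025, 3) from a t-table
x0 = 4.0
y0_hat = b0 + b1 * x0                                             # 4.6

se_mean = math.sqrt(MS_res * (1 / n + (x0 - xbar) ** 2 / Sxx))
se_pred = math.sqrt(MS_res * (1 + 1 / n + (x0 - xbar) ** 2 / Sxx))

ci = (y0_hat - t_crit * se_mean, y0_hat + t_crit * se_mean)  # mean response
pi = (y0_hat - t_crit * se_pred, y0_hat + t_crit * se_pred)  # new observation
```

The prediction interval must cover the noise in a single new observation, not just the uncertainty in the fitted line, so it always strictly contains the mean-response interval.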
2.6 Coefficient of Determination
• The coefficient of determination: R² = SSR/SST = 1 − SSRes/SST
• The proportion of variation explained by the regressor x
• 0 ≤ R² ≤ 1
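On the same kind of small hypothetical data set used above (not the propellant data), R² comes directly out of the sums of squares:

```python
# Coefficient of determination R^2 = 1 - SS_Res / SS_T on hypothetical toy data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = Sxy / Sxx
b0 = ybar - b1 * xbar

SS_T = sum((yi - ybar) ** 2 for yi in y)                          # total variability
SS_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained
R2 = 1 - SS_res / SS_T    # 0.6: 60% of the variability in y is explained by x
```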
• In Example 2.1, R² = 0.9018: 90.18% of the variability in strength is accounted for by the regression model.
• R² can always be increased by adding terms to the model.
• For a simple linear regression model, E(R²) increases (decreases) as Sxx increases (decreases).
• R² does not measure the magnitude of the slope of the regression line. A large value of R² does not imply a steep slope.
• R² does not measure the appropriateness of the linear model.