
Chapter 11: Simple Linear Regression


Presentation Transcript


  1. Chapter 11: Simple Linear Regression

  2. Where We’ve Been • Presented methods for estimating and testing population parameters for a single sample • Extended those methods to allow for a comparison of population parameters for multiple samples McClave/Sincich: A First Course in Statistics, 10th ed. Chapter 9: Simple Linear Regression

  3. Where We’re Going • Introduce the straight-line linear regression model as a means of relating one quantitative variable to another • Introduce the coefficient of correlation as a measure of the strength of the linear relationship between two quantitative variables • Assess how well the simple linear regression model fits the sample data • Use the simple linear regression model to predict the value of one variable given the value of another variable

  4. 9.1: Probabilistic Models

  5. 9.1: Probabilistic Models The relationship between home runs and runs in baseball seems at first glance to be deterministic …

  6. 9.1: Probabilistic Models But if you consider how many runners are on base when the home run is hit, or even how often the batter misses a base and is called out, the rigid model becomes more variable.

  7. 9.1: Probabilistic Models General Form of Probabilistic Models y = Deterministic component + Random error where y is the variable of interest, and the mean value of the random error is assumed to be 0: E(y) = Deterministic component.

  8. 9.1: Probabilistic Models

  9. 9.1: Probabilistic Models • The goal of regression analysis is to find the straight line that comes closest to all of the points in the scatter plot simultaneously.

  10. 9.1: Probabilistic Models • A First-Order Probabilistic Model y = β₀ + β₁x + ε where y = dependent variable, x = independent variable, β₀ + β₁x = E(y) = deterministic component, ε = random error component, β₀ = y-intercept, β₁ = slope of the line
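As a quick illustration of the first-order model (not part of the original slides), here is a minimal Python sketch that simulates data from y = β₀ + β₁x + ε; the parameter values β₀ = 10, β₁ = 2, and σ = 3 are hypothetical, chosen only for demonstration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population parameters, for illustration only
beta0, beta1, sigma = 10.0, 2.0, 3.0

x = np.linspace(0, 20, 50)                  # independent variable
eps = rng.normal(0.0, sigma, size=x.size)   # random error component, mean 0
y = beta0 + beta1 * x + eps                 # deterministic component + random error
```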

  11. 9.1: Probabilistic Models β₀, the y-intercept, and β₁, the slope of the line, are population parameters that are almost always unknown in practice. Regression analysis is designed to estimate these parameters.

  12. 9.2: Fitting the Model: The Least Squares Approach Step 1 Hypothesize the deterministic component of the probabilistic model: E(y) = β₀ + β₁x Step 2 Use sample data to estimate the unknown parameters in the model
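A minimal sketch of Step 2, assuming NumPy arrays x and y holding the sample; it uses the usual sums-of-squares formulas for the least squares estimates (the helper function name is hypothetical, not from the slides).

```python
import numpy as np

def least_squares_fit(x, y):
    """Return (beta0_hat, beta1_hat) for the simple linear regression of y on x."""
    x_bar, y_bar = x.mean(), y.mean()
    ss_xy = np.sum((x - x_bar) * (y - y_bar))   # SSxy
    ss_xx = np.sum((x - x_bar) ** 2)            # SSxx
    beta1_hat = ss_xy / ss_xx                   # estimated slope
    beta0_hat = y_bar - beta1_hat * x_bar       # estimated y-intercept
    return beta0_hat, beta1_hat
```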

  13. 9.2: Fitting the Model: The Least Squares Approach Values on the line are the predicted values of total offerings given the average offering. The distances between the scattered dots and the line are the errors of prediction.

  14. 9.2: Fitting the Model: The Least Squares Approach Values on the line are the predicted values of total offerings given the average offering. The line’s estimated parameters are the values that minimize the sum of the squared errors of prediction, and the method of finding those values is called the method of least squares. The distances between the scattered dots and the line are the errors of prediction.

  15. 9.2: Fitting the Model: The Least Squares Approach • Model: • Estimates: • Deviation: • SSE:
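The slide's formulas did not survive transcription; the standard forms for these four items are:

```latex
\text{Model: } y = \beta_0 + \beta_1 x + \varepsilon \qquad
\text{Estimates: } \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x \qquad
\text{Deviation: } y_i - \hat{y}_i \qquad
\text{SSE: } \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
```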

  16. 9.2: Fitting the Model: The Least Squares Approach • The least squares line is the line that has the following two properties: • The sum of the errors (SE) equals 0. • The sum of squared errors (SSE) is smaller than that for any other straight-line model.

  17. 9.2: Fitting the Model: The Least Squares Approach

  18. 9.2: Fitting the Model: The Least Squares Approach Can home runs be used to predict errors? Is there a relationship between the number of home runs a team hits and the quality of its fielding?

  19. 9.2: Fitting the Model: The Least Squares Approach

  20. 9.2: Fitting the Model: The Least Squares Approach

  21. 9.2: Fitting the Model: The Least Squares Approach These results suggest that teams that hit more home runs are (slightly) better fielders (maybe not what we expected). There are, however, only five observations in the sample, so it is important to take a closer look at the assumptions we made and the results we got.

  22. 9.3: Model Assumptions Assumptions 1. The mean of the probability distribution of ε is 0. 2. The variance, σ², of the probability distribution of ε is constant. 3. The probability distribution of ε is normal. 4. The values of ε associated with any two values of y are independent.

  23. 9.3: Model Assumptions • The variance, σ², is used in every test statistic and confidence interval used to evaluate the model. • Invariably, σ² is unknown and must be estimated.
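The slide does not show the estimator, but the standard choice divides SSE by the error degrees of freedom:

```latex
s^2 = \frac{\text{SSE}}{n - 2}, \qquad s = \sqrt{s^2}
```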

  24. 9.3: Model Assumptions

  25. 9.4: Assessing the Utility of the Model: Making Inferences about the Slope β₁ Note: There may be many different patterns in the scatter plot when there is no linear relationship.

  26. 9.4: Assessing the Utility of the Model: Making Inferences about the Slope β₁ A critical step in the evaluation of the model is to test whether β₁ = 0. [Three scatter plots: positive relationship (β₁ > 0), no relationship (β₁ = 0), negative relationship (β₁ < 0).]

  27. 9.4: Assessing the Utility of the Model: Making Inferences about the Slope β₁ H₀: β₁ = 0, Hₐ: β₁ ≠ 0. [Three scatter plots: positive relationship (β₁ > 0), no relationship (β₁ = 0), negative relationship (β₁ < 0).]

  28. 9.4: Assessing the Utility of the Model: Making Inferences about the Slope β₁ • The four assumptions described above produce a normal sampling distribution for the slope estimate, whose estimated standard deviation is called the estimated standard error of the least squares slope estimate.
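The formula itself did not transcribe; the standard expression for the estimated standard error of the least squares slope is:

```latex
s_{\hat{\beta}_1} = \frac{s}{\sqrt{SS_{xx}}}, \qquad \text{where } s = \sqrt{\frac{\text{SSE}}{n - 2}}
```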

  29. 9.4: Assessing the Utility of the Model: Making Inferences about the Slope β₁

  30. 9.4: Assessing the Utility of the Model: Making Inferences about the Slope β₁

  31. 9.4: Assessing the Utility of the Model: Making Inferences about the Slope β₁ • Since the t-value does not lead to rejection of the null hypothesis, the possible explanations include: • A different set of data may yield different results. • There may be a more complicated relationship. • There may be no relationship (although non-rejection does not establish this automatically).

  32. 9.4: Assessing the Utility of the Model: Making Inferences about the Slope β₁ • Interpreting p-Values for β₁ • Software packages report two-tailed p-values. • To conduct one-tailed tests of hypotheses, the reported p-values must be adjusted:
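The adjustment itself did not transcribe; the usual convention (stated here as an assumption consistent with standard textbook treatments) is:

```latex
p_{\text{one-tailed}} =
\begin{cases}
p_{\text{two-tailed}}/2, & \text{if the sign of } \hat{\beta}_1 \text{ agrees with the direction of } H_a \\
1 - p_{\text{two-tailed}}/2, & \text{otherwise}
\end{cases}
```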

  33. 9.4: Assessing the Utility of the Model: Making Inferences about the Slope β₁ • A Confidence Interval on β₁ uses the estimated standard error of the slope and t_α/2 based on (n – 2) degrees of freedom.
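In standard form, the (1 − α)100% confidence interval on β₁ is:

```latex
\hat{\beta}_1 \pm t_{\alpha/2}\, s_{\hat{\beta}_1}, \qquad s_{\hat{\beta}_1} = \frac{s}{\sqrt{SS_{xx}}}
```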

  34. 9.4: Assessing the Utility of the Model: Making Inferences about the Slope β₁ • In the home runs and errors example, the estimated β₁ was -.0521, and the estimated standard error was .569. With 3 degrees of freedom, t = 3.182. • The resulting confidence interval includes 0, so there may be no relationship between the two variables.
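Using the values stated on the slide, the interval works out to approximately:

```latex
-0.0521 \pm (3.182)(0.569) = -0.0521 \pm 1.811 \;\approx\; (-1.86,\ 1.76)
```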

  35. 9.5: The Coefficients of Correlation and Determination • The coefficient of correlation, r, is a measure of the strength of the linear relationship between two variables. It is computed as follows:
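The formula did not transcribe; the standard definition in terms of the sums of squares is:

```latex
r = \frac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}}
```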

  36. 9.5: The Coefficients of Correlation and Determination [Three scatter plots: positive linear relationship (r → +1), no linear relationship (r ≈ 0), negative linear relationship (r → -1).] Values of r equal to +1 or -1 require each point in the scatter plot to lie on a single straight line.

  37. 9.5: The Coefficients of Correlation and Determination • In the example about home runs and errors, SSxy = -143.8 and SSxx = 2509. • SSyy is computed from the y-values in the same way, and r then follows from these three sums of squares.

  38. 9.5: The Coefficients of Correlation and Determination • An r value that close to zero suggests there may not be a linear relationship between the variables, which is consistent with our earlier look at the null hypothesis and the confidence interval on β₁.

  39. 9.5: The Coefficients of Correlation and Determination • The coefficient of determination, r², represents the proportion of the total sample variability around the mean of y that is explained by the linear relationship between x and y.
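Equivalently, in terms of the sums of squares used earlier:

```latex
r^2 = \frac{SS_{yy} - \text{SSE}}{SS_{yy}} = 1 - \frac{\text{SSE}}{SS_{yy}}
```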

  40. 9.5: The Coefficients of Correlation and Determination

  41. 9.6: Using the Model for Estimation and Prediction

  42. 9.6: Using the Model for Estimation and Prediction

  43. 9.6: Using the Model for Estimation and Prediction • Based on our model results, a team that hits 140 home runs is expected to make 99.9 errors:
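The fitted equation itself did not transcribe. With the slope estimate of -.0521 reported earlier, an intercept of roughly 107.2 would be consistent with the stated prediction; that back-calculated intercept is an assumption, not a value from the source:

```latex
\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x \approx 107.2 - 0.0521(140) \approx 99.9
```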

  44. 9.6: Using the Model for Estimation and Prediction

  45. 9.6: Using the Model for Estimation and Prediction • A 95% Prediction Interval for an Individual Team’s Errors
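In standard form, the prediction interval for an individual value of y at x = x_p is:

```latex
\hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}
```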

  46. 9.6: Using the Model for Estimation and Prediction Prediction intervals for individual new values of y are wider than confidence intervals on the mean of y because of the extra source of error.
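For comparison, the confidence interval for the mean value E(y) at x = x_p omits the leading 1 under the radical, which is exactly the extra source of error referred to above:

```latex
\hat{y} \pm t_{\alpha/2}\, s \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}
```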

  47. 9.6: Using the Model for Estimation and Prediction

  48. 9.6: Using the Model for Estimation and Prediction • Estimating y beyond the range of values associated with the observed values of x can lead to large prediction errors. • Beyond the range of observed x values, the relationship may look very different. [Figure: the estimated relationship and the true relationship diverge outside the range of observed values of x.]

  49. 9.7: A Complete Example • Step 1 • How does the proximity of a fire house (x) affect the damages (y) from a fire? • y = f(x) • y = β₀ + β₁x + ε
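A minimal Python sketch of this example, using illustrative distance/damage values (the textbook's actual fire-damage sample may differ) and scipy.stats.linregress to fit and test the model:

```python
import numpy as np
from scipy import stats

# Illustrative data: distance from the nearest fire house (miles) and
# fire damage (thousands of dollars). Values are for demonstration only.
distance = np.array([3.4, 1.8, 4.6, 2.3, 3.1, 5.5, 0.7, 3.0])
damage = np.array([26.2, 17.8, 31.3, 23.1, 27.5, 36.0, 14.1, 22.3])

# Fit the least squares line and test H0: beta1 = 0
fit = stats.linregress(distance, damage)
print(f"y-hat = {fit.intercept:.2f} + {fit.slope:.2f} x")
print(f"r = {fit.rvalue:.3f}, r^2 = {fit.rvalue**2:.3f}, p-value = {fit.pvalue:.4g}")
```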

  50. 9.7: A Complete Example
