Inference for Regression

# Inference for Regression

##### Presentation Transcript

1. Inference for Regression Section 13.3, Page 284

2. Population Regression Line Section 13.3, Page 285

3. Standard Error of the Regression Line The σ of the population regression model is the standard deviation of the y values about the regression line. It is estimated by the standard error of the regression line, se, which is the standard deviation of the residuals. Section 13.3, Page 286
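
As a rough illustration of how se can be computed, the sketch below fits a least-squares line and takes the square root of the sum of squared residuals divided by n - 2. The five data points and the function names are made up for the example; they do not come from the slides.

```python
import math

def fit_line(x, y):
    """Least-squares intercept a and slope b for y-hat = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def residual_standard_error(x, y):
    """s_e = sqrt(SSE / (n - 2)), the standard deviation of the residuals."""
    a, b = fit_line(x, y)
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    return math.sqrt(sse / (len(x) - 2))

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
s_e = residual_standard_error(x, y)
print(round(s_e, 4))  # 0.8944
```

The divisor is n - 2 rather than n - 1 because two parameters, the intercept and the slope, are estimated from the sample.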

4. Inference for Slope and Intercept For a given population of points (x, y), the true regression line is given by the equation y = α + βx. When we take a random sample of the points and construct a regression equation ŷ = a + bx, a is an estimator of α and b is an estimator of β. When certain conditions are met, the standardized estimates (a - α)/SE(a) and (b - β)/SE(b) follow t-distributions, each with n - 2 degrees of freedom, where n is the number of points in the sample. [Figures: t-distribution for the intercept a; t-distribution for the slope b] Section 13.4, Page 287
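
The transcript does not preserve the formulas from this slide. Assuming the usual results for simple linear regression, with s_e the standard error of the regression line and x̄ the mean of the sample x values, the standard errors and t statistics can be written as:

```latex
SE(b) = \frac{s_e}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}},
\qquad
SE(a) = s_e \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}

t_b = \frac{b - \beta}{SE(b)},
\qquad
t_a = \frac{a - \alpha}{SE(a)},
\qquad
\text{each with } n - 2 \text{ degrees of freedom}
```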

5. Hypothesis Test for the Slope Because we have sampling distributions for a and for b, we can test hypotheses about α and β. The most common test concerns the slope. Recall that if the slope of the regression line is 0, then there is no useful linear relationship between x and y. We therefore most often test the following hypotheses about the slope: H0: β = 0 against Ha: β ≠ 0. When the null hypothesis is rejected, we have a useful relationship. How useful is determined by the coefficient of determination. Section 13.4, Page 288
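
A minimal sketch of the slope test, assuming the standard statistic t = b / SE(b) with n - 2 degrees of freedom. To stay within the Python standard library, the two-sided p-value is obtained by numerically integrating the t density with Simpson's rule; this is a convenience for the example, not how a calculator or statistics package necessarily does it. The data and function names are hypothetical.

```python
import math

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p_value(t_stat, df, steps=20_000):
    """P(|T| >= |t_stat|), integrating the density over [0, |t_stat|]."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 <= T <= |t_stat|)
    return 1 - 2 * central_area

def slope_t_test(x, y):
    """Return slope b, SE(b), t statistic, and two-sided p-value for H0: beta = 0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))
    se_b = s_e / math.sqrt(sxx)
    t_stat = b / se_b
    return b, se_b, t_stat, two_sided_p_value(t_stat, n - 2)

# Hypothetical data, for illustration only
b, se_b, t_stat, p_value = slope_t_test([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"b={b:.2f}, SE(b)={se_b:.4f}, t={t_stat:.3f}, p={p_value:.3f}")
```

With only five made-up points the p-value comes out near 0.12, so the null hypothesis of a zero slope would not be rejected at the 5% level for this toy data set.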

6. Hypothesis Test for the Y-Intercept The hypothesis test for the slope is the most important one for deciding whether we have a usable regression. Sometimes a hypothesis test for the y-intercept α is also used. Section 13.2, Page 707

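7. Confidence Interval for Slope and Intercept When the conditions have been met, the confidence interval for the slope is b ± t* × SE(b), and the confidence interval for the y-intercept is a ± t* × SE(a), where t* is the critical value from the t-distribution with n - 2 degrees of freedom. Section 13.5, Page 289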
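
The sketch below shows one way to turn an estimate and its standard error into an interval, assuming the usual form estimate ± t* × SE. The data are hypothetical, and the critical value 3.182 is the 95% two-sided t-table value for 3 degrees of freedom, which matches this toy sample of five points only.

```python
import math

def t_interval(estimate, std_error, t_star):
    """Interval estimate +/- t_star * std_error, returned as (low, high)."""
    margin = t_star * std_error
    return estimate - margin, estimate + margin

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
a = y_bar - b * x_bar
s_e = math.sqrt(sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
se_b = s_e / math.sqrt(sxx)

T_STAR = 3.182  # table value for 95% confidence with n - 2 = 3 degrees of freedom
low, high = t_interval(b, se_b, T_STAR)
print(f"95% CI for the slope: {low:.4f} to {high:.4f}")
```

For this toy sample the interval runs from about -0.30 to 1.50. It contains 0, so a two-sided test of a zero slope at the 5% level would not reject the null hypothesis.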

8. Confidence Interval for Regression Equation Predictions When predicting y values for a given x value x*, there are two situations. In the first, we want to predict the mean y value for x*. The confidence interval is centered at the predicted value, and its standard error gets larger as x* gets further from the mean of the x values. In the second, we want to predict a single y value for x*. An individual value varies more than a mean does, so the standard error is larger and the resulting interval is wider. Section 13.5, Page 291
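
The formulas from this slide are not preserved in the transcript. The sketch below assumes the standard textbook expressions: SE for the mean response is s_e × sqrt(1/n + (x* - x̄)² / Σ(x - x̄)²), and SE for an individual response adds 1 under the square root. The data and function name are hypothetical.

```python
import math

def prediction_standard_errors(x, y, x_star):
    """Return (SE for the mean response, SE for an individual response) at x_star."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    s_e = math.sqrt(sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    leverage = 1 / n + (x_star - x_bar) ** 2 / sxx
    se_mean = s_e * math.sqrt(leverage)            # mean y at x_star
    se_individual = s_e * math.sqrt(1 + leverage)  # single new y at x_star
    return se_mean, se_individual

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
for x_star in (3, 4, 5):
    se_mean, se_ind = prediction_standard_errors(x, y, x_star)
    print(f"x*={x_star}: SE(mean)={se_mean:.4f}, SE(individual)={se_ind:.4f}")
```

In this toy sample the mean of x is 3, so both standard errors are smallest at x* = 3 and grow as x* moves toward 5, and the individual standard error is always the larger of the two.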

9. Conditions for Regression Inference
    - Linearity Assumption: A scatter plot of the data should show a linear pattern.
    - Independence Assumption: The residuals must be mutually independent. A plot of the residuals against the x values should show no patterns, trends, or clumping.
    - Equal Variance Assumption: The variability of y should be about the same for all values of x. A plot of the residuals against the predicted y values should show roughly equal spread.
    - Normal Population Assumption: We assume that the errors around the idealized regression line follow a normal distribution at each x value. We check this condition by checking whether the residuals, taken together, could have come from a normal population. We look at a normal probability plot of the residuals, or a histogram of the residuals if n is fairly large.

    Section 13.5
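
As a sketch of the raw material behind these checks, the code below computes the residuals and the coordinates of a normal probability plot. The plotting positions (i - 0.5)/n are one common convention and are an assumption here; calculators and statistics packages may use slightly different ones. The data and function names are hypothetical.

```python
from statistics import NormalDist

def residuals(x, y):
    """Residuals y - y-hat from the least-squares line."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def normal_plot_points(res):
    """Pairs (theoretical normal quantile, ordered residual) for a probability plot."""
    n = len(res)
    standard_normal = NormalDist()
    return [
        (standard_normal.inv_cdf((i - 0.5) / n), r)  # assumed plotting position
        for i, r in enumerate(sorted(res), start=1)
    ]

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
res = residuals(x, y)
for xi, e in zip(x, res):
    print(f"x={xi}: residual={e:+.2f}")   # plot these against x to look for patterns
for q, r in normal_plot_points(res):
    print(f"normal quantile={q:+.3f}, residual={r:+.2f}")  # roughly a line if normal
```

Plotting the first set of pairs gives the residuals-versus-x plot; plotting the second gives the normal probability plot, where points falling near a straight line support the normal population assumption.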

10. Regression Inference Problem (1) Find the regression line with Median SAT as the x-variable and Graduation Rate as the y-variable. Test the hypotheses for the slope, find the 95% confidence interval for the slope, and find the predicted mean graduation rate for a Median SAT of 900 along with the 95% confidence interval for that prediction. Check the conditions necessary for inference. Section 13.2, Page 708

11. Inference Problem (2) Select the program REGINFER and enter the X and Y lists. Choose YES to Scatter Plot, and note that the scatter plot has a linear pattern. Press ENTER to display the data screen. The y-intercept is -91.3132. The slope is 0.1321: for each one-point increase in the median SAT score, the predicted graduation rate increases by 0.1321 points. The coefficient of correlation is .7589. The coefficient of determination is .5757, so the regression model explains 57.57% of the variance in graduation rates. The standard error of the regression line is 6.1457. The standard error of the intercept is 31.8968. The standard error of the slope is 0.0314. [Screens: Scatter Plot, Data Screen] Section 13.5

12. Inference Problem (3) Press ENTER to go to the plot menu. Select 1, the residuals plotted against the x values. There is no pattern, so the linear model is appropriate. Also, the spread of the residuals is about equal along the range of x values, satisfying the equal variance assumption. Press ENTER and then select 3=NORMAL PROBABILITY PLOT. The plot pattern is roughly a straight line, indicating that the residuals are from a normal distribution, so the normal population assumption is satisfied. [Screens: Plot Menu, Residuals vs x-values, Probability Plot] Section 13.5

13. Inference Problem (4) Press ENTER and the inference screen appears. The p-value for the test of a zero slope against the alternative hypothesis that the slope ≠ 0 is 0.001. We reject the null hypothesis and conclude that we have a useful regression. Press ENTER and the confidence interval menu appears. Select 1=Yes, and on the next screen choose a .95 confidence level. Press ENTER and the CI screen appears with the intervals for the slope and intercept. [Screens: Inference Screen, CI Menu, CI Screen] Section 13.5
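
The test statistic behind that p-value is not shown in the transcript. Assuming the usual statistic t = b / SE(b), it can be recovered from the slope and standard error reported on the data screen (rounded values, so the result is approximate):

```python
# Values reported on the data screen of the worked problem
slope = 0.1321
se_slope = 0.0314

# Assumed form of the test statistic for H0: slope = 0
t_stat = slope / se_slope
print(round(t_stat, 2))  # 4.21
```

A t statistic a little over 4 is far out in the tail of a t-distribution, which is consistent with the small p-value the program reports.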

14. Inference Problem (5) Press ENTER and the prediction menu appears. Select 2=Y-hat mean. On the next screen, enter x = 900. Press ENTER and the prediction screen appears. The top line shows the x-value and the z-score of the x-value: 900 is 2.2 standard deviations below the mean of the x-values. Lines 2 and 3 show the minimum and maximum x-values. The predicted mean y value for x = 900 is 27.6028. The 95% confidence interval for that mean is also shown; its margin of error is 8.4079. Press ENTER and then select 3=No Estimates to exit the program. [Screens: Prediction Menu, Prediction Screen] Section 13.5
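
The arithmetic behind the prediction screen can be checked from the reported values. The interval endpoints below are derived as predicted value ± margin of error; they are not read from the slides.

```python
# Coefficients reported on the data screen (rounded to four decimals)
intercept = -91.3132
slope = 0.1321
x_star = 900

# Prediction from the rounded coefficients
y_hat_from_rounded = intercept + slope * x_star
print(round(y_hat_from_rounded, 4))  # 27.5768

# Values reported on the prediction screen
reported_y_hat = 27.6028
margin_of_error = 8.4079

# 95% confidence interval for the mean graduation rate at x = 900
low = reported_y_hat - margin_of_error
high = reported_y_hat + margin_of_error
print(round(low, 4), round(high, 4))  # 19.1949 36.0107
```

The small gap between 27.5768 and the reported 27.6028 is probably because the program carries more decimal places for the slope than the four shown on the data screen.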

15. Problems
    - Find the equation to predict the clutch size from the snout size.
    - What percent of the variance in clutch size is explained by the model?
    - Find the 95% confidence interval for the slope.
    - Predict the mean clutch size for a snout size of 65 and give the 95% confidence interval.
    - Predict an individual clutch size for a snout size of 65 and give the 95% prediction interval.

    Section 13.5

16. Problems Section 13.5