# Chapter 15

##### Presentation Transcript

1. Chapter 15 Inference for Regression

2. How is this similar to what we have done in the past few chapters? • We have been using statistics to estimate parameters. • We have been using statistics to determine how likely proposed values for the parameters are to be accurate. • We now continue that by using ŷ = a + bx to estimate μy = α + βx

3. The BIG Ideas • 1. The model for regression inference says that the overall relationship between the explanatory and response variables in the population is described by a straight line with slope β and intercept α. Individual responses y for different values of the explanatory variable x are independent and deviate from this line according to a Normal distribution with the same standard deviation σ for any x value. • 2. The least-squares regression line estimates the population line. The residuals, deviations of the observations from the least-squares line, combine in the regression standard error to estimate the population standard deviation σ.

4. The BIG Ideas (continued) • 3. Inference about the population slope β is based on t statistics with n-2 degrees of freedom. These statistics are formed by standardizing statistics such as the least-squares slope b, that is, by dividing them by their standard errors. • 4. Unlike data analysis, inference is legitimate only under certain conditions. Be sure to verify that the model for regression inference does describe your data.

5. Linear Inference • We can apply the same principles that we used during the LSRL chapter to inference. • Assumptions (NOTE: WE WILL NOT VERIFY ALL OF THESE AS WE HAVE DONE IN THE PAST) We have n observations on an explanatory x and response y. Our goal is to study the behavior of y for given x values. • THE OBSERVATIONS ARE INDEPENDENT: Repeated y are independent of each other and come from an SRS. • THE TRUE RELATIONSHIP IS LINEAR: The mean response μy has a straight-line relationship with x: μy = α + βx • Slope β and intercept α are unknown parameters. • THE STANDARD DEVIATION OF THE RESPONSE ABOUT THE TRUE LINE IS THE SAME EVERYWHERE: The standard deviation of y (σ) is the same for all values of x. The value of σ is unknown. • THE RESPONSE VARIES NORMALLY ABOUT THE TRUE REGRESSION LINE: For any value of x, the response y varies according to a Normal distribution.
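The four assumptions on this slide can be collected into a single model statement. The error term ε and the index i are notation added here for illustration, and N(0, σ) gives the standard deviation (not the variance) as its second argument:

$$
y_i = \alpha + \beta x_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma) \text{ independently}, \qquad \text{so that } \mu_y = \alpha + \beta x .
$$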

6. Condition Verification • Due to the complexity of truly verifying each condition, we will not address them all individually. • You are expected to analyze the residuals to test for any gross violations of the conditions necessary to do inference for regression. • First, look at the residual plot to see if the spread about the line appears to change as x increases. • Second, graph the residuals in a stemplot to see if they are approximately Normally distributed.
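As a rough sketch of the residual check described above, the snippet below fits a least-squares line and lists the residuals against x. The data are hypothetical, made up only to show the mechanics; with real data you would plot the residuals and also look at their distribution.

```python
# Hypothetical (x, y) data, invented only to illustrate the residual check.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope b and intercept a for y-hat = a + b*x.
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b = sxy / sxx
a = y_bar - b * x_bar

# Residual = observed y minus predicted y.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Text version of a residual plot: scan for spread that grows or
# shrinks as x increases, and for strong skew or outliers.
for xi, ri in zip(x, residuals):
    print(f"x = {xi:>2}  residual = {ri:+.3f}")
```

Least-squares residuals always sum to zero, so the thing to look for is pattern and spread, not the average.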

7. Linear Model • ŷ = a + bx • This least-squares line estimates the true regression line μy = α + βx; a and b are unbiased estimators of α and β.

8. ESTIMATES • We use a as an estimate of α • We use b as an estimate of β • We use s as an estimate of σ

9. Facts we need to know • Standard error about the line (s) • Confidence interval for β: b ± t*SEb • Test statistic: t = b/SEb • One-sided P-value = tcdf(t, E99, df) for Ha: β > 0, or tcdf(-E99, t, df) for Ha: β < 0 • Don't forget to double the tail area if a two-sided alternative is chosen: P-value = 2(tcdf(|t|, E99, df)) • Degrees of freedom = n-2 • H0: β = 0 There is no correlation between x and y. • Ha: β (<, >, ≠) 0 There is a (negative, positive, some) correlation between x and y.
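For reference, the standard formulas behind these facts are written out below. The slide itself names only the quantities, so the expressions for s and SEb are the usual textbook definitions rather than something taken from the transcript:

$$
s = \sqrt{\frac{1}{n-2}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2},
\qquad
SE_b = \frac{s}{\sqrt{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}},
\qquad
t = \frac{b}{SE_b},
\qquad
b \pm t^{*}\,SE_b \ \text{ with } df = n-2 .
$$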

10. An Example of Using Linear Inference • A statistical output reports the following: SEb = 3.511, r = -.9031, r² = .8157, b = -23.3567, n = 12 • Create a 95% confidence interval for the true slope β. • Given that a is 266.2005, write the least-squares linear model for the data.

11. Answers • The confidence interval, with df = 12 - 2 = 10 and t* = 2.228, is b ± t*SEb = -23.3567 ± 2.228(3.511) = -23.3567 ± 7.8225 = (-31.1792, -15.5342). We are 95% confident that the mean of y decreases by between about 15.5 and 31.2 for each unit that x increases. • The model is ŷ = 266.2005 - 23.3567x • Don't forget to define your variables ŷ and x when context is available.
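The interval arithmetic can be checked with a few lines of Python. This is only a sketch: the critical value 2.228 is copied from the slide (t table, df = 10) and not computed, and the variable names are illustrative.

```python
b = -23.3567      # least-squares slope from the output
se_b = 3.511      # standard error of the slope
t_star = 2.228    # t* for 95% confidence, df = 12 - 2 = 10 (from the slide)

margin = t_star * se_b
lower, upper = b - margin, b + margin

print(f"margin of error = {margin:.4f}")
print(f"95% CI for the slope: ({lower:.4f}, {upper:.4f})")
```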

12. More Work - Same Example • IS THERE A CORRELATION BETWEEN x AND y? • H0: β = 0 There is no correlation between x and y. • Ha: β ≠ 0 There is a correlation between x and y. • t = b/SEb = -23.3567/3.511 = -6.6524, df = 10 • P-value = 2(tcdf(-E99, -6.6524, 10)) ≈ .000057 • Because the P-value is so low, we reject the null hypothesis. Our results give strong evidence that there is a correlation between x and y.
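For readers without a calculator, the same test can be approximated in plain Python. The sketch below stands in for tcdf by integrating the t density numerically with Simpson's rule; the helper names, step count, and integration span are choices made here, not part of the original slides.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail(t, df, steps=20000, span=200.0):
    """Approximate P(T > t) for t >= 0 with Simpson's rule on [t, t + span].

    The area beyond t + span is ignored; for the df used here it is negligible.
    """
    h = span / steps  # steps must be even for Simpson's rule
    total = t_pdf(t, df) + t_pdf(t + span, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(t + i * h, df)
    return total * h / 3

b, se_b, n = -23.3567, 3.511, 12
df = n - 2
t_stat = b / se_b

# Two-sided alternative: double the tail area beyond |t|.
p_two_sided = 2 * upper_tail(abs(t_stat), df)

print(f"t = {t_stat:.4f}, df = {df}")
print(f"two-sided P-value = {p_two_sided:.6f}")
```

The result should land close to the calculator value quoted on the slide; small differences in the last digit come from the numerical integration.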

13. AP Exam Notes • It has been suggested that for the purposes of the exam, students should be able to take computer output and make an inference (hypothesis test or confidence interval) about the slope of a regression line and then interpret the results. Students should also be able to get regression inference results from their calculators. In addition, students should be able to interpret the values of r, r², s, a, b, and SEb in the context of a regression problem. • CAN YOU DO THESE THINGS???

14. Example • Researchers wanted to know if having larger proportions of physicians in developing African countries is associated with a higher average life expectancy. Data for ten African countries were entered into a statistics package, and regression analysis was requested. On the following slides are the results (note that the explanatory variable is population-physician ratio, which is defined as the population divided by the number of physicians).

15. Question 1: What is the equation for the least-squares regression line? Define any variables you use.

    | Predictor | Coef | Stdev | t-ratio | p |
    |---|---|---|---|---|
    | Constant | 54.948 | 2.955 | 18.59 | 0.000 |
    | Ratio | -0.0004206 | 0.0001461 | * * * | * * * |

    s = 6.310, R-sq = 50.9%, R-sq(adj) = 44.8%

    Unusual Observations

    | Obs. | Ratio | LifeExp | Fit | Stdev.Fit | Residual | St.Residual |
    |---|---|---|---|---|---|---|
    | 2 | 47889 | 32.00 | 34.80 | 5.21 | -2.80 | -0.79 X |
    | 7 | 7306 | 38.30 | 51.87 | 2.28 | -13.57 | -2.31 R |

    R denotes an obs. with a large st. resid. X denotes an obs. whose X value gives it a large influence.

    ANSWER: ŷ = 54.948 - 0.0004206x, where ŷ = predicted average life expectancy and x = ratio (population/physician).
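One way to sanity-check the regression line is to plug the two listed observations into it and compare with the Fit and Residual columns. A minimal sketch, assuming the coefficients exactly as printed (the package works with unrounded coefficients, so agreement is only to within about 0.01):

```python
a = 54.948        # Constant coefficient from the output
b = -0.0004206    # Ratio coefficient from the output

def predict(ratio):
    """Predicted average life expectancy for a given population/physician ratio."""
    return a + b * ratio

# (observation number, ratio, observed life expectancy) from "Unusual Observations"
rows = [(2, 47889, 32.00), (7, 7306, 38.30)]

results = []
for obs, ratio, life_exp in rows:
    fit = predict(ratio)
    residual = life_exp - fit  # observed minus predicted
    results.append((obs, fit, residual))
    print(f"obs {obs}: fit = {fit:.2f}, residual = {residual:.2f}")
```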

16. Question 2: What is the correlation? Interpret the correlation in the context of these data. (Use the regression output from slide 15.) ANSWER: r = -√0.509 ≈ -0.713; the sign is negative because the slope is negative. There is a moderately strong negative linear association between the ratio (population/physician) and average life expectancy in developing African countries.
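The output does not print r directly, but it can be recovered from R-sq together with the sign of the slope. A short sketch with illustrative variable names:

```python
import math

r_squared = 0.509     # R-sq = 50.9% from the output
slope = -0.0004206    # Ratio coefficient; its sign gives the sign of r

# r is the square root of R-sq, carrying the sign of the slope.
r = math.copysign(math.sqrt(r_squared), slope)

print(f"r = {r:.3f}")
```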

17. Question 3: Would you be willing to use this model to predict the life expectancy for a country, given the population-physician ratio? Justify your decision. (Use the regression output from slide 15.) ANSWER: Yes. As the population/physician ratio decreases, you would hope that the average life expectancy increases, resulting in a negative association. The association appears strong enough to be useful.

18. Question 4: Is there sufficient evidence to indicate that average life expectancy in developing African countries has a linear association with population-physician ratio? (Use the regression output from slide 15.) ANSWER: HOW CAN WE ANSWER THIS QUESTION??? USE INFERENCE (completed on the next slide)

19. H0: β = 0 There is no association between average life expectancy and population/physician ratio. • Ha: β < 0 There is a negative association between average life expectancy and population/physician ratio. • With just the summary statistics available, we cannot adequately verify that our conditions have been satisfied. • t = b/SEb = -0.0004206/0.0001461 = -2.88, df = n-2 = 8, P-value = tcdf(-E99, -2.88, 8) = 0.0103 • Due to the low P-value, we will reject H0. • There is reasonably strong evidence of a negative association between average life expectancy and population/physician ratio.
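The same decision can be reached with a critical value in place of a P-value. The sketch below assumes a 5% significance level, which the slide does not state, and uses the table value t* = 1.860 for a 5% one-sided tail with df = 8.

```python
b = -0.0004206
se_b = 0.0001461
n = 10

df = n - 2
t = b / se_b

alpha = 0.05      # assumed significance level; not given on the slide
t_crit = 1.860    # upper 5% critical value of t with df = 8 (t table)

# Ha: beta < 0, so reject H0 when t falls below -t_crit.
reject = t < -t_crit

print(f"t = {t:.2f}, df = {df}")
print("reject H0" if reject else "fail to reject H0")
```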

20. Question 5: Construct a 90% confidence interval for the slope of the true regression line. (Use the regression output from slide 15.) ANSWER: With df = 8, t* = 1.860, so b ± t*SEb = -0.0004206 ± 1.860(0.0001461) = (-0.000692, -0.000149)