
Chapter 7

Chapter 7. Inference for Distributions. Confidence Interval Review.

lyneth

Presentation Transcript


  1. Chapter 7 Inference for Distributions

  2. Confidence Interval Review • By measuring the heights of 62 six-year-old girls selected at random, someone has determined that a 95% confidence interval for the population mean height µ is (42.2 inches, 46.1 inches). Answer the following questions with “Yes”, “No”, or “Can’t tell”: • Does the population mean lie in the interval (42.2, 46.1)? • Does the sample mean lie in the interval (42.2, 46.1)? • For a future sample of 62 six-year-old girls, will the sample mean lie in the interval (42.2, 46.1)? • Do 95% of the sample data lie in the interval (42.2, 46.1)? • For a higher confidence level, say 99%, will the confidence interval calculated from the same data be narrower than (42.2, 46.1)?

  3. Introduction • We began our study of data analysis by learning graphical and numerical tools for describing the distribution of a single variable and for comparing several distributions • Our study of the practice of statistical inference begins in the same way • With inference about a single distribution and comparison of two distributions

  4. Preface • Two important aspects of any distribution are its center and spread (week 1 again!) • If the distribution is Normal, we describe its center by the mean µ and its spread by the standard deviation σ • The previous chapter emphasized the reasoning of tests and confidence intervals • Now we emphasize statistical practice • We no longer assume that population standard deviations are known (σ is no longer known)

  5. Section 7.1 Inference for the Mean of a Population

  6. Introduction • So far in all our inference for the population mean, µ, we have assumed we know σ. • Note that both CIs and significance tests depend on σ. • But usually we don’t know σ. Now what!? • A sensible idea would be to use s, the sample standard deviation, as an estimate of σ, the population standard deviation.

  7. Introduction • We know that s changes from sample to sample. • So we are adding some variability into our equations. • Question: How does this affect the distribution of the standardized sample mean?

  8. Introduction • We know that if x is Normal then x-bar ~ N(µ, σ/√n), • where σ/√n is the standard deviation of the sampling distribution of x-bar. • When we don’t know σ we use s/√n instead. This is called the standard error of x-bar. • We usually denote it as SE(x-bar) = s/√n.

  9. Introduction • Now we also know that if x is Normal then z = (x-bar − µ)/(σ/√n) ~ N(0,1). • But if we use the standard error, s/√n, does (x-bar − µ)/(s/√n) ~ N(0,1)?

  10. Introduction • Unfortunately, no. But the statistic computed with s does follow a distribution called the t-distribution: t = (x-bar − µ)/(s/√n) ~ t(n-1), • where n-1 is the degrees of freedom. • Notice then that for each sample size there is a different t-distribution.
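
A small simulation can make this point concrete. The sketch below is illustrative only: the population values (mean 0, standard deviation 1), the sample size n = 5, the seed, and the helper name `tail_fraction` are assumptions, not part of the slides. It standardizes each sample mean once with the known σ and once with s, then counts how often the result falls outside ±1.96.

```python
import math
import random
import statistics

def tail_fraction(n=5, reps=20000, mu=0.0, sigma=1.0, seed=1):
    """Return the shares of |z| and |t| values beyond 1.96 over many Normal samples."""
    rng = random.Random(seed)
    z_out = t_out = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        s = statistics.stdev(sample)                # sample standard deviation
        z = (xbar - mu) / (sigma / math.sqrt(n))    # standardized with the known sigma
        t = (xbar - mu) / (s / math.sqrt(n))        # standardized with the standard error
        z_out += abs(z) > 1.96
        t_out += abs(t) > 1.96
    return z_out / reps, t_out / reps

z_frac, t_frac = tail_fraction()
print(f"share of |z| > 1.96: {z_frac:.3f}")
print(f"share of |t| > 1.96: {t_frac:.3f}")
```

With n = 5 the z share should land near 0.05, while the t share comes out noticeably larger. That extra tail area is what the t(4) distribution accounts for.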

  11. History of the t distribution • The t distributions were discovered in 1908 by William S. Gosset, a statistician working for the Guinness brewing company. • He published under the pen name “Student” because Guinness didn’t want competitors to know that they were gaining an industrial advantage from employing statisticians. Brilliant!

  12. Properties of the t-distributions • Symmetric • Mean = 0 • Bell shaped • The smaller the df, the larger the tail area. • The smaller the df, the larger the spread. • This larger spread or variability is due to using the sample standard deviation as an estimate for σ.

  13. Example • Notice that the t-distribution approaches the standard Normal as the df increase. • Source: http://en.wikipedia.org/wiki/Student's_t-distribution
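
The convergence to the standard Normal can also be checked numerically. The following sketch evaluates the standard t density formula at the center (x = 0) and in the tail (x = 3) for several degrees of freedom. The function names `t_pdf` and `normal_pdf` and the chosen df values are assumptions made for illustration.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom at x."""
    # Work on the log scale so that large df do not overflow the gamma function.
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def normal_pdf(x):
    """Density of the standard Normal distribution at x."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

for df in (1, 2, 5, 30, 100):
    print(f"df={df:>3}: height at 0 = {t_pdf(0, df):.4f}, height at 3 = {t_pdf(3, df):.4f}")
print(f"Normal: height at 0 = {normal_pdf(0):.4f}, height at 3 = {normal_pdf(3):.4f}")
```

As df grows, the peak rises toward the Normal peak and the tail height at 3 shrinks toward the Normal value, matching the properties listed on the previous slide.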

  14. T-table • Since we have a new distribution, we get to learn to use a new table to find areas under the curve and quantiles. • This is table D (back cover) • Notice that the table is not nearly as comprehensive as the standard normal table • So, finding P-values from a t-table is a little different from finding the values from a z-table.

  15. The One-Sample t Confidence Interval • How does using s affect confidence intervals for the mean µ? • You will see that the one-sample t confidence interval is similar in both reasoning and computational detail to the z confidence interval of Chapter 6

  16. The One-Sample t Confidence Interval • Suppose that a SRS of size n is drawn from a population with an unknown mean µ and unknown standard deviation σ. A level C confidence interval for µ is: x-bar ± t* · s/√n, • where t* is the value for the t(n-1) density curve with area C between -t* and t*. • The margin of error is t* · s/√n. • This interval is exact when the population distribution is Normal and approximately correct for large n in other cases.
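
The claim that the interval is exact for a Normal population can be checked by simulation. In this sketch the true mean (22.5), true standard deviation (7.0), number of repetitions, seed, and the helper name `coverage` are all hypothetical choices. The critical value 2.365 is the standard table value for 7 degrees of freedom at 95% confidence.

```python
import math
import random
import statistics

T_STAR_7DF_95 = 2.365  # t* for 7 df and 95% confidence (standard table value)

def coverage(mu=22.5, sigma=7.0, n=8, reps=10000, seed=2):
    """Share of simulated Normal samples whose t interval contains the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        margin = T_STAR_7DF_95 * statistics.stdev(sample) / math.sqrt(n)
        hits += xbar - margin <= mu <= xbar + margin
    return hits / reps

print(f"coverage of the 95% t interval: {coverage():.3f}")
```

The printed share should be close to 0.95: about 95% of the intervals built this way capture µ, which is what "95% confidence" refers to.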

  17. One-sample t confidence interval • The area between the critical values –t* and t* under the t(n-1) curve is C. • [Figure: t(n-1) density curve centered at 0, with area C between –t* and +t* and area (1-C)/2 in each tail.]

  18. Case 7.1 • The following data are the amounts of vitamin C, measured in milligrams per 100 grams of blend, for a random sample of size 8 from a production run: 26 31 23 22 11 22 14 31 • We want to find a 95% confidence interval for µ, the mean vitamin C content of the CSB produced during this run

  19. Answer • n = 8, x-bar = 22.5, s = 7.19 • From Table D we find t*(7) = 2.365 • x-bar ± t* · s/√n = 22.5 ± 2.365 × 7.19/√8 = 22.5 ± 6.0 = (16.5, 28.5) • We are 95% confident that the mean vitamin C content of the CSB for this run is between 16.5 and 28.5 mg/100g.
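
The arithmetic for Case 7.1 can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library. The variable names are illustrative, and the critical value is taken from Table D as quoted on the slide rather than computed.

```python
import math
import statistics

vitamin_c = [26, 31, 23, 22, 11, 22, 14, 31]  # mg/100 g, data from Case 7.1
t_star = 2.365  # Table D critical value for 7 df and 95% confidence

n = len(vitamin_c)
xbar = statistics.mean(vitamin_c)
s = statistics.stdev(vitamin_c)        # sample standard deviation (divides by n - 1)
margin = t_star * s / math.sqrt(n)     # margin of error t* * s / sqrt(n)
lower, upper = xbar - margin, xbar + margin

print(f"n = {n}, x-bar = {xbar:.1f}, s = {s:.2f}")
print(f"margin of error = {margin:.1f}")
print(f"95% CI = ({lower:.1f}, {upper:.1f})")
```

Rounded to one decimal place, this gives the margin of error 6.0 and the interval (16.5, 28.5) reported on the slide.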

  20. Answer • In this example we have given the actual interval ( 16.5 , 28.5 ) as our answer • Sometimes, we prefer to report the mean and margin of error: The mean vitamin C content is 22.5 mg/100g with a margin of error of 6.0 mg/100g.

  21. One-sample t test • In tests of significance, as in confidence intervals, we allow for unknown σ by using s and replacing z by t • Remember: let n be the sample size • If σ is known and n is large then z = (x-bar − µ0)/(σ/√n) • If σ is NOT known then t = (x-bar − µ0)/(s/√n)

  22. The one-sample t test • Suppose that a SRS of size n is drawn from a population having unknown mean µ • To test the hypothesis H0: µ = µ0 based on a SRS of size n, compute the one-sample t statistic: t = (x-bar − µ0)/(s/√n)

  23. The one-sample t test • With T a random variable having the t(n-1) distribution, the P-value for a test of H0 against • Ha: µ > µ0 is P(T ≥ t) • Ha: µ < µ0 is P(T ≤ t) • Ha: µ ≠ µ0 is 2P(T ≥ |t|) • These P-values are exact if the population distribution is Normal and are approximately correct for large n in other cases

  24. Case 7.1 continued… • Recall that n = 8, x-bar = 22.50, and s = 7.19 • Suppose we want to test that the mean vitamin C content in the final product is 40 • Hypotheses: H0: µ = 40 Ha: µ ≠ 40

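  25. Test statistic: t = (x-bar − µ0)/(s/√n) = (22.5 − 40)/(7.19/√8) = −6.88 • P-value: 2P(T ≥ 6.88) < 0.001, from Table D with 7 degrees of freedom • Conclusion: We reject H0 and conclude that the vitamin C content for this run is below specifications.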
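
The test for Case 7.1 can be checked numerically. The sketch below computes the t statistic from the data and approximates the two-sided P-value by integrating the t density with Simpson's rule, so that only the standard library is needed. The helper names `t_pdf` and `two_sided_p` and the number of integration steps are assumptions made for illustration; a statistics package would normally supply the t distribution directly.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom at x."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=20000):
    """Two-sided P-value 2P(T >= |t|), using Simpson's rule on [0, |t|]."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 <= T <= |t|)
    return 1 - 2 * central_area

vitamin_c = [26, 31, 23, 22, 11, 22, 14, 31]  # data from Case 7.1
mu0 = 40                                      # hypothesized mean under H0
n = len(vitamin_c)
t_stat = (statistics.mean(vitamin_c) - mu0) / (statistics.stdev(vitamin_c) / math.sqrt(n))
p_value = two_sided_p(t_stat, df=n - 1)
print(f"t = {t_stat:.2f}, two-sided P-value = {p_value:.5f}")
```

The statistic comes out at about −6.88 and the P-value falls well below 0.001, consistent with rejecting H0.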

  26. Section 7.1 Summary • Significance tests and confidence intervals for the mean µ of a Normal population are based on the sample mean x-bar of a SRS. Because of the Central Limit Theorem, the resulting procedures are approximately correct for other population distributions when the sample is large.

  27. Section 7.1 Summary • The standardized sample mean, or one-sample z statistic, has the N(0,1) distribution. If the standard deviation of x-bar is replaced by the standard error, the one-sample t statistic has the t distribution with n-1 degrees of freedom.

  28. Section 7.1 Summary • There is a t distribution for every positive degrees of freedom k. All are symmetric distributions similar in shape to the Normal distributions. The t(k) distribution approaches the N(0,1) distribution as k increases.

  29. Section 7.1 Summary • A level C confidence interval for the mean µ of a Normal population is x-bar ± t* · s/√n, where t* is the value for the t(n-1) density curve with area C between –t* and t*. The quantity t* · s/√n that is added and subtracted is the margin of error.

  30. Section 7.1 Summary • Significance tests for H0: µ = µ0 are based on the t statistic. P-values or fixed significance levels are computed from the t(n-1) distribution.
