Chapter 7

Chapter 7. Inferences Based on a Single Sample. Parameters and Statistics. A parameter is a numeric characteristic of a population or distribution, usually symbolized by a Greek letter, such as μ , the population mean. Inferential Statistics uses sample information to estimate parameters.

Presentation Transcript


  1. Chapter 7 Inferences Based on a Single Sample

  2. Parameters and Statistics • A parameter is a numeric characteristic of a population or distribution, usually symbolized by a Greek letter, such as μ, the population mean. • Inferential Statistics uses sample information to estimate parameters. • A Statistic is a number calculated from data. • There are usually statistics that do the same job for samples that the parameters do for populations, such as x̄, the sample mean.

  3. Using Samples for Estimation [Diagram: the sample, whose statistic is known, is used to estimate the unknown parameter μ of the population.]

  4. The Idea of Estimation • We want to find a way to estimate the population parameters. • We only have information from a sample, available in the form of statistics. • The sample mean, x̄, is an estimator of the population mean, μ. • This is called a “point estimate” because it is one point, or a single value.

  5. Interval Estimation • There is variation in x̄, since it is a random variable calculated from data. • A point estimate doesn’t reveal anything about how much the estimate varies. • An interval estimate gives a range of values that is likely to contain the parameter. • Intervals are often reported in polls, such as “56% ±4% favor candidate A.” This suggests we are not sure it is exactly 56%, but we are quite sure that it is between 52% and 60%. • 56% is the point estimate, whereas (52%, 60%) is the interval estimate.

  6. The Confidence Interval • A confidence interval is a special interval estimate involving a percent, called the confidence level. • The confidence level tells how often, if samples were repeatedly taken, the interval estimate would surround the true parameter. • We can use this notation: (L,U) or (LCL,UCL). • L and U stand for Lower and Upper endpoints. The longer versions, LCL and UCL, stand for “Lower Confidence Limit” and “Upper Confidence Limit.” • This interval is built around the point estimate.

  7. Theory of Confidence Intervals • Alpha (α) represents the probability that when the sample is taken, the calculated CI will miss the parameter. • The confidence level is given by (1-α)×100%, and used to name the interval, so for example, we may have “a 90% CI for μ.” • After sampling, we say that we are, for example, “90% confident that we have captured the true parameter.” (There is no probability at this point. Either we did or we didn’t, but we don’t know.)
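The repeated-sampling meaning of the confidence level can be checked with a small simulation. The sketch below is only an illustration: it assumes a normal population with made-up values μ = 50 and σ = 10, samples of size 25, and the z-based interval for a known σ. The function name `coverage` and all of its defaults are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def coverage(mu=50.0, sigma=10.0, n=25, alpha=0.10, reps=2000, seed=1):
    """Fraction of simulated intervals that capture the true mean mu."""
    rng = random.Random(seed)                  # fixed seed for repeatable runs
    z = NormalDist().inv_cdf(1 - alpha / 2)    # z-value with upper tail alpha/2
    margin = z * sigma / sqrt(n)               # same margin for every sample
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)
        if xbar - margin < mu < xbar + margin:
            hits += 1
    return hits / reps

print(f"Share of 90% intervals that captured mu: {coverage():.3f}")
```

The printed share should land near 0.90, and about α = 10% of the intervals miss. Any single interval either contains μ or it does not, which is why the slide speaks of confidence rather than probability after sampling.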

  8. How to Calculate CI’s • Many CI’s have the following basic structure: • P ± TS • Where P is the parameter estimate, • T is a “table” value equal to the number of standard deviations needed for the confidence level, • and S is the standard deviation of the estimate. • The quantity TS is also called the “Error Bound” (B) or “Margin of Error.” • The CI should be written as (L,U) where L= P-TS, and U= P+TS. • Don’t forget to convert your P ± TS expression to confidence interval form, including parentheses!

  9. A Confidence Interval for μ • If σ is known, and • the population is normally distributed, or n>30 (so that we can say x̄ is approximately normally distributed), then x̄ ± z(α/2)·σ/√n gives the endpoints for a (1-α)100% CI for μ. • Note how this corresponds to the P ± TS formula given earlier.

  10. Distribution Details • What is z(α/2)? • α is the significance level, P(CI will miss) • The subscript on z refers to the upper tail probability, that is, P(Z>z). • To find this value in the table, look up the z-value for a probability of .5-α/2. • Examples
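As a cross-check on the table lookup, the upper-tail z-value can be computed with Python's standard library. This is a minimal sketch; the helper name `z_upper` is hypothetical. `NormalDist().inv_cdf` takes a left-tail probability, so the upper-tail probability is subtracted from 1 first.

```python
from statistics import NormalDist

def z_upper(tail_prob):
    """z-value whose upper-tail probability P(Z > z) equals tail_prob."""
    return NormalDist().inv_cdf(1 - tail_prob)

# Common confidence levels and the z(alpha/2) each one needs.
for level in (0.90, 0.95, 0.99):
    alpha = 1 - level
    print(f"{level:.0%} CI: z({alpha / 2:.3f}) = {z_upper(alpha / 2):.3f}")
```

These should agree with the familiar table values 1.645, 1.960, and 2.576.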

  11. Example: Estimation of µ (σ Known) A random sample of 25 items resulted in a sample mean of 50. Construct a 95% confidence interval estimate for µ if σ = 10.
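One way to work this example is sketched below, using the interval x̄ ± z(α/2)·σ/√n for a known σ. Since n = 25 is below 30, the sketch assumes the population is normal, which the example does not state. The function name `z_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf_level):
    """(L, U) for the mean when sigma is known: xbar +/- z(alpha/2)*sigma/sqrt(n)."""
    alpha = 1 - conf_level
    z = NormalDist().inv_cdf(1 - alpha / 2)   # upper-tail alpha/2 z-value
    margin = z * sigma / sqrt(n)              # the "TS" part of P +/- TS
    return (xbar - margin, xbar + margin)

low, high = z_interval(xbar=50, sigma=10, n=25, conf_level=0.95)
print(f"({low:.2f}, {high:.2f})")  # (46.08, 53.92)
```

Here the margin of error is about 1.96 × 10/5 = 3.92, so the interval in (L, U) form is (46.08, 53.92).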

  12. Confidence Interval Estimates [Diagram: confidence intervals for a mean (σ known, σ unknown), a proportion, and a variance.]

  13. Estimation of μ (σ unknown) • We now turn to the situation where σ is unknown but the sample size is large or the sampled population is normal. • Since σ is unknown, we use s in its place. • However, without knowing σ, we are not able to make use of the z table in building a confidence interval. • Instead, we will use a distribution called t (Student’s t). • The t distribution is symmetric and bell-shaped like the standard normal, and also has μ=0, but σ>1, so the shape is flatter in the middle and thicker in the tails.

  14. Student’s t-Distributions • Degrees of Freedom, df: A parameter that identifies each different distribution in the Student’s t family. For the methods presented in this chapter, the value of df will be the sample size minus 1, df = n-1. [Graph: the normal distribution compared with Student’s t for df = 15 and df = 5.]

  15. Using t • As the previous graph shows, the t distribution has another parameter, called degrees of freedom (df). So this is actually a family of distributions, with different df values. • The higher the df, the closer the t distribution comes to the standard normal. • For our purposes, df=n-1. It is actually related to the denominator in the formula for s². • There is a t-table in the back of the book. It is different from the z-table, so we have to understand how it works.

  16. The t table • Refer to the table. First you will notice the left-hand column is for df. • When df ≥100, the z-table can be used, because the values will be very close. • This table gives tail probabilities, similar to z(α). However, only a selection of probabilities is given, across the top of the table. • The interior of the table gives the t-values, so it is arranged almost opposite of the z-table. • The notation used for t-values is t(df,α). • Just like z(α), α refers to the upper tail probability.

  17. t-Distribution Showing t(df, α):

  18. Example: Find the value of t(12, 0.025). Portion of t-table
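A table lookup like this can be cross-checked numerically. Python's standard library has no t-distribution, so the sketch below is an assumption-laden stand-in rather than a library routine. It integrates the t density with Simpson's rule and finds the t-value by bisection. The names `t_pdf`, `t_upper_tail`, and `t_value` are hypothetical, and the approach is only meant for ordinary table-style tail probabilities such as 0.005 to 0.25.

```python
import math

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus the Simpson's-rule area on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_value(df, alpha):
    """t(df, alpha): the value with upper-tail probability alpha (alpha < 0.5)."""
    lo, hi = 0.0, 1000.0
    for _ in range(60):                      # bisection on the tail probability
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid                         # tail still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

print(f"t(12, 0.025) = {t_value(12, 0.025):.3f}")  # about 2.179
```

The result matches the standard t-table entry for df = 12 and an upper-tail probability of 0.025, which is 2.179.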

  19. Confidence Intervals • When we build our confidence interval, α refers to the probability in both tails. • This is not the same α used in looking up the distribution! So what we have to look up is actually α/2, because that’s the upper tail probability. • And so we come to the formula for a (1-α)100% CI for μ when σ is unknown: x̄ ± t(n-1, α/2)·s/√n

  20. Example: A study is conducted to learn how long it takes the typical taxpayer to complete his or her federal income tax return. A random sample of 17 income tax filers showed a mean time (in hours) of 7.8 and a standard deviation of 2.3. Find a 95% confidence interval for the true mean time required to complete a federal income tax return. Assume the time to complete the return is normally distributed. Solution: 1. Parameter of Interest: the mean time required to complete a federal income tax return. 2. Confidence Interval Criteria: a. Assumptions: Sampled population assumed normal, σ unknown. b. Distribution table value: t will be used. c. Confidence level: 1 - α = 0.95

  21. 3. The Sample Evidence: n = 17, x̄ = 7.8, s = 2.3. 4. Calculations: t(16, 0.025) = 2.12, so 7.8 ± 2.12(2.3/√17) = 7.8 ± 1.18. 5. (6.62, 8.98) is the 95% confidence interval for µ.
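The arithmetic behind this interval can be reproduced in a few lines. The sketch below takes the t-table value as an input instead of computing it. The value 2.12 is the standard table entry for t(16, 0.025), and the function name `t_interval` is hypothetical.

```python
from math import sqrt

def t_interval(xbar, s, n, t_table_value):
    """(L, U) for the mean when sigma is unknown: xbar +/- t * s / sqrt(n)."""
    margin = t_table_value * s / sqrt(n)
    return (xbar - margin, xbar + margin)

# Tax-return example: n = 17, xbar = 7.8 hours, s = 2.3, df = 16.
low, high = t_interval(xbar=7.8, s=2.3, n=17, t_table_value=2.12)
print(f"({low:.2f}, {high:.2f})")  # (6.62, 8.98)
```

The margin of error is 2.12 × 2.3/√17 ≈ 1.18, which gives the stated interval (6.62, 8.98).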

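  22. Confidence Interval for a Proportion • Assumptions • Population follows a binomial distribution • Normal approximation can be used if • [interval expression missing from the transcript] does not include 0 or 1 • Or (older guideline) [condition on np and nq missing from the transcript] • Confidence interval estimate: p̂ ± z(α/2)·√(p̂(1-p̂)/n)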

  23. Example A random sample of 400 graduates showed 32 went to grad school. Set up a 95% confidence interval estimate for p.
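A possible worked version of this example is sketched below. It assumes the usual large-sample interval p̂ ± z(α/2)·√(p̂(1−p̂)/n), and the function name `proportion_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, conf_level):
    """(L, U) for a proportion via the normal approximation."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return (p_hat - margin, p_hat + margin)

# 32 of 400 graduates went to grad school, so p_hat = 0.08.
low, high = proportion_interval(successes=32, n=400, conf_level=0.95)
print(f"({low:.3f}, {high:.3f})")  # (0.053, 0.107)
```

Under that assumption, the 95% interval says between about 5.3% and 10.7% of graduates go on to grad school.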

  24. New Method • A new method (Agresti & Coull, 1998) can be used to avoid the problems with extreme p’s. There is no need to check the np or nq values with this method. • Define p* [definition missing from the transcript] • Then a (1-α)100% CI for p is given by [formula missing from the transcript]

  25. Example • In the 2004 presidential election, Ralph Nader had about 0.34% of the vote. Suppose an exit poll was taken to estimate Nader’s share of the vote, with a sample size of 200, and 2 people indicated they voted for Nader. • Note that with the traditional method, p̂ = 2/200 = 0.01 is extreme (np̂ = 2), so the formula is not valid. • Use the p* method to construct a 95% CI for p.
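The definition of p* did not survive in this transcript, so the sketch below is an assumption, not the slide's own formula. It uses the simplified form of the Agresti and Coull adjustment commonly taught for 95% intervals: add 2 successes and 2 failures, so p* = (x + 2)/(n + 4), and use n + 4 in the standard error. The function name `adjusted_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def adjusted_interval(successes, n, conf_level=0.95):
    """Adjusted interval, ASSUMING p* = (x + 2) / (n + 4) (the "plus four" form)."""
    n_adj = n + 4                           # assumed: 2 extra successes + 2 extra failures
    p_star = (successes + 2) / n_adj
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    margin = z * sqrt(p_star * (1 - p_star) / n_adj)
    return (p_star - margin, p_star + margin)

# Exit-poll example: 2 Nader voters in a sample of 200.
low, high = adjusted_interval(successes=2, n=200)
print(f"({low:.4f}, {high:.4f})")  # (0.0006, 0.0386)
```

Under this assumed definition the interval stays above 0 and contains the actual share of about 0.34%. If the slides defined p* differently, the endpoints would differ slightly.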

  26. Choosing CI Formulas

  27. Sample Size Calculation • We may wish to decide upon a sample size so that we can get a confidence interval with a pre-determined width. • This is common in polls, where the margin of error is usually decided in advance. • All CI’s we have seen so far have the form P±B, where B is the margin of error. • We want to fix B in advance.

  28. Sample Size for Estimating µ, σ Known • Suppose X is a random variable with σ=10 and we want a 90% CI to have a Bound, or Margin of Error, of 3. • Use the formula B = z(α/2)·σ/√n. • Fill in the numbers: 3 = 1.645(10)/√n • Solve: n = (1.645·10/3)² ≈ 30.07 • This is the minimum sample size, but we need a whole number, so round up to n=31.

  29. Sample Size for Estimating µ, σ Unknown • If σ is unknown, the confidence interval will be calculated using the t distribution, unless n is very large. • But the degrees of freedom depend on n, which we don’t know. • The calculation also depends on s, which we don’t know until after sampling. • We must have an initial guess for s, and then use the normal distribution to approximate the t distribution, since it does not require knowing n.

  30. Example (σ unknown) • A manufacturer needs to be able to estimate the width of a new part to within 2mm with 95% confidence. There is not enough history to know what σ would be, so a pilot study is run by measuring 6 parts, and finding s=3.4mm. • Using s as the initial guess for σ and z(0.025) = 1.96: n = (1.96·3.4/2)² ≈ 11.10 • Rounding up to the next whole number gives n=12.
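Both sample-size examples for µ come from solving B = z(α/2)·σ/√n for n, which gives n = (z(α/2)·σ/B)², rounded up. A minimal sketch follows. The function name `sample_size_mean` is hypothetical, and for the σ-unknown case the pilot-study s is passed in as the guess for σ.

```python
from math import ceil
from statistics import NormalDist

def sample_size_mean(sigma_guess, bound, conf_level):
    """Smallest whole n with z(alpha/2) * sigma / sqrt(n) <= bound."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return ceil((z * sigma_guess / bound) ** 2)   # always round up

print(sample_size_mean(sigma_guess=10, bound=3, conf_level=0.90))   # 31
print(sample_size_mean(sigma_guess=3.4, bound=2, conf_level=0.95))  # 12
```

Rounding up, never down, keeps the margin of error at or below the target B.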

  31. Sample Size for Estimating p, a Population Proportion • With a population proportion, we also have a problem in getting the standard deviation part of the Margin of Error, since it depends on p, the thing we are trying to estimate. • There are two possibilities: • 1) We may have a preliminary guess about p that we can use, or • 2) We can use p=.5 because that maximizes the standard deviation. • The sample size will be calculated from the desired margin of error, or error bound.

  32. Example (proportion) • A pollster wants to do a simple random sample to estimate the proportion of the population favoring an increase in property taxes for school funding. He wants a margin of error of 3%, with 90% confidence. The general belief is that it will be a close election, so an initial value of p=.5 is reasonable. • n = z(0.05)²·p(1-p)/B² = (1.645)²(.5)(.5)/(.03)² ≈ 751.67 • Rounding up to the next whole number gives n=752.
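The same calculation can be sketched in code by solving B = z(α/2)·√(p(1−p)/n) for n. The function name `sample_size_proportion` is hypothetical. The default guess p = 0.5 is the conservative choice described earlier, since it maximizes p(1−p).

```python
from math import ceil
from statistics import NormalDist

def sample_size_proportion(bound, conf_level, p_guess=0.5):
    """Smallest whole n with z(alpha/2) * sqrt(p(1-p)/n) <= bound."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return ceil(z ** 2 * p_guess * (1 - p_guess) / bound ** 2)

print(sample_size_proportion(bound=0.03, conf_level=0.90))  # 752
```

A preliminary guess further from 0.5 would give a smaller required sample.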

  33. Misc. Notes • The CI for µ formula using z is also called the “Large Sample” CI. It is valid when σ is known, for any sample size, but it also serves as an approximation of the t formula (using s) when n is large. How large? Many books say n≥30. I recommend making use of the t table up to n=100 since that is how far it goes. Statistical computer programs will always calculate t values, regardless of how large n is, for the σ unknown case.

  34. Misc. Notes • The CI for µ formula using t is also called the “Small Sample” CI, but only because the other one is called “Large Sample.” It is valid for any sample size when σ is unknown and the population is normal. • We do not cover methods for small samples that do not come from a normal population in this course (non-parametric methods).

  35. Misc. Notes • The t table is limited because it does not have a very good selection of probabilities. It also “jumps” in the df column. It is possible to use the “closest” value or interpolate when you can’t find what you need, but a better option is to use the Excel functions, TDIST and TINV. • However, you have to be VERY careful about what Excel is giving you.

  36. Excel’s TDIST function • TDIST takes a t value and returns the tail probability. You can choose one or two tails.

  37. Excel’s TINV Function • The TINV Function takes a two-tailed probability and returns a t-value (just what we need now).

  38. Excel Function Comparison The NORMSINV Function, by contrast, takes a left-tailed probability and returns a z-value. This means you have to enter α/2 and take the negative, or else use 1- α/2 as the argument.
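The same left-tail convention shows up outside Excel. As an illustration only, not part of the original slides, Python's `statistics.NormalDist().inv_cdf` also takes a left-tailed probability, so both routes described above can be sketched with it:

```python
from statistics import NormalDist

alpha = 0.05
left_tail_z = NormalDist().inv_cdf   # left-tail probability in, z-value out

z_negated = -left_tail_z(alpha / 2)      # enter alpha/2, then take the negative
z_direct = left_tail_z(1 - alpha / 2)    # or enter 1 - alpha/2 directly

print(f"{z_negated:.3f} {z_direct:.3f}")  # 1.960 1.960
```

Both routes give the upper-tail value z(α/2) needed for the confidence interval.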
