
Significance Testing



  1. Significance Testing

  2. Two questions • A coin is tossed 100 times, giving 65 heads. Is the coin “fair”? • A car company states that a certain model averages 32 mpg. A consumer testing organization randomly selects 8 of these models and finds the following gas mileages (in mpg): 29, 33, 30, 30, 28, 31, 34, 29. Do the data support the car company’s claim?

  3. Significance Testing • Investigators perform a study and gather data. The results differ from what they expected. Chance variation plays a role, but is the observed difference due only to chance? • To deal with this question, statisticians use tests of significance.

  4. Null and Alternative Hypotheses • Null hypothesis (H0): the difference between observed and expected values is due to chance alone. • Alternative hypothesis (H1): the observed difference is not due to chance alone. • Coin Example H0: The observed number of heads (65) is not significantly different from the expected number (50). The difference is due to chance variation.

  5. Null Hypothesis • H0 can never be proven. Data can either reject or fail to reject H0. • The idea of significance testing is to • assume the null hypothesis • use a test statistic to measure the difference between observed data and expected value • determine the probability (P-value) of an observation at least as extreme as what was actually observed • reject/fail to reject the null based on the P-value: smaller P-value → more evidence against H0

  6. Is the coin fair? • Assume the null hypothesis that 65 heads is due to chance alone. • The expected number of heads is 50. • The SE for the number of heads is √100 × 0.5 = 5. • Using the z-value as a test statistic: z = (65 − 50)/5 = 3. • The probability (P-value) of 65 or more heads is less than 1%. This is extremely unlikely – we reject the null hypothesis.
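The slide’s arithmetic can be checked with a short Python sketch (not part of the original presentation; the normal-tail probability is computed with `math.erfc`):

```python
import math

# z-test for 65 heads in 100 tosses of a fair coin.
n, observed = 100, 65
expected = n * 0.5                      # 50 heads under the null
se = math.sqrt(n) * 0.5                 # SE for number of heads = 5
z = (observed - expected) / se          # z = 3.0

# One-tailed P-value from the normal curve: P(Z >= 3), well under 1%
p = 0.5 * math.erfc(z / math.sqrt(2))
```

With z = 3, the one-tailed normal-curve probability is about 0.1%, matching the slide’s “less than 1%”.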

  7. Testing Significance • P<5% is statistically significant. • P<1% is highly significant. • These cutoffs are standard but somewhat arbitrary. • Main idea: when data are too far from a theory’s prediction, it’s evidence against the theory.

  8. Did the car company lie? • A car company states that a certain model averages 32 mpg. A consumer testing organization randomly selects 8 of these models and finds the following gas mileages (in mpg): 29, 33, 30, 30, 28, 31, 34, 29. Do the data support the car company’s claim? • The small sample size requires extra work.

  9. Did the car company lie? • The mean of the data is 30.5 mpg and the SD is 1.94 mpg. • Null Hypothesis:

  10. Did the car company lie? • The mean of the data is 30.5 mpg and the SD is 1.94 mpg. • Null Hypothesis: The observed mean of 30.5 mpg is not significantly less than the expected mean of 32 mpg. The difference is due to chance. • Assuming the null, a box model for the gas mileage measurements has mean 32 mpg.

  11. Did the car company lie? • The mean of the data is 30.5 mpg and the SD is 1.94 mpg. • Null Hypothesis: The observed mean of 30.5 mpg is not significantly less than the expected mean of 32 mpg. The difference is due to chance. • Assuming the null, a box model for the gas mileage measurements has mean 32 mpg. • Because the number of measurements is small, the SD of the sample is not a good estimate for the SD of the box

  12. Did the car company lie? • The mean of the data is 30.5 mpg and the SD is 1.94 mpg. • Null Hypothesis: The observed mean of 30.5 mpg is not significantly less than the expected mean of 32 mpg. The difference is due to chance. • Assuming the null, a box model for the gas mileage measurements has mean 32 mpg. • Because the number of measurements is small, the SD of the sample is not a good estimate for the SD of the box; use SD+ = SD × √(n/(n − 1)) = 1.94 × √(8/7) ≈ 2.07 mpg.

  13. Did the car company lie? • EV for sample avg = 32 mpg

  14. Did the car company lie? • EV for sample avg = 32 mpg • SE for sample avg = SD+/√8 = 2.07/√8 ≈ 0.73 mpg

  15. Did the car company lie? • EV for sample avg = 32 mpg • SE for sample avg = 0.73 mpg • With a large number of measurements, the normal curve can be used. In this case, we need a different curve. The test statistic is computed the same way; we label it t to emphasize the context.

  16. Did the car company lie? • EV for sample avg = 32 mpg • SE for sample avg = 0.73 mpg • Test statistic: t = (30.5 − 32)/0.73 ≈ −2.05
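The SD+, SE, and t computations from these slides can be reproduced with a short Python sketch (illustrative, not from the original slides; 32 mpg is the null-hypothesis mean):

```python
import math

# Small-sample t statistic for the mileage data, following the slides' steps.
data = [29, 33, 30, 30, 28, 31, 34, 29]
claim = 32.0                                         # company's claimed mean (H0)
n = len(data)
mean = sum(data) / n                                 # 30.5 mpg
sd = math.sqrt(sum((x - mean) ** 2 for x in data) / n)   # about 1.94 mpg
sd_plus = sd * math.sqrt(n / (n - 1))                # about 2.07 mpg
se = sd_plus / math.sqrt(n)                          # SE for sample avg, about 0.73
t = (mean - claim) / se                              # about -2.05, df = n - 1 = 7
```

Note that the SD here divides by n (the box-model convention of these slides), and the small-sample correction is applied separately as SD+.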

  17. Student’s t-Test • Instead of the normal curve, we’ll use a so-called Student’s t-curve. • There is an entire family of Student’s curves indexed by degrees of freedom (df). For us, df = sample size – 1.

  18. Student’s t-Test

  19. Student’s t-Test • df = 7 for this sample. • t ≈ −2.05 falls between the table values −2.36 and −1.89, so 2.5% < P < 5%. • P < 5% → the sample average is statistically significant. • We reject the null hypothesis; the data do not support the company’s claim.

  20. When to use Student’s t-Test • Use the Student’s t-Test if all of the following hold: • the data are like draws from a box with unknown SD • the number of observations is small (< 25) • the histogram for the contents of the box resembles the normal curve, e.g., standardized test scores or the Gauss model for measurement error

  21. When to use Student’s t-Test • The normal curve is commonly used for large samples. • The normal curve can be used for small samples if: • the SD of the box is known, and • the contents of the box follow the normal curve. • For a small sample from a known box whose contents are very different from the normal curve, take a second course in statistics!

  22. Significance for differences • Single sample, paired data sets • E.g. pre-test vs. post-test scores for a single group • Two independent samples, one data set • E.g. SAT scores in 1990 vs. 2000 • Treatment vs. control groups, one data set • E.g. Randomized controlled clinical trials

  23. Single sample, paired data • Pre-test: 29, 35, 37, 39, 40, 43, 43 • Post-test: 42, 36, 40, 41, 35, 47, 42 • H0: the difference between pre-test and post-test scores is due to chance alone.

  24. Single sample, paired data • Differences (post − pre): 13, 1, 3, 2, −5, 4, −1 • EV for avg. difference = 0 • Observed avg. difference = 2.43 • SD = 5.12, so SD+ = 5.12 × √(7/6) ≈ 5.53 • SE for avg. difference = 5.53/√7 ≈ 2.09

  25. Single sample, paired data • Differences (post − pre): 13, 1, 3, 2, −5, 4, −1 • EV for avg. difference = 0 • Observed avg. difference = 2.43 • SE for avg. difference ≈ 2.09 • t = (2.43 − 0)/2.09 ≈ 1.16, df = 6 • 10% < P < 25%
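The paired-data arithmetic can be sketched in Python (not from the original slides; it repeats the SD+/SE recipe on the column of differences):

```python
import math

# Paired-data t statistic for the pre/post test scores.
pre  = [29, 35, 37, 39, 40, 43, 43]
post = [42, 36, 40, 41, 35, 47, 42]
diffs = [b - a for a, b in zip(pre, post)]             # [13, 1, 3, 2, -5, 4, -1]
n = len(diffs)
avg = sum(diffs) / n                                   # about 2.43
sd = math.sqrt(sum((d - avg) ** 2 for d in diffs) / n)
sd_plus = sd * math.sqrt(n / (n - 1))
se = sd_plus / math.sqrt(n)                            # about 2.09
t = (avg - 0) / se                                     # about 1.16; EV under H0 is 0
```

The t-curve with df = 6 then gives 10% < P < 25%, so the improvement is not statistically significant.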

  26. Independent samples, one data set • 1989 Pre-test: 29, 35, 37, 39, 40, 43, 43 (mean: 38) • 1990 Pre-test: 33, 37, 29, 39, 35, 48, 25, 37, 38, 64 (mean: 38.5)

  27. Independent samples, one data set • Find the SE for each sample mean. 1989: SE ≈ 1.86; 1990: SE ≈ 3.43 • H0: the difference between the means is due to chance. • EV for the difference of means is 0. • Combine the SEs above into a single SE via √(SE₁² + SE₂²). SE for difference of means ≈ 3.9 • Perform a z-test or t-test (with df = # data points − 2): df = (7 − 1) + (10 − 1) = 15 • t = (38.5 − 38)/3.9 ≈ 0.13 → P > 25%, not statistically significant
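The two-sample computation can be sketched in Python (illustrative; the helper `se_of_mean` is my own name, packaging the slides’ SD+/√n recipe):

```python
import math

def se_of_mean(xs):
    """SE for a sample average, using SD+ (the small-sample correction)."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return sd * math.sqrt(n / (n - 1)) / math.sqrt(n)

y1989 = [29, 35, 37, 39, 40, 43, 43]
y1990 = [33, 37, 29, 39, 35, 48, 25, 37, 38, 64]
se1, se2 = se_of_mean(y1989), se_of_mean(y1990)        # about 1.86 and 3.43
se_diff = math.sqrt(se1 ** 2 + se2 ** 2)               # combine SEs, about 3.9
t = (sum(y1990) / 10 - sum(y1989) / 7) / se_diff       # about 0.13, df = 15
```

With t ≈ 0.13 on 15 degrees of freedom, P > 25%: no evidence against the null.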

  28. Treatment vs. control groups • In a randomized controlled experiment, treatment and control groups are not independent samples. • And experiments usually cannot be modelled as simple random samples. • Nonetheless, for reasonably large experiments, the √(SE₁² + SE₂²) rule gives a reasonable estimate of the SE for the difference.

  29. Example: A clinical trial on Vitamin C • 200 subjects, randomly split into two groups of equal size • Treatment group averaged 2.3 colds, SD = 3.1 • Control group averaged 2.6 colds, SD = 2.9 • SE for treatment avg = 3.1/√100 = 0.31; SE for control avg = 2.9/√100 = 0.29 • SE for difference = √(0.31² + 0.29²) ≈ 0.42

  30. Example: A clinical trial on Vitamin C • H0: the observed difference in average number of colds between treatment and control groups is due to chance alone. • EV for difference = 0 • The sample is large – use a z-test: z = (2.3 − 2.6)/0.42 ≈ −0.71, so P ≈ 24%. We fail to reject H0.
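The Vitamin C z-test can be sketched in Python (not from the original slides; the one-tailed P-value again uses `math.erfc`):

```python
import math

# z-test for the Vitamin C trial: two groups of 100 subjects each.
se_treat = 3.1 / math.sqrt(100)                      # 0.31
se_ctrl  = 2.9 / math.sqrt(100)                      # 0.29
se_diff = math.sqrt(se_treat ** 2 + se_ctrl ** 2)    # about 0.42
z = (2.3 - 2.6) / se_diff                            # about -0.71
p = 0.5 * math.erfc(abs(z) / math.sqrt(2))           # one-tailed P, about 24%
```

P ≈ 24% is far above the 5% cutoff, so the observed difference is consistent with chance.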

  31. Two-tailed tests • For the Vitamin C experiment, there are three legitimate alternative hypotheses: H1: the group taking vitamin C got significantly fewer colds H2: the group taking vitamin C got significantly more colds H3: the treatment and control groups got significantly different numbers of colds

  32. Two-tailed tests • In theory, the alternative hypothesis should be formulated prior to data collection. • Deciding which hypothesis to test after looking at the data is called data snooping. • One form of data snooping is checking whether the sample average is too big or too small before making the test. • Choosing H3: “sample means are different” is a way to combat this form of data snooping. • With this alternative, the P-value is doubled.

  33. Is the die fair? • In 60 rolls of a die the following outcomes are observed: • 4 ones • 6 twos • 17 threes • 16 fours • 8 fives • 9 sixes • Could this be due to chance, or should we look for another explanation (like a loaded die)?

  34. Is the die fair? • Assuming each outcome is equally likely: EV for the number of a given outcome = 10; SE for the number = √(60 × (1/6) × (5/6)) ≈ 2.89 • z-score for sixes = (9 − 10)/2.89 ≈ −0.35, P ≈ 37% • z-score for threes = (17 − 10)/2.89 ≈ 2.42, P ≈ 0.8% • We need a way to combine all six differences.

  35. Is the die fair? • The χ² test statistic combines the individual differences into one overarching value. • In general, χ² = Σ (observed − expected)²/expected • In our example, χ² = (4 − 10)²/10 + (6 − 10)²/10 + (17 − 10)²/10 + (16 − 10)²/10 + (8 − 10)²/10 + (9 − 10)²/10 = 14.2
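The χ² sum for the die can be checked with a few lines of Python (illustrative, not from the slides):

```python
# Chi-square statistic for the die data: sum of (observed - expected)^2 / expected.
observed = [4, 6, 17, 16, 8, 9]          # ones through sixes in 60 rolls
expected = [60 / 6] * 6                  # 10 of each face under the null
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))  # about 14.2
df = len(observed) - 1                   # terms in sum - 1 = 5
```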

  36. Is the die fair? • To determine the likelihood of a χ² value this extreme in 60 rolls of a die, use a χ²-table. • degrees of freedom = terms in sum − 1 • In our example, df = 5. • The table gives the probability of randomly getting a χ² value at least as extreme as what we observed, assuming our expected values are correct.

  37. Is the die fair? • The P-value for χ² = 14.2 with df = 5 satisfies 1% < P < 5% (close to 1%). • It is unlikely that the outcomes are due to chance alone.

  38. Goodness of fit • The χ² test above is called a goodness-of-fit test. Its purpose is to test the accuracy of a box model. • The null hypothesis is that our box model accurately describes a given chance process.

  39. χ² test for independence • The table shows the distribution of marital status by sex for persons age 25–34 in Wyoming: • Never married: 31.5% of men, 19.2% of women • Married: 60.1% of men, 67.3% of women • Widowed, divorced, separated: 8.4% of men, 13.5% of women • Are the distributions significantly different?

  40. χ² test for independence • H0: the distribution of marital status is the same for men as it is for women. • This is the statement that conditional probabilities are the same – that the variables are independent. e.g. P(married | male) = P(married | female) • We can make a χ² test of this hypothesis.

  41. χ² test for independence • To perform the test, we must convert percentages to frequencies: • Never married: 45 men, 30 women • Married: 86 men, 105 women • Widowed, divorced, separated: 12 men, 21 women • Totals: 143 men, 156 women • These are observed frequencies. What are the expected values?

  42. χ² test for independence • Assuming the distributions are the same, we expect the percent of both married men and married women to equal the population proportion of married people. • Married men = 86, married women = 105, total married = 191; percent = 191/(143 + 156) ≈ 63.88% • EV for married women = (156)(.6388) ≈ 99.7 • EV for married men = (143)(.6388) ≈ 91.3 • We apply this procedure to each category.

  43. χ² test for independence • Observed values – Never married: 45 men, 30 women, total 75 – Married: 86 men, 105 women, total 191 – Widowed, etc.: 12 men, 21 women, total 33 • Expected values (143 men, 156 women): not yet filled in

  44. χ² test for independence • Observed values – Never married: 45 men, 30 women, total 75 (25.08%) – Married: 86 men, 105 women, total 191 (63.88%) – Widowed, etc.: 12 men, 21 women, total 33 (11.04%) • Expected values (143 men, 156 women): not yet filled in

  45. χ² test for independence • Observed values – Never married: 45 men, 30 women, total 75 (25.08%) – Married: 86 men, 105 women, total 191 (63.88%) – Widowed, etc.: 12 men, 21 women, total 33 (11.04%) • Expected values (143 men, 156 women) – Never married: 25.08% – Married: 63.88% – Widowed, etc.: 11.04%

  46. χ² test for independence • Observed values – Never married: 45 men, 30 women, total 75 (25.08%) – Married: 86 men, 105 women, total 191 (63.88%) – Widowed, etc.: 12 men, 21 women, total 33 (11.04%) • Expected values – Never married: 35.9 men, 39.1 women (25.08%) – Married: 91.3 men, 99.7 women (63.88%) – Widowed, etc.: 15.8 men, 17.2 women (11.04%)


  48. χ² test for independence • The χ² statistic is computed as before: χ² = Σ (observed − expected)²/expected ≈ 6.8 • Degrees of freedom is not the same: for an m × n table, df = (m − 1)(n − 1). • So in this example, df = (3 − 1)(2 − 1) = 2. • 1% < P < 5% → Reject H0.
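The whole independence test can be sketched in Python (illustrative; the row labels are my shorthand, and the expected counts are computed as row total × column total / grand total, which matches the slides’ percentage method):

```python
# Chi-square test for independence on the marital-status table.
# Rows: marital status; columns: (men, women).
observed = {
    "never married": (45, 30),
    "married":       (86, 105),
    "widowed etc.":  (12, 21),
}
col_totals = (143, 156)
grand_total = sum(col_totals)                        # 299 people in all

chi2 = 0.0
for men, women in observed.values():
    row_total = men + women
    for obs, col_total in zip((men, women), col_totals):
        exp = row_total * col_total / grand_total    # expected count for the cell
        chi2 += (obs - exp) ** 2 / exp               # accumulate chi-square terms

df = (len(observed) - 1) * (len(col_totals) - 1)     # (3 - 1)(2 - 1) = 2
```

χ² ≈ 6.8 on 2 degrees of freedom gives 1% < P < 5%, so H0 is rejected.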
