
PSYC 6130

Presentation Transcript


  1. PSYC 6130 Multiple Comparisons

  2. Lecture 17 Summary • Why do multiple comparisons • The problem with multiple comparisons • Familywise and per-comparison alpha • Exploratory data analysis • Fisher’s protected t tests • Tukey’s HSD test • Dunnett’s Test • REGWQ Test • Games-Howell Test • Planned Comparisons • Bonferroni t or Dunn’s Test • Complex Comparisons (Linear Contrasts) • Scheffé’s Test (an exploratory analysis technique that works for complex comparisons). • Recommendations PSYC 6130, PROF. J. ELDER

  3. Why do multiple comparisons? [Figure: H0 versus H1]

  4. Number of Comparisons

  5. Number of Comparisons

  6. Number of Comparisons

  7. Number of Possible Comparisons • In general, for an independent variable with k groups, the number of possible pairwise comparisons is given by: $\frac{k(k-1)}{2}$ • In our example, k = 3, so the number of possible comparisons is: $\frac{3(3-1)}{2} = 3$
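The count can be checked by enumerating the pairs directly. This is a minimal sketch; the group labels A, B, C and the function name are illustrative assumptions, not taken from the slides.

```python
from itertools import combinations
from math import comb


def n_pairwise_comparisons(k: int) -> int:
    """Number of distinct pairs among k groups, i.e. k(k-1)/2."""
    return comb(k, 2)


groups = ["A", "B", "C"]  # hypothetical group labels
pairs = list(combinations(groups, 2))  # every unordered pair of groups

print(pairs)
print(n_pairwise_comparisons(len(groups)))
```

The count grows quadratically: four groups already give six pairwise comparisons.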

  8. The Problem with Multiple Comparisons • Each pairwise comparison we do has a 5% chance of resulting in a Type I error (assuming $\alpha = .05$).

  9. The Problem with Multiple Comparisons [Probability tree: Comparison 1 followed by Comparison 2, each accepting H0 with P = 0.95 and rejecting H0 with P = 0.05] • P(Accept, Accept) = 0.95 × 0.95 = 0.9025 • P(Accept, Reject) = 0.95 × 0.05 = 0.0475 • P(Reject, Accept) = 0.05 × 0.95 = 0.0475 • P(Reject, Reject) = 0.05 × 0.05 = 0.0025

  10. The Problem with Multiple Comparisons • If we do 20 comparisons at $\alpha = .05$ where all of the null hypotheses are actually true, we have a $0.95^{20} = 0.3585$ chance of correctly accepting all true null hypotheses and a 1 - 0.3585 = 0.6415 chance of making at least one Type I error. • In general, the probability of making at least one Type I error in j independent comparisons, each made at per-comparison alpha $\alpha_{PC}$, is: $\alpha_{FW} = 1 - (1 - \alpha_{PC})^j$ • This is called the experimentwise, or familywise, Type I error rate.
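The growth of the familywise rate is easy to tabulate. The sketch below assumes the comparisons are independent and that every null hypothesis is true; the function and parameter names are illustrative.

```python
def familywise_alpha(alpha_pc: float, j: int) -> float:
    """P(at least one Type I error) across j independent tests of true nulls."""
    # The chance of making no error in any of the j tests is (1 - alpha_pc)^j.
    return 1.0 - (1.0 - alpha_pc) ** j


for j in (1, 2, 3, 20):
    print(j, round(familywise_alpha(0.05, j), 4))
```

With a per-comparison alpha of .05, the rate for 20 comparisons matches the 0.6415 quoted on the slide.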

  11. Example • Suppose we wish to make three comparisons at a per-comparison alpha of $\alpha = .05$. The probability of making at least one Type I error is: $1 - (1 - .05)^3 = .1426$

  12. How to Fix the Problem • One way to fix this problem is to reduce the per-comparison alpha. • This is the main idea behind the approaches we will discuss.

  13. The Trade-Off [Figure: distributions under H0 and H1 with the critical value tcrit marked; reality versus your guess]

  14. The Trade-Off [Figure: distribution plot, labelled region = Type I error rate]

  15. The Trade-Off [Figure: distribution plot, labelled region = Type II error rate]

  16. The Trade-Off [Figure: distribution plot, labelled region = Power]

  17. The Trade-Off [Figure: distribution plot, labelled region = Type I error rate]

  18. The Trade-Off [Figure: distribution plot, labelled region = Power]

  19. Exploratory Data Analysis • Analyzing data for possible effects without any prior expectations about what effects might be found is called exploratory data analysis. • In this case we want to detect effects when present, but we want to limit our familywise Type I error rate so that it never exceeds a strict threshold (e.g., 0.05). • Such after-the-fact t-tests are called post-hoc comparisons.

  20. Fisher’s Protected t Tests • Idea: only perform t-tests if an ANOVA indicates a significant effect. • If there is absolutely no effect of the independent variable, the ANOVA will weed out 95% of the possible Type I errors, thus ensuring that the Type I error rate for any subsequent post-hoc t-tests will be less than .05.

  21. Fisher’s Protected t Tests • Used when performing exploratory data analysis at a fixed Type I error rate. • Assumptions: • All your data are independent and normally distributed. • Equal variances in each treatment group (homogeneity of variance). • You have performed an ANOVA on your data and found a significant F-ratio at your preferred type I error rate (e.g. at _______).

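  22. Fisher’s protected t tests The formula for a standard (pooled variances) t test is: $t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$ For Fisher’s protected t tests, we replace the $s_p^2$ term with the $MS_W$ term: $t = \frac{\bar{X}_i - \bar{X}_j}{\sqrt{MS_W \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}}$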
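As a rough illustration of the substitution, the sketch below computes a protected t statistic with MS_W, the within-groups mean square from the ANOVA, in place of the two-sample pooled variance. The function name and all numbers are hypothetical and are not taken from the lecture.

```python
from math import sqrt


def protected_t(mean_i: float, mean_j: float, n_i: int, n_j: int, ms_w: float) -> float:
    """t statistic for groups i and j, using MS_W as the variance estimate."""
    standard_error = sqrt(ms_w * (1.0 / n_i + 1.0 / n_j))
    return (mean_i - mean_j) / standard_error


# Hypothetical values: two group means, 10 observations each, MS_W = 20.
t = protected_t(mean_i=52.0, mean_j=48.0, n_i=10, n_j=10, ms_w=20.0)
print(round(t, 3))
```

The statistic is then compared with the critical t for the error degrees of freedom of the whole ANOVA, not just of the two groups being compared.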

  23. Fisher’s Protected t Tests • Conditions of protection: The null hypothesis is completely true (i.e. $\mu_1 = \mu_2 = \cdots = \mu_k$) or only one pairwise null hypothesis is true (e.g. $\mu_1 = \mu_2 \neq \mu_3$). • Conditions of no protection: The null hypothesis is partially true, e.g., $\mu_1 = \mu_2 = \mu_3 \neq \mu_4$. • In this case, if you are testing more than one true null hypothesis, then your experimentwise Type I error rate accumulates as before.

  24. 1. Fisher’s LSD: Degrees of Freedom • Since the estimate of variance is based on all groups in the experiment, the error (denominator) degrees of freedom is: $df_W = N_T - k$, where $N_T$ is the total number of observations and k is the number of groups.

  25. 1. Fisher’s Least Significant Difference (LSD) 2.1 > 2.03; therefore, reject H0 and conclude that the mean for group A is significantly different from the mean for group B.

  26. 1. Fisher’s LSD • Advantages: Very powerful. Controls the familywise Type I error rate when comparing only three treatment means. Controls the familywise Type I error rate when at most one null hypothesis is true. Controls the familywise Type I error rate when the complete null hypothesis is true. Available in SPSS. • Disadvantages: Very poor control of the familywise Type I error rate in general.

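  27. 1. Fisher’s LSD • Why is it called “Least Significant Difference”? • Suppose sample sizes are equal. Then $t = \frac{\bar{X}_i - \bar{X}_j}{\sqrt{2 MS_W / n}}$, so the smallest difference between two means that reaches significance is $LSD = t_{crit} \sqrt{\frac{2 MS_W}{n}}$.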
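With equal sample sizes, the least significant difference can be computed once and every pairwise difference compared against it. A minimal sketch follows, assuming the usual form LSD = t_crit * sqrt(2 * MS_W / n); the critical t, MS_W, n, and group means below are hypothetical placeholders rather than values from the lecture.

```python
from itertools import combinations
from math import sqrt


def lsd(t_crit: float, ms_w: float, n: int) -> float:
    """Least significant difference for equal group sizes n."""
    return t_crit * sqrt(2.0 * ms_w / n)


def significant_pairs(means: dict, threshold: float) -> list:
    """Pairs of groups whose absolute mean difference exceeds the threshold."""
    return [(a, b) for a, b in combinations(means, 2)
            if abs(means[a] - means[b]) > threshold]


means = {"A": 46.0, "B": 51.0, "C": 53.0}     # hypothetical group means
threshold = lsd(t_crit=2.0, ms_w=20.0, n=10)  # t_crit would come from a t table

print(threshold)
print(significant_pairs(means, threshold))
```

Computing one threshold avoids recomputing a t statistic for every pair.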

  28. End of Lecture March 18, 2009

  29. 2. Tukey’s Honestly Significant Difference • Fisher’s LSD breaks down for > 3 groups. • Tukey’s HSD works for any number of groups. • Key Idea: • Given k groups, consider the smallest and largest means. • Ensure protection against Type I error when comparing these two means. • This is guaranteed to protect against Type I error for the next comparison.

  30. 2. Tukey’s HSD • Tukey’s HSD makes use of the studentized range distribution q, which describes the expected, normalized difference between the max and min observed means amongst k treatments, under the null hypothesis: $q = \frac{\bar{X}_{max} - \bar{X}_{min}}{\sqrt{MS_W / n}}$

  31. 2. Tukey’s HSD • As for Fisher’s LSD, this formula can be reversed to efficiently determine which means are significantly different: $HSD = q_{crit} \sqrt{\frac{MS_W}{n}}$
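The reversed formula can be sketched in the same way as for the LSD, assuming HSD = q_crit * sqrt(MS_W / n) with equal group sizes. The critical q used here is a made-up placeholder; in practice it is read from a studentized range table for k groups and the error degrees of freedom. The means, MS_W, and n are also hypothetical.

```python
from itertools import combinations
from math import sqrt


def hsd(q_crit: float, ms_w: float, n: int) -> float:
    """Honestly significant difference for equal group sizes n."""
    return q_crit * sqrt(ms_w / n)


def pairs_exceeding(means: dict, threshold: float) -> list:
    """Pairs of groups whose absolute mean difference exceeds the threshold."""
    return [(a, b) for a, b in combinations(means, 2)
            if abs(means[a] - means[b]) > threshold]


means = {"A": 46.0, "B": 49.0, "C": 51.0, "D": 54.0}  # hypothetical means
threshold = hsd(q_crit=3.8, ms_w=25.0, n=25)  # q_crit is a placeholder, not a table value

print(threshold)
print(pairs_exceeding(means, threshold))
```

Every pair is judged against the same threshold, regardless of how far apart the two means sit in the ordering.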

  32. 2. Tukey’s HSD • In Tukey’s HSD, every pairwise difference is compared against this HSD. • Any difference that exceeds the HSD is considered statistically significant. • This guarantee derives from a telescoping form of protection.

  33. 2. Tukey’s HSD • Suppose that you order the k means in ascending order: $\bar{X}_1 \leq \bar{X}_2 \leq \cdots \leq \bar{X}_k$

  34. Intuition behind Tukey’s HSD: “Telescoping protection” [Probability tree] • Comparison 1: Accept H0 (P = 0.95) and stop, or Reject H0 (P = 0.05) and go on to Comparison 2. • P(Reject, Accept) = 0.05 × 0.95 = 0.0475 • P(Reject, Reject) = 0.05 × 0.05 = 0.0025

  35. 2. Tukey’s HSD Test • Maintains the familywise Type I error rate at the chosen value regardless of the number of groups or whether the null hypothesis is completely or partially true. • Assumptions • Normality • Homogeneity of variance • Independent, random samples • Roughly equal sample sizes • Most appropriate when tests are post-hoc and/or all possible pairwise comparisons are being performed.

  36. 2. Tukey’s HSD • If the sample sizes are slightly different you can replace n with the harmonic mean of the sample sizes: $\tilde{n} = \frac{k}{\sum_{i=1}^{k} 1/n_i}$ • k = The number of treatment groups. • ni = The number of elements in treatment group i.
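The harmonic mean of the group sizes is k divided by the sum of the reciprocals 1/ni. A short sketch with hypothetical sample sizes, cross-checked against Python's standard library:

```python
from statistics import harmonic_mean


def harmonic_n(sizes: list) -> float:
    """Harmonic mean of group sizes: k / sum(1 / n_i)."""
    k = len(sizes)
    return k / sum(1.0 / n for n in sizes)


sizes = [10, 12, 15]  # hypothetical, slightly unequal group sizes
print(harmonic_n(sizes))
print(harmonic_mean(sizes))  # standard-library equivalent
```

The harmonic mean never exceeds the arithmetic mean, so the substitution leans toward the smaller groups.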

  37. 2. Tukey’s HSD

  38. 2. Tukey’s HSD From Studentized Range Statistic Table:

  39. 2. Tukey’s HSD

  40. 2. Tukey’s HSD [Figure: bar chart of Mean (axis from 45 to 55) by Treatment Group (A, B, C, D)]

  41. 2. Tukey’s HSD • Advantages: Type I error is properly controlled for an arbitrary number of groups. Does not require an ANOVA. Available in SPSS. • Disadvantages: Overly conservative (low power) for k = 3: better to use Fisher’s LSD. Not appropriate if sample sizes or variances are very different.

  42. 3. Dunnett’s Test • Dunnett’s test was devised for the situation when: • one condition (e.g., the control condition) is to be compared against all other conditions (e.g., the treatment conditions), and • no other pairwise comparisons are required. • Under these conditions, Dunnett’s test is the most powerful test that accurately prevents inflation of Type I error. • Dunnett’s test is available in SPSS.

  43. 3. Dunnett’s Test • Advantages: Useful for comparing each treatment group mean with a control group mean. In this situation, it’s the most powerful test available that does not allow the familywise Type I error rate to rise above its preset value. • Disadvantages: Limited applicability. Requires homogeneity of variance.

  44. 4. REGWQ Test • REGW = Ryan, Einot, Gabriel and Welsch. Q = the studentized range statistic. • More powerful than Tukey’s HSD, but still maintains the familywise Type I error rate at the preset value. • Adjusts the critical value separately for each pair of means, depending on how many steps separate each pair when the means are put in order. • Available in SPSS • The test of choice when • k > 3 • Dunnett’s test does not apply • Homogeneity of variance applies

  45. Recall Tukey’s HSD: “Telescoping protection” [Probability tree] • Comparison 1: Accept H0 (P = 0.95) and stop, or Reject H0 (P = 0.05) and go on to Comparison 2. • P(Reject, Accept) = 0.05 × 0.95 = 0.0475 • P(Reject, Reject) = 0.05 × 0.05 = 0.0025 • This turns out to be stricter than necessary.

  46. REGWQ • The k means are sorted in ascending order: $\bar{X}_1 \leq \bar{X}_2 \leq \cdots \leq \bar{X}_k$ • Now when we do pairwise comparisons, instead of basing the critical q value on k, we base it on the number of steps between the two means being compared.
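The slide's own formula did not survive in this transcript. The sketch below uses the adjustment commonly given for REGW procedures, which is an assumption here and not confirmed by the source: for a pair spanning r of the k ordered means (adjacent means span r = 2), the test is run at 1 - (1 - alpha)^(r/k) when r < k - 1, and at the full alpha otherwise. The critical q would then be looked up for r means at that level.

```python
def regw_alpha(r: int, k: int, alpha: float = 0.05) -> float:
    """Alpha level for a range spanning r of k ordered means (assumed REGW rule)."""
    if not 2 <= r <= k:
        raise ValueError("r must satisfy 2 <= r <= k")
    if r >= k - 1:
        # The widest ranges are tested at the full familywise alpha.
        return alpha
    # Narrower ranges are tested at a stricter level.
    return 1.0 - (1.0 - alpha) ** (r / k)


k = 4  # hypothetical number of groups
for r in range(2, k + 1):
    print(r, round(regw_alpha(r, k), 4))
```

Because the critical value depends on r instead of always on k, pairs that sit close together in the ordering face a lower hurdle than under Tukey's HSD.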

  47. REGWQ

  48. 4. REGWQ Test: Example 1 • Hours spent sleeping each night (Ryan-Einot-Gabriel-Welsch Range)
      Would you describe your life as...   N     Subset 1   Subset 2
      VERY STRESSFUL                       591   6.89
      SOMEWHAT STRESSFUL                   590   7.07
      NOT AT ALL STRESSFUL                 593              7.26
      NOT VERY STRESSFUL                   597              7.29
      Sig.                                       .056       .922
      Means for groups in homogeneous subsets are displayed. Alpha = .05.

  49. 4. REGWQ Test: Example 2 • Marathon Split Times • MEASURE_1 (Ryan-Einot-Gabriel-Welsch Range)
      Category   N     Subset 1     Subset 2     Subset 3
      Men 30 -   152   1:55:44.16
      Men 40 -   219   1:56:21.63
      Men 24 &   45    1:57:03.26   1:57:03.26
      Men 45 -   177   1:57:12.99   1:57:12.99
      Men 25 -   89    1:57:27.04   1:57:27.04   1:57:27.04
      Men 35 -   172   1:58:53.89   1:58:53.89   1:58:53.89
      Men 50 -   97                 2:04:21.63   2:04:21.63
      Men 60 -   14                 2:06:32.61   2:06:32.61
      Men 55 -   50                              2:07:30.10
      Men 65 -   6                               2:23:39.50
      Sig.             .763         .064         .059
      Means for groups in homogeneous subsets are displayed. Alpha = .05.
      • Note: due to very different sample sizes, we normally would not choose REGWQ for this dataset.

  50. What to do when variance is not homogeneous
