What are non-parametric tests

talasi

Presentation Transcript


    1. Chapter_13 Non-Parametric Tests Field_2005

    2. What are non-parametric tests? They make no parametric assumptions about the data, such as normality or homogeneity of variances, and are therefore also called 'assumption-free' tests. They work on the principle of ranking the data: the lowest score receives rank 1, the next higher score rank 2, and so on, without implying that the intervals between the ranks are equal. Low scores are thus represented by low ranks and high scores by high ranks. The analysis is then carried out on the ranks and not on the original scores. Four tests will be considered here: - Wilcoxon rank-sum test (Mann-Whitney test) - Wilcoxon signed-rank test - Kruskal-Wallis test - Friedman's test
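
The ranking principle can be sketched in a few lines of Python. This is only an illustration: the scores are hypothetical, and giving tied scores the average of their rank positions is a common convention assumed here, not something the slide states.

```python
def rank_scores(scores):
    """Return 1-based ranks; tied scores share the average of their positions."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted score is tied with this one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        average_rank = ((pos + 1) + (end + 1)) / 2
        for j in range(pos, end + 1):
            ranks[order[j]] = average_rank
        pos = end + 1
    return ranks


# Hypothetical depression scores, not taken from the slides.
example_scores = [15, 35, 16, 18, 16]
example_ranks = rank_scores(example_scores)
print(example_ranks)
```

The two tied scores of 16 occupy positions 2 and 3 and therefore both receive rank 2.5.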

    3. Outlook: Terminology

    4. Wilcoxon rank-sum test and Mann-Whitney test With these two tests you can compare two independent conditions. They are the non-parametric equivalent of the independent t-test. Example: The effect of Ecstasy vs. Alcohol on depression is measured using the Beck Depression Inventory (BDI).

    5. The data: effect of Ecstasy vs. Alcohol

    6. The theory The scores are translated into ranks. The lowest score gets the lowest rank, the next higher score the next higher rank up to the highest rank. If there is no difference in the depression level between Ecstasy and Alcohol, a similar number of low and high ranks should be found in each group. If we add up the ranks, the summed total of ranks in each group should be about the same. If there is a difference between the two groups, e.g., Ecstasy produces higher levels of depression, one would find higher ranks in the Ecstasy group and lower ranks in the Alcohol group.

    7. Ranking of the data (Sunday and Wednesday)

    8. The test statistics, mean and SE (Wilcoxon rank-sum test) Lower sum of ranks for Wednesday: $W_s = 59$ ($W_s$ = Wilcoxon rank sum). Lower sum of ranks for Sunday: $W_s = 90.5$. Mean of the test statistic: $\bar{W}_s = \frac{n_1(n_1+n_2+1)}{2} = \frac{10(10+10+1)}{2} = 105$. SE of the test statistic: $SE_{\bar{W}_s} = \sqrt{\frac{n_1 n_2 (n_1+n_2+1)}{12}} = \sqrt{\frac{(10 \times 10)(10+10+1)}{12}} = 13.23$.

    9. Test statistic as z-score, significance $z = \frac{X - \bar{X}}{s} = \frac{W_s - \bar{W}_s}{SE_{\bar{W}_s}}$. $z_{Sunday} = \frac{90.5 - 105}{13.23} = -1.10$ (n.s.). $z_{Wednesday} = \frac{59 - 105}{13.23} = -3.48$ (*). If the absolute value of a z-score exceeds 1.96 (i.e., irrespective of sign), the test is significant at the 5% level. → The group difference for Sunday is n.s., whereas the group difference for Wednesday is significant (*).
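
As a check on the hand calculation, the mean, standard error and z-score of the rank sum can be computed directly. The function name is made up for this sketch; the inputs (rank sums 90.5 and 59, ten participants per group) are those given on the slides.

```python
import math


def rank_sum_z(w_s, n1, n2):
    """Return (mean, SE, z) for a Wilcoxon rank sum w_s with group sizes n1 and n2."""
    mean_w = n1 * (n1 + n2 + 1) / 2
    se_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return mean_w, se_w, (w_s - mean_w) / se_w


mean_w, se_w, z_sunday = rank_sum_z(90.5, 10, 10)
_, _, z_wednesday = rank_sum_z(59, 10, 10)

print(mean_w, round(se_w, 2))                    # mean 105.0, SE 13.23
print(round(z_sunday, 2), abs(z_sunday) > 1.96)  # Sunday: not significant
print(round(z_wednesday, 2), abs(z_wednesday) > 1.96)  # Wednesday: significant
```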

    10. Mann-Whitney (U) test The Mann-Whitney test is similar to the Wilcoxon rank-sum test but uses the test statistic U: $U = N_1 N_2 + \frac{N_1(N_1+1)}{2} - R_1$, where $R_1$ is the sum of ranks of group 1. $U_{Sunday} = (10 \times 10) + \frac{10(11)}{2} - 119.5 = 35.50$. $U_{Wednesday} = (10 \times 10) + \frac{10(11)}{2} - 151.0 = 4.00$. SPSS produces both statistics. Since they are related, they always lead to the same conclusion; report whichever you prefer.
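
The U formula, and the sense in which U and the rank sum are related, can be illustrated as follows. The helper name is hypothetical. The identity in the last lines (U equals the other group's rank sum minus n(n+1)/2) is standard algebra added here for illustration and is not stated on the slide.

```python
def mann_whitney_u(n1, n2, r1):
    """U = n1*n2 + n1*(n1 + 1)/2 - r1, with r1 the rank sum of group 1."""
    return n1 * n2 + n1 * (n1 + 1) / 2 - r1


u_sunday = mann_whitney_u(10, 10, 119.5)
u_wednesday = mann_whitney_u(10, 10, 151.0)
print(u_sunday, u_wednesday)  # 35.5 4.0

# Relation to the Wilcoxon rank sum: the lower rank sums were 90.5 and 59,
# and subtracting n2*(n2 + 1)/2 = 55 from each gives the same U values.
u_from_ws_sunday = 90.5 - 10 * 11 / 2
u_from_ws_wednesday = 59 - 10 * 11 / 2
print(u_from_ws_sunday, u_from_ws_wednesday)  # 35.5 4.0
```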

    11. Data input: Ecstasy_Alc and provisional analysis For a between-subjects test, we need a coding variable (as in a between-subjects t-test), e.g. 'drug': 1 = ecstasy; 2 = alcohol. We then have one column for the dependent variable BDI on Sunday (sunbdi) and one for BDI on Wednesday (wedbdi).

    12. Before running the analysis: run a test of normality Analyze → Descriptive Statistics → Explore. Sunbdi and wedbdi go to the dependent list; 'drug administered' goes to the factor list. Under 'Plots', tick 'Normality plots with tests' to request the tests of normality.

    13. Test of Normality Both the K-S test and the Shapiro-Wilk test tell us that the distributions for Ecstasy-sunbdi and Alcohol-wedbdi are not normal.

    14. Decision for a non-parametric test As we have seen, some of the distributions are non-normal. What can you do? Either transform the data (e.g., with a logarithmic transformation) or choose a non-parametric test.

    15. Homogeneity of variances Levene's test is n.s. → for both the Sunday and the Wednesday data, the variances of the two groups can be treated as equal.

    16. Further Descriptives Analyze → Descriptive Statistics → Frequencies (or → Descriptives). Request basic descriptive statistics such as the mean, median, SD and variance. Note that for a non-parametric test, the median is a better indicator of central tendency than the mean.

    17. Running the analysis (using your own Ecstasy_Alc.sav) Analyze → Nonparametric Tests → 2 Independent Samples

    18. Specifying the dialog boxes

    19. Exact Test You may or may not have 'Exact...' in your main dialog box (I don't). The 'Exact Tests' option is an extra SPSS module which needs to be installed. It enables an exact test of the significance of the test (here, the Mann-Whitney test), which is a good thing to have for small samples. However, it is a very time-demanding procedure (it can take really long...). Instead of an exact test, a less computationally intensive test based on the Monte Carlo method can be requested. In the Monte Carlo method, a distribution similar to that of the sample is created and then many samples (up to 10,000) are drawn from it, for which the mean significance value and its confidence interval are computed.
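
To give a feel for the Monte Carlo idea, here is a minimal sketch that estimates a two-sided significance value by randomly re-assigning ranks to the two groups. Whether SPSS implements its Monte Carlo option in exactly this way is an assumption; the function name, the seed and the rank values are all hypothetical.

```python
import random


def monte_carlo_rank_sum_p(ranks_a, ranks_b, n_samples=10_000, seed=0):
    """Estimate a two-sided p-value for the rank sum of group A by random relabelling."""
    rng = random.Random(seed)
    pooled = list(ranks_a) + list(ranks_b)
    n_a = len(ranks_a)
    # Expected rank sum of group A if group membership did not matter.
    expected = n_a * sum(pooled) / len(pooled)
    observed_deviation = abs(sum(ranks_a) - expected)
    hits = 0
    for _ in range(n_samples):
        rng.shuffle(pooled)
        if abs(sum(pooled[:n_a]) - expected) >= observed_deviation:
            hits += 1
    return hits / n_samples


# Hypothetical ranks: group A holds almost all of the lowest ranks.
p_separated = monte_carlo_rank_sum_p([1, 2, 3, 4, 6], [5, 7, 8, 9, 10])
# Hypothetical ranks: the two groups are thoroughly interleaved.
p_interleaved = monte_carlo_rank_sum_p([1, 4, 5, 8, 9], [2, 3, 6, 7, 10])
print(p_separated, p_interleaved)
```

With clearly separated groups the estimated p-value is small; with interleaved groups every random relabelling is at least as extreme as the observed one, so the estimate is 1.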

    20. Other options for the Mann-Whitney test Kolmogorov-Smirnov Z: the K-S Z test tests whether two samples have been drawn from the same population. In this respect it does the same as the M-W test; it tends to have better power than the M-W test for small samples (n < 25). Moses Extreme Reactions: this test compares the variability of scores in the two groups, so it is like a non-parametric Levene test. Wald-Wolfowitz runs: this is a variant of the M-W test which looks for 'runs' of scores in a row from the same group, e.g. AAAAAAAAAAAAEEEEAEEEEEEEE. If the groups differ, the scores of each group should cluster at different ends of the ranked distribution, producing few, long runs.
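
Counting runs, the raw ingredient of the Wald-Wolfowitz test, is easy to sketch. The snippet below only counts runs in the example sequence from the slide; it does not compute the test's significance, and the function name is made up.

```python
from itertools import groupby


def count_runs(sequence):
    """Count maximal blocks of identical consecutive labels."""
    return sum(1 for _label, _block in groupby(sequence))


# Group labels in rank order, as on the slide (A = Alcohol, E = Ecstasy).
slide_sequence = "AAAAAAAAAAAAEEEEAEEEEEEEE"
n_runs = count_runs(slide_sequence)
print(n_runs)  # 4

# A fully alternating sequence has as many runs as scores.
print(count_runs("AEAEAE"))  # 6
```

Few runs relative to the number of scores suggest that the two groups occupy different ends of the ranking.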

    21. Output from Mann-Whitney Test

    22. Test statistics of the M-W test

    23. Comparison with t-test for independent samples → The Wilcoxon rank-sum test and the t-test lead to the same conclusions.

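    24. Calculating the effect sizes The effect size r can easily be calculated from the z-scores: $r = \frac{z}{\sqrt{N}}$, where N is the total number of observations. $r_{Sunday} = \frac{-1.11}{\sqrt{20}} = -.25$. $r_{Wednesday} = \frac{-3.48}{\sqrt{20}} = -.78$.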
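
The effect-size conversion is a one-liner. The sketch assumes that N is the total number of observations (20 here, ten per group), which reproduces the reported values of -.25 and -.78; the function name is hypothetical.

```python
import math


def effect_size_r(z, n_observations):
    """Convert a z-score into the effect size r = z / sqrt(N)."""
    return z / math.sqrt(n_observations)


r_sunday = effect_size_r(-1.11, 20)
r_wednesday = effect_size_r(-3.48, 20)
print(round(r_sunday, 2), round(r_wednesday, 2))  # -0.25 -0.78
```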

    25. Reporting the results (Field_2005_532) Ecstasy users (Mdn = 17.5) did not seem to differ in depression levels from alcohol users (Mdn = 16) the day after the drugs were taken, U = 35.5, ns, r = -.25. However, by Wednesday, ecstasy users (Mdn = 33.5) were significantly more depressed than alcohol users (Mdn = 7.5), U = 4, p < .001, r = -.78.

    26. Non-parametric tests and statistical power With a non-parametric test we avoid the assumptions of a parametric test, especially normality. However, by ranking the scores rather than analysing the scores directly, we lose information about the magnitude of the differences between scores (remember, two ranks no longer tell you how far apart, numerically, the two original scores were). Therefore, we may lose statistical power, i.e., we may fail to detect an effect which is genuinely there. However, non-parametric tests are less powerful only if the parametric assumptions are met. Thus, if you run a parametric and a non-parametric test on normally distributed data, the non-parametric test may be the weaker of the two.

    27. Non-parametric tests and statistical power The problem: for normally distributed data, the Type I error rate is 5%. For non-normally distributed data, we do not know where the critical 5% of the distribution lies; it depends on the shape of the distribution.

    28. Terminology

    29. Comparing two related conditions: The Wilcoxon signed-rank test The Wilcoxon signed-rank test is used when you want to compare two conditions within the same subjects. It is the non-parametric equivalent of the dependent t-test. Example: measuring the differences between the depression scores on Sunday and Wednesday, from the previous example. Note: previously, we had only tested the difference between the two groups of Ecstasy and Alcohol users (between-subjects design).

    30. The theory First, the differences between the scores in the two conditions are obtained; differences of zero are excluded. The remaining differences are then ranked by their absolute size. Finally, the sign (positive/negative) of each difference is assigned to its rank.

    31. Ranking the data

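    32. Calculating significance General formulas: $\bar{T} = \frac{n(n+1)}{4}$ = mean of the test statistic; $SE_{\bar{T}} = \sqrt{\frac{n(n+1)(2n+1)}{24}}$ = SE of the test statistic.

    33. Calculating significance $\bar{T} = \frac{n(n+1)}{4}$ = mean of the test statistic; $SE_{\bar{T}} = \sqrt{\frac{n(n+1)(2n+1)}{24}}$ = SE of the test statistic. $\bar{T}_{Ecstasy} = \frac{8(8+1)}{4} = 18$; $SE_{\bar{T}, Ecstasy} = \sqrt{\frac{8(8+1)(16+1)}{24}} = 7.14$. $\bar{T}_{Alcohol} = \frac{10(10+1)}{4} = 27.5$; $SE_{\bar{T}, Alcohol} = \sqrt{\frac{10(10+1)(20+1)}{24}} = 9.81$.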

    34. z-scores $z = \frac{X - \bar{X}}{s} = \frac{T - \bar{T}}{SE_{\bar{T}}}$. $z_{Ecstasy} = \frac{0 - 18}{7.14} = -2.52$ (*). $z_{Alcohol} = \frac{8 - 27.5}{9.81} = -1.99$ (*). → The absolute values of both z-scores exceed 1.96 (5% level), hence there is a significant difference in depression scores between Sunday and Wednesday for both drugs, Ecstasy and Alcohol.
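
The signed-rank calculation can be reproduced with a small helper. The function name is made up; the inputs (T = 0 with n = 8 for Ecstasy, T = 8 with n = 10 for Alcohol) are the values used on the slides.

```python
import math


def signed_rank_z(t, n):
    """Return (mean, SE, z) for a Wilcoxon signed-rank statistic t based on n ranked differences."""
    mean_t = n * (n + 1) / 4
    se_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return mean_t, se_t, (t - mean_t) / se_t


mean_e, se_e, z_ecstasy = signed_rank_z(0, 8)
mean_a, se_a, z_alcohol = signed_rank_z(8, 10)

print(mean_e, round(se_e, 2), round(z_ecstasy, 2))  # 18.0 7.14 -2.52
print(mean_a, round(se_a, 2), round(z_alcohol, 2))  # 27.5 9.81 -1.99
```

Both absolute z-values are larger than 1.96, which is the 5% criterion used throughout the slides.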

    35. Before running the analysis (using your own Ecstasy_Alc.sav) Before running the analysis, you have to split the file by group (Ecstasy vs. Alcohol): Data → Split File → Organize output by groups

    36. Running Wilcoxon signed-rank test (using your own Ecstasy_Alc.sav) Analyze → Nonparametric Tests → 2 Related Samples

    37. Alternatives for Wilcoxon signed-rank test In the main dialog window, there are 3 alternative tests which you may choose instead of Wilcoxon: 1. Sign: it only considers the direction of the differences (positive or negative), irrespective of the magnitude of change. Therefore, it loses power. 2. McNemar: good for nominal (not ordinal) data, i.e., two related dichotomous variables. 3. Marginal Homogeneity: an extension of the McNemar test for ordinal data. Equivalent to Wilcoxon. (My version of SPSS does not have this option.)

    38. Aside: Request Descriptive Statistics Analyze → Descriptive Statistics → Frequencies. First split the file according to 'kind of drug'!

    39. You can also request the medians as part of the descriptive statistics of the signed-rank test by ticking 'Quartiles' under 'Options'.

    40. Output for Ecstasy SPSS first gives the results for the Ecstasy group

    41. Output for Alcohol SPSS then gives the results for the Alcohol group

    42. Effects of Ecstasy vs. Alcohol For Ecstasy, depression increases from Sunday to Wednesday. For Alcohol, depression decreases from Sunday to Wednesday. This reverse effect is an interaction!

    43. Calculating the effect sizes Effect sizes for the Wilcoxon signed-rank test can be calculated from the z-scores: $r = \frac{z}{\sqrt{N}}$, where N is the number of observations. $r_{Ecstasy} = \frac{-2.53}{\sqrt{20}} = -.57$. $r_{Alcohol} = \frac{-1.99}{\sqrt{20}} = -.44$.

    44. Reporting the results (Field_2005_541) For Ecstasy users, depression levels were significantly higher on Wednesday (Mdn = 33.5) than on Sunday (Mdn = 17.5), T = 0, p < .05, r = -.57. For Alcohol users, the opposite was true: depression levels were significantly lower on Wednesday (Mdn = 7.5) than on Sunday (Mdn = 16), T = 8, p < .05, r = -.44.

    45. Terminology

    46. Differences between several independent groups: The Kruskal-Wallis Test The Kruskal-Wallis test is the non-parametric equivalent of a simple one-way independent ANOVA. Example: Background: it has been claimed that the chemical 'genistein', which occurs naturally in soya products, decreases sperm count in males. Research question: do groups of male subjects who eat different numbers of soya meals per week have different sperm counts after one year?

    47. The variables Independent variable: number of soya meals. (1) no soya meals (control condition): 0 per year; (2) 1 soya meal per week: 52 per year; (3) 4 soya meals per week: 208 per year; (4) 7 soya meals per week: 364 per year. Each group consisted of 20 different male individuals. Dependent variable: sperm count.

    48. The Theory of the Kruskal-Wallis Test Like the other non-parametric tests, the K-W test is based on ranked data. First, the scores are ranked, irrespective of group membership. Then, for each group, the ranks are added up. The sum of ranks for group i is $R_i$.

    49. Ranked data for the soya experiment

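    50. The Test Statistic H $H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1)$. $H = \frac{12}{80(81)} \left( \frac{927^2}{20} + \frac{883^2}{20} + \frac{883^2}{20} + \frac{547^2}{20} \right) - 3(81)$ $= \frac{12}{6480} (42{,}966.45 + 38{,}984.45 + 38{,}984.45 + 14{,}960.45) - 243$ $= \frac{12}{6480} (135{,}895.8) - 243 = 251.66 - 243 = 8.659$.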
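
The H statistic can be recomputed from the four rank sums on the slide (927, 883, 883 and 547, with 20 participants per group). The function name is hypothetical, and the sketch ignores any correction for ties.

```python
def kruskal_wallis_h(rank_sums, group_sizes):
    """H = 12 / (N(N+1)) * sum(R_i^2 / n_i) - 3(N+1), without a tie correction."""
    n_total = sum(group_sizes)
    weighted_squares = sum(r ** 2 / n for r, n in zip(rank_sums, group_sizes))
    return 12 / (n_total * (n_total + 1)) * weighted_squares - 3 * (n_total + 1)


soya_rank_sums = [927, 883, 883, 547]
soya_group_sizes = [20, 20, 20, 20]

# Sanity check: all ranks 1..80 must add up to N(N+1)/2 = 3240.
print(sum(soya_rank_sums))

h_soya = kruskal_wallis_h(soya_rank_sums, soya_group_sizes)
print(round(h_soya, 3))  # 8.659
```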

    51. Data input As for a One-Way ANOVA, we code the different groups with a dummy coding variable 'Soya' in the 1st column (1) no soya (2) 1 soya meal (3) 4 soya meals (4) 7 soya meals The dependent variable 'sperm' goes in the 2nd column

    52. The data in SPSS (Soya.sav)

    53. Exploratory analyses Analyze → Descriptive Statistics → Explore; tick 'Normality plots with tests' under 'Plots'

    54. Running the Kruskal-Wallis test Analyze → Nonparametric Tests → K Independent Samples...

    55. Output Kruskal-Wallis Test → The number of soya meals has a significant overall effect on sperm count. However, we do not know where exactly the difference is located.

    56. Boxplots for the 4 groups Graphs → Boxplots. Visual inspection: the medians for groups 1-3 seem rather similar; however, the median for group 4 seems somewhat lower.

    57. 1. Posthoc Tests for Kruskal-Wallis 1. Posthoc tests for non-parametric tests can be done with the Mann-Whitney test (for pairs of unrelated samples). Running several posthoc tests risks inflating the Type I error rate. To correct for this family-wise error inflation, we may use the Bonferroni correction; however, we then lose power. 2. Posthoc tests for non-parametric tests can also be done by hand.

    58. 1. Posthoc Tests for Kruskal-Wallis → Compromise: do only a few promising comparisons, e.g. each level against the control condition (as in 'simple' contrasts). Test 1: no soya vs. 1 soya meal. Test 2: no soya vs. 4 soya meals. Test 3: no soya vs. 7 soya meals. With 3 tests, we have to divide our $\alpha$-level by 3: .05/3 = .0167. So we carry out our posthoc tests at this more rigorous level.

    59. 1. Single Mann-Whitney tests for the three comparisons

    60. 1. Output of the Single Mann-Whitney tests for the three comparisons

    61. 2. Posthoc Tests in nonparametric tests (for nerds) You can also calculate the differences for all pairs of contrasts by hand. You take the difference between the mean ranks of two groups and compare it to a value based on the value of z (corrected for the number of comparisons you make) and a constant based on the total sample size and the sample sizes of the 2 groups being compared: $|\bar{R}_u - \bar{R}_v| \geq z_{\alpha/k(k-1)} \sqrt{\frac{N(N+1)}{12}\left(\frac{1}{n_u} + \frac{1}{n_v}\right)}$

    62. Determining the critical difference for z $|\bar{R}_u - \bar{R}_v| \geq z_{\alpha/k(k-1)} \sqrt{\frac{N(N+1)}{12}\left(\frac{1}{n_u} + \frac{1}{n_v}\right)}$. To find the value of $z_{\alpha/k(k-1)}$, we first need to fix the $\alpha$ level. Normally, it is .05. This level needs to be divided by k(k-1), where k is the number of groups, that is, 4 x 3 = 12. The corrected level therefore is .05/12 = .00417. Now, $z_{\alpha/k(k-1)}$ means 'the value of z that is exceeded by only a proportion of .00417 of all z-values'. Looking up the smaller portion closest to .00417 (actually, .00415) in Appendix A.1 (normal z-distribution), we find z = 2.64. This is the critical value.

    63. Determining the critical difference for z $|\bar{R}_u - \bar{R}_v| \geq z_{\alpha/k(k-1)} \sqrt{\frac{N(N+1)}{12}\left(\frac{1}{n_u} + \frac{1}{n_v}\right)}$. crit. Diff = $2.64 \sqrt{\frac{80(80+1)}{12}\left(\frac{1}{20} + \frac{1}{20}\right)}$ = $2.64 \sqrt{540(0.1)}$ = $2.64 \sqrt{54}$ = 19.4. Since the sample sizes of all groups are identical, this value holds for all comparisons. We can now test the actual differences in mean ranks for all comparisons against this critical difference. If a difference is bigger, the comparison is significant.
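
Instead of looking up z in a table, the critical difference can be computed with the standard library's normal distribution. This is a sketch under the slide's formula; the function name is made up, and the mean ranks are derived from the rank sums given earlier (927 and 547, each divided by 20).

```python
import math
from statistics import NormalDist


def kw_critical_difference(alpha, k, n_total, n_u, n_v):
    """Critical difference between two mean ranks after a Kruskal-Wallis test."""
    # z value whose upper-tail area is alpha / (k(k-1)).
    z_crit = NormalDist().inv_cdf(1 - alpha / (k * (k - 1)))
    spread = math.sqrt(n_total * (n_total + 1) / 12 * (1 / n_u + 1 / n_v))
    return z_crit * spread


crit_diff = kw_critical_difference(alpha=0.05, k=4, n_total=80, n_u=20, n_v=20)
print(round(crit_diff, 1))  # about 19.4

# Largest observed difference: no soya (927/20) vs. 7 soya meals (547/20).
observed_diff = 927 / 20 - 547 / 20
print(round(observed_diff, 2), observed_diff >= crit_diff)
```

The largest observed difference in mean ranks (19.0) falls just short of the critical difference, so no comparison is significant by this procedure.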

    64. Testing individual differences in mean rank against the critical difference (19.4) According to this calculation, none of the differences is significant! However, in the previous Mann-Whitney tests the 'no meals vs. 7 meals' comparison had been significant. How come?

    65. Significant or n.s. comparisons? In the earlier Mann-Whitney follow-up we conducted only 3 comparisons, so the overall $\alpha$ level was divided by 3, giving a corrected $\alpha$ of .05/3 = .0167. The p-value of the 'no vs. 7 meals' comparison was .009, which is smaller than .0167. In the new calculation, however, all 6 pairwise comparisons are taken into account, giving $\alpha$ = .05/6 = .0083. Now .009 > .0083, hence the comparison is n.s. → Better to carry out only a few well-motivated comparisons.

    66. Testing for trends: the Jonckheere-Terpstra test This test looks at the differences between the medians of the groups, just as the Kruskal-Wallis test does. Additionally, it includes information about whether the medians are ordered. In our example, we do predict an order for the sperm counts in the 4 groups: no meal > 1 meal > 4 meals > 7 meals. In the coding variable, we have already encoded the order which we expect (1 > 2 > 3 > 4).
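
The effect of the number of comparisons on the Bonferroni-corrected level can be checked in a few lines. The helper name is made up; the p-value of .009 is the one quoted for the 'no vs. 7 meals' comparison.

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level under a Bonferroni correction."""
    return alpha / n_comparisons


p_no_vs_7 = 0.009
alpha_three = bonferroni_alpha(0.05, 3)  # three planned comparisons
alpha_six = bonferroni_alpha(0.05, 6)    # all six pairwise comparisons

significant_with_three = p_no_vs_7 < alpha_three
significant_with_six = p_no_vs_7 < alpha_six
print(round(alpha_three, 4), significant_with_three)  # 0.0167 True
print(round(alpha_six, 4), significant_with_six)      # 0.0083 False
```

The same p-value passes the criterion for three comparisons but fails it for six, which is why restricting the follow-up to a few planned comparisons preserves power.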

    67. Output of the J-T test

    68. Calculating effect sizes Calculate effect sizes only for single focused comparisons: $r = \frac{z}{\sqrt{2n}}$, with n = 20 per group. $r_{NoSoya-1meal} = \frac{-0.243}{\sqrt{40}} = -.04$. $r_{NoSoya-4meals} = \frac{-0.325}{\sqrt{40}} = -.05$. $r_{NoSoya-7meals} = \frac{-2.597}{\sqrt{40}} = -.41$. $r_{Jonckheere} = \frac{-2.47}{\sqrt{80}} = -.28$ (all 80 participants enter the Jonckheere test).

    69. Reporting the results of the Kruskal-Wallis Test (Field_2005_556) Sperm counts were significantly affected by eating soya meals (H(3) = 8.66, p < .05). Mann-Whitney tests were used to follow up this finding. A Bonferroni correction was applied and so all effects are reported at a .0167 level of significance. It appeared that sperm counts were no different when one soya meal (U = 191, r = -.04) or four soya meals (U = 188, r = -.05) were eaten per week compared to none. However, when seven soya meals were eaten per week, sperm counts were significantly lower than when no soya was eaten (U = 104, r = -.41).

    70. Terminology

    71. Differences between several related groups: Friedman's ANOVA Friedman's ANOVA is the non-parametric analogue of a repeated-measures ANOVA (see chapter 11), where the same subjects have been subjected to various conditions. Example here: testing the effect of a new diet called the 'Andikins diet' on n = 10 women. Their weight (in kg) was measured 3 times: Start, Month 1, Month 2. Would they lose weight in the course of the diet?

    72. Theory of Friedman's ANOVA Each subject's weight on each of the 3 dates is listed in a separate column. Then, within each subject, the 3 weights are ranked and the ranks are listed in separate columns. Finally, the ranks are summed for each condition ($R_i$).

    73. The Test statistic Fr From the sum of ranks for each condition, the test statistic $F_r$ is derived: $F_r = \left[ \frac{12}{Nk(k+1)} \sum_{i=1}^{k} R_i^2 \right] - 3N(k+1)$ $= \left[ \frac{12}{(10 \times 3)(3+1)} (19^2 + 20^2 + 21^2) \right] - (3 \times 10)(3+1)$ $= \frac{12}{120}(361 + 400 + 441) - 120$ $= 0.1(1202) - 120 = 120.2 - 120 = 0.2$.
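
The Friedman statistic can be recomputed from the three rank sums on the slide (19, 20 and 21 for N = 10 women). The function name is hypothetical and no tie correction is applied.

```python
def friedman_fr(rank_sums, n_subjects):
    """Fr = 12 / (N k (k+1)) * sum(R_i^2) - 3 N (k+1), without a tie correction."""
    k = len(rank_sums)
    sum_of_squares = sum(r ** 2 for r in rank_sums)
    return 12 / (n_subjects * k * (k + 1)) * sum_of_squares - 3 * n_subjects * (k + 1)


diet_rank_sums = [19, 20, 21]  # Start, Month 1, Month 2
fr_diet = friedman_fr(diet_rank_sums, n_subjects=10)
print(round(fr_diet, 2))  # 0.2
```

Because each woman contributes the ranks 1, 2 and 3, the rank sums must add up to 10 x 6 = 60, which they do.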

    74. Data input and provisional analysis (using diet.sav) First, test for normality: Analyze → Descriptive Statistics → Explore; tick 'Normality plots with tests' in the 'Plots' window.

    75. Running Friedman's ANOVA Analyze → Nonparametric Tests → K Related Samples...

    76. Other options Kendall's W: similar to Friedman's ANOVA, but looks specifically at agreement between raters. For example: to what extent (from 0 to 1) women agree in their ratings of the attractiveness of Justin Timberlake, David Beckham, and Tony Blair. This is like a correlation coefficient. Cochran's Q: this is an extension of McNemar's test. It is like a Friedman test for dichotomous data. For example, if women had to judge whether they would like to kiss Justin Timberlake, David Beckham, or Tony Blair and could only answer Yes or No.

    77. Output from Friedman's ANOVA

    78. Posthoc tests for Friedman's ANOVA Use Wilcoxon signed-rank tests, correcting for the number of tests carried out; here $\alpha$ = .05/3 = .0167.

    79. Posthoc tests for Friedman's ANOVA - calculation by hand We take the differences between the mean ranks of the different conditions and compare them to a value based on the value of z (corrected for the number of comparisons) and a constant based on the total sample size (N = 10) and the number of conditions (k = 3): $|\bar{R}_u - \bar{R}_v| \geq z_{\alpha/k(k-1)} \sqrt{\frac{k(k+1)}{6N}}$, with $\alpha/k(k-1) = .05/(3(3-1)) = .00833$. For a difference to be significant, it must be at least as large as the critical difference computed from the value of z that is exceeded by only a proportion of .00833 of all z-values. As before, we look in Appendix A.1 under the column 'Smaller Portion'. The z-value corresponding to .00833 is the critical value: it lies between 2.39 and 2.40.

    80. Calculating the critical differences Critical difference = $z_{\alpha/k(k-1)} \sqrt{\frac{k(k+1)}{6N}}$. crit. Diff = $2.4 \sqrt{\frac{3(3+1)}{6 \times 10}}$ = $2.4 \sqrt{12/60}$ = $2.4 \sqrt{0.2}$ = 1.07. → If a difference between mean ranks is ≥ the critical difference 1.07, then that difference is significant.

    81. Calculating the differences between mean ranks for diet data → None of the differences is ≥ the critical difference 1.07, hence none of the comparisons is significant.
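
The critical difference for the Friedman follow-up can also be computed without a table. This sketch follows the slide's formula; the function name is made up, and the mean ranks (1.9, 2.0, 2.1) are derived from the rank sums 19, 20 and 21 divided by N = 10.

```python
import math
from statistics import NormalDist


def friedman_critical_difference(alpha, k, n_subjects):
    """Critical difference between two mean ranks after Friedman's ANOVA."""
    # z value whose upper-tail area is alpha / (k(k-1)).
    z_crit = NormalDist().inv_cdf(1 - alpha / (k * (k - 1)))
    return z_crit * math.sqrt(k * (k + 1) / (6 * n_subjects))


crit_diff_diet = friedman_critical_difference(alpha=0.05, k=3, n_subjects=10)
print(round(crit_diff_diet, 2))  # about 1.07

# Largest difference between mean ranks: Month 2 (2.1) vs. Start (1.9).
largest_diff = 21 / 10 - 19 / 10
print(round(largest_diff, 2), largest_diff >= crit_diff_diet)
```

Even the largest difference between mean ranks (0.2) is far below the critical difference, in line with the non-significant overall test.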

    82. Calculating the effect size Again, we will only calculate the effect sizes for single comparisons:

    83. Reporting the results of Friedman's ANOVA (Field_2005_566) The weight of participants did not significantly change over the 2 months of the diet ($\chi^2(2) = 0.20$, p > .05). Wilcoxon tests were used to follow up on this finding. A Bonferroni correction was applied and so all effects are reported at a .0167 level of significance. It appeared that weight did not significantly change from the start of the diet to 1 month, T = 27, r = -.01, from the start of the diet to 2 months, T = 25, r = -.06, or from 1 month to 2 months, T = 26, r = -.03. We can conclude that the Andikins diet (...) is a complete failure.

    84. Summary: Terminology
