
Presentation Transcript


  1. Today’s Lecture Topic • Kruskal-Wallis Test • Rationale behind the test • Setting up the problem • Large Samples/Small Samples • Computing the statistic • Interpreting the results • Correcting for Ties

  2. Reference Material • Burt and Barber, pages 449-453 • Hollander and Wolfe (Non-Parametric Statistics)

  3. Once Again • What if we were interested in comparing the means of multiple samples? • What would be the layout of such a situation? • What if the assumptions of an ANOVA could not be met?

  4. Remember That Multiple Categories Create Problems • If we wanted to run a Wilcoxon Rank Sum test on every potential pairing in a multiple-category data set, we would end up running k(k−1)/2 tests (with k as the number of categories) • There are two problems with this approach • 1st – it is a lot of computational work • 2nd – as we run multiple tests, the chance of committing at least one alpha error grows well beyond the alpha level of a single test (illustrated in the sketch below) • But if we have a single test that is sensitive to the variation between categories, we can avoid these pitfalls
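A minimal sketch of that second point, assuming k = 3 categories, a per-test alpha of 0.05, and (for simplicity) independent tests; the numbers are illustrative and not from the lecture:

```python
# Sketch: how the chance of at least one alpha (Type I) error grows when we run
# every pairwise Wilcoxon Rank Sum comparison. Assumes independent tests for simplicity.
k = 3
alpha = 0.05
m = k * (k - 1) // 2                      # number of distinct pairwise comparisons
familywise = 1 - (1 - alpha) ** m         # P(at least one false rejection) if all nulls are true
print(m, round(familywise, 3))            # 3 tests -> about 0.143, nearly triple the nominal 0.05
```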

  5. Kruskal Wallis Test • The Kruskal Wallis test (or H-statistic) operates in a very similar fashion to the Wilcoxon or Mann-Whitney test, but it extends to observations in multiple categories • Its null hypothesis is that all the categories come from a single population and that there is no difference between them • The mathematical form is H0: τ1 = τ2 = τ3 = … = τk • As usual, the alternative is that at least one is different

  6. Assumptions and Limitations • Independent random samples • Each observation is drawn from a continuous distribution • Those distributions differ only via a “treatment effect” that “shifts” the median from distribution to distribution • This test is exceedingly robust and can handle the normality and variance issues that would derail an ANOVA

  7. The Basic Idea Behind the Test • Suppose we have several categories of data and the data in those categories all came from the same population • If we ranked all of the data together and then summed the ranks in each category (dividing by nj to allow for categories of differing size) • We would expect the average rank in each category to be roughly equal • If the average ranks are far from equal, then at least one of the categories likely comes from a different population (see the sketch below)
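A minimal sketch of this ranking idea on made-up scores (not the lecture's survey data), using scipy's rankdata to pool and rank everything:

```python
# Sketch: rank all observations together, then compare each category's average rank.
from scipy.stats import rankdata

groups = {
    "A": [12, 9, 14, 10],
    "B": [11, 13, 8, 15],
    "C": [7, 10, 12, 9],
}
labels = [g for g, vals in groups.items() for _ in vals]
scores = [x for vals in groups.values() for x in vals]
ranks = rankdata(scores)                  # ties receive the average of the tied ranks

for g in groups:
    r = [rank for rank, lab in zip(ranks, labels) if lab == g]
    print(g, sum(r) / len(r))             # roughly equal average ranks => no evidence of a shift
```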

  8. Working with Data: An Example • A survey of sexually active adults • A 15-item test of general knowledge about sex and health was administered to random samples • Each member of the survey assigned themselves to one of three categories: Sexually Inactive, Active – One Partner, Active – Multiple Partners • n=6 for categories 2 and 3 and n=5 for category 1, so N=17 • The data is interval but not ratio and is scaled from 0–15

  9. The Test Statistic • The Kruskal Wallis test compares the sums of ranks across categories via the statistic H • It does this by summing the ranks in each category and dividing by the number of observations in the category • It then combines these category rank summaries and applies some constants based upon the sample size • R̄j is the average rank for the jth category, j = 1, 2, …, k categories • Rj is the sum of the ranks for all observations i = 1 to nj in category j

  10. Equations for H • The original version of the H statistic is as follows: H = [12 / (N(N+1))] Σj nj (R̄j − (N+1)/2)² • A simplified version, in terms of computation, is as follows: H = [12 / (N(N+1))] Σj (Rj² / nj) − 3(N+1) • The reason I am showing you the original form is so that you can see the similarity between the comparison of category and global means in the ANOVA and the comparison of each category's mean rank to the global median rank, (N+1)/2, in the KW test
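A sketch of both forms on arbitrary made-up samples (the survey scores themselves are not reproduced on the slides); the two expressions are algebraically identical, and scipy.stats.kruskal is included only as a cross-check — note that scipy applies the tie correction from slide 15 automatically, so its H can differ slightly when ties are present:

```python
# Sketch: compute H via the "original" and "computational" forms on example data.
import numpy as np
from scipy.stats import rankdata, kruskal

samples = [np.array([3, 7, 11, 9, 5]),
           np.array([8, 12, 6, 14, 10, 13]),
           np.array([4, 9, 15, 7, 11, 12])]

all_values = np.concatenate(samples)
N = all_values.size
ranks = rankdata(all_values)

# Split the pooled ranks back into their categories
sizes = [s.size for s in samples]
splits = np.split(ranks, np.cumsum(sizes)[:-1])

# Original form: compare each category's mean rank to the overall mean rank (N+1)/2
H_orig = 12 / (N * (N + 1)) * sum(
    n_j * (r_j.mean() - (N + 1) / 2) ** 2 for n_j, r_j in zip(sizes, splits))

# Computational form: sum of (rank-sum squared / n_j), then subtract 3(N+1)
H_comp = 12 / (N * (N + 1)) * sum(
    r_j.sum() ** 2 / n_j for n_j, r_j in zip(sizes, splits)) - 3 * (N + 1)

print(round(H_orig, 4), round(H_comp, 4))   # identical values
print(kruskal(*samples))                    # scipy's H includes the tie correction
```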

  11. Determining Significance • For small samples, we can utilize the tables that I am providing (the slide shows the H-statistic table entry for k=3, n1=5, n2=n3=6) • For large samples, we can utilize the very popular χ2 (chi-square) distribution with k−1 degrees of freedom and your desired alpha level • Your text has an unusual χ2 table on page 620
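For the large-sample route, the χ2 critical value can also be pulled from software instead of the printed table; a one-line sketch assuming scipy is available:

```python
# Sketch: large-sample critical value for k = 3 categories (k - 1 = 2 df) at alpha = 0.05.
from scipy.stats import chi2
print(round(chi2.ppf(0.95, df=2), 2))   # 5.99, matching the text's critical value
```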

  12. Comparison of Critical Levels • Your text gives a critical value of 5.99 for H (at 2 degrees of freedom and a 0.05 alpha) • The table in my non-parametric text gives a critical value of 5.77 for H (with 3 categories and 5, 6 and 6 samples in each category) • This illustrates the advantage of tables over large-sample approximations • That said, the large-sample approximation for the H statistic is no additional work (there is no z-score calculation), and because it requires a larger H to reject it is the more conservative approach, so I advocate its use 99% of the time

  13. Calculating the Statistic • N=17, so N(N+1) = 17(18) = 306 and 12/306 = 0.039216 (do not round this value) • 3(N+1) = 54 • The sum of the squared rank sums, each divided by its nj, is easiest to demonstrate in Excel • The terms for each category are 451.25, 425.04, and 504.17 • The total sum is 1380.46 • 0.039216 × 1380.46 = 54.14 • 54.14 − 54 = 0.14, so H = 0.14 • The R̄j's are 9.50, 8.42 and 9.17, so there isn't much variation (the arithmetic is reproduced in the sketch below)
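A short sketch reproducing this arithmetic; the raw scores are not shown on the slide, so the rank sums (47.5, 50.5, 55.0) are inferred from the reported average ranks (9.50, 8.42, 9.17) and the category sizes (5, 6, 6):

```python
# Sketch: rebuild H from rank sums inferred from the slide's reported averages.
rank_sums = [47.5, 50.5, 55.0]
sizes = [5, 6, 6]
N = sum(sizes)                                            # 17

constant = 12 / (N * (N + 1))                             # 12/306 = 0.039216..., kept unrounded
pieces = [r ** 2 / n for r, n in zip(rank_sums, sizes)]   # 451.25, 425.04, 504.17
H = constant * sum(pieces) - 3 * (N + 1)                  # 0.039216 * 1380.46 - 54
print([round(p, 2) for p in pieces], round(H, 2))         # -> H ≈ 0.14
```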

  14. Determining the Result • Since our computed H of 0.14 falls well below the critical value of 5.77 from the table or 5.99 from the χ2 table, there is no chance of us rejecting the null hypothesis at an alpha of 0.05 • Thus, there is no statistical evidence that sexual activity level affects individual performance on the survey

  15. What about Ties? • Since we failed to reject, we should look at the ties within the data set • In our data, we had two sets of two ties and one set of three ties; we know from experience with the tj correction terms (tj³ − tj) that a set of two is worth 6 and a set of three is worth 24, so our sum is 6 + 6 + 24 = 36 • The correction divides H by 1 − Σ(tj³ − tj)/(N³ − N): 17³ − 17 = 4896, 36/4896 = 0.007353, 1 − 0.007353 = 0.9926, and 0.14/0.9926 = 0.141, so with H′ = 0.141 we still fail to reject (see the sketch below)
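A minimal sketch of the tie correction as applied above, taking the tied-group sizes (two pairs and one triple) and the uncorrected H = 0.14 straight from the slide:

```python
# Sketch: tie correction for H, dividing by 1 - sum(t_j^3 - t_j) / (N^3 - N),
# where each t_j is the size of a group of tied observations.
tie_group_sizes = [2, 2, 3]                   # two tied pairs and one tied triple, per the slide
N, H = 17, 0.14

correction = 1 - sum(t ** 3 - t for t in tie_group_sizes) / (N ** 3 - N)   # 1 - 36/4896
H_corrected = H / correction
print(round(correction, 4), round(H_corrected, 3))   # ~0.9926, H' ~ 0.141 -> still fail to reject
```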

  16. How Do You Feel About Examples Where We Fail To Reject?

  17. Homework • This week's homework deals with a random sample of counties that have been rated as predominantly urban, suburban or rural • Given data on infant mortality (deaths per 1,000 live births), you are being asked to resolve the following question: Does the infant mortality rate vary significantly by category? • Conduct the test using a one-way ANOVA and a Kruskal Wallis test, use an alpha of 0.05, and make all calculations to 2 decimal places (with the exception of the noted constant in the H-statistic and any tie corrections)
