Understanding Non-Parametric Statistics for AP Statistics

Non-Parametric Statistics: A Presentation by Rob McMullen for AP Statistics

What are Non-Parametric Statistics?
Non-parametric statistics are a family of methods that help statisticians with a problem that occurs in parametric statistics. To understand what non-parametric statistics are, it is first necessary to know what parametric statistics are.

What are Parametric Statistics?
In AP Statistics, when we refer to a distribution we often make certain assumptions about it that enable us to work with it. One thing that helps us with this is the Central Limit Theorem (CLT), which allows us to assume that many sampling distributions are approximately normal. The theorem tells us that for any population with a finite mean and variance, the sampling distribution of the sample mean is approximately normal, provided the sample size is large enough.
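The Central Limit Theorem can be checked with a small simulation. The sketch below is only an illustration: the exponential population, the sample size of 40, and the helper name sample_means are assumptions chosen for the example, not part of the presentation.

```python
import random
import statistics

def sample_means(draw, sample_size, num_samples, seed=0):
    """Return the means of num_samples random samples, each of size sample_size."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean([draw(rng) for _ in range(sample_size)])
        for _ in range(num_samples)
    ]

# Exponential population with mean 1 and standard deviation 1:
# strongly right-skewed, so clearly not normal.
means = sample_means(lambda rng: rng.expovariate(1.0), sample_size=40, num_samples=2000)

# The sample means should center near the population mean (1) and have
# a spread near 1 / sqrt(40), which is roughly 0.16.
print(round(statistics.fmean(means), 2))
print(round(statistics.stdev(means), 2))
```

A histogram of means would look roughly bell-shaped even though the population itself is skewed.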

When are Parametric Statistics not useful?
When we do significance tests, we rely on the assumption that the test statistic follows the t-distribution or the z-distribution, depending on the situation. When this assumption is not true, none of our tests, which are called “parametric statistical inference tests,” are reliable. Everything we have done in AP Statistics has been in the field of “parametric statistics.”

Why does lack of normality cause problems?
When we calculate the p-value for an inference test, we find the probability, assuming the null hypothesis is true, of getting a sample result at least as extreme as the one observed. Basically, we are trying to see whether a recorded value could plausibly have occurred by chance alone. When we look for a p-value, we are assuming that the means of all samples of the given sample size are normally distributed around the population mean. This is why the test statistic, which is the number of standard deviations (of the sampling distribution) the sample mean lies from the population mean, can be used. Therefore, without normality, the p-value from these tests cannot be trusted.
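To make the role of the normality assumption concrete, here is a minimal sketch of a one-sample z-test. The function name z_test_p_value and the numbers passed to it are hypothetical, chosen only for illustration. The p-value line is only valid if the sampling distribution really is normal.

```python
import math

def z_test_p_value(sample_mean, pop_mean, pop_sd, n):
    """Two-sided p-value for a one-sample z-test (assumes a normal sampling distribution)."""
    standard_error = pop_sd / math.sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    # P(|Z| >= |z|) for a standard normal Z, via the complementary error function.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Hypothetical numbers, not taken from the presentation.
z, p = z_test_p_value(sample_mean=52.0, pop_mean=50.0, pop_sd=8.0, n=64)
print(z)            # 2.0
print(round(p, 4))  # 0.0455
```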

What are Non-Parametric Statistics?
Statisticians deal with this problem of parametric statistics through the field of non-parametric statistics. These are tests that can be done without assuming normality, approximate normality, or symmetry. These tests do not require a mean and standard deviation. Since a standard deviation is hard to interpret for strongly skewed distributions, it is not useful for many distributions anyway.

What is different about Non-Parametric Statistics?
• Sometimes statisticians use what is called “ordinal” data. This data is obtained by taking the raw data and giving each observation a rank. These ranks are then used to create test statistics.
• In non-parametric statistics, one deals with the median rather than the mean. Since a mean can be easily influenced by outliers or skewness, and we are not assuming normality, a mean no longer makes sense. The median is another measure of location, which makes more sense in a non-parametric test. The median is considered the center of a distribution.
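The sensitivity of the mean to outliers can be seen in a short example. The data values below are made up for illustration.

```python
import statistics

data = [2, 3, 3, 4, 5]
with_outlier = data + [60]  # the same data with one extreme value added

# The mean jumps when the outlier is added; the median barely moves.
print(statistics.mean(data), statistics.median(data))
print(round(statistics.mean(with_outlier), 2), statistics.median(with_outlier))
```

Here the mean moves from 3.4 to about 12.83, while the median only moves from 3 to 3.5.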

Non-parametric tests are similar to the tests covered in AP Statistics, but each is slightly different, and many parametric tests have a non-parametric counterpart. The following table shows how some of the tests match up.

ANOVA
• What is an ANOVA?
• When are ANOVAs useful?
• How does one carry out an ANOVA?

What is an ANOVA?
Since ANOVAs were not covered in AP Statistics, I will now explain them. An ANOVA is a way to compare multiple sample means to see if they are significantly different. The name is short for what the test does: ANalysis Of VAriance = ANOVA. An ANOVA looks at the variation between the sample means and decides whether it is significant. This can be done to compare two or more samples.

When are ANOVAs useful?
An ANOVA can be used when one wants to compare any number of samples. This test can be done to see if many samples could have come from the same population. This test can also tell you about the differences between two or more areas. For example, if a survey is conducted in many different towns, you can see if their average responses differ significantly. Similarly, you can take samples of plant growth in different climates, in different soils, or with different treatments. In all cases, an ANOVA can be used to see if the means vary significantly.

How does one carry out an ANOVA?
An ANOVA is conducted by first putting all the samples into one large sample. The standard deviation of this combined sample is then found, and called σ. Next, the value for the range of variation in sample means is found. The range of variation is σ / √N, where N is the number of observations in each sample. The difference between each pair of sample means is then found, which is the variation of the means. If any one of these differences is greater than the range of variation, then those two means are significantly different from each other. Depending on your goal, this may cause you to reject your null hypothesis.
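The comparison described above can be sketched in Python. This mirrors the simplified procedure in the text and is not the F-test that a full ANOVA normally uses. The function name compare_means, the choice of the sample standard deviation for σ, and the plant-growth numbers are all assumptions made for illustration.

```python
import itertools
import math
import statistics

def compare_means(samples):
    """Sketch of the simplified comparison in the text (equal sample sizes assumed)."""
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("this sketch assumes every sample has the same size")
    pooled = [x for s in samples for x in s]   # one large combined sample
    sigma = statistics.stdev(pooled)           # assumption: sample standard deviation
    threshold = sigma / math.sqrt(n)           # the "range of variation"
    means = [statistics.fmean(s) for s in samples]
    # Flag every pair of samples whose means differ by more than the threshold.
    flagged = [
        (i, j)
        for i, j in itertools.combinations(range(len(samples)), 2)
        if abs(means[i] - means[j]) > threshold
    ]
    return threshold, means, flagged

# Hypothetical plant-growth measurements under three treatments.
samples = [[10, 12, 11, 13], [11, 12, 10, 13], [18, 20, 19, 21]]
threshold, means, flagged = compare_means(samples)
print(means)    # [11.5, 11.5, 19.5]
print(flagged)  # [(0, 2), (1, 2)]
```

In this made-up data the first two treatments are indistinguishable, while the third differs from both.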

EXAMPLE
Now that I have explained the background principles of non-parametric statistics, I will carry out an example of one of the tests. I have chosen the Wilcoxon Rank-Sum Test (also called the Wilcoxon-Mann-Whitney Test) because it is the most commonly used test.

The Wilcoxon Rank-Sum Test
The Wilcoxon Rank-Sum Test is used in place of the two-sample t-test when the sampling distributions of the variables being compared are not normal. This test requires two samples, of sample sizes n1 and n2. The test is carried out as follows. Each step is stated first and is then followed by a worked example.

1: The first step in this procedure is to collect two samples.
Sample 1: {3,2,12,9,13,7,9,11,4,5,6} n1=11
Sample 2: {1,8,4,15,12,6,10,14,3,3} n2=10

2: The second step is to combine the two samples into one large sample. Simply take all the data values from each sample and make one large group. Keep track of which sample each value came from, as the data will have to be separated back into the original samples later.
Combined sample size: n1+n2 = 11+10 = 21
{3,2,12,9,13,7,9,11,4,5,6} and {1,8,4,15,12,6,10,14,3,3} becomes:
{3,2,12,9,13,7,9,11,4,5,6,1,8,4,15,12,6,10,14,3,3}

3: Once all the data is in one sample, the data must be put into order by size. The data should go from smallest to largest.
{3,2,12,9,13,7,9,11,4,5,6,1,8,4,15,12,6,10,14,3,3}
In order is:
{1,2,3,3,3,4,4,5,6,6,7,8,9,9,10,11,12,12,13,14,15}

4: Each data value is given a rank based on size. If two or more data values are equal, their rank is the average of the ranks they would occupy. This step is when the raw data becomes ordinal data, or ranked data.
Combined sample in order (sample size 21):
{1,2,3,3,3,4,4,5,6,6,7,8,9,9,10,11,12,12,13,14,15}
Each data value is first ranked 1-21:
RANK: {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21}
RAW DATA: {1,2,3,3,3,4,4,5,6,6,7,8,9,9,10,11,12,12,13,14,15}

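When two or more data values are equal, their ranks are averaged. Therefore, the data becomes:
RANK: {1,2,4,4,4,6.5,6.5,8,9.5,9.5,11,12,13.5,13.5,15,16,17.5,17.5,19,20,21}
RAW DATA: {1,2,3,3,3,4,4,5,6,6,7,8,9,9,10,11,12,12,13,14,15}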
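The tie-averaging rule can be sketched in Python. The function name average_ranks is hypothetical. The input is the combined, ordered sample from the example.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value in sorted order is tied.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + 1 + end + 1) / 2  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

combined = [1, 2, 3, 3, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 11, 12, 12, 13, 14, 15]
ranks = average_ranks(combined)
print(ranks)
```

The three 3s, for example, would occupy ranks 3, 4, and 5, so each receives rank 4. The function also works on unsorted input, returning ranks in the input order.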

5: The data are then put back into their original sampling groups as ranked data.
Original Sample 1: {3,2,12,9,13,7,9,11,4,5,6}
Original Sample 2: {1,8,4,15,12,6,10,14,3,3}
Ranked Sample 1: {4,2,17.5,13.5,19,11,13.5,16,6.5,8,9.5}
Ranked Sample 2: {1,12,6.5,21,17.5,9.5,15,20,4,4}

6: The sum of the ranks is taken for each sample. This is the test statistic.
Ranked Sample 1: {4,2,17.5,13.5,19,11,13.5,16,6.5,8,9.5}
Ranked Sample 2: {1,12,6.5,21,17.5,9.5,15,20,4,4}
Sum of sample 1: 120.5
Sum of sample 2: 110.5
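The rank sums can be checked in a few lines. The last part goes one step beyond the presentation and is an assumption on my part: it compares the rank sum with what the null hypothesis would predict, using the usual large-sample normal approximation without the small correction for ties. For small samples, exact tables are normally used instead.

```python
import math

# Ranked samples from step 5 of the example.
ranked_1 = [4, 2, 17.5, 13.5, 19, 11, 13.5, 16, 6.5, 8, 9.5]
ranked_2 = [1, 12, 6.5, 21, 17.5, 9.5, 15, 20, 4, 4]

n1, n2 = len(ranked_1), len(ranked_2)
w1, w2 = sum(ranked_1), sum(ranked_2)
print(w1, w2)  # 120.5 110.5

# Sanity check: the ranks 1..N always add up to N(N+1)/2.
total = n1 + n2
assert w1 + w2 == total * (total + 1) / 2

# Assumption (not from the presentation): under the null hypothesis the
# rank sum of sample 1 has mean n1(N+1)/2 and, ignoring ties,
# standard deviation sqrt(n1 * n2 * (N+1) / 12).
expected_w1 = n1 * (total + 1) / 2
sd_w1 = math.sqrt(n1 * n2 * (total + 1) / 12)
z = (w1 - expected_w1) / sd_w1
print(expected_w1, round(sd_w1, 2), round(z, 3))  # 121.0 14.2 -0.035
```

A z value this close to zero would suggest the two samples in the example do not differ noticeably in location.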

The Wilcoxon Rank-Sum Test SUMMARY:
1: Two samples are taken.
2: The samples are combined to make one distribution of sample size (n1+n2).
3: The data are put into order, based on size.
4: Each data value is given a rank based on size. If two or more data values are equal, their rank is the average of the ranks they would occupy.
5: The data are then put back into their original sampling groups as ranked data.
6: The sum of the ranks is taken for each sample. This is the test statistic.

Non-Parametric Statistics
This concludes my presentation. Are there any topics which have been covered that are not clear, which you would like to see again?
• Wilcoxon Rank-Sum Test explanation/example
• Explanation of an ANOVA
• Introduction to Non-Parametric Statistics
• Chart comparing Significance Tests

THANK YOU
I would like to thank you for taking the time to view this presentation. If you have any questions regarding this topic, you may email me at Robert_McMullen@BBNS.org. I hope that this has been informative and that you now clearly understand what non-parametric statistics are.
