Nonparametric Statistical Methods Presented by Xiaojin Dong, Owen Gu, Sohail Khan, Hao Miao, Shaolan Xiang, Chunmin Han, Yinjin Wu, Jiayue Zhang, Yuanhao Zhang
Outline • Wilcoxon signed rank test • Wilcoxon rank sum test • Kolmogorov-Smirnov test • Kruskal-Wallis test • Kendall's correlation coefficient • Spearman's rank correlation coefficient
When to use nonparametric methods? • Population is not normal • Sample size is very small • Some data are ordinal • Samples may be dependent or independent. What to estimate? • Population median. • The median is a better measure of central tendency for non-normal populations, e.g. skewed distributions.
Inventors • Henry Berthold Mann (1905-2000), Ohio State University; he was the dissertation advisor of Whitney. • Donald Ransom Whitney • Frank Wilcoxon (1892-1965)
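Hypothesis testing • $H_0: \tilde{\mu} = \tilde{\mu}_0$ vs $H_1: \tilde{\mu} > \tilde{\mu}_0$ ($\tilde{\mu} < \tilde{\mu}_0$ or $\tilde{\mu} \neq \tilde{\mu}_0$) • Compute the differences $d_i = x_i - \tilde{\mu}_0$ • Rank the $d_i$ in terms of their absolute values $|d_i|$ • Let $r_i$ be the rank of $d_i$. • Statistics: $w_+$ = sum of the ranks of positive differences, $w_-$ = sum of the ranks of negative differences • Rejection region: Reject $H_0$ if $w_+$ is large or, equivalently (since $w_+ + w_- = n(n+1)/2$), if $w_-$ is small, or if $w_+ \geq w_{n,\alpha}$, the upper $\alpha$ critical point of the null distribution of $W_+$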
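Large sample approximation • $H_0: \tilde{\mu} = \tilde{\mu}_0$ vs $H_1: \tilde{\mu} > \tilde{\mu}_0$ ($\tilde{\mu} < \tilde{\mu}_0$ or $\tilde{\mu} \neq \tilde{\mu}_0$) • Statistics for large sample size $n$: $E(W_+) = \frac{n(n+1)}{4}$ and $\mathrm{Var}(W_+) = \frac{n(n+1)(2n+1)}{24}$, giving $z = \frac{w_+ - n(n+1)/4 - 1/2}{\sqrt{n(n+1)(2n+1)/24}}$ • Rejection region: Reject $H_0$ if $z \geq z_\alpha$ or if $w_+ \geq \frac{n(n+1)}{4} + \frac{1}{2} + z_\alpha \sqrt{\frac{n(n+1)(2n+1)}{24}}$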
Intuition and assumptions • If the positive differences are larger than the negative differences, they get higher ranks and so contribute to a larger value of $w_+$; likewise, larger negative differences contribute to a larger value of $w_-$. • Assumption: The population must be symmetric. • Reason: Even under the null hypothesis, a right-skewed population tends to have a higher value of $w_+$ and a left-skewed population tends to have a higher value of $w_-$.
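The signed rank computation described on these slides can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' code: the helper names `midranks` and `signed_rank` and the four-point `sample` are hypothetical, zero differences are dropped by assumption, and the z statistic uses a 1/2 continuity correction, which is one common convention.

```python
import math

def midranks(values):
    """Return 1-based ranks of values, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 0-based positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def signed_rank(x, m0):
    """Return (w_plus, w_minus, z) for testing median = m0."""
    d = [xi - m0 for xi in x if xi != m0]   # assumption: drop zero differences
    r = midranks([abs(di) for di in d])
    w_plus = sum(ri for di, ri in zip(d, r) if di > 0)
    w_minus = sum(ri for di, ri in zip(d, r) if di < 0)
    n = len(d)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean - 0.5) / sd          # continuity-corrected (assumed convention)
    return w_plus, w_minus, z

# Hypothetical data, chosen only to make the ranks easy to follow.
sample = [1.5, 2.5, -0.5, 3.0]
w_plus, w_minus, z = signed_rank(sample, m0=0.0)
print(w_plus, w_minus, round(z, 3))
```

For a sample this small the normal approximation is not appropriate; the z value is printed only to show the arithmetic.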
Example 1.1 • Test whether the median of the thermostat setting data differs from 200 • $H_0: \tilde{\mu} = 200$ vs $H_1: \tilde{\mu} \neq 200$ • Conclusion: The population median differs from the design setting of 200 at $\alpha = 0.05$
SAS codes
    DATA themostat;
    INPUT temp;
    DATALINES;
    202.2 203.4 …
    ;
    PROC UNIVARIATE DATA=themostat LOCCOUNT MU0=200;
    TITLE "Wilcoxon signed rank test for the thermostat data";
    VAR temp;
    RUN;
SAS outputs (selected results)
Basic Statistical Measures
    Location                Variability
    Mean     201.7700       Std Deviation          2.41019
    Median   201.7500       Variance               5.80900
    Mode     .              Range                  8.30000
                            Interquartile Range    2.90000
Tests for Location: Mu0=200
    Test           -Statistic-       -----p Value------
    Student's t    t   2.322323      Pr > |t|     0.0453
    Sign           M   3             Pr >= |M|    0.1094
    Signed Rank    S   19.5          Pr >= |S|    0.048
Wilcoxon rank sum test: introduction • The Wilcoxon-Mann-Whitney test is also called the Wilcoxon rank sum test or the Mann-Whitney U test. • It was proposed initially by the Irish-born US statistician Frank Wilcoxon in 1945, for equal sample sizes, and extended to arbitrary sample sizes and in other ways by the Austrian-born US mathematician Henry Berthold Mann and the US statistician Donald Ransom Whitney in 1947.
Where to use • When we analyze two independent samples, the assumptions of the t-test are sometimes not met (the data are not normally distributed, the sample size is small, etc.), or the data values may only represent ordered categories. • If sample sizes are moderate, if there is a question concerning the distributions, or if the data are really ordinal, use the Wilcoxon rank sum test. • This test assumes only that: 1. all the observations from both groups are independent of each other; 2. the responses are ordinal or continuous measurements; 3. the distributions of the two groups have similar shape.
Calculation steps • H0: F1=F2 (the distributions of both groups are equal) vs. Ha: F1<F2 or Ha: F1>F2 (one r.v. is stochastically larger than the other) • Put all the data from both groups in increasing order (using midranks for ties), retaining the group identity. • Compute the sum of ranks for each group and denote the sums by w1 and w2. • Compute u1=w1-n1(n1+1)/2 and u2=w2-n2(n2+1)/2. Look up Table A.11 to obtain the P-value and to decide whether to reject H0 at significance level α.
Special treatment • For large samples: When n1>10 and n2>10, U is approximately normally distributed with mean $\mu = n_1 n_2 / 2$ and variance $\sigma^2 = n_1 n_2 (N+1)/12$, where $N = n_1 + n_2$. Therefore a large-sample z-test can be based on the statistic $z = \frac{u_1 - n_1 n_2/2 - 1/2}{\sqrt{n_1 n_2 (N+1)/12}}$ • For ties: Use the midrank (the average of the ranks the tied values would occupy) when an observation from one group is equal to an observation from the other group.
Example • To determine whether the order of questions has a significant impact on students' performance in an exam, 20 students are randomly divided into two equal groups, A and B. Every student was asked to answer an exam paper. The exam papers for both groups consist of the same questions. The questions were ordered from easy to hard in the papers for group A and from hard to easy in the papers for group B. The scores of the students are as follows. • A: 83, 82, 84, 96, 90, 64, 91, 71, 75, 72 • B: 42, 61, 52, 78, 69, 81, 75, 78, 78, 65
Solution • H0: F1=F2 vs. Ha: F1>F2 • Rank the scores of both groups together in ascending order, using midranks for ties.
The rank sums are w1=4+7+8+9.5+15+16+17+18+19+20=133.5 and w2=1+2+3+5+6+9.5+12+12+12+14=76.5 • Therefore u1=w1-n1(n1+1)/2=133.5-10*11/2=78.5 and u2=w2-n2(n2+1)/2=76.5-10*11/2=21.5 • Check that u1+u2=n1*n2=100 • From Table A.11 we find that the P-value is between 0.012 and 0.026. • To compare this with the large-sample normal approximation, calculate $z = \frac{u_1 - n_1 n_2/2 - 1/2}{\sqrt{n_1 n_2 (N+1)/12}} = \frac{78.5 - 50 - 0.5}{\sqrt{10 \cdot 10 \cdot 21/12}} = 2.12$, which yields the P-value ≈ Φ(-2.12) = 0.0170
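The arithmetic of this example can be re-derived with a short standard-library Python sketch. The helper `midranks` is a hypothetical name introduced here; the z statistic uses the same continuity-corrected form as the slides and, like the slides, does not adjust the variance for ties.

```python
import math

def midranks(values):
    """Return 1-based ranks of values, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 0-based positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

A = [83, 82, 84, 96, 90, 64, 91, 71, 75, 72]   # questions ordered easy to hard
B = [42, 61, 52, 78, 69, 81, 75, 78, 78, 65]   # questions ordered hard to easy

n1, n2 = len(A), len(B)
N = n1 + n2
ranks = midranks(A + B)          # rank the pooled sample, keeping group order
w1 = sum(ranks[:n1])             # rank sum of group A
w2 = sum(ranks[n1:])             # rank sum of group B
u1 = w1 - n1 * (n1 + 1) / 2
u2 = w2 - n2 * (n2 + 1) / 2

# Large-sample z with 1/2 continuity correction, upper-tail P-value 1 - Phi(z).
z = (u1 - n1 * n2 / 2 - 0.5) / math.sqrt(n1 * n2 * (N + 1) / 12)
p_value = 0.5 * math.erfc(z / math.sqrt(2))

print(w1, w2, u1, u2, round(z, 2), round(p_value, 4))
```

The printed rank sums and U statistics should agree with the hand calculation above, and the P-value should be close to the 0.0170 obtained from the rounded z.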
SAS code and output
    Data exam;
    Input group $ score @@;
    Datalines;
    A 83 A 82 A 84 A 96 A 90 A 64 A 91 A 71 A 75 A 72
    B 42 B 61 B 52 B 78 B 69 B 81 B 75 B 78 B 78 B 65
    ;
    Proc npar1way data=exam wilcoxon;
    Var score;
    Class group;
    exact wilcoxon;
    Run;
Andrey Kolmogorov 25 April 1903 – 20 October 1987 • Russian mathematician • Major advances in the fields of probability theory, topology, turbulence, classical mechanics and computational complexity. • Gained international recognition in 1922 for constructing a Fourier series that diverges almost everywhere. "Every mathematician believes he is ahead over all others. The reason why they don't say this in public, is because they are intelligent people"
Nikolai Vasilyevich Smirnov October 17, 1900 – June 2, 1966 • Soviet mathematician noted for his work in probability theory and mathematical statistics • Contributed to nonparametric statistics, including the two-sample form of the test that bears his name
KS-Test • Tries to determine whether two datasets differ significantly by comparing their distributions • Makes no assumption about the distribution of the data. • This generality comes at some cost: parametric tests, e.g. the t-test, may be more sensitive if the data meet the requirements of the test.
Types of KS-Tests • One sample: sample vs. a reference probability distribution, e.g. a test for normality (empirical vs. standard normal) • Two sample: test whether two samples come from the same distribution
The one-sample Kolmogorov-Smirnov (K-S) test is based on the empirical distribution function (ECDF). Given N ordered data points $Y_1, Y_2, \ldots, Y_N$, the ECDF is defined as $E_N = n(i)/N$, where $n(i)$ is the number of points less than $Y_i$. This is a step function that increases by $1/N$ at the value of each data point. We can plot the empirical distribution function together with the cumulative distribution function of a given distribution. The one-sample K-S test is based on the maximum distance between these two curves. That is, $D = \max_{1 \le i \le N} \left( F(Y_i) - \frac{i-1}{N},\ \frac{i}{N} - F(Y_i) \right)$, where $F$ is the theoretical cumulative distribution function.
The two-sample K-S test is a variation of this. • It compares two empirical distribution functions through $D = \max_x |E_1(x) - E_2(x)|$, where $E_1$ and $E_2$ are the empirical distribution functions for the two samples. • More formally, the Kolmogorov-Smirnov two-sample test can be defined as follows. • H0: The two samples come from a common distribution. • Ha: The two samples do not come from a common distribution.
Test statistic: The Kolmogorov-Smirnov two-sample test statistic is defined as $D = \max_x |E_1(x) - E_2(x)|$, where $E_1$ and $E_2$ are the empirical distribution functions for the two samples. • Critical region: The hypothesis regarding the distributional form is rejected if the test statistic, D, is greater than the critical value obtained from a table at significance level α. • The quantile-quantile plot, bihistogram, and Tukey mean-difference plot are graphical alternatives to the two-sample K-S test.
Application of Kolmogorov-Smirnov Test • The K-S goodness-of-fit test can be applied to one sample or to two samples. • In the one-sample test, we compare the empirical distribution function of the sample data with the cumulative distribution function of a reference distribution (standard normal, lognormal, exponential, etc.) to determine whether the sample is drawn from that distribution. • In the two-sample test, we compare the empirical distribution functions of two sets of data to determine whether they come from the same distribution. • The following slides illustrate both cases using MATLAB.
One-sample K-S test • Hypothesis testing: H0: The sample data follow the standard normal distribution (μ=0, σ2=1). Ha: The data do not follow the standard normal distribution. • The sample data, extracted from the daily percentage change in the share price of company XXX, Inc. for the past 19 days, are listed in ascending order: -4.0% -3.5% -3.0% -2.5% -2.0% -1.5% -1.0% -0.5% 0.0% 0.5% 1.0% 1.5% 2.0% 2.5% 3.0% 3.5% 4.0% 4.5% 5.0% • The test statistic is max(|Fx-Gx|), where Fx is the empirical cdf and Gx is the standard normal cdf.
One-sample K-S test (cont'd) • The MATLAB syntax for the test is: x=-4:0.5:5; [h,p,k,c]=kstest(x,[],alpha,type), where (1) x is the sample data set, whose values increase from -4 to 5 in even increments of 0.5; (2) [] means the standard normal distribution is used; (3) alpha is a double and represents the level of significance; (4) type is a string that specifies whether the alternative hypothesis is 'unequal', 'larger' or 'smaller', i.e. whether the empirical cdf and the cdf of the specified distribution are unequal, the empirical cdf is larger, or the empirical cdf is smaller;
One-sample K-S test (cont'd) (5) h = 0 if the test fails to reject the null hypothesis and 1 if the null hypothesis is rejected; (6) p = the p-value of the test; (7) k = the test statistic; (8) c = the critical value, which depends on alpha and the sample size • We test under three scenarios: a) alpha=0.1, b) alpha=0.05 and c) alpha=0.01, all with type='unequal'. • Scenario #1: α=0.1 a) MATLAB code: [h,p,k,c]=kstest(x,[],0.1,'unequal'); b) Testing result: h=1, p=0.0122, k=0.3542, c=0.2714; Since k>c (h=1), we reject, at the 10% level of significance, the null hypothesis that the sample data follow the standard normal distribution.
One-sample K-S test (cont'd) • Scenario #2: α=0.05 a) MATLAB code: [h,p,k,c]=kstest(x,[],0.05,'unequal'); b) Testing result: h=1, p=0.0122, k=0.3542, c=0.3014; Since k>c (h=1), we also reject, at the 5% level of significance, the null hypothesis that the sample data follow the standard normal distribution. • Scenario #3: α=0.01 a) MATLAB code: [h,p,k,c]=kstest(x,[],0.01,'unequal'); b) Testing result: h=0, p=0.0122, k=0.3542, c=0.3612; Since k<c (h=0), we fail to reject, at the 1% level of significance, the null hypothesis that the sample data follow the standard normal distribution.
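As a cross-check on the reported statistic k, the one-sample distance can be recomputed with the Python standard library alone. This is a sketch, not a reimplementation of MATLAB's kstest: the function names are hypothetical, the critical values are simply the ones quoted on the slides, and no p-value is computed.

```python
import math

def std_normal_cdf(x):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

def ks_one_sample(sample, cdf):
    """Largest vertical gap between the ECDF of sample and a reference cdf."""
    ys = sorted(sample)
    n = len(ys)
    d = 0.0
    for i, y in enumerate(ys, start=1):
        f = cdf(y)
        # The ECDF jumps from (i-1)/n to i/n at y; check both sides of the step.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

x = [-4.0 + 0.5 * k for k in range(19)]        # same grid as x=-4:0.5:5
D = ks_one_sample(x, std_normal_cdf)
print("D =", round(D, 4))

# Critical values as quoted on the slides for n = 19.
critical = {0.10: 0.2714, 0.05: 0.3014, 0.01: 0.3612}
for alpha, c in critical.items():
    decision = "reject H0" if D > c else "fail to reject H0"
    print(f"alpha={alpha}: c={c}, {decision}")
```

The largest gap occurs at x = 1.5, where the empirical cdf has reached only 11/19 just below the point while the standard normal cdf is already about 0.933.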
Two-sample K-S test • Hypothesis testing: H0: The two sets of data have the same distribution. Ha: The two sets of data do not have the same distribution. • The first sample data set X1 is evenly spaced with values ranging from -2.0 to 1.0, while the numbers in the second set X2 come from a function that generates standard normal random variables with μ=0, σ2=1. Both data sets have sample size 16. The values of the two data sets are as follows: X1: -2.0 -1.8 -1.6 -1.4 -1.2 -1.0 -0.8 -0.6 -0.4 -0.2 0.0 0.2 0.4 0.6 0.8 1.0 X2: -0.1241 1.4897 1.4090 1.4172 0.6715 -1.2075 0.7172 1.6302 0.4889 1.0347 0.7269 -0.3034 0.2939 -0.7873 0.8884 -1.1471
Two-sample K-S test (cont'd) • The MATLAB syntax for the test is: X1=-2:0.2:1, meaning the values of X1 go from -2 to 1 in even steps of 0.2; X2=randn(16,1), which generates 16 standard normal random variables; [h,p,k]=kstest2(X1,X2,alpha,type), where h, p, k, alpha and type are defined as in the one-sample test; • We test under two scenarios: a) alpha=0.05 and b) alpha=0.1, both with type='unequal'. • Scenario #1: α=0.05 a) MATLAB code: [h,p,k]=kstest2(X1,X2,0.05,'unequal'); b) Testing result: h=0, p=0.0657, k=0.4375;
Two-sample K-S test (cont'd) • The function kstest2 does not return the critical value. However, since h=0, the test fails to reject, at the 5% level of significance, the null hypothesis that the two sets of data come from the same distribution. • Scenario #2: α=0.1 a) MATLAB code: [h,p,k]=kstest2(X1,X2,0.1,'unequal'); b) Testing result: h=1, p=0.0657, k=0.4375; Since h=1 now, the test rejects, at the 10% level of significance, the null hypothesis that the two sets of data come from the same distribution.
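The two-sample statistic for the listed X1 and X2 can be recomputed with the sketch below. It is not MATLAB's kstest2, and the function names are hypothetical. The p-value uses a widely used asymptotic series for the Kolmogorov distribution with a small-sample adjustment to the scale factor. It is assumed here, not confirmed by the slides, that the software does something comparable.

```python
import math

def ks_two_sample(a, b):
    """Maximum absolute difference between the ECDFs of samples a and b."""
    n1, n2 = len(a), len(b)
    d = 0.0
    for x in sorted(set(a) | set(b)):          # ECDFs only change at data points
        e1 = sum(v <= x for v in a) / n1
        e2 = sum(v <= x for v in b) / n2
        d = max(d, abs(e1 - e2))
    return d

def ks_asymptotic_p(d, n1, n2, terms=100):
    """Approximate two-sided p-value from the Kolmogorov limiting distribution."""
    n_eff = n1 * n2 / (n1 + n2)
    # Small-sample adjustment of the scale factor (assumed convention).
    lam = (math.sqrt(n_eff) + 0.12 + 0.11 / math.sqrt(n_eff)) * d
    s = sum((-1) ** (j - 1) * math.exp(-2 * j * j * lam * lam)
            for j in range(1, terms + 1))
    return min(max(2 * s, 0.0), 1.0)

X1 = [round(-2.0 + 0.2 * k, 1) for k in range(16)]
X2 = [-0.1241, 1.4897, 1.4090, 1.4172, 0.6715, -1.2075, 0.7172, 1.6302,
      0.4889, 1.0347, 0.7269, -0.3034, 0.2939, -0.7873, 0.8884, -1.1471]

D = ks_two_sample(X1, X2)
p_value = ks_asymptotic_p(D, len(X1), len(X2))
print("D =", D, "p ~", round(p_value, 4))
for alpha in (0.05, 0.10):
    print(alpha, "reject H0" if p_value < alpha else "fail to reject H0")
```

Because the approximate p-value falls between 0.05 and 0.10, the decision flips between the two scenarios, which matches the h values reported above.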
4. Inferences on Several Independent Samples—Kruskal-Wallis Test
William Henry Kruskal (Oct 10, 1919–Apr 21, 2005) • Born in New York City • Mathematician and statistician • President of the Institute of Mathematical Statistics (1971) • President of the American Statistical Association (1982)
Wilson Allen Wallis (1912–Oct 12, 1998) • B.A. in psychology, University of Minnesota • Economist and statistician • President of the University of Rochester (1962-1982) • Under Secretary of State for Economic, Business, and Agricultural Affairs (1985-1989)
Kruskal-Wallis Test • Definition: a nonparametric (distribution-free) test • Compares three or more independent groups of sampled data
Hypothesis • Null hypothesis (H0): the samples come from identical populations. • Alternative hypothesis (Ha): the samples come from different populations.
Steps 1. Arrange the data of all samples in a single series in ascending order. Note: If there are repeated values, assign each of them the average of the rank positions they occupy. 2. Separate the ranks by sample and sum them to obtain R1, R2, R3, ...
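Steps 3. Test statistic: $H = \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(n+1)$, where H = Kruskal-Wallis test statistic, n = total number of observations in all samples, $R_i$ = rank sum of the i-th sample, $n_i$ = number of observations in the i-th sample, and k = number of samples. 4. Rejection region: Reject H0 if H is greater than the chi-square table value with k-1 degrees of freedom at significance level α, and conclude that the samples come from different populations.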
Example 4.1 • An experiment was done to compare four different methods of teaching the concept of percentage to sixth graders. The experimental units were 28 classes, which were randomly assigned to the four methods, seven classes per method. A 45-item test was given to all classes. The average test scores of the classes are summarized in Table 4.1. Apply the Kruskal-Wallis test to the test scores data.
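The value of the Kruskal-Wallis test statistic equals $H = \frac{12}{28 \cdot 29} \cdot \frac{49^2 + 66.5^2 + 125.5^2 + 165^2}{7} - 3(29) = 18.134$. Since $H > \chi^2_{3,.005} = 12.838$, the P-value < .005, from which we can conclude that there are significant differences among the four teaching methods.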
SAS codes
    data test;
    input methodname $ scores;
    cards;
    case 14.59
    formula 20.27
    case 23.44
    case 25.43
    case 18.15
    …
    unitary 36.43
    unitary 37.04
    unitary 29.76
    unitary 33.88
    ;
    proc npar1way data=test wilcoxon;
    class methodname;
    var scores;
    run;
SAS output:
Wilcoxon Scores (Rank Sums) for Variable scores
Classified by Variable methodname
                       Sum of     Expected     Std Dev       Mean
    methodname    N    Scores     Under H0     Under H0      Score
    case          7     49.00      101.50      18.845498      7.000000
    formula       7     66.50      101.50      18.845498      9.500000
    equation      7    125.50      101.50      18.845498     17.928571
    unitary       7    165.00      101.50      18.845498     23.571429
Average scores were used for ties.
Kruskal-Wallis Test
    Chi-Square           18.1390
    DF                         3
    Pr > Chi-Square       0.0004
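The Kruskal-Wallis statistic can be re-derived from the rank sums in the SAS output with a short Python sketch. The raw scores are elided in the slides, so only the rank sums are used and no tie correction is applied; that is presumably why SAS's tie-adjusted 18.1390 is slightly larger. The helper `chi2_sf_df3` is a hypothetical name; it uses the closed form of the chi-square upper tail that holds for 3 degrees of freedom only.

```python
import math

# Rank sums and group sizes taken from the SAS output above.
rank_sums = {"case": 49.0, "formula": 66.5, "equation": 125.5, "unitary": 165.0}
sizes = {name: 7 for name in rank_sums}

n = sum(sizes.values())                       # 28 classes in total
assert sum(rank_sums.values()) == n * (n + 1) / 2   # ranks 1..n must sum to n(n+1)/2

# H = 12 / (n(n+1)) * sum(R_i^2 / n_i) - 3(n+1), without tie correction.
H = 12 / (n * (n + 1)) * sum(rank_sums[g] ** 2 / sizes[g] for g in rank_sums) \
    - 3 * (n + 1)

def chi2_sf_df3(x):
    """P(chi-square with 3 df > x); closed form valid for df = 3 only."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

p_value = chi2_sf_df3(H)
print("H =", round(H, 3), "p ~", round(p_value, 4))
```

The resulting p-value is well below .005, in line with both the table lookup and the SAS output.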