Wilcoxon Rank Sum Test: Comparing Population Locations

Ch 19 實習(1)

Introduction • 無母數統計分析(Non-parametric Statistics) • 母體分配未知，且樣本數不是很大 • 優點： • 對母體的假設少，不需要假設母體是什麼分配 • 對小樣本的資料，或是有偏斜分配的母體做推論比較合適 • 可以分析類別資料或順序資料 • 缺點： • 檢定力(1-β)較弱 • 對某些較複雜的模式如有交互作用的多因子設計無法做檢定 • 處理方式不一

Introduction • The statistical techniques introduced in this chapter deal with ordinal data. • We test to determine whether the population locations differ. • In testing the locations we will not refer to any parameter, thus the procedure’s name.

Introduction • When comparing two populations the hypotheses generally are: H0: The population locations are the same H1: (i) The locations differ, or (ii) Population 1 is located to the right (left) of population 2 The random variable X1 is generally larger (smaller) than X2.

Wilcoxon Rank Sum Test(Mann- Whitney Test) • The problem characteristics of this test are: • The problem objective is to compare two populations. • The data are either ordinal or interval (but not normal). • The samples are independent.

Wilcoxon Rank Sum Test – Example • Example 19.1 • Based on the two samples shown below, can we infer at 5% significance level that the location of population 1 is to the left of the location of population 2? • Sample 1: 22, 23, 20; Sample 2: 18, 27, 26;The hypotheses are:H0: The two population locations are the same. H1: The location of population 1 is to the left of the location of population 2.

Sum of ranks = 41 Sum of ranks = 37 1 2 3 4 5 6 7 8 9 10 11 12 Graphical DemonstrationWhy use the sum of ranks to test locations? If the locations of the two populations are about the same, (the null hypothesis is true) we would expect the ranks to be evenly spread between the samples. In this case the sum of ranks for the two samples will be close to one another. Two hypothetical populations and their corresponding samples are presented, the GREEN population and the PURPLE population. Populations Let us rank the observations of the two samples together

Graphical DemonstrationWhy use the sum of ranks to test locations? Allow the GREENpopulation to shift to the left of the PURPLEpopulation.

Sum of ranks = 41 Sum of ranks = 37 Sum of ranks = 40 Sum of ranks = 38 Sum of ranks = 33 Sum of ranks = 45 2 1 2 3 4 5 6 6 7 7 8 9 9 10 11 12 Graphical DemonstrationWhy use the sum of ranks to test locations? The green sample is expected to shift to the left too. As a result, several observations exchange location. What happens to the sum of ranks? Click.

Sum of ranks = 41 Sum of ranks = 37 Sum of ranks = 40 Sum of ranks = 38 Sum of ranks = 33 Sum of ranks = 45 Graphical DemonstrationWhy use the sum of ranks to test locations? 1 3 4 5 8 10 11 12 9 2 6 7 The “green” sum decreases , and the “purple” sum increases. Changing the relative location of two populations affect the sum of ranks of the two samples combined.

Sample 1 22 23 20 Sample 2 18 27 26 3 4 2 1 6 5 Rank Rank 2. Calculate the sum of ranks: 9 2. Calculate the sum of ranks:12 Wilcoxon Rank Sum Test – Example • Example 19.1 – continued • Test statistic1. Rank all the six observations (1 for the smallest). 3. Let T = 9 be the test statistic (We arbitrarily define the test statistic as the rank sum of sample 1.

Wilcoxon Rank Sum Test – Rationale • Example 19.1 - continued • If T is sufficiently small then most of the smaller observations are located in population 1. Reject the null hypothesis. • Question: How small is sufficiently small? • We need to look at the distribution of T.

The distribution of T under H0for two samples of size 3 This sample received the ranks 1, 2, 3 This sample received the ranks 3, 4, 5 .15 2,3,4 2,3,5 2,4,5 3,4,5 .10 1,3,4 1,3,5 1,4,5 2,3,6 2,4,6 3,4,6 .05 1,2,3 1,2,4 1,2,5 1,2,6 1,3,6 1,4,6 1,5,6 2,5,6 3,5,6 4,5,6 T 6 7 8 9 10 11 12 13 14 15 T is the rank sum of a sample of size 3. If H0 is true (the two populations have the same location), each ranking is equally likely, and each possible value of T has the same probability = 1/20

The distribution of T under H0for two samples of size 3 .15 2,3,4 2,3,5 2,4,5 3,4,5 .10 1,3,4 1,3,5 1,4,5 2,3,6 2,4,6 3,4,6 .05 1,2,3 1,2,4 1,2,5 1,2,6 1,3,6 1,4,6 1,5,6 2,5,6 3,5,6 4,5,6 T 6 7 8 9 10 11 12 13 14 15 The significance level is 5%, and under H0 P(T £ 6) = .05. Thus, the critical value of T is 6.

Wilcoxon Rank Sum Test – Example • Example 19.1 - continued Conclusion H0 is rejected if T£6. Since T = 9, there is insufficient evidence to conclude that population 1 is located to the left of population 2, at the 5% significance level.

Critical values of the Wilcoxon Rank Sum Test a = .025 for one tail test, or a = .05 for two tail test TL TU TL TU TL TU TL TU 11 25 Using the table: For given two samples of sizes n1 and n2,P(T≦TL)=P(T≧TU)= a. A similar table exists for a = .05 (one tail test) and a = .10 (two tail test)

n1(n1 + n2 + 1) 2 E(T) = Wilcoxon rank sum test for samples where n > 10 • The test statistic is approximately normally distributed with the following parameters: (n1,n2>10) Therefore, Z = T - E(T) sT

Example 1 • Use the Wilcoxon rank sum test on the following data to determine the two population locations differ. (Use a 10% significance level.) • Sample 1: 15 7 22 20 32 18 26 17 23 30 • Sample 2: 8 27 17 25 20 16 21 17 10 18

Solution 1 • H0: The two population locations are the same H1: The location of population 1 is different from the location of population 2 There is not enough evidence to infer that the location of population 1 is different from the location of population 2

Example 2 • Given the following statistics, calculate the value of the test statistic to determine whether the population locations differ. In addition, calculate the P-value. • T1=250 n1=15 • T2=215 n2=15

Solution 2 • H0: The two population locations are the same H1: The location of population 1 is different from the location of population 2 p-value = 2P(Z > .73) = 2(.5 – .2673) = .4654.

Required conditions for nonparametric tests • A rejection of the null hypothesis when performing a nonparametric test can occur due to: • different location • different spread (variance) • different shape (distribution). • Since we are interested in the location, we require that the two probability distributions are identical, except for location.

Sign Test and Wilcoxon Signed Rank Sum Test • Two techniques for matched pairs experiment are introduced. • the objective is to compare two populations. • the data are either ordinal or interval (but not normal). • The samples are matched by pairs.

The Sign Test • This test is employed when: • The problem objective is to compare two populations, and • The data areordinal, and • The experimental design is matched pairs. • The hypotheses H0: The two population locations are the same H1: The two population locations differ or population 1 is right (left) of population 2

The Sign Test –Statistic and Sampling Distribution • A matched pair experiment calls for a test of matched pair differences. • The test statistic and sampling distribution • Recordthe sign of all the matched-pair-differences. • The number of positive (or negative) differences is the test statistic.

The Sign Test - Rationale • The number of positive or negative differences is binomial, with: • n = the number of non-zero differences • p = the probability that a difference is positive (negative) • If the two populations have the same locations (H0 is true), it is expected that Number of positive differences = Number of negative differences Thus, under H0: p = 0.5

The Sign Test - Rationale • The test statistic and sampling distribution • The hypotheses: H0: The two population locations are the same H1:The two population locations are different H0: p = .5 H1: p ¹ .5

The Sign Test –Statistic and Sampling Distribution • The Test – continued • The hypotheses tested H0: p = .5 H1: p ¹ .5 The binomial variable can be approximated by a normal variable if np and n(1-p) >= 5. The Z- statistic becomes

Example 3 • Suppose that in a matched pairs experiment, we find 28 positive differences, 7 zero differences, and 41 negative differences. Can we infer at the 10% significance level that the location of population 1 is to the left of the location 2?

Solution 3 • H0: The two population locations are the same H1: The location of population 1 is to the left of the location of population 2

Example 4 • 兩位評審員對12個參賽選美的候選人進行評分，其評分係依主觀偏好給予0-10分；而評分結果如下(α=5%) • 評審員I 5 6 10 7 0 9 7 10 9 6 9 9 • 評審員II 4 1 7 5 8 5 5 6 8 10 5 4 • 試以sign test 來檢定兩位評審員是否有顯著差異

Solution 4 • H0: 兩位評審員評分沒有差異(p=0.5) H1: 兩位評審員評分有差異(p≠0.5) • 評審員I 5 6 10 7 0 9 7 10 9 6 9 9 • 評審員II 4 1 7 5 8 5 5 6 8 10 5 4 + + + + - + + + + - + + • 10個+,2個-,n=12 • 拒絕域:z>1.96 or z<-1.96 • 拒絕H0,所以兩位評審員評分有差異

Wilcoxon Signed Rank Sum Test • This test is used when • the problem objective is to compare two populations, • the data are interval but not normal, • the samples are matched pairs. • The test statistic and sampling distribution • T is based on rank sum of the absolute values of the positive and negative differences • When n <=30, reject H0 if T>TU or T<TL(TL and TU tabulated values related to n, n is the number of nonzero ). • When n > 30, T is approximately normally distributed. Use a Z-test. • E(T) = n(n+1)/4 standard deviation = [n(n+1)(2n+1)/24]^(1/2)

Example 5 • Perform the Wilcoxon signed rank sum test to determine whether the location of population 1 differs from the location of population 2 given the data shown here. (Use =.05)

Solution 5 • H0: The two population locations are the same H1: The location of population 1 is different from the location of population 2

Example 6 • A matched pairs experiment produced the following statistics. Conduct a Wilcoxon signed rank sum test to determine whether the location of population 1 is to the right of the location of population 2.(Useα=.01)

Solution 6 • H0: The two population locations are the same H1: The location of population 1 is to the right of the location of population 2

Wilcoxon Rank Sum Test: Comparing Population Locations

Wilcoxon Rank Sum Test: Comparing Population Locations

Presentation Transcript