
Nonparametric Methods




Presentation Transcript


  1. Nonparametric Methods Prepared by Yu-Fen Li

  2. Parametric tests • Earlier, the populations from which the data were sampled were assumed to be either normally distributed or approximately so • this property is necessary for the tests to be valid, e.g. Z-test, t-test, and ANOVA • Since the forms of the underlying distributions are assumed to be known and only the values of certain parameters, such as means and standard deviations, are not, these tests are said to be parametric

  3. Nonparametric tests • The data are not required to be normally distributed • Nonparametric techniques make fewer assumptions about the nature of the underlying distribution. • They are also called distribution-free methods • We will review three different nonparametric methods • two for one-sample or paired data (the sign test & the Wilcoxon signed rank test) • one for non-paired two-sample data (the Wilcoxon rank sum test)

  4. The Sign Test • The sign test may be used to compare two samples of observations when the populations from which the values are drawn are not independent. • It is similar to the paired t-test • Like the t-test, it does not examine the two groups individually; instead, it focuses on the difference in values for each pair. • However, it does not require that the population of differences be normally distributed.

  5. The Sign Test • The null hypothesis for paired data: in the underlying population of differences among pairs, the median difference is equal to 0 • Ho: mediandiff = 0 vs Ha: mediandiff ≠ 0 • For the one-sample scenario, the null hypothesis is that the population median is equal to a constant, Mo (e.g., 170) • Ho: median = 170 vs Ha: median ≠ 170

  6. The Sign Test • Steps for paired data: • select a random sample of pairs of observations • calculate the difference for each pair of observations • if the difference is greater than 0, the pair is assigned a plus sign; if it is less than 0, it receives a minus sign • count the number of plus signs in the sample; this total is denoted by D Note: any pairs with a zero difference are excluded from the analysis, so the sample size needs to be adjusted accordingly
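A minimal sketch of these steps in Python (the paired values here are made-up placeholders, not data from the slides; NumPy is assumed to be available):

```python
import numpy as np

# hypothetical paired measurements (placeholders, not the REE data used later)
x = np.array([7.2, 6.8, 8.1, 5.9, 7.7, 6.5])
y = np.array([6.9, 6.8, 7.4, 6.3, 7.1, 6.0])

d = x - y               # difference for each pair
d = d[d != 0]           # pairs with a zero difference are dropped
n = len(d)              # adjusted sample size
D = int(np.sum(d > 0))  # number of plus signs
print(n, D)
```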

  7. The Sign Test • Steps for one-sample data: • select a random sample of observations • if a value is greater than Mo, the observation is assigned a plus sign; if it is less than Mo, it receives a minus sign • count the number of plus signs in the sample; this total is denoted by D Note: any value equal to Mo is excluded from the analysis, so the sample size needs to be adjusted accordingly

  8. Bernoulli and Binomial distributions • Under the null hypothesis, we would expect to have approximately equal numbers of plus and minus signs. • Signs can be thought of as the outcomes of a Bernoulli r.v. with probability of success p = 0.5. • The total number of plus signs D is a binomial r.v. with parameters n and p • the mean number of plus signs in a sample of size n is np = n/2, and the standard deviation is √(np(1−p)) = √n/2

  9. The Sign Test • If the null hypothesis is true and the value of n is sufficiently large, then z+ = (D − n/2) / (√n/2) approximately follows a standard normal distribution • This test is called the sign test because it depends only on the signs of the calculated differences, not on their actual magnitudes.
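A sketch of this large-sample version of the test, assuming SciPy is available; sign_test_normal is a hypothetical helper name, and the example call uses the REE figures (n = 13, D = 11) from the slides that follow:

```python
import numpy as np
from scipy.stats import norm

def sign_test_normal(D, n):
    """Sign test via the normal approximation: z = (D - n/2) / (sqrt(n)/2)."""
    mean = n / 2                    # expected number of plus signs under Ho
    sd = np.sqrt(n) / 2             # sqrt(n * 0.5 * 0.5)
    z = (D - mean) / sd
    p_two_sided = 2 * norm.sf(abs(z))
    return z, p_two_sided

print(sign_test_normal(11, 13))     # z ≈ 2.50, p ≈ 0.012
```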

  10. The Sign Test • If the sample size n is small, less than about 20, the test statistic z+ cannot always be assumed to have a standard normal distribution. • we can use the binomial distribution itself to calculate the probability of observing D positive signs or some number more extreme given that Ho is true
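A sketch of the exact small-sample version, again assuming SciPy; binom.sf(D - 1, n, 0.5) gives the probability of D or more plus signs under Ho, and sign_test_exact is a hypothetical helper name:

```python
from scipy.stats import binom

def sign_test_exact(D, n):
    """Exact sign test p-value computed from the binomial distribution."""
    p_upper = binom.sf(D - 1, n, 0.5)   # P(at least D plus signs)
    p_lower = binom.cdf(D, n, 0.5)      # P(at most D plus signs)
    # two-sided p-value: twice the smaller tail, capped at 1
    return min(2 * min(p_upper, p_lower), 1.0)
```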

  11. Example: the sign test • The data are measurements of resting energy expenditure (REE) for samples of 13 patients with cystic fibrosis and 13 healthy individuals matched to the patients on age, sex, height, and weight.

  12. Example: the sign test • Test statistic (D = 11): z+ = (11 − 6.5)/(√13/2) ≈ 2.50 • p = 2(0.006) = 0.012; since p is less than 0.05, we reject the null hypothesis and conclude that the median difference among pairs is not equal to 0

  13. Example: the sign test • As the sample size is small, we can use the binomial distribution itself to calculate the probability of observing D positive signs or some number more extreme, given that Ho is true. • Because D ~ B(13, 0.5), P(D ≥ 11) = [C(13,11) + C(13,12) + C(13,13)]/2^13 = 92/8192 ≈ 0.0112 • Since we are interested in the two-sided test, the corresponding p-value is approximately 2(0.0112) = 0.0224. • Again, we reject the null hypothesis at the 0.05 level of significance
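Checking the slide's arithmetic with SciPy's binomial distribution (a verification sketch, not part of the original example):

```python
from scipy.stats import binom

n, D = 13, 11
p_one_sided = binom.sf(D - 1, n, 0.5)   # P(D >= 11) = 92 / 2**13
print(round(p_one_sided, 4))            # 0.0112
print(2 * p_one_sided)                  # ≈ 0.022, the two-sided p-value
```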

  14. The Wilcoxon Signed-Rank Test • The sign test ignores the magnitudes of the differences, so it is not often used in practice. • Instead, the Wilcoxon signed-rank test can be used to compare two samples from populations that are not independent • Like the sign test, it does not require that the differences be normally distributed. • However, it does take into account the magnitudes of the differences as well as their signs.

  15. Ho and Ha • The null hypothesis for paired data: in the underlying population of differences among pairs, the median difference is equal to 0 • Ho: mediandiff = 0 vs Ha: mediandiff ≠ 0 • For the one-sample scenario, the null hypothesis is that the population median is equal to a constant, Mo (e.g., 170) • Ho: median = 170 vs Ha: median ≠ 170

  16. The Wilcoxon Signed-Rank Test To conduct the Wilcoxon signed-rank test, we proceed as follows: 1) selecting a random sample of n pairs of observations; 2) calculating the difference for each pair; 3) ranking the absolute values of the differences from smallest to largest: a difference of 0 is not ranked and is eliminated from the analysis (i.e. the sample size is reduced by 1 for each pair eliminated), and tied values are assigned an average rank;

  17. The Wilcoxon Signed-Rank Test 4) assigning each rank either a plus or a minus sign depending on the sign of the difference; 5) computing the sum of the positive ranks and the sum of the negative ranks; 6) denoting the smaller sum, ignoring the signs, by T; 7) calculating the test statistic zT = (T − μT)/σT (which, if Ho is true and n is large enough, approximately follows a standard normal distribution), where μT = n(n+1)/4 and σT = √(n(n+1)(2n+1)/24)
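A sketch of steps 1 through 7 in Python, assuming SciPy; wilcoxon_signed_rank is a hypothetical helper name, and scipy.stats.rankdata handles the average ranks for ties:

```python
import numpy as np
from scipy.stats import rankdata, norm

def wilcoxon_signed_rank(x, y):
    """Wilcoxon signed-rank test for paired data (normal approximation)."""
    d = np.asarray(x, float) - np.asarray(y, float)
    d = d[d != 0]                      # a difference of 0 is eliminated
    n = len(d)
    ranks = rankdata(np.abs(d))        # rank |d|; ties get the average rank
    pos_sum = ranks[d > 0].sum()       # sum of the positive ranks
    neg_sum = ranks[d < 0].sum()       # sum of the negative ranks
    T = min(pos_sum, neg_sum)          # smaller sum, ignoring the signs
    mu_T = n * (n + 1) / 4
    sigma_T = np.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (T - mu_T) / sigma_T
    return T, z, 2 * norm.sf(abs(z))
```

SciPy also ships scipy.stats.wilcoxon, which performs the same test with its own options for handling zero differences and small samples.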

  18. The Wilcoxon Signed-Rank Test • If n is small, the test statistic cannot be assumed to follow a standard normal distribution. • In this case, we use Table A.6 in Appendix A to help us determine whether we should reject the null hypothesis • Table A.6 displays the distribution function of the smaller sum of ranks T for samples of size n ≤ 12 • Note that Table A.6 provides the p-value of the one-sided test of hypothesis.

  19. Table A.6 in Appendix A • The possible values of T, represented by To, are listed down the left-hand side of the table; the sample sizes n are displayed across the top. • For each combination of To and n, the entry in the table is the probability that T ≤ To. • If n = 8, for instance, the probability that T ≤ 5 is 0.0391. • The p-value of the appropriate two-sided test is approximately 2(0.0391) = 0.0782
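The 0.0391 entry can be reproduced by enumerating all 2^8 equally likely sign patterns under Ho; a small sketch:

```python
from itertools import product

n, T0 = 8, 5
# under Ho each rank 1..n carries a + or - sign with probability 1/2;
# count the patterns whose positive-rank sum is at most T0
count = sum(
    1 for signs in product([0, 1], repeat=n)
    if sum(r for r, s in zip(range(1, n + 1), signs) if s) <= T0
)
print(count / 2 ** n)       # 10/256 ≈ 0.0391, the one-sided table entry
print(2 * count / 2 ** n)   # ≈ 0.0782, the two-sided p-value
```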

  20. Example: The Wilcoxon Signed-Rank Test • Because the sum of the positive ranks is 86 and the sum of the negative ranks is −19, T = 19 • zT = (T − μT)/σT = (19 − 52.5)/15.93 ≈ −2.10, where, with n = 14 pairs (the rank sums total 86 + 19 = 105 = 14(15)/2), μT = 14(15)/4 = 52.5 and σT = √(14(15)(29)/24) ≈ 15.93

  21. Example: The Wilcoxon Signed-Rank Test • The area under the standard normal curve to the left of z = −2.10 plus the area to the right of z = 2.10 is p = 2(0.018) = 0.036. • Since p < 0.05, we reject the null hypothesis and conclude that the median difference is not equal to 0
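A quick numeric check of this example (n = 14 is inferred from the rank sums totaling 105; a verification sketch, assuming SciPy):

```python
import numpy as np
from scipy.stats import norm

n, T = 14, 19
mu_T = n * (n + 1) / 4                              # 52.5
sigma_T = np.sqrt(n * (n + 1) * (2 * n + 1) / 24)   # ≈ 15.93
z = (T - mu_T) / sigma_T                            # ≈ -2.10
print(z, 2 * norm.sf(abs(z)))                       # p ≈ 0.036
```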

  22. The Wilcoxon Rank Sum Test • The Wilcoxon rank sum test is used to compare two samples that have been drawn from independent populations. • Consequently, it is a nonparametric counterpart of the two-sample t-test. • Unlike the t-test, it does not require that the underlying populations be normally distributed or that their variances be equal.

  23. The Wilcoxon Rank Sum Test • It tests whether two independent samples of observations are drawn from the same or identical distributions • When the two distributions are assumed to have the same general shape, the Wilcoxon rank sum test evaluates the null hypothesis that the medians of the two populations are identical. • It is also known as the Mann-Whitney U test

  24. The Wilcoxon Rank Sum Test To conduct the Wilcoxon rank sum test, we proceed as follows: 1) selecting a random sample from each of the populations of interest; 2) combining the two samples into one large group; 3) ranking the observations from smallest to largest; tied values are assigned an average rank; 4) computing the sum of the ranks corresponding to each of the original samples;

  25. The Wilcoxon Rank Sum Test 5) denoting the smaller of the two sums by W; 6) calculating the test statistic zW = (W − μW)/σW (which, if Ho is true and the samples are large enough, approximately follows a standard normal distribution) • where μW = nS(nS + nL + 1)/2 and σW = √(nSnL(nS + nL + 1)/12) • In these equations, nS represents the number of observations in the sample that has the smaller sum of ranks and nL the number of observations in the sample with the larger sum
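A sketch of steps 1 through 6 in Python, assuming SciPy; wilcoxon_rank_sum is a hypothetical helper name:

```python
import numpy as np
from scipy.stats import rankdata, norm

def wilcoxon_rank_sum(a, b):
    """Wilcoxon rank sum test for two independent samples (normal approximation)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    ranks = rankdata(np.concatenate([a, b]))   # rank the combined group; ties averaged
    sum_a, sum_b = ranks[:len(a)].sum(), ranks[len(a):].sum()
    if sum_a <= sum_b:
        W, nS, nL = sum_a, len(a), len(b)      # W is the smaller rank sum
    else:
        W, nS, nL = sum_b, len(b), len(a)
    mu_W = nS * (nS + nL + 1) / 2
    sigma_W = np.sqrt(nS * nL * (nS + nL + 1) / 12)
    z = (W - mu_W) / sigma_W
    return W, z, 2 * norm.sf(abs(z))
```

An equivalent test is available as scipy.stats.mannwhitneyu, which reports the Mann-Whitney U statistic instead of the rank sum.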

  26. Example • Consider the distribution of normalized mental age scores for two samples of children suffering from phenylketonuria (PKU) • Compare the normalized mental age scores for two populations of children, with average daily phenylalanine levels dichotomized at 10 mg/dl; one group contains nL = 21 children and the other nS = 18 • Some observations have the same normalized mental age score, so they are assigned an average rank, e.g. (4+5)/2 = 4.5, (22+23+24)/3 = 23, and (38+39)/2 = 38.5

  27. Example • The sum of ranks in the low exposure group is 467, and the sum in the high exposure group is 313; therefore, W = 313, nS = 18, and nL = 21. • In addition, μW = 18(18 + 21 + 1)/2 = 360, σW = √(18(21)(40)/12) = √1260 ≈ 35.5, and zW = (313 − 360)/35.5 ≈ −1.32 • Since p = 2(0.093) = 0.186 is greater than 0.05, we do not reject the null hypothesis.
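Plugging the slide's PKU numbers into the same formulas as a check (a verification sketch, assuming SciPy):

```python
import numpy as np
from scipy.stats import norm

nS, nL, W = 18, 21, 313
mu_W = nS * (nS + nL + 1) / 2                    # 360
sigma_W = np.sqrt(nS * nL * (nS + nL + 1) / 12)  # sqrt(1260) ≈ 35.5
z = (W - mu_W) / sigma_W                         # ≈ -1.32
print(z, 2 * norm.sf(abs(z)))                    # p ≈ 0.186
```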

  28. Advantages of Nonparametric Methods Nonparametric techniques have several advantages over traditional methods: 1) nonparametric methods do not require the restrictive assumptions of parametric tests, e.g. normality of the underlying populations; 2) nonparametric methods can be performed relatively quickly for small samples; 3) ranks are less sensitive to measurement error; 4) nonparametric methods work for ordinal data, whereas parametric tests are usually not appropriate for ordinal data.

  29. Disadvantages of Nonparametric Methods Nonparametric methods have a number of disadvantages: 1) nonparametric methods are less powerful than the comparable parametric technique if the assumptions underlying a parametric test are satisfied; 2) the hypotheses tested by nonparametric methods are less specific; 3) T and W are overestimated if a large proportion of the observations are tied.

  30. Example: the distributions have the same general shape • The P value of the nonparametric tests answers this question: What is the chance that a randomly selected value from the population with the larger mean rank is greater than a randomly selected value from the other population(s)? • If the distributions of the populations have the same shape, then the nonparametric test can be considered as a test of medians.

  31. Example • The two sets of numbers have an identical mean (43.5) and an identical median (27.5), but their mean ranks differ: the rank sums are 269 and 397, and the rank sum test gives p = 0.0429.
