Lecture 9: Nonparametric Methods

Xiaojin YU, Department of Epidemiology and Biostatistics, School of Public Health, Southeast University

Review: Types of data • Qualitative data (categorical data): (1) binary (dichotomous, binomial); (2) multinomial (polytomous); (3) ordinal • Quantitative data

Measures of central tendency: quantitative data • Mean: for normally distributed data • Geometric mean: for positively skewed data that can be transformed to a normal distribution on the log scale • Median: can be used for any data; in practice it is mostly used for non-normal data.

Measures of dispersion: quantitative data • Range • Interquartile range • Variance and standard deviation • Coefficient of variation

Compare means by t-test • Assumptions: normality, equality of variance
Type                   Conditions                 H0          ν
Single-sample t-test   x̄, S, n, μ0                μ = μ0      n − 1
Paired t-test          d̄, Sd, n                   μd = 0      n − 1 (n = number of pairs)
Two-group t-test       x̄1, s1, n1; x̄2, s2, n2     μ1 = μ2     n1 + n2 − 2

Comparison of Means between two groups

Are the 2 population proportions equal or not? How are categorical variables distributed between the 2 populations? Compare proportions with the chi-square test.

Solution • H0: πA = πB • H1: πA ≠ πB, α = 0.05 • Calculate T (the expected frequencies) and the chi-square test statistic

Test Statistic

Conclusion • Since 6.573 > 3.84, P < 0.05, so we reject H0 and accept H1 at the 0.05 level. We conclude that the two populations are not homogeneous with respect to the effect of the drug: the effects of drug A and drug B are not equivalent.

Compare Ordinal data

OUTLINE • Basic logic of rank-based methods • Rank sum test for 2 independent groups (completely randomized design) • Signed rank test for paired design • Rank sum test for 3 or more independent groups (completely randomized design) • Multiple comparison

Rank & Rank Sum • Review of median • Example: duration of hospital stay • Months: 3.1 5.5 6.0 10.2 11.9 • Rank: 1 2 3 4 5

Task for you • How can we tell whether the boys or the girls are taller if no measuring is allowed?

Solution to the task [Figure: blue = male, red = female]

Small values sit at the front (small ranks) and large values at the back (large ranks).

Part I: Wilcoxon Rank Sum Test • Rank sum test for comparing the locations of two populations • Equivalent to the Mann-Whitney test • Review: the t-test for comparing 2 population means requires normality and homogeneity of variance

Rank: 1 2 3.5 3.5 5 6
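
The ranking rule, including the mid-rank for ties, can be sketched in a few lines of Python. This is only an illustration: the helper name `midranks` and the tied example values are hypothetical, while the hospital durations come from the Rank & Rank Sum slide.

```python
def midranks(values):
    """Rank values from smallest (rank 1) to largest; ties share the mean rank."""
    sorted_vals = sorted(values)
    rank_of = {}
    for v in set(values):
        first = sorted_vals.index(v) + 1      # 1-based position of the first copy
        count = sorted_vals.count(v)          # how many observations are tied at v
        rank_of[v] = first + (count - 1) / 2  # mean of ranks first..first+count-1
    return [rank_of[v] for v in values]

hospital_months = [3.1, 5.5, 6.0, 10.2, 11.9]  # durations from the slide
tied_example = [12, 15, 18, 18, 20, 24]        # hypothetical values with one tie

print(midranks(hospital_months))  # [1.0, 2.0, 3.0, 4.0, 5.0]
print(midranks(tied_example))     # [1.0, 2.0, 3.5, 3.5, 5.0, 6.0]
```

The two tied values would occupy ranks 3 and 4, so each receives (3 + 4)/2 = 3.5.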

EXAMPLE 1: Table 9.1 Survival times (minutes) of cats and rabbits without oxygen
Cats                  Rabbits
minutes   rank        minutes   rank
25        9.5         14        1
34        13          15        2
44        15          16        3
46        16          17        4
46        17          19        5
48        18          21        6.5
49        19          21        6.5
50        20          23        8
                      25        9.5
                      28        11
                      30        12
                      35        14
n1=8      T1=127.5    n2=12     T2=82.5

STEP I: Test hypothesis and significance level • H0: M1 = M2, the population locations of survival time for cats and rabbits are equal • H1: M1 ≠ M2, the population locations of survival time for cats and rabbits are not equal • α = 0.05

STEP II: Statistic: assign ranks • Pool the n1 + n2 observations to form a single sample • Rank all observations of the pooled sample from smallest to largest (rank columns 2 and 4 of the table) • Tied values receive mid-ranks
Pooled sample
time   rank     time   rank
14     1        28     11
15     2        30     12
16     3        34     13
17     4        35     14
19     5        44     15
21     6.5      46     16
21     6.5      46     17
23     8        48     18
25     9.5      49     19
25     9.5      50     20
n1=8, T1=127.5; n2=12, T2=82.5

STEP II: Statistic: test statistic T • Calculate the rank sums of the two samples, denoted T1 and T2. • Take the rank sum of the smaller sample as T. • n1 = 8 < n2 = 12, so T = T1 = 127.5. • Check: T1 + T2 = N(N+1)/2 = 210
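
As a check on the hand calculation, the pooling, ranking and summing can be sketched in Python with the Table 9.1 data. The helper `midranks` is a hypothetical name introduced for this sketch. It gives the two cats tied at 46 minutes the mid-rank 16.5 each (instead of 16 and 17), which leaves the rank sum unchanged.

```python
def midranks(values):
    """Rank values from smallest (rank 1) to largest; ties share the mean rank."""
    sorted_vals = sorted(values)
    rank_of = {}
    for v in set(values):
        first = sorted_vals.index(v) + 1
        count = sorted_vals.count(v)
        rank_of[v] = first + (count - 1) / 2
    return [rank_of[v] for v in values]

# Survival times in minutes, from Table 9.1
cats = [25, 34, 44, 46, 46, 48, 49, 50]
rabbits = [14, 15, 16, 17, 19, 21, 21, 23, 25, 28, 30, 35]

# Pool both samples, rank once, then split the ranks back by group.
ranks = midranks(cats + rabbits)
T1 = sum(ranks[:len(cats)])   # rank sum of the cats
T2 = sum(ranks[len(cats):])   # rank sum of the rabbits
N = len(cats) + len(rabbits)

print(T1, T2)                  # 127.5 82.5
print(T1 + T2, N * (N + 1) / 2)  # both 210.0
```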

STEP III: Determine the P value and conclude • From the table in Appendix E, with n1 = 8 and n2 − n1 = 4, the critical interval Tα for α = 0.05 is (58-110) • Since T = 127.5 lies outside this interval, P < 0.05 • H0 is rejected; we conclude that the survival times of cats and rabbits in an environment without oxygen differ. • Cats survive longer without oxygen.

BASIC LOGIC • N = n1 + n2 • Given N, the total rank sum is fixed and can be calculated as N(N+1)/2. • If H0 is true, the total rank sum should be split between the 2 groups in proportion to their sample sizes ni.
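
Applied to Example 1, this logic gives the rank sums expected under H0. The notation E(T_i) for the expected rank sum is introduced here for illustration; it follows from the fact that the average rank of the pooled sample is (N+1)/2.

```latex
\[
E(T_i) = n_i \cdot \frac{N+1}{2}, \qquad
E(T_1) = 8 \cdot \frac{21}{2} = 84, \qquad
E(T_2) = 12 \cdot \frac{21}{2} = 126, \qquad
E(T_1) + E(T_2) = \frac{N(N+1)}{2} = 210 .
\]
```

The observed T1 = 127.5 lies far above 84, which is what the critical interval (58-110) formalizes.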

Normal Approximation • Used when n1 > 10 or n2 − n1 > 10 • A correction is applied when there are ties
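
A commonly used form of this approximation, written in the notation of the earlier steps, is sketched below. It assumes the usual continuity correction of 0.5 and takes t_j to be the number of observations in the j-th group of tied values; both are assumptions about which variant is meant here.

```latex
\[
z = \frac{\left| T - \dfrac{n_1 (N+1)}{2} \right| - 0.5}
         {\sqrt{\dfrac{n_1 n_2 (N+1)}{12}}},
\qquad
z_c = \frac{z}{\sqrt{C}},
\qquad
C = 1 - \frac{\sum_j \left( t_j^{3} - t_j \right)}{N^{3} - N} .
\]
```

Without ties C = 1 and z_c = z; with many ties C < 1, so the corrected statistic is larger.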

EXAMPLE 2: Table 9-2 Results from a clinical trial for hypertension
Effect            Drug A   Drug B   Total   Range of ranks   Average rank   Rank sum A   Rank sum B
0 (ineffective)   17       70       87      1-87             44             748          3080
1 (effective)     25       13       38      88-125           106.5          2662.5       1384.5
2 (healed)        27       37       64      126-189          157.5          4252.5       5827.5
Total             69       120      189     ~                ~              7663         10292
TA = 7663, nA = 69; TB = 10292, nB = 120
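
For grouped ordinal data like Table 9-2, every observation in a category is tied, so each category gets the average of its rank range. The sketch below rebuilds the rank sums from the counts and then applies a normal approximation with tie correction; the formula for z (continuity correction 0.5, correction factor C) is an assumed standard form, and all variable names are illustrative.

```python
from math import sqrt

# Counts per category (ineffective, effective, healed), from Table 9-2
counts_a = [17, 25, 27]
counts_b = [70, 13, 37]

totals = [a + b for a, b in zip(counts_a, counts_b)]
N = sum(totals)

# Average rank of each category: midpoint of its range of ranks.
avg_ranks = []
start = 1
for t in totals:
    end = start + t - 1
    avg_ranks.append((start + end) / 2)
    start = end + 1

T_a = sum(a * r for a, r in zip(counts_a, avg_ranks))
T_b = sum(b * r for b, r in zip(counts_b, avg_ranks))

# Assumed standard normal approximation, using the smaller group (Drug A) as T.
n_a, n_b = sum(counts_a), sum(counts_b)
z = (abs(T_a - n_a * (N + 1) / 2) - 0.5) / sqrt(n_a * n_b * (N + 1) / 12)
C = 1 - sum(t**3 - t for t in totals) / (N**3 - N)  # tie correction factor
z_c = z / sqrt(C)

print(avg_ranks)   # [44.0, 106.5, 157.5]
print(T_a, T_b)    # 7663.0 10292.0
print(round(z_c, 2))
```

Under these assumptions the corrected statistic exceeds 1.96, so the two drugs would be judged different at the 0.05 level.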

Part II: Wilcoxon’s Signed Rank Test • Wilcoxon (1945) • H0: Md = 0 • Example: to illustrate the test procedure, data on 28 patients (14 pairs) from a sequential-analysis, double-blind clinical trial for cancer of the head and neck will be used. (Bakowski MT, et al. Int. J. Radiation Oncology Biology Physics 1978, 4: 115-119)

Wilcoxon’s Signed Rank Test is often used for: 1) Quantitative data in a paired design: the paired t-test requires the differences to be normally distributed; if their distribution is skewed, the signed rank test must be used. 2) Qualitative data: ordinal outcomes in a paired design.

Example 9.3 • 2 treatment groups: radiotherapy + drug (B) and radiotherapy + placebo (A) • The tumor response within three months of completion of treatment was assessed for each patient in terms of complete regression (CR), partial regression (PR), no change (NC) and progression of the disease (P). • Scored from 1 to 5 as follows: • 5 = CR with no subsequent recurrence up to 6 months or more • 4 = CR initially but with a subsequent recurrence within 6 months • 3 = PR • 2 = NC • 1 = P.

STEP I: Test Hypothesis • H0: Md = 0, the population median of the differences is equal to zero • H1: Md ≠ 0, the population median of the differences is not equal to zero • α = 0.05

STEP II: Statistic: assigning ranks • 1) Calculate the differences di = xi − yi and drop all pairs with zero difference. • 2) Rank the absolute values of the non-zero di from smallest to largest so that each di gets a rank; if there is a tie, what should we do?

Ties: These six patients all have absolute differences of 1, so the rank numbers 1, 2, 3, 4, 5 and 6 must be shared among them. That is, they all have a rank of (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5 • 3) Attach the original sign of each di to its rank

Test Statistic T • Number of valid (non-zero) pairs: n = 10 • T+ is the sum of the ranks with positive signs • T− is the sum of the ranks with negative signs • Check: T+ + T− = n(n+1)/2 = 55 • Take T = min(T+, T−), or either one • Here T− = 48, T+ = 7
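
The signed rank steps can be sketched in Python. The individual pair scores are not listed on these slides, so the differences below are hypothetical: they were chosen only to agree with the summary figures (14 pairs, 10 non-zero differences, six of absolute size 1, T+ = 7, T− = 48). The helper name `midranks` is also an assumption of this sketch.

```python
def midranks(values):
    """Rank values from smallest (rank 1) to largest; ties share the mean rank."""
    sorted_vals = sorted(values)
    rank_of = {}
    for v in set(values):
        first = sorted_vals.index(v) + 1
        count = sorted_vals.count(v)
        rank_of[v] = first + (count - 1) / 2
    return [rank_of[v] for v in values]

# HYPOTHETICAL differences d_i = x_i - y_i for 14 pairs (not the trial data)
diffs = [0, 1, -1, -2, 0, -1, 1, -3, -1, 0, -2, -1, -3, 0]

# 1) Drop zero differences.
nonzero = [d for d in diffs if d != 0]
n = len(nonzero)

# 2) Rank the absolute differences, using mid-ranks for ties.
ranks = midranks([abs(d) for d in nonzero])

# 3) Attach the signs and sum the ranks separately.
T_plus = sum(r for d, r in zip(nonzero, ranks) if d > 0)
T_minus = sum(r for d, r in zip(nonzero, ranks) if d < 0)

print(n, T_plus, T_minus)               # 10 7.0 48.0
print(T_plus + T_minus, n * (n + 1) / 2)  # both 55.0
```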

STEP III: Determine the P value and conclude • n < 25: find the critical interval Tα in table 10.3 (P184). • In this example n = 10 and T = 48 or 7. For α = 0.05 the critical interval T0.05 is (8~47). T is not inside the interval, so P < 0.05 and H0 is rejected. We conclude that the results differ between the 2 treatments.

Normal Approximation • When n > 25, table 10.3 no longer applies, so we turn to the normal approximation. • It can be shown that if H0 is true and n is large enough, the distribution of the statistic T is close to a normal distribution with mean n(n+1)/4 and variance n(n+1)(2n+1)/24.

Continuity Correction • A continuity correction is applied to the statistic, and if there are ties the variance is adjusted for them.
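
One standard way to write the resulting statistic is sketched below. It assumes a continuity correction of 0.5 and takes t_j to be the number of absolute differences in the j-th tied group; the symbols μ_T and σ_T are introduced here for illustration.

```latex
\[
\mu_T = \frac{n(n+1)}{4},
\qquad
\sigma_T = \sqrt{\frac{n(n+1)(2n+1)}{24} - \frac{\sum_j \left( t_j^{3} - t_j \right)}{48}},
\qquad
z = \frac{\left| T - \mu_T \right| - 0.5}{\sigma_T} .
\]
```

When there are no ties the sum over j vanishes and σ_T reduces to the untied standard deviation.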

Part III: Kruskal-Wallis Test • Similar to one-way ANOVA /chi-square test • Used to test location of more than 2 populations

Example 9.4 • 24 people are randomly allocated to 1 of 3 groups: no exercise; 20 minutes of jogging per day; or 60 minutes of jogging per day. At the end of a month, each participant rates how depressed they now feel on a Likert scale that runs from 1 ("totally miserable") through to 100 ("ecstatically happy"). Question: Does physical exercise alleviate depression?

[Table: reported depression ratings for the 3 groups, with ranks and rank sums Ri]

Test Hypothesis • H0: M1 = M2 = … = Mk, the 3 populations have the same location • H1: M1, M2, …, Mk are not all equal, that is, at least one of the populations has a median different from the others • α = 0.05

Test Statistic H • Let N = n1 + n2 + n3 • Ri is the sum of the ranks in the i-th sample (here 76.5, 79.5 and 144) • The overall average rank is (N+1)/2 • The average rank of the i-th sample is Ri/ni • The factor 12/{N(N+1)} standardizes the test statistic in terms of the overall sample size N.

Solution to Example • k = 3; R1 = 76.5, n1 = 8; R2 = 79.5, n2 = 8; R3 = 144, n3 = 8 • There are k − 1 = 2 degrees of freedom in this example.
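
The rank sums above can be turned into H with a short calculation. The sketch uses the usual Kruskal-Wallis form H = 12/{N(N+1)} · Σ Ri²/ni − 3(N+1), which is assumed here to be the formula intended by the slide, and checks it against the equivalent form built from the average ranks. Variable names are illustrative.

```python
rank_sums = [76.5, 79.5, 144.0]  # R1, R2, R3 from the example
sizes = [8, 8, 8]                # n1, n2, n3
N = sum(sizes)

# Computational form of the Kruskal-Wallis statistic.
H = 12 / (N * (N + 1)) * sum(R**2 / n for R, n in zip(rank_sums, sizes)) - 3 * (N + 1)

# Equivalent form: weighted squared deviations of the group average ranks
# (Ri/ni) from the overall average rank (N+1)/2.
overall = (N + 1) / 2
H_alt = 12 / (N * (N + 1)) * sum(
    n * (R / n - overall) ** 2 for R, n in zip(rank_sums, sizes)
)

print(round(H, 2), round(H_alt, 2))  # 7.27 7.27
```

The second form makes the logic visible: H grows when the group average ranks move away from the overall average rank of 12.5.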

Adjusted Formula for Ties • tj: the number of individuals within the j-th tied subgroup
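
The usual form of the tie adjustment divides H by a correction factor. The sketch below assumes that standard form; the symbols H_c and C are introduced here for illustration, with t_j as defined above.

```latex
\[
H_c = \frac{H}{C},
\qquad
C = 1 - \frac{\sum_j \left( t_j^{3} - t_j \right)}{N^{3} - N} .
\]
```

Since C ≤ 1, the adjusted H_c is never smaller than H, so ignoring ties is conservative.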

CRITICAL VALUE • Table 11: H critical values • χ² critical values • When n is large enough, H approximately follows a χ² distribution with ν = k − 1 degrees of freedom

Conclusion • k = 3, ν = k − 1 = 2, so the critical value is 5.99. • Since 7.27 > 5.99, P < 0.05 and we reject H0; that is, there is evidence that at least one of the groups differs from the others.
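
The critical value and an approximate P value can be checked without tables in this particular case. With ν = 2 the χ² distribution is an exponential distribution with mean 2, so its upper tail is exp(−x/2); this shortcut holds only for 2 degrees of freedom. The sketch relies on the large-sample χ² approximation for H, and the variable names are illustrative.

```python
from math import exp, log

alpha = 0.05
H = 7.27  # test statistic from the example

# For 2 degrees of freedom: P(chi2 > x) = exp(-x / 2).
critical = -2 * log(alpha)  # value of x whose upper-tail probability is alpha
p_value = exp(-H / 2)       # approximate P value for the observed H

print(round(critical, 2))   # 5.99
print(H > critical, p_value < alpha)  # True True
```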

NONPARAMETRIC test • Nonparametric: not focused on testing hypotheses about the parameters of the population. • Distribution-free: make no assumptions about the form of the distribution of the data; suitable for small samples, or for large samples where parametric assumptions are violated – Use the ranks of the data values rather than the actual values themselves – Loss of power when a parametric test is appropriate

Parametric and non-parametric equivalents

NONPARAMETRIC test • Advantages: applicable to more types of data – Numerical data with an unknown or skewed distribution – Ordinal variables, or measurement data that are available only as ranks • Disadvantages: a waste of information in the data – Loss of power when a parametric test is appropriate
