Nonparametric tests I: Back to basics
Lecture Outline
• What is a nonparametric test?
• Rank tests, distribution free tests and nonparametric tests
• Which type of test to use
MTB > dotplot 'Male' 'Female';
SUBC> same.

[Character dotplots of MALE and FEMALE on a common axis running from 0.32 to 1.12]

MTB > desc 'Male' 'Female'

Variable    N     Mean   Median   TrMean    StDev   SEMean
MALE       50   0.5908   0.5600   0.5770   0.1979   0.0280
FEMALE     50   0.5180   0.4950   0.5102   0.1315   0.0186

Variable      Min      Max       Q1       Q3
MALE       0.2900   1.1300   0.4275   0.7150
FEMALE     0.3200   0.8500   0.4100   0.6125
Lecture Outline
• What is a nonparametric test?
   • What is a parameter?
   • What are examples of non-parametric tests?
• Rank tests, distribution free tests and nonparametric tests
• Which type of test to use
Parameters
• are central to inference in GLM and ANOVA
• and represent assumptions about the underlying processes
LET K1=4.7   # Group 1 mean minus grand mean
LET K2=-2.5  # Group 2 mean minus grand mean
LET K3=10.4  # The grand mean
LET K4=1.9   # Standard deviation of the error
RANDOM 30 'Error'
LET 'Y'=K3+K1*'DUM1'+K2*'DUM2'+K4*'Error'

Group   Fitted value
1       $\mu + \alpha_1$
2       $\mu + \alpha_2$
3       $\mu - \alpha_1 - \alpha_2$

Here $\mu$ is the grand mean (K3) and $\alpha_1$, $\alpha_2$ are the Group 1 and Group 2 deviations from it (K1, K2).
Error has a Normal distribution with zero mean and standard deviation $\sigma$ (K4).
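The same data-generating model can be sketched in Python. This is only an illustration of the Minitab commands above: the effect coding of DUM1 and DUM2 (1, 0, -1) and the split of the 30 observations into three groups of 10 are assumptions, since the slides do not show how the dummy columns were filled in.

```python
import random

# Constants copied from the Minitab session commands.
K1 = 4.7    # Group 1 mean minus grand mean
K2 = -2.5   # Group 2 mean minus grand mean
K3 = 10.4   # The grand mean
K4 = 1.9    # Standard deviation of the error

# ASSUMPTION: effect (sum-to-zero) coding for DUM1 and DUM2, which is what
# makes the Group 3 fitted value equal to mu - alpha1 - alpha2.
DUMMIES = {1: (1, 0), 2: (0, 1), 3: (-1, -1)}


def fitted_value(group):
    """Mean of Y for a group: K3 + K1*DUM1 + K2*DUM2."""
    dum1, dum2 = DUMMIES[group]
    return K3 + K1 * dum1 + K2 * dum2


def simulate(groups, rng):
    """Draw one Y per observation: fitted value plus Normal(0, K4) error."""
    return [fitted_value(g) + K4 * rng.gauss(0.0, 1.0) for g in groups]


# ASSUMPTION: 10 observations per group (the slides only give the total, 30).
groups = [1] * 10 + [2] * 10 + [3] * 10
rng = random.Random(1)  # fixed seed so the sketch is repeatable
y = simulate(groups, rng)

for g in (1, 2, 3):
    print("Group", g, "fitted value", round(fitted_value(g), 1))
```

Every number the simulation needs is a parameter (K1 to K4) or an assumption about the error distribution, which is the sense in which such a model is parametric.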
Parameters
• are central to inference in GLM and ANOVA
• but represent assumptions about the underlying processes
• can be done without in some simple situations – BUT HOW?
Remember ties

Rnk   Wt  Sex    Rnk   Wt  Sex    Rnk   Wt  Sex    Rnk   Wt  Sex
  1  0.29  1      26  0.41  2      51  0.52  1      76  0.65  1
  2  0.32  2      27  0.42  1      52  0.52  2      77  0.66  1
  3  0.34  1      28  0.43  1      53  0.52  2      78  0.67  1
  4  0.34  2      29  0.43  2      54  0.53  2      79  0.67  2
  5  0.34  2      30  0.43  2      55  0.53  2      80  0.67  2
  6  0.36  1      31  0.45  1      56  0.55  2      81  0.67  2
  7  0.36  1      32  0.45  2      57  0.56  1      82  0.68  1
  8  0.37  1      33  0.45  2      58  0.56  1      83  0.71  1
  9  0.37  1      34  0.45  2      59  0.56  1      84  0.72  2
 10  0.37  1      35  0.46  2      60  0.57  1      85  0.73  1
 11  0.37  2      36  0.47  1      61  0.58  2      86  0.75  1
 12  0.37  2      37  0.47  1      62  0.58  2      87  0.75  1
 13  0.38  1      38  0.48  1      63  0.59  1      88  0.77  1
 14  0.38  1      39  0.48  1      64  0.59  2      89  0.78  1
 15  0.38  2      40  0.48  2      65  0.59  2      90  0.78  2
 16  0.38  2      41  0.48  2      66  0.60  1      91  0.78  2
 17  0.39  2      42  0.49  2      67  0.61  1      92  0.82  2
 18  0.40  2      43  0.49  2      68  0.61  2      93  0.83  1
 19  0.40  2      44  0.50  1      69  0.62  1      94  0.85  1
 20  0.40  2      45  0.50  1      70  0.62  1      95  0.85  2
 21  0.41  1      46  0.50  1      71  0.62  2      96  0.88  1
 22  0.41  1      47  0.50  2      72  0.62  2      97  0.98  1
 23  0.41  2      48  0.50  2      73  0.62  2      98  0.98  1
 24  0.41  2      49  0.51  1      74  0.63  1      99  1.05  1
 25  0.41  2      50  0.51  2      75  0.63  2     100  1.13  1
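One common way to "remember ties" is to give every member of a tied group the average of the positions the group occupies. The sketch below assumes that average-rank convention; the function name `midranks` is made up for illustration, and the sample is just the seven smallest weights from the table.

```python
def midranks(values):
    """Return the rank of each value, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value is tied with this one.
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        # Positions start+1 .. end+1 are shared; each gets their average.
        shared = (start + 1 + end + 1) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


# The seven smallest weights in the rank table.
weights = [0.29, 0.32, 0.34, 0.34, 0.34, 0.36, 0.36]
print(midranks(weights))  # [1.0, 2.0, 4.0, 4.0, 4.0, 6.5, 6.5]
```

The three weights of 0.34 sit in positions 3, 4 and 5, so each is ranked 4; the two weights of 0.36 share positions 6 and 7, so each is ranked 6.5.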
[Figure: horizontal axis "Mean Rank", 0 to 100; vertical axis 0 to 140]
The ‘Male’ mean rank = 55.26
The ‘Female’ mean rank = 45.74
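The two mean ranks can be recomputed from the rank table. The sketch below tallies how many males and females share each distinct weight and gives each tied group its average rank. It assumes that Sex 1 is male and Sex 2 is female (the coding is not stated on the slide, but it agrees with the minima and maxima in the descriptive statistics), and `TALLY` and `rank_sums` are illustrative names.

```python
# (weight, number of males, number of females) at each distinct weight,
# tallied from the rank table. ASSUMPTION: Sex 1 = male, Sex 2 = female.
TALLY = [
    (0.29, 1, 0), (0.32, 0, 1), (0.34, 1, 2), (0.36, 2, 0), (0.37, 3, 2),
    (0.38, 2, 2), (0.39, 0, 1), (0.40, 0, 3), (0.41, 2, 4), (0.42, 1, 0),
    (0.43, 1, 2), (0.45, 1, 3), (0.46, 0, 1), (0.47, 2, 0), (0.48, 2, 2),
    (0.49, 0, 2), (0.50, 3, 2), (0.51, 1, 1), (0.52, 1, 2), (0.53, 0, 2),
    (0.55, 0, 1), (0.56, 3, 0), (0.57, 1, 0), (0.58, 0, 2), (0.59, 1, 2),
    (0.60, 1, 0), (0.61, 1, 1), (0.62, 2, 3), (0.63, 1, 1), (0.65, 1, 0),
    (0.66, 1, 0), (0.67, 1, 3), (0.68, 1, 0), (0.71, 1, 0), (0.72, 0, 1),
    (0.73, 1, 0), (0.75, 2, 0), (0.77, 1, 0), (0.78, 1, 2), (0.82, 0, 1),
    (0.83, 1, 0), (0.85, 1, 1), (0.88, 1, 0), (0.98, 2, 0), (1.05, 1, 0),
    (1.13, 1, 0),
]


def rank_sums(tally):
    """Sum the average (tie-adjusted) ranks for males and for females."""
    male_sum = female_sum = 0.0
    used = 0  # how many observations have been ranked so far
    for _weight, n_male, n_female in tally:
        tied = n_male + n_female
        # The tied group occupies ranks used+1 .. used+tied; share the average.
        average_rank = used + (tied + 1) / 2
        male_sum += n_male * average_rank
        female_sum += n_female * average_rank
        used += tied
    return male_sum, female_sum


male_sum, female_sum = rank_sums(TALLY)
n_male = sum(m for _, m, _ in TALLY)
n_female = sum(f for _, _, f in TALLY)
print("Male mean rank:", round(male_sum / n_male, 2))
print("Female mean rank:", round(female_sum / n_female, 2))
```

Because the 100 ranks always add up to 100 × 101 / 2 = 5050, the two rank sums are not independent: once one is known, the other follows.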
MTB > mann-whitney male female

Mann-Whitney Test and CI: MALE, FEMALE

MALE      N = 50   Median = 0.5600
FEMALE    N = 50   Median = 0.4950
Point estimate for ETA1-ETA2 is 0.0500
95.0 Percent CI for ETA1-ETA2 is (-0.0100,0.1200)
W = 2763.0
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.1016
The test is significant at 0.1014 (adjusted for ties)

Cannot reject at alpha = 0.05

• Sum of ranks of 2763 corresponds to a mean rank of 2763/50 = 55.26
• The null hypothesis is better expressed as “the distributions of male and female weights are the same”.
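The two significance levels in the output can be approximated by hand. The sketch below assumes a normal approximation to the rank sum W with a 0.5 continuity correction and the standard tie correction to the variance; the slides do not say how Minitab computes these values, so treat this as one plausible route rather than the package's documented method. The tie-group sizes are counted from the rank table, and `two_sided_p` is an illustrative name.

```python
import math


def two_sided_p(w, n1, n2, tie_sizes=()):
    """Two-sided p-value for rank sum w via a normal approximation."""
    n = n1 + n2
    expected = n1 * (n + 1) / 2
    # Ties shrink the variance; with no ties the correction term is zero.
    tie_term = sum(t ** 3 - t for t in tie_sizes) / (n * (n - 1))
    variance = n1 * n2 / 12 * ((n + 1) - tie_term)
    # ASSUMPTION: a 0.5 continuity correction towards the expected value.
    z = (abs(w - expected) - 0.5) / math.sqrt(variance)
    return math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z)) for z >= 0


# Sizes of the tied groups, counted from the rank table:
# eleven pairs, seven triples, four groups of 4, three of 5 and one of 6.
TIE_SIZES = [2] * 11 + [3] * 7 + [4] * 4 + [5] * 3 + [6]

p_plain = two_sided_p(2763.0, 50, 50)
p_ties = two_sided_p(2763.0, 50, 50, TIE_SIZES)
print("unadjusted:", round(p_plain, 4))
print("adjusted for ties:", round(p_ties, 4))
```

Under the null hypothesis the expected rank sum is 50 × 101 / 2 = 2525, so the observed 2763 is 238 above it. Both approximations come out close to the printed 0.1016 and 0.1014, and the tie adjustment changes very little here.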
Nonparametric vs Parametric

Nonparametric          Parametric
Sign Test              One-sample t-test
Mann-Whitney Test      Two-sample t-test
Spearman Rank Test     Correlation/Regression
Kruskal-Wallis Test    One-way ANOVA
Friedman Test          One-way blocked ANOVA
A rose by any other name...
• Non-parametric tests lack parameters
• Rank tests start by ranking the data
• Distribution-free tests don’t assume a Normal distribution (or any other)
These are mainly but not completely overlapping sets of tests (and some are scale-invariant too).
Fewer assumptions but...
• still some assumptions (including independence)
• limited range of situations
   • no more than 2 x-variables
   • can’t mix continuous and categorical x-variables
• provide p-values but estimation is dodgy
• loss of efficiency if parametric assumptions are upheld
• there is a grand scheme for parametric statistics (GLM) but a lot of separate strange names for nonparametrics
When is there a choice?
• when there is a non-parametric test
• fewer than two or three variables altogether
• and prediction is not required
How to choose:
• If the assumptions of the parametric test are upheld, use it, on grounds of efficiency
• If they are not upheld, consider fixing the assumptions (e.g. by transforming the data, as in the practical)
• If the assumptions are not fixable, use the nonparametric test
MTB > dotplot 'LogM' 'LogF';
SUBC> same.

[Character dotplots of LogM and LogF on a common axis running from -1.25 to 0.00]

MTB > desc 'LogM' 'LogF'

Variable    N      Mean    Median    TrMean    StDev   SEMean
LogM       50   -0.5786   -0.5798   -0.5850   0.3248   0.0459
LogF       50   -0.6878   -0.7032   -0.6928   0.2453   0.0347

Variable       Min       Max        Q1        Q3
LogM       -1.2379    0.1222   -0.8499   -0.3355
LogF       -1.1394   -0.1625   -0.8916   -0.4902
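The transformation behind LogM and LogF is not stated on the slide. The sketch below assumes a natural logarithm and checks that assumption against the printed minima and maxima, then shows why a rank test is unaffected by the choice: a strictly increasing transformation leaves the ordering of the data unchanged. The helper name `order_positions` and the four sample weights are chosen only for illustration.

```python
import math

# Minimum and maximum of each group, from the untransformed summary table.
raw = {"MALE": (0.29, 1.13), "FEMALE": (0.32, 0.85)}

# ASSUMPTION: LogM and LogF are natural logs of the original weights.
logged_range = {
    name: (round(math.log(lo), 4), round(math.log(hi), 4))
    for name, (lo, hi) in raw.items()
}
for name, (lo, hi) in logged_range.items():
    print(name, lo, hi)


def order_positions(values):
    """Indices of the values in increasing order (ties keep input order)."""
    return sorted(range(len(values)), key=lambda i: values[i])


sample = [0.56, 0.29, 1.13, 0.41]  # a few weights from the rank table
logged = [math.log(v) for v in sample]
same_order = order_positions(sample) == order_positions(logged)
print("ordering unchanged by log:", same_order)
```

The logged minima and maxima match the LogM and LogF rows, which supports the natural-log reading. Since the ordering is preserved, the ranks, and hence W, would be the same on either scale; the transformation matters for the t-test, not for the Mann-Whitney test.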
Last remarks
• Nonparametric tests are an opportunity to revise the basic ideas of statistical inference
• They are sometimes useful in biology
• They are often used in biology
• NEXT WEEK: more nonparametrics, including confidence intervals and randomisation tests. READ the handout