## Non-parametric statistics


**Today's programme**
• Non-parametric tests (examples)
• Some repetition of key concepts (time permitting)
• Free experiment status

Exercise
• Group tasks on non-parametric tests (worked examples will be provided!)
• Free experiment supervision/help

**Updates**
• Did you get the compendium?
• Remember: for week 12 (regression and correlation) there are 100+ pages in the compendium. No need to read all of it: read the introduction to each chapter and get a feel for the first simple examples. Multiple regression and multiple correlation are for future reference.

**Non-parametric tests**
Two types of statistical test:

Parametric tests:
• Based on the assumption that the data have certain characteristics or "parameters".
• Results are only valid if:
• (a) the data are normally distributed;
• (b) the data show homogeneity of variance;
• (c) the data are measurements on an interval or ratio scale.

Example (note the very different SDs): Group 1: M = 8.19 (SD = 1.33), Group 2: M = 11.46 (SD = 9.18)

**Non-parametric tests**
Non-parametric tests:
• Make few assumptions about the data's characteristics.
• Use if any of the three properties below is true:
• (a) the data are not normally distributed (e.g. skewed);
• (b) the data show inhomogeneity of variance;
• (c) the data are measurements on an ordinal scale (ranks).
• Non-parametric tests are used when we do not have ratio/interval data, or when the assumptions of parametric tests are violated.

**Non-parametric tests**
• Just like parametric tests, which non-parametric test to use depends on the experimental design (repeated measures or independent groups) and on the number of IVs and their levels.
• Non-parametric tests are minimally affected by outliers, because scores are converted to ranks.

**Non-parametric tests**
Examples of parametric tests and their non-parametric equivalents:

| Parametric test | Non-parametric counterpart |
|---|---|
| Pearson correlation | Spearman's correlation |
| (No equivalent test) | Chi-Square test |
| Independent-means t-test | Mann-Whitney test |
| Dependent-means t-test | Wilcoxon test |
| One-way Independent Measures Analysis of Variance (ANOVA) | Kruskal-Wallis test |
| One-way Repeated-Measures ANOVA | Friedman's test |

**Non-parametric tests**
• Non-parametric tests make few assumptions about the distribution of the data being analyzed.
• They get around this by not using raw scores but ranking them: the lowest score gets rank 1, the next lowest rank 2, etc.
• How ranking is carried out differs from test to test, but the principle is the same.
• The analysis is carried out on the ranks, not the raw data.
• Ranking data means we lose information: we do not know the distance between the ranks.
• This means that non-parametric tests are generally less powerful than parametric tests, i.e. less likely to detect an effect in our data (increased chance of a type II error).

**Mann-Whitney Test**
• This is the non-parametric equivalent of the independent t-test.
• Used when you have two conditions, each performed by a separate group of subjects.
• Each subject produces one score.
• Tests whether there is a statistically significant difference between the two groups.

**Mann-Whitney Test**
Example: Difference between men and dogs
• We count the number of "doglike" behaviors in a group of 20 men and 20 dogs over 24 hours.
• The result is a table with 2 groups and their number of doglike behaviors.
• We run a Kolmogorov-Smirnov test (Vodka test) to see if the data are normally distributed. The test is significant (p < 0.009), meaning the data deviate from a normal distribution, so we need a non-parametric test to analyze the data.

**Mann-Whitney Test**
• The MW test looks for differences in the ranked positions of scores in the two groups (samples).
• Example ...

**Mann-Whitney Test**
Mann-Whitney test, step-by-step:
• Does it make any difference to students' comprehension of statistics whether the lectures are in English or in Serbo-Croat?
• Group 1: Statistics lectures in English.
• Group 2: Statistics lectures in Serbo-Croat.
• DV: Lecturer intelligibility ratings by students (0 = "unintelligible", 100 = "highly intelligible").
• Ratings are ordinal data, so Mann-Whitney is appropriate.

**Step 1:**
Rank all the scores together, regardless of group.

**Mann-Whitney Test**
How to rank scores:
• (a) The lowest score gets a rank of "1"; the next lowest gets "2"; and so on.
• (b) Two or more scores with the same value are "tied":
(i) Give each tied score the rank it would have had, had it been different from the other scores.
(ii) Add the ranks for the tied scores, and divide by the number of tied scores. Each of the ties gets this average rank.
(iii) The next score after the set of ties gets the rank it would have obtained, had there been no tied scores.
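The ranking rules above can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the function name `average_ranks` is made up here, and the input is the four scores (6, 34, 34, 48) from the ranking example that accompanies these rules.

```python
def average_ranks(scores):
    """Rank scores from lowest (rank 1) upward; tied scores share the mean of their ranks."""
    # Indices of the scores, ordered from lowest to highest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted score equals the current one (a tie).
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        # Sorted positions pos..end would have held ranks pos+1..end+1; average them.
        shared = ((pos + 1) + (end + 1)) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1  # the next score keeps the rank it would have had without ties
    return ranks


print(average_ranks([6, 34, 34, 48]))  # [1.0, 2.5, 2.5, 4.0]
```

The two tied scores of 34 would have taken ranks 2 and 3, so each gets (2 + 3) / 2 = 2.5, and 48 still gets rank 4.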
• Example:

| Raw score | 6 | 34 | 34 | 48 |
|---|---|---|---|---|
| "Original" rank | 1 | 2 | 3 | 4 |
| "Actual" rank | 1 | 2.5 | 2.5 | 4 |

**Mann-Whitney Test**
Formula for the Mann-Whitney test statistic, U:

$$U = N_1 N_2 + \frac{N_x (N_x + 1)}{2} - T_x$$

• T1 and T2 = sum of ranks for groups 1 and 2
• N1 and N2 = number of subjects in groups 1 and 2
• Tx = the larger of the two rank totals
• Nx = number of subjects in the Tx group

**Mann-Whitney Test**
Step 2:
• Add up the ranks for group 1, to get T1. Here, T1 = 83.
• Add up the ranks for group 2, to get T2. Here, T2 = 70.

Step 3:
• N1 is the number of subjects in group 1; N2 is the number of subjects in group 2. Here, N1 = 8 and N2 = 9.

Step 4:
• Call the larger of these two rank totals Tx. Here, Tx = 83.
• Nx is the number of subjects in this group; here, Nx = 8.

**Mann-Whitney Test**
Step 5: Find U:

$$U = N_1 N_2 + \frac{N_x (N_x + 1)}{2} - T_x$$

In our example:

$$U = 8 \times 9 + \frac{8 \times (8 + 1)}{2} - 83 = 72 + 36 - 83 = 25$$

**Mann-Whitney Test**
• If there are unequal numbers of subjects, as in the present case, calculate U for both rank totals and then use the smaller U.
• In the present example, for T1, U = 25, and for T2, U = 72 + 45 - 70 = 47. Therefore, use 25 as U.

Step 6:
• Look up the critical value of U in a table, taking into account N1 and N2. If our obtained U is smaller than the critical value of U, we reject the null hypothesis and conclude that our two groups do differ significantly.

**Here, the critical value of U for N1 = 8 and N2 = 9 is 15.**
Our obtained U of 25 is larger than this, and so we conclude that there is no significant difference between our two groups.
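As a quick check of Steps 2 to 6, the arithmetic can be sketched in Python. The raw ratings are not reproduced in this text, so the sketch starts from the rank totals and group sizes given on the slides; the function name `mann_whitney_u` is illustrative only, and the critical value of 15 is taken from the slide rather than computed.

```python
def mann_whitney_u(n1, n2, n_x, t_x):
    """U = N1*N2 + Nx(Nx+1)/2 - Tx, for the group with rank total t_x and size n_x."""
    return n1 * n2 + n_x * (n_x + 1) / 2 - t_x


# Values from the worked example on the slides.
N1, N2 = 8, 9      # group sizes
T1, T2 = 83, 70    # rank totals

u1 = mann_whitney_u(N1, N2, N1, T1)  # U based on group 1's rank total
u2 = mann_whitney_u(N1, N2, N2, T2)  # U based on group 2's rank total
u = min(u1, u2)                      # use the smaller U

CRITICAL_U = 15                      # table value quoted on the slide for N1 = 8, N2 = 9
significant = u < CRITICAL_U

print(u1, u2, u, significant)  # 25.0 47.0 25.0 False
```

A useful sanity check is that the two U values always add up to N1 × N2 (here 25 + 47 = 72).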
Conclusion: Ratings of lecturer intelligibility appear to be unaffected by whether the lectures are given in English or in Serbo-Croat.

**Mann-Whitney using SPSS - output:**
SPSS gives us two boxes as the output, showing:
• the sum of ranks for each group
• the U statistic
• the significance value of the test (this can be halved for a one-tailed hypothesis)

**The Wilcoxon Signed-Rank Test**
The Wilcoxon test:
• Used when you have two conditions, both performed by the same subjects.
• Each subject produces two scores, one for each condition.
• Tests whether there is a statistically significant difference between the two conditions.

**The Wilcoxon Signed-Rank Test**
Wilcoxon test, step-by-step:
• Does background music affect the mood of factory workers?
• Eight workers: each tested twice.
• Condition A: Background music.
• Condition B: Silence.
• DV: Worker's mood rating (0 = "extremely miserable", 100 = "euphoric").
• Ratings data, so use the Wilcoxon test.

**Step 1:**
Find the difference between each pair of scores, keeping track of the sign (+ or -) of the difference. This differs from the Mann-Whitney U test, where the data themselves are ranked!

Step 2: Rank the differences, ignoring their sign. Lowest = 1. Tied scores are dealt with as before. Ignore zero difference-scores.

**The Wilcoxon Signed-Rank Test**
Step 3:
• Add together the positive-signed ranks: total = 22.
• Add together the negative-signed ranks: total = 6.

Step 4:
• "W" is the smaller sum of ranks; W = 6.
• N is the number of differences, omitting zero differences: N = 8 - 1 = 7.

Step 5:
• Use a table of critical W-values to find the critical value of W for your N. Your obtained W has to be smaller than this critical value for it to be statistically significant.

**The Wilcoxon Signed-Rank Test**
• The critical value of W (for an N of 7) is 2.
• Our obtained W of 6 is bigger than this.
• Our two conditions are not significantly different.
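The five Wilcoxon steps can be sketched in Python as follows. The workers' raw mood ratings are not reproduced in this text, so the scores below are invented purely for illustration, chosen only so that the rank sums come out as the 22 and 6 of Step 3. The function name `wilcoxon_w` is likewise hypothetical, and the critical value of 2 is the table value quoted on the slide.

```python
def wilcoxon_w(cond_a, cond_b):
    """Return (W, N, positive_rank_sum, negative_rank_sum) for paired scores."""
    # Step 1: signed differences; zero differences are dropped.
    diffs = [a - b for a, b in zip(cond_a, cond_b) if a != b]
    # Step 2: rank the absolute differences, lowest = 1, ties get the average rank.
    abs_sorted = sorted(abs(d) for d in diffs)

    def rank_of(value):
        first = abs_sorted.index(value) + 1   # rank of the first occurrence
        count = abs_sorted.count(value)       # number of tied values
        return first + (count - 1) / 2        # mean of ranks first..first+count-1

    # Step 3: sum the ranks separately for positive and negative differences.
    pos = sum(rank_of(abs(d)) for d in diffs if d > 0)
    neg = sum(rank_of(abs(d)) for d in diffs if d < 0)
    # Step 4: W is the smaller sum; N counts only the non-zero differences.
    return min(pos, neg), len(diffs), pos, neg


# Invented ratings for eight workers (NOT the data from the slides).
music = [61, 48, 73, 55, 80, 66, 52, 70]
silence = [60, 50, 70, 59, 75, 60, 45, 70]

w, n, pos_sum, neg_sum = wilcoxon_w(music, silence)
CRITICAL_W = 2                 # table value quoted on the slide for N = 7
significant = w < CRITICAL_W   # Step 5

print(w, n, pos_sum, neg_sum, significant)  # 6.0 7 22.0 6.0 False
```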
• Conclusion: Workers' mood appears to be unaffected by the presence or absence of background music.

**Wilcoxon using SPSS - output:**
• Negative ranks: the score in silence is lower than the score with music.
• Positive ranks: the score in silence is higher than the score with music.
• Ties: no change in score with/without music.
• z-score: number of SDs from the mean. As for the MW test, the z-score becomes more accurate with larger sample sizes.
• Significance value.

**Non-parametric tests II**
Non-parametric tests for comparing three or more groups or conditions:

Kruskal-Wallis test:
• Similar to the Mann-Whitney test, except that it enables you to compare three or more groups rather than just two.
• Different subjects are used for each group.

Friedman's Test (Friedman's ANOVA):
• Similar to the Wilcoxon test, except that you can use it with three or more conditions (for one group).
• Each subject does all of the experimental conditions.

**Non-parametric tests II**
• One IV, with multiple levels
• Levels can differ:

(a) qualitatively/categorically, e.g.
• effects of managerial style (laissez-faire, authoritarian, egalitarian) on worker satisfaction.
• effects of mood (happy, sad, neutral) on memory.
• effects of location (Scotland, England or Wales) on happiness ratings.

(b) quantitatively, e.g.
• effects of age (20 vs 40 vs 60 year olds) on optimism ratings.
• effects of study time (1, 5 or 10 minutes) before being tested on recall of faces.
• effects of class size on 10-year-olds' literacy.
• effects of temperature (60, 100 and 120 deg.) on mood.

**Non-parametric tests II**
Why have experiments with more than two levels of the IV?

(1) Increases generality of the conclusions:
• E.g. comparing young (20) and old (70) subjects tells you nothing about the behaviour of intermediate age-groups.

(2) Economy:
• Getting subjects is expensive, so we may as well get as much data as possible from them, i.e.
use more levels of the IV (or more IVs).

(3) Can look for trends:
• What are the effects on performance of increasingly large doses of cannabis (e.g. 100mg, 200mg, 300mg)?

**Kruskal-Wallis Test**
Kruskal-Wallis test, step-by-step:
• Does it make any difference to students' comprehension of statistics whether the lectures are given in English, Serbo-Croat or Cantonese?
• (A similar case to the MW test, with one more language, i.e. one more group of people.)
• Group A (4 people): Lectures in English;
• Group B (4 people): Lectures in Serbo-Croat;
• Group C (4 people): Lectures in Cantonese.
• DV: student rating of lecturer's intelligibility on a 100-point scale ("0" = "incomprehensible").
• Ratings, so use a non-parametric test. Three groups, so use the KW test.

**Kruskal-Wallis Test**
Step 1:
• Rank the scores, ignoring which group they belong to.
• Lowest score gets lowest rank.
• Tied scores get the average of the ranks they would otherwise have obtained (note the difference from the Wilcoxon test!)

**Kruskal-Wallis Test**
Formula:

$$H = \frac{12}{N(N+1)} \sum_{c} \frac{T_c^2}{n_c} - 3(N+1)$$

N is the total number of subjects; Tc is the rank total for each group; nc is the number of subjects in each group; H is the test statistic.

**Kruskal-Wallis Test**
Step 2:
• Find "Tc", the total of the ranks for each group.
• Tc1 (the total for the English group) is 20.
• Tc2 (for the Serbo-Croat group) is 40.5.
• Tc3 (for the Cantonese group) is 17.5.

**Kruskal-Wallis Test**
Step 3: Find H. With N = 12 and nc = 4 for each group:

$$H = \frac{12}{12 \times 13}\left(\frac{20^2}{4} + \frac{40.5^2}{4} + \frac{17.5^2}{4}\right) - 3 \times 13 = 45.125 - 39 = 6.125 \approx 6.12$$

**Kruskal-Wallis Test**
• Step 4: In the KW test, we use degrees of freedom:
• Degrees of freedom are the number of groups minus one: d.f. = 3 - 1 = 2.
• Step 5:
• H is statistically significant if it is larger than the critical value of Chi-Square for this many d.f. [Chi-Square is the distribution against which we compare the test statistic.]
• Here, H is 6.12. This is larger than 5.99, the critical value of Chi-Square for 2 d.f.
(SPSS gives us this, so there is no need to look in a table, but we could.)
• So: the three groups differ significantly. The language in which statistics is taught does make a difference to the lecturer's intelligibility.
• NB: the test merely tells you that the three groups differ; inspect group medians to decide how they differ.

**Using SPSS for the Kruskal-Wallis test:**
Independent-measures test type: one column gives the scores, another column identifies which group each score belongs to.
• Scores column
• Group column: "1" for "English", "2" for "Serbo-Croat", "3" for "Cantonese".

**Using SPSS for the Kruskal-Wallis test:**
Analyze > Nonparametric tests > k independent samples

**Using SPSS for the Kruskal-Wallis test:**
• Choose variable
• Identify groups

**Output from SPSS for Kruskal-Wallis test**
• Mean rank values
• Test statistic (H)
• DF
• Significance

**Kruskal-Wallis Test**
• How do we find out how the three groups differed?
• One way is to construct a box-and-whisker plot and look at the median values.
• What we really need is some contrasts and post-hoc tests, like for ANOVA.
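Returning to the worked Kruskal-Wallis example, H can be reproduced from the rank totals alone. The sketch below assumes the standard formula H = 12 / (N(N+1)) × Σ(Tc² / nc) − 3(N+1) with no correction for ties, which is the form that yields the H of 6.12 quoted in Step 5. The function name `kruskal_wallis_h` is illustrative, and the critical value 5.99 is the one quoted on the slide rather than computed.

```python
def kruskal_wallis_h(rank_totals, group_sizes):
    """H from per-group rank totals (Tc) and group sizes (nc), without tie correction."""
    n_total = sum(group_sizes)
    weighted = sum(t ** 2 / n for t, n in zip(rank_totals, group_sizes))
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Values from the worked example: English, Serbo-Croat, Cantonese.
rank_totals = [20, 40.5, 17.5]
group_sizes = [4, 4, 4]

h = kruskal_wallis_h(rank_totals, group_sizes)
df = len(rank_totals) - 1        # number of groups minus one
CRITICAL_CHI2 = 5.99             # critical Chi-Square for 2 d.f., as quoted on the slide
significant = h > CRITICAL_CHI2

print(round(h, 3), df, significant)  # 6.125 2 True
```

Before trusting H, it is worth checking that the rank totals sum to N(N+1)/2; here 20 + 40.5 + 17.5 = 78 = 12 × 13 / 2.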