Statistics for Linguistics Students

Statistics for Linguistics Students Michaelmas 2004 Week 5 Bettina Braun www.phon.ox.ac.uk/~bettina

Overview • P-values • How can we tell that data are taken from a normal distribution? • Speaker normalisation • Data aggregation • Practicals • Non-parametric tests

p-values • p-values for all tests tell us whether or not to reject the null hypothesis (and with what confidence) • In linguistic research, a confidence level of 95% is often sufficient; some use 99% • This decision is up to you. Note that the more stringent your confidence level, the more likely a type II error becomes (you don’t find a difference that is actually there)

p-values • If you decide on a significance level of 0.05 (i.e. a 95% confidence level), then a p-value smaller than 0.05 indicates that you can reject the null-hypothesis • Remember: the null-hypothesis generally predicts that there is no difference • If the output says p = 0.000, the value has only been rounded; it could be, say, 0.00049, so we generally report p < 0.001

p-values • So, in a t-test, p = 0.07 means that you cannot reject the null-hypothesis that there is no difference, i.e. there is no significant difference between the two groups • In the Levene test for homogeneity of variances, p = 0.001 means that you have to reject the null-hypothesis that there is no difference, i.e. the variances of the two groups differ
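The decision rule and the reporting convention for very small p-values can be sketched in a few lines of Python. This is only an illustration: the function name `report_p` and the default significance level of 0.05 are assumptions, not part of SPSS or of the course material.

```python
def report_p(p, alpha=0.05):
    """Format a p-value and state the decision about the null hypothesis (H0).

    alpha is the chosen significance level (0.05 corresponds to 95% confidence).
    """
    # Output such as "p = 0.000" is a rounded value, so report "p < 0.001" instead.
    shown = "p < 0.001" if p < 0.001 else f"p = {p:.3f}"
    decision = "reject H0" if p < alpha else "cannot reject H0"
    return f"{shown}: {decision}"


print(report_p(0.07))     # t-test example from the slide
print(report_p(0.001))    # Levene test example from the slide
print(report_p(0.00049))  # a value that would be displayed as 0.000
```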

Kolmogorov-Smirnov test • Parametric tests assume that the data are taken from normal distributions • Kolmogorov-Smirnov test can be used to compare actual data to normal distribution -- the cumulative probabilities of values in the data are compared with the cumulative probabilities in a theoretical normal distribution • Null-hypothesis: your sample is taken from a normal distribution

Kolmogorov-Smirnov test • Non-parametric test • The Kolmogorov-Smirnov statistic is the greatest difference in cumulative probabilities across the range of values • If its value exceeds a threshold, the null-hypothesis is to be rejected

Kolmogorov-Smirnov test • The Kolmogorov-Smirnov test is not significant, i.e. the null-hypothesis that our sample is drawn from a normal distribution cannot be rejected • The distribution can therefore be assumed to be normal: Kolmogorov-Smirnov Z = 0.59; p = 0.9
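To make the idea of "greatest difference in cumulative probabilities" concrete, here is a minimal Python sketch that computes the Kolmogorov-Smirnov statistic D by hand. It assumes that the normal distribution is fitted with the mean and standard deviation of the sample itself; the function name `ks_statistic` and the data are made up for illustration. It does not compute a p-value.

```python
from statistics import NormalDist, mean, stdev


def ks_statistic(sample):
    """Largest gap between the empirical and the theoretical normal CDF."""
    xs = sorted(sample)
    n = len(xs)
    # Assumption: the reference normal distribution uses the sample mean and sd.
    normal = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        cdf = normal.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


durations = [1, 2, 3, 4, 5]  # hypothetical measurements
d = ks_statistic(durations)
print(round(d, 4))
```

The larger D is, the worse the fit to a normal distribution. As far as I am aware, the "Kolmogorov-Smirnov Z" printed by SPSS is D multiplied by the square root of the sample size, but treat that as an assumption and check the SPSS documentation.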

Speaker normalisation • We often collect data from different subjects but we are not interested in the speaker differences (e.g. mean pitch height, average speaking rate) • We can convert the data to z-scores (which tell us how many sd away a given score is from the speaker mean)

Speaker normalisation in SPSS • First, you have to split the file according to the speakers (Data -> Split File)

Speaker normalisation in SPSS • Then, Analyze -> Descriptive Statistics -> Descriptives • This will create an output, but also a new column with z-values
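Outside SPSS, the same per-speaker normalisation can be sketched in plain Python: group the values by speaker, then express each value as its distance from that speaker's mean in units of that speaker's standard deviation. The function name `zscores_by_speaker` and the pitch values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def zscores_by_speaker(rows):
    """rows: list of (speaker, value). Returns a list of (speaker, value, z)."""
    by_speaker = defaultdict(list)
    for speaker, value in rows:
        by_speaker[speaker].append(value)
    # Mean and (sample) standard deviation are computed separately per speaker.
    stats = {s: (mean(v), stdev(v)) for s, v in by_speaker.items()}
    return [(s, v, (v - stats[s][0]) / stats[s][1]) for s, v in rows]


# Hypothetical mean pitch values (Hz) for a low-pitched and a high-pitched speaker
rows = [("A", 100.0), ("A", 110.0), ("A", 120.0),
        ("B", 200.0), ("B", 220.0), ("B", 240.0)]
for speaker, value, z in zscores_by_speaker(rows):
    print(speaker, value, round(z, 2))
```

After normalisation both speakers have the z-scores -1, 0 and 1, so the difference in overall pitch height no longer matters.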

Sorting data for within-subjects designs

Aggregating data • One can easily build a mean for different categories, preserving the structure of the SPSS table • Data -> Aggregate • Independent variables you want to preserve are “break variables” • Dependent variables for which you’d like to calculate the mean are “Aggregated variables” • Per default, new table will be stored as aggr.sav

Aggregating data • SPSS-dialogue-box
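What Data -> Aggregate does can be sketched in plain Python: rows that share the same values on the break variables are collapsed into one row holding the mean of the aggregated variable. The function name `aggregate_mean`, the column names and the data are assumptions made for this illustration.

```python
from collections import defaultdict
from statistics import mean


def aggregate_mean(rows, break_vars, target):
    """Collapse rows to one mean of `target` per combination of `break_vars`."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[v] for v in break_vars)
        groups[key].append(row[target])
    result = []
    for key, values in groups.items():
        new_row = dict(zip(break_vars, key))   # keep the break variables
        new_row[target + "_mean"] = mean(values)  # one aggregated value per group
        result.append(new_row)
    return result


# Hypothetical data: several durations (ms) per speaker and condition
rows = [
    {"speaker": "A", "condition": "focus", "duration": 100.0},
    {"speaker": "A", "condition": "focus", "duration": 120.0},
    {"speaker": "A", "condition": "neutral", "duration": 90.0},
    {"speaker": "B", "condition": "focus", "duration": 130.0},
    {"speaker": "B", "condition": "neutral", "duration": 80.0},
    {"speaker": "B", "condition": "neutral", "duration": 100.0},
]
for row in aggregate_mean(rows, ["speaker", "condition"], "duration"):
    print(row)
```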

Non-parametric tests • If assumptions for parametric tests are not met, you have to do non-parametric tests. • They are statistically less powerful (i.e. they are more likely not to find a difference that is actually there: Type II error) • On the other hand, if a non-parametric test shows a significant difference, you can draw strong conclusions

Mann-Whitney test • Non-parametric equivalent to independent t-test • Null-hypothesis: The two samples we are comparing are from the same distribution • All data are ranked and calculations are done on the ranks
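"Calculations are done on the ranks" can be illustrated with a small Python sketch of the Mann-Whitney U statistic. The helper names and the data are hypothetical, tied values receive the average of their ranks, and the sketch stops at U; it does not compute a p-value.

```python
def average_ranks(values):
    """Rank values from 1 upwards; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """U statistic: the smaller of the two U values for samples a and b."""
    ranks = average_ranks(list(a) + list(b))  # rank all data together
    r1 = sum(ranks[:len(a)])                  # rank sum of the first sample
    u1 = r1 - len(a) * (len(a) + 1) / 2
    u2 = len(a) * len(b) - u1
    return min(u1, u2)


group1 = [1, 4, 5]  # hypothetical scores
group2 = [2, 3, 6]
print(mann_whitney_u(group1, group2))
```

A small U means the ranks of the two groups hardly overlap; U = 0 means that every value in one group is smaller than every value in the other.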

Wilcoxon Signed ranks test • Non-parametric equivalent to paired t-test • The absolute differences in the two conditions are ranked • Then the sign is added and the sum of the negative and positive ranks is compared • Requires that the two samples are drawn from populations with the same distribution shape (if this is not the case, use the Sign Test)
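The ranking steps of the Wilcoxon signed ranks test can be sketched in Python as follows. The function names and the data are hypothetical; pairs with a difference of zero are dropped (an assumption of this sketch), and only the two rank sums are computed, not the p-value.

```python
def average_ranks(values):
    """Rank values from 1 upwards; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def signed_rank_sums(x, y):
    """Return (sum of positive ranks, sum of negative ranks) for paired data."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    ranks = average_ranks([abs(d) for d in diffs])   # rank absolute differences
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)   # then add the sign
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


condition1 = [10, 12, 9, 14]  # hypothetical paired scores
condition2 = [8, 13, 6, 10]
print(signed_rank_sums(condition1, condition2))
```

If the two conditions did not differ, the positive and negative rank sums should be roughly equal; a large imbalance speaks against the null-hypothesis.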

Examples • English is closer to German than French is • A teacher compares the marks of a group of German students who take English and French (according to the German system from 1 to 15) • His research hypothesis is that pupils have better marks in English than in French • One-tailed prediction! • File: language_marks.sav

Example • For a one-tailed test divide the significance value by 2 • Marks in English are better than in French (Z = -2.28, p = 0.011)
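The halving can be checked from the reported Z value alone, using the standard normal distribution from Python's standard library. This sketch only reproduces the arithmetic; it does not rerun the test on language_marks.sav.

```python
from statistics import NormalDist

z = -2.28  # Z value reported for the English vs. French marks

# Two-tailed p: probability of a value at least this extreme in either direction
two_tailed = 2 * NormalDist().cdf(-abs(z))
# One-tailed p: half of that, valid only if the difference goes in the predicted direction
one_tailed = two_tailed / 2

print(round(two_tailed, 3), round(one_tailed, 3))
```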

What are frequency data? • Number of subjects/events in a given category • You can then test whether the observed frequencies deviate from your expected frequencies • E.g. in an election, there is an a priori chance of 50-50 for each candidate. • Note that you must determine your expected frequencies beforehand

χ²-test • Null-hypothesis: there is no difference between expected and observed frequency • Data • Calculation

χ²-test example • Null-hypothesis: there is no difference between expected and observed frequency • Data • Calculation

Looking up the p-value • For a significant result, the calculated value for χ² must be larger than the critical value found in the table • Degrees of freedom (a and b are the numbers of categories): • If there is one independent variable: df = (a - 1) • If there are two independent variables: df = (a - 1)(b - 1)
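A minimal Python sketch of the chi-square calculation for one independent variable, using the 50-50 election idea with made-up vote counts. The statistic is the sum of (observed - expected)² / expected over all categories. The critical value 3.841 is the standard table value for df = 1 at the 0.05 level; the shortcut used for the p-value works for df = 1 only.

```python
from math import erfc, sqrt


def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [60, 40]  # hypothetical votes for two candidates
expected = [50, 50]  # a priori 50-50 expectation, fixed beforehand

chi2 = chi_square(observed, expected)
df = len(observed) - 1        # one independent variable: df = a - 1
significant = chi2 > 3.841    # critical value for df = 1, alpha = 0.05
p = erfc(sqrt(chi2 / 2))      # exact p-value, valid for df = 1 only

print(chi2, df, significant, round(p, 4))
```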

χ²-test • Limitations: • All raw data for χ² must be frequencies (not percentages!) • Each subject or event is counted only once (if we wish to find out whether boys or girls are more likely to pass or fail a test, we might observe the performance of 100 children on one test; we may not, however, observe the performance of 25 children on 4 tests) • The total number of observations should be greater than 20 • The expected frequency in any cell should be greater than 5

χ² as test of association • Calculation of expected frequencies: Cell freq = (Row total x Column total) / Grand total
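For a test of association, the expected frequency of each cell is its row total times its column total, divided by the grand total. The following Python sketch applies this to a made-up 2 x 2 table (boys/girls by pass/fail); the function name and the counts are hypothetical.

```python
def expected_frequencies(table):
    """Expected cell frequency = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical counts:  pass  fail
observed = [[30, 20],   # boys
            [10, 40]]   # girls

expected = expected_frequencies(observed)

# Chi-square: sum of (O - E)^2 / E over all cells
chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))

# Two independent variables: df = (a - 1)(b - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(expected, round(chi2, 2), df)
```

All expected frequencies here are greater than 5 and each child is counted once, so the limitations of the test are respected.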
