Review of yesterday [ ] perm book www.ekolog.se/mstat
Reporting statistics • Birch twigs with flowers were shorter than birch twigs without flowers (t=2.7, p=0.014, N=30). • Use two significant digits • Or use p<0.0001
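Copy-Paste Statistics
Open your data set in R:
library(xlsReadWrite)                # package for reading Excel files
yourdata <- read.xls(file.choose())  # pick the file in a dialog window
attach(yourdata)                     # make the columns available by name
names(yourdata)                      # list the column names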
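Check for Errors
names(yourdata)      # are the column names right?
fix(yourdata)        # look through (and correct) the data in a spreadsheet window
levels(your.x)       # the groups of a categorical variable
is.numeric(your.y)   # should be TRUE for a continuous variable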
Check for Errors
levels(your.x)
"with " "Without"
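The two levels above differ in more than their meaning: one has a stray space at the end and the other starts with a capital letter. Here is a minimal sketch of one way to tidy such labels. The vector `your.x` is made-up example data, and `cleaned.x` is just a name chosen for this illustration.

```r
# Hypothetical example data: a factor with typing errors in its labels
your.x <- factor(c("with ", "Without", "with", "without", "With"))
levels(your.x)            # shows every distinct spelling, errors included

# Trim stray spaces and use lower case throughout, then rebuild the factor
cleaned.x <- factor(tolower(trimws(as.character(your.x))))
levels(cleaned.x)         # only the intended groups should remain
table(cleaned.x)          # number of observations per group
```

Correcting the original data file is usually the safer fix; cleaning in R is mainly a quick way to check what went wrong.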
Addresses • x<-c(1,5,2,4) • x • 5 3 4 • x • 5
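A minimal sketch of how addresses (indices) work in R, using the vector from the slide. The particular addresses picked below are only illustrations, not necessarily the ones shown in the lecture.

```r
x <- c(1, 5, 2, 4)   # the vector from the slide
x                    # the whole vector
x[2]                 # the value at address 2
x[c(1, 3)]           # several addresses at once
x[x > 2]             # all values larger than 2
x[-1]                # everything except the first value
```

Square brackets always mean "the address within"; round brackets are used for functions such as `c()`.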
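Area under the curve [Figure: distribution of differences from random samples (x-axis: Difference, y-axis: No. random samples), with 95 % of the area in the middle and 2.5 % in each tail.]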
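Risk of getting the result by chance only < 5 % [Figure: same distribution (x-axis: Difference, y-axis: No. random samples), with 95 % of the area in the middle and 2.5 % in each tail.]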
Risk by chance = 1.4 + 1.4 = 2.8 % [Figure: same distribution (x-axis: Difference, y-axis: No. random samples), with the 95 % region marked and 1.4 % in each tail.]
p-value • 2.8 % is the probability (p) of getting a difference at least as large as ours by chance alone, if there is NO real difference. • p = 0.028 means that, if the groups do not really differ, there is a 2.8 % chance of collecting data points like ours just by chance. • A 2.8 % probability is ridiculously small! We don’t believe in that! • If it is not just due to chance, it must depend on something else… • … e.g., water quality. The habitats differ significantly. • p < 0.05 counts as ridiculously small
p-value is NOT! • p = 0.23 does NOT mean that there is NO difference. • There might be a difference, but we are not confident enough to state that. • The risk that the result is due to chance alone is too high. • p = 0.052 is very very close to ”ridiculously small” so there might indeed be a difference.
Intervals • Not really confidence intervals… • A confidence interval is around the estimated value. • There is a 95 % chance that the true mean or the true slope is within the confidence interval.
Intervals • We have calculated permutation critical values. • This interval includes the 95 % most likely values that you can get by chance alone (if there is no relationship or no difference). • If the actual value from a study is outside this critical interval, it means that the result is very unlikely to be an effect of chance alone. • Then the p-value is less than 5 % and the result is significant.
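The idea of permutation critical values can be sketched in a few lines of R. This is only an illustration under stated assumptions: the twig lengths are invented, the names `len`, `group`, `obs.diff`, `perm.diff`, `crit` and `p` are chosen for this example, and adding 1 to both the count and the number of shuffles is one common convention rather than necessarily the one used in the course material.

```r
set.seed(1)  # makes the random shuffling repeatable

# Hypothetical data: twig length (cm) for twigs with and without flowers
len   <- c(10.1, 9.4, 11.0, 9.8, 10.5, 12.3, 11.9, 12.8, 11.5, 12.1)
group <- rep(c("with", "without"), each = 5)

# Difference between the group means in the actual data
obs.diff <- mean(len[group == "without"]) - mean(len[group == "with"])

# Shuffle the group labels many times and record the difference each time
n.perm <- 9999
perm.diff <- replicate(n.perm, {
  shuffled <- sample(group)
  mean(len[shuffled == "without"]) - mean(len[shuffled == "with"])
})

# Critical values: the interval holding the 95 % most likely chance results
crit <- quantile(perm.diff, c(0.025, 0.975))

# Two-tailed p-value: share of shuffles at least as extreme as the real data
p <- (sum(abs(perm.diff) >= abs(obs.diff)) + 1) / (n.perm + 1)

crit
obs.diff
p
```

If `obs.diff` falls outside `crit`, then `p` is below 0.05, which is the link between the critical interval and the p-value.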
Critical values [Figure: distribution of differences from random samples (x-axis: Difference, y-axis: No. random samples).]
Today: Categorical response variables
Categorical response variables • Logistic regression • 2×2-test
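[Figure: overview of tests by type of variable. Categorical response variable: logistic regression when the explanatory variable is continuous (example plot: probability of choosing Melica rather than Luzula against ant size) and 2×2 tables when it is categorical (example plot: bars for red ants and black ants). Continuous response variable (example: seed size): regression when the explanatory variable is continuous, Anova or t-test when it is categorical.]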
Logistic regression • Binary response = Either… or…(i.e., categorical with 2 outcomes) • Continuous explanatory
Logistic regression • Examples: • Does seed size (x) affect germination (y)? • Do body fat reserves (x) affect survival (y) in hedgehogs? • Does flower size (x) affect pollination (y) or seed predation (y)?
Logistic regression • Easy to test in R! • But the test ”under the hood” is complicated. • Hard to make a neat graph. • stripchart(x~y) • Hard to give an intuitive effect size (cf. regression slope).
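A minimal sketch of a logistic regression in R, using fake data for the seed size and germination example. Everything about the data is an assumption made for illustration (sample size, size range, and the "true" relationship), and the names `seed.size`, `germinated` and `model` are invented here. `glm()` with `family = binomial` is the standard way to fit this model in base R.

```r
set.seed(42)  # repeatable fake data

# Fake data: seed size (x, mm) and whether the seed germinated (y, 0 or 1)
seed.size  <- runif(60, min = 2, max = 8)
true.prob  <- plogis(-4 + 0.9 * seed.size)   # assumed "true" relationship
germinated <- rbinom(60, size = 1, prob = true.prob)

# Logistic regression = generalised linear model with binomial errors
model <- glm(germinated ~ seed.size, family = binomial)
summary(model)                 # slope estimate and its p-value

# A simple graph: data points plus the fitted probability curve
plot(seed.size, germinated, xlab = "Seed size (mm)",
     ylab = "Probability of germination")
new.x <- data.frame(seed.size = seq(2, 8, length.out = 100))
lines(new.x$seed.size, predict(model, newdata = new.x, type = "response"))

# One way to describe the effect: the odds ratio per mm of seed size
exp(coef(model)["seed.size"])
```

The fitted curve is often easier to explain than the slope itself, because the slope is on the log-odds scale.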
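[Figure: the same overview of tests by type of variable. Categorical response variable: logistic regression (continuous explanatory variable; probability of choosing Melica rather than Luzula against ant size) and 2×2 tables (categorical explanatory variable; red ants and black ants). Continuous response variable (seed size): regression (continuous explanatory variable) and Anova (categorical explanatory variable).]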
2×2 = proportions • Binary response = Either… or… (i.e., categorical with 2 alternatives) • Binary explanatory = Either… or… (i.e., categorical with 2 alternatives) • Contingency tables • Fisher’s exact test is easy in R (super easy in R commander!) • Sometimes the Chi-squared test has been used • BUT, it does not give an exact p-value
2×2 = proportion test • Examples: • Do men and women have different preferences? • Do different ants prefer different seeds? • …
Lunches for students and lecturers lunchchoice ~ students.teachers
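The lunch example could be tested roughly like this with Fisher's exact test. The counts and the two lunch alternatives ("pizza" and "salad") are invented for illustration; only the variable names `lunchchoice` and `students.teachers` come from the slide.

```r
# Hypothetical counts: lunch choice for students and teachers
lunch <- matrix(c(12, 4,
                   5, 9),
                nrow = 2, byrow = TRUE,
                dimnames = list(group = c("student", "teacher"),
                                lunchchoice = c("pizza", "salad")))
lunch

# Fisher's exact test on the 2x2 table
result <- fisher.test(lunch)
result$p.value

# The same test from raw data, one row per person
lunchchoice <- rep(c("pizza", "salad", "pizza", "salad"),
                   times = c(12, 4, 5, 9))
students.teachers <- rep(c("student", "teacher"), times = c(16, 14))
fisher.test(table(students.teachers, lunchchoice))$p.value
```

Both calls give the same p-value, since `table()` simply counts the raw data into the same 2×2 table.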
More than two groups • Avoid! • What is really the explanatory variable (x)? • Could you do a logistic regression instead? • But it is possible: a generalised linear model, something like a ”logistic anova” (but it’s usually called a 3×2 contingency table)
Break? • No?
Before your study • What is your hypothesis? What would the result look like if your idea is correct? • What is your null hypothesis? • Draw graphs for Logistic regression!! • (Fake data – test to test)
Ant dispersal • Elaiosome
Do ants prefer seeds with large elaiosomes? • Response: Elaiosome size (categorical) • No explanatory variable
Ant preferences • Green pearl = Small elaiosome (ant candy) • Black pearl = Large elaiosome (ant candy) • Blue bag = All ants that collect seeds • Do ants select for larger elaiosomes? • Experiment: 30 ants (=30 pearls)
Ant preferences • Blue bag = All ants that collect seeds • ”By chance”-bag = equal numbers of red and white pearls • Let’s pretend: White pearl = Large; Red pearl = Small • What experiment results can you get by chance?
Is this because of chance? • Is there a risk of more than 5 % that the pattern in the experiment is due to chance? • No! Aha! Significant! Then it must be explained by something else (than chance). • E.g., that there are not 50 % ants with small elaiosomes and 50 % with large. • And that ants prefer large elaiosomes. • Well, yes... Then our experiment result may be explained by chance effects (random sampling). And perhaps ants don’t care.
What is the chance (in percent) of getting a pattern as clear as the one in your experiment, only by chance? = p-value
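The pearl-bag exercise can be imitated in R. This is a sketch under stated assumptions: the experimental result (21 of 30 ants taking a large elaiosome) is invented, and the names `n.ants`, `obs.large`, `sim.large`, `p.sim` and `p.exact` are chosen for this example. `rbinom()` plays the role of drawing 30 pearls from a bag with equal numbers of each colour.

```r
set.seed(7)  # repeatable random draws

n.ants    <- 30   # ants (pearls) per experiment, as on the slide
obs.large <- 21   # hypothetical result: 21 of 30 ants took a large elaiosome

# Draw 30 pearls from the "by chance" bag (50 % large), many times over
n.sim <- 10000
sim.large <- rbinom(n.sim, size = n.ants, prob = 0.5)

# Share of chance experiments at least as far from 15 as the real one
p.sim <- mean(abs(sim.large - n.ants / 2) >= abs(obs.large - n.ants / 2))
p.sim

# The exact answer to the same question, without simulation
p.exact <- binom.test(obs.large, n.ants, p = 0.5)$p.value
p.exact
```

The simulated and exact p-values should be close to each other; the simulation is simply the pearl bag repeated many more times than is practical by hand.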