Review of yesterday • Nice biological thinking! • Don’t FOCUS too much on error sources! • Your study may be correct in finding no difference or no relationship… • Use the introduction for hypotheses about biology, not for statistical theory. • English: do a spell check. Help each other. • Web resources?
Review of yesterday • Do read the basic manuals • Open and save in R • Alternative 1: xlsReadWrite, read.xls() • Alternative 2: copy, then read.delim() • Save graphs • Ctrl + C (in graph), Ctrl + V (in Word): bitmap • Ctrl + W (in graph), Ctrl + V (in Word): Windows metafile
Continuous variables • If possible, import from Excel with xlsReadWrite • If not, beware of the decimal sign: 1,5 or 1.5 • Comma: use read.delim2("clipboard") • Period: use read.delim("clipboard") • Also: which is your response and which is your explanatory variable? • Avoid column names like Length..in.mm. Write length
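A minimal sketch of the decimal-sign problem, assuming made-up tab-separated text in place of the clipboard so that it runs anywhere. The column names and values are hypothetical.

```r
# Hypothetical tab-separated data, as it might look when copied from Excel.
comma_text  <- "length\tweight\n1,5\t10\n2,5\t12\n"
period_text <- "length\tweight\n1.5\t10\n2.5\t12\n"

# read.delim2() expects a decimal comma; read.delim() expects a decimal period.
d_comma  <- read.delim2(text = comma_text)
d_period <- read.delim(text = period_text)

# Reading comma data with read.delim() does not give numbers:
# the column is kept as text (or a factor in older R versions).
d_wrong <- read.delim(text = comma_text)

is.numeric(d_comma$length)   # numbers, as intended
is.numeric(d_period$length)  # numbers, as intended
is.numeric(d_wrong$length)   # not numbers
```

With real data, replace the `text = ...` argument with "clipboard" as on the slide.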
Copy paste statistics • Only change the green bold code! • plot(y~x,xlab="Seed size")
R tricks • Ctrl + X: copy AND paste • Arrow up: last line • history(Inf): all commands, without > • plot(y~x)
Word pitfalls • Word changes col="red" into col=“red”, which R cannot read • Change this: see the Computer tricks manual • Or try Notepad++
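[Figure: which analysis to use. Categoric response variable: logistic regression (continuous explanatory variable) or 2 × 2 tables (categoric explanatory variable). Continuous response variable: regression (continuous explanatory variable) or Anova (categoric explanatory variable). Example panels: probability of choosing Melica (rather than Luzula) against ant size; seed size for red ants and black ants.]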
[Figure: continuous response variable. Regression (continuous explanatory variable) or Anova (categoric explanatory variable). Example panel: seed size for red ants and black ants.]
R tricks • plot(y~x,cex=2,cex.lab=1.5,cex.axis=2) • Change the size of the graph window. • Want an extra graph window? x11()
Birch: reproductive cost • Change the size of the graph window in R, not in Word.
Outliers • Causes: • Typing errors. • Data points affected by unwanted factors. • Biologically relevant data points • But perhaps given a disproportionately large effect on the result… • Tell the reader if you have removed any!
Lichens • Standardised study design. • Only lichens between 0.5 and 1.5 meters? • Only on the trunk? • Only…
Can you really trust your studies? • Risk of chance effects. • By chance you may have happened to get those special individuals…
Permutations • Decouple response AND explanatory variable: • Take all x-values. • Put them in a box and shake. • Pour them back into the x-column. • Now there should be no relationship or no difference. Right? • But how large a difference can you get by chance (with 95 % probability)?
Lego shrimps I • Does shrimp size depend on water quality? • Red piece = shrimp size (y, response) • Blue or green piece = clean or polluted water (x, explanatory)
Lego shrimps I • Does shrimp size depend on water quality? • Red piece = shrimp size (y, response) • Blue or green piece = clean or polluted water (x, explanatory) • If we shuffle the x variable (blue or green pieces), what difference may we get by chance? How large?
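The shuffling can be sketched in R as follows. This is only an illustration under stated assumptions: the shrimp sizes are made up, and the names mean_diff and n_perm are hypothetical helpers, not part of the course material.

```r
set.seed(1)  # only so that the sketch is reproducible

# Hypothetical data: shrimp size (y) in clean and polluted water (x).
size  <- c(12, 14, 13, 15, 16, 9, 10, 11, 8, 12)
water <- rep(c("clean", "polluted"), each = 5)

# Difference between the two group means (clean minus polluted).
mean_diff <- function(y, x) {
  mean(y[x == "clean"]) - mean(y[x == "polluted"])
}

observed <- mean_diff(size, water)

# Shuffle the x-values many times ("put them in a box and shake")
# and record the difference in means after each shuffle.
n_perm   <- 5000
shuffled <- replicate(n_perm, mean_diff(size, sample(water)))

# The range of differences that chance alone produces 95 % of the time.
limits <- quantile(shuffled, c(0.025, 0.975))

observed
limits
hist(shuffled, xlab = "Difference", ylab = "No. random samples", main = "")
```

If the observed difference falls outside these limits, chance alone is an unlikely explanation.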
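Area under the curve • [Figure: distribution of differences between random samples; x-axis: Difference, y-axis: No. random samples. The central area covers 95 %, with 2.5 % in each tail.]
Risk by chance only < 5 % • [Figure: the same distribution; 95 % in the centre, 2.5 % in each tail.]
Risk by chance = 1.4 + 1.4 = 2.8 % • [Figure: the same distribution; the two marked tails hold 1.4 % each.]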
p-value • 2.8 % is the probability (p) of getting a difference at least this large by chance alone, if there is NO real difference. • p = 0.028 means: if the groups do not really differ, there is only a 2.8 % chance that we would, by chance, get data points like the ones we collected. • 2.8 % probability is ridiculously small! We don’t believe in that! • If it is not just due to chance, it must depend on something else… • … e.g., water quality. The habitats differ significantly. • p < 0.05 counts as ridiculously small
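A p-value of this kind can be counted directly. The sketch below assumes ten made-up shrimp sizes (not course data) and, instead of random shuffling, lists every possible way of dividing them into two groups of five, so the answer is exact.

```r
# Hypothetical data: the first 5 shrimps are from clean water, the last 5 from polluted.
y <- c(12, 14, 13, 15, 16, 9, 10, 11, 8, 12)
n_clean <- 5

observed <- mean(y[1:5]) - mean(y[6:10])

# Every possible way of picking which 5 of the 10 shrimps count as "clean".
splits <- combn(length(y), n_clean)   # one column per split, 252 in total
diffs  <- apply(splits, 2, function(idx) mean(y[idx]) - mean(y[-idx]))

# Share of splits that are at least as extreme as the observed difference,
# counted separately in each tail and then added up.
upper_tail <- mean(diffs >=  abs(observed))
lower_tail <- mean(diffs <= -abs(observed))
p_value    <- upper_tail + lower_tail

p_value   # 4 out of 252 splits, about 0.016
```

Random shuffling with sample() gives an approximation of the same number and is the practical choice when there are too many possible splits to list.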
What will affect the p-value? • The difference between groups (… between their means) • The variation within groups • The sample size • Unreliability of group means: variation and sample size
t-test in R • t.test(y~x,var.equal=TRUE)
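A small sketch of the call above, using made-up numbers, which also shows how the difference between means, the variation and the sample size each move the p-value. All variable names and values here are hypothetical.

```r
# Hypothetical data: two groups of 5, with group means 10 and 14.
dev <- c(-2, -1, 0, 1, 2)                 # deviations around each group mean
x   <- rep(c("clean", "polluted"), each = 5)
y   <- c(10 + dev, 14 + dev)

p_base <- t.test(y ~ x, var.equal = TRUE)$p.value

# 1. Larger difference between the means (10 vs 18): smaller p.
y_diff <- c(10 + dev, 18 + dev)
p_bigger_diff <- t.test(y_diff ~ x, var.equal = TRUE)$p.value

# 2. More variation within the groups (deviations three times larger): larger p.
y_var <- c(10 + 3 * dev, 14 + 3 * dev)
p_more_var <- t.test(y_var ~ x, var.equal = TRUE)$p.value

# 3. Larger sample (same values, twice as many data points): smaller p.
y_big <- rep(y, 2)
x_big <- rep(x, 2)
p_bigger_n <- t.test(y_big ~ x_big, var.equal = TRUE)$p.value

c(base = p_base, bigger_diff = p_bigger_diff,
  more_var = p_more_var, bigger_n = p_bigger_n)
```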
Under the hood • Competent Drivers vs. Mechanics
Difference between means • Female mean = 9 • Male mean = 13 • Difference = 13 - 9 = 4 • Soft! • But the unreliability?
Measure of variation? • Variation ≈ red lines! • Mean red line length? • Nope! • Absolute values are hard to work with • Instead: • ≈ Mean squared red lines!
Variance • ≈ Mean squared red lines!
Degrees of freedom • For a group variance the df = n-1 • Why? • To calculate a variance the mean is required! (y-mean(y))^2 • But given a mean, only n-1 deviations (y-mean(y)) can change freely and be used to estimate the variance. • If we independently change n-1 deviations, the last one can't be independent. • It must sum up with the rest to zero. • Its "freedom" is locked, used, by the mean!
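The argument can be checked with a few numbers. This is a minimal sketch with made-up measurements; dev, ss and variance_by_hand are illustrative names only.

```r
y <- c(7, 9, 8, 11, 10)          # hypothetical measurements, mean = 9
n <- length(y)

dev <- y - mean(y)               # the deviations ("red lines")
sum(dev)                         # the deviations always sum to zero

# Once n-1 deviations are known, the last one is fixed:
last_dev <- -sum(dev[-n])
last_dev == dev[n]               # TRUE

# Variance = sum of squared deviations divided by the degrees of freedom.
ss <- sum(dev^2)
variance_by_hand <- ss / (n - 1)

variance_by_hand                 # 2.5
var(y)                           # R's built-in function gives the same value
```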
Variance & Standard deviation • Standard deviation = SD • Sometimes used to show variability in graphs • ±1 SD ≈ 68 % of data points • ±1.96 SD ≈ 95 % of data points (for normally distributed data) • var(y) = 3.3 • sd(y) = 1.8
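A short check of these numbers, assuming normally distributed data for the 68 % and 95 % figures. The vector y is made up; only 3.3 and 1.8 come from the slide.

```r
# The standard deviation is the square root of the variance.
sqrt(3.3)                          # about 1.8, as on the slide

y <- c(7, 9, 8, 11, 10)            # hypothetical measurements
all.equal(sd(y), sqrt(var(y)))     # TRUE

# Share of a normal distribution within 1 and 1.96 SD of the mean.
within_1_sd   <- pnorm(1)    - pnorm(-1)
within_196_sd <- pnorm(1.96) - pnorm(-1.96)

round(within_1_sd, 2)              # 0.68
round(within_196_sd, 2)            # 0.95
```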
How much is 3.1? • [Figure: distribution with tail areas of 1 % and 2.5 % marked on each side.]