## Review of yesterday

**Review of yesterday**

- Nice biological thinking!
- Don't FOCUS too much on error sources!
- Your study may be correct in finding no difference or no relationship…
- Use the introduction for hypotheses about biology, not the theory of statistics.
- English. Do a spell check. Help each other.
- Web resources?

**Review of yesterday**

- Do read the basic manuals
- Open and Save in R
  - Alternative 1: xlsReadWrite, read.xls()
  - Alternative 2: copy, read.delim()
- Save graphs
  - Ctrl + C (in graph), Ctrl + V (in Word): bitmap
  - Ctrl + W (in graph), Ctrl + V (in Word): Windows metafile

**Continuous variables**

- If possible, import from Excel with xlsReadWrite
- If not, beware of 1,5 or 1.5
  - comma: use read.delim2("clipboard")
  - period: use read.delim("clipboard")
- Also: Which is your response and which is your explanatory?
- Avoid Length..in.mm. Write length

**Copy paste statistics**

Only change the green bold code!

plot(y~x,xlab="Seed size")

**R tricks**

- Ctrl + X: copy AND paste
- Arrow up: last line
- history(Inf): all commands without the ">" prompt, as in plot(y~x)

**Word pricks**

- col="red" goes col=“red”
- Change this: Computer tricks manual
- Or try Notepad++

[Figure: choice of analysis by variable type. Response variable (categoric or continuous) against explanatory variable (continuous or categoric): logistic regression, 2 × 2 tables, regression, Anova. Example panels: probability of choosing Melica (vs. Luzula) against ant size; seed size for red ants and black ants.]

[Figure: the same scheme, showing only the continuous-response row: regression and Anova, with the seed size example for red ants and black ants.]

**R tricks**

- plot(y~x,cex=2,cex.lab=1.5,cex.axis=2)
- Change the size of the graph window.
- Want an extra graph window? x11()

**Birch: reproductive cost**

Change the size of the graph window in R, not in Word.

**Outliers**

- Causes:
  - Typing errors.
  - Data points affected by unwanted stuff.
  - Biologically relevant data points
    - But perhaps given a disproportionately large effect on the result…
- Tell the reader if you have removed any!

**Lichens**

Standardised study design.
Only lichens between 0.5 and 1.5 meters? Only on trunk? Only…

**Can you really trust your studies?**

- Risk of chance effects.
- By chance you may have happened to get those special individuals…

**Permutations**

- Decouple response AND explanation:
  - Take all x-values.
  - Put them in a box and shake.
  - Pour them back into the x-column.
- Now there should be no relationship or no difference. Right?
- But how large a difference can you get by chance (with 95 % probability)?

**Lego shrimps I**

- Does shrimp size depend on water quality?
- Red piece = shrimp size (y, response)
- Blue or green piece = clean or polluted water (x, explanatory)
- If we shuffle the x variable (blue or green pieces), what difference may we get by chance? How large?

**Area under the curve**

[Figure: distribution of differences (x-axis: Difference; y-axis: No. random samples), with the middle 95% and 2.5% in each tail marked.]

**Risk of by chance only < 5 %**

[Figure: the same distribution, 95% in the middle and 2.5% in each tail.]

**Risk by chance = 1.4 + 1.4 = 2.8 %**

[Figure: the same distribution, with 1.4% marked in each tail.]

**p-value**

- 2.8 % is the probability (p) of getting a difference at least this large by chance if there is NO real difference.
- p = 0.028 means that, if the groups do not really differ, there is a 2.8 % chance that we by chance get data points as different as those we collected.
- A 2.8 % probability is ridiculously small! We don't believe in that!
- If it is not just due to chance, it must depend on something else…
- … e.g., water quality. The habitats differ significantly.
- p < 0.05 counts as ridiculously small

**What will affect the p-value?**

- The difference between groups
  - (… between their means)
- The variation within groups
- The sample size
- (unreliability of group means)
  - Variation
  - Sample size

**t-test in R**

- t.test(y~x,var.equal=T)

**Under the hood**

- Competent Drivers vs. Mechanics

**Difference between means**

- Female mean = 9
- Male mean = 13
- Difference = 13 - 9 = 4. Soft!
- But the unreliability?

**Measure of variation?**

- Variation ≈ red lines!
- Mean red line length?
  - Nope!
  - Absolute values are hard
- Instead:
  - ≈ Mean squared red lines!

**Variance**

- ≈ Mean squared red lines!

**Degrees of freedom**

- For a group variance, df = n - 1
- Why?
  - To calculate a variance, the mean is required! (y - mean(y))^2
  - But given a mean, only n - 1 of the deviations (y - mean(y)) can change freely and be used to estimate the variance.
  - If we independently change n - 1 deviations, the last one can't be independent.
  - It must sum with the rest to zero.
  - Its "freedom" is locked, used, by the mean!

**Variance & Standard deviation**

- Standard deviation = SD
- Sometimes used to show variability in graphs
- ±1 SD ≈ 68% of data points (for normally distributed data)
- ±1.96 SD ≈ 95% of data points (for normally distributed data)

var(y) = 3.3, sd(y) = 1.8

**How much is 3.1?**

[Figure: distribution with tail areas marked: 1% and 2.5% in each tail.]
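
The variance, degrees of freedom and standard deviation slides can be checked by hand in R. The sketch below is only an illustration: the vector y holds made-up numbers (not data from the course), and the names deviations, variance_by_hand and sd_by_hand are hypothetical helpers. It squares the "red lines" (deviations from the mean), divides by n - 1 and compares the result with R's built-in var() and sd().

```r
# Hypothetical example data (made up for illustration, not from the slides)
y <- c(7, 9, 10, 11, 13)

n <- length(y)

# The "red lines": distance of each data point from the mean
deviations <- y - mean(y)

# The deviations always sum to zero, so only n - 1 of them are free
sum(deviations)

# Variance = sum of squared red lines divided by df = n - 1
variance_by_hand <- sum(deviations^2) / (n - 1)

# Standard deviation = square root of the variance
sd_by_hand <- sqrt(variance_by_hand)

# Compare with R's built-in functions; both should be TRUE
all.equal(variance_by_hand, var(y))
all.equal(sd_by_hand, sd(y))
```

Dividing by n instead of n - 1 would give a smaller number than var(y); R's var() and sd() use n - 1, in line with the degrees of freedom slide.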