This module provides an introduction to advanced statistical analyses using R designed for ecologists. Aimed at enhancing your skills beyond standard statistics, it covers multivariate techniques, model selection, and high-quality graph production. Participants will learn to choose appropriate significance tests, explore data structures, and independently tackle analytical challenges. Additionally, skills in poster design using CorelDraw and grant proposal writing will be developed. By the end, participants will confidently apply advanced R techniques to ecological research.
Advanced Research Skills Lecture 1 Introduction Olivier MISSA, om502@york.ac.uk
Aims • Introduce the use of R for advanced statistical analyses beyond "Statistics for Ecologists". • Demonstrate these analyses on a broad range of questions and situations. • Develop your understanding of statistical programming. • Empower you to tackle future analytical challenges on your own.
Aims • Other skills will be developed too. • Produce posters using CorelDraw (graphics package). • Learn how to write a grant proposal.
Learning Outcomes • At the end of the module, you should be able to: • Determine which test to use for significance testing. • Explore the inherent structure of your data through a wide range of multivariate techniques. • Work out which model "best explains" the variable you are interested in. • Produce high-quality graphs (ready for publication) making full use of R's graphical capabilities.
Organisation • Staff • Olivier Missa (OM), module organiser, R sessions, om502@york.ac.uk • Emma Rand (ER), R sessions, er13@york.ac.uk • Phil Roberts (PTR), CorelDraw session, ptr2@york.ac.uk • Peter Mayhew (PJM), Grant writing session, pjm19@york.ac.uk
Organisation • Structure • 9 theoretical lectures (OM) on advanced stats. • 9 practical sessions (OM & ER) on using R. • 1 practical session (PTR) on CorelDraw. • 1 tutorial session (PJM) on Grant writing.
Organisation • Content • L1 Introduction • L2 – L4 Linear Models • L5 – L6 GLMs & Mixed-effects models • L7 Non-Linear Models • L8 – L9 Multivariate Analyses • Each lecture is accompanied by a practical session
Organisation • Assessment • Open Data Analysis exercise. • Written report with Introduction, Material & Methods, Results, Discussion. • Particular emphasis on justifying the analyses and interpreting the results properly.
What is R ? • "R is a language and environment for statistical computing and graphics" (R website) • A programming language, actually a dialect of S, which was developed in the 80s by John Chambers at Bell Labs. • Bell Labs then sold S to MathSoft (now Insightful Co.), which developed it further into S-Plus, a commercial statistical package. • In the 90s, S was reimplemented from scratch as R by two statisticians, Ross Ihaka & Robert Gentleman, then at the University of Auckland, New Zealand. • Since then R has continued to grow in scale and scope and is currently maintained by about 20 people across the globe.
Why use R ? • The Key Benefits : • It's Free: it won't cost you a penny, ever • Open: how things are calculated is not hidden • Fully customisable: the user is in full control • Cutting Edge: stats pros use it to create new techniques • Very Widespread (increasingly so): thousands of contributors (packages), millions of users • Supported by an international user community happy to provide help and assistance
Why use R ? • The Drawback : • Steep Learning Curve • You need to learn the language • You need to know what you are doing (stats)
What is R Good for ? • Absolutely everything (to do with data) • Statistics • Modelling • Programming / Simulations • Graphics (from very simple to complex, 2D, 3D, ...) • Database (simple relational functions) • Bioinformatics (Bioconductor project) • Platform interacting with other software (e.g. GGobi, WinBUGS, MySQL, GRASS GIS)
Example of a session • > data(volcano) • > dim(volcano) • [1] 87 61 • > volcano • [,1] [,2] [,3] [,4] [,5] [,6] [,7] . . . [,61] • [1,] 100 100 101 101 101 101 101 . . . 103 • [2,] 101 101 102 102 102 102 102 . . . 104 • . . . . . . . . . . . . . . . . . . . . . . . . . . • [87,] 97 97 97 98 98 99 99 . . . 94 • > volcano[1:3,1:3] • [,1] [,2] [,3] • [1,] 100 100 101 • [2,] 101 101 102 • [3,] 102 102 103
> range(volcano) • [1] 94 195 • > mean(volcano) • [1] 130.1879 • > sd(volcano) • [1] 6.902227 7.565538 8.203669 8.735686 . . . • [8] 11.165554 11.735217 12.733854 13.668694 . . . • . . . • > ?sd ## help('sd') does the same • > sd • function (x, na.rm = FALSE) • { if (is.matrix(x)) • apply(x, 2, sd, na.rm = na.rm) • else if (is.vector(x)) • sqrt(var(x, na.rm = na.rm)) • else if (is.data.frame(x)) • sapply(x, sd, na.rm = na.rm) • else sqrt(var(as.vector(x), na.rm = na.rm)) • } . . .
> sd(as.vector(volcano)) • [1] 25.83233 • > summary(as.vector(volcano)) • Min. 1st Qu. Median Mean 3rd Qu. Max. • 94.0 108.0 124.0 130.2 150.0 195.0 • > volcano.v <- as.vector(volcano) • > dim(volcano.v) • NULL • > length(volcano.v) • [1] 5307 • > 61*87 • [1] 5307 • > volcano.v[1:87] == volcano[,1] • [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE . . . • . . . . . . . . . . . . . . . . . . . . . . • [87] TRUE • > volcano.v[1:61] == volcano[1,] • . . . only three values (out of 61) show "TRUE"
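The session above relies on two behaviours that are easy to miss: `as.vector()` unrolls a matrix column by column, and the column-wise output of `sd(volcano)` comes from the older definition of `sd` printed on the slide. In recent R versions `sd()` applied to a matrix returns a single pooled value instead. The sketch below makes both points explicit in a way that should not depend on the R version; the names `col_sd`, `overall_sd`, `row1_idx` and `by_row` are illustrative only.

```r
# Built-in dataset: 87 x 61 matrix of elevations
data(volcano)

# Column-wise standard deviations, whatever the R version in use
col_sd <- apply(volcano, 2, sd)

# Standard deviation of all 5307 values pooled together
overall_sd <- sd(as.vector(volcano))

# as.vector() stacks the columns one after another (column-major order),
# so the first 87 values are column 1 ...
volcano.v <- as.vector(volcano)
first_col_ok <- all(volcano.v[1:nrow(volcano)] == volcano[, 1])

# ... and row 1 sits at every 87th position, starting from position 1
row1_idx <- seq(1, by = nrow(volcano), length.out = ncol(volcano))
first_row_ok <- all(volcano.v[row1_idx] == volcano[1, ])

# Flattening the transposed matrix gives the values row by row instead
by_row <- as.vector(t(volcano))
```

This is why `volcano.v[1:61] == volcano[1,]` is mostly `FALSE`: the first 61 elements of `volcano.v` belong to column 1, not to row 1.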
> plot(volcano) ## not useful: only shows that elevations in columns 1 and 2 tend to be correlated
> plot(volcano) • > plot(volcano.v, pch=20) • > hist(volcano, prob=TRUE, • + xlab="volcano elevation (m)") • > x <- seq(90,200,1) • > curve(dnorm(x, mean=mean(volcano.v), • + sd=sd(volcano.v)), add=TRUE) • > shapiro.test(volcano.v) • Error in shapiro.test(volcano.v) : • sample size must be between 3 and 5000 • > smpl <- sample(volcano.v, 5000) • > shapiro.test(smpl) • Shapiro-Wilk normality test • data: smpl • W = 0.9358, p-value < 2.2e-16
> library(nortest) ## Package of Normality tests • > ad.test(volcano) ## Anderson-Darling • Anderson-Darling normality test • data: volcano • A = 106.2715, p-value < 2.2e-16 • > cvm.test(volcano) ## Cramer-von Mises • > lillie.test(volcano) ## Lilliefors • > pearson.test(volcano) ## Pearson (Chi2) • > sf.test(smpl) ## Shapiro-Francia • > qqnorm(volcano.v) • > qqline(volcano.v, col="red")
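`shapiro.test()` refuses more than 5000 values, hence the random subsample drawn in the session. Because `sample()` picks a different subset on every run, the W statistic will vary slightly from one run to the next. A minimal sketch of one way to make that step repeatable follows; the seed value 42 and the object name `sw` are arbitrary choices, not part of the original session.

```r
data(volcano)
volcano.v <- as.vector(volcano)

# Fix the random number generator so the same 5000 values are drawn each time
set.seed(42)
smpl <- sample(volcano.v, 5000)   # sampling without replacement

# The test returns an object of class "htest"; its parts can be extracted
sw <- shapiro.test(smpl)
sw$statistic   # W statistic
sw$p.value     # p-value

# Resetting the seed reproduces exactly the same sample
set.seed(42)
same_again <- identical(smpl, sample(volcano.v, 5000))
```

Extracting `sw$p.value` rather than reading it off the printout is handy when the result has to be reported in a table or reused in later code.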
> x <- 10*(1:nrow(volcano)) ## 10, 20, ..., 870 • > y <- 10*(1:ncol(volcano)) ## 10, 20, ..., 610 • > image(x, y, volcano)
> x <- 10*(1:nrow(volcano)) • > y <- 10*(1:ncol(volcano)) • > image(x, y, volcano) • > image(x, y, volcano, asp=1)
> x <- 10*(1:nrow(volcano)) • > y <- 10*(1:ncol(volcano)) • > image(x, y, volcano) • > image(x, y, volcano, asp=1) • > image(x, y, volcano, asp=1, • + col = terrain.colors(100), • + axes = FALSE)
> x <- 10*(1:nrow(volcano)) • > y <- 10*(1:ncol(volcano)) • > image(x, y, volcano) • > image(x, y, volcano, asp=1) • > image(x, y, volcano, asp=1, • + col = terrain.colors(100), • + axes = FALSE) • > contour(x, y, volcano, • + levels = seq(90, 200, by=5), • + add = TRUE, col = "peru")
> x <- 10*(1:nrow(volcano)) • > y <- 10*(1:ncol(volcano)) • > image(x, y, volcano) • > image(x, y, volcano, asp=1) • > image(x, y, volcano, asp=1, • + col = terrain.colors(100), • + axes = FALSE) • > contour(x, y, volcano, • + levels = seq(90, 200, by=5), • + add = TRUE, col = "peru") • > image(x, y, volcano, asp=1, • + col = terrain.colors(100), • + axes = FALSE) • > contour(x, y, volcano, • + levels = seq(90, 200, by=10), • + add = TRUE, col = "peru")
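For a publication-ready version of the final map, the same two calls can be sent to a file device instead of the screen. The sketch below uses `pdf()`; the output location (a temporary file here) and the 7 x 5 inch page size are assumptions chosen for illustration.

```r
data(volcano)
x <- 10 * (1:nrow(volcano))   # 10, 20, ..., 870
y <- 10 * (1:ncol(volcano))   # 10, 20, ..., 610

out_file <- tempfile(fileext = ".pdf")   # hypothetical output location
pdf(out_file, width = 7, height = 5)     # open the PDF device

# Coloured elevation map with contour lines every 10 m on top
image(x, y, volcano, asp = 1,
      col = terrain.colors(100), axes = FALSE)
contour(x, y, volcano,
        levels = seq(90, 200, by = 10),
        add = TRUE, col = "peru")

invisible(dev.off())   # close the device so the file is written to disk
```

Nothing appears on screen while the file device is open; the figure only exists on disk once `dev.off()` has been called.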
Gallery of other Volcano Graphs • image + contour • persp • persp with shading • surface3d
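As a rough idea of how a shaded `persp` panel like the one in this gallery might be produced, here is a sketch adapted from the example in R's own `persp` help page. The viewing angles, the colour and the doubling of the elevations are display choices, not necessarily those used for the slide.

```r
data(volcano)
z <- 2 * volcano              # exaggerate the relief
x <- 10 * (1:nrow(z))
y <- 10 * (1:ncol(z))

# theta/phi set the viewing direction; ltheta and shade control the lighting
vt <- persp(x, y, z, theta = 135, phi = 30,
            col = "green3", scale = FALSE,
            ltheta = -120, shade = 0.75,
            border = NA, box = FALSE)
```

`persp()` invisibly returns the viewing transformation matrix (stored here as `vt`), which can be passed to `trans3d()` to add points or lines to the 3D view.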
More Classical Graphs • Histogram + theoretical curve • Boxplot • Stripchart • Pie chart • Barplot • 3D models
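Several of these classical graphs can be drawn from the volcano elevations with base R alone. The sketch below is only an illustration: the 10 m elevation classes and the 2 x 2 layout are assumptions, and the slide's own figures may have used different data.

```r
data(volcano)
volcano.v <- as.vector(volcano)

# Group elevations into 10 m classes: (90,100], (100,110], ..., (190,200]
elev_class <- cut(volcano.v, breaks = seq(90, 200, by = 10))
counts <- table(elev_class)

op <- par(mfrow = c(2, 2))    # 2 x 2 grid of plots
hist(volcano.v, prob = TRUE, main = "Histogram",
     xlab = "volcano elevation (m)")
curve(dnorm(x, mean = mean(volcano.v), sd = sd(volcano.v)), add = TRUE)
boxplot(volcano.v, horizontal = TRUE, main = "Boxplot")
stripchart(volcano.v, method = "jitter", pch = 20, main = "Stripchart")
barplot(counts, las = 2, main = "Barplot")
par(op)                       # restore the previous layout
```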