
Overall Aims




  1. Overall Aims
     • Introduce programming concepts relevant to MX
     • Demonstrate the strengths (and weaknesses) of R

  2. Books
     • The R Book – Crawley (2007)
     • Introductions to statistics using R
       • Cohen Y. and Cohen J. Y. (2008). Statistics and Data with R.
       • Crawley M. (2005). Statistics: An Introduction using R.
       • Dalgaard P. (2002). Introductory Statistics with R.
       • Maindonald J. & Braun J. (2003). Data Analysis and Graphics Using R: An Example-based Approach.
     • Books on biological topics
       • Paradis E. (2006). Analysis of Phylogenetics and Evolution with R.
       • Broman K. W. & Sen S. (2009). A Guide to QTL Mapping with R/qtl.
       • Bolker B.M. (2008). Ecological Models and Data in R.
     • Books on statistical topics
       • Aitkin M. et al. (2009). Statistical Modelling in R.
       • Faraway J. (2009). Linear Models with R.
       • Albert J. (2009). Bayesian Computation with R.
       • Bivand R.S. et al. (2009). Applied Spatial Data Analysis with R.
       • Cowpertwait P.S.P. & Metcalfe A.V. (2009). Introductory Time Series with R.
     • Books on R specifics and R programming
       • Spector P. (2008). Data Manipulation with R.
       • Murrell P. (2006). R Graphics.
       • Chambers J. M. (2008). Software for Data Analysis: Programming with R.

  3. Websites
     • General websites
       • CRAN R: http://www.r-project.org/
       • R cookbook: http://www.r-cookbook.com/
       • R graphics: http://addictedtor.free.fr/graphiques/
       • R wiki: http://wiki.r-project.org/
       • Mailing lists: http://www.r-project.org/mail.html
       • R seek: http://www.rseek.org/
     • Websites on statistical topics
       • R genetics: http://rgenetics.org/trac/rgalaxy
       • Bioconductor: http://www.bioconductor.org/

  4. The console
     • Load up R
     • Console window appears, with a command prompt
     • Everything in the R console can be partitioned into two fundamental operations:
       • Input variables
         > x <- 2
       • Output variables
         > x
         [1] 2

  5. Objects
     • Names
       • Case sensitive, no spaces
       • Must begin with a letter (or a dot not followed by a digit) and can also contain numbers, . and _
       • Try to give your objects meaningful names
         > My_f4vourite.langua6e_evR <- "R"
     • x, y and My_f4v… are objects that we have created
       > ls()            # lists all the objects we have created
       > rm(y)           # deletes y (forever)
       > rm(list=ls())   # deletes everything (forever)

  6. Workspace 1
     • Everything shown in this list of objects comprises our 'workspace'
       > ls()
       [1] "My_f4vourite.langua6e_evR" "x"                         "y"
       > save.image(file="myworkspace.RData")
       > rm(list=ls())
       > ls()
       character(0)
       > load(file = "myworkspace.RData")
       > ls()
       [1] "My_f4vourite.langua6e_evR" "x"                         "y"
     • Objects are internal to R
       • Does not behave like a file structure on the computer
       • Can't be read or interpreted outside R (?)

  7. Workspace 2
     • You can select which objects to save
       > save(y, x, file = "two_objects.RData")
     • Different computer folders can be accessed
       > getwd()                     # shows the current working directory
       > dir()                       # lists the files in the current working directory
       > setwd("~/work_directory")   # sets R's focus to a different computer folder
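
The save/clear/load cycle from the two workspace slides can be tried without touching your real workspace. This is a minimal sketch: the environment `ws`, the example values and the temporary file are assumptions made for illustration, not part of the slides.

```r
# Hypothetical scratch environment standing in for the workspace
ws <- new.env()
assign("x", 2, envir = ws)
assign("y", c(3, 4, 12), envir = ws)

# Save only the named objects to a temporary .RData file
rdata_file <- tempfile(fileext = ".RData")
save(list = c("x", "y"), file = rdata_file, envir = ws)

# Clear the scratch environment, as rm(list = ls()) would clear the workspace
rm(list = ls(ws), envir = ws)
n_after_rm <- length(ls(ws))

# load() puts the objects back and (invisibly) returns their names
restored <- load(rdata_file, envir = ws)
print(restored)
```

In an interactive session the same calls work without the `envir` argument, in which case they act on the global workspace.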

  8. Built-in functions
     • Native functions make R succinct
     • Diverse range available from graphics to data manipulation to statistical algorithms etc.
     • Highly optimised so use them if they are available instead of writing your own
     • Function structure:
       > function_name(<argument 1>, <argument 2>, …)

  9. Missing values
     • NA is a "reserved" word in R
     • It is a single element (length 1) that indicates a missing value
     • A helpful alternative to coding missing values with a number (e.g. -99)
       > my_array <- c(NA,100,120,120,120,130,NA)
       > sum(my_array)
       [1] NA
       > sum(my_array, na.rm=TRUE)   # most functions allow you to explicitly state how to handle NA
       [1] 590
       > table(my_array)             # HOWEVER the default action varies from function to function
       my_array
       100 120 130
         1   3   1
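
A short sketch of how the NA defaults differ between functions, using the same values as the slide. The variable names are chosen here for illustration.

```r
weights <- c(NA, 100, 120, 120, 120, 130, NA)

# is.na() marks the missing entries; summing the logicals counts them
n_missing <- sum(is.na(weights))

# sum() and mean() return NA unless told to drop missing values
total <- sum(weights, na.rm = TRUE)
avg   <- mean(weights, na.rm = TRUE)

# table() silently drops NA by default; useNA = "ifany" keeps it as a category
counts_default <- table(weights)
counts_with_na <- table(weights, useNA = "ifany")

print(counts_default)
print(counts_with_na)
```

Checking a function's help page for `na.rm`, `useNA` or `na.action` is usually the quickest way to find out what it does with missing values.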

  10. R help pages
     • Each function has its own unique syntax
       • Default arguments
       • Data structure requirements
       • Output options
       > ?seq          # brings up help page of seq() function
       > ??"sequence"  # searches for all related functions
     • Note
       > seq(from = 2, to = 100, by = 2)
       is clearer than
       > seq(2,100,2)
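
The point about named arguments can be checked directly. The sketch below compares the two calls from the slide; the reordered call and the `length.out` variant are extra illustrations.

```r
by_name     <- seq(from = 2, to = 100, by = 2)
by_position <- seq(2, 100, 2)

# Both calls produce the same sequence
same_result <- all(by_name == by_position)

# Named arguments can be given in any order
reordered <- seq(by = 2, to = 100, from = 2)

# Names also make less common arguments readable, e.g. length.out
first_ten_evens <- seq(from = 2, by = 2, length.out = 10)

# The argument names a function accepts can be listed without opening the help page
print(names(formals(seq.default)))
```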

  11. Basic Scripting
     • Note pad / text editor
       • Within the R GUI
       • Open with: File > New Script or Ctrl+N
       • Layout as tile is useful: Windows > Tile

  12. Basic Scripting
     • Note pad / text editor
       • Useful for keeping all work together
       • Scripts can be saved
       • Can be used to save a "program"
       • Add # comments
     • Check individual bits of code
       • Ctrl+R
       • Whole line
       • Selected code

  13. Basic Scripting
     • Brackets
       • ( ) functions
       • [ ] subsets
       • { } processes (blocks of code)
     • Subsets
       • Take a subset of an object
       • Objects have either 1 x n, or m x n dimensions
       > x
       [1]  2  5  6  2  6 77 55
       > x[5]
       [1] 6
       > x
            [,1] [,2] [,3] [,4]
       [1,]    1    4    7   10
       [2,]    2    5    8   11
       [3,]    3    6    9   12
       [rows, columns]
       > x[3,4]
       [1] 12
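
A few more forms of `[ ]` subsetting, sketched with the same vector and matrix values as the slide. The names `v` and `m` are used here only to keep the two objects apart.

```r
v <- c(2, 5, 6, 2, 6, 77, 55)

fifth        <- v[5]        # single element by position
first_third  <- v[c(1, 3)]  # several positions at once
without_1st  <- v[-1]       # negative index drops an element
greater_than <- v[v > 5]    # logical condition keeps matching elements

m <- matrix(1:12, nrow = 3) # filled column by column

cell    <- m[3, 4]          # [row, column]
row_two <- m[2, ]           # leave the column blank to take the whole row
col_one <- m[, 1]           # leave the row blank to take the whole column

print(cell)
print(row_two)
```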

  14. Basic Scripting
     • Data input
       • Direct input into the console
         • scan()
       • Reading in data
         • read.table / read.csv
         • "name.txt"
         • "c:\\temp\\name.txt"
         • file.choose()
         • list.files()
         • dir()
       > y <- scan()
       1: 3
       2: 4
       3: 12
       4: 3
       5: 5
       6: 2
       7: 14
       8:
       Read 7 items
       > dir()
       [1] "temp.csv"  "temp2.csv" "name.txt"
       > y <- read.table("name.txt", header=TRUE, sep="\t")
       >
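
To try `read.table()` without a data file to hand, the sketch below first writes a small tab-separated file to a temporary location and then reads it back. The file contents and column names are invented for illustration.

```r
# Create a small tab-separated file (hypothetical contents)
input_file <- tempfile(fileext = ".txt")
writeLines(c("id\tweight",
             "1\t22.9",
             "2\t27.5",
             "3\t25.5"),
           input_file)

# header = TRUE uses the first line as column names; sep = "\t" splits on tabs
dat <- read.table(input_file, header = TRUE, sep = "\t")

str(dat)   # check the column types that R inferred
dim(dat)   # rows, columns
```

Running `str()` straight after reading is a quick way to catch columns that were read as text when numbers were expected.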

  15. Basic Scripting
     • Data output
       • Direct output from the console
         • sink()
       • Writing out data
         • write.table / write.csv
         • "name.txt"
         • "c:\\temp\\name.txt"
       > sink("sink_tmp.txt")   # send console output to a file
       > i <- 1:10
       > outer(i, i, "*")
       > sink()                 # return output to the console
       > dir()
       [1] "temp.csv"  "temp2.csv" "name.txt"
       > write.table(y, file="name.txt", col.names=TRUE, sep="\t")   # the object to write comes first
       >
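
A runnable sketch of both output routes. The data frame and the temporary file names are assumptions; note that `write.table()` takes the object first and uses `col.names` / `row.names` rather than a `header` argument.

```r
dat <- data.frame(id = 1:3, weight = c(22.9, 27.5, 25.5))

# Write a data frame as a tab-separated file and read it back to check it
out_file <- tempfile(fileext = ".txt")
write.table(dat, file = out_file, sep = "\t",
            row.names = FALSE, quote = FALSE)
back <- read.table(out_file, header = TRUE, sep = "\t")

# Divert printed console output to a file with sink(), then restore it
log_file <- tempfile(fileext = ".txt")
sink(log_file)
i <- 1:10
print(outer(i, i, "*"))   # multiplication table goes to the file, not the console
sink()

log_lines <- readLines(log_file)
print(head(log_lines, 3))
```

Always pair `sink(file)` with a closing `sink()`, otherwise later output keeps disappearing into the file.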

  16. Basic Scripting
     • Adding rows and columns
       • Allows objects to be joined, either to an existing object or to make a new object
       • cbind() – joins objects side by side, as columns
       • rbind() – stacks objects on top of each other, as rows
       > y1
            [,1] [,2] [,3]
       [1,]    1    3 12.5
       [2,]    1    2 13.8
       [3,]    1    5 15.3
       [4,]    1    4 16.8
       > y2
             [,1]
       [1,] 0.349
       [2,] 0.745
       [3,] 0.684
       [4,] 0.964
       > y3 <- cbind(y1, y2)
       > y3
            [,1] [,2] [,3]  [,4]
       [1,]    1    3 12.5 0.349
       [2,]    1    2 13.8 0.745
       [3,]    1    5 15.3 0.684
       [4,]    1    4 16.8 0.964
       > y3 <- rbind(y1, y2[1:3])
       > y3
             [,1]  [,2]   [,3]
       [1,] 1.000 3.000 12.500
       [2,] 1.000 2.000 13.800
       [3,] 1.000 5.000 15.300
       [4,] 1.000 4.000 16.800
       [5,] 0.349 0.745  0.684
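
The slide's `y1` and `y2` can be rebuilt to experiment with the dimension rules: `cbind()` needs matching row counts and `rbind()` needs matching column counts. A minimal sketch, with `wide` and `tall` as illustrative names:

```r
# Rebuild the matrices shown on the slide (matrix() fills column by column)
y1 <- matrix(c(1, 1, 1, 1,
               3, 2, 5, 4,
               12.5, 13.8, 15.3, 16.8), nrow = 4)
y2 <- matrix(c(0.349, 0.745, 0.684, 0.964), ncol = 1)

# Both have 4 rows, so they can be joined as columns
wide <- cbind(y1, y2)

# y1 has 3 columns, so the new row must have 3 values
tall <- rbind(y1, y2[1:3])

print(dim(wide))
print(dim(tall))
```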

  17. Basic Scripting
     • for loops
       • loop through a set of commands a given number of times
       • very useful, but often slower and less memory-efficient than built-in vectorised functions
       > dim(y)
       [1] 10 10
       > for(i in 1:nrow(y)) {
           y_mean <- mean(y[i, 1:10])   # y_mean is overwritten on every pass
         }
       > y_mean                         # so only the mean of the last row is kept
       [1] 0.1974492
       > out <- array(0, c(nrow(y), 1)) # create somewhere to store every result
       > for(i in 1:nrow(y)) {
           out[i] <- mean(y[i, ])
         }
       > out
                   [,1]
        [1,] -0.3110800
        [2,] -0.2000344
        [3,]  0.2019573
        [4,]  0.2859823
        [5,]  0.1932523
        [6,]  0.2759323
        [7,] -0.2571102
        [8,] -0.1037983
        [9,]  0.3522018
       [10,]  0.1974492
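
The row-mean loop has built-in equivalents. The sketch below compares the three approaches; the slide's `y` is not available, so a simulated 10 x 10 matrix is assumed in its place.

```r
set.seed(1)
y <- matrix(rnorm(100), nrow = 10)   # stand-in for the slide's 10 x 10 object

# Loop version: pre-allocate the result, then fill one element per row
out <- numeric(nrow(y))
for (i in seq_len(nrow(y))) {
  out[i] <- mean(y[i, ])
}

# Built-in alternatives that do the same job in one call
vectorised <- rowMeans(y)
applied    <- apply(y, 1, mean)      # 1 = apply the function over rows

print(all.equal(out, vectorised))
print(all.equal(out, applied))
```

`seq_len(nrow(y))` is used instead of `1:nrow(y)` because it behaves sensibly when the object has zero rows.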

  18. Data Manipulation
     • Check data
       • dim()
       • mydata[1:10, 1:10]
       • str()
       • summary()
       • head()
       • tail()
       • table()
       • etc…
       > mydata <- read.table("mydata.txt", header=TRUE, sep="\t")
       > dim(mydata)
       [1]  642 1470
       > mydata[1:10, 1:10]
             [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
        [1,]    2    2    1    2    1    2    0    1    0     1
        [2,]    0    0    2    2    0    0    1    2    1     2
        [3,]    0    2    2    2    1    1    0    0    2     1
        [4,]    2    0    2    2    2    0    1    2    0     1
        [5,]    2    0    0    2    0    1    1    0    2     0
        [6,]    2    1    2    1    1    0    2    2    1     1
        [7,]    1    1    2    2    1    2    2    2    0     1
        [8,]    0    1    0    0    0    1    1    1    1     1
        [9,]    0    0    1    2    1    2    2    0    0     1
       [10,]    1    0    1    1    2    0    1    0    0     1

  19. Data Manipulation
     • Reordering
       • If you have a data.frame or matrix (numbers or letters)
       • Use: order()
       • index <- order(old[,1], decreasing=TRUE)
       > dim(lamb)
       [1] 1600    5
       > head(lamb)
         Field   Weight sire dam sex
       1     A 22.92368    1   1   F
       2     A 27.52896    1   1   F
       3     A 25.52592    1   1   M
       4     A 25.56016    1   1   M
       5     A 24.53296    1   2   F
       6     A 22.03344    1   2   F
       > lamb <- lamb[order(lamb$sex, decreasing=FALSE), ]
       > head(lamb)
          Field   Weight sire dam sex
       1      A 22.92368    1   1   F
       2      A 27.52896    1   1   F
       5      A 24.53296    1   2   F
       6      A 22.03344    1   2   F
       9      A 30.37944    2   1   F
       10     A 25.93680    2   1   F

  20. Data Manipulation
     • Reordering
       • order()
       > lamb <- lamb[order(lamb$sex, decreasing=FALSE), ]
       is the same as
       > rows <- order(lamb$sex, decreasing=FALSE)
       > lamb <- lamb[rows, ]
     • Expanded way
       > index <- order(lamb$sex, decreasing=FALSE)
       > head(index)
       [1]  1  2  5  6  9 10
       > lamb <- lamb[index, ]
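
`order()` returns row positions rather than sorted values, which is why it is used inside `[ , ]`. The sketch below rebuilds the first six rows of `lamb` from the slide; the two-key and decreasing sorts are extra illustrations.

```r
lamb <- data.frame(
  Field  = c("A", "A", "A", "A", "A", "A"),
  Weight = c(22.92368, 27.52896, 25.52592, 25.56016, 24.53296, 22.03344),
  sire   = 1,
  dam    = c(1, 1, 1, 1, 2, 2),
  sex    = c("F", "F", "M", "M", "F", "F")
)

# order() gives the row positions that would sort the column
index  <- order(lamb$sex)
by_sex <- lamb[index, ]

# Several keys: sort by sex, then by decreasing weight within each sex
by_sex_weight <- lamb[order(lamb$sex, -lamb$Weight), ]

# decreasing = TRUE reverses the whole ordering
heaviest_first <- lamb[order(lamb$Weight, decreasing = TRUE), ]

print(index)
print(by_sex_weight)
```

The original row names are carried along, so they show where each row came from in the unsorted data.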

  21. Data Manipulation
     • Replacing
       • index
       • which()
       > class(lamb)
       [1] "matrix"
       > head(lamb)
         Field   Weight sire dam sex
       1     A 22.92368    1   1   F
       2     A 27.52896    1   1   F
       3     B 25.52592    1   1   M
     • With a logical index
       > index <- lamb[,1]=="A"
       > head(index)
       [1]  TRUE  TRUE FALSE  TRUE FALSE  TRUE
       > lamb[index, 1] <- "C"
       > head(lamb)
         Field   Weight sire dam sex
       1     C 22.92368    1   1   F
       2     C 27.52896    1   1   F
       3     B 25.52592    1   1   M
     • Or, starting again from the original lamb, with which()
       > index <- which(lamb[,1]=="A")
       > head(index)
       [1]  1  2  4  6  7 10
       > lamb[index, 1] <- "C"
     • Put it together
       > lamb[which(lamb[,1]=="A"), 1] <- "C"
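
Logical indexing and `which()` usually give the same replacement. The sketch below shows both on a small invented vector, plus the one case where they differ: missing values.

```r
field <- c("A", "A", "B", "A", "B", "A")

# Logical index: one TRUE/FALSE per element
is_a <- field == "A"
relabelled_logical <- field
relabelled_logical[is_a] <- "C"

# which() converts the logical index into positions
positions <- which(field == "A")
relabelled_which <- field
relabelled_which[positions] <- "C"

print(positions)
print(identical(relabelled_logical, relabelled_which))

# With missing values the comparison returns NA, and which() drops it
field_na     <- c("A", NA, "B")
logical_na   <- field_na == "A"
positions_na <- which(field_na == "A")
print(logical_na)
print(positions_na)
```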

  22. Data Manipulation
     • Replacing
       > class(lamb)
       [1] "matrix"
       > head(lamb)
         Field   Weight sire dam sex
       1     A 22.92368    1   1   F
       2     A 27.52896    1   1   F
       3     B 25.52592    1   1   M
       > index <- lamb[,2] <= 22.000
       > table(index)
       index
       FALSE  TRUE
        1553    47
       > lamb[index, 2] <- NA   # NA without quotes; "NA" in quotes would be stored as text
     • Combining conditions with &
       > which(lamb[,2] >= 20.0 & lamb[,2] <= 21.0)
       [1]  214  363  496  842  921  983 1103 1126
       > which(lamb[,1]=="A" & lamb[,2] >= 20.0 & lamb[,2] <= 21.0)
       [1] 214 363 496
       > new_lamb <- lamb[which(lamb[,1]=="A" & lamb[,2] >= 20.0 & lamb[,2] <= 21.0), ]
       > new_lamb
           Field Weight sire dam sex
       214     A   2046   27   2   F
       363     A   2008   46   1   M
       496     A   2041   62   2   M
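
The difference between `NA` and `"NA"` is easy to miss, so here is a small sketch on an invented five-row data frame. It also repeats the combined-condition subset on data where the result can be checked by eye.

```r
lamb <- data.frame(Field  = c("A", "A", "B", "B", "A"),
                   Weight = c(22.9, 20.4, 20.8, 27.5, 21.6))

light <- lamb$Weight <= 22

# Assigning NA keeps the column numeric
lamb_na <- lamb
lamb_na$Weight[light] <- NA
print(class(lamb_na$Weight))

# Assigning the string "NA" silently converts the whole column to text
lamb_text <- lamb
lamb_text$Weight[light] <- "NA"
print(class(lamb_text$Weight))

# Conditions joined with & must all be TRUE for a row to be selected
sel      <- which(lamb$Field == "A" & lamb$Weight >= 20 & lamb$Weight <= 21)
new_lamb <- lamb[sel, ]
print(new_lamb)
```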

  23. Graphics with R: Overview
     • Why graphics?
     • Why graphics in R?
     • The R graphics systems (did you really expect just one?)
     • Graphics basics and examples
     • Customisation of a graphic
     • Overview of different systems and packages
     Introduction to R: Joseph Powell

  24. plot(x, y, …)
       > ?Formaldehyde
       > head(Formaldehyde)
         carb optden
       1  0.1  0.086
       2  0.3  0.269
       3  0.5  0.446
       4  0.6  0.538
       5  0.7  0.626
       6  0.9  0.782
       > plot(Formaldehyde)
       > ?par
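
A sketch of how the default `plot(Formaldehyde)` might be customised with arguments documented in `?plot` and `?par`. The titles, colours and the temporary PDF file are choices made for illustration; in an interactive session the `pdf()` and `dev.off()` lines can be left out so the plot appears on screen.

```r
# Formaldehyde ships with R (datasets package): columns carb and optden
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)                         # draw into a file instead of a window

old_par <- par(mar = c(4, 4, 2, 1))    # par() sets graphical parameters such as margins

plot(Formaldehyde$carb, Formaldehyde$optden,
     xlab = "Carbohydrate (ml)",       # axis labels
     ylab = "Optical density",
     main = "Formaldehyde calibration data",
     pch  = 19,                        # solid circles
     col  = "blue")

par(old_par)                           # restore the previous settings
dev.off()                              # close the device so the file is written

print(file.exists(plot_file))
```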
