
Crash Course in R · October 16, 2009 · Jack Chen


Presentation Transcript


  1. Crash Course in R · October 16, 2009 Jack Chen

  2. Presentation Flow: Major Topics • Background/Environment • Read/Write Data • Common Data Structures and Operations • Object-oriented Concept • Graphics Samples • Control Blocks • R Session: function writing, plot customization, simulation tips

  3. Background/Environment

  4. Background: History of “S” • Mid 1970s, Bell Laboratories: John Chambers, Rick Becker • An interactive environment on top of an engine of Fortran statistical computing subroutines • Labels on the slide: “Interactive Statistical Computing System”, “Statistical Analysis System”

  5. Background: History of “R” • Early 1990s, University of Auckland: Ross Ihaka, Robert Gentleman • An interactive environment and R functions on top of an engine of statistical computing subroutines • Languages shown on the slide: Fortran, C, C++, Pascal, Java, Perl, …

  6. Background: History • Major differences between S and R: syntax, memory management, variable scoping • S has developed into S-PLUS, commercial software available from TIBCO • R is free, open-source software, with contributed packages from researchers worldwide • XLSolutions has recently been developing R-plus, a commercial version of R

  7. Environment: Starting R in Windows • Mouse-click menus • Mouse-click shortcuts • Command line to interact with R

  8. Environment: Windows • Some keyboard shortcuts for the Windows platform: • Esc: cancels the current line of execution (useful when running into trouble) • Ctrl-p or arrow up: previous command • Ctrl-n or arrow down: next command • Ctrl-u: erase line • Ctrl-a or Home: beginning of line • Ctrl-e or End: end of line • Ctrl-c: copy highlighted text • Ctrl-v: paste • Ctrl-x: copy and paste highlighted text • Ctrl-l: clear command line window • Ctrl-z or q(): quit

  9. Environment: Starting R in Unix • Command line to interact with R

  10. Environment: Unix • Some keyboard shortcuts for the Unix platform: • Esc or Ctrl-c: cancels the current line of execution (useful when running into trouble) • Ctrl-p or arrow up: previous command • Ctrl-n or arrow down: next command • Ctrl-u: erase line • Ctrl-a: beginning of line • Ctrl-e: end of line • Ctrl-z: send to background (type fg to bring R back) • Ctrl-l: clear command line window • Ctrl-r: reverse search of command history • q(): quit session

  11. Environment: the R interpreter • R is an interpreted environment • Everything you type on the command line, followed by Enter, is sent to R's internal engine. R performs the following steps: • Interprets what you have typed • Evaluates it • Returns a result (possibly an error message) • The only exception is a comment: R does not interpret anything after the pound sign #

  12. Object-oriented Concept

  13. Object-oriented Concept: Intuition • Object-oriented programming is a natural way to classify and modularize “things” of interest in order to interact with them during program execution. • For example, suppose our program has 3 shapes: Circle, Square, Triangle • Initialization: we want to be able to create different shapes of different sizes • Interaction: we want each shape to be able to report its area to us, and to display itself

  14. Object-oriented Concept: Intuition • Internally in a program: • Class: Shape, Type: Triangle (isosceles); Functions: report area, draw; Attributes: name ID, base b, height h • Class: Shape, Type: Square; Functions: report area, draw; Attributes: name ID, width w • Class: Shape, Type: Circle; Functions: report area, draw; Attributes: name ID, radius r

  15. Object-oriented Concept: Intuition • Typical programming steps: • Class: Shape, Type: Circle; Functions: report area, draw; Attributes: name ID, radius r • Initialize: Name: ID = circle1, Radius: r = 1 • Interact: “Tell me the area” gives 1²π = 3.14159… • Interact: Draw

  16. Object-oriented Concept: Intuition • Typical programming steps: • Class: Shape, Type: Circle; Functions: report area, draw; Attributes: name ID, radius r • Initialize: Name: ID = circle2, Radius: r = 2 • Interact: “Tell me the area” gives 2²π = 12.566… • Interact: Draw

  17. Object-oriented Concept: Intuition • Typical programming steps: • Class: Shape, Type: Square; Functions: report area, draw; Attributes: name ID, width w • Initialize: Name: ID = square1, Width: w = 1 • Interact: “Tell me the area” gives 1² = 1 • Interact: Draw

  18. Object-oriented Concept: Intuition • Typical programming steps: • Class: Shape, Type: Triangle (isosceles); Functions: report area, draw; Attributes: name ID, base b, height h • Initialize: Name: ID = tri1, Base: b = 1, Height: h = 0.866 • Interact: “Tell me the area” gives 1 × 0.866 / 2 = 0.433 • Interact: Draw

  19. Object-oriented Concept: Intuition • Translating to sensible commands: • Class: Shape, Type: Circle; Functions: report area, draw; Attributes: name ID, radius r • Initialize (Name: ID = circle1, Radius: r = 1): circle1 = Circle(r=1) • Interact (“Tell me the area”): area(circle1) gives 1²π = 3.14159… • Interact (Draw): draw(circle1)

  20. Object-oriented Concept: Intuition • Programming commands • circle2 = Circle(r=2) • area(circle2) • draw(circle2) • square1 = Square(w=1) • area(square1) • draw(square1) • tri1 = Triangle(b=1, h=0.866) • area(tri1) • draw(tri1)
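The commands on this slide are pseudocode: `Circle`, `Square`, `Triangle`, `area` and `draw` are not built into R. As a minimal sketch of how they could be written with R's S3 classes (the constructor and method names below are illustrative assumptions, and `draw` is left out), one possibility is:

```r
# Constructors: each returns a list tagged with S3 classes (hypothetical names).
Circle <- function(r, id = "circle") {
  structure(list(id = id, r = r), class = c("Circle", "Shape"))
}
Square <- function(w, id = "square") {
  structure(list(id = id, w = w), class = c("Square", "Shape"))
}
Triangle <- function(b, h, id = "tri") {
  structure(list(id = id, b = b, h = h), class = c("Triangle", "Shape"))
}

# Generic function: dispatches on the class of its argument.
area <- function(shape) UseMethod("area")
area.Circle   <- function(shape) pi * shape$r^2
area.Square   <- function(shape) shape$w^2
area.Triangle <- function(shape) shape$b * shape$h / 2

circle2 <- Circle(r = 2)
square1 <- Square(w = 1)
tri1    <- Triangle(b = 1, h = 0.866)

area(circle2)  # 12.56637
area(square1)  # 1
area(tri1)     # 0.433
```

The same call, `area(x)`, runs different code depending on the class of `x`, which is the behaviour the shape example describes.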

  21. Object-oriented Concept: In relation to R • What does this have to do with R? • R is inherently object-oriented. • R has a set of pre-defined objects that we can interact with • There are many more objects inside the packages in R's online repository, for performing various tasks • We can also write our own R objects that perform analyses suited to our needs • The way we interact with R is very similar to the way we interacted with the program with 3 shapes

  22. Common Data Structures and Operations in R

  23. Primitive data objects • Come with all R installations • Integers: -3L, -2L, 1L, 2L, 3L, … (the L suffix marks an integer; a bare number such as 3 or 1e+10 is stored as a double) • Doubles: 0.789, 3.14, 1.68, 2.9e-6, … • Complex numbers: 3i+7, 2i+3, … • Characters: "a", "zZ", "I hope you are still awake", … • Constants: pi • Logical symbols: TRUE, FALSE • The empty object: NULL • Missing value: NA • Infinity: Inf • Some others
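A quick way to check which primitive type a value has is `typeof()`. The following sketch only uses base R and can be pasted at the prompt:

```r
typeof(3)        # "double": a bare number is stored as a double
typeof(3L)       # "integer": the L suffix requests an integer
typeof(3i + 7)   # "complex"
typeof("a")      # "character"
typeof(TRUE)     # "logical"
typeof(pi)       # "double"
is.null(NULL)    # TRUE
is.na(NA)        # TRUE
1 / 0            # Inf
```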

  24. Primitive operators • arithmetic: +, -, *, / • modulo: %% • matrix multiply: %*% • power: ^ • logical and/or: &, | • relation: <, <=, >, >=, ==, != • assignment: =, <-

  25. Primitive functions • R function calls have the form: functionName(arg1, arg2, …) • square root: sqrt(arg) • exponential: exp(arg) • natural log: log(arg) • length of object: length(arg) • sum of elements in object: sum(obj) • concatenate objects: c(arg1, arg2, …) • round down to nearest integer: floor(arg) • round up to nearest integer: ceiling(arg) • many, many others
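The operators and functions listed so far can be tried directly at the prompt. A short sketch, where the vector `x` is just an illustrative value:

```r
7 %% 3          # 1 (remainder after division)
2^10            # 1024
sqrt(16)        # 4
exp(0)          # 1
log(exp(2))     # 2 (log is the natural logarithm)

x <- c(1.2, 2.5, 3.7)
length(x)       # 3
sum(x)          # 7.4
floor(x)        # 1 2 3
ceiling(x)      # 2 3 4
```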

  26. Examples of valid expressions • 1 • "a" • 'a' • 1 & TRUE • TRUE == FALSE • TRUE != FALSE • 2 > 3 • 1 + 2 + 3 + 4 • 2^3 • a = 4; b = 2^a • log(37)

  27. Examples of invalid expressions • lala # variable not assigned • sqrt(25, 4) # too many arguments • log(1 2) # syntax error: no comma between the arguments • 1 = "a" # cannot assign a value to a numeric constant • TRUE = 3 # cannot assign a value to a logical constant
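To see how R reports such errors without stopping a script, the expressions can be parsed and evaluated inside `tryCatch()`. This is only a sketch; `try_expr` is a hypothetical helper name, not a base R function:

```r
# Evaluate a string of R code; return the value, or the error message as text.
try_expr <- function(code) {
  tryCatch(
    eval(parse(text = code)),
    error = function(e) paste("Error:", conditionMessage(e))
  )
}

try_expr("log(37)")       # a valid expression: returns a number
try_expr("lala")          # error: the variable was never assigned
try_expr("sqrt(25, 4)")   # error: too many arguments
try_expr("log(1 2)")      # error raised while parsing (missing comma)
try_expr("TRUE = 3")      # error: invalid assignment target
```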

  28. Vectors • R vectors are displayed horizontally but carry no row or column orientation of their own; matrix operations generally treat them as column vectors • c(object1, object2, …, objectN) • c stands for concatenate: it joins object1, object2, …, objectN

  29. Examples of vectors: • c(1, 2, 3, 4) # numeric vector, (1, 2, 3, 4) • c(1:4) # same values as above • c(1, "a") # mixed types are coerced to character: ("1", "a") • c(c(1:3), c(7:10)) # (1, 2, 3, 7, 8, 9, 10) • c(TRUE, FALSE) # logical vector

  30. Other ways to form vectors: • seq(from, to, by) • seq(1, 10, 1) # same values as c(1:10) • seq(10, 1, -1) # same values as c(10:1) • rep(object, times) • rep(1, 10) # a vector of ten 1's • rep(c(1, 2), 10) # a vector of 1 2 1 2 …

  31. Accessing vector elements • vector[start index:end index] • v = c(1, 2, 3, 4) # assigns v • c(1, 2, 3, 4)[1] # returns 1 • c(1, 2, 3, 4)[2:4] # returns (2, 3, 4) • c(1, 2, 3, 4)[-1] # removes 1st element, returns (2, 3, 4) • c(1, 2, 3, 4)[c(1, 3)] # returns (1, 3)
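Putting the vector slides together, a short session might look like the sketch below. It uses only base R; the variable name `v` follows the slide.

```r
v <- c(1, 2, 3, 4)
v[1]               # 1
v[2:4]             # 2 3 4
v[-1]              # 2 3 4 (a negative index drops that element)
v[c(1, 3)]         # 1 3

seq(1, 10, 2)      # 1 3 5 7 9
rep(c(1, 2), 3)    # 1 2 1 2 1 2

mixed <- c(1, "a") # mixing types coerces everything to character
mixed              # "1" "a"
```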

  32. Matrices • R matrices are objects internally represented as vectors, with 2 additional attributes: number of rows and number of columns • matrix(c(object1, object2, …, objectN), nrow = I, ncol = J)

  33. Examples of matrices: • matrix(c(1:12), nrow=4, ncol=3) • matrix(c(1:12), 4, 3) # same as above • matrix(c(1:12), nrow=4) # same as above • matrix(c(1:12), ncol=3) # same as above • matrix(c(1:12), 4, 2) # mismatch: 12 values do not fit a 4x2 matrix, so only the first 8 are used (newer versions of R also warn) • Other ways to form matrices: • diag(1, 10) # 10x10 identity matrix • diag("a", 10) # 10x10 matrix with diagonal of "a" • diag(c(1:10), 10) # 10x10 matrix with diagonal entries 1, 2, …, 10

  34. Accessing matrix elements • matrix[(row indices), (column indices)] • A = matrix(c(1:9), 3, 3) # assign matrix to variable name A • A[1, 1] # returns the element in row 1, column 1 • A[1, ] # returns row 1 • A[, 1] # returns column 1 • A[, 1:2] # returns columns 1 and 2 • A[1:5] # returns (1, 2, 3, 4, 5): a single index counts down the columns
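Because a matrix is stored as a vector filled column by column, the indexing rules are easiest to see on a small case. A sketch using the same 3x3 matrix `A`:

```r
A <- matrix(1:9, nrow = 3, ncol = 3)  # columns are (1,2,3), (4,5,6), (7,8,9)

A[1, 1]   # 1
A[1, ]    # 1 4 7 (row 1)
A[, 1]    # 1 2 3 (column 1)
A[2, 3]   # 8
A[1:5]    # 1 2 3 4 5 (single index walks down the columns)
dim(A)    # 3 3
```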

  35. Matrix manipulation • Adding a row: rbind(matrix object, vector object) • Adding a column: cbind(matrix object, vector object) • Examples: • A = matrix(c(1:9), 3, 3) • cbind(A, c(10:12)) # add (10, 11, 12) as last column • cbind(A[,1], c(10:12), A[,2:3]) # add (10, 11, 12) as 2nd column
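A small sketch of both functions, checking the resulting dimensions. The names `B`, `C` and `D` are illustrative:

```r
A <- matrix(1:9, 3, 3)

B <- cbind(A, 10:12)                  # new last column
C <- cbind(A[, 1], 10:12, A[, 2:3])   # (10, 11, 12) inserted as 2nd column
D <- rbind(A, c(0, 0, 0))             # new last row

dim(B)    # 3 4
C[, 2]    # 10 11 12
dim(D)    # 4 3
```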

  36. Matrix operations • Operations on matrices A, B of conforming dimensions • Addition: A + B • Subtraction: A - B • Multiplication: A %*% B • Inverse: solve(A) • Transpose: t(A) • Determinant: det(A)
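As a worked example of these operations, the sketch below uses a small invertible matrix chosen for illustration. Note that `A * B` (without the percent signs) multiplies element by element rather than as matrices.

```r
A <- matrix(c(2, 1, 1, 3), 2, 2)   # columns (2, 1) and (1, 3)
B <- diag(2)                       # 2x2 identity matrix

A + B          # element-wise sum
A %*% B        # matrix product; equals A because B is the identity
t(A)           # transpose (A is symmetric, so t(A) equals A)
det(A)         # 5

A_inv <- solve(A)
A %*% A_inv    # identity matrix, up to rounding error
```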

  37. Lists • Traditionally, vectors and matrices contain simple data objects, mostly primitive data objects. More complex data structures are stored in lists. • Lists contain objects and their assigned names: list(name1=object1, name2=object2, …) • Example of a list: list(foo="hello", bar="world")

  38. Accessing elements in a list: • We can reference objects in lists by their names with the dollar “$” operator: • alist = list(Friday="happy", Monday="urrr") • alist$Friday # returns "happy" • alist$Monday # returns "urrr" • If no object in the list has the name following $, then NULL is returned: • alist$Tuesday # returns NULL • We can also access objects in lists by their index with double brackets [[index]]: • alist[[1]] # returns "happy" • alist[[2]] # returns "urrr"
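The access rules above can be checked with a short sketch. It reuses the `alist` example and adds two base R calls not shown on the slide, `names()` and single-bracket indexing:

```r
alist <- list(Friday = "happy", Monday = "urrr")

alist$Friday             # "happy"
alist[["Monday"]]        # "urrr" (double brackets also accept a name)
alist[[1]]               # "happy"
is.null(alist$Tuesday)   # TRUE: no element with that name
names(alist)             # "Friday" "Monday"

sub <- alist[1]          # single brackets return a list, not the element
class(sub)               # "list"
```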

  39. Operating on R objects • R operations are vector-based • When the left hand side (LHS) and right hand side (RHS) of an operator conform, each element on the LHS is combined with the matching element on the RHS; if one side is shorter, its elements are reused in order • Examples • c(1, 2) + c(3, 4) # returns (4, 6) • c(1, 2) + c(3, 4, 5, 6) # returns (4, 6, 6, 8): (1, 2) is added to (3, 4) and to (5, 6) • 2^c(1, 2, 3, 4) # returns (2, 4, 8, 16) • c(1, 2)^c(1, 2, 3, 4) # returns (1, 4, 1, 16)
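This reuse of the shorter operand is usually called recycling. A sketch of the cases above, plus one where the lengths do not divide evenly (an extra case added here for illustration):

```r
c(1, 2) + c(3, 4)          # 4 6
c(1, 2) + c(3, 4, 5, 6)    # 4 6 6 8
2^c(1, 2, 3, 4)            # 2 4 8 16
c(1, 2)^c(1, 2, 3, 4)      # 1 4 1 16

# Lengths 3 and 2 do not divide evenly: R still recycles, but warns.
uneven <- suppressWarnings(c(1, 2, 3) + c(1, 2))
uneven                     # 2 4 4
```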

  40. Operating on R objects • Most of the built-in R objects can report their dimensions. • Examples: • length(c(1:4)) # returns 4 • length(list(a=1, b=2)) # returns 2 • length(matrix(c(1:12),4,3)) # returns 12 • nrow(matrix(c(1:12),4,3)) # returns 4 • ncol(matrix(c(1:12),4,3)) # returns 3

  41. Control Blocks

  42. Logical expressions • A logical expression is an expression that evaluates to TRUE or FALSE • Logical expressions can be formed with the relation operators • equal: == • not equal: != • less than: < • greater than: > • less than or equal to: <= • greater than or equal to: >= • Examples: • 0 < 1 # evaluates to TRUE • 0 > 1 # evaluates to FALSE • "A" == "a" # evaluates to FALSE

  43. if-else statement • if (logical expression) { … } else { … } • { … } can be a single expression, or a group of expressions and statements, including another if-else statement. • The else part of the statement is optional. • Examples: • if (0 < 1) "true" • if (0 > 1) "should not see anything" • if ("a" == "A") { "equal" } else { "not equal" } • if (FALSE) { "nothing" } else if (TRUE) { "something" }
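A chained if-else is often easier to read when wrapped in a function. In this sketch `compare` is a hypothetical helper written for illustration; `tolower()` is a base R function.

```r
# Classify how two strings relate, using an if / else if / else chain.
compare <- function(a, b) {
  if (a == b) {
    "equal"
  } else if (tolower(a) == tolower(b)) {
    "equal ignoring case"
  } else {
    "not equal"
  }
}

compare("a", "a")   # "equal"
compare("a", "A")   # "equal ignoring case"
compare("a", "b")   # "not equal"
```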

  44. While loop • while (logical expression) { … } • { … } (the “body” of the statement) can be a single expression, or a group of expressions. • The while statement repeats { … } until the logical expression evaluates to FALSE. • Examples: • while (TRUE) { "never ends!!" } • while (FALSE) { "never executed!!" } • x=1; while (x==1) { print(x); x=2 } # prints 1, then assigns 2 to x, which ends the loop
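A while loop is a natural fit when the number of iterations is not known in advance. The sketch below, with illustrative variable names, doubles a value until it reaches 100 and counts the steps:

```r
x <- 1
steps <- 0
while (x < 100) {
  x <- x * 2          # 2, 4, 8, 16, 32, 64, 128
  steps <- steps + 1
}
x       # 128
steps   # 7
```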

  45. For loop • for (index in start:end) { … } • { … } (the “body” of the statement) can be a single expression, or a group of expressions or statements. • The for statement runs { … } once for each value of index, from start to end • Example: • for (i in 1:10) { print(i) }
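The index does not have to be a range of integers: `for` steps through the elements of any vector. A sketch with illustrative variable names:

```r
# Accumulate a running total over 1, 2, ..., 10.
total <- 0
for (i in 1:10) {
  total <- total + i
}
total   # 55

# Loop over a character vector and collect the upper-case versions.
labels <- character(0)
for (name in c("y", "x1", "x2")) {
  labels <- c(labels, toupper(name))
}
labels  # "Y" "X1" "X2"
```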

  46. Read/Write Data

  47. Read/Write Data • Importing and exporting data in R is relatively painless. • We can easily import/export files where: • data points are separated by commas • data points are separated by tabs or spaces • data points are separated by some other delimiter • Read SAS/SPSS/Stata data • The package “foreign” contains functions that allow you to read, among others, SAS/SPSS/Stata data. • type: install.packages("foreign"), select a location to download the package from; the rest is automatic • type: library(foreign) to load the package • type: help(package = foreign) to see a list of functions

  48. Example of reading a file
      # reads a file; data points separated by spaces or tabs
      # assign first column to y, second column to x1, third column to x2
      file = "http://www-personal.umich.edu/~jktc/R/samples/simple.dat"
      read.table(file, col.names=c("y", "x1", "x2"))
      # specify how missing data is coded in the file
      read.table(file, na.strings=".")
      # if the first row of the data file is a header (names for each column)
      file2 = "http://www-personal.umich.edu/~jktc/R/samples/simple.header.dat"
      read.table(file2, header=TRUE)
      # to see more details of the read.table function
      help(read.table)

  49. Example of writing to a file
      data = matrix(c(1:9), 3, 3)
      # write a space-separated file
      # assign first column to y, second column to x1, third column to x2
      write.table(data, file="c:/temp/simple.dat", row.names=FALSE,
                  col.names=c("y", "x1", "x2"), sep=" ")
      # to see more details on the write.table function
      help(write.table)
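To check that writing and reading agree without depending on a fixed path or a remote URL, the two calls can be combined in a round trip through a temporary file. This is a sketch; `path` and `back` are illustrative names, and `tempfile()` and `unlink()` are base R functions not used on the slides.

```r
data <- matrix(1:9, 3, 3)
path <- tempfile(fileext = ".dat")   # temporary file instead of c:/temp

# Write a space-separated file with a header row of column names.
write.table(data, file = path, row.names = FALSE,
            col.names = c("y", "x1", "x2"), sep = " ")

# Read it back; header = TRUE takes the column names from the first row.
back <- read.table(path, header = TRUE)
names(back)   # "y" "x1" "x2"
back$x1       # 4 5 6

unlink(path)  # remove the temporary file
```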

  50. Graphics Samples
