R: A Personalized Introduction
Debapriyo Majumdar
Data Mining – Fall 2014
Indian Statistical Institute Kolkata
August 18, 2014
About "R"
• A suite of software tools for
  • Data manipulation
  • Calculations
  • Graphical display
• Largely based on the programming language S
• Packages
  • About 25 standard and recommended packages are supplied with R
  • Many more are available for download at: http://CRAN.R-project.org
• Free software (GPL); contributed packages may also use other licenses, such as BSD or MIT
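A minimal sketch of working with packages from the console, assuming a standard R installation. `search()`, `library()` and `installed.packages()` are base R functions; the package name in the commented-out `install.packages()` line is only an illustrative example and needs network access.

```r
# Packages attached to the current session (base R attaches several by default).
print(search())

# Attach a package that ships with R; "stats" is one of the standard packages.
library(stats)

# Names of all installed packages, one per row of installed.packages().
pkgs <- rownames(installed.packages())
print(head(pkgs))

# Contributed packages are downloaded from CRAN first, e.g. (needs network access;
# "data.table" is just an illustrative package name):
# install.packages("data.table")
```

`library()` only attaches a package that is already installed; `install.packages()` is the step that fetches one from CRAN.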
Basic
• Arithmetic
> 2+2
[1] 4
• Assign variables
> x <- 2
> y <- 5
> z <- 2 * x + 3 * y
> z
[1] 19
• The created objects are now stored in the workspace. List them with ls()
> ls()
[1] "x" "y" "z"
• We can also remove them with rm()
> rm(x)
> ls()
[1] "y" "z"
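The same session can be run as a script. This sketch adds `exists()`, a base R function not shown on the slide, to check programmatically that `rm()` really removed `x` from the workspace.

```r
x <- 2
y <- 5
z <- 2 * x + 3 * y
print(z)        # [1] 19

print(ls())     # "x" "y" "z", plus anything defined earlier in the session

rm(x)
# inherits = FALSE restricts the lookup to the workspace itself.
x_gone <- !exists("x", inherits = FALSE)
print(x_gone)   # [1] TRUE

# rm(list = ls()) would remove every object that ls() lists.
```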
Vectors
• Creating a vector with c()
> x <- c(2,5,9)
> y <- c(3,1,-1)
> x + y
[1] 5 6 8
• x * y multiplies element-wise
> x * y
[1] 6 5 -9
• x + 2 adds 2 to every element of x
> x + 2
[1] 4 7 11
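`x + 2` works because R recycles the shorter operand until it matches the longer one; the length-1 vector `2` is simply repeated. A short sketch of that rule, plus the inner product built from the element-wise product (the second example vectors are made up for illustration):

```r
x <- c(2, 5, 9)
y <- c(3, 1, -1)

print(x * y)        # element-wise: 6 5 -9
print(sum(x * y))   # inner (dot) product: 6 + 5 - 9 = 2

# Recycling with a longer "short" operand: c(10, 20) is reused as 10 20 10 20.
print(c(1, 2, 3, 4) + c(10, 20))   # 11 22 13 24

# If the longer length is not a multiple of the shorter one,
# R still recycles but issues a warning.
```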
Useful functions related to vectors
• Sequence of integers from a to b
> seq(2,9)
[1] 2 3 4 5 6 7 8 9
• The repeat function rep()
> rep(1,3)
[1] 1 1 1
> rep(1:3,3)
[1] 1 2 3 1 2 3 1 2 3
• Try the help() function or the ? shortcut
> help(rep)
> ?rep
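A few more variants of `seq()` and `rep()` that `?seq` and `?rep` document: the `by`, `length.out`, `times` and `each` arguments, and the `a:b` shorthand. The values are chosen only for illustration.

```r
print(2:9)                          # colon operator: same integers as seq(2, 9)
print(seq(2, 9, by = 3))            # step of 3: 2 5 8
print(seq(0, 1, length.out = 5))    # 5 evenly spaced values: 0 0.25 0.5 0.75 1
print(rep(1:3, each = 2))           # repeat each element: 1 1 2 2 3 3
print(rep(c("a", "b"), times = 2))  # works on any vector: "a" "b" "a" "b"
```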
Data and Statistics – Basics
• A lot of things out of the box
> x <- c(2,3,1,5,7,2,5,8,3,2,0,3,2,6,7,3,1,3,5,8,4)
> summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   0.00    2.00    3.00    3.81    5.00    8.00
• Specifying elements or subsets (indexing starts at 1, not 0)
> x[1]
[1] 2
> x[3:6]
[1] 1 5 7 2
• Excluding elements with the minus sign
> x[-(2:4)]
[1] 2 7 2 5 8 3 2 0 3 2 6 7 3 1 3 5 8 4
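Besides positions, a vector can be indexed with a condition. The sketch below, using the same `x`, shows logical indexing, `which()`, and the individual statistics that `summary()` reports; `quantile()` with its default method gives the same quartiles as the summary above.

```r
x <- c(2,3,1,5,7,2,5,8,3,2,0,3,2,6,7,3,1,3,5,8,4)

# A condition gives TRUE/FALSE per element; x[...] keeps the TRUE positions.
big <- x[x > 5]
print(big)               # 7 8 6 7 8

# which() returns the positions rather than the values.
print(which(x > 5))      # 5 8 14 15 20

# Individual statistics behind summary(x).
print(c(n = length(x), mean = mean(x), median = median(x)))
print(quantile(x))       # 0%, 25%, 50%, 75%, 100%: 0 2 3 5 8
print(table(x))          # how often each value occurs
```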
Matrices
• Bind columns (cbind) or rows (rbind)
> x <- c(3,5,2); y <- c(8,2,1)
> z <- cbind(x,y)
> z
     x y
[1,] 3 8
[2,] 5 2
[3,] 2 1
• Or specify the entries and the number of rows (entries are filled column by column)
> A <- matrix(c(3,5,2,8,2,1),nrow=3)
> B <- matrix(c(3,5,2,8,2,1),nrow=2)
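Because `matrix()` fills column by column, the same six numbers give a 3 x 2 matrix `A` and a 2 x 3 matrix `B`. This sketch checks the shapes, shows `byrow = TRUE` (which fills row by row instead and here yields the transpose of `A`), and indexes entries as `[row, column]`.

```r
v <- c(3, 5, 2, 8, 2, 1)

A <- matrix(v, nrow = 3)      # columns: (3, 5, 2) and (8, 2, 1)
B <- matrix(v, nrow = 2)      # columns: (3, 5), (2, 8), (2, 1)
print(dim(A))                 # 3 2
print(dim(B))                 # 2 3

# byrow = TRUE fills row by row; with nrow = 2 this equals t(A).
Arow <- matrix(v, nrow = 2, byrow = TRUE)
print(all(Arow == t(A)))      # TRUE

# [row, column] indexing; an empty index selects a whole row or column.
print(A[2, 1])                # 5
print(A[, 2])                 # second column: 8 2 1
print(B[1, ])                 # first row: 3 2 2
```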
Matrix operations
• Addition and multiplication by a scalar work as usual
> A + 2*A
     [,1] [,2]
[1,]    9   24
[2,]   15    6
[3,]    6    3
• Multiplication with * is element-wise (e.g. A * A), not matrix multiplication
• Matrix multiplication: %*%
> A %*% B
     [,1] [,2] [,3]
[1,]   49   70   14
[2,]   25   26   12
[3,]   11   12    5
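The difference between `*` and `%*%` is easiest to see side by side. In this sketch, entry `[i, j]` of `A %*% B` is checked against the dot product of row `i` of `A` and column `j` of `B`, and a non-conformable product is caught with `tryCatch()` to show that R refuses it.

```r
A <- matrix(c(3, 5, 2, 8, 2, 1), nrow = 3)   # 3 x 2
B <- matrix(c(3, 5, 2, 8, 2, 1), nrow = 2)   # 2 x 3

E <- A * A       # element-wise: both operands have the same shape
P <- A %*% B     # matrix product: (3 x 2) %*% (2 x 3) is 3 x 3
print(E)
print(P)

# Entry [1, 1] of P is row 1 of A dotted with column 1 of B: 3*3 + 8*5 = 49.
print(sum(A[1, ] * B[, 1]))

# A %*% A multiplies (3 x 2) by (3 x 2), which is not conformable.
res <- tryCatch(A %*% A, error = function(e) conditionMessage(e))
print(res)       # an error message about non-conformable arguments
```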
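Inverse and Covariance of a Matrix
• Compute the inverse of a square matrix X, if it exists (X must be non-singular)
> solve(X)
• Covariance matrix: for a data matrix X (each column a variable, each row an observation), both functions compute the sample covariance matrix, with denominator (number of rows − 1)
> var(X)
> cov(X)
• Covariance matrix (recall): if $X_1, \dots, X_n$ are random variables, each with finite variance, then $\Sigma$ is the $n \times n$ covariance matrix with entries
$\Sigma_{ij} = \operatorname{cov}(X_i, X_j) = \mathrm{E}\left[(X_i - \mu_i)(X_j - \mu_j)\right]$, where $\mu_i = \mathrm{E}[X_i]$
• Also written var(X): the variance of the random vector $X = (X_1, \dots, X_n)^\top$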
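A sketch of both functions on small made-up inputs: the 2 x 2 matrix `M` and the data matrix `D` are arbitrary illustrations. It checks that `solve(M) %*% M` is the identity, shows `solve(M, b)` for a linear system, and rebuilds `cov(D)` from the definition above using centred columns and the n − 1 denominator.

```r
M <- matrix(c(2, 1, 1, 3), nrow = 2)   # any non-singular square matrix would do
Minv <- solve(M)
print(Minv %*% M)                      # the 2 x 2 identity, up to rounding

# solve(M, b) solves M %*% x = b directly, without forming the inverse.
b <- c(1, 2)
xs <- solve(M, b)
print(xs)                              # 0.2 0.6

# Rows are observations, columns are variables (made-up data).
D <- cbind(c(2, 4, 6, 8), c(1, 3, 2, 5))
S <- cov(D)
print(S)

# Same matrix from the definition: centre each column, then divide by n - 1.
Dc <- sweep(D, 2, colMeans(D))         # subtract each column's mean
S_manual <- t(Dc) %*% Dc / (nrow(D) - 1)
print(all.equal(S, S_manual, check.attributes = FALSE))   # TRUE
```

For real work, `solve(M, b)` is usually preferred over `solve(M) %*% b`, since it avoids computing the full inverse.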
Writing a function
• A new function can be defined
> z <- function(x,y) 3*x + 4*y
> z(2,3)
[1] 18
• A function with many lines
> z <- function(x,y) { s <- 3*x + 4*y; 5 * s }
• The value of the last expression (here 5 * s) is the output
• The function can also be written in a text file (such as prog.R) and loaded with source()
> source("/Users/deb/…/R/xTest.R")
• We can also define a new binary operator (its name must start and end with %)
> "%LL%" <- function(x,y) { 3*x + 4*y }
> 5 %LL% 3
[1] 27
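A sketch of a few more function features, assuming nothing beyond base R: a default argument value, arguments matched by name, and an early `return()`. The names `w` and `safe_ratio` and the default `y = 1` are hypothetical, added only for illustration.

```r
# The one-line function from the slide.
z <- function(x, y) 3 * x + 4 * y

# Multi-line version; the last expression is returned.
# The default y = 1 is a hypothetical addition.
w <- function(x, y = 1) {
  s <- 3 * x + 4 * y
  5 * s
}

# return() exits early, e.g. to handle a special case (hypothetical helper).
safe_ratio <- function(a, b) {
  if (b == 0) return(NA)
  a / b
}

"%LL%" <- function(x, y) 3 * x + 4 * y

print(z(2, 3))            # 18
print(w(2, 3))            # 5 * 18 = 90
print(w(2))               # y defaults to 1: 5 * 10 = 50
print(w(y = 3, x = 2))    # matched by name: 90
print(5 %LL% 3)           # 27
print(safe_ratio(1, 0))   # NA
```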
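Data
• Read an entire data frame from a file with read.table()
• The first line of the file should have a name for each variable in the data frame
• Each additional line of the file has as its first item a row label, followed by the values for each variable
   Age Income.K Owns.House
01  25        8         No
02  33        5         No
03  30      130        Yes
04  45       50        Yes
05  65        5         No
06  75        7        Yes
> H <- read.table("filename")
• Because the first line has one field fewer than the other lines, read.table() uses it as the header and takes the first column as row labels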
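A runnable version of the slide, assuming the sample table above is the file's content: it writes those six rows to a temporary file (standing in for the slide's "filename") and reads them back. `header = TRUE` is passed explicitly for clarity, although read.table() would detect it from this layout.

```r
# The slide's sample table, written to a temporary file.
lines <- c(
  "   Age Income.K Owns.House",
  "01  25        8         No",
  "02  33        5         No",
  "03  30      130        Yes",
  "04  45       50        Yes",
  "05  65        5         No",
  "06  75        7        Yes"
)
path <- tempfile(fileext = ".txt")
writeLines(lines, path)

# Header has 3 fields, data rows have 4: the first column becomes row labels.
H <- read.table(path, header = TRUE)
str(H)
print(dim(H))        # 6 3
print(rownames(H))
```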
Using data
• plot() chooses a suitable kind of plot for its argument; for a data frame with two columns (here Age and Income.K) it draws a scatter plot
> plot(H[1:2])
• We want to label points based on an attribute (here Owns.House)
• First, select a subset of the data: the rows where Owns.House is Yes
> H[which(H$Owns.House=='Yes'),]
   Age Income.K Owns.House
03  30      130        Yes
04  45       50        Yes
06  75        7        Yes
07  28      200        Yes
08  35       90        Yes
10  55      102        Yes
…    …        …          …
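The same selection can be written in other ways. This sketch uses only the six sample rows from the data slide, rebuilt with `data.frame()` so it runs on its own. It shows a plain logical row index, `subset()`, and a combined condition. One difference worth knowing: `which()` also drops rows where the condition is `NA`, while a plain logical index keeps them as rows of `NA`s.

```r
# First six rows of the sample data, rebuilt so the sketch is self-contained.
H <- data.frame(
  Age        = c(25, 33, 30, 45, 65, 75),
  Income.K   = c(8, 5, 130, 50, 5, 7),
  Owns.House = c("No", "No", "Yes", "Yes", "No", "Yes"),
  row.names  = c("01", "02", "03", "04", "05", "06")
)

# Logical row index; the empty column index keeps all columns.
HYes <- H[H$Owns.House == "Yes", ]

# subset() expresses the same selection, naming columns directly.
HYes2 <- subset(H, Owns.House == "Yes")

# Conditions combine with & (and) and | (or).
rich_owners <- H[H$Owns.House == "Yes" & H$Income.K > 40, ]
print(HYes)
print(rich_owners)   # rows 03 and 04
```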
Using data
• Plot one subset in blue and the other in red
> HYes <- H[which(H$Owns.House=='Yes'),]
> HNo <- H[which(H$Owns.House=='No'),]
> plot(HYes[1:2], col='blue')
> points(HNo[1:2], col='red')
[Figure: the resulting scatter plot, with a new observation shown in black]
• Hands-on in class
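A self-contained sketch of the two-colour plot, assuming the six sample rows from the data slide. The new observation (Age 40, Income.K 60) is a made-up point, and the PDF output file is only there so the code runs without a screen. One design choice differs from the slide: the axis ranges come from the full data, because plot() otherwise sizes the axes from HYes alone and points() can fall outside them.

```r
H <- data.frame(
  Age        = c(25, 33, 30, 45, 65, 75),
  Income.K   = c(8, 5, 130, 50, 5, 7),
  Owns.House = c("No", "No", "Yes", "Yes", "No", "Yes")
)
HYes <- H[H$Owns.House == "Yes", ]
HNo  <- H[H$Owns.House == "No", ]

out <- tempfile(fileext = ".pdf")
pdf(out)   # draw into a file instead of a screen window

# Axis ranges from the full data so the red points are not clipped.
plot(HYes$Age, HYes$Income.K, col = "blue",
     xlim = range(H$Age), ylim = range(H$Income.K),
     xlab = "Age", ylab = "Income.K")
points(HNo$Age, HNo$Income.K, col = "red")

# Hypothetical new observation, drawn as a filled black point.
points(40, 60, col = "black", pch = 19)

legend("topright", legend = c("Owns house", "Does not own", "New observation"),
       col = c("blue", "red", "black"), pch = c(1, 1, 19))
invisible(dev.off())
```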
References
• The R manual: http://cran.r-project.org/doc/manuals/r-release/R-intro.html
• A self-learn tutorial: https://www.nceas.ucsb.edu/files/scicomp/Dloads/RProgramming/BestFirstRTutorial.pdf