STAT 534: Statistical Computing Course Overview

STAT 534: Statistical Computing Hari Narayanan harin@uw.edu

Course objectives • Write programs in R and C tailored to specifics of statistics problems you want to solve • Familiarize yourself with: • optimization techniques • Markov Chain Monte Carlo (mcmc)

Logistics • Class: • Tuesdays and Thursdays 12:00pm – 1:20pm • Office hours: • Thursday 2:30pm – 4pm (Padelford B-301) or by appt • Textbooks: • Robert & Casella : Introducing Monte Carlo Methods with R • Kernighan & Ritchie : C Programming Language • Evaluation: • 4 assignments • 2 quizzes • Final project

x + y addition • x - y subtraction • x * y multiplication • x / y division • x ˆ y exponentiation • x %% y modular arithmetic • x %/% y integer division • x == y test for equality • x <= y test for less-than-or-equal • x >= y test for greater-than-or-equal • x && y boolean and for scalars • x || y boolean or for scalars • x & y boolean and for vectors (vector x,y,result) • x | y boolean or for vectors (vector x,y,result) • !x boolean negation

Quick Quiz- write your answers in your notebook • What would be the response if you type the following on the R console? • 1. q; ls; assign(a,2+3); “a”<- 2+4, a <- 2+4, v=1,2;

Answer to 1. • q , ls will yield descriptions of the corresponding functions whereas q() will yield a prompt whether the session should be stored before quitting, ls( ) will yield the list of objects stored in the workspace. • assign(a,2+3) yields an error message while assign(“a”,2+3) assigns to object `a’ the number 2+3. “a”<- 2+4 will leave “a” as the character “a” while a <- 2+4 correctly assigns to `a’ the number 2+4. • v=1,2 yields an error message while v= c(1,2) assigns to `v’ the vector 1,2.

Questions 2,3,4 • 2. v=c(1,2); u=c(-5,v,7);u • 3. a=c(1,2);b=c(1,2,3);r=3*a+b-1;r • 4. sort(c(4,3,7)); sort(c(3,4,7), decreasing =TRUE)

Answers 2,3,4 • 2. [1] -5 1 2 7 • 3. There will be a warning message that the longer object length is not a multiple of the shorter object length but this will be followed by the value of `r’ obtained by recycling `a’ to match the length of `b’ i.e. [1] 3 7 5 • 4. [1] 3 4 7 [1] 7 4 3

Questions 5,6 • 5. r<- c(TRUE,FALSE); r; r*2 • 6. a<-10; 5:a+1

Answers 5,6 • 5. [1] TRUE FALSE [1] 2 0 (while computing r*2 , the vector `r’ is coerced to a numerical vector with TRUE replaced with `1’ and FALSE replaced with `0’.) • 6. [1] 6 7 8 9 10 11 (`:’ takes precedence over `+’ so we have (5,6,7,8,9,10)+(1,1,1,1,1,1) , otherwise answer would have been 5,6,7,8,9,10,11)

Questions 7,8 • 7 . seq(2,25,by=7) • 8. rep(c(1,2,3),c(4,2,1))

Answers 7,8 • 7 . [1] 2 9 16 23 (cannot cross 23, step size 7) • 8 . [1] 1 1 1 1 2 2 3 (1,2,3 repeated respectively 4,2,1 times.)

Question 9,10 • 9. a<-seq(3,46,5); a; a[-1:-3]; a[-1]-a[-length(a)] • 10. a<-seq(3,46,5);a; dim(a)=c(3,3);a

Answers 9,10 • 9. [1] 3 8 13 18 23 28 33 38 43 [1] 18 23 28 33 38 43 (first second and third elements of `a’ removed) [1] 5 5 5 5 5 5 5 5 ( (8,13,18,23,28,33,38,43) - (3 8 13 18 23 28 33 38 )) • 10. [1] 3 8 13 18 23 28 33 38 43 [,1] [,2] [,3] [1,] 3 18 33 2,] 8 23 38 (columnwise filling of the 3x3 matrix) [3,] 13 28 43

Question 11 • 11. matrix(c(1,2,3,4,5,6), ncol=3) ; matrix(c(1,2,3,4,5,6), nrow=2) ; matrix(c(1,2,3,4,5,6), nrow=2, byrow=TRUE)

Answers 11 • 11. [,1] [,2] [,3] [1,] 1 3 5 [2,] 2 4 6 ( matrix filled column by column ) [,1] [,2] [,3] [1,] 1 3 5 [2,] 2 4 6 ( matrix filled column by column ) [,1] [,2] [,3] [1,] 1 2 3 [2,] 4 5 6 (now filled row by row)

Question 12 • 12. m<- matrix(c(1:24),ncol=4); m; m[1,2]; m[1:4,2:4]; m[c(TRUE,FALSE,TRUE,FALSE,FALSE,FALSE),TRUE];

Answers 12 • 12. [,1] [,2] [,3] [,4] [1,] 1 7 13 19 [2,] 2 8 14 20 [3,] 3 9 15 21 [4,] 4 10 16 22 [5,] 5 11 17 23 [6,] 6 12 18 24 [1] 7 [,1] [,2] [,3] [1,] 7 13 19 [2,] 8 14 20 [3,] 9 15 21 [4,] 10 16 22 (rows 1,2,3,4 and cols 2,3,4) [,1] [,2] [,3] [,4] [1,] 1 7 13 19 [2,] 3 9 15 21 (rows 1, 3 and all columns)

Inputing matrices: binding cols or rows > cbind(c(1:5),c(5:9),c(9:13)) [,1] [,2] [,3] [1,] 1 5 9 [2,] 2 6 10 [3,] 3 7 11 [4,] 4 8 12 [5,] 5 9 13 >#or bind rows together as below > rbind(c(1:5),c(5:9),c(9:13)) [,1] [,2] [,3] [,4] [,5] [1,] 1 2 3 4 5 [2,] 5 6 7 8 9 [3,] 9 10 11 12 13

Using names for rows and columns • > x [,1] [,2] [,3] [,4] [1,] 1 4 7 10 [2,] 2 5 8 11 [3,] 3 6 9 12 > rownames(x)<-LETTERS[1:3] > colnames(x)<-c("Red","Blue","Green","Yellow") > x Red Blue Green Yellow A 1 4 7 10 B 2 5 8 11 C 3 6 9 12

Transpose • > cbind(c(1:2),c(5:6),c(9:10)) [,1] [,2] [,3] [1,] 1 5 9 [2,] 2 6 10 • > t(cbind(c(1:2),c(5:6),c(9:10))) [,1] [,2] [1,] 1 2 [2,] 5 6 [3,] 9 10

*Altering elements of a matrix >z [,1] [,2] [,3] [,4] [1,] 1 4 7 10 [2,] 2 5 8 11 [3,] 3 6 9 12 > # Suppose we wish to alter z[1,2],z[1,4],z[3,4] respectively to -100,-200,-300

***** (skip first time) > #First build the corresponding index array ># Notice that if you write (1,2),(1,4),(3,4) one below the other and read the result column by column you get the sequence (1,1,3,2,4,4) > I <- array(c(1,1,3,2,4,4), dim=c(3,2)) > i [,1] [,2] [1,] 1 2 [2,] 1 4 [3,] 3 4 > # Now change z[i] > z[i]<-c(-100,-200,-300) > z [,1] [,2] [,3] [,4] [1,] 1 -100 7 -200 [2,] 2 5 8 11 [3,] 3 6 9 -300

Filtering on Matrices • > z [,1] [,2] [,3] [,4] [1,] 1 4 7 10 [2,] 2 5 8 11 [3,] 3 6 9 12 • z[z[,2] >= 5,] [,1] [,2] [,3] [,4] [1,] 2 5 8 11 [2,] 3 6 9 12 (all rows where second column entry is greater than or equal to 5)

*Apply() • f<- function(x) {{y=x^3}} > apply(z,2,f) [,1] [,2] [,3] [,4] [1,] 1 64 343 1000 [2,] 8 125 512 1331 [3,] 27 216 729 1728 (the 2 in apply(z,2,f) refers to column-it means z is processed column by column ; note the result is always displayed column by column) > apply(z,1,f) [,1] [,2] [,3] [1,] 1 8 27 [2,] 64 125 216 [3,] 343 512 729 [4,] 1000 1331 1728 (confusing! Note that after applying the function on z row by row the result is displayed column by column)

Exercise • b<-c(5:12); dim(b)= c(4,2) • # What is b? • d<-t(b); • # What is d? • dim(d)=c(1,8); • # What is d? > x Red Blue Green Yellow A 1 4 7 10 B 2 5 8 11 C 3 6 9 12 • #what is t(x)?

Answers • > b [,1] [,2] [1,] 5 9 [2,] 6 10 [3,] 7 11 [4,] 8 12 • > d [,1] [,2] [,3] [,4] [1,] 5 6 7 8 [2,] 9 10 11 12 • > d [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [1,] 5 9 6 10 7 11 8 12 • > t(x) A B C Red 1 2 3 Blue 4 5 6 Green 7 8 9 Yellow 10 11 12

*Exercise • > z [,1] [,2] [,3] [,4] [1,] 1 4 7 10 [2,] 2 5 8 11 [3,] 3 6 9 12 • > y<- z[,z[1,]>=3];y • > f<- function(x){x*x} • > apply(y,1,f)

*Answer • > y<-z[,z[1,]>=3];y [,1] [,2] [,3] [1,] 4 7 10 [2,] 5 8 11 [3,] 6 9 12 • > apply(y,1,f) [,1] [,2] [,3] [1,] 16 25 36 [2,] 49 64 81 [3,] 100 121 144

*sapply() for vectors • > f<- function(x) {return(c(x,x^2))} • > z<-c(-2:1) • > f(z) [1] -2 -1 0 1 4 1 0 1 • (just displays f(z) element by element as a vector) > sapply(z,f) [,1] [,2] [,3] [,4] [1,] -2 -1 0 1 [2,] 4 1 0 1 (displays z and f(z))

Lists • Allows us to group objects of different sizes, dimensions and different modes. > x<-10; y<-c(1:5); z<-array(c(1:12),dim=c(3,4)) > lst<-list(x,y,z); lst [[1]] [1] 10 [[2]] [1] 1 2 3 4 5 [[3]] [,1] [,2] [,3] [,4] [1,] 1 4 7 10 [2,] 2 5 8 11 [3,] 3 6 9 12

We could have made it more informative using `names’ > lst <- list("NUM"=x,"VECTOR"=y,"MATRX"=z) > names(lst) [1] "NUM" "VECTOR" "MATRX" #Names are also called `tags’, the object with the tag is its `value’ > lst $NUM [1] 10 $VECTOR [1] 1 2 3 4 5 $MATRX [,1] [,2] [,3] [,4] [1,] 1 4 7 10 [2,] 2 5 8 11 [3,] 3 6 9 12

Operations on lists • > lst_red<- lst[-3] (deleted the third object in the list) > lst_red $NUM [1] 10 $VECTOR [1] 1 2 3 4 5 • > unlist(lst_red) NUM VECTOR1 VECTOR2 VECTOR3 VECTOR4 VECTOR5 10 1 2 3 4 5 (`looks inside’ each object of the argument list i.e. lst_red)

Accessing/indexing list elements • > lst[[2]] [1] 1 2 3 4 5 # (gives the second object of the list) • > lst[c(1,3)] $NUM [1] 10 $MATRX [,1] [,2] [,3] [,4] [1,] 1 4 7 10 [2,] 2 5 8 11 [3,] 3 6 9 12 (this is a sublist with first and third objects)

Adding elements to list • > lstaug<-lst • > lstaug$d<- c("RED","BLUE") #adding a new row • > lstaug $NUM [1] 10 $VECTOR [1] 1 2 3 4 5 $MATRX [,1] [,2] [,3] [,4] [1,] 1 4 7 10 [2,] 2 5 8 11 [3,] 3 6 9 12 $d [1] "RED" "BLUE"

Deleting elements from list • > lstaug $NUM [1] 10 $VECTOR [1] 1 2 3 4 5 $MATRX [,1] [,2] [,3] [,4] [1,] 1 4 7 10 [2,] 2 5 8 11 [3,] 3 6 9 12 $COLOR [1] "RED" "BLUE“ (lstaug contains four objects above; we will delete three of them) • lstaug$MATRX<- NULL;lstaug$COLOR<-NULL;lstaug$NUM<-NULL; • lstaug$VECTOR [1] 1 2 3 4 5

Exercise • > x<-c("RED","BLUE") • > y<-array(c(1:4),dim=c(2,2)) • # Prepare a list containing x,y with x named “Character” and y named “vector”. Next add an element z to the list and Call it “Numeric”

Answer • > lst<-list("CHARACTER"=x,"MATRIX"=y) • > lst $CHARACTER [1] "RED" "BLUE" • $MATRIX [,1] [,2] [1,] 1 3 [2,] 2 4 • > lst$NUMERIC=10 • > lst $CHARACTER [1] "RED" "BLUE" $MATRIX [,1] [,2] [1,] 1 3 [2,] 2 4 $NUMERIC [1] 10

Exercise contd. • #From the list delete x • Ans • > lst<-lst[c(2,3)]; lst $MATRIX [,1] [,2] [1,] 1 3 [2,] 2 4 $NUMERIC [1] 10

Dataframes • Intuitively a data frame is like a matrix, with a rows-and-columns structure. • List of equal-length vectors, each column is one element of the list. • The vectors might have different modes- numeric,logical, character etc.

An Example • The scores of batsmen A,B,C,D,E at five venues CHE,MUM,HYD,DEL,KOL in 20 matches Are available in the dataframeipl_scores. • We will illustrate dataframe operations using this example

A quick look through head() • > head(ipl_scores) venue A B C D E 1 CHE 19 31 31 37 31 2 MUM 20 31 40 31 31 3 HYD 24 30 31 36 33 4 DEL 24 25 47 30 26 5 KOL 24 32 32 28 29 6 CHE 7 29 48 41 37 (the first six rows with the header)

A quick look through str() • #structure of ipl_scores • > str(ipl_scores) 'data.frame': 20 obs. of 6 variables: $ venue: Factor w/ 5 levels "CHE","DEL","HYD",..: 1 5 3 2 4 1 5 3 2 4 ... $ A : num 19 20 24 24 24 7 21 18 13 12 ... $ B : num 31 31 30 25 32 29 30 23 41 22 ... $ C : num 31 40 31 47 32 48 34 39 44 43 ... $ D : num 37 31 36 30 28 41 28 32 39 34 ... $ E : num 31 31 33 26 29 37 28 31 24 32 ...

# How is it actually? > ipl_scores venue A B C D E 1 CHE 19 31 31 37 31 2 MUM 20 31 40 31 31 3 HYD 24 30 31 36 33 4 DEL 24 25 47 30 26 5 KOL 24 32 32 28 29 6 CHE 7 29 48 41 37 7 MUM 21 30 34 28 28 8 HYD 18 23 39 32 31 9 DEL 13 41 44 39 24 10 KOL 12 22 43 34 32 11 CHE 15 19 39 34 32 12 MUM 31 21 47 37 23 13 HYD 20 25 43 39 21 14 DEL 21 32 36 35 39 15 KOL 14 25 39 34 29 16 CHE 18 41 38 36 33 17 MUM 19 15 43 34 31 18 HYD 21 27 34 35 26 19 DEL 18 32 45 43 35 20 KOL 14 29 35 37 26

Ipl_scores is not authentic • It was built like Ipl_scores2 is built below • 1. Built five vectors a,b,d,g,h> a<- round(5*rnorm(20)+20)> b<- round(5*rnorm(20)+30)> c<- round(10*rnorm(20)+40)> d<- round(10*rnorm(20)+40)> g<- round(10*rnorm(20)+35)> h<- round(5*rnorm(20)+25) • 2. Built a character vector • > v<-rep(c("DEL","MUM","CHE","HYD"),5)#ipl_scores has however five venues)

Ipl_scores2 • > a [1] 25 23 24 23 21 16 20 17 15 9 13 11 27 20 24 22 21 25 29 11> v [1] "DEL" "MUM" "CHE" "HYD" "DEL" "MUM" "CHE" "HYD" "DEL" "MUM" "CHE" "HYD"[13] "DEL" "MUM" "CHE" "HYD" "DEL" "MUM" "CHE" "HYD"

Ipl_scores2 • >ipl_scores2<-data.frame(cbind("venue"=v,"A"=a,"B"=b,"C"=d,"D"=g,"E"=h)) • > head(ipl_scores2) venue A B C D E1 DEL 25 22 50 32 232 MUM 23 30 38 33 233 CHE 24 29 41 32 194 HYD 23 33 49 33 275 DEL 21 36 51 45 296 MUM 16 30 53 48 25

Getting single columns and rows > ipl_scores$A [1] 19 20 24 24 24 7 21 18 13 12 15 31 20 21 14 18 19 21 18 14 (Column A displayed as a row) > ipl_scores[1,] A B C D E 1 19 31 31 37 31 (Row 1 displayed) • names(ipl_scores) [1] "venue" "A" "B" "C" "D" "E"

Computations on rows and columns • > colMeans(ipl_scores) A B C D E 18.65 28.00 39.40 35.00 29.85 #Let us now compute row means > rowMeans(ipl_scores) Error in rowMeans(ipl_scores) : 'x' must be numeric #Oops! That went wrong.The `venue’ column is not numeric. • > rowMeans(ipl_scores[,-1]) [1] 29.8 30.6 30.8 30.4 29.0 32.4 28.2 28.6 32.2 28.6 27.8 31.8 29.6 32.6 28.2 [16] 33.2 28.4 28.6 34.6 28.2

Making an additional column of rowmeans and an additional row > rowMeans(ipl_scores)->r_M Now we will make r_M an additional column • ipl_scoresaug<-cbind(ipl_scores,M=r_M) • # Add a row of incomplete information • >g<-data.frame(venue="MOH",A=30,B=NA,C=20,D=5,E=37,M=NA) • >ipl_scoresaug <- rbind(ipl_scoresaug,g) • >ipl_scoresaug

STAT 534: Statistical Computing Course Overview