
Introduction to Contributed Packages in R

Department of Statistical Sciences and Operations Research, Computation Seminar Series. Speaker: Edward Boone. Email: elboone@vcu.edu



  1. Introduction to Contributed Packages in R Department of Statistical Sciences and Operations Research Computation Seminar Series Speaker: Edward Boone Email: elboone@vcu.edu

  2. What is R? • R is a free, open-source statistical programming language based on the S language developed at Bell Labs. • The language is very powerful for writing programs. • Many statistical functions are already built in. • Contributed packages extend R's functionality to cutting-edge research methods. • Because R is a programming language, you complete tasks by writing code rather than by pointing and clicking.

  3. Getting Started • Where to get R? • Go to www.r-project.org • Downloads: CRAN • Set your mirror: any mirror in the USA is fine. • Select Windows 95 or later. • Select base. • Select R-2.4.1-win32.exe • The other downloads are for developers who want to modify the source code.

  4. Getting Started • The R GUI?

  5. Getting Started • Opening a script. • This gives you a script window.

  6. Getting Started • Submitting a program: • Use the Submit Selection button. • Or right-click and choose Run selection.

  7. Getting Started • Basic assignment and operations. • Arithmetic operations: • +, -, *, /, ^ are the standard arithmetic operators. • Matrix arithmetic: • * is element-wise multiplication. • %*% is matrix multiplication. • Assignment: • To assign a value to a variable, use <-, as in x <- 5.
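A small sketch of the operators on this slide. The numbers and matrix entries are arbitrary, chosen only so that the difference between * and %*% is visible.

```r
# Assignment uses <-
x <- 7
y <- 2

# Standard arithmetic operators: +, -, *, /, ^
c(x + y, x - y, x * y, x / y, x^y)   # 9 5 14 3.5 49

# Two 2 x 2 matrices, filled column by column
A <- matrix(c(1, 2, 3, 4), nrow = 2)
B <- matrix(c(5, 6, 7, 8), nrow = 2)

elementwise <- A * B    # multiplies matching entries
matprod     <- A %*% B  # row-by-column matrix product
elementwise
matprod
```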

  8. Getting Started • How to use help in R? • R has a very good built-in help system. • If you know which function you want help with, simply use ?_______ with the function name in the blank. • Ex: ?hist • If you don’t know which function to use, use help.search("_______"). • Ex: help.search("histogram")

  9. Importing Data • How do we get data into R? • Remember, there is no point and click… • First make sure your data are in an easy-to-read format such as CSV (comma-separated values). • Use code: • D <- read.table("path", sep=",", header=TRUE)
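A self-contained sketch of the import step. No data file comes with this transcript, so the code first writes a tiny made-up CSV (the columns Name, Gender and Score are illustrative only) to a temporary file; with your own data you would pass your file's path instead.

```r
# Write a small made-up CSV file to a temporary location
tmp <- tempfile(fileext = ".csv")
writeLines(c("Name,Gender,Score",
             "Ann,F,88",
             "Bob,M,75",
             "Cal,M,91"), tmp)

# The call from the slide
D <- read.table(tmp, sep = ",", header = TRUE)

# read.csv() is a convenience wrapper around read.table()
# whose defaults include header = TRUE and sep = ","
D2 <- read.csv(tmp)

str(D)
```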

  10. Working with data. • Accessing columns. • D holds our data, but R does not display it unless you ask (type D, or head(D) for the first few rows). • To select a column, use D$column.
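A sketch of column access on a small made-up data frame (the columns are illustrative, not the talk's data).

```r
# A small data frame standing in for data read with read.table()
D <- data.frame(Name   = c("Ann", "Bob", "Cal"),
                Gender = c("F", "M", "M"),
                Score  = c(88, 75, 91))

head(D)        # print the first rows to actually see the data
names(D)       # column names
D$Score        # one column, as a vector
mean(D$Score)  # columns can be passed straight to functions
```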

  11. Working with data. • Subsetting data. • Use a comparison (logical) operator to do this. • ==, !=, >, <, <=, >= are all comparison operators; != means "not equal" (R has no <> operator). • Note that the "equals" operator is two = signs. • Example: • D[D$Gender == "M",] • This returns the rows of D where Gender is "M". • Remember R is case sensitive! • This code does not change the original dataset. • D.M <- D[D$Gender == "M",] stores the matching rows in a new dataset.
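The subsetting rules above, applied to a small made-up data frame. Conditions can be combined with & (and) or | (or); the column names are illustrative.

```r
D <- data.frame(Name   = c("Ann", "Bob", "Cal"),
                Gender = c("F", "M", "M"),
                Score  = c(88, 75, 91))

D$Gender == "M"                  # logical vector: FALSE TRUE TRUE
D.M    <- D[D$Gender == "M", ]   # matching rows, all columns
D.notM <- D[D$Gender != "M", ]   # != is "not equal"
D.high <- D[D$Gender == "M" & D$Score > 80, ]  # both conditions must hold

nrow(D)                          # the original still has all 3 rows
```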

  12. Source Files • Source files allow you to store all of your own functions in a single file and make them all available at once. • To load such a file, use source(path). • On Windows, each \ in the path must be written as \\ (forward slashes / also work).
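A minimal sketch of the source-file workflow. The two functions, center() and cv(), are hypothetical examples, and the file is written to a temporary path so the snippet runs anywhere.

```r
# Write two small (hypothetical) functions to a file. Normally you would
# type these into a script window and save the file yourself.
path <- tempfile(fileext = ".R")
writeLines(c(
  "center <- function(x) x - mean(x)   # subtract the mean",
  "cv <- function(x) sd(x) / mean(x)   # coefficient of variation"
), path)

# Load every definition in the file into the workspace
source(path)

center(c(1, 2, 3))   # -1 0 1
cv(c(2, 4, 6))       # 0.5

# On Windows, a path typed by hand needs doubled backslashes,
# e.g. source("C:\\work\\myfunctions.R"), or forward slashes,
# e.g. source("C:/work/myfunctions.R"). The file name is hypothetical.
```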

  13. Libraries • To keep R’s memory footprint small, additional functionality is stored in libraries. • These libraries can be loaded through the GUI or from scripts. • Beware that some contributed packages may conflict with other libraries, for example by defining functions with the same names.
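In practice a "conflict" usually means masking: the same name is defined in two places on the search path, and R uses whichever it finds first. The sketch below imitates this with a user-defined function instead of a second package, so nothing needs to be installed. The :: operator asks for a specific package's version, and conflicts() lists names defined in more than one attached location.

```r
x <- c(1, 5, 9)

# A workspace function with the same name as stats::median masks it,
# because the global environment is searched first
median <- function(x) "not the median you wanted"
median(x)          # uses the workspace version
stats::median(x)   # :: asks for the stats package's version: 5

rm(median)         # remove the masking definition
median(x)          # back to stats::median: 5
```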

  14. Contributed Packages • Since R is open source and the developers are well organized, developing and finding contributed packages is easy. • At the time of this talk there were 964 contributed packages. • They cover topics ranging from wavelets and financial mathematics to spatial data analysis.

  15. Contributed Packages • One popular library is lattice.
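A short lattice sketch using R's built-in mtcars data (the dataset and variables are my choice of illustration, not from the talk): one scatterplot panel per number of cylinders.

```r
library(lattice)  # a recommended package shipped with standard R installations

# Conditioned scatterplot: mpg against weight, one panel per cylinder count
p <- xyplot(mpg ~ wt | factor(cyl), data = mtcars,
            xlab = "Weight (1000 lb)", ylab = "Miles per gallon")

# lattice plots are objects; print() draws them
# (needed inside scripts run non-interactively and inside loops)
print(p)
```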

  16. Contributed Packages • You can install contributed packages using the GUI.

  17. Contributed Packages • You can install the package by selecting it from the list. • Note: Installing a package does not make it immediately available for use. • You still need to use the library() statement to make the functionality available to you. library(lattice)
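A common script idiom for the install-versus-load distinction on this slide: install only when the package is missing, then attach it with library(). The install step needs internet access; lattice is used only because it is the package on the slide.

```r
pkg <- "lattice"

# Install only if the package is not already available
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Installing is not enough: attach the package to use its functions by name
library(pkg, character.only = TRUE)

"package:lattice" %in% search()  # TRUE once attached
```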

  18. Help on contributed packages • Once a contributed package is installed, you can view its help and a list of the functions it provides with: library(help="lattice")

  19. The CircStats Package • Data often come in circular form. • For example, the direction in which birds migrate or fly from their nest. • Such data are angles, not "linear" measurements. • The data can only take values between 0 and 2π.
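Why circular data need their own tools: averaging the raw angles can point the wrong way. The sketch below uses the usual definition of the mean direction, the angle of the average unit vector. circ_mean() is a hypothetical base-R helper written for illustration, not the CircStats function.

```r
# Two bearings just either side of north (0 degrees), converted to radians
theta <- c(350, 10) * pi / 180

# Naive arithmetic mean: 180 degrees, i.e. due south
mean(theta) * 180 / pi

# Mean direction: angle of the average unit vector (hypothetical helper)
circ_mean <- function(theta) atan2(mean(sin(theta)), mean(cos(theta)))
circ_mean(theta) * 180 / pi   # essentially 0 degrees, i.e. north
```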

  20. The CircStats Package • Use the CircStats package:
        library(CircStats)
      • Consider the following (the data are random, so your result will differ):
        data <- runif(50, 0, pi)
        mean.dir <- circ.mean(data)
        mean.dir
        [1] 1.446502

  21. The CircStats Package • Randomly generate 100 values from a von Mises distribution with mean direction 0 and concentration 3:
        data.vm <- rvm(100, 0, 3)
      • Create a plot of it using circ.plot:
        circ.plot(data.vm, stack=TRUE, bins=150, shrink=1.5)
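For reference, the von Mises density with mean direction μ and concentration κ is f(θ) = exp(κ cos(θ − μ)) / (2π I0(κ)), where I0 is the modified Bessel function of the first kind of order 0. The base-R sketch below codes it directly (dvonmises() is a hypothetical name, not a CircStats function), checks that it integrates to 1, and shows how κ concentrates the mass around μ.

```r
# von Mises density; besselI(kappa, nu = 0) is I0(kappa)
dvonmises <- function(theta, mu = 0, kappa = 1) {
  exp(kappa * cos(theta - mu)) / (2 * pi * besselI(kappa, nu = 0))
}

# The density integrates to 1 over one full circle
total <- integrate(dvonmises, lower = 0, upper = 2 * pi,
                   mu = 0, kappa = 3)$value
total

# With kappa = 3, the density at mu is exp(6) (about 403) times
# the density at the opposite direction
dvonmises(0, kappa = 3) / dvonmises(pi, kappa = 3)
```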

  22. The CircStats Package • Regression with circular data:
      • Create some data:
        data1 <- runif(50, 0, 2*pi)
        data2 <- atan2(0.15*cos(data1) + 0.25*sin(data1), 0.35*sin(data1)) + rvm(50, 0, 5)
      • Run the regression using circ.reg:
        circ.lm <- circ.reg(data1, data2, order=1)
        circ.lm
        (Intercept) -0.01365604 -0.02939188
        cos.alpha   -0.29872673  0.41344126
        sin.alpha    0.78894271  0.72908521

  23. The CircStats Package • Plot the data:
        plot(data1, data2)
      • Plot the predicted line (fitted values above π are first shifted down by 2π):
        circ.lm$fitted[circ.lm$fitted>pi] <- circ.lm$fitted[circ.lm$fitted>pi] - 2*pi
        points(data1[order(data1)], circ.lm$fitted[order(data1)], type='l')
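The adjustment on this slide shifts fitted values above π down by 2π so they line up with the plotted range. A more general way to wrap angles, sketched here as the hypothetical helper wrap_angle(), uses atan2(): it maps an angle any number of turns away into the range from -π to π.

```r
# Wrap any angle (radians) into the range [-pi, pi]
wrap_angle <- function(x) atan2(sin(x), cos(x))

wrap_angle(c(0.5, 3 * pi / 2, -7))   # 0.5, -pi/2, -7 + 2*pi
```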

  24. The norm Contributed Package • Despite its name, the norm package is not a general toolkit for the normal distribution: it handles missing data in multivariate normal data sets. • It implements the Data Augmentation and Multiple Imputation scheme of Schafer (1997). • Similar to SAS PROC MI.

  25. The norm Contributed Package • Load the library. library(norm)

  26. The norm Contributed Package • Generate some data:
        X1 <- rnorm(100,6,1)
        X2 <- rnorm(100,10,3)
        X3 <- rnorm(100,3,.2)
        X4 <- rnorm(100,31,2)
        Y <- 5 + .4*X1 - .3*X2 + rnorm(100,0,1)
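Because Y is built from X1 and X2 only, the true coefficients are known: intercept 5, 0.4 for X1, -0.3 for X2, and 0 for X3 and X4. As a baseline before any values are deleted, you can fit the complete data with lm(). The seed is an arbitrary addition for reproducibility.

```r
set.seed(1)  # arbitrary seed, added for reproducibility
X1 <- rnorm(100, 6, 1)
X2 <- rnorm(100, 10, 3)
X3 <- rnorm(100, 3, .2)
X4 <- rnorm(100, 31, 2)
Y  <- 5 + .4 * X1 - .3 * X2 + rnorm(100, 0, 1)

# Complete-data fit; true coefficients are 5, 0.4, -0.3, 0, 0
full.fit <- lm(Y ~ X1 + X2 + X3 + X4)
round(coef(full.fit), 3)
```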

  27. The norm Contributed Package • Generate some missing data (each value of X1 and X2 is set to NA with probability 0.1):
        X1a <- ifelse(runif(100,0,1)<.1,NA,X1)
        X2a <- ifelse(runif(100,0,1)<.1,NA,X2)
      • Put the data together:
        YX <- cbind(Y,X1a,X2a,X3,X4)
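Before imputing, it helps to see how much is missing. lm() drops every row that contains an NA by default, so the complete-case count below is the number of rows a plain regression on YX would actually use. The data generation repeats the previous slides with an arbitrary seed so the block runs on its own.

```r
set.seed(2)  # arbitrary seed
X1 <- rnorm(100, 6, 1)
X2 <- rnorm(100, 10, 3)
X3 <- rnorm(100, 3, .2)
X4 <- rnorm(100, 31, 2)
Y  <- 5 + .4 * X1 - .3 * X2 + rnorm(100, 0, 1)

# Each value of X1 and X2 becomes NA with probability 0.1
X1a <- ifelse(runif(100, 0, 1) < .1, NA, X1)
X2a <- ifelse(runif(100, 0, 1) < .1, NA, X2)
YX  <- cbind(Y, X1a, X2a, X3, X4)

colSums(is.na(YX))        # missing values per column
sum(complete.cases(YX))   # rows kept by listwise deletion
```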

  28. The norm Contributed Package • Prep the data and parameters for multiple imputation:
        # do preliminary manipulations
        s <- prelim.norm(YX)
        # find the MLE
        thetahat <- em.norm(s)
        # set random number generator seed
        rngseed(1234567)

  29. The norm Contributed Package • Create lists to store the individual results in:
        betaout <- vector("list",10)
        betasterrout <- vector("list",10)

  30. The norm Contributed Package • Run a multiple imputation loop (the model is fitted once per imputation):
        for(i in 1:10){
          ximp <- imp.norm(s,thetahat,YX)     # one imputed copy of YX
          fit <- lm(ximp[,1]~ximp[,2]+ximp[,3]+ximp[,4]+ximp[,5])
          betaout[[i]] <- coef(fit)                            # estimates
          betasterrout[[i]] <- summary(fit)$coefficients[,2]   # standard errors
        }

  31. The norm Contributed Package • Analyze the results:
        mi.inference(betaout,betasterrout,confidence=0.95)
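mi.inference() pools the ten sets of estimates and standard errors. The sketch below codes the standard pooling formulas (Rubin's rules) in base R, which may make the $est, $std.err, $df and $r entries of the output easier to read. rubin_combine() is a hypothetical helper, and it is an assumption, not a quote from the norm documentation, that mi.inference() uses these same formulas.

```r
# Pool m sets of estimates and standard errors (Rubin's rules).
# est: list of estimate vectors; se: list of standard-error vectors
rubin_combine <- function(est, se) {
  m    <- length(est)
  Q    <- do.call(rbind, est)       # m x p matrix of estimates
  U    <- do.call(rbind, se)^2      # m x p within-imputation variances
  qbar <- colMeans(Q)               # pooled estimate
  ubar <- colMeans(U)               # average within-imputation variance
  b    <- apply(Q, 2, var)          # between-imputation variance
  tvar <- ubar + (1 + 1 / m) * b    # total variance
  r    <- (1 + 1 / m) * b / ubar    # relative increase in variance from missingness
  df   <- (m - 1) * (1 + 1 / r)^2   # degrees of freedom (Inf when b is 0)
  list(est = qbar, std.err = sqrt(tvar), df = df, r = r)
}

# Toy input: three imputations of a single coefficient
res <- rubin_combine(est = list(1, 2, 3),
                     se  = list(sqrt(0.5), sqrt(0.5), sqrt(0.5)))
res
```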

  32. The norm Contributed Package • Look at the output:
        $est
        (Intercept)   ximp[, 2]   ximp[, 3]   ximp[, 4]   ximp[, 5]
         6.75624286  0.30502706 -0.32846960  0.05157696 -0.04154060
        $std.err
        (Intercept)   ximp[, 2]   ximp[, 3]   ximp[, 4]   ximp[, 5]
         2.70312542  0.13431178  0.04240159  0.65908509  0.05596610
        $df
        (Intercept)   ximp[, 2]   ximp[, 3]   ximp[, 4]   ximp[, 5]
          1318.8371    222.2528  13269.2373   1770.6680  27689.4900
        $signif
         (Intercept)    ximp[, 2]    ximp[, 3]    ximp[, 4]    ximp[, 5]
        1.256048e-02 2.410251e-02 1.021405e-14 9.376337e-01 4.579447e-01
        $r
        (Intercept)   ximp[, 2]   ximp[, 3]   ximp[, 4]   ximp[, 5]
         0.09004737  0.25192843  0.02673983  0.07676697  0.01835967

  33. The lpSolve Package • The lpSolve package solves linear and integer programs. library(lpSolve)

  34. The lpSolve Package • Consider the following linear program: maximize x1 + 9x2 + 3x3 subject to x1 + 2x2 + 3x3 ≤ 9, 3x1 + 2x2 + 2x3 ≤ 15, and x1, x2, x3 ≥ 0.

  35. The lpSolve Package • Set up the vectors and matrices:
        f.obj <- c(1, 9, 3)                                       # objective coefficients
        f.con <- matrix(c(1, 2, 3, 3, 2, 2), nrow=2, byrow=TRUE)  # one row per constraint
        f.dir <- c("<=", "<=")                                    # constraint directions
        f.rhs <- c(9, 15)                                         # right-hand sides

  36. The lpSolve Package • The lp() function will attempt to solve the linear program:
        lp("max", f.obj, f.con, f.dir, f.rhs)
        Success: the objective function is 40.5

  37. The lpSolve Package • To obtain the solution, extract the $solution component of the returned object:
        lp("max", f.obj, f.con, f.dir, f.rhs)$solution
        [1] 0.0 4.5 0.0
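The reported solution can be checked by hand with the matrix product from earlier: the constraint left-hand sides are f.con %*% x, and the objective value is sum(f.obj * x). This is a sanity check written in base R, not part of lpSolve.

```r
f.obj <- c(1, 9, 3)
f.con <- matrix(c(1, 2, 3, 3, 2, 2), nrow = 2, byrow = TRUE)
f.rhs <- c(9, 15)

x   <- c(0, 4.5, 0)            # solution reported by lp()
lhs <- drop(f.con %*% x)       # left-hand side of each constraint
lhs                            # 9 9
all(lhs <= f.rhs)              # TRUE: both "<=" constraints hold
f.rhs - lhs                    # slack 0 6: the first constraint is binding
sum(f.obj * x)                 # objective value 40.5
```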

  38. The lpSolve Package • Sensitivity analyses can be obtained from the lp() object when lp() is called with compute.sens=TRUE. • The following components are stored in an lp() object:
         [1] "direction"      "x.count"        "objective"      "const.count"
         [5] "constraints"    "int.count"      "int.vec"        "objval"
         [9] "solution"       "presolve"       "compute.sens"   "sens.coef.from"
        [13] "sens.coef.to"   "duals"          "duals.from"     "duals.to"
        [17] "status"
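A sketch of requesting the sensitivity output, assuming lpSolve is installed. compute.sens is one of the component names listed above; the comments describe the components as I understand them, and the exact numbers are best read from your own run.

```r
library(lpSolve)

f.obj <- c(1, 9, 3)
f.con <- matrix(c(1, 2, 3, 3, 2, 2), nrow = 2, byrow = TRUE)
f.dir <- c("<=", "<=")
f.rhs <- c(9, 15)

# Sensitivity components are only filled in when compute.sens is requested
sol <- lp("max", f.obj, f.con, f.dir, f.rhs, compute.sens = TRUE)

sol$status          # 0 means an optimal solution was found
sol$objval          # objective value
sol$duals           # dual values of the constraints, then reduced costs of the variables
sol$sens.coef.from  # lower limits of the objective coefficients ...
sol$sens.coef.to    # ... and upper limits, within which the solution stays optimal
```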

  39. The lpSolve Package • To solve an integer program, pass the indices of the variables that must be integers in int.vec:
        lp("max", f.obj, f.con, f.dir, f.rhs, int.vec=1:3)
        Success: the objective function is 37

  40. The lpSolve Package • To obtain the solution to the integer program, extract $solution as before:
        lp("max", f.obj, f.con, f.dir, f.rhs, int.vec=1:3)$solution
        [1] 1 4 0
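Because this integer program is tiny, the answer can be confirmed by brute force in base R: enumerate every integer point allowed by the first constraint, keep the feasible ones, and take the best objective value. This only works for very small problems; lp() uses branch and bound instead.

```r
f.obj <- c(1, 9, 3)
f.con <- matrix(c(1, 2, 3, 3, 2, 2), nrow = 2, byrow = TRUE)
f.rhs <- c(9, 15)

# The first constraint (x1 + 2*x2 + 3*x3 <= 9) bounds each variable:
# x1 <= 9, x2 <= 4, x3 <= 3, so every integer candidate lies in this grid
grid <- as.matrix(expand.grid(x1 = 0:9, x2 = 0:4, x3 = 0:3))

feasible <- apply(grid, 1, function(x) all(f.con %*% x <= f.rhs))
cand     <- grid[feasible, , drop = FALSE]
obj      <- drop(cand %*% f.obj)

best <- cand[which.max(obj), ]
best        # x1 = 1, x2 = 4, x3 = 0
max(obj)    # 37, matching lp() with int.vec = 1:3
```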

  41. Summary • R is a programming environment with many standard programming structures already included. • A large number of contributed packages are available. • Many packages let you use modern statistical procedures without having to code them yourself. • Using the packages requires familiarity with R. • No formal support. • Users can create new packages.

  42. Summary • All of the R code and files can be found at: www.people.vcu.edu/~elboone2/CSS.htm
