R Programming for Life Scientists Version 2.0 Raymond R. Balise, Ph.D. Health Research and Policy Spectrum
Roadmap • What makes R different from the rest? • Setting up R • Types of data • Working with collections of data • Importing and exporting data • Writing functions • Graphics
When to Use R • Shoestring budget • Cutting edge statistics • Developing your own or fine-tuning existing methods • Local expertise
Programming Languages • Procedural languages • C, Fortran, Cobol, Basic • use a model where the logic flows from the top of the page to the bottom with calls to “goto” subroutines as needed • It is hard to encapsulate the code. • Object oriented languages • C++, Visual Basic, JAVA • involves creating “objects” and then operating on them
R is Object Oriented (OO) • You create objects • vector of numbers, a graphic, etc. • You call methods/functions to operate on the objects. • Working with an OO language requires you to learn about special methods to create, access, modify, or destroy objects and their properties. • R hides these processes. • It helps a lot if you want to write new statistics and methods and is required for making new packages.
OO Example • With R you write code in the editor which I will show you in a minute. • You can create an object which holds a bunch of numbers (a vector, if you remember math) • You can then use (aka call) a function (aka method) to operate on the object. • The summary() function • Create and display a numeric summary object • The plot() function • Create and display a graphic summary object
Make the ages object. Call the summary function. Call the plot function.
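The three steps on this slide can be sketched as follows. The values in ages are borrowed from the vector examples later in the deck, so treat them as illustrative rather than as the data shown in the original screenshot.

```r
# Make the ages object (illustrative values taken from later slides).
ages = c(9, 11, 40, 41)

# Call the summary function: it returns a numeric summary object.
ageSummary = summary(ages)
print(ageSummary)

# Call the plot function: it draws each value against its position.
plot(ages)
```

Because ageSummary is itself an object, it can be stored, printed again, or passed to other functions.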
But wait … there’s more! • There is a lot of functionality built into R. It ships with libraries that do many different tasks. And you can download more. Activate the map datasets and functions. Map most of the USA.
But hold on…. There is MORE! • You can add options to the function calls to make them do fancy things like color…. • Or you can have one function act on the output of another function…. • And you can save output as objects!
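A minimal sketch of the three ideas on this slide, using only base R. The ages vector and the object names roundedMean and ageRange are made up for illustration; the original slide used a different example.

```r
ages = c(9, 11, 40, 41)

# Add options (arguments) to a function call, such as color and point style.
plot(ages, col = "red", pch = 19, main = "Ages")

# Have one function act on the output of another function:
# mean() runs first and round() works on its result.
roundedMean = round(mean(ages))

# Save output as an object, then use it later.
ageRange = range(ages)
ageRange
```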
Important Objects • Vectors are ordered collections of values of one type, such as a list of numbers. • Data frames are like database tables or spreadsheets.
Everything is an Object • Every object in R has a mode, which describes how its data are stored. • R objects can have a class, which indicates what they are used for. • Mode: character, numeric, logical, function, complex, raw • Class: character, numeric, logical, function, factor, data.frame, list, matrix, lm, table … lots more … • character, numeric, logical: hold “simple” data elements • Structures to hold data: data.frame (columns of the same length, with data like a database), list (columns of data of different lengths), matrix (grid of data, all the same type) • lm, table: output from summary/graphic procedures
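The difference between mode and class is easiest to see by asking R directly. This is a small sketch; the object names ages, stooges, and people are illustrative.

```r
ages = c(9, 11, 40, 41)
mode(ages)      # "numeric"
class(ages)     # "numeric"

stooges = c("Larry", "Moe", "Curly", "Shemp")
mode(stooges)   # "character"

# A data frame is stored as a list of columns but has its own class.
people = data.frame(name = stooges, age = ages, stringsAsFactors = FALSE)
mode(people)    # "list"
class(people)   # "data.frame"

# Functions are objects too.
mode(summary)   # "function"
```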
Where to Get R • R has two main websites. One describes the project: http://www.r-project.org/ • The other has most of the stuff you want to download: http://cran.r-project.org/ • Because the R project has people working all over the globe, the software download site is “mirrored” everywhere. The closest mirror is USA CA1 (aka UC Berkeley).
http://cran.cnr.berkeley.edu/ • There is an R installer for all the common operating systems: • cran.cnr.berkeley.edu/bin/windows/base/ • cran.cnr.berkeley.edu/bin/macosx/ • cran.cnr.berkeley.edu/bin/linux/ • Each is basically self explanatory.
Installing on Windows • Double click the installer and just push next until you get to this screen. Specify that you want to do customized startup. This will let you set up R to work with other programs nicely.
Customize • Use these options, then hit Next> a bunch.
• Type help.start() and push Enter to start the help. • Type q() and push Enter to quit, but don’t yet.
GUI • Use the built-in editor. • Save or restore all the objects in use. • Save or reload the code from the console. • Set the working directory to save objects. • Keep all the text in the console for the session.
GUI • Edit existing data. • Tweak the appearance of the console.
Rprofile.site • If you have instructions that you always want run when R starts up, you can include them in the Rprofile.site file:
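The original slide showed a screenshot of the file. As a stand-in, here is a hypothetical example of the kind of lines people put in Rprofile.site. None of these settings come from the original deck; they are only common choices.

```r
# Hypothetical Rprofile.site contents; keep only the lines you want.

# Set a default download site so install.packages() does not ask each time.
options(repos = c(CRAN = "http://cran.r-project.org/"))

# Print numbers with 4 significant digits by default.
options(digits = 4)

# Show a short note so you know the file was read at startup.
message("Rprofile.site was loaded")
```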
GUI • Common commands. • Show the add-on packages currently accessible.
Packages in R • User-supplied packages are typically found at one of three places: • CRAN for all kinds of stuff • Omegahat for web-based statistics • Bioconductor for genomic analysis • R packages update often. • Your colleagues will recommend task-specific packages. • Rcmdr is my favorite.
GUI • Use a previously downloaded package (I type library(name) instead). • USA (CA1) is closest to Stanford. • Choose which set of packages to look at. • See the HUGE list of packages. • Update often!
GUI This is useful.
HTML help… This will not find information if you have not installed the packages. This is useful but not Google.
Rseek.org is Google-driven • I highly recommend it.
Mac Quick Help • Search help for the word "map". • Search for details on a function if the package is loaded and you know the function's name.
Windows Quick Help • Search help for the word "map". • Load the package. • Search help for the function named "map".
Mac Install • Download and double click the dmg file. Click customize and make sure Tcl/Tk is checked on.
X11 • Some packages for R on the Mac (like Rcmdr) require X11 to be installed. • I think it is part of the standard Leopard installation but was an option with Tiger. If you need it, try to install it from the DVD that came with your machine, because people have reported problems using the dmg files from Apple.com.
X11 and Add-on Packages • To get add-on packages, use this menu. • You can click here to make sure X11 works.
Getting or Updating Packages • Click Get List, click the package name, be sure install dependencies is checked on, then click install.
Instead of Point and Click • You can also run this code to have Mac or Windows R download a list of packages:
    usefulPackages = c("car", "foreign", "hexbin", "gdata", "ggplot2", "gmodels", "gplots", "Hmisc", "reshape", "Rcmdr")
    install.packages(usefulPackages, dependencies = TRUE)
• Be sure to take note of any packages that do not install. For example:
    ‘marray’, ‘affy’, ‘Biobase’, ‘Rgraphviz’ were not available
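One way to take note of what did not install is to compare your list with what R reports as installed. This is a sketch using base R functions; the names installedNames and notInstalled are made up for the example.

```r
usefulPackages = c("car", "foreign", "hexbin", "gdata", "ggplot2",
                   "gmodels", "gplots", "Hmisc", "reshape", "Rcmdr")

# installed.packages() returns a matrix with one row per installed package;
# the row names are the package names.
installedNames = rownames(installed.packages())

# Anything in the wish list that is not installed yet.
notInstalled = setdiff(usefulPackages, installedNames)
notInstalled
```

If notInstalled is empty (character(0)), every package on the list is available to library().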
Your First Package • I suggest you install the Rcmdr package first thing. • Use the Install packages… option on the package menu to download Rcmdr • To make it available for your R session type: library(Rcmdr) • Capitalization matters! • The first time you run it, it will ask you if it can download additional packages.
On a Mac you cannot directly import from Excel. If you are on Windows you can directly import Excel files.
Hate Typing? • Tab is your friend. It will auto-complete if it can or give you a list of functions that match what you have typed. It works very well on the Mac. In Windows sometimes you need to type tab twice. • In Windows, if you type tab after a ( it displays the options for the function; on the Mac they just appear.
Before Analysis • Rcmdr makes the most commonly done analyses easy. If you are told the names of the functions to use, writing the code yourself is almost tolerable. • Data manipulation is relatively difficult in R compared to other analysis and data management tools. • You need to know how to manipulate the data objects.
Data Set Objects • Vectors • A bunch of data in a single row or column • All of the same type • Matrix • A row and column arrangement of data • All of the same type • Data frame • A row and column arrangement of data • Columns can be of different types • Like a “good” spreadsheet or relational database file • List • Very free-form structure • A grouping of different types of data
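To make the four structures concrete, here is one small example of each. The object names and values are illustrative only.

```r
# Vector: a bunch of data, all of the same type.
ages = c(9, 11, 40, 41)

# Matrix: rows and columns, all of the same type.
grid = matrix(1:6, nrow = 2, ncol = 3)

# Data frame: rows and columns, where each column can have its own type.
people = data.frame(name = c("Larry", "Moe", "Curly", "Shemp"),
                    age = ages,
                    stringsAsFactors = FALSE)

# List: a free-form grouping of different types and lengths of data.
grabBag = list(ages = ages, label = "stooges", flags = c(TRUE, FALSE))

dim(grid)        # 2 3
nrow(people)     # 4
length(grabBag)  # 3
```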
Types of Data Vectors • Numeric • Integer, real, and complex are different types but you will not need to pay attention to the details • NA means missing • NaN means not a number • String (character) • Text, such as characters of the alphabet • Logical • TRUE, FALSE or NA
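A short sketch of how NA and NaN behave in practice. The vector name measurements and its values are invented for the example.

```r
measurements = c(1.5, NA, 3)

is.na(measurements)               # FALSE TRUE FALSE
mean(measurements)                # NA, because one value is missing
mean(measurements, na.rm = TRUE)  # 2.25, missing value dropped first

# NaN comes from undefined arithmetic such as 0 / 0.
undefined = 0 / 0
is.nan(undefined)                 # TRUE
is.na(undefined)                  # TRUE: NaN also counts as missing

# Logical vectors can hold TRUE, FALSE, or NA.
class(c(TRUE, FALSE, NA))         # "logical"
```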
Making a Vector • R is case sensitive. • # means ignore the rest of the line. • ; means a new command follows. • Make a sequence
    OneToThirty = seq(1, 30)
    OneToThirty    # same as print(OneToThirty)
    oneToThirty = seq(1, 30, by = 2); oneToThirty
    x1to30 = 1:30    # This will be VERY useful.
    (x1to30 = 1:30)    # Surround the expression with () to display the result automatically.
Making Vectors With c() • c stands for concatenate
    ages = c(9, 11, 40, 41) ; ages
    stooges = c("Larry", "Moe", "Curly", "Shemp"); stooges
Getting Details • You can use the is functions and length to get details on a vector.
    is.vector(ages)
    is.numeric(ages)
    is.logical(ages)
    length(ages)
Recycling and Vectorizing • You can add one to all four ages.
    ages + c(1,1,1,1)
• If you provide a single number (a scalar), R will temporarily vectorize the 1 by recycling that value to match the length of the ages vector.
    ages + 1
• It will recycle a series also.
    ages
    ages + c(1,2)
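The recycling rules above, worked through with their results. The last case, a shorter vector whose length does not divide evenly into 4, is an addition not on the original slide; without suppressWarnings() R would also print a warning there.

```r
ages = c(9, 11, 40, 41)

ages + c(1, 1, 1, 1)   # 10 12 41 42
ages + 1               # 10 12 41 42  (the 1 is recycled four times)
ages + c(1, 2)         # 10 13 41 43  (1, 2 is recycled as 1, 2, 1, 2)

# Length 3 does not divide evenly into length 4: R still recycles
# (1, 2, 3, 1) but normally warns about it.
uneven = suppressWarnings(ages + c(1, 2, 3))
uneven                 # 10 13 43 42
```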
Naming Parts of a Vector • You can assign names to the elements of a vector. This allows later access to the elements using the names instead of the position.
    names(ages) = stooges
    ages
• To erase them:
    names(ages) = NULL; ages
• Notice what happens when the lengths differ:
    stooges = c("Larry", "Moe", "Curly")
    names(ages) = stooges
    ages
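What happens when the lengths differ can be checked directly: R pads the missing name with NA rather than recycling the names. A small sketch using the same values as the slide:

```r
ages = c(9, 11, 40, 41)

# Only three names for four elements.
names(ages) = c("Larry", "Moe", "Curly")

names(ages)          # "Larry" "Moe" "Curly" NA
is.na(names(ages))   # FALSE FALSE FALSE TRUE

# The named elements are still reachable by name.
ages["Moe"]
```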
Attributes • When you add names to things (objects) they acquire or change their “names attribute.”
    attributes(ages)
• When you strip off the names, the vector is left with no attributes.
    names(ages) = NULL
    attributes(ages)
Complex Objects • A data frame is an object with many attributes. • R ships with a lot of datasets if you want one…
    help.start()
• Click packages then datasets.
    esoph
    ?esoph
    attributes(esoph)
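Printing attributes(esoph) produces a lot of output, so it can help to look at the pieces one at a time. This sketch uses only base functions on the built-in esoph dataset.

```r
# Which attributes does the data frame carry?
names(attributes(esoph))   # includes "names", "class", "row.names"

# Look at them individually.
class(esoph)               # "data.frame"
names(esoph)               # the column names
head(esoph)                # the first six rows
```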
Getting at Parts of a Vector • Specify the element number.
    heyMoe = ages[2] ; heyMoe
• Specify to drop everything except the element number.
    ages[c(-1, -3, -4)]
• Specify a logical vector of TRUE and FALSE.
    ages[c(FALSE, TRUE, FALSE, FALSE)]
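All three approaches pick out the same element, which this sketch confirms. The last two lines go slightly beyond the slide: they show selecting several positions and building the TRUE/FALSE vector from a comparison.

```r
ages = c(9, 11, 40, 41)

ages[2]                              # 11
ages[c(-1, -3, -4)]                  # 11 (everything except 1, 3, and 4)
ages[c(FALSE, TRUE, FALSE, FALSE)]   # 11

ages[c(1, 4)]                        # 9 41
ages[ages > 30]                      # 40 41
```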
Getting Parts with Names
    ages = c(9, 11, 40, 41) ; ages
    names(ages) = c("Larry", "Moe", "Curly", "Shemp")
    ages
• Specify the name.
    heyMoe = ages["Moe"]
Duplicate Names • Selecting by name, as in ages["Moe"], returns only the first match if there are duplicates.
    names(ages)[3] = "Moe"    # rename the third element to create a duplicate
    ages
    heyMoe = ages["Moe"]
    heyMoe
• To get all of the duplicates:
    names(ages) %in% "Moe"
    ages[names(ages) %in% "Moe"]
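Here is the duplicate-name behavior as a self-contained sketch. Which element carries the second "Moe" is an assumption made for this example; the point is only that name lookup stops at the first match while a logical test finds them all.

```r
ages = c(9, 11, 40, 41)
names(ages) = c("Larry", "Moe", "Moe", "Shemp")   # two elements named "Moe"

# Lookup by name returns only the first match.
ages["Moe"]                        # 11

# A logical test on the names returns every match.
names(ages) %in% "Moe"             # FALSE TRUE TRUE FALSE
ages[names(ages) %in% "Moe"]       # 11 40

# which() gives the positions of the matches.
which(names(ages) == "Moe")        # 2 3
```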