
Intro to R

Intro to R. Stephanie Lee, Dept. of Sociology, CSSCR, University of Washington, September 2009. Class outline: What is R?; The R Environment; Reading in Data; Viewing and Manipulating Data; Data Analysis.

Presentation Transcript


  1. Intro to R Stephanie Lee Dept of Sociology, CSSCR University of Washington September 2009

  2. Class Outline • What is R? • The R Environment • Reading in Data • Viewing and Manipulating Data • Data Analysis

  3. What is R? R is frequently thought of as another statistics package, like SPSS, Stata or SAS. While many people use R for statistical analysis, R is actually a full programming environment.

  4. What is R? R is completely command-driven. There are very few menu items, so you must use the R language to do anything. Another important distinction between traditional stats packages and R is that R is object-oriented.

  5. Why Use R? • Free! • Extremely flexible • Many additional packages available • Excellent graphics Disadvantages • Steep learning curve • Difficult data entry

  6. Download R Download R: http://cran.r-project.org Available for Linux, MacOS, and Windows

  7. The R Environment A traditional stats program like SPSS or Stata only contains one rectangular dataset at a time. All analysis is done on the current dataset. In contrast, the R environment is like a sandbox. It can contain a large number of different objects.

  8. The R Environment R is also function-driven. The functions act on objects and return objects. Functions themselves are objects, too! [Diagram: input arguments (objects) → function "works its black-box magic" → output (objects)]
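A minimal sketch of "functions are objects", using only base R. The name `avg` is a hypothetical example; `sapply()` and `mean()` are standard base R functions, and `mtcars` comes from the built-in datasets package.

```r
# Functions are objects: one can be stored under a second name.
# ("avg" is a hypothetical name chosen for this example.)
avg <- mean
is.function(avg)   # TRUE
avg(c(1, 2, 3))    # 2, same as mean(c(1, 2, 3))

# Functions can also be passed as input arguments to other functions.
# sapply() applies mean() to each selected column of mtcars and
# returns a named numeric vector (another object) as its output.
col_means <- sapply(mtcars[, c("mpg", "hp", "wt")], mean)
col_means
```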

  9. Rectangular Dataset (Excel, SPSS, Stata, SAS)

  10. R Environment (Object-Oriented) [Diagram: the environment holds many objects at once, e.g. a data frame, two functions, a numeric value, two vectors, a string, results, and a matrix]

  11. Help Function • help(functionname) or ?functionname • help.search("search term") • Note: R is case-sensitive! • Try: help(help), ls()

  12. Help Function • Sometimes one help file will contain information for several functions. • Usage: Shows syntax for command and required arguments (input) and any default values for arguments. • Value: the output object of the function
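A small sketch of case sensitivity and `ls()`. The object name `my_value` is hypothetical. `help()` and `help.search()` open documentation pages, so they appear only as comments here.

```r
# help(mean) or ?mean opens the documentation for mean()
# help.search("regression") searches installed documentation for a term

my_value <- 10        # create an object (hypothetical name)
exists("my_value")    # TRUE
exists("My_Value")    # FALSE: names that differ only in case are different objects
ls()                  # lists the objects in the current environment, including "my_value"
```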

  13. Setting Up Our Data
      > library(datasets)
      > mtcars
      > ?mtcars
      > write.csv(mtcars, "C:/temp/cars.csv")

  14. Creating Objects • Assignment operator: = or <- • Objects need to be assigned a name; otherwise the result is only printed to the console and is not saved to the environment. • c() is a useful function for combining values into a vector
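A short sketch of assignment and `c()`. The values in `heights` are made-up numbers for illustration only, and `avg_height` is a hypothetical name.

```r
heights <- c(160, 172, 181)   # c() combines values into a numeric vector
heights = c(160, 172, 181)    # "=" works as well at the top level
mean(heights)                 # the result (171) is printed but not saved
avg_height <- mean(heights)   # the result is saved as an object named avg_height
avg_height                    # 171
```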

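  15. Reading in Data
      read.table(filename, ...)
      > cars = read.csv("C:/temp/cars.csv")
      read.csv() is a version of read.table() with defaults suited to comma-separated files (for example header = TRUE, sep = ","). I prefer the CSV (comma-separated values) format. Almost every stats program can export to this format.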
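The write/read round trip from slides 13 and 15 can be sketched as follows. One assumption: a temporary file from `tempfile()` replaces "C:/temp/cars.csv", so the sketch runs on any operating system without creating C:/temp first.

```r
library(datasets)   # attached by default in R; provides mtcars

# Assumption: a temporary file stands in for "C:/temp/cars.csv".
csv_path <- tempfile(fileext = ".csv")

# write.csv() also writes the row names (the car models) as the first column.
write.csv(mtcars, csv_path)

# read.csv() is read.table() with CSV-friendly defaults (header = TRUE, sep = ",").
cars <- read.csv(csv_path)
dim(cars)            # 32 12: the 11 variables plus the car names
colnames(cars)[1]    # "X", the default name given to the unnamed first column
```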

  16. Viewing Data
      What does the dataset look like?
      > str(cars)
      > colnames(cars)
      > dim(cars)
      > nrow(cars)
      > ncol(cars)
      colnames() can also be used to assign column names, and rownames() does the same for row names.
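A sketch of the viewing functions, assuming `mtcars` itself stands in for the imported CSV. That means there is no extra "X"/"name" column, so the column positions differ by one from the slides. The new column name `miles_per_gallon` is hypothetical.

```r
# Assumption: mtcars stands in for the CSV import (no car-name column).
cars <- mtcars
str(cars)          # one line per variable: name, type, first few values
dim(cars)          # number of rows and columns: 32 11
nrow(cars)         # 32
ncol(cars)         # 11
colnames(cars)     # "mpg" "cyl" "disp" ...

# colnames() can also sit on the left of an assignment to rename columns.
colnames(cars)[1] <- "miles_per_gallon"
head(colnames(cars), 3)   # "miles_per_gallon" "cyl" "disp"
```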

  17. Common Mode Types

  18. Common Object Types

  19. Creating Objects

  20. Viewing Data: Indexing
      datasetname[rownum, columnnum]
      > cars[1, 4]      # displays the value at row 1, column 4
      > cars[2:5, 6]    # displays rows 2-5, column 6

  21. Viewing Data: Indexing
      > cars[, 2]       # displays all rows, column 2
      > cars[4, ]       # displays row 4, all columns
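A runnable sketch of the indexing rules on slides 20 and 21. Assumption: `mtcars` stands in for the CSV version of `cars`; because the CSV version has the car names as an extra first column, its column numbers are shifted by one relative to this sketch.

```r
cars <- mtcars   # assumption: mtcars stands in for the CSV import (no name column)
cars[1, 4]       # row 1, column 4 (hp of the Mazda RX4): 110
cars[2:5, 6]     # rows 2 to 5 of column 6 (wt), as a numeric vector
cars[, 2]        # all rows of column 2 (cyl)
cars[4, ]        # row 4, all columns, as a one-row data frame
cars["Valiant", "mpg"]   # rows and columns can also be indexed by name
```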

  22. Viewing Data • You can also access columns (variables) using the '$' symbol if the data frame has column names:
      > cars$mpg
      > cars$wt
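A sketch comparing `$` with other ways of pulling out a column, again assuming `mtcars` stands in for `cars`. The variable `col_name` is a hypothetical helper used to show why `[[ ]]` is handy when the column name is stored in a variable.

```r
cars <- mtcars
mpg_values <- cars$mpg               # the mpg column as a numeric vector
identical(cars$mpg, cars[, "mpg"])   # TRUE: same as indexing by column name

# [[ ]] extracts a column too, and accepts a name stored in a variable,
# which $ does not (cars$col_name looks for a column literally called "col_name").
col_name <- "wt"                     # hypothetical variable holding a column name
cars[[col_name]]
```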

  23. Manipulating Data • Now we can give that first column (variable) a better name than "X", the default name read.csv() gives the unnamed column of car names that write.csv() saved:
      > colnames(cars) = c("name", colnames(cars)[2:ncol(cars)])
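A sketch of the renaming step, with a shorter alternative that replaces only the first element of the column-name vector. As before, a temporary file is an assumption standing in for "C:/temp/cars.csv".

```r
# Recreate the CSV round trip, using a temporary file instead of C:/temp/cars.csv.
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path)
cars <- read.csv(csv_path)
colnames(cars)[1]          # "X"

# Replace only the first column name; this has the same effect as
# colnames(cars) = c("name", colnames(cars)[2:ncol(cars)]).
colnames(cars)[1] <- "name"
colnames(cars)[1:3]        # "name" "mpg" "cyl"
```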

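  24. Manipulating Data
      > str(cars)
      Before version 4.0.0, R had the unfortunate habit of turning vectors of character strings into factors (categorical data) when reading data; since R 4.0.0, read.csv() keeps them as character by default. On older versions, convert the column back:
      > cars$name = as.character(cars$name)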
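A sketch showing how to avoid the factor conversion at read time and how `as.character()` undoes it. The temporary file is an assumption in place of "C:/temp/cars.csv"; the small factor `f` is a made-up example.

```r
csv_path <- tempfile(fileext = ".csv")   # temporary file stands in for C:/temp/cars.csv
write.csv(mtcars, csv_path)

# stringsAsFactors = FALSE keeps text as character on any R version;
# from R 4.0.0 onward it is already the default.
cars <- read.csv(csv_path, stringsAsFactors = FALSE)
class(cars$X)              # "character"

# A factor stores categories as integer codes plus labels;
# as.character() turns it back into plain strings.
f <- factor(c("b", "a", "b"))
as.integer(f)              # 2 1 2 (codes follow the sorted levels "a", "b")
as.character(f)            # "b" "a" "b"
```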

  25. Manipulating Data: Operators Arithmetic: + - * / ^
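A sketch of how the arithmetic operators act on whole vectors at once, plus the comparison operators used for subsetting on slide 26. The vector `x` is a made-up example.

```r
x <- c(2, 4, 6)   # a small example vector
x + 1             # 3 5 7: each operator works element by element
x - 1             # 1 3 5
x * 2             # 4 8 12
x / 2             # 1 2 3
x ^ 2             # 4 16 36

# Comparison operators also work element by element and return a logical vector.
x >= 4            # FALSE TRUE TRUE
x == 6            # FALSE FALSE TRUE
```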

  26. Manipulating Data
      Viewing subsets of data using column names and operators:
      > cars[cars$vs == 1, ]
      > cars[cars$cyl >= 6, ]
      > cars$name[cars$hp > 100]
      > cars$name[cars$wt > 3]
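A sketch of the subsetting commands. Assumption: `mtcars` is used with its row names copied into a `name` column, mimicking the CSV version of `cars`. The object names `straight`, `six_plus`, `fast`, and `heavy` are hypothetical. `subset()` is shown as a base R alternative for the row filter.

```r
cars <- mtcars
# Assumption: add the car names as a column, as in the CSV version of cars.
cars$name <- rownames(mtcars)

straight <- cars[cars$vs == 1, ]         # rows where vs equals 1
six_plus <- cars[cars$cyl >= 6, ]        # rows with 6 or more cylinders
fast <- cars$name[cars$hp > 100]         # names of cars with more than 100 hp
heavy <- cars$name[cars$wt > 3]          # names of cars over 3 (units of 1000 lbs)

# The comparison builds a logical vector with one TRUE/FALSE per row;
# indexing keeps the rows where it is TRUE.
head(cars$cyl >= 6)

# subset() expresses the same row filter without repeating "cars$".
six_plus2 <- subset(cars, cyl >= 6)
```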

  27. Analyzing Data
      What do the variables look like?
      > table(cars$gear)
      > hist(cars$qsec)
      > mean(cars$mpg)
      > sd(cars$mpg)
      > cor(cars$mpg, cars$wt)
      > mean(cars$mpg[cars$cyl == 4])
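A sketch of the summary commands, assuming `mtcars` stands in for `cars`. `hist()` is left out because it opens a graphics device. `tapply()` is added as one base R way to compute the subgroup mean for every cylinder count at once.

```r
cars <- mtcars   # assumption: mtcars stands in for the imported CSV
table(cars$gear)                # how many cars have 3, 4 and 5 gears
mean(cars$mpg)                  # average miles per gallon
sd(cars$mpg)                    # standard deviation of mpg
cor(cars$mpg, cars$wt)          # correlation between mpg and weight (negative)
mean(cars$mpg[cars$cyl == 4])   # mean mpg among 4-cylinder cars only

# tapply() splits mpg by cyl and applies mean() to each piece,
# giving the subgroup mean for 4, 6 and 8 cylinders at once.
tapply(cars$mpg, cars$cyl, mean)
```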

  28. Manipulating Data
      Transforming variables:
      > wt.lb = cars$wt * 1000
      This creates a new vector called wt.lb of length 32 (our number of cases). Because mtcars records weight in units of 1000 lbs, wt.lb holds each car's weight in pounds.

  29. Manipulating Data
      We can use wt.lb without "adding" it to our data frame. But if you like the rectangular dataset concept, you can column-bind it to the existing data frame:
      > cars = cbind(cars, wt.lb)
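A sketch of slides 28 and 29, assuming `mtcars` stands in for `cars`. It also shows a common alternative that assigns the new column directly; `cars2` is a hypothetical name used only for the comparison.

```r
cars <- mtcars
# mtcars stores weight (wt) in units of 1000 lbs, so this gives pounds.
wt.lb <- cars$wt * 1000
length(wt.lb)                 # 32, one value per car

# Slide approach: bind the vector on as a new column named "wt.lb".
cars <- cbind(cars, wt.lb)

# A common alternative assigns the new column directly.
cars2 <- mtcars
cars2$wt.lb <- cars2$wt * 1000
```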

  30. Data Analysis • Hypothesis testing: t.test(), prop.test() • Regression: lm(), glm()
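A sketch of the two hypothesis-testing functions on `mtcars`. The comparisons chosen here (mpg by transmission type, share of manual cars by engine shape) are illustrative assumptions, not analyses from the slides; the helper names `manual_counts` and `group_sizes` are hypothetical.

```r
# Welch two-sample t-test: does mean mpg differ between automatic (am = 0)
# and manual (am = 1) cars? The formula mpg ~ am splits mpg into two groups by am.
tt <- t.test(mpg ~ am, data = mtcars)
tt$p.value      # p-value of the test
tt$estimate     # the two group means

# prop.test(): compare the share of manual cars between V-shaped (vs = 0)
# and straight (vs = 1) engines. x = number of "successes", n = group sizes.
manual_counts <- c(sum(mtcars$am[mtcars$vs == 0]), sum(mtcars$am[mtcars$vs == 1]))
group_sizes   <- c(sum(mtcars$vs == 0), sum(mtcars$vs == 1))
pt <- prop.test(manual_counts, group_sizes)
pt$p.value
```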

  31. Data Analysis: OLS Regression
      > regr = lm(cars$mpg ~ wt.lb + cars$hp + cars$cyl)
      The output of the regression is also an object. We've named it regr.
      > summary(regr)
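A sketch of the same regression written with the `data` argument, so the formula can use bare column names. Assumption: `mtcars` plus a `wt.lb` column stands in for `cars`. The fitted coefficients match the slide's form; only their names differ.

```r
cars <- mtcars
cars$wt.lb <- cars$wt * 1000    # weight in pounds, as on the previous slides

# Same model as lm(cars$mpg ~ wt.lb + cars$hp + cars$cyl), written with the
# data argument so the formula refers to columns of cars directly.
regr <- lm(mpg ~ wt.lb + hp + cyl, data = cars)
summary(regr)   # coefficient table, standard errors, R-squared
coef(regr)      # named vector of estimated coefficients
```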

  32. Saving Data You can use write.csv() or write.table() to save your dataset. When you quit R, it will ask if you want to save the workspace. This includes all the objects you have created, but it does not include the code you’ve written. You can also use save.image() to save the workspace. You should always save your code in a *.r file.
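A sketch of saving individual objects rather than the whole workspace. `save.image()` is not run here because it writes a .RData file into the working directory; temporary files are an assumption used to keep the sketch self-contained.

```r
cars <- mtcars

# save() writes selected objects to a file; load() restores them under
# their original names.
rdata_path <- tempfile(fileext = ".RData")
save(cars, file = rdata_path)
rm(cars)                      # remove cars from the environment
load(rdata_path)              # cars is back

# saveRDS()/readRDS() store a single object; you choose its name when reading.
rds_path <- tempfile(fileext = ".rds")
saveRDS(mtcars, rds_path)
cars_copy <- readRDS(rds_path)
```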

  33. Other Useful Functions • ifelse() • is.na() • match() • merge() • apply() • order() • sort()
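A quick sketch of each function on slide 33, using small made-up vectors and data frames (`x`, `m`, `a`, `b` are hypothetical example objects).

```r
x <- c(5, NA, 2, 8)
is.na(x)                          # FALSE TRUE FALSE FALSE
ifelse(is.na(x), 0, x)            # 5 0 2 8: replace missing values with 0
sort(x)                           # 2 5 8 (NA values are dropped by default)
order(x)                          # 3 1 4 2: positions that would sort x (NA last)
match(c(2, 8), x)                 # 3 4: where each value first appears in x

m <- matrix(1:6, nrow = 2)        # columns (1,2), (3,4), (5,6)
apply(m, 1, sum)                  # row sums: 9 12
apply(m, 2, sum)                  # column sums: 3 7 11

a <- data.frame(id = 1:3, score = c(10, 20, 30))
b <- data.frame(id = c(2, 3, 4), group = c("x", "y", "z"))
merge(a, b, by = "id")            # keeps only ids present in both: 2 and 3
```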

  34. Other Resources • Main R website: http://www.r-project.org • UW CSSS Intro to R • UW CSDE Intro to R • UCLA Statistical Computing http://www.ats.ucla.edu/stat

  35. Advanced Topics • More on factors • Lists (data type) • Loops • String manipulation • Writing your own functions • Graphics
