Statistical package 4th generation programming language


Presentation Transcript


  1. R
     Statistical package and 4th generation programming language, extensible through functions and extensions.
     Environment for statistical computing and graphics, with statistical and graphical techniques extensible through packages.
     We will learn to work with both these tools: the line editor and the graphical interface, R Commander.
     Competitors: SPSS, Matlab

  2. Installing R Commander
     To install R Commander: Packages → Install Package(s)... → CRAN Mirror → Rcmdr, then wait for the installation of Rcmdr and additional packages.
     To load R Commander: Packages → Load Package... → Rcmdr. To the warning on missing packages answer Yes, to download them from CRAN.

  3. Running R Commander
     Whenever you want to run it: Packages → Load Package... → Rcmdr
     File → Change Working directory. R Commander has problems navigating through your directory tree, so choose an easy-to-find directory, such as your Desktop or the place where you keep your R exercises.
     Commands:
     getwd()
     setwd("path with double backslashes")
     help(command) or ?command
     q()

  4. Files to save
     R Commander windows:
     Script, contains the written instructions: R Commander → File → Save Script as…
     Output, contains the output: R Commander → File → Save Output as…, or pasting them into a text file
     Workspace, contains the data structure: File → Save Workspace…, R Commander → File → Save R workspace As…, or save.image("path with double backslashes")
     To load it again: File → Load Workspace… or load("path with double backslashes")

  5. Objects
     Variable and vector: numeric, logical, character
     Nominal variables and vectors: factor
     Ordinal variables and vectors: ordered
     Dataset: data.frame
     Time series: ts
     ls() and ls.str() list the existing objects
     object or print(object) shows an object, str(object) shows its structure, rm(object) removes it

  6. Variable and vector
     Variable assignment: variable <- value or formula, or value or formula -> variable
     Operators: + - * / ** (also written ^) == != < > <= >= & | !
     Vector: vector <- c(list of values or other vectors)
     Use c() or paste() to concatenate values
     Vector of consecutive numbers: vector <- start:end
     vector[index] accesses a vector's elements; index can be a single index, a sequence, a negative sequence or a condition
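
The assignment and indexing forms listed on this slide can be tried with a small sketch like the following. The variable names and values are made up for illustration and do not come from the presentation.

```r
# Assignment works in both directions; <- is the usual form.
x <- 3 * 4
5 -> y

# c() combines values and other vectors into one vector.
v <- c(10, 20, 30)
w <- c(v, 40:42)              # 10 20 30 40 41 42

# paste() joins values into character strings.
label <- paste("item", 1:3)   # "item 1" "item 2" "item 3"

first <- w[1]                 # single index: 10
some  <- w[2:4]               # sequence of indices: 20 30 40
rest  <- w[-(1:2)]            # negative sequence drops elements: 30 40 41 42
big   <- w[w > 25]            # condition keeps matching elements: 30 40 41 42
```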

  7. Missing values
     NA: means "not available", used whenever a datum is missing
     NaN: means "not a number", used whenever a calculation cannot be done for this vector's element or dataset's case, for example 0/0
     Inf: means "infinity", the result of a division such as 1/0; log(0) gives -Inf
     is.na(variable) returns TRUE for NA and NaN
     is.na(vector) returns a logical vector; it can be used to remove missing values from a vector as:
     vecWithoutMissing <- vecWithMissing[ !is.na(vecWithMissing) ]
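
As a minimal sketch of how these special values behave, the snippet below builds a small vector with one NA and one NaN. The vector and its values are invented for the example.

```r
v <- c(4, NA, 7, 0/0, 10)      # 0/0 produces NaN

is.na(v)                       # FALSE TRUE FALSE TRUE FALSE
is.nan(v)                      # FALSE FALSE FALSE TRUE FALSE (NA is not NaN)

clean <- v[!is.na(v)]          # 4 7 10

1 / 0                          # Inf
log(0)                         # -Inf

# Many functions can skip missing values themselves via na.rm.
m <- mean(v, na.rm = TRUE)     # 7
```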

  8. for loop
     for loop to repeat some statements:
     for (j in start:end) {statements separated by semicolon}
     j in this case is an integer variable
     for loop to scan a vector's elements:
     for (variable in vector) {statements separated by semicolon which use variable}
     for (j in start:end) {statements separated by semicolon which use vector[j]}
     A vector's length is length(vector)
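
The three loop forms above might look like this in practice. The names total, squares and running are hypothetical, and seq_along(v) is used in place of 1:length(v) because it also behaves correctly for an empty vector.

```r
# Repeat a statement for j = 1, 2, ..., 5.
total <- 0
for (j in 1:5) {
  total <- total + j
}
# total is now 15

# Scan the elements of a vector directly.
squares <- numeric(0)
for (value in c(2, 5, 9)) {
  squares <- c(squares, value^2)
}
# squares: 4 25 81

# Scan by index, using vector[j] inside the loop.
v <- c(2, 5, 9)
running <- numeric(length(v))
for (j in seq_along(v)) {
  running[j] <- sum(v[1:j])
}
# running: 2 7 16
```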

  9. function
     function's name <- function(arguments) {statements separated by semicolon; return(object)}
     square <- function(x) {y <- x*x; return(y)}
     interest <- function(c, i=0.05, t=1) {y <- c*(1+i)**t; return(y)}
     Usage: function name(arguments)
     square(5) returns 25
     interest(100) returns 105
     interest(100, 0.1, 2) returns 121
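
Default arguments can also be skipped by naming the ones you want to set. The sketch below reuses the slide's interest() function (written with ^ instead of **, which is equivalent); the specific calls are illustrative additions.

```r
# interest() as defined on the slide: capital c, rate i, time t.
interest <- function(c, i = 0.05, t = 1) {
  y <- c * (1 + i)^t
  return(y)
}

one_year  <- interest(100)               # uses the defaults i = 0.05, t = 1
two_years <- interest(100, t = 2)        # keeps the default i, sets t by name
several   <- interest(c(100, 200), 0.1)  # arithmetic is vectorised over c
```

Naming an argument avoids having to repeat every default that comes before it in the argument list.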

  10. if control
     if (condition) {statements separated by semicolon} else {statements}
     Curly braces are optional when there is only one statement; else is optional
     if (a+3<=5) {b<-7; c<-9} else b<-2
     if (a==2 & b<c) print(b)
     It is typically used inside functions or for loops
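
A small sketch of if/else used inside a for loop, as the slide suggests. The grade thresholds and labels are made up for the example. Inside if, the condition must be a single TRUE or FALSE, which is why the scalar form && is used here to combine comparisons.

```r
grades <- c(17, 25, 30)
verdict <- character(0)

for (g in grades) {
  # else must follow the closing brace on the same line.
  if (g < 18) {
    v <- "fail"
  } else if (g >= 18 && g < 30) {
    v <- "pass"
  } else {
    v <- "top mark"
  }
  verdict <- c(verdict, v)
}
# verdict: "fail" "pass" "top mark"
```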

  11. factor and ordered factor
     It builds nominal/ordinal variables
     R Commander → Data → Manage variables in active data set → Convert numeric variables to factor
     factor(vector, labels=array of labels)
     newfact <- factor(vector)
     ordered(vector)
     newfact <- ordered(vect)
     newfact <- ordered(vect, labels=c('s','m','l','xl'))
     levels(factor)
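
The difference between factor() and ordered() shows up when comparing values. A minimal sketch, assuming an invented vector of size codes 1 to 4:

```r
sizes <- c(2, 1, 3, 4, 1)

# Nominal: labels are attached to the sorted distinct values 1, 2, 3, 4.
f <- factor(sizes, labels = c("s", "m", "l", "xl"))
levels(f)          # "s" "m" "l" "xl"
as.character(f)    # "m" "s" "l" "xl" "s"

# Ordinal: same labels, but the levels now have an order.
o <- ordered(sizes, labels = c("s", "m", "l", "xl"))
o[1] < o[3]        # TRUE, because "m" comes before "l"
```

Comparisons such as < are only meaningful for ordered factors; on a plain factor they give NA with a warning.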

  12. data.frame or dataset
     Database table suited for statistical analysis
     Unfortunately its vectors are called variables
     Case names are optional

  13. data.frame
     Vectors are accessed via dataset$vector, or via attach(dataset) and then using vector directly
     print(dataset)
     library(relimp), then showData(dataset)
     fix(dataset) does not work if you have dates in your dataset

  14. Creating a data.frame from vectors
     dataset <- data.frame(vecnew=vector, …, row.names=col)
     vecnew is the new name that vector will have in the dataset
     col is the column number or vector's name containing the cases' names
     Character vectors are automatically converted to factors (this is the default only in R versions before 4.0.0; in later versions add stringsAsFactors=TRUE)
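
A minimal sketch of building a data.frame from vectors. All names and values are invented; row.names is given here as a character vector of case names, and stringsAsFactors = TRUE is passed explicitly so the character vector becomes a factor regardless of the R version.

```r
name <- c("Anna", "Ben", "Carla")
age  <- c(23, 31, 27)
city <- c("Bolzano", "Munich", "Bolzano")

# town is the new name that the vector city gets inside the dataset.
people <- data.frame(age = age, town = city,
                     row.names = name,
                     stringsAsFactors = TRUE)

people$age              # access a vector of the dataset
people["Ben", "age"]    # access one case by its name: 31
levels(people$town)     # "Bolzano" "Munich"
str(people)             # structure of the whole dataset
```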

  15. Importing data.frame from packages
     R Commander → Data → Data in packages
     data()
     help(dataset)
     data(dataset, package="package")

  16. Importing data.frame from text files
     R Commander → Data → Import Data → from text file, clipboard or URL…
     dataset <- read.table("file path or URL", header=TRUE or FALSE, sep="separator", col.names=headers vector, na.strings="value for NA", dec="," or ".")

  17. Importing data.frame from databases
     Written here just in case you'll ever need it; converting to a text file is better and easier!
     R Commander → Data → Import Data
     → from SPSS data set… (value labels or factors)
     library(foreign)
     dataset <- read.spss("file path or URL", use.value.labels = TRUE or FALSE, to.data.frame = TRUE)
     Date importing is wrong! Fix it with
     library(chron)
     var <- as.chron(ISOdate(1582, 10, 14) + var)
     → from Excel, Access or dBase data set…
     library(gdata) (package gdata probably needs to be installed)
     dataset <- read.xls("file path or URL", sheet=sheet number, na.strings="value for NA")

  18. Exporting data.frame to text file
     R Commander → Data → Active data set → Export active data set…
     write.table(dataset, "file path", sep="separator", col.names=TRUE or FALSE, row.names=TRUE or FALSE, quote=FALSE, na="value for NA")
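
Export and import can be checked together with a round trip through a temporary file. This is only a sketch: the dataset, the ";" separator and the "missing" marker for NA are arbitrary choices for the example, and tempfile() stands in for a real file path.

```r
scores <- data.frame(student = c("a", "b", "c"),
                     grade   = c(28, NA, 24))

path <- tempfile(fileext = ".txt")   # hypothetical location for the export

# Export: one header line, no row names, NA written as "missing".
write.table(scores, path, sep = ";", col.names = TRUE,
            row.names = FALSE, quote = FALSE, na = "missing")

# Import with matching settings, so "missing" becomes NA again.
back <- read.table(path, header = TRUE, sep = ";",
                   na.strings = "missing", dec = ".")

back$grade    # 28 NA 24
```

The separator, header and NA settings used for reading have to mirror those used for writing, otherwise columns or missing values are misread.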

  19. ts (time series)
     Database table with a time unit attached, suited for econometric analysis
     timeseries <- ts(d, start=s, end=e, frequency=f)
     d is a data.frame, vector or matrix; non-numeric values are converted
     s is the time of the first datum: a number, or a two-element vector to indicate unit and subunit
     e is the time of the last datum; same format as s
     f is the number of observations per time unit
     mytimeseries <- ts(c(0,3,1,1,8,0,3,2,2,2), frequency = 4, start = c(1959, 2))
     Data from 2nd quarter of 1959 to 3rd quarter of 1961
     mytimeseries <- ts(c(0,1,3,8,1,0,3,2,2,2), frequency = 7, start = c(12, 3))
     Data from 3rd day of week 12 to 5th day of week 13
     plot.ts(timeseries)
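
The time span covered by a series can be read back with start() and end(), which is a quick way to check the unit and subunit arithmetic. The sketch uses the slide's quarterly data; the name quarterly and the window() call are illustrative additions.

```r
quarterly <- ts(c(0, 3, 1, 1, 8, 0, 3, 2, 2, 2),
                frequency = 4, start = c(1959, 2))

start(quarterly)   # 1959 2  (2nd quarter of 1959)
end(quarterly)     # 1961 3  (ten quarters later: 3rd quarter of 1961)

# window() extracts the observations between two times.
year1960 <- window(quarterly, start = c(1960, 1), end = c(1960, 4))
as.numeric(year1960)   # 1 8 0 3
```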

  20. Modify data set
     R Commander → Data → Active Data Set
     → Subset active data set…
     newdataset <- subset(dataset, condition)
     Used to restrict the dataset to some cases
     → Remove cases with missing data…
     newdataset <- na.omit(dataset)
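
A minimal sketch of both operations on an invented dataset. Note that subset() also drops the cases where the condition evaluates to NA.

```r
d <- data.frame(id       = 1:5,
                grade    = c(18, NA, 30, 25, NA),
                attended = c(1, 1, 0, 1, 0))

# Keep only the cases that satisfy the condition.
passed <- subset(d, grade >= 20)
passed$id        # 3 4

# Drop every case that has at least one missing value.
complete <- na.omit(d)
complete$id      # 1 3 4
```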

  21. Recode (massive recoding)
     Used to create or modify factor/ordered vectors
     R Commander → Data → Manage variables in active data set → Recode variables…
     newfactor <- Recode(vector or factor, 'changes separated by semicolon', as.factor.result=TRUE)
     "Bolzano"="here"
     c("Munich","Hannover","Bonn") = "Germany"
     Do not use "Munich","Hannover","Bonn" = "Germany" as suggested by the help
     else="Others"
     For numerical vectors we may also use 8:27="high", together with lo and hi

  22. Compute
     Used to create a new vector through math operations
     R Commander → Data → Manage variables in active data set → Compute new variable…
     newvector <- with(dataset, formula)
     CO2$myname <- with(CO2, uptake*7-sqrt(conc))
     is identical to
     CO2$myname <- CO2$uptake*7-sqrt(CO2$conc)

  23. Replace
     Used to change a vector's values based on a condition
     No R Commander menu
     newvector <- replace(vector, condition, value)
     To set to a fixed value use
     grade2 <- replace(grade, grade < 18, 18)
     To set to variable values taken from a vector use
     grade2 <- replace(grade, attended==1, grade[attended==1]+2)
     Warning: if you use a vector in the value, you must repeat the condition!
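
Both uses of replace() can be checked on a tiny example. The grade and attended vectors below are invented; the second call repeats the condition inside the value so that each replaced element gets its own new value.

```r
grade    <- c(15, 22, 17, 28)
attended <- c(1, 0, 1, 1)

# Fixed value: every grade below 18 becomes 18.
floor18 <- replace(grade, grade < 18, 18)
floor18      # 18 22 18 28

# Variable values: add 2 only where attended == 1.
bonus <- replace(grade, attended == 1, grade[attended == 1] + 2)
bonus        # 17 22 19 30
```

replace() returns a new vector; the original grade is left unchanged.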

  24. Binning
     Used to group scale vectors into ordered (but it produces factor)
     R Commander → Data → Manage variables in active data set → Bin numeric variable…
     newfactor <- bin.var(vector, bins=number of bins, method=binning method, labels=see below)
     method='intervals' means same length intervals
     method='proportions' means same count intervals
     method='natural' means using K-means clustering
     labels=FALSE means using consecutive numbers
     labels=NULL means using ranges such as (27.2;35.8]
     labels=vector uses the vector's elements as labels
     varbinned <- bin.var(myvar, bins=6, method='proportions', labels=c('XS','S','M','L','XL','XXL'))
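
The difference between same length and same count intervals can be illustrated without R Commander. The sketch below does not call bin.var; it uses base R's cut() and quantile() as a stand-in, under the assumption that they mimic the 'intervals' and 'proportions' methods closely enough for illustration. The data and labels are invented.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 100)
labs <- c("low", "mid", "high")

# Same length intervals: the range of x is split into 3 equal widths,
# so the outlier 100 ends up alone in the last bin.
by_width <- cut(x, breaks = 3, labels = labs)
table(by_width)     # low 8, mid 0, high 1

# Same count intervals: break points are taken at the quantiles of x.
cuts <- quantile(x, probs = seq(0, 1, length.out = 4))
by_count <- cut(x, breaks = cuts, labels = labs, include.lowest = TRUE)
table(by_count)     # low 3, mid 3, high 3
```

With skewed data the two methods give very different groups, which is usually the deciding factor when choosing between them.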

  25. Graphs for one nominal variable
     R Commander → Graphs
     → Color palette…
     Bar → Bar graph…
     barplot(table(factor), xlab="x label", ylab="y label")
     Pie → Pie chart…
     pie(table(factor), labels=levels(factor), main="title", col=rainbow_hcl(length(levels(factor))))
     Use the option col=c(vector of palette) to change the colors

  26. Graphs for one scale variable
     R Commander → Graphs
     Plot all values case by case → Index plot…
     plot(vector, type="h or p", col="color")
     Histogram → Histogram…
     Hist(vector, breaks=number of intervals, col="color")
     Boxplot → Boxplot…
     Boxplot(~ vector, id.method="y or none", col="color")

  27. Graphs for two variables
     R Commander → Graphs
     Boxplot (scale variable versus nominal variable) → Boxplot… → Plot by groups…
     Boxplot(vector ~ factor, id.method="y or none")
     Scatterplot (two scale variables) → Scatterplot…
     scatterplot(vector1~vector2, reg.line=FALSE or lm, smooth=FALSE, spread=FALSE, boxplots=FALSE, log="nothing or x or y or xy", grid=TRUE)
     Mathematical graph (two scale variables, first in order) → Line graph…
     matplot(vector1, dataset[, c("vector2")], type="l", ylab="vertical label", pch=1)
     matplot(vector1, vector2, type="l", ylab="vertical label", pch=1)

  28. Graphs for three variables
     R Commander → Graphs
     Scatterplot (two scale variables and one nominal) → Scatterplot…
     scatterplot(vector1~vector2 | factor, reg.line=FALSE or lm, smooth=FALSE, spread=FALSE, boxplots=FALSE, log="nothing or x or y or xy", grid=TRUE)
     How to export your graphics into Word: right-click → copy as bitmap
