Presentation Transcript


  1. Introducing R

  2. What is R and what can I do with it?
R is freely available software for Windows, Mac OS and Linux. To download R in New Zealand: http://cran.stat.auckland.ac.nz/
What is R?
• A very simple programming language
• A place for you to input data
• A collection of tools for you to perform calculations
• A tool for producing graphics
• A statistics suite that can be downloaded on to any PC, Mac or Linux system
• A software package that can run on high performance computing clusters

  3. What is R and what can I do with it?
With R you can:
• Perform simple or advanced statistical tests and analyses, e.g. standard deviation, t-test, principal component analysis
• Read and manipulate data from existing files, e.g. tables in Excel files, trees in nexus files, data on websites
• Write data or figures to files, e.g. export a figure to .pdf, export a .csv file
• Produce simple or advanced figures

  4. What is R and what can I do with it?
[Example figure from http://dx.doi.org/10.1098/rspb.2014.0806] Figure 2. A reconstruction of the evolutionary history of carotenoid pigmentation in feathers. The likelihood that ancestors could display carotenoid feather pigments has been reconstructed using ‘hidden’ transition rates in three rate categories (AIC = 4002.5, 11 transition rates) [33]. The POEs (defined in Material and methods) for carotenoid feather pigmentation are identified by red circles. Branches are coloured according to the proportional likelihood of carotenoid-consistent colours at the preceding node. Solid purple points indicate species for which carotenoid feather pigments were confirmed present from chemical analysis; open black points represent those for which carotenoids were not detected in feathers after chemical analysis. Supertree phylogeny from [21].

  5. Who is this guide for?
This guide starts at ground level and shapes you into a confident R user. Are you…
• Completely new to R?
• An infrequent R user who wants a refresher?
The material in these slides may not be useful for confident R users. A more comprehensive reference is An Introduction to R by W. N. Venables, D. M. Smith and the R Core Team: http://cran.r-project.org/doc/manuals/R-intro.pdf

  6. What does this guide cover?
• Part zero: Getting started (interacting with R)
• Part one: Objects (vectors, matrices, character arrays)
• Part two: Data manipulation (analysing data, t-test)
• Part three: External data (reading data into R, ANOVA)
• Part four: Packages and libraries (installing new packages into R)
• Part five: Scripts (using pre-written code)
• Part six: Logic (programming); other functions in R

  7. Starting R
This guide will demonstrate the R Console (command-line input) for R 3.0.2 running in Windows 7. For Mac OS, R can also be run from the terminal. For Unix, seek professional help… The only points of difference should be the initial starting of R and the visual appearance: console commands will be the same for all operating systems.

  8. Part zero: Getting started
#Throughout this guide a hashtag (i.e. the number sign ‘#’) will identify a comment or instruction
#Start R by finding the R application on your computer
#You will be presented with the R console

  9. Part zero: Getting started
#There are a variety of ways of using R, and we will start out with the most basic
#We are going to enter lines of code into R by typing or pasting them into the R console
#At its most basic, R is just a calculator
> 1+1
[1] 2
> 1*3
[1] 3
> 4-7
[1] -3
> 20/4
[1] 5
> #The lines above this have come from the R Console. Remember to remove the > symbol if you copy text directly from these slides and paste it into R

  10. Part zero: Getting started
#Some more basic mathematical operations in R
> 12--2
[1] 14
> 2^2
[1] 4
> sqrt(9)
[1] 3
> 4*(1+2)
[1] 12

  11. Part zero: Exercise
#Use R to find the length of the hypotenuse of the right-angled triangle described below
#Side a has length 3, side b has length 4, and the hypotenuse has length h
#h^2 = a^2 + b^2, so h = sqrt(a^2 + b^2)

  12. Part zero: Exercise
#Use R to find the length of the hypotenuse of the triangle from the previous exercise
> sqrt(3^2+4^2)
[1] 5

  13. Part one: Objects
#R is more than just a basic calculator…
#Most operations in R will use objects, which are values stored in R
#Type x=1 into the R console
#You have now input a number into R by storing that number as an object. For this example, the name of our object is x
#Object names must start with a letter, which can be followed by other letters, numbers, full stops or underscores
#Object names cannot include spaces
> x=1
> #Congratulations, you have just programmed R to store an object.
#Type x into the R console to recall your object
> x
[1] 1
>
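#An aside that goes beyond the original slide: most R code you will see elsewhere uses the arrow operator <- for assignment. It does the same job as = in these examples, so either form will work
> x <- 1
> x
[1] 1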

  14. Part one: Objects
#We will now replace the value of x with 10
> x
[1] 1
> x=10
> x
[1] 10
> #As you can see, the value of an object can be easily replaced by simply making the object equal to a new value

  15. Part one: Objects
#Let’s make y into a vector - a one-dimensional array
#There are several ways of making a vector in R. These methods introduce functions.
#A function is an operation performed on numbers and/or objects.
#The two easiest ways of making a vector in R use different functions:
#Use the concatenate function c and place numbers inside parentheses
> y=c(10,11,12,13,14,15,16,17,18,19,20)
> y
[1] 10 11 12 13 14 15 16 17 18 19 20
#Use the array function and place numbers inside parentheses
> y=array(10:20)
> y
[1] 10 11 12 13 14 15 16 17 18 19 20
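#A supplementary sketch, not part of the original slides: the seq and rep functions are two more ways of building vectors
> seq(10, 20, by = 2) #a sequence from 10 to 20 in steps of 2
[1] 10 12 14 16 18 20
> rep(7, times = 5) #the number 7 repeated five times
[1] 7 7 7 7 7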

  16. Part one: Objects
#Just as we replaced x with a single value, we can also replace a single value within our vector
#Let’s replace the fifth number in our vector with 0
> y
[1] 10 11 12 13 14 15 16 17 18 19 20
> y[5]=0
> y
[1] 10 11 12 13 0 15 16 17 18 19 20
> #Square brackets [] placed after a vector will instruct R that we are interested in only a part of the vector. In the example above, we are referring to the fifth position in the vector

  17. Part one: Objects
#Try these vector manipulations as well:
> y[1]=y[2]
> y
[1] 11 11 12 13 0 15 16 17 18 19 20
> #The value of the first position was changed to be the same as the value in the second position
> y[c(1,3,5)]=5
> y
[1] 5 11 5 13 5 15 16 17 18 19 20
> #The values in the first, third and fifth positions were made equal to 5

  18. Part one: Objects
#Onward! We will make a new object, a two-dimensional matrix, and call it z
#Our matrix will have ten rows and ten columns, and we will start out by filling all the cells with 0
> z=matrix(0,ncol=10,nrow=10)
> z
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    0    0    0    0    0     0
 [2,]    0    0    0    0    0    0    0    0    0     0
 [3,]    0    0    0    0    0    0    0    0    0     0
 [4,]    0    0    0    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    0    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    0     0
>

  19. Part one: Objects
#We can replace parts of our matrix, like we did with our vector
> z
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    0    0    0    0    0     0
 [2,]    0    0    0    0    0    0    0    0    0     0
 [3,]    0    0    0    0    0    0    0    0    0     0
 [4,]    0    0    0    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    0    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    0     0
> z[1,3]=33
> z
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0   33    0    0    0    0    0    0     0
 [2,]    0    0    0    0    0    0    0    0    0     0
 [3,]    0    0    0    0    0    0    0    0    0     0
 [4,]    0    0    0    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    0    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    0     0
#Here, the two numbers inside the square brackets are a coordinate for the matrix: first row, third column

  20. Part one: Objects
#We can replace an entire row by not providing a column coordinate
> z[1,]=33
> z
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]   33   33   33   33   33   33   33   33   33    33
 [2,]    0    0    0    0    0    0    0    0    0     0
 [3,]    0    0    0    0    0    0    0    0    0     0
 [4,]    0    0    0    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    0    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    0     0
> #Likewise, we can replace an entire column
> z[,3]=c(1:10)
> z
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]   33   33    1   33   33   33   33   33   33    33
 [2,]    0    0    2    0    0    0    0    0    0     0
 [3,]    0    0    3    0    0    0    0    0    0     0
 [4,]    0    0    4    0    0    0    0    0    0     0
 [5,]    0    0    5    0    0    0    0    0    0     0
 [6,]    0    0    6    0    0    0    0    0    0     0
 [7,]    0    0    7    0    0    0    0    0    0     0
 [8,]    0    0    8    0    0    0    0    0    0     0
 [9,]    0    0    9    0    0    0    0    0    0     0
[10,]    0    0   10    0    0    0    0    0    0     0
>

  21. Part one: Objects
#Lastly, we will make a character array, which is like a vector or a matrix except that it can hold numbers and letters
> w=matrix("df",ncol=10,nrow=10)
> w
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df"
 [2,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df"
 [3,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df"
 [4,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df"
 [5,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df"
 [6,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df"
 [7,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df"
 [8,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df"
 [9,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df"
[10,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df"
> #So, this covers the basics of creating objects for storing data in R.
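#A short aside beyond the original slides: a few functions that describe an object without printing all of it (using the y, z and w objects made above)
> dim(z) #the number of rows and columns in the matrix
[1] 10 10
> length(y) #the number of elements in the vector
[1] 11
> nrow(w) #the number of rows in the character array
[1] 10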

  22. Part one: Objects
#Let’s clean out the objects that we made in Part One
> ls()
[1] "w" "x" "y" "z"
> #The list objects command ls() will show us which objects are stored in R
#We can permanently remove a specific object with the rm() function
> rm(x)
> ls()
[1] "w" "y" "z"
> #We can also remove all objects
> rm(list = ls())
> ls()
character(0)

  23. Part one: Exercise
#Make a new matrix object with three columns and seven rows, and fill every cell with the number 9. Use your first name as the name of the matrix object.
#Make a new vector object with the numbers 101, 898 and -3. Use your surname as the name of the vector object.
#Replace the fourth row of your matrix with your vector.

  24. Part one: Exercise
#Make a new matrix object with three columns and seven rows, and fill every cell with the number 9. Use your first name as the name of the matrix object.
> daniel=matrix(9,ncol=3,nrow=7)
> daniel
     [,1] [,2] [,3]
[1,]    9    9    9
[2,]    9    9    9
[3,]    9    9    9
[4,]    9    9    9
[5,]    9    9    9
[6,]    9    9    9
[7,]    9    9    9
#Make a new vector object with the numbers 101, 898 and -3. Use your surname as the name of the vector object.
> thomas=c(101,898,-3)
> thomas
[1] 101 898  -3
#Replace the fourth row of your matrix with your vector.
> daniel[4,]=thomas
> daniel
     [,1] [,2] [,3]
[1,]    9    9    9
[2,]    9    9    9
[3,]    9    9    9
[4,]  101  898   -3
[5,]    9    9    9
[6,]    9    9    9
[7,]    9    9    9

  25. HELP!
#You can call on the help function if you become lost or stuck when using R
#Can’t remember how to make a matrix?
> ?matrix
>
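#A few related help functions worth knowing, added here as an aside to the original slide
> help(matrix) #the long form of ?matrix
> help.search("variance") #search the help pages when you are not sure of a function name
> example(matrix) #run the worked examples from a help page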

  26. Part two: Data manipulation
#This will be a worked example of a Student’s T-test for the means of two samples, showcasing the storage and analysis of data in R

  27. Part two: Data manipulation
#Make x a vector containing 1000 random numbers
> set.seed(1)
> x=rnorm(1000)
#Make y a vector containing 1000 random numbers
> set.seed(100)
> y=rnorm(1000)
#The ‘random’ numbers in R are not truly random: they are generated by an algorithm, so the same starting point always produces the same sequence. Using the set.seed function, we can define that starting point for our calculations. This means that we should all get the same results from our ‘random’ numbers
#We will use Student’s T-test to see if the mean of x and the mean of y are significantly different

  28. Part two: Data manipulation
#What are the assumptions for a T-test?
#1) That the two samples (x and y) are each normally distributed
#2) That the two samples have the same variance
#3) That the two samples are independent
#These are simulated data, so we will assume that 3) is true.
#We should test 1) and 2) if we want our T-test results to be meaningful! See the quick visual check sketched below, then the formal tests on the following slides.
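#A supplementary sketch, not part of the original slides: assumption 1) can also be checked visually before running any formal test
> hist(x) #roughly bell-shaped for normally distributed data
> qqnorm(y) #points should fall close to a straight line for normally distributed data
> qqline(y) #adds that reference line to the plot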

  29. Part two: Data manipulation
#We will use the Shapiro-Wilk test [1] to see if the data are normally distributed
#The Shapiro-Wilk test calculates a normality statistic (W) and tests the hypothesis that the data are normal
#We would reject the null hypothesis for our sample if we received a p-value of <0.05
#To perform a Shapiro-Wilk test in R we use the shapiro.test function
> shapiro.test(x)
        Shapiro-Wilk normality test
data:  x
W = 0.9988, p-value = 0.7256
> shapiro.test(y)
        Shapiro-Wilk normality test
data:  y
W = 0.9993, p-value = 0.9765
[1] Shapiro SS & Wilk MB. 1965. An analysis of variance test for normality (complete samples). Biometrika 52: 591–611

  30. Part two: Data manipulation
#We will use an F-test [1] to see if x and y have equal variances
#The null hypothesis of this F-test is that the two datasets have equal variances, and this hypothesis is rejected if the p-value is <0.05
#We calculate an F-test for equal variances in R using the var.test function
> var.test(x,y)
        F test to compare two variances
data:  x and y
F = 1.0084, num df = 999, denom df = 999, p-value = 0.8947
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.890733 1.141648
sample estimates:
ratio of variances
          1.008417
[1] Box, G.E.P. (1953). "Non-Normality and Tests on Variances". Biometrika 40 (3/4): 318–335.

  31. Part two: Data manipulation
#Are your x and y normally distributed? (hint… mine are)
#Do your x and y have equal variances? (hint… mine do)

  32. Part two: Data manipulation
#Let’s perform the Student’s T-test and see if the mean of x and the mean of y are significantly different
#We will use a simple form of the t.test function. This test requires three pieces of information: x, y, and information about equal variance
> t.test(x,y,var.equal=TRUE)
        Two Sample t-test
data:  x and y
t = -0.6161, df = 1998, p-value = 0.5379
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.11903134  0.06212487
sample estimates:
  mean of x   mean of y
-0.01164814  0.01680509
#The null hypothesis for this test is that x and y have the same mean value. The confidence level was set at 0.95, so the rejection criterion is a p-value less than 0.05. Did we reject the null hypothesis?
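#An aside not covered in the original slide: the output of t.test() is an object, so individual values can be stored and reused
> result=t.test(x,y,var.equal=TRUE)
> result$p.value #the p-value reported above (about 0.54)
> result$conf.int #the 95 percent confidence interval for the difference in means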

  33. Part two: Exercise
#Generate vector objects a and b as below
> set.seed(10)
> a=rnorm(1000,sd=2)
> set.seed(50)
> b=rnorm(1000,sd=1)
#Is the mean of a significantly different from the mean of b? Is it appropriate to use a Student’s T-test to address this question?

  34. Part two: Exercise
> shapiro.test(a)
        Shapiro-Wilk normality test
data:  a
W = 0.9979, p-value = 0.2538
> shapiro.test(b)
        Shapiro-Wilk normality test
data:  b
W = 0.9978, p-value = 0.2242
> var.test(a,b)
        F test to compare two variances
data:  a and b
F = 3.7431, num df = 999, denom df = 999, p-value < 2.2e-16
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 3.306307 4.237678
sample estimates:
ratio of variances
          3.743136
> t.test(a,b,var.equal=F)
        Welch Two Sample t-test
data:  a and b
t = 0.3949, df = 1497.218, p-value = 0.693
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.1106290  0.1663946
sample estimates:
   mean of x    mean of y
 0.022749483 -0.005133326
#Both a and b are normally distributed, but the F-test shows their variances are not equal, so a Welch Two Sample t-test (var.equal=F) is used instead of the Student’s T-test
#Is the mean of a different from the mean of b? With a p-value of 0.693 we fail to reject the null hypothesis that the means are equal, so there is no evidence of a difference

  35. Part three: External data
#Datasets can often be too large to type into R. This section of the guide will show you how to automatically read data into R and then perform an analysis
#For this test we will perform a one-way analysis of variance (ANOVA)
#Right click on the dataset embedded in the original slide, move the mouse to ‘Macro-Enabled Worksheet Object’, click Open, and then save the table as IUCN.csv (a comma separated values file) to a folder on your computer
#The dataset contains a count of endangered species for sixty randomly selected countries in three different regions. These data have been extracted from Table 6a of the IUCN Red List summary statistics: http://www.iucnredlist.org/documents/summarystatistics/2010_3RL_Stats_Table_6a.pdf

  36. Part three: External data
#We are going to use a one-way ANOVA to see if the mean number of endangered species is different in different regions (AFRICA, ASIA and EUROPE).
#First step: we will now tell R where to look for the file, using the setwd() function
> setwd("H:/Projects/Teaching/R")
#Hint: your working directory will be different to mine
#Note: we use forward slashes / and not backslashes \
#Second step: we read the file into R as a new object called IUCN. The term sep="," is used because values in the dataset are separated by commas. The term header=T is used because the first row of the IUCN table contains column names
> IUCN=read.table("IUCN.csv",sep=",",header=T)
#Alternatively, if we know the full file path, then we could read the file into R without using setwd()
> IUCN=read.table("H:/Projects/Teaching/R/IUCN.csv",sep=",",header=T)
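#A small aside: read.csv() is a convenience wrapper around read.table() that already assumes sep="," and header=TRUE, so this shorter call should load the same file
> IUCN=read.csv("IUCN.csv")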

  37. Part three: External data
#What are the assumptions for a one-way ANOVA?
#1) That the data in each group have been randomly selected from a normal distribution
#2) That each group of data has the same variance
#3) That the groups of data are independent of each other
#Assumption 3) may be unlikely but we will assume it is true.
#We should test 1) and 2) if we want our ANOVA results to be meaningful!

  38. Part three: External data
#We will use the Shapiro-Wilk test to see if the data from each region (AFRICA, ASIA and EUROPE) are normally distributed
#First though, we will separate out the data for each region so that we can test for normality separately
> af=IUCN[which(IUCN[,2]=="AFRICA"),3]
#Let’s take a closer look: IUCN[,2] calls up the second column of the IUCN object
#The which() function is asking ‘which of the values in column 2 of the IUCN object contain the word “AFRICA”?’: which(IUCN[,2]=="AFRICA"). This gives us the Africa row values.
#Now we can use the Africa row values to find the number of Endangered species for each African country. These species counts are stored in column 3 of the IUCN object: IUCN[which(IUCN[,2]=="AFRICA"),3]
#Finally, we store the endangered species counts for African countries as the af object: af=IUCN[which(IUCN[,2]=="AFRICA"),3]

  39. Part three: External data
#Repeat for ASIA and EUROPE
> ai=IUCN[which(IUCN[,2]=="ASIA"),3]
> eu=IUCN[which(IUCN[,2]=="EUROPE"),3]

  40. Part three: External data
#We will use a Bartlett Test of Homogeneity of Variances [1] to test if variance is equal across our three groups (AFRICA, ASIA, EUROPE).
#The function for the Bartlett test is simply bartlett.test(). The terms for this function will be the Endangered species column of the IUCN object and the Region column of the IUCN object: column 3 and column 2 respectively.
#A Bartlett test operates similarly to an F-test. The null hypothesis for this Bartlett test is that the groups have equal variances.
#We would reject the null hypothesis for our dataset if we received a p-value of <0.05.
> bartlett.test(IUCN[,3]~IUCN[,2])
        Bartlett test of homogeneity of variances
data:  IUCN[, 3] by IUCN[, 2]
Bartlett's K-squared = 11.6261, df = 2, p-value = 0.002988
[1] Box, G.E.P. (1953). "Non-Normality and Tests on Variances". Biometrika 40 (3/4): 318–335.

  41. Part three: External data
#Here we reject the null hypothesis – at least one Region has a variance that is not equal to the variance of another Region in the dataset.
#Our dataset does not satisfy the second assumption of the ANOVA. We can still proceed, however.
#The ANOVA test is robust to violations of this second assumption. This means that it can still produce meaningful results even if the groups do not have equal variances. As a rule of thumb, we can proceed if the maximum variance of our groups is less than 4 times the minimum variance of our groups.
> var(af)
[1] 25.07692
> var(ai)
[1] 9.002849
> var(eu)
[1] 7.464387
> #The variance of the number of endangered species in Africa is substantially greater than the other two variance values. However, the Africa group variance is less than 4 times the variance of the Europe group
> var(af)<4*var(eu)
[1] TRUE
#So, we will proceed, but we need to be aware that with unequal variances it will be tougher for an analysis of variance to find a significant result. A direct check of the rule of thumb is sketched below.
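#A sketch of the rule of thumb above, checking the largest group variance against 4 times the smallest group variance
> max(var(af),var(ai),var(eu)) < 4*min(var(af),var(ai),var(eu))
[1] TRUE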

  42. Part three: External data
#Perform the one-way ANOVA using the aov() function with the following syntax, and store the results as an object called IUCN_ANOVA
> IUCN_ANOVA=aov(Endangered_species~Region,data=IUCN)
#You can see the ANOVA results by calling up the IUCN_ANOVA object
> IUCN_ANOVA
Call:
   aov(formula = Endangered_species ~ Region, data = IUCN)
Terms:
                 Region Residuals
Sum of Squares  703.284  1080.148
Deg. of Freedom       2        78
Residual standard error: 3.721297
Estimated effects may be unbalanced
>

  43. Part three: External data
#Use the summary() function to find out more about the ANOVA
> summary(IUCN_ANOVA)
            Df Sum Sq Mean Sq F value   Pr(>F)
Region       2  703.3   351.6   25.39 3.21e-09 ***
Residuals   78 1080.1    13.8
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> #Interpretation: How do we read this table to find out if the mean number of endangered species is different in different regions?
#The null hypothesis for this test is that the mean number of endangered species is the same in each region. We would reject this null hypothesis if the p-value (i.e. Pr(>F)) is less than the significance level for this test (i.e. <0.05). So, we reject the null hypothesis, and conclude that the mean number of endangered species is significantly different between regions.

  44. Part three: External data
#Is the number of endangered species different between all regions, or just different for one region? To find out we will use Tukey’s Honest Significant Difference test.
#The function for Tukey’s HSD is simply TukeyHSD(). The test uses the following syntax
> TukeyHSD(IUCN_ANOVA,"Region")
  Tukey multiple comparisons of means
    95% family-wise confidence level
Fit: aov(formula = Endangered_species ~ Region, data = IUCN)
$Region
                   diff       lwr        upr     p adj
ASIA-AFRICA   -4.185185 -6.605050 -1.7653208 0.0002620
EUROPE-AFRICA -7.185185 -9.605050 -4.7653208 0.0000000
EUROPE-ASIA   -3.000000 -5.419864 -0.5801356 0.0111684
#Tukey’s HSD provides a pairwise test of each group in the ANOVA. Any Region pair with a p adj value <0.05 had a significantly different number of endangered species.
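#An optional extra, not in the original slide: the Tukey result can also be plotted. Confidence intervals that do not cross zero indicate a significantly different pair
> plot(TukeyHSD(IUCN_ANOVA,"Region"))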

  45. Part three: External data
#Bonus: Let’s plot our IUCN data to better visualise these results
> boxplot(Endangered_species~Region,data=IUCN)
#For each region the boxplot shows the minimum and maximum (excluding outliers), the lower and upper quartiles, the median, and any outliers as individual points

  46. Part three: Exercise
#Plotting basics
#To quickly generate a plot in R using only default options, simply use the plot() function
> plot(af)
> #There are many parameters that you can change to improve the look of your plots
plot(af,xlab="Country",main="Africa",col=rainbow(100),pch=16,ylab="Endangered species (number)",cex=2,font=6)
barplot(af,col="red",names.arg=IUCN[which(IUCN[,2]=="AFRICA"),1],las=2,ylab="Endangered species (count)",main="Africa")
#Use ?plot and ?barplot to learn about the parameters you can change when plotting data

  47. Part four: Packages and libraries
#You have been using some of the basic functions that are packaged with R, and you have been either generating or importing datasets
#Anyone can write a new function in R though, or make a dataset, and these functions and datasets can be bundled together into a package
#R is modular, which means you can download and install new packages to give you access to new functions and/or datasets
#There is an automatic and a manual method for installing packages. This guide will teach you how to manually install packages in R
#Why the manual method you ask? Because R requires internet access to download packages, which can be complicated by a University proxy. I can’t guarantee that the proxy won’t be an issue. That’s why. Well that, and it will be good for you.
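#For reference, a sketch of the automatic method, assuming your internet connection and proxy cooperate: install.packages() can fetch and install a package directly from a CRAN mirror
> install.packages("ape", repos="http://cran.stat.auckland.ac.nz/")
#Or simply run install.packages("ape") and choose a mirror when prompted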

  48. Part four: Packages and libraries
#This will be an exercise in downloading the ‘Analyses of Phylogenetics and Evolution’ package, first written by Emmanuel Paradis
#The abbreviation for this package is ape

  49. Part four: Packages and libraries
#Open a web browser and enter http://cran.r-project.org/web/packages/ape/index.html into the address bar – go to the website. The page should be mostly black text on a white background.
#Find the Downloads section towards the bottom of the website.
#For Mac users: download the Mac OS X binary (ape_3.1-4.tgz)
#For PC users: download the Windows binary (ape_3.1-4.zip)
#For UNIX users: again, seek professional help
#Save the ape_3.1-4.xxx file somewhere on your computer that you can easily find
#Note to future users: the file name may be slightly different if Paradis has updated ape

  50. Part four: Packages and libraries
#Run R
#Use the install.packages function with the following syntax to install the ape package
> install.packages("H:/Teaching/ape_3.1-4.zip")
#Remember to replace my file path “H:/Teaching/” with the file path of the folder where you downloaded the ape package
#You should see text like this appear after you enter the install.packages command
Installing package into ‘C:/Documents/R/win-library/3.1’
(as ‘lib’ is unspecified)
inferring 'repos = NULL' from the file name
package ‘ape’ successfully unpacked and MD5 sums checked
#Congratulations, you have now added functions and datasets written by Emmanuel Paradis to your own copy of R
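#One last step before the new functions can be used: an installed package must be loaded into the current R session with the library() function
> library(ape)
> help(package="ape") #list the functions and datasets provided by the package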
