This hands-on guide introduces learners to essential R programming techniques, focusing on data organization, manipulation, and visualization. Topics include folder structures for datasets, reading and listing data, performing arithmetic operations, creating various types of plots (histograms, scatterplots, box plots), and developing custom functions. It emphasizes best coding practices, such as using informative variable names and modular function design. Ideal for beginners seeking a structured approach to mastering R programming and data analysis.
Organization of Folders:
• Class Data folder has the datasets (files ending in .csv or .rda)
• Rcode has the scripts of R commands that can be cut and pasted into an R window (files ending in .R)
• Ppt has the PowerPoint slides
The Basics: Rintro.R
• Reading in comma-separated data
• Listing a data set or its components
• Subscripts
• Sequences and combining values (: and "c")
• Creating a new data set with <-
• Arithmetic in R
• Applying some stats to data
• Saving your work
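The steps in this list can be sketched in a few lines. This is a minimal illustration, not the contents of Rintro.R: the data values, the column names, and the object names (dat, ratio, csv_path, rda_path) are made up, and the CSV file is written to a temporary location so the example runs without the class data.

```r
# Write a tiny made-up data set to a temporary CSV file (hypothetical values).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,height,weight",
             "1,150,52",
             "2,160,61",
             "3,170,70",
             "4,180,82"), csv_path)

# Reading in comma-separated data
dat <- read.csv(csv_path)

# Listing a data set or its components
dat             # the whole data set
dat$height      # one column
names(dat)      # the column names

# Subscripts are written [row, column]
dat[2, 3]       # row 2, column 3

# Sequences and combining values
1:4             # the integers 1 through 4
c(1, 4)         # combine the values 1 and 4 into one vector
dat[c(1, 4), ]  # rows 1 and 4, all columns

# Creating a new data set with <- ; arithmetic works on a whole column
ratio <- dat$weight / dat$height

# Applying some stats to data
mean(dat$height)
summary(dat$weight)

# Saving your work to an .rda file (load() reads it back)
rda_path <- tempfile(fileext = ".rda")
save(dat, ratio, file = rda_path)
```

In a real session the temporary paths would be replaced by files in the Class Data folder.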
Fooling with the data: RintroManipulating.R
• Subscripting to get a subset of data
• Working with rows and columns
• Arithmetic on a whole column at once
Pay attention to where the commas are! For example, BT2[3,5], BT2[3,], BT2[,5], BT2[,3:5], and BT2[1:10,3:5] are all different!
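To see why the comma placement matters, the sketch below builds a stand-in for BT2. The real BT2 is not shown in these slides, so a 12 x 6 matrix of the numbers 1 to 72 is assumed here purely for illustration. If BT2 is a data frame rather than a matrix, BT2[3, ] gives a one-row data frame instead of a plain vector.

```r
# Hypothetical stand-in for BT2: a 12 x 6 matrix filled column by column.
BT2 <- matrix(1:72, nrow = 12, ncol = 6)

BT2[3, 5]        # a single value: row 3, column 5
BT2[3, ]         # all of row 3 (6 values)
BT2[, 5]         # all of column 5 (12 values)
BT2[, 3:5]       # columns 3 through 5, every row (12 x 3)
BT2[1:10, 3:5]   # rows 1 to 10 of columns 3 through 5 (10 x 3)

# Arithmetic on a whole column at once
col5_doubled <- BT2[, 5] * 2
```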
Plotting Data: RintroPlotting.R
• Histogram: hist
• Adding more features to the plot
• Scatterplots using plot
• Box plots for several data sets: boxplot
• Adding text to a scatter plot
• Changing the axes scales
Build a complicated plot by adding features through several simple steps.
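The idea of building a plot in layers might look like the following. The data are simulated (an assumption, since the class data sets are not reproduced here), and the titles, labels, and axis limits are arbitrary choices for the example.

```r
# Simulated data (hypothetical); set.seed makes the example repeatable.
set.seed(1)
x <- rnorm(50, mean = 10, sd = 2)
y <- 3 + 0.5 * x + rnorm(50)

# Histogram, then one added feature: a dashed vertical line at the mean
h <- hist(x, main = "Histogram of x", xlab = "x")
abline(v = mean(x), lty = 2)

# Scatterplot with the axis scales set by hand
plot(x, y, xlim = c(0, 20), ylim = c(0, 20), xlab = "x", ylab = "y")

# Add text to the scatterplot at a chosen location
text(2, 19, "simulated data", pos = 4)

# Add a least-squares line as one more layer
abline(lm(y ~ x))

# Box plots for several data sets side by side
b <- boxplot(list(x = x, y = y), main = "Box plots of x and y")
```

Each call after hist or plot adds to the current figure, so a mistake in one step can be fixed without redoing the others.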
Writing functions: RintroFunctions.R
• The uses for { } and ( )
• What goes in and what comes out
• Listing a function
• Optional arguments
• Calling a function and assigning its results to a new data set
• Review of R arithmetic
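As a small illustration of these points, here is a hypothetical function, Standardize, which is not taken from RintroFunctions.R. It has one required argument and one optional argument with a default value.

```r
# Hypothetical example: center a vector and, optionally, scale it to sd 1.
# The arguments go inside ( ); the body of the function goes inside { }.
Standardize <- function(y, scale = TRUE) {
  centered <- y - mean(y)          # arithmetic on the whole vector
  if (scale) {
    centered <- centered / sd(y)   # only done when scale is TRUE
  }
  return(centered)                 # what comes out
}

# Listing a function: type its name without parentheses
Standardize

# Calling the function and assigning its results to a new data set
z  <- Standardize(c(2, 4, 6))                  # uses the default scale = TRUE
z0 <- Standardize(c(2, 4, 6), scale = FALSE)   # overrides the optional argument
```

Giving scale a default means the common case needs only one argument, while the other behavior is still available by name.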
More on R programming: RintroProgramming.R
• Changing the data type
• Looping: the for block
• if statements and logicals
• Lists
• The apply function
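Each of these topics can be shown in a line or two. The sketch below uses made-up values and object names (txt, num, results, m) and is only an illustration of the ideas, not the script itself.

```r
# Changing the data type: character to numeric
txt <- c("1", "2", "3")
num <- as.numeric(txt)

# Looping: the for block adds up the values one at a time
total <- 0
for (k in seq_along(num)) {
  total <- total + num[k]
}

# if statements and logicals
is_big <- num > 1.5         # a logical vector, one TRUE/FALSE per value
if (any(is_big)) {
  n_big <- sum(is_big)      # TRUE counts as 1, FALSE as 0
} else {
  n_big <- 0
}

# Lists can hold components of different types
results <- list(values = num, total = total, label = "demo")
results$total

# The apply function: 1 means work across rows, 2 means down columns
m <- matrix(1:6, nrow = 2)
row_sums  <- apply(m, 1, sum)
col_means <- apply(m, 2, mean)
```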
Best Practices in coding
• Use informative names for important variables
• Comment steps that are not obvious
• Break R expressions into several steps for clarity
• Many small, simple functions are good; avoid functions longer than about 50 lines
• Simple is good; being overly clever is a good way to introduce bugs!
• Add default values and tests that will prevent data analysis errors
The interquartile range function

# finds the interquartile range of a vector
MyIQR <- function(y, na.rm = FALSE) {
  # test the input: stop with an error if y is not a vector
  if (!is.vector(y)) {
    stop('data is not a vector')
  }
  # omits NAs if na.rm is TRUE
  Q25 <- quantile(y, .25, na.rm = na.rm)
  Q75 <- quantile(y, .75, na.rm = na.rm)
  IQR <- Q75 - Q25
  return(IQR)
}
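A short usage sketch for this function follows. The definition is repeated so the block runs on its own, the input values are made up, and the comparison with R's built-in IQR function is an added check rather than part of the original slides.

```r
# The function from the slide, repeated so this block is self-contained.
MyIQR <- function(y, na.rm = FALSE) {
  if (!is.vector(y)) {
    stop("data is not a vector")
  }
  Q25 <- quantile(y, .25, na.rm = na.rm)
  Q75 <- quantile(y, .75, na.rm = na.rm)
  IQR <- Q75 - Q25
  return(IQR)
}

y <- c(1, 2, 3, 4, 5, 6, 7, 8, NA)   # made-up values with one missing

# With na.rm = TRUE the NA is dropped before the quantiles are found
iqr_mine    <- MyIQR(y, na.rm = TRUE)
iqr_builtin <- IQR(y, na.rm = TRUE)   # R's built-in version, for comparison

# A matrix fails the is.vector() test, so the function stops with an error;
# tryCatch captures the message instead of halting the script
msg <- tryCatch(MyIQR(matrix(1:4, 2)), error = function(e) conditionMessage(e))
```

With the default na.rm = FALSE, quantile stops with an error when the data contain NAs, so missing values cannot slip through unnoticed.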