This guide serves as an introduction to R, a powerful statistical programming and graphics language developed from the S Programming Language. R's advantages include its open-source nature, diverse ecosystem of nearly 2,400 packages, and active user community. It is widely used in the statistical research community and offers functionality across various fields such as science, business, and economics. The guide also addresses potential limitations of R and provides essential coding practices, making it suitable for both beginners and experienced users seeking to enhance their data analysis skills.
Introduction to R
Tara Jensen
National Center for Atmospheric Research
Boulder, Colorado USA
jensen@ucar.edu
R Exercises • Find sample data and R scripts at: • ftp://ftp.ncmrwf.gov.in/pub/outgoing/raghu/6WVMW/Tutorial/Day1/R-tutorial • Download them to a directory on your computer • Start R • Open intro2R.2014wmo.R
What is R? • A statistical programming and graphics language • In part, developed from the S Programming Language from Bell Labs (John Chambers) • Created to: • Allow rapid development of methods for use in different types of data. • Require small amounts of system resources
Why R? • R is arguably the dominant language in the statistical research community. • R is open source and free. • Runs on most operating systems. • Nearly 2,400 contributed packages. • Packages and applications in nearly every field of science, business and economics. • See R News, The R Journal and the Journal of Statistical Software (www.jstatsoft.org). • More than 100 books with accompanying code. • Very large, active user base. • Sensible default parameters are chosen, but users retain complete control.
Why not R? • NCL, IDL, Matlab, SAS, … are all viable alternatives to R. If you are part of an active community of researchers using another language, use that language too. • R may be limited by memory, since it normally holds the data it works on in RAM. For verification of large gridded datasets, consider using the Model Evaluation Tools (MET). • R does not produce a compiled executable, so it may not be desirable for some operational centers.
The R Community • Developers • R Core Group (20 members), only 2 have left since 1997 • Major update in April/October (freeze dates, beta versions, bug tracking, ...) • Mailing lists • Help list ~ 150 messages/day, archived, searchable. • http://www.r-project.org/mail.html • 5 International Conferences, 2 US, 1 China
Everything about R is at www.r-project.org • Source code • Precompiled binaries (Windows, Mac OS, Linux) • Documentation (main documents, plus numerous contributed ones, some in languages other than English) • Newsletter (replaced by The R Journal) • Mailing lists (several search engines) • Packages on every topic imaginable • Wiki with examples • Reference list of books using R (more than 100) • CRAN Task Views
Use R with scripts • In Linux, use Emacs Speaks Statistics (ESS) • Provides syntax-based highlighting and indentation • Object name completion • Keystroke shortcuts • Command history • Alt-x R to invoke R within XEmacs. • In Windows, use the built-in script editor • Added GUI features • <Control> R sends the current line or highlighted section to R. • Install packages through the GUI menus • Save graphics by point and click. • Mac OS • Similar to Windows, with the added advantage of system calls.
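As a minimal sketch of the script workflow, the R code below writes a tiny throwaway script to a temporary file (standing in for a real script such as intro2R.2014wmo.R) and runs it with source(). The script contents and the variable names (script_path, temps, mean_temp) are illustrative assumptions, not part of the tutorial materials.

```r
# Write a small example script to a temporary file. In practice this would be
# a script you edit in ESS, the Windows/Mac script editor or another editor.
script_path <- tempfile(fileext = ".R")
writeLines(c(
  "# Mean of a few hypothetical temperatures (deg C)",
  "temps <- c(12.1, 15.3, 9.8)",
  "mean_temp <- mean(temps)"
), script_path)

# Evaluate every line of the script in the workspace; echo = TRUE prints each
# expression as it runs, much like sending lines from an editor to R.
source(script_path, echo = TRUE)

print(mean_temp)
```

From a shell, the same kind of file can be run non-interactively with `Rscript path/to/script.R`, which is handy once a script has been developed interactively.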
R Coding principles • Make verification code transparent and easy to read • Comment and document liberally • Archive your code • Share your code • Label and save your data • Share your data
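To illustrate "label and save your data", here is a minimal sketch using base R only. The data set, column names and file names are hypothetical; descriptive column names carry the units, comment() records what the data are, saveRDS()/readRDS() preserve the object exactly, and write.csv() produces a copy that is easy to share.

```r
# A small, clearly labeled (hypothetical) verification data set: descriptive
# column names include units, and a comment records what the data are.
verif <- data.frame(
  valid_date  = as.Date(c("2014-03-01", "2014-03-02")),
  fcst_temp_c = c(21.0, 18.5),
  obs_temp_c  = c(20.2, 19.1)
)
comment(verif) <- "Hypothetical 2 m temperature forecasts and observations (deg C)"

# Save in R's native format; readRDS() restores the object exactly,
# including column types and the comment attribute.
rds_path <- file.path(tempdir(), "verif_sample.rds")
saveRDS(verif, rds_path)
restored <- readRDS(rds_path)

# Also write a plain CSV copy, which is easy to share with non-R users.
csv_path <- file.path(tempdir(), "verif_sample.csv")
write.csv(verif, csv_path, row.names = FALSE)

print(identical(restored, verif))
print(comment(restored))
```

The RDS file is the better archive for your own R work, while the CSV trades some fidelity (for example, dates come back as text) for portability.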
Packages in R • Contributed by people worldwide. • Allow scientists and statisticians to disseminate their ideas. • Apply and extend R capabilities to meet the needs of specific communities. • Accompany many statistical textbooks. • Accompany applied articles (Adrian Raftery, Doug Nychka, Tilmann Gneiting, Barbara Casati, Matt Briggs)
R Packages 101 • A CRAN mirror must be selected • Windows or Mac: Packages -> Set CRAN mirror • Linux (or any R console): chooseCRANmirror() • Packages must be installed before they can be used • Windows or Mac: Packages -> Install Package(s) • Linux: install.packages(c("package1", "package2", "package3")) • Packages must be loaded (called into use) in each session • Windows or Mac: Packages -> Load Package(s) • Linux: library("package1"), library("package2"), etc. • Base packages are installed by default • To see which packages are installed • Windows or Mac: Packages -> Load Package(s) • Linux: installed.packages(), or installed.packages(priority = "base") for the base packages only • To remove packages • remove.packages(c("package1", "package2"), lib = file.path("path to library"))
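The install-then-load steps above can be combined in a script. Below is a minimal sketch; load_or_install() is a hypothetical helper (not part of base R), and the default repository URL is an assumption, since any CRAN mirror will do. It checks installed.packages() first so that nothing is downloaded twice, then calls library() on each name.

```r
# Hypothetical helper (not part of base R): install any packages from `pkgs`
# that are not yet installed, then load all of them with library().
# The default repository URL is an assumption; any CRAN mirror will do.
load_or_install <- function(pkgs, repos = "https://cloud.r-project.org") {
  to_install <- setdiff(pkgs, rownames(installed.packages()))
  if (length(to_install) > 0) {
    install.packages(to_install, repos = repos)
  }
  for (p in pkgs) {
    # character.only = TRUE lets library() accept a name stored in a variable
    library(p, character.only = TRUE)
  }
  # Return (invisibly) whether each package is now installed
  invisible(pkgs %in% rownames(installed.packages()))
}

# Base packages ship with R; list them by their priority.
base_pkgs <- rownames(installed.packages(priority = "base"))
print(base_pkgs)

# Both of these are base packages, so nothing is downloaded here.
status <- load_or_install(c("stats", "utils"))
print(status)
```

For a package that is not yet installed, the same call would first download it from the chosen mirror and then load it.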
A sample of useful packages • verification • fields (spatial statistics) • radiosondes • extRemes • BMA (Bayesian Model Averaging) • ensembleBMA • circular • RSQLite • SpatialVx • Rgis, spatstat (GIS) • ncdf (support for netCDF files) • rgdal (support for GRIB1 files) • rNOMADS (support for GRIB2 files archived by NCEP) • RColorBrewer • randomForest
Very useful functions in R • q() - exits R; you will then be asked whether to save your workspace • ls() - lists the objects in your workspace • rm() - removes an object from the workspace • system() - runs an operating-system command from R • help(function) or ?function - brings up a help page • help(package = "name") - brings up the documentation index for a package • read.fwf() - reads fixed-width format data • read.table() - reads a delimited text file
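A minimal sketch of the two readers, plus ls() and rm(), follows. The station IDs and temperatures are made-up values written to temporary files. The same two records are stored once in fixed-width form (no delimiter, so read.fwf() needs the column widths) and once as comma-separated text (so read.table() needs the separator).

```r
# Fixed-width data: station ID in columns 1-5 and temperature in columns 6-9,
# with no delimiter between fields (values are made up for illustration).
fwf_file <- tempfile(fileext = ".txt")
writeLines(c("7246921.5", "7247618.0"), fwf_file)
fwf_data <- read.fwf(fwf_file, widths = c(5, 4),
                     col.names = c("station", "temp"))

# Delimited data: the same values as comma-separated text with a header row.
delim_file <- tempfile(fileext = ".csv")
writeLines(c("station,temp", "72469,21.5", "72476,18.0"), delim_file)
delim_data <- read.table(delim_file, header = TRUE, sep = ",")

print(fwf_data)
print(all.equal(fwf_data, delim_data))  # both readers give the same data frame

print(ls())                  # objects currently in the workspace
rm(fwf_file, delim_file)     # removes the path variables, not the files themselves
```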
More useful functions • aggregate - applies a function to groups of data defined by categories. • apply - applies a function across the rows, columns or other dimensions of an array, avoiding explicit loops. • %in% - returns a logical vector showing which elements of A are in B (e.g. A %in% B). • table - creates contingency table counts. • boot (in the boot package) - applies the bootstrap correctly to a statistic. • par - controls nearly every aspect of a graph. • pairs - the most underutilized plot: draws a scatterplot matrix, so a matrix of 4 columns is plotted in a 4x4 layout. • xyplot (in the lattice package) - slightly more advanced graphics techniques.
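Several of the functions above can be tried on toy data. The sketch below uses only base R plus the built-in iris data set; the station temperatures and the yes/no forecast and observation vectors are hypothetical values chosen so the results are easy to check by hand. The pairs() plot is sent to a temporary PDF file so the example also runs without a screen.

```r
# Hypothetical station temperatures for illustrating aggregate() and %in%.
temps <- data.frame(
  station = c("A", "A", "B", "B"),
  temp    = c(10, 14, 20, 22)
)

# aggregate(): mean temperature for each station
station_means <- aggregate(temp ~ station, data = temps, FUN = mean)
print(station_means)   # A -> 12, B -> 21

# apply(): row sums (MARGIN = 1) and column maxima (MARGIN = 2) without a loop
m <- matrix(1:6, nrow = 2)     # filled column by column: rows are 1,3,5 and 2,4,6
row_sums <- apply(m, 1, sum)   # 9 12
col_max  <- apply(m, 2, max)   # 2 4 6

# %in%: which requested stations are present in the data?
present <- c("A", "C") %in% temps$station   # TRUE FALSE

# table(): 2x2 contingency table of yes/no forecasts against observations
fcst <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
obs  <- c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)
ct <- table(Forecast = fcst, Observed = obs)
print(ct)
hits <- ct["TRUE", "TRUE"]     # forecast yes and observed yes: 2

# pairs(): scatterplot matrix; 4 numeric columns give a 4x4 grid of panels
pdf(tempfile(fileext = ".pdf"))
pairs(iris[, 1:4])
invisible(dev.off())
```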
R Exercises • Find sample data and R scripts at: • ftp://ftp.ncmrwf.gov.in/pub/outgoing/raghu/6WVMW/Tutorial/Day1/R-tutorial • Download them to a directory on your computer • Start R • Windows or Mac: click on the R icon on your desktop • Linux: type R at the command line • Open intro2R.2014wmo.R • Windows or Mac: select File -> Open Script -> intro2R.2014wmo.R • Linux: open it in another window using your favorite editor