R Introduction and Training

R Introduction and Training Patrick Gurian, Drexel University CAMRA 1st QMRA Summer InstituteAugust 7, 2006

Objective • Learn maximum likelihood estimation-a method of fitting statistical models • Learn the R statistical programming language • All in 2 hours! • Idea is to make you aware of techniques and tools

Statistical Model Estimation • Statistical models all contain parameters • Parameters are constants that can be “tuned” to make a general class of models applicable to a specific dataset

Example: Normal distribution PDF is: f(x) = 1/{σ(2π)1/2} exp{(x-µ)2/ 2σ2} µ and σ are constants that define a particular normal distribution We want to pick values that match a given dataset One simple way is arithmetic mean=µ s=σ (method of moments) But there are other ways including…

Maximum likelihood estimation • Assume a probability model • Calculate the probability (likelihood) or obtaining the observed data • Now adjust parameter values until you find the values that maximize the probability of obtaining the observed data

Likelihood Function • Observe data x, a vector of values • Assume some pdf f(x|θ) • Where θ is a vector of model parameters • Probability of any particular value is f(xi| θ) where i is an index indicating a particular observation in the data

Likelihood of the data • Generally assume data is independent and identically distributed • Same f(x) for all data • For independent data: prob [Event A ∩ Event B] = prob[A] Prob[B] • So multiple probability of individual observations to get joint probability of data • L=π f(xi| θ) • Now find θ that maximizes L

MLE example • A team plays 3 games: W L L • Binomial model: what is p? • L=(3:1)p(1-p)2 • Suppose we know sequence of wins and losses then we can say • L=p(1-p)2

MLE example (cont) • Suppose p=.5 • L=0.5^2*0.5=0.125

Example for class • Now suppose p=.3 • What is likelihood of data? • Do we prefer p=0.5 or p=0.3?

Answer • L=0.3*0.7*0.7=0.147 • Likelihood of data is greater with p=0.3 than with p=0.5 • We prefer p=0.3 as a parameter estimate to 0.5

Example: Maximizing the likelihood dL/dp= 3(1-p)2 - 6p(1-p) Find maximum at dL/dp=0 0=3(1-p)2 - 6p(1-p) 0=1-p -2p 3p=1 p=1/3

MLE example: Conclusion Can verify that this is a maxima by looking at second derivative Note that method of moments would give us p=x/n=1/3 So we get the same result by both methods

Ln Likelihood • Product of many numbers each <1 is quite small • Often easiest to work with ln (L) • Since ln is a monotonic transformation of L, the largest ln (L) will correspond to the largest L value Ln L=ln π f(xi| θ) Applying log laws Ln L=Σ lnf(xi| θ)

Parameter uncertainty in MLE • MLE gives us a general way to estimate parameters of many types of models • But how do we make inferences about these parameters? • There is a general method for large sample sizes • Huge advantage of MLE approach

Information Matrix I = {d2ln(L) / dθ12d2ln(L)/dθ1dθ2 …. d2ln(L)/dθ1dθ2 d2ln(L)/dθ22 ….. … d2ln(L)/dθ1dθk … ….d2ln(L)/dθk2 } k = number of parameters All the second derivatives of ln(L) with respect to parameters

Interpreting I • Note that at MLE dL/ dθi =0 for all i • L(θ^) is at a maximum or peak • Large second derivative indicates a sharp peak • Parameter value of θ^ + Δ is much less likely than θ^ • Small second derivative indicates a gradual slope • Parameter value of θ^ + Δ is almost as likely as θ^

Uncertainty of MLE parameters θ^~N(θ, I-1) MLE parameter estimates are asymptotically normal and unbiased with variance given by the inverse of the information matrix Once you have the variance of θ^i use a Z distribution to test hypotheses about θi and set CI for true value of θi

MLE in practice • Usually work with log likelihood • Often work with models for which it is not tractable develop analytical solutions • Instead use numerical analysis to identify maximum likelihood through gradient search methods and invert I to get standard errors of parameters • Need a software tool…

R language • Freely available software • http://www.r-project.org/ • R is a programming language, not a set of pre-programmed routines with drop down menus • It operates from a command line prompt and has exact syntax requirements • Large library of functions developed by users • Develop your own code specific to your needs

R Resources • An Introduction to R • available from the help menu • Searchable help menu • From command line: help(<function>) • Simple R • R for Beginners • R for Dummies-no such luck 

Strategies for dealing with R • Use another software for standard analyses (JMP, SPSS, etc.) • Have your favorite text editor open when you are working, if you actually succeed in getting a command to work, save it for posterity! • Remember R offers incredible flexibility and speed • Remember you’re not the only one R is driving up a tree

Your assignment • Data on Cryptosporidium oocyst concentrations from Table 6-5, page 176 of Haas, Rose, and Gerba

Assignment (cont) • Assume data follows a Poisson distribution • Use MLE to fit parameter estimates to this data • For practice use numerical approach even thought closed form solution exists for this case • Repeat with negative binomial distribution

Poisson Distribution • Independently occurring discrete random data • Often used for microbes • Assumes they don’t systematically cluster or spread out • Think of each glass of water as a independent trial that may or may not have a bug in it • Think of each sip from each glass of water as an independent trial that may or may not have a bug in it.

Negative Binomial Distribution • Another distribution to describe the number of occurrences of discrete random events • Negative binomial has two parameters which allows for more flexibility, ability to fit more complex data

Next Session • We will talk about comparing these two models • Expand on uncertainty analysis • For now learn how to fit them

R Introduction and Training

R Introduction and Training

Presentation Transcript

Introduction to R

Introduction to R

Introduction to R

Introduction to R

Introduction to R

Introduction to R

Introduction to R

Introduction to R

Introduction to R

Introduction: Training and UAS?

Introduction and Training

R-Chip Introduction

Introduction to R

Introduction to Astrostatistics and R

Introduction to R

Introduction to R

R ule and Introduction

Introduction to R

Introduction and Training

Introduction to R

R Training | R Programming Language For Beginners | R Programming Training | Edureka

Introduction to R and Bioconductor