
R Introduction and Training

R Introduction and Training. Patrick Gurian, Drexel University. CAMRA 1st QMRA Summer Institute, August 7, 2006. Objective: learn maximum likelihood estimation, a method of fitting statistical models, and learn the R statistical programming language. All in 2 hours!


Presentation Transcript


  1. R Introduction and Training • Patrick Gurian, Drexel University • CAMRA 1st QMRA Summer Institute • August 7, 2006

  2. Objective • Learn maximum likelihood estimation, a method of fitting statistical models • Learn the R statistical programming language • All in 2 hours! • The idea is to make you aware of the techniques and tools

  3. Statistical Model Estimation • Statistical models all contain parameters • Parameters are constants that can be “tuned” to make a general class of models applicable to a specific dataset

  4. Example: Normal distribution • The PDF is $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}$ • µ and σ are constants that define a particular normal distribution • We want to pick values that match a given dataset • One simple way is to set µ equal to the arithmetic mean and σ equal to the sample standard deviation s (method of moments) • But there are other ways, including…
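
The method-of-moments fit described on this slide can be sketched in a few lines. This is a minimal illustration in Python (not R), and the sample values are made up for the example; they do not come from the presentation.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def method_of_moments(data):
    """Estimate (mu, sigma) by the sample mean and sample standard deviation."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / (n - 1)
    return mean, math.sqrt(var)

# Hypothetical sample, invented for illustration only.
sample = [4.1, 5.3, 4.8, 6.0, 5.2]
mu_hat, sigma_hat = method_of_moments(sample)
```

Here `mu_hat` and `sigma_hat` play the roles of µ and σ in the density above.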

  5. Maximum likelihood estimation • Assume a probability model • Calculate the probability (likelihood) of obtaining the observed data • Now adjust the parameter values until you find the values that maximize the probability of obtaining the observed data

  6. Likelihood Function • Observe data x, a vector of values • Assume some pdf $f(x \mid \theta)$ • Where θ is a vector of model parameters • The probability of any particular value is $f(x_i \mid \theta)$, where i is an index indicating a particular observation in the data

  7. Likelihood of the data • Generally assume the data are independent and identically distributed • Same f(x) for all data • For independent data: Prob[Event A ∩ Event B] = Prob[A] Prob[B] • So multiply the probabilities of the individual observations to get the joint probability of the data • $L = \prod_i f(x_i \mid \theta)$ • Now find the θ that maximizes L
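
The product form of the likelihood can be sketched as a small generic function. This is an illustrative Python sketch: the function names, the normal model, the data values, and the two candidate parameter sets are all assumptions made for the example.

```python
import math

def likelihood(data, pdf, theta):
    """Joint likelihood of i.i.d. data: the product of pdf(x_i, theta)."""
    L = 1.0
    for x in data:
        L *= pdf(x, theta)
    return L

def normal_pdf(x, theta):
    """Normal density; theta is the pair (mu, sigma)."""
    mu, sigma = theta
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Hypothetical data and candidate parameter values, for illustration only.
data = [4.1, 5.3, 4.8, 6.0, 5.2]
L_near = likelihood(data, normal_pdf, (5.0, 0.7))  # mu close to the data
L_far = likelihood(data, normal_pdf, (8.0, 0.7))   # mu far from the data
```

Parameter values near the bulk of the data give a much larger likelihood than values far from it, which is what the maximization step exploits.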

  8. MLE example • A team plays 3 games: W L L • Binomial model: what is p? • $L = \binom{3}{1} p (1-p)^2$ • If we know the sequence of wins and losses, the counting term drops out and we can say • $L = p(1-p)^2$

  9. MLE example (cont) • Suppose p = 0.5 • $L = 0.5 \times 0.5^2 = 0.125$

  10. Example for class • Now suppose p = 0.3 • What is the likelihood of the data? • Do we prefer p = 0.5 or p = 0.3?

  11. Answer • L=0.3*0.7*0.7=0.147 • Likelihood of data is greater with p=0.3 than with p=0.5 • We prefer p=0.3 as a parameter estimate to 0.5
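
The comparison between the two candidate values of p can be reproduced with a short Python sketch. The function name and the True/False coding of wins and losses are choices made for this illustration.

```python
def sequence_likelihood(p, outcomes):
    """Likelihood of a known win/loss sequence when each game is won with probability p."""
    L = 1.0
    for won in outcomes:
        L *= p if won else (1.0 - p)
    return L

games = [True, False, False]  # W L L

L_half = sequence_likelihood(0.5, games)   # 0.125
L_third = sequence_likelihood(0.3, games)  # approximately 0.147
prefer_03 = L_third > L_half
```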

  12. Example: Maximizing the likelihood • Using the binomial form $L = 3p(1-p)^2$: $dL/dp = 3(1-p)^2 - 6p(1-p)$ • Find the maximum at $dL/dp = 0$ • $0 = 3(1-p)^2 - 6p(1-p)$ • Dividing through by $3(1-p)$: $0 = 1 - p - 2p$ • $3p = 1$ • $p = 1/3$

  13. MLE example: Conclusion • Can verify that this is a maximum by looking at the second derivative • Note that the method of moments would give us p = x/n = 1/3 • So we get the same result by both methods
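
The second-derivative check can be written out explicitly. The derivation below starts from the first derivative used in the maximization slide and is a worked sketch, not part of the original deck.

```latex
\begin{align*}
\frac{dL}{dp} &= 3(1-p)^2 - 6p(1-p) \\
\frac{d^2L}{dp^2} &= -6(1-p) - 6(1-p) + 6p = 18p - 12 \\
\left.\frac{d^2L}{dp^2}\right|_{p=1/3} &= 6 - 12 = -6 < 0
\end{align*}
```

The second derivative is negative at p = 1/3, so the stationary point is a maximum.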

  14. Ln Likelihood • The product of many numbers, each < 1, is very small • Often easiest to work with ln(L) • Since ln is a monotonic transformation of L, the largest ln(L) corresponds to the largest L • $\ln L = \ln \prod_i f(x_i \mid \theta)$ • Applying the log laws: $\ln L = \sum_i \ln f(x_i \mid \theta)$
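
The practical reason for working on the log scale can be seen in a short Python sketch. The record of 1000 wins and 2000 losses is hypothetical, chosen only so that the raw product is too small for double-precision arithmetic.

```python
import math

def likelihood(p, wins, losses):
    """Raw likelihood of a known sequence: p^wins * (1-p)^losses."""
    return p ** wins * (1.0 - p) ** losses

def log_likelihood(p, wins, losses):
    """Same quantity on the log scale: a sum instead of a product."""
    return wins * math.log(p) + losses * math.log(1.0 - p)

# Hypothetical long record, for illustration only.
wins, losses = 1000, 2000

raw = likelihood(1.0 / 3.0, wins, losses)        # underflows to 0.0
logL_third = log_likelihood(1.0 / 3.0, wins, losses)
logL_half = log_likelihood(0.5, wins, losses)
```

The raw likelihood is indistinguishable from zero for every p, while the log-likelihoods remain finite and can still be compared.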

  15. Parameter uncertainty in MLE • MLE gives us a general way to estimate parameters of many types of models • But how do we make inferences about these parameters? • There is a general method for large sample sizes • Huge advantage of MLE approach

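  16. Information Matrix
$$
I = -\begin{pmatrix}
\dfrac{\partial^2 \ln L}{\partial\theta_1^2} & \dfrac{\partial^2 \ln L}{\partial\theta_1\,\partial\theta_2} & \cdots & \dfrac{\partial^2 \ln L}{\partial\theta_1\,\partial\theta_k} \\
\dfrac{\partial^2 \ln L}{\partial\theta_1\,\partial\theta_2} & \dfrac{\partial^2 \ln L}{\partial\theta_2^2} & \cdots & \vdots \\
\vdots & & \ddots & \vdots \\
\dfrac{\partial^2 \ln L}{\partial\theta_1\,\partial\theta_k} & \cdots & \cdots & \dfrac{\partial^2 \ln L}{\partial\theta_k^2}
\end{pmatrix}
$$
  • k = number of parameters • All the second derivatives of ln(L) with respect to the parameters, with the sign reversed so that the inverse of I is a positive variance at a maximum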

  17. Interpreting I • Note that at the MLE, $\partial L / \partial\theta_i = 0$ for all i • $L(\hat{\theta})$ is at a maximum or peak • A second derivative that is large in magnitude indicates a sharp peak • A parameter value of $\hat{\theta} + \Delta$ is much less likely than $\hat{\theta}$ • A second derivative that is small in magnitude indicates a gradual slope • A parameter value of $\hat{\theta} + \Delta$ is almost as likely as $\hat{\theta}$

  18. Uncertainty of MLE parameters • $\hat{\theta} \sim N(\theta, I^{-1})$ • MLE parameter estimates are asymptotically normal and unbiased, with variance given by the inverse of the information matrix • Once you have the variance of $\hat{\theta}_i$, use a Z distribution to test hypotheses about $\theta_i$ and set a CI for the true value of $\theta_i$
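
For the one-parameter W L L example, the information "matrix" is a single number, so the variance and a confidence interval can be worked out directly. This Python sketch assumes the information is taken as the negative second derivative of ln L = wins·ln(p) + losses·ln(1-p); the function name and the 1.96 multiplier for a 95% interval are choices made for the illustration.

```python
import math

def observed_information(p, wins, losses):
    """Negative second derivative of ln L = wins*ln(p) + losses*ln(1-p)."""
    return wins / p ** 2 + losses / (1.0 - p) ** 2

p_hat = 1.0 / 3.0                         # MLE from the W L L example
info = observed_information(p_hat, wins=1, losses=2)
variance = 1.0 / info                     # equals p_hat*(1-p_hat)/3
se = math.sqrt(variance)

z = 1.96                                  # approximate 95% two-sided Z value
ci = (p_hat - z * se, p_hat + z * se)
```

With only three games the interval extends below zero, a reminder that the normal approximation is a large-sample result.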

  19. MLE in practice • Usually work with the log likelihood • Often work with models for which it is not tractable to develop analytical solutions • Instead use numerical analysis to identify the maximum likelihood through gradient search methods, and invert I to get the standard errors of the parameters • Need a software tool…
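
The numerical workflow described here can be sketched on the W L L example. This is a minimal Python illustration using Newton's method with finite-difference derivatives; it is one possible derivative-based search, not necessarily the method an R optimizer would use, and the step sizes and tolerance are assumptions.

```python
import math

def log_lik(p):
    """ln L for one win and two losses."""
    return math.log(p) + 2.0 * math.log(1.0 - p)

def d1(f, x, h=1e-5):
    """Central-difference estimate of the first derivative."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def d2(f, x, h=1e-4):
    """Central-difference estimate of the second derivative."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

def newton_maximize(f, x0, tol=1e-8, max_iter=50):
    """Iterate x <- x - f'(x)/f''(x) until the step is smaller than tol."""
    x = x0
    for _ in range(max_iter):
        step = d1(f, x) / d2(f, x)
        x -= step
        if abs(step) < tol:
            break
    return x

p_hat = newton_maximize(log_lik, x0=0.5)
# Standard error from the inverse of the (negative) second derivative.
se = math.sqrt(-1.0 / d2(log_lik, p_hat))
```

The search recovers the analytical answer p = 1/3, and the curvature at the maximum supplies the standard error without any extra derivation.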

  20. R language • Freely available software • http://www.r-project.org/ • R is a programming language, not a set of pre-programmed routines with drop down menus • It operates from a command line prompt and has exact syntax requirements • Large library of functions developed by users • Develop your own code specific to your needs

  21. R Resources • An Introduction to R • available from the help menu • Searchable help menu • From the command line: help(<function>) • Simple R • R for Beginners • R for Dummies: no such luck

  22. Strategies for dealing with R • Use other software for standard analyses (JMP, SPSS, etc.) • Have your favorite text editor open when you are working; if you actually succeed in getting a command to work, save it for posterity! • Remember that R offers incredible flexibility and speed • Remember that you're not the only one R is driving up a tree

  23. Your assignment • Data on Cryptosporidium oocyst concentrations from Table 6-5, page 176 of Haas, Rose, and Gerba

  24. Assignment (cont) • Assume the data follow a Poisson distribution • Use MLE to fit parameter estimates to these data • For practice, use a numerical approach even though a closed-form solution exists for this case • Repeat with the negative binomial distribution
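
The shape of the Poisson part of the assignment can be sketched as follows. This Python illustration uses placeholder counts, since the Table 6-5 data are not reproduced in this transcript, and it uses a golden-section search (a derivative-free method, swapped in here for simplicity) rather than a gradient search.

```python
import math

def poisson_log_lik(lam, counts):
    """ln L = sum over observations of k*ln(lam) - lam - ln(k!)."""
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in counts)

def golden_section_max(f, lo, hi, tol=1e-8):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d   # maximum lies in [a, d]
        else:
            a = c   # maximum lies in [c, b]
    return (a + b) / 2.0

# Placeholder counts, NOT the Haas, Rose, and Gerba data.
counts = [0, 2, 1, 0, 3, 1, 0, 5]

lam_hat = golden_section_max(lambda lam: poisson_log_lik(lam, counts), 1e-6, 20.0)
closed_form = sum(counts) / len(counts)   # the known closed-form MLE: the sample mean
```

Comparing `lam_hat` with `closed_form` is a quick check that the numerical search is working before moving on to a model with no closed-form answer.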

  25. Poisson Distribution • Independently occurring discrete random data • Often used for microbes • Assumes they don't systematically cluster or spread out • Think of each glass of water as an independent trial that may or may not have a bug in it • Think of each sip from each glass of water as an independent trial that may or may not have a bug in it

  26. Negative Binomial Distribution • Another distribution to describe the number of occurrences of discrete random events • The negative binomial has two parameters, which allows more flexibility and the ability to fit more complex data
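
A log-likelihood for the negative binomial can be sketched in Python. The presentation does not state a parameterization, so the mean/dispersion form used here (mean `mu`, dispersion `k`, variance mu + mu²/k) is an assumption, as are the function names and the numeric values.

```python
import math

def neg_binom_log_pmf(y, mu, k):
    """ln P(Y = y) for a negative binomial with mean mu and dispersion k (assumed form)."""
    return (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
            + k * math.log(k / (k + mu))
            + y * math.log(mu / (k + mu)))

def neg_binom_log_lik(mu, k, counts):
    """Log-likelihood of i.i.d. counts under the negative binomial."""
    return sum(neg_binom_log_pmf(y, mu, k) for y in counts)

def poisson_log_pmf(y, lam):
    """ln P(Y = y) for a Poisson with mean lam, used for comparison."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)

# Sanity check: the probabilities sum to 1 (the tail beyond 500 is negligible here).
total = sum(math.exp(neg_binom_log_pmf(y, 1.5, 0.8)) for y in range(500))

# With a very large dispersion k, the negative binomial approaches the Poisson.
gap = abs(neg_binom_log_pmf(2, 1.5, 1e6) - poisson_log_pmf(2, 1.5))
```

Under this parameterization, small k means strong clustering, and the extra parameter is what lets the model fit counts whose variance exceeds their mean.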

  27. Next Session • We will talk about comparing these two models • Expand on uncertainty analysis • For now learn how to fit them
