
Bayesian inference




  1. Bayesian inference Gil McVean, Department of Statistics Monday 17th November 2008

  2. Questions to ask… • What is likelihood-based inference? • What is Bayesian inference and why is it different? • How do you estimate parameters in a Bayesian framework? • How do you choose a suitable prior? • How do you compare models in Bayesian inference?

  3. A recap on likelihood • For any model the maximum information about model parameters is obtained by considering the likelihood function • The likelihood function is proportional to the probability of observing the data given a specified parameter value • The likelihood principle states that all information about the parameters of interest is contained in the likelihood function

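  4. An example • Suppose we have data generated from a Poisson distribution. We want to estimate the parameter λ of the distribution • The probability of observing a particular value x of the random variable is $P(X = x \mid \lambda) = \frac{e^{-\lambda}\lambda^{x}}{x!}$ • If we have observed a series of iid Poisson RVs $x_1, \dots, x_n$, we obtain the joint likelihood by multiplying the individual probabilities together: $L(\lambda) = \prod_{i=1}^{n} \frac{e^{-\lambda}\lambda^{x_i}}{x_i!} = \frac{e^{-n\lambda}\lambda^{\sum_i x_i}}{\prod_i x_i!}$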
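A minimal Python sketch of the joint likelihood above: it multiplies the individual Poisson probabilities and checks the product against the closed form on the log scale. The function names are illustrative, not taken from the slides.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with rate lam."""
    return math.exp(-lam) * lam**x / math.factorial(x)

def joint_likelihood(data, lam):
    """Likelihood of iid Poisson data: the product of the individual probabilities."""
    result = 1.0
    for x in data:
        result *= poisson_pmf(x, lam)
    return result

def log_likelihood(data, lam):
    """Closed form: -n*lam + sum(x)*log(lam) - sum(log(x!))."""
    n = len(data)
    return -n * lam + sum(data) * math.log(lam) - sum(math.lgamma(x + 1) for x in data)

counts = [12, 22, 14, 8]
print(joint_likelihood(counts, 14.0))
print(math.exp(log_likelihood(counts, 14.0)))  # same value, computed via logs
```

For larger samples the product underflows quickly, which is why the log-likelihood is the usual working quantity.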

  5. Relative likelihood • We can compare the evidence for different parameter values through their relative likelihood • For example, suppose we observe counts of 12, 22, 14 and 8 from a Poisson process • The maximum likelihood estimate is the sample mean, $\hat\lambda = 56/4 = 14$. The relative likelihood is given by $\frac{L(\lambda)}{L(\hat\lambda)} = e^{-4(\lambda - 14)}\left(\frac{\lambda}{14}\right)^{56}$
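The relative likelihood for the four counts can be tabulated with a short sketch (the function name is hypothetical). It works on the log scale, which keeps the arithmetic stable for larger samples.

```python
import math

counts = [12, 22, 14, 8]
n, total = len(counts), sum(counts)
mle = total / n  # Poisson MLE is the sample mean: 56 / 4 = 14.0

def relative_likelihood(lam):
    """L(lam) / L(mle) = exp(-n (lam - mle)) * (lam / mle) ** sum(x)."""
    log_r = -n * (lam - mle) + total * math.log(lam / mle)
    return math.exp(log_r)

for lam in (10, 12, 14, 16, 18):
    print(lam, round(relative_likelihood(lam), 3))
```

The value is 1 at the MLE and falls off on either side; values well below 1 indicate parameter values the data support much less.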

  6. Maximum likelihood estimation • The maximum likelihood estimate (MLE) is the set of parameter values that maximise the probability of observing the data we actually obtained • The MLE is consistent in that it converges to the true value as the sample size becomes infinitely large • The MLE is asymptotically efficient in that it achieves the minimum possible variance (the Cramér-Rao lower bound) as n→∞ • However, the MLE is often biased for finite sample sizes • For example, the MLE of the variance parameter of a normal distribution is the sample variance with divisor n, $\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$, whose expectation is $\frac{n-1}{n}\sigma^2$, so on average it underestimates the true variance
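A quick simulation illustrates the bias. It assumes standard normal data with n = 5 and σ² = 1 (arbitrary choices), so the average MLE should sit near (n - 1)/n × σ² = 0.8 rather than 1.

```python
import random
import statistics

def mle_variance(xs):
    """MLE of a normal variance: squared deviations divided by n (not n - 1)."""
    mean = statistics.fmean(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

rng = random.Random(1)   # fixed seed for reproducibility
n, reps = 5, 100_000     # assumed sample size and number of simulated data sets
estimates = [mle_variance([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(reps)]
average = statistics.fmean(estimates)
print(average)  # near (n - 1) / n = 0.8, not the true value 1.0
```

Dividing by n - 1 instead (as `statistics.variance` does) removes this bias.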

  7. Confidence intervals and likelihood • Thanks to the CLT there is another useful result that allows us to define confidence intervals from the log-likelihood surface • Specifically, the set of parameter values for which the log-likelihood is no more than 1.92 below its maximum defines an approximate 95% confidence interval • This works because, in the limit of large sample size, the likelihood ratio test statistic (twice the difference in log-likelihood) is approximately chi-squared distributed under the null, and 1.92 is half of 3.84, the 95% point of $\chi^2_1$ • This is a very useful result, but it should not be assumed to hold for any particular data set • i.e. check it with simulation
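Applied to the Poisson counts from the relative-likelihood example (12, 22, 14 and 8), the 1.92 rule gives an approximate 95% interval. This sketch finds the two boundaries by bisection. The bracketing values and the `boundary` helper are arbitrary illustrative choices.

```python
import math

counts = [12, 22, 14, 8]
n, total = len(counts), sum(counts)
mle = total / n

def log_rel_lik(lam):
    """log L(lam) - log L(mle) for iid Poisson counts."""
    return -n * (lam - mle) + total * math.log(lam / mle)

def boundary(lo, hi, cutoff=-1.92, iters=100):
    """Bisection for the point between lo and hi where log_rel_lik crosses cutoff.

    1.92 is half of 3.84, the 95% point of chi-squared with 1 degree of freedom."""
    f = lambda lam: log_rel_lik(lam) - cutoff
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(lo) > 0) == (f(mid) > 0):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

lower = boundary(1e-6, mle)
upper = boundary(mle, 10 * mle)
print(round(lower, 2), round(upper, 2))
```

The interval is slightly asymmetric about 14, reflecting the skew of the Poisson likelihood.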

  8. Likelihood ratio tests • Suppose we have two models, H0 and H1, in which H0 is a special case of H1 • We can compare the maximised likelihoods of the two models • Note that the maximised likelihood under H1 can be no lower than under H0 • Theory shows that if H0 is true, then twice the difference in maximised log-likelihood, $2[\ell(\hat\theta_{1}) - \ell(\hat\theta_{0})]$, is asymptotically $\chi^2$ distributed with degrees of freedom equal to the difference in the number of parameters between H0 and H1 • This is the likelihood ratio test • Theory also tells us that if H1 is true, then the likelihood ratio test is the most powerful test for discriminating between H0 and H1 • Useful, though perhaps not as useful as it sounds
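As an illustration (not from the slides), a likelihood ratio test can ask whether the four counts 12, 22, 14 and 8 share a single Poisson rate (H0) or each have their own rate (H1). H0 is a special case of H1, with 1 versus 4 free parameters, so the statistic is compared with χ² on 3 degrees of freedom. The `chi2_sf` helper is a hypothetical stdlib-only stand-in for a statistics library's chi-squared tail function.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared(df) variable, via the recurrence
    Q(k + 2, x) = Q(k, x) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)."""
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    else:
        q, k = math.exp(-x / 2), 2
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q

def poisson_loglik(data, rates):
    """Poisson log-likelihood with one rate per count, omitting the log(x!) terms,
    which cancel in the ratio. Assumes all rates are positive."""
    return sum(x * math.log(lam) - lam for x, lam in zip(data, rates))

counts = [12, 22, 14, 8]
mean = sum(counts) / len(counts)
ll0 = poisson_loglik(counts, [mean] * len(counts))  # H0: one shared rate (MLE = mean)
ll1 = poisson_loglik(counts, counts)                # H1: own rate per count (MLE = x_i)
stat = 2 * (ll1 - ll0)
df = len(counts) - 1                                # 4 parameters versus 1
p_value = chi2_sf(stat, df)
print(round(stat, 3), round(p_value, 3))
```

With only four observations the χ² approximation is itself doubtful, which is exactly the situation where the slides advise checking by simulation.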

  9. Criticisms of the frequentist approach • The choice between models using P-values is focused on rejecting the null rather than demonstrating that the alternative is appropriate • Representing uncertainty through confidence intervals is messy and unintuitive • We cannot say that the probability of the true parameter lying within the interval is 0.95 • The frequentist approach requires a predefined experimental design that must be followed through to completion (at which point the data are analysed) • Bayesian inference naturally accommodates interim analyses, changes in stopping rules and the combination of data from different sources • Focusing on point estimation leads to models that are ‘over-fitted’ to data

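  10. Bayesian estimators • Bayesian statistics aims to make statements about the probability attached to different parameter values given the data you have collected • It makes use of Bayes’ theorem: $P(\theta \mid D) = \frac{P(D \mid \theta)\,P(\theta)}{P(D)}$, where $P(\theta \mid D)$ is the posterior, $P(D \mid \theta)$ the likelihood, $P(\theta)$ the prior and $P(D) = \int P(D \mid \theta)\,P(\theta)\,d\theta$ the normalising constant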
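Bayes' theorem can be applied numerically on a grid. The sketch below assumes, purely for illustration, a uniform prior for the Poisson rate on (0, 40] and reuses the counts 12, 22, 14 and 8. The normalising constant is approximated by a Riemann sum.

```python
import math

counts = [12, 22, 14, 8]
step = 0.01
grid = [step * i for i in range(1, 4001)]  # lambda values 0.01, 0.02, ..., 40.0
prior = [1.0 / 40.0 for _ in grid]         # assumed uniform prior density on (0, 40]

def likelihood(lam):
    """Joint Poisson likelihood P(D | lambda) of the counts."""
    return math.prod(math.exp(-lam) * lam**x / math.factorial(x) for x in counts)

# Numerator of Bayes' theorem: likelihood times prior.
unnormalised = [likelihood(lam) * p for lam, p in zip(grid, prior)]
# Normalising constant P(D): integral of P(D | lambda) P(lambda), as a Riemann sum.
evidence = sum(unnormalised) * step
posterior = [u / evidence for u in unnormalised]
mode = grid[max(range(len(grid)), key=lambda i: posterior[i])]
print(mode)
```

With a flat prior the posterior mode coincides with the MLE of 14. A non-uniform prior would move it.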

  11. Are parameters random variables? • The single most important conceptual difference between Bayesian statistics and frequentist statistics is the notion that the parameters you are interested in are themselves random variables • This notion is encapsulated in the use of a subjective prior for your parameters • Remember that to construct a confidence interval we have to define the set of possible parameter values • A prior does the same thing, but also gives a weight to different values

  12. Example: coin tossing • I toss a coin twice and observe two heads • I want to perform inference about the probability of obtaining a head on a single throw for the coin in question • The MLE of the probability is 1.0 – yet I have a very strong prior belief that the answer is 0.5 • Bayesian statistics forces the researcher to be explicit about prior beliefs but, in return, can be very specific about what information has been gained by performing the experiment • It also provides a natural way for combining data from different experiments
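The coin example can be made concrete with a conjugate Beta prior. The Beta(50, 50) prior below is a hypothetical stand-in for "a very strong prior belief that the answer is 0.5", and the second experiment's counts are invented for illustration.

```python
# Hypothetical prior: Beta(50, 50), tightly concentrated around p = 0.5.
prior_a, prior_b = 50, 50
heads, tails = 2, 0  # the two tosses from the slide

mle = heads / (heads + tails)  # 1.0
# Conjugate update: Beta(a, b) prior + binomial data -> Beta(a + heads, b + tails).
post_a, post_b = prior_a + heads, prior_b + tails
post_mean = post_a / (post_a + post_b)  # 52 / 102, about 0.51
print(mle, post_mean)

# Combining experiments: a further (invented) 3 heads and 7 tails can be added
# to the current posterior, and the result matches pooling all tosses at once.
seq_a, seq_b = post_a + 3, post_b + 7
pooled_a, pooled_b = prior_a + heads + 3, prior_b + tails + 7
assert (seq_a, seq_b) == (pooled_a, pooled_b)
```

Yesterday's posterior simply becomes today's prior, which is the "natural way for combining data from different experiments".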

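  13. The posterior • Bayesian inference about parameters is contained in the posterior distribution • The posterior can be summarised in various ways, for example by the posterior mean and a credible interval [Figure: prior and posterior densities, with the posterior mean and credible interval marked]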
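A sketch of common posterior summaries for a Beta(a, b) posterior, computed on a grid: the mean, the mode and a 95% equal-tailed credible interval. Beta(52, 50) is used as a hypothetical example (a Beta(50, 50) prior updated with two heads). The grid size is an arbitrary choice.

```python
import math

def beta_pdf(p, a, b):
    """Beta(a, b) density at p, computed on the log scale."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(p) + (b - 1) * math.log(1 - p))

def summarise(a, b, level=0.95, m=100_000):
    """Posterior mean, mode and equal-tailed credible interval on a grid of m cells."""
    step = 1.0 / m
    grid = [(i + 0.5) * step for i in range(m)]  # midpoints avoid p = 0 and p = 1
    dens = [beta_pdf(p, a, b) for p in grid]
    mass = [d * step for d in dens]
    total = sum(mass)
    mean = sum(p * w for p, w in zip(grid, mass)) / total
    mode = grid[max(range(m), key=lambda i: dens[i])]
    tail = (1 - level) / 2
    cum, lower, upper = 0.0, None, None
    for p, w in zip(grid, mass):
        cum += w / total
        if lower is None and cum >= tail:
            lower = p
        if upper is None and cum >= 1 - tail:
            upper = p
    return mean, mode, (lower, upper)

mean, mode, (lo, hi) = summarise(52, 50)
print(round(mean, 4), round(mode, 4), round(lo, 3), round(hi, 3))
```

An equal-tailed interval puts 2.5% of posterior mass in each tail. A highest-posterior-density interval is a common alternative for skewed posteriors.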

  14. Choosing priors • A prior reflects your belief before the experiment • This might be relatively unfocused • Uniform distributions in the case of single parameters • Jeffreys prior (and other ‘uninformative’ priors) • Or might be highly focused • In the coin-tossing experiment, most of my prior would be on P=0.5 • In an association study, my prior on a SNP being causal might be 1/10^7
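The Jeffreys prior is proportional to the square root of the Fisher information. For a single Bernoulli trial $I(p) = 1/(p(1-p))$, which gives the Beta(1/2, 1/2) distribution. The sketch below computes this from the definition and normalises it. The function names are illustrative.

```python
import math

def fisher_information(p):
    """Expected squared score for one Bernoulli trial: E[(d/dp log f(X | p))^2]."""
    score_heads = 1 / p         # d/dp log p
    score_tails = -1 / (1 - p)  # d/dp log(1 - p)
    return p * score_heads**2 + (1 - p) * score_tails**2  # = 1 / (p (1 - p))

def jeffreys_density(p):
    """Jeffreys prior, proportional to sqrt(I(p)); its normalising constant is pi."""
    return math.sqrt(fisher_information(p)) / math.pi

# Should match the Beta(1/2, 1/2) density 1 / (pi * sqrt(p (1 - p))).
for p in (0.1, 0.5, 0.9):
    print(p, jeffreys_density(p), 1 / (math.pi * math.sqrt(p * (1 - p))))
```

Unlike the uniform prior, it puts extra weight near 0 and 1, and it has the property of being invariant under reparameterisation.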

  15. Using posteriors • Posterior summary to provide statements about point estimates and certainty • Posterior prediction to make statements about future events • Posterior predictive simulation to check the fit of the model to data
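A sketch of posterior predictive simulation for the Poisson counts 12, 22, 14 and 8. It assumes a vague Gamma(1, 0.01) prior on the rate (shape, rate), draws a rate from the conjugate posterior, simulates a replicate data set and compares its variance-to-mean ratio with the observed one. The prior, the statistic and the helper names are illustrative assumptions.

```python
import math
import random
import statistics

def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for moderate lam."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def dispersion(xs):
    """Variance-to-mean ratio; about 1 for Poisson data."""
    return statistics.variance(xs) / statistics.fmean(xs)

counts = [12, 22, 14, 8]
shape0, rate0 = 1.0, 0.01          # hypothetical vague Gamma prior (shape, rate)
post_shape = shape0 + sum(counts)  # conjugate Gamma posterior
post_rate = rate0 + len(counts)

rng = random.Random(42)
observed = dispersion(counts)
reps, exceed = 10_000, 0
for _ in range(reps):
    lam = rng.gammavariate(post_shape, 1.0 / post_rate)  # gammavariate takes a scale
    replicate = [poisson_draw(lam, rng) for _ in counts]
    if statistics.fmean(replicate) > 0 and dispersion(replicate) >= observed:
        exceed += 1
ppp = exceed / reps  # posterior predictive p-value for the dispersion statistic
print(round(observed, 3), ppp)
```

A small posterior predictive p-value would suggest the data are more dispersed than the Poisson model allows.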

  16. Bayes factors • Bayes factors can be used to compare the evidence for different models • These do not need to be nested • Bayes factors generalise the likelihood ratio by integrating the likelihood over the prior: $BF_{12} = \frac{P(D \mid M_1)}{P(D \mid M_2)}$, where $P(D \mid M_k) = \int P(D \mid \theta_k, M_k)\,P(\theta_k \mid M_k)\,d\theta_k$ • Importantly, if model 2 is a special case of model 1, it does not follow that the Bayes factor in favour of model 1 is necessarily greater than 1 • The subspace of model 1 that improves the likelihood may be very small, and the extra parameters carry an extra cost • It is generally accepted that a BF of 3 is worth mentioning, a BF of 10 is strong evidence and a BF of 100 is decisive (Jeffreys)
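For coin tossing, the Bayes factor of a model with p ~ Uniform(0, 1) (M1) against the point null p = 0.5 (M0) has a closed form. The sketch assumes that uniform prior. The second call uses hypothetical data to show that the larger model need not win even though M0 is nested within it.

```python
import math

def log_beta(a, b):
    """log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def bayes_factor_10(heads, tails, a=1.0, b=1.0):
    """BF of M1 (p ~ Beta(a, b)) against M0 (p = 0.5).

    Marginal likelihood under M1: B(a + heads, b + tails) / B(a, b);
    under M0: 0.5 ** (heads + tails). The binomial coefficient is common
    to both models and cancels."""
    log_m1 = log_beta(a + heads, b + tails) - log_beta(a, b)
    log_m0 = (heads + tails) * math.log(0.5)
    return math.exp(log_m1 - log_m0)

print(bayes_factor_10(2, 0))  # two heads: (1/3) / (1/4) = 4/3
print(bayes_factor_10(5, 5))  # 5 heads in 10: 1024 / 2772, below 1, favouring M0
```

Averaging the likelihood over the prior is what penalises M1: much of its prior mass sits on values of p that fit balanced data poorly.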

  17. Example • Consider the crossing data of Bateson and Punnett in which we want to estimate the recombination fraction • I will use a beta prior for the recombination fraction with parameters 3 and 7

  18. Conditional on the total sample size (381), the likelihood function is multinomial • We get the following posterior distribution [Figure: posterior distribution of the recombination fraction] • Posterior mean = 0.134, posterior mode = 0.13, 95% equal-tailed posterior interval (ETPI) = 0.10–0.16 • Comparing the model to one in which r = 0.5 gives a BF of 3.9

  19. Bayesian inference and the notion of shrinkage • The notion of shrinkage is that you can obtain better estimates by assuming a certain degree of similarity among the things you want to estimate, and a lack of complexity • Practically, this means three things • Borrowing information across observations • Penalising inferences that are very different from anything else • Penalising more complex models • The notion of shrinkage is implicit in the use of priors in Bayesian statistics • There are also forms of frequentist inference where shrinkage is used • But NOT MLE
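Shrinkage can be sketched with the Poisson counts 12, 22, 14 and 8. Each count gets its own rate, but with a Gamma prior centred on the pooled mean. This is an empirical-Bayes choice, assumed here for illustration with a hypothetical prior weight b = 1, not a method given in the slides. The posterior means are pulled towards 14, and the largest adjustments go to the counts furthest from it.

```python
counts = [12, 22, 14, 8]
pooled_mean = sum(counts) / len(counts)  # 14.0: information borrowed across observations

# Hypothetical Gamma(shape = a, rate = b) prior on each count's own rate, with mean
# a / b equal to the pooled mean; b behaves like b "prior observations".
b = 1.0
a = pooled_mean * b

mle = counts  # each rate estimated from its own count alone
# With one Poisson observation x, the posterior is Gamma(a + x, b + 1).
shrunk = [(a + x) / (b + 1) for x in counts]
print(mle, shrunk)
```

Larger values of b express stronger assumed similarity and pull the estimates closer to 14. As b approaches 0, the estimates revert to the MLEs.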
