## Fundamentals of Bayesian Inference

**Brief Introduction of Your Lecturer**
• I am working at the Psychological Methods Group at the University of Amsterdam.
• For the past 10 years or so, one of my main research interests has been Bayesian statistics.
• I have been promoting Bayesian inference in psychology, mainly through a series of articles, workshops, and one book.

**The Bayesian Book**
• …is a course book used at UvA and UCI.
• …will appear in print soon.
• …is freely available at http://www.bayesmodels.com (well, the first part is freely available)

**Bayesian Modeling for Cognitive Science: A WinBUGS Workshop**
http://bayescourse.socsci.uva.nl/
August 12 - August 16, 2013
University of Amsterdam

**Why Do Bayesian Modeling**
• It is fun.
• It is cool.
• It is easy.
• It is principled.
• It is superior.
• It is useful.
• It is flexible.

**Our Goals This Afternoon Are…**
• To discuss some of the fundamentals of Bayesian inference.
• To make you think critically about statistical analyses that you have always taken for granted.
• To present clear practical and theoretical advantages of the Bayesian paradigm.

**Prelude**
Eric-Jan Wagenmakers

**Three Schools of Statistical Inference**
• Neyman-Pearson: α-level, power calculations, two hypotheses, guide for action (i.e., what to do).
• Fisher: p-values, one hypothesis (i.e., H0), quantifies evidence against H0.
• Bayes: prior and posterior distributions, attaches probabilities to parameters and hypotheses.

**A Freudian Analogy**
• Neyman-Pearson: The Superego.
• Fisher: The Ego.
• Bayes: The Id.
Claim: What Id really wants is to attach probabilities to hypotheses and parameters. This wish is suppressed by the Superego and the Ego. The result is unconscious internal conflict.

**Internal Conflict Causes Misinterpretations**
• p < .05 means that H0 is unlikely to be true, and can be rejected.
• p > .10 means that H0 is likely to be true.
• For a given parameter μ, a 95% confidence interval from, say, a to b means that there is a 95% chance that μ lies in between a and b.

**Two Ways to Resolve the Internal Conflict**
• Strengthen Superego and Ego by teaching the standard statistical methodology more rigorously. Suppress Id even more!
• Give Id what it wants.

**Sentenced by p-value**
The Unfortunate Case of Sally Clark

**The Case of Sally Clark**
• Sally Clark had two children die of SIDS.
• The chances of this happening are perhaps as small as 1 in 73 million: 1/8543 × 1/8543.
• Can we reject the null hypothesis that Sally Clark is innocent, and send her to jail?
• Yes, according to an expert for the prosecution, Prof. Meadow.

**Prof. Roy Meadow, British Paediatrician**
• “Meadow attributed many unexplained infant deaths to the disorder or condition in mothers called Münchausen Syndrome by Proxy.”
• “According to this diagnosis some parents, especially mothers, harm or even kill their children as a means of calling attention to themselves.” (Wikipedia)

**Meadow’s Law**
“One cot death is a tragedy, two cot deaths is suspicious and, until the contrary is proved, three cot deaths is murder.”

**The Outcome**
• In November 1999, Sally Clark was convicted of murdering both babies by a majority of 10 to 2 and sent to jail.

**The Outcome**
• Note the similarity to p-value hypothesis testing. A very rare event occurred, prompting the law system to reject the null hypothesis (“Sally is innocent”) and send Sally to jail.

**Critique**
• The focus is entirely on the low probability of the deaths arising from SIDS.
• But what of the probability of the deaths arising from murder? Isn’t this probability just as relevant? How likely is it that a mother murders her two children?

**2002 Royal Statistical Society Open Letter**
“The jury needs to weigh up two competing explanations for the babies’ deaths: SIDS or murder. The fact that two deaths by SIDS is quite unlikely is, taken alone, of little value.
Two deaths by murder may well be even more unlikely. What matters is the relative likelihood of the deaths under each explanation, not just how unlikely they are under one explanation.”
President Peter Green to the Lord Chancellor

**What is the p-value?**
“The probability of obtaining a test statistic at least as extreme as the one you observed, given that the null hypothesis is true.”

**The Logic of p-Values**
• The p-value only considers how rare the observed data are under H0.
• The fact that the observed data may also be rare under H1 does not enter consideration.
• Hence, the logic of p-values has the same flaw as the logic that led to the sentencing of Sally Clark.

**Adjusted Open Letter**
“Researchers need to weigh up two competing explanations for the data: H0 or H1. The fact that data are quite unlikely under H0 is, taken alone, of little value. The data may well be even more unlikely under H1. What matters is the relative likelihood of the data under each model, not just how unlikely they are under one model.”

**What is Bayesian Inference?**
“Common sense expressed in numbers”

**What is Bayesian Inference?**
“The only statistical procedure that is coherent, meaning that it avoids statements that are internally inconsistent.”

**What is Bayesian Statistics?**
“The only good statistics”
[For more background see Lindley, D. V. (2000). The philosophy of statistics. The Statistician, 49, 293-337.]

**Outline**
• Bayes in a Nutshell
• The Bayesian Revolution
• Hypothesis Testing

**Bayesian Inference in a Nutshell**
• In Bayesian inference, uncertainty or degree of belief is quantified by probability.
• Prior beliefs are updated by means of the data to yield posterior beliefs.

**Bayesian Parameter Estimation: Example**
• We prepare for you a series of 10 factual questions of equal difficulty.
• You answer 9 out of 10 questions correctly.
• What is your latent probability θ of answering any one question correctly?

**Bayesian Parameter Estimation: Example**
• We start with a prior distribution for θ. This reflects all we know about θ prior to the experiment. Here we make a standard choice and assume that all values of θ are equally likely a priori.

**Bayesian Parameter Estimation: Example**
• We then update the prior distribution by means of the data (technically, the likelihood) to arrive at a posterior distribution.
• The posterior distribution is a compromise between what we knew before the experiment and what we have learned from the experiment. The posterior distribution reflects all that we know about θ.

**Mode = 0.9**
95% credible interval: (0.59, 0.98)

**The Inevitability of Probability**
• Why would one measure “degree of belief” by means of probability? Couldn’t we choose something else that makes sense?
• Yes, perhaps we can, but the choice of probability is anything but ad hoc.

**The Inevitability of Probability**
• Assume “degree of belief” can be measured by a single number.
• Assume you are rational, that is, not self-contradictory or “obviously silly”.
• Then degree of belief can be shown to follow the same rules as the probability calculus.

**The Inevitability of Probability**
• For instance, a rational agent would not hold intransitive beliefs, such as:

**The Inevitability of Probability**
• When you use a single number to measure uncertainty or quantify evidence, and these numbers do not follow the rules of probability calculus, you can (almost certainly?) be shown to be silly or incoherent.
• One of the theoretical attractions of the Bayesian paradigm is that it ensures coherence right from the start.

**Coherence I**
• Coherence is also key in de Finetti’s conceptualization of probability.

**Coherence II**
• One aspect of coherence is that “today’s posterior is tomorrow’s prior”.
• Suppose we have exchangeable (iid) data x = {x1, x2}.
Now we can update our prior using x, using first x1 and then x2, or using first x2 and then x1.
• All three procedures will result in exactly the same posterior distribution.

**Coherence III**
• Assume we have three models: M1, M2, M3.
• After seeing the data, suppose that M1 is 3 times more plausible than M2, and M2 is 4 times more plausible than M3.
• By transitivity, M1 is 3 × 4 = 12 times more plausible than M3.

**Outline**
• Bayes in a Nutshell
• The Bayesian Revolution
• Hypothesis Testing

**The Bayesian Revolution**
• Until about 1990, Bayesian statistics could only be applied to a select subset of very simple models.
• Only recently has Bayesian statistics undergone a transformation: with current numerical techniques, Bayesian models are “limited only by the user’s imagination.”

**Why Bayes is Now Popular**
Markov chain Monte Carlo!

**Markov Chain Monte Carlo**
• Instead of calculating the posterior analytically, numerical techniques such as MCMC approximate the posterior by drawing samples from it.
• Consider again our earlier example…
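
As a rough illustration of the sampling idea on this slide, the sketch below draws from the posterior for θ with a random-walk Metropolis sampler. It assumes the earlier example means the quiz with 9 correct answers out of 10 and a uniform prior. The sampler, the function names (`log_posterior`, `metropolis`), and the settings (step size, burn-in, seed) are illustrative choices for this sketch, not the workshop's WinBUGS code.

```python
import math
import random


def log_posterior(theta, k=9, n=10):
    """Unnormalized log posterior for a binomial rate under a uniform prior."""
    if theta <= 0.0 or theta >= 1.0:
        return -math.inf  # zero prior mass outside (0, 1)
    return k * math.log(theta) + (n - k) * math.log(1.0 - theta)


def metropolis(n_samples=100_000, burn_in=5_000, step=0.2, seed=1):
    """Random-walk Metropolis sampler; all settings are illustrative."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta)
    samples = []
    for i in range(n_samples + burn_in):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        if i >= burn_in:
            samples.append(theta)
    return samples


samples = sorted(metropolis())
mean = sum(samples) / len(samples)
lower = samples[int(0.025 * len(samples))]  # approximate 2.5% quantile
upper = samples[int(0.975 * len(samples))]  # approximate 97.5% quantile
print(f"posterior mean ~ {mean:.3f}")
print(f"central 95% interval ~ ({lower:.2f}, {upper:.2f})")
```

Under these assumptions the exact posterior is Beta(10, 2), with mean 10/12 ≈ 0.83, so the sampled summaries should land close to that mean and to the interval (0.59, 0.98) quoted on the earlier slide. The sampler only ever evaluates the unnormalized posterior, which is why the same recipe carries over to models where no analytical posterior is available.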