Bayesian Statistics on a Shoestring

This lecture introduces the concept of Bayesian statistics and explores its applications and implications. It covers the basic principle of Bayesian estimation, advantages and drawbacks of Bayesian methods, and the use of conjugate priors.

Presentation Transcript


  1. Stat 391 – Lecture 12 Bayesian Statistics on a Shoestring Assaf Oron, May 2008

  2. Bayes’ Rule – and “Bayesians” • Bayes lived and proved his rule a long time ago • The rule, and the updating principle associated with it, belong to all branches of statistics • The term “Bayesian statistics” is modern. Depending upon whom you ask, it may represent: • A perspective and toolset, which are useful for many tasks; • The only way to do statistics intelligently; • …An irrational cult! • (it’s somewhat of a generational gap right now) • …I will try to present Bayesian statistics via the first description above: a perspective and toolset useful for many tasks

  3. The Basic Principle • Recall the trick we used a few weeks ago: calling the density the “likelihood” and viewing it as a function of the parameters, with the data held fixed • Recall also, more recently, the awkward jargon used to describe confidence intervals • These somewhat inelegant fixes can be traced back to an asymmetry: • The data are modeled as following some probability distribution • The parameters are modeled as fixed, though usually unknown • What if we decided that the parameters are random, too?...

  4. The Basic Principle (2) • Let’s view the data as an r.v. called X • The parameters are, of course, θ • Write down Bayes’ rule, using densities: f(θ | x) = f(x | θ) f(θ) / f(x) • Here f(x | θ) is the ‘regular’ (“frequentist”) likelihood of the data given fixed parameter values • f(θ) is the ‘prior’ density of the parameters (based on previous knowledge, usually unrelated to the current data) • f(x), the marginal probability of the data over all possible parameter configurations, is not a function of θ and is irrelevant for estimation

  5. The Basic Principle (3) • …the Bayesian way of writing Bayes’ rule is usually this: f(θ | x) ∝ f(x | θ) f(θ) • The left-hand side is the posterior distribution of the parameters, based on the data; f(θ) is the prior distribution of the parameters, before the data • (Since we omitted the marginal probability of the data, the equation becomes a proportionality; we don’t care, because we know the LHS is a density, so we can “find” the missing factor automatically by normalizing the LHS to integrate to 1)
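  As a concrete illustration of that normalization step, here is a minimal numerical sketch (not part of the original slides; the coin-flip data and the flat prior are arbitrary choices): evaluate likelihood × prior on a grid of θ values, then rescale so the result integrates to 1.

      # Illustrative sketch: recovering the "missing factor" numerically
      import numpy as np

      n, x = 20, 7                                  # hypothetical data: 7 successes in 20 trials
      theta = np.linspace(0.001, 0.999, 999)        # grid of parameter values
      dtheta = theta[1] - theta[0]

      prior = np.ones_like(theta)                   # flat prior over (0, 1), chosen for simplicity
      likelihood = theta**x * (1 - theta)**(n - x)  # Binomial kernel (constants dropped)

      unnormalized = likelihood * prior             # posterior, up to a constant
      posterior = unnormalized / (unnormalized.sum() * dtheta)

      print((posterior * dtheta).sum())             # ~1.0: it is now a proper density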

  6. Bayesian Estimation • Bayesian estimation is based primarily on probability calculations from the posterior • The most common Bayesian point estimates are the posterior mean (i.e., E[θ|x]), median, or mode • These can be framed as solutions to different loss-minimization problems: the mean minimizes expected squared-error loss, the median minimizes expected absolute-error loss, and the mode corresponds to a 0-1 loss (in the limit)
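  A short sketch of these point estimates (illustrative numbers, not from the lecture), using a Beta posterior so all three quantities are available in closed form:

      # Posterior mean, median and mode for a hypothetical Beta(8, 14) posterior
      from scipy.stats import beta

      a, b = 8, 14
      post = beta(a, b)

      post_mean = post.mean()              # E[θ | x]
      post_median = post.median()
      post_mode = (a - 1) / (a + b - 2)    # valid for a, b > 1

      print(post_mean, post_median, post_mode)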

  7. A Brief History of Bayesianism • The Bayesian idea has been around for a while, but sat mostly on the shelf for practical reasons: • If you take any two arbitrary distributions for data and prior, you will end up with an intractably complicated posterior • (for each “common” data distribution, there exists at least one family of priors that combines with it cleanly; it is known as the “conjugate prior”) • With the advent of computing, a statistical-simulation technique known as MCMC (“Markov Chain Monte Carlo”) has made (nearly) any combination of distributions possible to compute, sometimes instantly
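  To make the MCMC idea concrete, here is a minimal random-walk Metropolis sketch (illustrative only, not code from the lecture; the data and the non-conjugate prior are made up). It draws samples whose long-run distribution is the posterior, without ever computing the normalizing constant.

      # Minimal random-walk Metropolis sampler for a Binomial success probability p
      import numpy as np

      rng = np.random.default_rng(0)
      n, x = 20, 7                                   # hypothetical data

      def log_post(p):
          # Un-normalized log-posterior: Binomial log-likelihood + a made-up, non-conjugate log-prior
          if not 0 < p < 1:
              return -np.inf
          log_lik = x * np.log(p) + (n - x) * np.log(1 - p)
          log_prior = -((p - 0.5) ** 2) / 0.02
          return log_lik + log_prior

      p, samples = 0.5, []
      for _ in range(20000):
          proposal = p + rng.normal(scale=0.1)       # symmetric random-walk proposal
          if np.log(rng.uniform()) < log_post(proposal) - log_post(p):
              p = proposal                           # accept; otherwise keep the current value
          samples.append(p)

      print(np.mean(samples[5000:]))                 # posterior-mean estimate after burn-in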

  8. Conjugate Prior Hands-on • The conjugate prior for the Binomial is the Beta • That is: X ~ Binomial(n,p) and p ~ Beta(α,β) should match nicely • Write out the kernel of the posterior (i.e., the essential form – only terms with x or p in them): • Simplify this a bit further; can you recognize the form of the posterior?
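  For reference, one way to work the exercise above (skip this if you want to derive it yourself): keeping only factors that involve p,
  f(p | x) ∝ p^x (1 - p)^(n - x) · p^(α - 1) (1 - p)^(β - 1) = p^(x + α - 1) (1 - p)^(n - x + β - 1),
  which is the kernel of a Beta(α + x, β + n - x) density. So the posterior is again a Beta; the prior’s parameters are simply updated by the counts of successes and failures.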

  9. Advantages of Bayesian Methods • A symmetry that is conceptually attractive • Can incorporate prior subject-matter information (from scientists, etc.) that should play a role in evaluating the data • Hypothesis tests, model selection, and interval estimation become easier • Risk of using the wrong model (“model misspecification”) can be reduced • More complete information about the parameters

  10. Advantages of Bayesian Methods (2) • Avoids some of the counter-intuitive side-effects of MLE calculations • Ability to fit complicated models, estimate complicated parameters, and account for errors in “fixed” values • In many cases, a random interpretation fits the parameters better than a fixed one: • Opinion polls and human behavior • Ecology, demographics (come to think of it, natural populations are never really fixed)

  11. Drawbacks of Bayesian Methods • Symmetry? Not really • “It’s Tortoises all the way down”: the prior needs parameters too… and those had better be fixed (or you need yet another prior for them), which is exactly the problem we set out to avoid • The prior affects our estimation, whether or not it is really based on expert knowledge • A workaround known as “flat” or “improper” priors has made things worse in many ways: if you use them, you may find yourself without a proper posterior distribution at all
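  One standard illustration of that last point (not from the slides): with Binomial data, the improper Haldane prior, proportional to p^(-1) (1 - p)^(-1), gives a posterior kernel p^(x - 1) (1 - p)^(n - x - 1). If every one of the n trials fails (x = 0), that kernel is not integrable near p = 0, so there is no posterior distribution at all; the same happens when every trial succeeds.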

  12. Drawbacks of Bayesian Methods (2) • The choice of prior form and details adds yet another arbitrary element to the tenuous connection between model and reality • MCMC simulations have a lot of “moving parts” and are not trivial to diagnose for problems • Socially, the approach carries “hype” and dogmatic “group-think” overtones that are not helpful • In many cases, a random interpretation of the parameters is not appropriate
