This text explores the fundamental principles of Bayesian inference, including Bayes' Theorem, the importance of prior beliefs, and the role of likelihood in updating knowledge. It discusses the concept of conjugate priors, the dangers of flat priors, and how to formulate Bayesian models effectively. The piece also highlights practical examples, such as the binomial distribution with beta priors and posterior density applications. Further topics include joint posterior estimation methods, the significance of Bayesian models in scientific methodology, and the relevance of hierarchical models in data analysis.
§2 An Introduction to Bayesian Inference • Robert J. Tempelman
Bayes' Theorem • Recall the basic axiom of probability: • f(θ,y) = f(y|θ) f(θ) • Also • f(θ,y) = f(θ|y) f(y) • Combine both expressions to get: f(θ|y) = f(y|θ) f(θ) / f(y), or f(θ|y) ∝ f(y|θ) f(θ): Posterior ∝ Likelihood × Prior
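As a brief restatement (the normalizing constant is left implicit on the slide), the denominator f(y) is the marginal likelihood:

f(\theta \mid y) \;=\; \frac{f(y \mid \theta)\, f(\theta)}{f(y)},
\qquad
f(y) \;=\; \int f(y \mid \theta)\, f(\theta)\, d\theta

Since f(y) does not depend on θ, the posterior is proportional to likelihood × prior.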
Prior densities/distributions • What can we specify for f(θ)? • Anything that reflects our prior beliefs. • Common choice: a “conjugate” prior. • f(θ) is chosen such that the posterior f(θ|y) is recognizable and of the same form as the prior. • “Flat” prior: f(θ) ∝ constant. Then f(θ|y) ∝ f(y|θ). • Flat priors can be dangerous…they can lead to an improper posterior; i.e., ∫ f(θ|y) dθ = ∞.
Prior information / Objective? • Introducing prior information may somewhat “bias” the sample information; nevertheless, ignoring existing prior information is inconsistent with • 1) rational human behavior • 2) the nature of the scientific method. • Memory property: past inference (the posterior) can be used as an updated prior in future inference. • Even so, many applied Bayesian data analysts try to be as “objective” as possible by using diffuse (e.g., flat) priors.
Example of a conjugate prior • Recall the binomial distribution: f(y|p) = (n choose y) p^y (1−p)^(n−y), y = 0, 1, …, n • Suppose we express prior belief on p using a beta distribution: f(p) = [Γ(a+b) / (Γ(a)Γ(b))] p^(a−1) (1−p)^(b−1), 0 < p < 1 • Denoted as Beta(a,b)
Examples of different beta densities (figure) • The Beta(1,1) prior is a diffuse (flat) bounded prior (but it is proper since it is bounded!)
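As an illustrative sketch (not part of the original slides), beta densities like those in the figure can be evaluated in SAS with the PDF function; the (a,b) pairs below simply mirror the priors used on the next slides, and the dataset and variable names are made up.

data beta_priors;
   do p = 0.005 to 0.995 by 0.005;
      flat        = pdf('BETA', p, 1, 1);    /* Beta(1,1): flat but proper over (0,1) */
      optimistic  = pdf('BETA', p, 9, 1);    /* Beta(9,1): prior mass concentrated near p = 1 */
      pessimistic = pdf('BETA', p, 2, 18);   /* Beta(2,18): prior mass concentrated near p = 0 */
      output;
   end;
run;

proc sgplot data=beta_priors;
   series x=p y=flat;
   series x=p y=optimistic;
   series x=p y=pessimistic;
   yaxis label="prior density";
run;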
Posterior density of p • Posterior ∝ Likelihood × Prior: f(p|y) ∝ [p^y (1−p)^(n−y)] × [p^(a−1) (1−p)^(b−1)] = p^(y+a−1) (1−p)^(n−y+b−1) • i.e., p|y ~ Beta(y+a, n−y+b) • The beta is conjugate to the binomial
Suppose we observe data • y = 10, n = 15 • Consider three alternative priors: Beta(1,1), Beta(9,1), Beta(2,18) • Each yields a posterior of the form Beta(y+a, n−y+b) • (Figure: posterior densities under the three priors)
Suppose we observed a larger dataset • y = 100, n = 150 • Consider the same alternative priors: Beta(1,1), Beta(9,1), Beta(2,18) • (Figure: the corresponding posterior densities, again Beta(y+a, n−y+b))
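A small sketch (not from the original slides; dataset and variable names are made up) that computes the posterior Beta(y+a, n−y+b) parameters and means for all three priors under both datasets; it makes the usual point that the likelihood increasingly dominates the prior as n grows.

data beta_binomial_posteriors;
   input y n a b;
   post_a    = y + a;                      /* posterior shape parameter 1 */
   post_b    = n - y + b;                  /* posterior shape parameter 2 */
   post_mean = post_a / (post_a + post_b); /* posterior mean of p */
   datalines;
10  15  1  1
10  15  9  1
10  15  2 18
100 150 1  1
100 150 9  1
100 150 2 18
;

proc print data=beta_binomial_posteriors;
   var y n a b post_a post_b post_mean;
run;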
Posterior information • Given log f(θ|y) = log f(y|θ) + log f(θ) + constant: • Posterior information = likelihood information + prior information (the negative second derivatives of the log densities add). • One option for a point estimate: the joint posterior mode of θ, obtained by Newton-Raphson. • Also called the MAP (maximum a posteriori) estimate of θ.
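As a sketch of the update that the following slides implement, Newton-Raphson applied to the log posterior L(θ) = log f(y|θ) + log f(θ) iterates

\theta^{(t+1)} \;=\; \theta^{(t)} \;-\; \frac{L'\!\left(\theta^{(t)}\right)}{L''\!\left(\theta^{(t)}\right)},
\qquad
\widehat{\operatorname{var}}(\theta \mid y) \;\approx\; \left[-L''\!\left(\hat{\theta}\right)\right]^{-1},

where the second expression is the standard large-sample (normal) approximation to the posterior variance evaluated at the mode; it corresponds to asyvar in the SAS program below.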
Recall the plant genetic linkage example • Recall the likelihood: f(y|θ) ∝ (2+θ)^y1 (1−θ)^(y2+y3) θ^y4 • Suppose θ ~ Beta(a,b), i.e., f(θ) ∝ θ^(a−1) (1−θ)^(b−1) • Then f(θ|y) ∝ (2+θ)^y1 (1−θ)^(y2+y3+b−1) θ^(y4+a−1) • Almost as if you increased the number of plants in genotypes 2 and 3 by b−1…and in genotype 4 by a−1.
Plant linkage example cont’d. Suppose a = 50 and b = 500 (alpha and beta in the program below):

data newton;
   y1 = 1997; y2 = 906; y3 = 904; y4 = 32;   /* observed genotype counts */
   alpha = 50; beta = 500;                    /* Beta(a,b) prior parameters */
   theta = 0.01;                              /* try starting value of 0.50 too */
   do iterate = 1 to 10;
      /* log posterior (up to an additive constant) and its first two derivatives */
      logpost  = y1*log(2+theta) + (y2+y3+beta-1)*log(1-theta) + (y4+alpha-1)*log(theta);
      firstder = y1/(2+theta) - (y2+y3+beta-1)/(1-theta) + (y4+alpha-1)/theta;
      secndder = -y1/(2+theta)**2 - (y2+y3+beta-1)/(1-theta)**2 - (y4+alpha-1)/theta**2;
      theta = theta + firstder/(-secndder);   /* Newton-Raphson update */
      output;
   end;
   asyvar  = 1/(-secndder);   /* asymptotic variance of theta_hat at convergence */
   poststd = sqrt(asyvar);    /* posterior standard error */
   call symputx("poststd", poststd);
   output;
run;

title "Posterior Standard Error = &poststd";
proc print;
   var iterate theta logpost;
run;
Output: Posterior Standard Error = 0.0057929339
Additional elements of Bayesian inference • Suppose that θ can be partitioned into two components, a p×1 vector θ1 and a q×1 vector θ2, so that θ = (θ1′, θ2′)′. • If we want to make probability statements about θ, we use probability calculus on the joint posterior f(θ1, θ2 | y). • There is NO repeated sampling concept. • We condition on the one observed dataset. • However, Bayes estimators typically do have very good frequentist properties!
Marginal vs. conditional inference • Suppose you’re primarily interested in θ1: base inference on the marginal posterior f(θ1|y) = ∫ f(θ1, θ2 | y) dθ2 • i.e., average over the uncertainty on θ2 (the nuisance parameters) • Of course, if θ2 were known, you would condition your inference about θ1 on it accordingly: f(θ1 | θ2, y)
Two-stage model example • Given y = (y1, …, yn)′ with yi ~ NIID(μ, σ²), where σ² is known. We wish to infer μ. From Bayes’ theorem: f(μ|y) ∝ f(y|μ) f(μ). • Suppose μ ~ N(μ0, σμ²), i.e., f(μ) ∝ exp[ −(μ − μ0)² / (2σμ²) ].
Posterior density • Consider the following limit: σμ² → ∞, so that the prior precision 1/σμ² → 0. • This is consistent with a flat prior, f(μ) ∝ constant, or equivalently a prior that contributes no information.
Interpretation of Posterior Density with Flat Prior • So f(μ|y) ∝ f(y|μ) ∝ exp[ −Σi (yi − μ)² / (2σ²) ] • Then, viewed as a function of μ, this is the kernel of a normal density centered at the sample mean ȳ • i.e., μ|y ~ N(ȳ, σ²/n)
Posterior density with informative prior • Now f(μ|y) ∝ exp[ −Σi (yi − μ)² / (2σ²) ] × exp[ −(μ − μ0)² / (2σμ²) ]. After algebraic simplification (completing the square in μ): μ|y ~ N(μ̂, v), with μ̂ = [ (n/σ²) ȳ + (1/σμ²) μ0 ] / ( n/σ² + 1/σμ² ) and v = 1 / ( n/σ² + 1/σμ² )
Note that 1/v = 1/σμ² + n/σ²: Posterior precision = prior precision + sample (likelihood) precision; i.e., the posterior mean μ̂ is a precision-weighted average of the data mean ȳ and the prior mean μ0
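A small numerical sketch (not from the slides; the values of ȳ, n, σ², μ0, and the prior variance are made up) showing the precision-weighted update:

data normal_posterior;
   ybar = 12.0;  n = 25;  sigma2 = 4.0;   /* hypothetical data summary: sample mean, size, known variance */
   mu0  = 10.0;  priorvar = 1.0;          /* hypothetical N(mu0, priorvar) prior on mu */
   prior_prec  = 1 / priorvar;
   sample_prec = n / sigma2;
   post_var  = 1 / (prior_prec + sample_prec);                  /* posterior precision = sum of precisions */
   post_mean = post_var * (prior_prec*mu0 + sample_prec*ybar);  /* precision-weighted average */
run;

proc print data=normal_posterior;
   var post_mean post_var;
run;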
Hierarchical models • Given data y depending on parameters θ: • Two stage: likelihood f(y|θ) plus a prior p(θ) whose hyperparameters are known. • Three stage: likelihood f(y|θ), a structural prior p(θ|μ, τ²), and hyperpriors on the unknown μ and τ² (as in the fully Bayesian model below). • What’s the difference? When do you consider one over another?
Simple hierarchical model • Random effects model: Yij = μ + ai + eij, where μ is the overall mean, ai ~ NIID(0, τ²), and eij ~ NIID(0, σ²). • Suppose we knew μ, σ², and τ². Then, with ni records on level i, E(ai | y, μ, σ², τ²) = [ τ² / (τ² + σ²/ni) ] (ȳi − μ), where the bracketed term is the shrinkage factor.
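A sketch (not from the slides; the parameter values, group means, and counts are hypothetical) of how the shrinkage factor pulls the estimated group effects toward zero:

data shrinkage;
   mu = 50;  sigma2 = 16;  tau2 = 4;      /* hypothetical known overall mean and variance components */
   input group ni ybar_i;
   shrink = tau2 / (tau2 + sigma2/ni);    /* shrinkage factor */
   a_hat  = shrink * (ybar_i - mu);       /* conditional posterior mean of a_i */
   datalines;
1  2 58
2 10 58
3 50 58
;

proc print data=shrinkage;
   var group ni shrink a_hat;
run;

With the same observed group mean, the estimated effect is shrunk toward zero less as ni grows, since the data then carry more information relative to the between-group prior.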
What if we don’t know μ, σ², or τ²? • Option 1: Estimate them (e.g., by the method of moments) and then “plug them in”. • Not truly Bayesian. • This is Empirical Bayes (EB) (next section). • Most of us using PROC MIXED/GLIMMIX are EB!
A truly Bayesian approach • 1) Yij | θi ~ N(θi, σ²) for all i, j • 2) θ1, θ2, …, θk are iid N(μ, τ²): structural prior (exchangeable entities) • 3) μ ~ p(μ); τ² ~ p(τ²); σ² ~ p(σ²): subjective priors • Fully Bayesian inference (next section after that!)
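Putting the three stages together, the joint posterior that fully Bayesian inference works with is (a standard restatement; it is not written out on the slide)

p\!\left(\boldsymbol{\theta}, \mu, \tau^{2}, \sigma^{2} \mid \mathbf{y}\right)
\;\propto\;
\underbrace{\prod_{i=1}^{k}\prod_{j=1}^{n_i} f\!\left(y_{ij} \mid \theta_i, \sigma^{2}\right)}_{\text{stage 1: likelihood}}
\;\underbrace{\prod_{i=1}^{k} p\!\left(\theta_i \mid \mu, \tau^{2}\right)}_{\text{stage 2: structural prior}}
\;\underbrace{p(\mu)\, p\!\left(\tau^{2}\right)\, p\!\left(\sigma^{2}\right)}_{\text{stage 3: subjective priors}}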