This text explores the fundamental principles of Bayesian inference, including Bayes' Theorem, the importance of prior beliefs, and the role of likelihood in updating knowledge. It discusses the concept of conjugate priors, the dangers of flat priors, and how to formulate Bayesian models effectively. The piece also highlights practical examples, such as the binomial distribution with beta priors and posterior density applications. Further topics include joint posterior estimation methods, the significance of Bayesian models in scientific methodology, and the relevance of hierarchical models in data analysis.
§2 An Introduction to Bayesian Inference • Robert J. Tempelman
Bayes' Theorem • Recall the basic axiom of probability: • f(θ,y) = f(y|θ) f(θ) • Also • f(θ,y) = f(θ|y) f(y) • Combine both expressions to get: f(θ|y) = f(y|θ) f(θ) / f(y), or f(θ|y) ∝ f(y|θ) f(θ): Posterior ∝ Likelihood × Prior
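As a brief restatement (the normalizing constant is left implicit on the slide), the denominator f(y) is the marginal likelihood:

f(\theta \mid y) \;=\; \frac{f(y \mid \theta)\, f(\theta)}{f(y)},
\qquad
f(y) \;=\; \int f(y \mid \theta)\, f(\theta)\, d\theta

Since f(y) does not depend on θ, the posterior is proportional to likelihood × prior.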
Prior densities/distributions • What can we specify for f(θ)? • Anything that reflects our prior beliefs. • Common choice: a “conjugate” prior. • f(θ) is chosen such that the posterior f(θ|y) is recognizable and of the same form as the prior. • “Flat” prior: f(θ) ∝ constant. Then f(θ|y) ∝ f(y|θ). • Flat priors can be dangerous…they can lead to an improper posterior; i.e., ∫ f(θ|y) dθ = ∞.
Prior information / Objective? • Introducing prior information may somewhat “bias” the sample information; nevertheless, ignoring existing prior information is inconsistent with • 1) rational human behavior • 2) the nature of the scientific method. • Memory property: past inference (the posterior) can be used as an updated prior in future inference. • Even so, many applied Bayesian data analysts try to be as “objective” as possible by using diffuse (e.g., flat) priors.
Example of a conjugate prior • Recall the binomial distribution: f(y|p) = (n choose y) p^y (1−p)^(n−y), y = 0, 1, …, n • Suppose we express prior belief on p using a beta distribution: f(p) = [Γ(a+b) / (Γ(a)Γ(b))] p^(a−1) (1−p)^(b−1), 0 < p < 1 • Denoted as Beta(a,b)
Examples of different beta densities (figure) • The Beta(1,1) prior is a diffuse (flat) bounded prior (but it is proper since it is bounded!)
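As an illustrative sketch (not part of the original slides), beta densities like those in the figure can be evaluated in SAS with the PDF function; the (a,b) pairs below simply mirror the priors used on the next slides, and the dataset and variable names are made up.

data beta_priors;
   do p = 0.005 to 0.995 by 0.005;
      flat        = pdf('BETA', p, 1, 1);    /* Beta(1,1): flat but proper over (0,1) */
      optimistic  = pdf('BETA', p, 9, 1);    /* Beta(9,1): prior mass concentrated near p = 1 */
      pessimistic = pdf('BETA', p, 2, 18);   /* Beta(2,18): prior mass concentrated near p = 0 */
      output;
   end;
run;

proc sgplot data=beta_priors;
   series x=p y=flat;
   series x=p y=optimistic;
   series x=p y=pessimistic;
   yaxis label="prior density";
run;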
Posterior density of p • Posterior ∝ Likelihood × Prior: f(p|y) ∝ [p^y (1−p)^(n−y)] × [p^(a−1) (1−p)^(b−1)] = p^(y+a−1) (1−p)^(n−y+b−1) • i.e., p|y ~ Beta(y+a, n−y+b) • The beta is conjugate to the binomial
Suppose we observe data • y = 10, n = 15 • Consider three alternative priors: Beta(1,1), Beta(9,1), Beta(2,18) • Each yields a posterior of the form Beta(y+a, n−y+b) • (Figure: posterior densities under the three priors)
Suppose we observed a larger dataset • y = 100, n = 150 • Consider the same alternative priors: Beta(1,1), Beta(9,1), Beta(2,18) • (Figure: the corresponding posterior densities, again Beta(y+a, n−y+b))
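A small sketch (not from the original slides; dataset and variable names are made up) that computes the posterior Beta(y+a, n−y+b) parameters and means for all three priors under both datasets; it makes the usual point that the likelihood increasingly dominates the prior as n grows.

data beta_binomial_posteriors;
   input y n a b;
   post_a    = y + a;                      /* posterior shape parameter 1 */
   post_b    = n - y + b;                  /* posterior shape parameter 2 */
   post_mean = post_a / (post_a + post_b); /* posterior mean of p */
   datalines;
10  15  1  1
10  15  9  1
10  15  2 18
100 150 1  1
100 150 9  1
100 150 2 18
;

proc print data=beta_binomial_posteriors;
   var y n a b post_a post_b post_mean;
run;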
Posterior information • Given log f(θ|y) = log f(y|θ) + log f(θ) + constant: • Posterior information = likelihood information + prior information (the negative second derivatives of the log densities add). • One option for a point estimate: the joint posterior mode of θ, obtained by Newton-Raphson. • Also called the MAP (maximum a posteriori) estimate of θ.
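As a sketch of the update that the following slides implement, Newton-Raphson applied to the log posterior L(θ) = log f(y|θ) + log f(θ) iterates

\theta^{(t+1)} \;=\; \theta^{(t)} \;-\; \frac{L'\!\left(\theta^{(t)}\right)}{L''\!\left(\theta^{(t)}\right)},
\qquad
\widehat{\operatorname{var}}(\theta \mid y) \;\approx\; \left[-L''\!\left(\hat{\theta}\right)\right]^{-1},

where the second expression is the standard large-sample (normal) approximation to the posterior variance evaluated at the mode; it corresponds to asyvar in the SAS program below.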
Recall the plant genetic linkage example • Recall the likelihood: f(y|θ) ∝ (2+θ)^y1 (1−θ)^(y2+y3) θ^y4 • Suppose θ ~ Beta(a,b), i.e., f(θ) ∝ θ^(a−1) (1−θ)^(b−1) • Then f(θ|y) ∝ (2+θ)^y1 (1−θ)^(y2+y3+b−1) θ^(y4+a−1) • Almost as if you increased the number of plants in genotypes 2 and 3 by b−1…and in genotype 4 by a−1.
Plant linkage example cont’d. Suppose a = 50 and b = 500 (alpha and beta in the program below):

data newton;
   y1 = 1997; y2 = 906; y3 = 904; y4 = 32;   /* observed genotype counts */
   alpha = 50; beta = 500;                    /* Beta(a,b) prior parameters */
   theta = 0.01;                              /* try starting value of 0.50 too */
   do iterate = 1 to 10;
      /* log posterior (up to an additive constant) and its first two derivatives */
      logpost  = y1*log(2+theta) + (y2+y3+beta-1)*log(1-theta) + (y4+alpha-1)*log(theta);
      firstder = y1/(2+theta) - (y2+y3+beta-1)/(1-theta) + (y4+alpha-1)/theta;
      secndder = -y1/(2+theta)**2 - (y2+y3+beta-1)/(1-theta)**2 - (y4+alpha-1)/theta**2;
      theta = theta + firstder/(-secndder);   /* Newton-Raphson update */
      output;
   end;
   asyvar  = 1/(-secndder);   /* asymptotic variance of theta_hat at convergence */
   poststd = sqrt(asyvar);    /* posterior standard error */
   call symputx("poststd", poststd);
   output;
run;

title "Posterior Standard Error = &poststd";
proc print;
   var iterate theta logpost;
run;
Output: Posterior Standard Error = 0.0057929339
Additional elements of Bayesian inference • Suppose that θ can be partitioned into two components, a p×1 vector θ1 and a q×1 vector θ2, so that θ = (θ1′, θ2′)′. • If we want to make probability statements about θ, we use probability calculus on the joint posterior f(θ1, θ2 | y). • There is NO repeated sampling concept. • We condition on the one observed dataset. • However, Bayes estimators typically do have very good frequentist properties!
Marginal vs. conditional inference • Suppose you’re primarily interested in θ1: base inference on the marginal posterior f(θ1|y) = ∫ f(θ1, θ2 | y) dθ2 • i.e., average over the uncertainty on θ2 (the nuisance parameters) • Of course, if θ2 were known, you would condition your inference about θ1 on it accordingly: f(θ1 | θ2, y)
Two-stage model example • Given y = (y1, …, yn)′ with yi ~ NIID(μ, σ²), where σ² is known. We wish to infer μ. From Bayes’ theorem: f(μ|y) ∝ f(y|μ) f(μ). • Suppose μ ~ N(μ0, σμ²), i.e., f(μ) ∝ exp[ −(μ − μ0)² / (2σμ²) ].
Posterior density • Consider the following limit: σμ² → ∞, so that the prior precision 1/σμ² → 0. • This is consistent with a flat prior, f(μ) ∝ constant, or equivalently a prior that contributes no information.
Interpretation of Posterior Density with Flat Prior • So f(μ|y) ∝ f(y|μ) ∝ exp[ −Σi (yi − μ)² / (2σ²) ] • Then, viewed as a function of μ, this is the kernel of a normal density centered at the sample mean ȳ • i.e., μ|y ~ N(ȳ, σ²/n)
Posterior density with informative prior • Now f(μ|y) ∝ exp[ −Σi (yi − μ)² / (2σ²) ] × exp[ −(μ − μ0)² / (2σμ²) ]. After algebraic simplification (completing the square in μ): μ|y ~ N(μ̂, v), with μ̂ = [ (n/σ²) ȳ + (1/σμ²) μ0 ] / ( n/σ² + 1/σμ² ) and v = 1 / ( n/σ² + 1/σμ² )
Note that 1/v = 1/σμ² + n/σ²: Posterior precision = prior precision + sample (likelihood) precision; i.e., the posterior mean μ̂ is a precision-weighted average of the data mean ȳ and the prior mean μ0
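A small numerical sketch (not from the slides; the values of ȳ, n, σ², μ0, and the prior variance are made up) showing the precision-weighted update:

data normal_posterior;
   ybar = 12.0;  n = 25;  sigma2 = 4.0;   /* hypothetical data summary: sample mean, size, known variance */
   mu0  = 10.0;  priorvar = 1.0;          /* hypothetical N(mu0, priorvar) prior on mu */
   prior_prec  = 1 / priorvar;
   sample_prec = n / sigma2;
   post_var  = 1 / (prior_prec + sample_prec);                  /* posterior precision = sum of precisions */
   post_mean = post_var * (prior_prec*mu0 + sample_prec*ybar);  /* precision-weighted average */
run;

proc print data=normal_posterior;
   var post_mean post_var;
run;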
Hierarchical models • Given data y depending on parameters θ: • Two stage: likelihood f(y|θ) plus a prior p(θ) whose hyperparameters are known. • Three stage: likelihood f(y|θ), a structural prior p(θ|μ, τ²), and hyperpriors on the unknown μ and τ² (as in the fully Bayesian model below). • What’s the difference? When do you consider one over another?
Simple hierarchical model • Random effects model: Yij = μ + ai + eij, where μ is the overall mean, ai ~ NIID(0, τ²), and eij ~ NIID(0, σ²). • Suppose we knew μ, σ², and τ². Then, with ni records on level i, E(ai | y, μ, σ², τ²) = [ τ² / (τ² + σ²/ni) ] (ȳi − μ), where the bracketed term is the shrinkage factor.
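A sketch (not from the slides; the parameter values, group means, and counts are hypothetical) of how the shrinkage factor pulls the estimated group effects toward zero:

data shrinkage;
   mu = 50;  sigma2 = 16;  tau2 = 4;      /* hypothetical known overall mean and variance components */
   input group ni ybar_i;
   shrink = tau2 / (tau2 + sigma2/ni);    /* shrinkage factor */
   a_hat  = shrink * (ybar_i - mu);       /* conditional posterior mean of a_i */
   datalines;
1  2 58
2 10 58
3 50 58
;

proc print data=shrinkage;
   var group ni shrink a_hat;
run;

With the same observed group mean, the estimated effect is shrunk toward zero less as ni grows, since the data then carry more information relative to the between-group prior.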
What if we don’t know μ, σ², or τ²? • Option 1: Estimate them (e.g., by the method of moments) and then “plug them in”. • Not truly Bayesian. • This is Empirical Bayes (EB) (next section). • Most of us using PROC MIXED/GLIMMIX are EB!
A truly Bayesian approach • 1) Yij | θi ~ N(θi, σ²) for all i, j • 2) θ1, θ2, …, θk are iid N(μ, τ²): structural prior (exchangeable entities) • 3) μ ~ p(μ); τ² ~ p(τ²); σ² ~ p(σ²): subjective priors • Fully Bayesian inference (next section after that!)
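Putting the three stages together, the joint posterior that fully Bayesian inference works with is (a standard restatement; it is not written out on the slide)

p\!\left(\boldsymbol{\theta}, \mu, \tau^{2}, \sigma^{2} \mid \mathbf{y}\right)
\;\propto\;
\underbrace{\prod_{i=1}^{k}\prod_{j=1}^{n_i} f\!\left(y_{ij} \mid \theta_i, \sigma^{2}\right)}_{\text{stage 1: likelihood}}
\;\underbrace{\prod_{i=1}^{k} p\!\left(\theta_i \mid \mu, \tau^{2}\right)}_{\text{stage 2: structural prior}}
\;\underbrace{p(\mu)\, p\!\left(\tau^{2}\right)\, p\!\left(\sigma^{2}\right)}_{\text{stage 3: subjective priors}}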