
Review of Bayesian Analysis and Modeling


Presentation Transcript


  1. Review of Bayesian Analysis and Modeling James B. Elsner Department of Geography, Florida State University

  2. Bayesian analysis and modeling using MCMC is good for you because… • It has a solid decision-theoretic framework: output from a probability model can be linked naturally to a utility model. • It is intuitive: it combines the prior distribution (prior beliefs and/or experience) with the likelihood (the experiment) to obtain the posterior distribution (the accumulated information). In this context, knowledge is equivalent to a distribution, and the precision of that knowledge can be quantified by a precision parameter. • It is exact: asymptotic approximations are not used, and the “plug-in” principle is avoided. • Computations are tractable for practically all models. • The focus shifts from model estimation to model correctness.

  3. http://garnet.fsu.edu/~jelsner/www/research.html Anticipating Florida Hurricanes, November 2004.

  4. Frequentist vs. Bayesian
  • Definition of probability. Frequentist: long-run expected frequency in repeated (actual or hypothetical) experiments. Bayesian: relative degree of belief in the state of the world.
  • Point estimate. Frequentist: maximum likelihood estimate. Bayesian: mean, mode, or median of the posterior probability distribution.
  • Confidence intervals of parameters. Frequentist: based on the likelihood ratio test, which is based on the expected probability distribution of the maximum likelihood over many experiments. Bayesian: credible intervals based on the posterior probability distribution.
  • Confidence intervals of non-parameters. Frequentist: based on the likelihood profile, or by resampling from the sampling distribution of the parameter. Bayesian: calculated directly from the distribution of the parameters.
  • Model selection. Frequentist: discard terms that are not significantly different from a (nested) null model at a previously set confidence level. Bayesian: retain terms in models on the argument that processes are not absent simply because they are not statistically significant.
  • Difficulties. Frequentist: confidence intervals are confusing (the range that will contain the true value in a proportion α of repeated experiments); rejection of model terms for non-significance. Bayesian: subjectivity; the need to specify priors.

  5. Why use Bayesian analysis and models? • When you have good (informative) prior information that you want to use (from previous experiments, or reliable information from the literature). Frequentist meta-analysis can also be used in this case, but the Bayesian approach is natural. • When you want to specify the distribution of non-parameters (future values), Bayesian models work well. In particular, if you are going to make a practical decision based on your model (using decision theory), for example picking a management regime that minimizes expected risk or maximizes expected return, having the posterior distribution of all the parameters makes the decision calculations easier. • Complicated models, such as hierarchical models or models with missing or unobserved data, can be fit particularly conveniently using Bayesian methods. Frequentists call these random-effects or mixed models and have their own estimation procedures.

  6. How Gibbs Sampling Works The contours in the plot represent the joint distribution of θ1 and θ2, and the labels (0), (1), etc., denote the simulated values. Note that one iteration of the algorithm is complete after both components are revised. Also notice that each component is revised along the direction of a coordinate axis. This feature can be a source of problems if the two components are correlated, because then the contours get compressed and movements along the coordinate axes tend to produce small moves. [Figure: the Gibbs sampling algorithm in two dimensions, starting from an initial point (0) and completing three iterations in the (θ1, θ2) plane.]
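The alternating component-wise update described above can be sketched in Python for a standard bivariate normal with correlation rho, where both full conditionals are exact univariate normals (the function name, starting point, and settings here are illustrative, not from the original slides):

```python
import random

def gibbs_bivariate_normal(rho, n_iter, seed=42, start=(-3.0, 3.0)):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    One iteration revises theta1 from theta1 | theta2, then theta2 from
    theta2 | theta1; both full conditionals are univariate normals.
    """
    rng = random.Random(seed)
    t1, t2 = start
    cond_sd = (1.0 - rho ** 2) ** 0.5  # sd of each full conditional
    samples = []
    for _ in range(n_iter):
        t1 = rng.gauss(rho * t2, cond_sd)  # revise theta1 | theta2
        t2 = rng.gauss(rho * t1, cond_sd)  # revise theta2 | theta1
        samples.append((t1, t2))
    return samples

samples = gibbs_bivariate_normal(rho=0.6, n_iter=5000)
# Discard early draws as burn-in before summarizing the posterior.
kept = samples[500:]
mean1 = sum(t1 for t1, _ in kept) / len(kept)
```

As rho approaches 1, cond_sd shrinks, so successive draws move in small steps along the axes; this is exactly the compressed-contour problem noted on the slide.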

  7. After the model has converged, samples from the conditional distributions are used to summarize the posterior distribution of the parameters of interest. Convergence refers to the idea that the Gibbs samples eventually reach a stationary distribution. Two questions arise: 1) How many samples are needed before convergence? (The samples discarded before convergence are called the “burn-in”.) 2) After convergence, how many samples are needed to summarize the posterior distribution? The answers vary from model to model, and diagnostics are used to examine the samples for convergence. Possible convergence problems: • The assumed model may not be realistic from a substantive point of view, or may not fit the data. • Errors in calculation or programming. Often simple syntax mistakes are responsible; however, it is also possible that the algorithm does not converge to a proper distribution. • Slow convergence: this is the problem we are most likely to run into. The simulation can remain for many iterations in a region heavily influenced by the starting distribution. If those iterations are used to summarize the target distribution, they can yield falsely precise estimates.

  8. Diagnostics for examining convergence. One intuitive and easily implemented diagnostic tool is a trace plot (or history plot), which plots the parameter value as a function of sample number. If the model has converged, the trace plot will move up and down around the mode of the distribution. A clear sign of non-convergence is a trend in the trace plot. In WinBUGS, you can set up trace plots to monitor parameters while the program runs. [Figure: trace plots illustrating convergence and non-convergence.]

  9. An autocorrelation plot will reveal whether or not there is correlation between successive samples. The presence of correlation indicates that the samples are not moving effectively through the entire posterior distribution. WinBUGS plots the autocorrelation out to 50 lags. [Figure: autocorrelation plots showing no large autocorrelation versus strong autocorrelation.]

  10. If the model has converged, additional samples from a parameter’s posterior distribution should not influence the calculation of the mean. Running means of the samples will reveal if the posterior mean has settled to a particular value. A “lumpy” posterior may indicate non-convergence. It may be necessary to let the sampler generate additional samples. Geweke Time-Series Diagnostic: If a model has converged, then the series of samples from the first half of the chain will have the same “time-series” properties of the series of samples from the second half of the chain.
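The Geweke idea above can be sketched as a z-score comparing the mean of the first part of the chain to the mean of the last part. Note this is a simplification: the actual diagnostic estimates the two variances from spectral densities to account for autocorrelation, whereas this sketch uses naive sample variances (function name and window fractions are illustrative):

```python
def geweke_z(chain, first=0.1, last=0.5):
    """Simplified Geweke diagnostic: z-score comparing the mean of the
    first 10% of the chain with the mean of the last 50%.

    If the chain is stationary, the two segments should have the same
    mean and |z| should be small; a trending chain gives a large |z|.
    """
    n = len(chain)
    a = chain[: int(first * n)]
    b = chain[int(n - last * n):]

    def mean_var(seg):
        m = sum(seg) / len(seg)
        v = sum((s - m) ** 2 for s in seg) / (len(seg) - 1)
        return m, v

    ma, va = mean_var(a)
    mb, vb = mean_var(b)
    return (ma - mb) / (va / len(a) + vb / len(b)) ** 0.5

# A chain that is still trending (not converged) gives a large |z|.
trending = [0.01 * i for i in range(1000)]
z_trend = geweke_z(trending)
```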

  11. Gelman-Rubin Diagnostic: Another way to identify convergence is to simulate multiple sequences from different starting points. The behavior of the sequence of chains should be the same; that is, the variance within the chains should be the same as the variance across the chains. [Figure: multiple chains illustrating convergence and non-convergence.]

  12. If more than one chain is initialized, the Gelman-Rubin statistic is reported in WinBUGS. The statistic is based on the following procedure: 1) estimate your model with a variety of different initial values and iterate for an n-iteration burn-in and an n-iteration monitored period; 2) take the n monitored draws and, for each parameter, calculate the within-chain variance W, the between-chain variance B, and the pooled posterior variance estimate V(θ). Before convergence, W underestimates the total posterior variance in θ because the chains have not fully explored the target distribution; V(θ), on the other hand, overestimates the variance in θ because the starting points are over-dispersed relative to the target.

  13. Once convergence is reached, W and V(θ) should be almost equivalent, because variation within the chains and variation between the chains should coincide, so R should approximately equal one. In the WinBUGS plot, the normalized width of the central 80% interval of the pooled runs is shown in green, the normalized average width of the 80% intervals within the individual runs is shown in blue, and R is shown in red. R would generally be expected to be greater than 1 if the starting values are suitably over-dispersed. Brooks and Gelman (1998) emphasize that one should be concerned both with convergence of R to 1, and with convergence of both the pooled and within-run interval widths to stability.
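The variance-ratio form of the statistic can be sketched using the standard definitions of W, B, and the pooled estimate V (the function name is illustrative; WinBUGS itself reports the interval-width version due to Brooks and Gelman):

```python
def gelman_rubin(chains):
    """Potential scale reduction factor R from m chains of length n.

    W = mean within-chain variance
    B = n * variance of the chain means
    V = (n-1)/n * W + B/n   (pooled estimate of posterior variance)
    R = sqrt(V / W); values near 1 suggest the chains have mixed.
    """
    m = len(chains)
    n = len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    B = n * sum((mu - grand) ** 2 for mu in means) / (m - 1)
    W = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m
    V = (n - 1) / n * W + B / n
    return (V / W) ** 0.5

# Demo: two chains stuck around different means give R well above 1,
# since between-chain variance dwarfs within-chain variance.
c1 = [i % 2 for i in range(100)]        # oscillates around 0.5
c2 = [i % 2 + 10 for i in range(100)]   # oscillates around 10.5
r_bad = gelman_rubin([c1, c2])
```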

  14. Speeding up the sampling Convergence sometimes requires generating tens or hundreds of thousands of samples. If this is the case, it is important to speed up the sampling. • Standardize your input variables: subtract the mean and divide by the standard deviation. This decreases the correlation between model variables, which can decrease the autocorrelation between samples. • Use WinBUGS’ over-relaxation algorithm. This generates multiple samples at each iteration and then selects one that is negatively correlated with the current value. The time per iteration will increase, but the within-chain correlations should be reduced, so fewer iterations may be necessary. However, this method is not always effective and should be used with caution; use the autocorrelation function to check whether the mixing of the chain improves. • Pick good initial values. If your initial values are near their posterior modes, convergence should occur relatively quickly, especially if there is not a big problem with autocorrelation. • Just wait. With models involving many parameters, it may take hundreds of thousands of samples. Using the “thin” option, you do not need to save all the sample values for all the parameters.
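The standardization step, and the thinning behind the “thin” option, are both simple to sketch (function names and the demo data are illustrative, not from the slides):

```python
def standardize(x):
    """Center and scale a predictor: subtract its mean, divide by its
    sample standard deviation. Standardized inputs reduce correlation
    between model coefficients (e.g., slope and intercept)."""
    n = len(x)
    mean = sum(x) / n
    sd = (sum((v - mean) ** 2 for v in x) / (n - 1)) ** 0.5
    return [(v - mean) / sd for v in x]

def thin(samples, k):
    """Keep every k-th draw, reducing storage and autocorrelation."""
    return samples[::k]

z = standardize([10.0, 12.0, 14.0, 16.0, 18.0])
```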
