This paper discusses a novel approach to Bayesian model averaging for structure discovery in Bayesian networks (BN). Traditional model selection often leads to reliance on a single high-scoring model, which can be misleading in under-sampled scenarios. We present a closed-form solution for a fixed ordering of nodes and propose a Markov Chain Monte Carlo (MCMC) method to explore general orderings. Our method significantly enhances convergence speed and ensures robust estimates of structural features, essential for effective structure discovery, especially in biological contexts with limited data.
Being Bayesian about Network Structure
Nir Friedman, Hebrew Univ.
Daphne Koller, Stanford Univ.
Structure Discovery
• Current practice: model selection
  • Pick a single model (of high score)
  • Use that model to represent domain structure
• Enough data ⇒ “right” model overwhelmingly likely
• But what about the rest of the time?
  • Many high-scoring models
  • Answer based on one model often useless
• Bayesian model averaging is the Bayesian ideal:
  $P(f \mid D) = \sum_G f(G)\, P(G \mid D)$, where $f$ is a feature of $G$, e.g., $X \rightarrow Y$
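With only three variables, the averaging formula can be evaluated exactly. The minimal sketch below enumerates every DAG over X, Y, Z, converts hypothetical log-scores into P(G | D), and adds up the posterior mass of the networks that contain X → Y. `EDGE_LOG_WEIGHT` is an invented stand-in for a real score such as log P(D | G)P(G), and all names are illustrative.

```python
import itertools
import math

NODES = ("X", "Y", "Z")
ALL_EDGES = [(a, b) for a in NODES for b in NODES if a != b]

# Hypothetical per-edge contributions to log P(D | G) P(G); not real data.
EDGE_LOG_WEIGHT = {("X", "Y"): 1.0, ("Y", "X"): 0.9, ("Y", "Z"): 0.5}
DEFAULT_LOG_WEIGHT = -1.0


def is_acyclic(edges):
    """Kahn's algorithm: a graph is a DAG iff all nodes can be peeled off in topological order."""
    indegree = {v: 0 for v in NODES}
    for _, child in edges:
        indegree[child] += 1
    frontier = [v for v in NODES if indegree[v] == 0]
    peeled = 0
    while frontier:
        v = frontier.pop()
        peeled += 1
        for parent, child in edges:
            if parent == v:
                indegree[child] -= 1
                if indegree[child] == 0:
                    frontier.append(child)
    return peeled == len(NODES)


def all_dags():
    """Yield every acyclic subset of the six possible directed edges."""
    for r in range(len(ALL_EDGES) + 1):
        for subset in itertools.combinations(ALL_EDGES, r):
            if is_acyclic(subset):
                yield frozenset(subset)


def log_score(dag):
    return sum(EDGE_LOG_WEIGHT.get(e, DEFAULT_LOG_WEIGHT) for e in dag)


def feature_posterior(feature):
    """P(f | D) = sum_G f(G) P(G | D), normalizing the scores over all DAGs."""
    dags = list(all_dags())
    logs = [log_score(g) for g in dags]
    top = max(logs)
    weights = [math.exp(l - top) for l in logs]  # shift by the max for numerical stability
    return sum(w for g, w in zip(dags, weights) if feature(g)) / sum(weights)


p_x_to_y = feature_posterior(lambda g: ("X", "Y") in g)
p_y_to_x = feature_posterior(lambda g: ("Y", "X") in g)
print(f"P(X->Y | D) = {p_x_to_y:.3f}, P(Y->X | D) = {p_y_to_x:.3f}")
```

With these made-up weights, X → Y and Y → X each get a posterior of roughly 0.41. The single highest-scoring network (X → Y → Z) reports only one orientation and gives no sign that the other is almost as plausible.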
Model Averaging
• Unfortunately, it is intractable:
  • # of possible structures is superexponential
  • That’s why no one really does it*
• Our contribution:
  • Closed-form solution for a fixed ordering over nodes
  • MCMC over orderings for the general case
  • Faster convergence, robust results
* Exceptions: Madigan & Raftery; Madigan & York (see below)
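The sketch below makes “superexponential” concrete. It counts labeled DAGs with Robinson's recurrence, a standard combinatorial identity that does not come from the slides, and prints n! next to each count for comparison. n! is the number of node orderings. Function names are illustrative.

```python
from functools import lru_cache
from math import comb, factorial


@lru_cache(maxsize=None)
def num_dags(n):
    """Robinson's recurrence for the number of labeled DAGs on n nodes."""
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
        for k in range(1, n + 1)
    )


for n in range(1, 9):
    print(f"n={n}: {num_dags(n):,} DAGs vs. {factorial(n):,} orderings")
```

For n = 1 to 5 the DAG counts are 1, 3, 25, 543, and 29,281. At n = 6 there are already 3,781,503 DAGs, compared with 720 orderings.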
Fixed Ordering
Suppose that
• We know the ordering of variables
  • say, X1 ≺ X2 ≺ X3 ≺ X4 ≺ … ≺ Xn ⇒ parents for Xi must be in X1, …, Xi-1
• We limit the number of parents per node to k ⇒ at most $2^{k \cdot n \cdot \log n}$ networks
Intuition:
• The order decouples the choice of parents
  • The choice of parents for X7 does not restrict the choice of parents for X12
• We can exploit this to simplify the form of P(D)
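You can check the counting argument directly. Under a fixed ordering, the node in position i picks at most k parents from its i predecessors. Each node therefore has at most n^k candidate families, and there are at most n^{kn} = 2^{k·n·log n} networks. The sketch below computes the exact counts; the helper names are hypothetical.

```python
from math import comb, prod


def parent_set_counts(n, k):
    """Number of candidate parent sets per node under a fixed ordering X1 < ... < Xn.

    The node in position i (0-based) has i predecessors and may pick any subset
    of at most k of them.
    """
    return [sum(comb(i, j) for j in range(min(i, k) + 1)) for i in range(n)]


def networks_consistent_with_ordering(n, k):
    # Families are chosen independently, so the total is a product over nodes.
    return prod(parent_set_counts(n, k))


n, k = 10, 2
counts = parent_set_counts(n, k)
print(counts)                                   # candidate families per node
print(networks_consistent_with_ordering(n, k))  # exponentially many networks
print(sum(counts))                              # but only polynomially many families to score
```

The number of networks is still exponential. The number of distinct families to score, however, grows only polynomially (175 for n = 10, k = 2), and the closed-form summation over structures relies on exactly this.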
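Ordering: Computing P(D)
$$P(D \mid \prec) = \prod_i \sum_{U \in \mathcal{U}_{i,\prec}} \mathrm{score}(X_i, U : D)$$
where $\mathcal{U}_{i,\prec}$ is the set of possible parent sets for $X_i$ that are consistent with $\prec$ and have size at most k.
• Independence of families ⇒ the sum over networks becomes a product over nodes
• Small number of potential families per node
• Efficient closed-form summation over an exponential number of structures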
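The closed form rests on one identity. Each network's score is a product of per-family terms, so the sum over all networks consistent with ≺ factorizes into a product of per-node sums. The sketch below checks this numerically on five nodes. The family scores are random positive placeholders for score(X_i, U : D), not a real scoring function such as BDe, and the function names are hypothetical.

```python
import itertools
import math
import random


def candidate_parent_sets(order, k):
    """For each node, the parent sets of size <= k drawn from its predecessors in `order`."""
    return {
        x: [frozenset(u) for r in range(min(k, i) + 1)
            for u in itertools.combinations(order[:i], r)]
        for i, x in enumerate(order)
    }


def p_data_given_order(order, k, score):
    """Closed form: product over nodes of the summed family scores."""
    fams = candidate_parent_sets(order, k)
    return math.prod(sum(score(x, u) for u in fams[x]) for x in order)


def p_data_given_order_brute_force(order, k, score):
    """Explicit sum over every network consistent with `order` (exponential; for checking)."""
    fams = candidate_parent_sets(order, k)
    return sum(
        math.prod(score(x, u) for x, u in zip(order, choice))
        for choice in itertools.product(*(fams[x] for x in order))
    )


# Hypothetical family scores: random positive numbers, cached so that each
# (node, parent set) pair always returns the same value.
_rng = random.Random(0)
_table = {}


def toy_score(x, parents):
    key = (x, parents)
    if key not in _table:
        _table[key] = _rng.uniform(0.1, 1.0)
    return _table[key]


order = ("A", "B", "C", "D", "E")
closed = p_data_given_order(order, 2, toy_score)
brute = p_data_given_order_brute_force(order, 2, toy_score)
print(closed, brute, math.isclose(closed, brute))
```

Real scores computed from data are tiny, so a practical version would use log-scores and a log-sum-exp per node. The structure of the computation stays the same.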
MCMC over Models
• Cannot enumerate structures, so sample structures
• MCMC sampling
  • Define a Markov chain over BN models
  • Run the chain to get samples from the posterior P(G | D)
• Possible pitfalls:
  • Huge number of models
  • Mixing rate (and hence required burn-in) unknown
  • Islands of high posterior, connected by low bridges
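For comparison, here is a minimal sketch of Metropolis-Hastings over network structures, in the spirit of the MCMC-over-models baseline. It makes two simplifying assumptions. The proposal only toggles a single directed edge, whereas published structure samplers typically also reverse edges. The log-score is an invented additive one. On three nodes the chain mixes easily; the pitfalls listed above only show up in realistic domains.

```python
import math
import random

NODES = ("X", "Y", "Z")
PAIRS = [(a, b) for a in NODES for b in NODES if a != b]

# Hypothetical per-edge log-score contributions (stand-in for log P(D | G) P(G)).
EDGE_LOG_WEIGHT = {("X", "Y"): 1.0, ("Y", "X"): 0.9, ("Y", "Z"): 0.5}


def log_score(dag):
    return sum(EDGE_LOG_WEIGHT.get(e, -1.0) for e in dag)


def is_acyclic(edges):
    """Repeatedly remove nodes with no incoming edges; getting stuck means a cycle."""
    remaining, edges = set(NODES), set(edges)
    while remaining:
        sources = {v for v in remaining if all(child != v for _, child in edges)}
        if not sources:
            return False
        remaining -= sources
        edges = {(p, c) for p, c in edges if p not in sources}
    return True


def structure_mcmc(steps, seed=0):
    """Metropolis-Hastings over DAGs, proposing to toggle one directed edge per step."""
    rng = random.Random(seed)
    g, current = frozenset(), log_score(frozenset())
    samples = []
    for _ in range(steps):
        proposal = g ^ {rng.choice(PAIRS)}  # add the edge if absent, delete it if present
        # Toggling is symmetric, so acceptance depends only on the score ratio;
        # cyclic proposals have zero posterior and are always rejected.
        if is_acyclic(proposal):
            new = log_score(proposal)
            if rng.random() < math.exp(min(0.0, new - current)):
                g, current = proposal, new
        samples.append(g)
    return samples


samples = structure_mcmc(20000)[2000:]  # discard an arbitrary burn-in
estimate = sum(("X", "Y") in g for g in samples) / len(samples)
print(f"MCMC estimate of P(X->Y | D): {estimate:.3f}")
```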
ICU Alarm BN: No Mixing
• However, with 500 instances:
  • the runs clearly do not mix.
[Figure: score of current sample vs. MCMC iteration]
Effects of Non-Mixing
• Two MCMC runs over the same 500 instances
• Probability estimates for Markov features:
  • based on 50 nets sampled from the MCMC process
• Probability estimates highly variable, nonrobust
[Figure: Markov-feature probability estimates from the two runs; panels by initialization: true BN vs. random, true BN vs. true BN]
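Our Approach: Sample Orderings
We can write
$$P(f \mid D) = \sum_{\prec} P(f \mid \prec, D)\, P(\prec \mid D)$$
• Comment: the structure prior P(G) changes
  • A uniform prior over orderings, combined with a uniform prior over the structures consistent with a given ordering, is not a uniform prior over structures
Sample orderings $\prec_1, \ldots, \prec_m$ from $P(\prec \mid D)$ and approximate
$$P(f \mid D) \approx \frac{1}{m} \sum_{i=1}^{m} P(f \mid \prec_i, D)$$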
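The comment about the prior can be checked on three variables. The sketch assumes a uniform prior over the 3! orderings and a uniform prior over the structures consistent with each ordering, with no bound on parents. It then computes the induced prior on every DAG exactly using fractions.

```python
import itertools
from fractions import Fraction

NODES = ("X", "Y", "Z")
ORDERS = list(itertools.permutations(NODES))


def forward_edges(order):
    """Edges that point from an earlier to a later node in `order`."""
    n = len(order)
    return [(order[i], order[j]) for i in range(n) for j in range(i + 1, n)]


def structures_for(order):
    """All networks consistent with `order`: any subset of its forward edges."""
    edges = forward_edges(order)
    return {frozenset(s) for r in range(len(edges) + 1)
            for s in itertools.combinations(edges, r)}


def induced_structure_prior():
    """P(G) = sum over orderings of P(order) * P(G | order), both uniform."""
    prior = {}
    for order in ORDERS:
        structures = structures_for(order)
        weight = Fraction(1, len(ORDERS)) * Fraction(1, len(structures))
        for g in structures:
            prior[g] = prior.get(g, Fraction(0)) + weight
    return prior


prior = induced_structure_prior()
empty = frozenset()
complete_dag = frozenset({("X", "Y"), ("Y", "Z"), ("X", "Z")})
print(len(prior), prior[empty], prior[complete_dag])  # 25 1/8 1/48
```

A uniform structure prior would give 1/25 to each of the 25 DAGs. Here the empty network is consistent with all six orderings and gets 1/8. The complete DAG X → Y → Z with X → Z is consistent with only one ordering and gets 1/48.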
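MCMC Over Orderings
Use the Metropolis-Hastings algorithm
• Specify a proposal distribution $q(\prec' \mid \prec)$
  • flip: (i1 … ij … ik … in) → (i1 … ik … ij … in)
  • “cut”: (i1 … ij ij+1 … in) → (ij+1 … in i1 … ij)
Each iteration:
• Sample $\prec'$ from $q(\prec' \mid \prec)$
• Go to $\prec'$ with probability
  $$\min\left[1, \frac{P(\prec' \mid D)\, q(\prec \mid \prec')}{P(\prec \mid D)\, q(\prec' \mid \prec)}\right]$$
• Since priors are uniform, $\frac{P(\prec' \mid D)}{P(\prec \mid D)} = \frac{P(D \mid \prec')}{P(D \mid \prec)}$
• Efficient computation!!!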
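Here is a minimal sketch of the Metropolis-Hastings loop over orderings, using the flip and cut proposals above. It assumes a hypothetical additive family log-score and computes log P(D | ≺) in closed form with a per-node log-sum-exp. Each step picks flip or cut with probability 1/2. Both moves are symmetric and the ordering prior is uniform, so acceptance depends only on the ratio of marginal likelihoods.

```python
import itertools
import math
import random


def log_p_data_given_order(order, k, log_score):
    """log P(D | order): sum over nodes of log-sum-exp of candidate family log-scores."""
    total = 0.0
    for i, x in enumerate(order):
        logs = [log_score(x, frozenset(u))
                for r in range(min(k, i) + 1)
                for u in itertools.combinations(order[:i], r)]
        top = max(logs)
        total += top + math.log(sum(math.exp(l - top) for l in logs))
    return total


def flip(order, rng):
    """Swap the nodes at two randomly chosen positions."""
    j, l = rng.sample(range(len(order)), 2)
    new = list(order)
    new[j], new[l] = new[l], new[j]
    return tuple(new)


def cut(order, rng):
    """(i1 ... ij ij+1 ... in) -> (ij+1 ... in i1 ... ij) at a random cut point j."""
    j = rng.randrange(1, len(order))
    return order[j:] + order[:j]


def order_mcmc(nodes, k, log_score, steps, seed=0):
    rng = random.Random(seed)
    order = tuple(nodes)
    current = log_p_data_given_order(order, k, log_score)
    samples = []
    for _ in range(steps):
        move = flip if rng.random() < 0.5 else cut
        proposal = move(order, rng)
        new = log_p_data_given_order(proposal, k, log_score)
        # Both moves are symmetric (q(a | b) = q(b | a)) and the ordering prior is
        # uniform, so accept with probability min(1, P(D | proposal) / P(D | order)).
        if rng.random() < math.exp(min(0.0, new - current)):
            order, current = proposal, new
        samples.append(order)
    return samples


# Hypothetical family log-scores: listed (child, parent) pairs add 2.0, other parents cost 1.0.
BONUS = {("B", "A"): 2.0, ("C", "B"): 2.0}


def toy_log_score(x, parents):
    return sum(BONUS.get((x, p), -1.0) for p in parents)


samples = order_mcmc(("C", "B", "A"), 2, toy_log_score, steps=20000)[1000:]
frac = sum(s.index("A") < s.index("B") for s in samples) / len(samples)
print(f"Estimated P(A precedes B | D): {frac:.2f}")
```

The toy scores favor A → B and B → C. The chain starts from the unfavorable ordering C, B, A but should spend most of its time with A ahead of B. Summing the six closed-form values by hand gives about 0.79 for that probability.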
Why Ordering Helps
• Smaller space
  • Significant reduction in the size of the sample space
• Better-structured space
  • We can get from one ordering to another in a (relatively) small number of steps
• Smoother posterior “landscape”
  • Score of an ordering is a sum over many networks
  • No ordering is “horrendous” ⇒ no “islands” of high posterior separated by a deep blue sea
Mixing with MCMC-Orderings
• 4 runs on ICU-Alarm with 500 instances
  • fewer iterations than MCMC-Nets
  • approximately the same amount of computation
• Process is clearly mixing!
[Figure: score of current sample vs. MCMC iteration]
Mixing of MCMC runs
• Two MCMC runs over the same 500 instances
• Probability estimates for Markov features:
  • based on 50 nets sampled from the MCMC process
• Probability estimates very robust
[Figure panels: 1000 instances; 100 instances]
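Computing Feature Posterior: P(f | ≺, D)
• Edges:
  $$P(Y \rightarrow X \mid \prec, D) = \frac{\sum_{U \in \mathcal{U}_{X,\prec} : Y \in U} \mathrm{score}(X, U : D)}{\sum_{U \in \mathcal{U}_{X,\prec}} \mathrm{score}(X, U : D)}$$
  where $\mathcal{U}_{X,\prec}$ is the set of candidate parent sets for X consistent with ≺
• Markov blanket:
  • Y is in the Markov blanket of Z if there is an edge between Y and Z or both Y and Z are parents of some X
  • Given the ordering, family choices are independent, so the posteriors of these features can also be computed in closed form
• Other features (e.g., existence of a causal path):
  • Sample networks from the ordering
  • Estimate features from the networks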
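The sketch below shows both routes for a given ordering. The first is the closed-form edge posterior above. The second is Monte Carlo estimation for a feature with no simple closed form, a directed path from A to C, by sampling networks one family at a time. The multiplicative family scores in `FACTOR` and all helper names are hypothetical.

```python
import itertools
import math
import random


def families(order, x, k):
    """Candidate parent sets for x: subsets of size <= k of its predecessors in `order`."""
    i = order.index(x)
    return [frozenset(u) for r in range(min(k, i) + 1)
            for u in itertools.combinations(order[:i], r)]


def edge_posterior(order, y, x, k, score):
    """Closed form for P(Y -> X | order, D): share of X's family score mass that includes Y."""
    fams = families(order, x, k)
    return sum(score(x, u) for u in fams if y in u) / sum(score(x, u) for u in fams)


def sample_network(order, k, score, rng):
    """Families are independent given the order, so draw each in proportion to its score."""
    net = {}
    for x in order:
        fams = families(order, x, k)
        net[x] = rng.choices(fams, weights=[score(x, u) for u in fams])[0]
    return net


def has_directed_path(net, src, dst):
    """True if src is an ancestor of dst; `net` maps each node to its parent set."""
    stack, seen = [dst], {dst}
    while stack:
        for parent in net[stack.pop()]:
            if parent == src:
                return True
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return False


# Hypothetical family scores: each parent multiplies the score by a made-up factor.
FACTOR = {("C", "B"): 5.0}


def toy_score(x, parents):
    return math.prod(FACTOR.get((x, p), 0.5) for p in parents)


order, k = ("A", "B", "C"), 2
rng = random.Random(0)
nets = [sample_network(order, k, toy_score, rng) for _ in range(20000)]
exact_b_to_c = edge_posterior(order, "B", "C", k, toy_score)
sampled_b_to_c = sum("B" in net["C"] for net in nets) / len(nets)
path_a_to_c = sum(has_directed_path(net, "A", "C") for net in nets) / len(nets)
print(exact_b_to_c, sampled_b_to_c, path_a_to_c)
```

For these toy scores the edge posterior is exactly 5/6, and the path probability works out by hand to 14/27 ≈ 0.52. Both sampled estimates can therefore be checked against known values.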
Feature Reconstruction (ICU-Alarm): Markov Features
• Reconstruct “true” features of the generating network
[Figure: false negatives and false positives for the Structure, Bootstrap, and Order methods]
Feature Reconstruction (ICU-Alarm): Path Features
[Figure: reconstruction errors for the Structure, Bootstrap, and Order methods]
Conclusion
• Full Bayesian model averaging is tractable for a known ordering
• MCMC over orderings allows a robust approximation to full Bayesian averaging over Bayes nets
  • rapid and reliable mixing
  • robust & reliable estimates for the probability of structural features
• Crucial for structure discovery in domains with limited data
  • Biological discovery