Bayesian Model Averaging for Network Structure Discovery

Being Bayesian about Network Structure • Nir Friedman (Hebrew Univ.) • Daphne Koller (Stanford Univ.)

Structure Discovery • Current practice: model selection • Pick a single model (of high score) • Use that model to represent domain structure • Enough data → “right” model overwhelmingly likely • But what about the rest of the time? • Many high-scoring models • Answer based on one model often useless • Bayesian model averaging is the Bayesian ideal: P(f | D) = Σ_G f(G) P(G | D), where f is a feature of G, e.g., X→Y

Model Averaging • Unfortunately, it is intractable: • # of possible structures is superexponential • That’s why no one really does it* • Our contribution: • Closed form solution for fixed ordering over nodes • MCMC over orderings for general case • Faster convergence, robust results. * Exceptions: Madigan & Raftery, Madigan & York; see below
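To make "superexponential" concrete, the following sketch counts labeled DAGs on n nodes using Robinson's recurrence. The recurrence and the function name `count_dags` are not part of the slides; they are brought in here only as an illustration.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n: int) -> int:
    """Number of labeled DAGs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    # Inclusion-exclusion over the k nodes that have no incoming edges.
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


if __name__ == "__main__":
    for n in range(1, 11):
        print(n, count_dags(n))
```

Even at n = 10 the count already exceeds 10^18, which is why exhaustive averaging over structures is out of reach for networks of realistic size.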

Fixed Ordering • Suppose that: • We know the ordering of variables • say, X1 > X2 > X3 > X4 > … > Xn • parents for Xi must be in X1,…,Xi-1 • Limit number of parents per node to k • This still leaves about 2^(k·n·log n) networks • Intuition: • Order decouples choice of parents • The choice of parents for X7 does not restrict the choice of parents for X12 • We can exploit this to simplify the form of P(D)

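Computing P(D) • For an ordering ≺: P(D | ≺) = Π_i Σ_{U ∈ U_i,≺} score(Xi, U | D), where U_i,≺ is the set of possible parent sets for Xi that are consistent with ≺ and have size at most k • Independence of families (the product over nodes) • Small number of potential families per node (the inner sum) • Efficient closed-form summation over an exponential number of structures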
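The decoupling can be sketched in a few lines: for a fixed ordering, the sum over all consistent structures factors into a product over nodes of a sum over candidate parent sets. In this sketch the helper names and `toy_score` are hypothetical, and the score is assumed to be decomposable with any structure prior already folded into the per-family score. The brute-force function is included only to check the factorization on a tiny example.

```python
from itertools import combinations, product
from math import isclose, prod


def candidate_parent_sets(order, i, k):
    """Parent sets for order[i]: subsets of its predecessors, size <= k."""
    preds = order[:i]
    return [ps for size in range(min(k, len(preds)) + 1)
            for ps in combinations(preds, size)]


def marginal_given_order(order, k, family_score):
    """Closed form: product over nodes of a sum over candidate families."""
    total = 1.0
    for i, child in enumerate(order):
        total *= sum(family_score(child, ps)
                     for ps in candidate_parent_sets(order, i, k))
    return total


def brute_force(order, k, family_score):
    """Same quantity by enumerating every structure consistent with order."""
    choices = [candidate_parent_sets(order, i, k) for i in range(len(order))]
    return sum(
        prod(family_score(child, ps) for child, ps in zip(order, graph))
        for graph in product(*choices)
    )


def toy_score(child, parents):
    # Hypothetical stand-in for a real family score (e.g. a marginal likelihood).
    return 1.0 + 0.25 * child + 0.5 * sum(parents)


if __name__ == "__main__":
    order = (0, 1, 2, 3)
    print(marginal_given_order(order, 2, toy_score))
    print(brute_force(order, 2, toy_score))
```

A real implementation would work with log scores and a log-sum-exp to avoid underflow; plain products are used here only to keep the check readable.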

MCMC over Models • Cannot enumerate structures, so sample structures • MCMC Sampling • Define Markov chain over BN models • Run chain to get samples from posterior P(G | D) • Possible pitfalls: • huge number of models • mixing rate (also required burn-in) unknown • islands of high posterior, connected by low bridges

ICU Alarm BN: No Mixing • However, with 500 instances: • the runs clearly do not mix. • (Plot: score of current sample vs. MCMC iteration)

Effects of Non-Mixing • Two MCMC runs over same 500 instances • Probability estimates for Markov features: • based on 50 nets sampled from MCMC process • Probability estimates highly variable, non-robust • (Plots, by initialization: true BN vs. random; true BN vs. true BN)

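Our Approach: Sample Orderings • We can write P(f | D) = Σ_≺ P(f | ≺, D) P(≺ | D) • Comment: Structure prior P(G) changes • a uniform prior over structures is not the same as a uniform prior over orderings and on structures consistent with a given ordering • Sample orderings ≺1, …, ≺T from P(≺ | D) and approximate P(f | D) ≈ (1/T) Σ_t P(f | ≺t, D)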

MCMC Over Orderings • Use the Metropolis-Hastings algorithm • Specify a proposal distribution q(≺’ | ≺) • flip: (i1 … ij … ik … in) → (i1 … ik … ij … in) • “cut”: (i1 … ij ij+1 … in) → (ij+1 … in i1 … ij) • Each iteration: • Sample ≺’ from q(≺’ | ≺) • go ≺ → ≺’ with probability min[1, P(≺’ | D) q(≺ | ≺’) / (P(≺ | D) q(≺’ | ≺))] • Since priors are uniform, P(≺’ | D) / P(≺ | D) = P(D | ≺’) / P(D | ≺) • Efficient computation!
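The sampler described on this slide might look roughly like the sketch below. It assumes a caller-supplied `log_score(order)` returning log P(D | ≺) and a uniform prior over orderings. The function names, the 10% cut probability, and the toy score in the demo are illustrative assumptions, not details from the talk. Both moves are symmetric here, so the proposal ratio cancels in the acceptance test.

```python
import math
import random


def propose(order, rng, p_cut=0.1):
    """Propose a new ordering by a 'cut' (rotate the deck) or a 'flip' (swap two)."""
    n = len(order)
    if rng.random() < p_cut:
        j = rng.randrange(1, n)            # cut point, never a no-op
        return order[j:] + order[:j]
    i, k = rng.sample(range(n), 2)         # two distinct positions
    new = list(order)
    new[i], new[k] = new[k], new[i]
    return tuple(new)


def mcmc_orderings(log_score, start, n_steps, seed=0):
    """Metropolis-Hastings over orderings; returns the visited orderings."""
    rng = random.Random(seed)
    current = tuple(start)
    current_lp = log_score(current)
    samples = []
    for _ in range(n_steps):
        cand = propose(current, rng)
        cand_lp = log_score(cand)
        # Symmetric proposal + uniform prior: accept with min(1, score ratio).
        if cand_lp >= current_lp or rng.random() < math.exp(cand_lp - current_lp):
            current, current_lp = cand, cand_lp
        samples.append(current)
    return samples


if __name__ == "__main__":
    # Hypothetical score that favours orderings close to (0, 1, 2, 3, 4).
    def toy_log_score(order):
        return -float(sum(abs(pos - var) for pos, var in enumerate(order)))

    draws = mcmc_orderings(toy_log_score, (4, 3, 2, 1, 0), 1000)
    print(draws[-1])
```

In practice the early samples would be discarded as burn-in and the rest thinned before averaging feature posteriors over them.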

Why Ordering Helps • Smaller space • Significant reduction in size of sample space • Better-structured space • We can get from one ordering to another in a (relatively) small number of steps • Smoother posterior “landscape” • Score of an ordering is a sum over many networks • No ordering is “horrendous”: no “islands” of high posterior separated by a deep blue sea

Mixing with MCMC-Orderings • 4 runs on ICU-Alarm with 500 instances • fewer iterations than MCMC-Nets • approximately same amount of computation • Process is clearly mixing! • (Plot: score of current sample vs. MCMC iteration)

Mixing of MCMC runs • Two MCMC runs over same 500 instances • Probability estimates for Markov features: • based on 50 nets sampled from MCMC process • Probability estimates very robust • (Plot panels: 1000 instances; 100 instances)

Computing Feature Posterior: P(f | ≺, D) • Edges: • closed form, from the sum over parent sets of the child that contain the parent • Markov blanket: • If Y→Z, or both Y and Z are parents of some X • Posteriors of these features are independent • Other features (e.g., existence of causal path): • Sample networks from ordering • Estimate features from networks
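For edge features, the closed form amounts to a ratio of two sums over the child's candidate parent sets. The sketch below illustrates this under the same assumptions as before: a decomposable per-family score and a bound k on parent-set size. The names `edge_posterior` and `family_score` are hypothetical.

```python
from itertools import combinations


def edge_posterior(order, parent, child, k, family_score):
    """P(parent -> child | ordering, D) as a ratio of sums of family scores."""
    preds = order[:order.index(child)]
    if parent not in preds:
        return 0.0  # the edge would contradict the ordering
    parent_sets = [ps for size in range(min(k, len(preds)) + 1)
                   for ps in combinations(preds, size)]
    total = sum(family_score(child, ps) for ps in parent_sets)
    with_edge = sum(family_score(child, ps)
                    for ps in parent_sets if parent in ps)
    return with_edge / total


if __name__ == "__main__":
    # Hypothetical flat score: every family is equally good.
    flat = lambda child, parents: 1.0
    print(edge_posterior(("A", "B", "C"), "A", "C", 2, flat))
```

Averaging this quantity over sampled orderings gives the estimate of the edge's posterior probability.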

Feature Reconstruction (ICU-Alarm): Markov Features • Reconstruct “true” features of generating network • (Plots: false positives and false negatives for Structure, Bootstrap, and Order)

Feature Reconstruction (ICU-Alarm): Path Features • (Plots comparing Structure, Bootstrap, and Order)

Conclusion • Full Bayesian model averaging is tractable for known ordering. • MCMC over orderings allows robust approximation to full Bayesian averaging over Bayes nets • rapid and reliable mixing • robust & reliable estimates for probability of structural features • Crucial for structure discovery in domains with limited data • Biological discovery
