Explore learning algorithms for Bayesian network structures, with a focus on the EM algorithm's application to parameter estimation. The slides discuss learning structure, handling incomplete data, and searching for score-maximizing structures.
Graphical Models - Learning - Structure Learning
Wolfram Burgard, Luc De Raedt, Kristian Kersting, Bernhard Nebel
Albert-Ludwigs University Freiburg, Germany
Advanced IWS 06/07

[Figure: the ALARM patient-monitoring network, a 37-node Bayesian network with variables such as MINVOLSET, KINKEDTUBE, INTUBATION, VENTLUNG, CATECHOL, HR, and BP]

Based on: J. A. Bilmes, "A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models", TR-97-021, U.C. Berkeley, April 1998; G. J. McLachlan and T. Krishnan, "The EM Algorithm and Extensions", John Wiley & Sons, Inc., 1997; D. Koller, course CS-228 handouts, Stanford University, 2001; and N. Friedman and D. Koller's NIPS'99 tutorial.
Learning With Bayesian Networks
[Figure: candidate structures over variables A and B, including one with a hidden variable H, each marked with question marks]
Two sub-tasks: structure learning and parameter estimation.
Why Struggle for Accurate Structure?
[Figure: the true network (Earthquake and Burglary as parents of Alarm Set, which is the parent of Sound), next to one variant missing an arc and one with an added arc]
• Missing an arc: encodes wrong assumptions about the domain structure, which cannot be compensated for by fitting parameters
• Adding an arc: also encodes wrong assumptions about the domain structure, and increases the number of parameters to be estimated
Unknown Structure, (In)complete Data

Complete data:
• Network structure is not specified; the learner needs to select arcs and estimate parameters
• Data contains no missing values, e.g. records over E, B, A such as <Y,N,N>, <Y,N,Y>, <N,N,Y>, <N,Y,Y>, ...
Incomplete data:
• Network structure is not specified
• Data contains missing values, e.g. <Y,?,N>, <Y,N,?>, <?,Y,Y>, ...
• Need to consider assignments to the missing values
[Figure: a learning algorithm maps the data set to a network E → A ← B, whose CPT entries for P(A | E, B) (.9/.1, .7/.3, .8/.2, .99/.01) must be estimated]
Score-based Learning
• Define a scoring function that evaluates how well a structure matches the data
• Search for a structure that maximizes the score
[Figure: data over E, B, A, e.g. <Y,N,N>, <Y,Y,Y>, <N,N,Y>, <N,Y,Y>, ..., scored against alternative structures over E, B, A]
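As a concrete illustration, here is a minimal sketch of such a scoring function, assuming complete binary data and a maximum-likelihood log-likelihood score; the data layout and function names are illustrative, not from the slides:

```python
# Minimal sketch: a decomposable log-likelihood score for complete binary
# data. Rows are tuples; a structure maps each variable index to the
# indices of its parents.
import math
from collections import Counter

def family_ll(data, child, parents):
    """Log-likelihood contribution of one family (child plus its parents),
    using maximum-likelihood estimates from complete data."""
    parents = tuple(parents)
    joint = Counter((row[child], tuple(row[p] for p in parents)) for row in data)
    marg = Counter(tuple(row[p] for p in parents) for row in data)
    return sum(n * math.log(n / marg[pa]) for (_, pa), n in joint.items())

def ll_score(data, structure):
    """Decomposable score: the sum of per-family contributions."""
    return sum(family_ll(data, child, ps) for child, ps in structure.items())

# Records over (E, B, A) with Y=1, N=0; structure E -> A <- B.
data = [(1, 0, 0), (1, 0, 1), (0, 0, 1), (0, 1, 1), (0, 1, 1)]
print(ll_score(data, {0: (), 1: (), 2: (0, 1)}))
```

Decomposability into per-family terms is what later makes local search efficient: a single arc change only touches one or two families.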
Structure Search as Optimization
Input:
• Training data
• Scoring function
• Set of possible structures
Output:
• A network that maximizes the score
Heuristic Search
• Define a search space:
  • search states are possible structures
  • operators make small changes to a structure
• Traverse the space looking for high-scoring structures
• Search techniques: greedy hill-climbing, best-first search, simulated annealing, ...
Theorem: finding the maximal-scoring structure with at most k parents per node is NP-hard for k > 1.
Typically: Local Search
• Start with a given network: the empty network, the best tree, or a random network
• At each iteration:
  • evaluate all possible local changes, e.g. Add C → D, Reverse C → E, Delete C → E
  • apply the change that most improves the score
• Stop when no modification improves the score
[Figure: a network over S, C, E, D transformed step by step by adding, reversing, and deleting single arcs]
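A rough sketch of this greedy loop in Python, assuming a score function such as ll_score above (ideally with a complexity penalty); is_acyclic, neighbours, and hill_climb are illustrative names, not from the slides:

```python
# Greedy hill-climbing over directed acyclic structures, using the three
# local operators from the slide: add, delete, or reverse a single arc.
import itertools

def is_acyclic(structure):
    """Kahn-style check on the parent-set representation."""
    parents = {v: set(ps) for v, ps in structure.items()}
    while parents:
        roots = [v for v, ps in parents.items() if not ps]
        if not roots:
            return False   # every remaining node has a parent: cycle
        for v in roots:
            del parents[v]
        for ps in parents.values():
            ps.difference_update(roots)
    return True

def neighbours(structure, variables):
    """All structures one arc change away from the current one."""
    for u, v in itertools.permutations(variables, 2):
        s = {w: set(ps) for w, ps in structure.items()}
        if u in s[v]:
            s[v].discard(u)
            yield s                       # delete u -> v
            r = {w: set(ps) for w, ps in s.items()}
            r[u].add(v)
            if is_acyclic(r):
                yield r                   # reverse u -> v
        else:
            s[v].add(u)
            if is_acyclic(s):
                yield s                   # add u -> v

def hill_climb(data, variables, score):
    current = {v: set() for v in variables}   # start from the empty network
    best = score(data, current)
    improved = True
    while improved:
        improved = False
        for cand in neighbours(current, variables):
            sc = score(data, cand)
            if sc > best:
                current, best, improved = cand, sc, True
    return current, best
```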
Updating the score after a local change:
• If data is complete: only re-score (by counting) the families that changed
• If data is incomplete: re-run the parameter estimation algorithm
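With a decomposable score, the complete-data case admits a simple caching scheme; a minimal sketch, where family_score would be something like family_ll above:

```python
# With a decomposable score, an arc change only invalidates the families
# whose parent sets changed, so per-family scores can be cached.
def rescore_after_change(data, structure, cache, changed_children, family_score):
    """changed_children: the variables whose parent sets were modified
    (one for an add/delete, two for a reversal)."""
    for child in changed_children:
        cache[child] = family_score(data, child, structure[child])
    return sum(cache.values())
```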
Local Search in Practice
• Local search can get stuck in:
  • local maxima: all one-edge changes reduce the score
  • plateaux: some one-edge changes leave the score unchanged
• Standard heuristics can escape both:
  • random restarts
  • tabu search
  • simulated annealing
Local Search in Practice
• Using log-likelihood (LL) as the score, adding arcs always helps:
  • the maximum score is attained by the fully connected network
  • overfitting: a bad idea...
• Minimum Description Length (MDL): treat learning as data compression; score = DL(Model) + DL(Data | Model)
• Alternatives: BIC (Bayesian Information Criterion), Bayesian score (BDe)
[Example of real-valued data with missing entries: <9.7 0.6 8 14 18>, <0.2 1.3 5 ?? ??>, <1.3 2.8 ?? 0 1>, <?? 5.6 0 10 ??>, ...]
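As one concrete penalized score, here is a sketch of BIC under the same assumptions as the earlier snippets (binary variables, ll_score as above); n_parameters and bic_score are my names:

```python
# MDL/BIC-style penalized score: log-likelihood minus a complexity term
# that grows with the number of free parameters and (log N) / 2.
import math

def n_parameters(structure):
    # Each binary variable with k binary parents has 2**k free parameters.
    return sum(2 ** len(parents) for parents in structure.values())

def bic_score(data, structure, ll_score):
    penalty = 0.5 * math.log(len(data)) * n_parameters(structure)
    return ll_score(data, structure) - penalty
```

The penalty counters the fully-connected-network pathology above: extra arcs must buy enough likelihood to pay for their parameters.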
Local Search in Practice
• With incomplete data, perform EM for each candidate graph G1, G2, ..., Gn: parametric optimization in parameter space, converging to a local maximum
• Computationally expensive:
  • parameter optimization via EM is non-trivial
  • EM must be run for every candidate structure
  • time is spent even on poor candidates
• In practice, only a few candidates are considered
Structural EM [Friedman et al. 98]
Recall that with complete data we had: decomposition ⇒ efficient search.
Idea:
• Instead of optimizing the real score...
• ...find a decomposable alternative score
• such that maximizing the new score ⇒ an improvement in the real score
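To make "decomposable alternative score" concrete, a sketch of the idea (notation mine, following the Structural EM framework): replace the intractable score of a candidate G by the expected complete-data score under the current model,

$$Q\big(G : G^{(t)}, \theta^{(t)}\big) \;=\; \mathbb{E}_{\mathbf{h}\,\sim\,P(\mathbf{h}\,\mid\,D,\,G^{(t)},\,\theta^{(t)})}\Big[\operatorname{score}\big(G : D, \mathbf{h}\big)\Big],$$

where h denotes the missing values. Because the complete-data score decomposes into per-family terms, so does its expectation, and maximizing Q can be shown to improve the real score.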
Structural EM [Friedman et al. 98]
Idea:
• Use the current model to help evaluate new structures
Outline:
• Perform search in (Structure, Parameters) space
• At each iteration, use the current model to find either:
  • better scoring parameters: a "parametric" EM step, or
  • a better scoring structure: a "structural" EM step
(See the sketch of this alternation below.)
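A rough sketch of the alternation in Python; the three callables are placeholders I am assuming for the inference and estimation sub-routines, not an actual API:

```python
# High-level sketch of the SEM alternation described above.
def structural_em(data, structure, params,
                  e_step, m_step_params, m_step_structure, n_iters=10):
    for _ in range(n_iters):
        # E-step: expected sufficient statistics under the current model.
        counts = e_step(data, structure, params)
        # "Parametric" EM step: better parameters for the current structure.
        params = m_step_params(counts, structure)
        # "Structural" EM step: a better structure under the expected,
        # decomposable score.
        structure = m_step_structure(counts, structure)
    return structure, params
```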
Structural EM [Friedman et al. 98]
[Figure: the "reiterate / score & parameterize" loop - training data plus the current network over X1, X2, X3, a hidden variable H, and Y1, Y2, Y3 yield expected counts, which are used to score and parameterize candidate structures]
• Expected counts for the current structure: N(X1), N(X2), N(X3), N(H, X1, X2, X3), N(Y1, H), N(Y2, H), N(Y3, H)
• Additional expected counts needed for candidate changes: N(X2, X1), N(H, X1, X3), N(Y1, X2), N(Y2, Y1, H)
Structure Learning: Incomplete Data
[Figure: the EM loop for a fixed structure - from the current model (a network over E, B, A with its parameters), the Expectation step fills in missing values by inference, e.g. computing P(S | X=0, D=1, C=0, B=1), and the Maximization step re-estimates the parameters from the expected counts]
• Data over S, X, D, C, B with missing values, e.g. <? 0 1 0 1>, <1 1 ? 0 1>, <0 0 0 ? ?>, ...
• The expected counts play the role of a completed data set, e.g. 1 0 1 0 1 / 1 1 1 0 1 / ...
EM algorithm: iterate until convergence.
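To make the E- and M-steps concrete, here is a tiny runnable sketch of EM on a two-node network B -> A with binary variables, where B is sometimes missing; the data and starting parameters are invented for illustration:

```python
# EM for B -> A: p_b[v] = P(B=v); p_a_given_b[v][a] = P(A=a | B=v).
def em(data, p_b, p_a_given_b, n_iters=50):
    for _ in range(n_iters):
        # E-step: expected counts, filling in missing B by inference.
        cb = [0.0, 0.0]                    # expected N(B = v)
        cab = [[0.0, 0.0], [0.0, 0.0]]     # expected N(B = v, A = a)
        for b, a in data:
            if b is None:
                # posterior P(B = v | A = a) under the current model
                w = [p_b[v] * p_a_given_b[v][a] for v in (0, 1)]
                z = w[0] + w[1]
                post = [w[0] / z, w[1] / z]
            else:
                post = [1.0 - b, float(b)]
            for v in (0, 1):
                cb[v] += post[v]
                cab[v][a] += post[v]
        # M-step: re-estimate parameters from the expected counts.
        n = len(data)
        p_b = [cb[0] / n, cb[1] / n]
        p_a_given_b = [[cab[v][0] / cb[v], cab[v][1] / cb[v]] for v in (0, 1)]
    return p_b, p_a_given_b

data = [(1, 1), (0, 0), (None, 1), (1, 1), (None, 0), (0, 0)]
print(em(data, p_b=[0.5, 0.5], p_a_given_b=[[0.6, 0.4], [0.4, 0.6]]))
```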
Structure Learning: Incomplete Data (SEM)
[Figure: the Structural EM loop - the Expectation step computes expected counts by inference as above; the Maximization step now alternates between re-estimating parameters and choosing among candidate structures over E, B, A]
SEM algorithm: iterate until convergence.
Structure Learning: Summary
• Expert knowledge + learning from data
• Structure learning involves parameter estimation (e.g. via EM)
• Optimization with score functions:
  • likelihood + complexity penalty = MDL
• Local traversal of the space of possible structures:
  • add, reverse, or delete (single) arcs
• Speed-up: Structural EM
  • score candidates w.r.t. the current best model