
Two Approaches to Bayesian Network Structure Learning


Presentation Transcript


  1. Two Approaches to Bayesian Network Structure Learning
  Yael Kinderman & Tali Goren
  Goal: Compare an algorithm that learns a BN tree structure (TAN) with an algorithm that learns a constraint-free structure (Build-BN).
  Problem definition: finding an exact BN structure for complete discrete data.
  • Known to be NP-hard.
  • A maximization problem over a defined score.
  Build-BN algorithm's attributes:
  • No structural constraints.
  • Straightforward approach – does not avoid any computation.
  • Feasible only for small networks (<30 variables).
  Crucial facts at the core of the algorithm (see the sketch after this slide):
  • There are scoring functions that decompose into local scores (we used BIC for the algorithm).
  • Every DAG has at least one node with no outgoing arcs (= sink).
  Implementation note: Build-BN requires a lot of memory; therefore the implementation makes heavy use of the file system.
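The sink fact is easy to check programmatically. Below is a minimal sketch (ours, not from the slides) that finds a node with no outgoing arcs in a DAG given as an adjacency mapping; the `find_sink` helper and the example graph are illustrative names, not anything from the original presentation.

```python
def find_sink(dag):
    """Return some node with no outgoing arcs; `dag` maps node -> set of children."""
    for node, children in dag.items():
        if not children:          # no outgoing arcs => this node is a sink
            return node
    return None                   # only possible if the graph contains a directed cycle

# Example DAG: A -> B, A -> C, B -> C; here C is the (unique) sink.
example = {"A": {"B", "C"}, "B": {"C"}, "C": set()}
print(find_sink(example))  # -> C
```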

  2. BIC local score:
  $\mathrm{BIC}(x, vs) = \sum_{j}\sum_{k} N_{jk}\log\frac{N_{jk}}{N_j} \;-\; \frac{\log N}{2}\,(r_x - 1)\,q_{vs}$
  Where:
  • k iterates over all possible values of x,
  • j iterates over all possible configurations of Pa(x) = vs,
  • N = number of samples,
  • N_j = number of samples where Pa(x) = j,
  • N_jk = number of samples where Pa(x) = j and x = k,
  • r_x = number of values of x and q_vs = number of parent configurations (the BIC penalty term).
  Algorithm's flow
  Step I: Find local scores. For every x ∈ V and every var-set vs ⊆ V \ {x} (V = set of all variables), calculate the 'local BIC' score BIC(x, vs). All in all, n·2^(n-1) scores are calculated in this step.
  Step II: Find best parents. For every x ∈ V and every var-set W ⊆ V \ {x}, find the best parents of x within the var-set. Traversing var-sets in lexicographic order (smaller to larger) results in a time complexity of O((n-1)·2^(n-1)). (A sketch of Steps I–II follows this slide.)
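A hedged sketch of Steps I–II as we read them from this slide: it computes the local BIC score for every (variable, parent-set) pair over complete discrete data, then finds the best parent subset within every candidate var-set, traversing var-sets from smaller to larger so that smaller subsets are already solved. The data layout (a list of dicts mapping variable name to value) and the helper names `local_bic` / `best_parents_per_varset` are our assumptions, not the authors' implementation (which, per slide 1, spills intermediate scores to the file system rather than keeping everything in memory).

```python
from itertools import combinations
from collections import Counter
from math import log

def local_bic(data, x, parents):
    """BIC local score of variable x with the given parent tuple (Step I)."""
    n = len(data)
    nj, njk = Counter(), Counter()           # N_j and N_jk from the slide
    for s in data:
        j = tuple(s[p] for p in parents)
        nj[j] += 1
        njk[(j, s[x])] += 1
    loglik = sum(c * log(c / nj[j]) for (j, _), c in njk.items())
    r_x = len({s[x] for s in data})           # number of values of x
    q = len(nj)                               # parent configurations actually seen
    return loglik - 0.5 * log(n) * (r_x - 1) * q   # BIC penalty

def best_parents_per_varset(data, variables):
    """For every x and every var-set W <= V \\ {x}, record the best parent subset
    of W and its score (Step II), reusing the Step I local scores."""
    best = {}
    for x in variables:
        others = [v for v in variables if v != x]
        for size in range(len(others) + 1):   # traverse var-sets from smaller to larger
            for w in combinations(others, size):
                cand = (local_bic(data, x, w), w)     # parents = W itself
                for drop in w:                        # or the best subset of W
                    sub = tuple(v for v in w if v != drop)
                    cand = max(cand, best[(x, sub)])
                best[(x, w)] = cand
    return best   # maps (x, W) -> (best score, best parent tuple within W)
```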

  3. Algorithm's flow – cont.
  Step III: Find best sinks. For each of the 2^n var-sets we find a best sink. Let Sink*(W) be the best sink of a var-set W. Then Sink*(W) can be found by:
  $\mathrm{Sink}^*(W) = \arg\max_{s \in W}\; \mathrm{score}(W, s)$
  Where:
  • g*_s(var-set) = the best set of parents for s within the var-set.
  • G*(var-set) = the highest-scoring network for a var-set.
  • score(W, s) = the score of G*(W \ {s}) plus the local score of s with its best parents g*_s(W \ {s}).
  • We traverse var-sets in lexicographic order and reuse scores that were calculated in previous iterations.
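A sketch of Step III under the same assumptions: for every var-set (in order of increasing size) it scores each candidate sink s as the best-network score of the var-set without s plus the local score of s with its best parents inside that smaller set, reusing the `best` map from the previous sketch. `G_star` and `sink_star` correspond to G*(·) and Sink*(·) on the slide; the function name and the in-memory dictionaries are our illustrative choices.

```python
from itertools import combinations

def best_sinks(variables, best):
    """For every var-set W, find its best sink and the score of the best network
    over W, reusing scores of smaller var-sets (traversed smaller to larger)."""
    G_star = {(): 0.0}          # best network score per var-set; the empty set scores 0
    sink_star = {}              # best sink per var-set
    for size in range(1, len(variables) + 1):
        for w in combinations(variables, size):
            cand = []
            for s in w:
                rest = tuple(v for v in w if v != s)
                local, _ = best[(s, rest)]        # best parents of s inside W \ {s}
                cand.append((G_star[rest] + local, s))
            G_star[w], sink_star[w] = max(cand)   # pick the highest-scoring sink
    return G_star, sink_star
```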

  4. Algorithm's flow – cont.
  Step IV: Find best order. The best sinks immediately yield the best ordering (in reverse order).
  Step V: Find best network. Having the best order (ord*_i(V)) and the best parents (g*(W)) for each W ⊆ V, we can find the network as follows:
  $\mathrm{ord}^*_i(V) = \mathrm{Sink}^*\Big(V \setminus \bigcup_{j=i+1}^{|V|} \{\mathrm{ord}^*_j(V)\}\Big)$
  In other words: the i-th variable in the optimal ordering picks its best parents from the var-set that contains all the variables that precede it in the ordering.
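A sketch of Steps IV–V, continuing the previous sketches: best sinks are peeled off the full variable set to recover the optimal ordering in reverse, and each variable then takes its best parents from the var-set of its predecessors. `sink_star` and `best` are assumed to come from the earlier sketches; this is our reading of the slide, not the authors' code.

```python
def best_network(variables, sink_star, best):
    """Peel off best sinks to get the optimal ordering (in reverse), then let each
    variable pick its best parents from the set of its predecessors."""
    order = []
    remaining = tuple(variables)
    while remaining:
        s = sink_star[remaining]                    # Step IV: best sink of the current set
        order.append(s)
        remaining = tuple(v for v in remaining if v != s)
    order.reverse()                                 # sinks were found last-to-first
    parents = {}
    for i, x in enumerate(order):                   # Step V: parents from predecessors
        preds = tuple(v for v in variables if v in order[:i])
        _, parents[x] = best[(x, preds)]
    return order, parents                           # ordering and parent set per variable
```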

  5. Using the BN for prediction
  • 5-fold cross validation: 80% of the data is used for building the structure & CPDs, 20% is used for label prediction.
  • Predicting the label 'C' of a given sample is done by choosing the value of C that maximizes the joint probability of the completed sample under the learned network:
  $\hat{c} = \arg\max_{c} \prod_{x \in V} P\big(x \mid \mathrm{Pa}(x)\big)\Big|_{C = c}$
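A minimal sketch of how such a prediction could be done, assuming the structure and data layout from the earlier sketches: CPDs are estimated by maximum likelihood from counts, and the label is the value of C that maximizes the joint probability of the completed sample under the network. The helpers `fit_cpds` and `predict`, and the small-constant fallback for unseen parent configurations, are our assumptions rather than details given on the slide.

```python
from collections import Counter
from math import log

def fit_cpds(data, parents):
    """Maximum-likelihood CPDs: P(x = v | Pa(x) = j) estimated from counts."""
    cpds = {}
    for x, pa in parents.items():
        nj, njk = Counter(), Counter()
        for s in data:
            j = tuple(s[p] for p in pa)
            nj[j] += 1
            njk[(j, s[x])] += 1
        cpds[x] = {key: c / nj[key[0]] for key, c in njk.items()}
    return cpds

def predict(sample, parents, cpds, label, label_values, eps=1e-9):
    """Return argmax_c of the joint probability with the label C set to c."""
    def log_joint(full):
        total = 0.0
        for x, pa in parents.items():
            j = tuple(full[p] for p in pa)
            total += log(cpds[x].get((j, full[x]), eps))   # eps for unseen combinations
        return total
    return max(label_values, key=lambda c: log_joint({**sample, label: c}))
```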

  6. Test over the famous 'Student' model
  Testing our implementation over 'synthetic' data:
  • We simulated 300 samples according to the BN and the CPDs as presented in class.
  • Prediction was performed using TAN and Build-BN.
  • Prediction success rates: 0.836 and 0.85 (Build-BN result and TAN result figures shown on the slide).
  • Note: in Build-BN, 4 out of 5 cross-validation folds gave the net shown on the slide.

  7. Experimental results
  Data taken from the UCI machine learning repository. (Results table shown on the slide.)
  • Possible explanation for the last 2 results:
  • Zoo – only 101 instances…
  • Vehicle – what's wrong with this data?! 
  • Note the low in-degrees (the models induced by these data sets are by nature close to trees).

  8. Example I: Corral
  Build-BN does not force 'irrelevant' variables to be linked into the BN.
  • TAN result – prediction success rate: 0.969.
  • Build-BN result – prediction success rate: 0.937.

  9. Example II – Tic-Tac-Toe
  Having no constraints on the structure enables better prediction.
  • Build-BN result – prediction success rate: 0.844.
  • TAN result – prediction success rate: 0.653.
  References: Tomi Silander & Petri Myllymäki, HIIT. A Simple Approach for Finding the Globally Optimal Bayesian Network Structure.
