
IMPORTANCE SAMPLING ALGORITHM FOR BAYESIAN NETWORKS


Presentation Transcript


  1. IMPORTANCE SAMPLING ALGORITHM FOR BAYESIAN NETWORKS
By Sonal Junnarkar, Friday, 05 October 2001
REFERENCES
1) Cheng, J. and Druzdzel, M. J. AIS-BN: An Adaptive Importance Sampling Algorithm for Evidential Reasoning in Large Bayesian Networks.
2) Shachter, R. D. and Peot, M. A. Simulation Approaches to General Probabilistic Inference on Belief Networks.

  2. BAYES'S THEOREM
P(h | D) = P(D | h) P(h) / P(D)
• P(h): prior probability of hypothesis h. Measures initial beliefs (background knowledge) before any information is obtained (hence prior).
• P(D): prior probability of training data D. Measures the probability of obtaining sample D.
• P(h | D): probability of h given D. The | denotes conditioning, so P(h | D) is a conditional (aka posterior) probability.
• P(D | h): probability of D given h. Measures the probability of observing D given that h is correct (the “generative” model).
• P(h ∧ D): joint probability of h and D. Measures the probability of observing D and of h being correct.
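As a quick illustration (not from the original slides), here is a minimal Python sketch of the theorem; the 1% prior, 95% sensitivity, and 5% false-positive rate are hypothetical numbers.

# Minimal sketch of Bayes's Theorem with hypothetical numbers:
# h = "patient has the disease", D = "test is positive".
def posterior(p_h, p_d_given_h, p_d_given_not_h):
    """Return P(h | D) via Bayes's Theorem.

    P(D) is expanded by total probability:
    P(D) = P(D | h) P(h) + P(D | not h) P(not h)
    """
    p_d = p_d_given_h * p_h + p_d_given_not_h * (1.0 - p_h)
    return p_d_given_h * p_h / p_d

# Hypothetical values: 1% prior, 95% sensitivity, 5% false-positive rate.
print(posterior(0.01, 0.95, 0.05))  # ~0.16: the posterior P(h | D)

Note how a strong test result still yields a modest posterior when the prior is small.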

  3. BAYESIAN NETWORKS: MODEL UNCERTAINTY IN INTELLIGENT SYSTEMS
Example: a simple Bayesian network, the “Sprinkler” BBN, with nodes
• X1 Season: Spring, Summer, Fall, Winter
• X2 Sprinkler: On, Off
• X3 Rain: None, Drizzle, Steady, Downpour
• X4 Ground: Wet, Dry
• X5 Ground: Slippery, Not-Slippery
[Figure: the Sprinkler network diagram, alongside a naive Bayes fragment with class node Y and CPTs P(x1 | y), P(x2 | y), P(x3 | y), ..., P(xn | y)]
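To make the structure concrete, here is a sketch (not from the slides) of the Sprinkler network as a parent map, plus the topological ordering that the sampling algorithm on slide 5 assumes. The edge set (Season into Sprinkler and Rain, both into Ground-Wet, then into Ground-Slippery) is my reading of the standard figure.

# The "Sprinkler" BBN as a parent map (edge set assumed from the figure).
parents = {
    "X1_Season": [],
    "X2_Sprinkler": ["X1_Season"],
    "X3_Rain": ["X1_Season"],
    "X4_GroundWet": ["X2_Sprinkler", "X3_Rain"],
    "X5_GroundSlippery": ["X4_GroundWet"],
}

def topological_order(parents):
    """Order nodes so every parent precedes its children (Kahn's algorithm)."""
    remaining = dict(parents)
    order = []
    while remaining:
        ready = [n for n, ps in remaining.items()
                 if all(p not in remaining for p in ps)]
        order.extend(sorted(ready))
        for n in ready:
            del remaining[n]
    return order

print(topological_order(parents))
# ['X1_Season', 'X2_Sprinkler', 'X3_Rain', 'X4_GroundWet', 'X5_GroundSlippery']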

  4. WHAT IS SAMPLING?
• SAMPLING is the generalization of results obtained by selecting units from a population and studying the sample.
  - POPULATION is the group of people, items, or units under investigation.
  - e.g., Analog (Continuous) to Digital (Discrete Signals) Conversion
• IMPORTANCE SAMPLING, aka Biased Sampling, is a probabilistic sampling method - any sampling method that utilizes some form of random selection.
• Advantage: it reduces variance and error in the result. A well-chosen importance function for the finite-dimensional integral being estimated helps reduce the sampling variance, as the sketch below shows.
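A minimal sketch (my own example, not from the slides) of the variance-reduction claim: estimating the integral of x^4 on [0, 1], once with plain Monte Carlo and once with an importance function proportional to the integrand.

import random, statistics

# Goal: estimate I = integral of x^4 on [0, 1] (true value 0.2).
def plain_mc(n):
    # Uniform(0,1) proposal: the average of x^4 has high relative variance.
    return statistics.fmean(random.random() ** 4 for _ in range(n))

def importance_mc(n):
    # Proposal density g(x) = 5 x^4 (sampled via inverse CDF x = u^(1/5))
    # is proportional to the integrand, so every weight x^4 / g(x) = 0.2.
    total = 0.0
    for _ in range(n):
        x = random.random() ** 0.2
        total += x ** 4 / (5 * x ** 4)
    return total / n

random.seed(0)
print(plain_mc(1000))       # noisy estimate around 0.2
print(importance_mc(1000))  # exactly 0.2 (zero-variance importance function)

When the importance function is exactly proportional to the integrand, every weighted sample equals the true value and the variance drops to zero; in a Bayesian network the posterior distribution plays the role of this ideal function.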

  5. GENERAL IMPORTANCE SAMPLING ALGORITHM
• Order the nodes in topological order.
• Initialize the importance function Pr0(X\E), the total number of samples m, the sampling interval l, and a score array for every node.
• k <- 0, T <- Ø
• for i <- 1 to m do
  • if (i mod l == 0) then
    • k <- k + 1
    • Update the importance function Prk(X\E) based on the samples collected so far in T
  • end if
  • Si <- generate a sample according to Prk(X\E)
  • T <- T ∪ {Si}
  • Calculate Score(Si, Pr(X\E, e), Prk(X\E)) and add it to the corresponding entry in the score array according to the instantiated states.
• end for
• Normalize the score arrays for each node. (A runnable sketch of this loop follows.)
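The following runnable Python sketch instantiates this loop on a two-node network A -> B of my own invention, with evidence B = 1. The importance-function update of step 7 is left as a stub, since the algorithm above leaves the update scheme open (SIS and AIS-BN fill it in differently).

import random

# Tiny sketch of the loop on a two-node network A -> B (binary nodes),
# with evidence B = 1. The importance function starts as the prior over A
# (likelihood-weighting style); the update step is scheme-specific.
P_A = {1: 0.3, 0: 0.7}                      # Pr(A)
P_B_given_A = {1: {1: 0.9, 0: 0.1},         # Pr(B | A): P_B_given_A[a][b]
               0: {1: 0.2, 0: 0.8}}

m, l = 10000, 1000            # total samples, sampling interval
score = {1: 0.0, 0: 0.0}      # score array for node A
samples = []                  # T: accumulated samples
imp = dict(P_A)               # importance function Pr_k(A), initially Pr(A)

random.seed(1)
for i in range(1, m + 1):
    if i % l == 0:
        pass  # step 7: update imp from `samples` here (scheme-specific)
    a = 1 if random.random() < imp[1] else 0   # sample A ~ Pr_k
    samples.append(a)
    w = P_A[a] * P_B_given_A[a][1] / imp[a]    # score: Pr(a, B=1) / Pr_k(a)
    score[a] += w

z = score[1] + score[0]
print({a: s / z for a, s in score.items()})   # estimate of Pr(A | B=1)

True value for comparison: Pr(A=1 | B=1) = 0.27 / 0.41 ≈ 0.659.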

  6. TERMS IN IMPORTANCE SAMPLING
• Importance Function: a probability density function over the domain of the given system; samples are generated from this function.
• Probability distribution over all the variables of a Bayesian network model:
  Pr(X) = Π(i=1 to n) Pr(Xi | Pa(Xi))
  (the product of the probability of each node given its parents), where Pa(Xi) denotes the parents of node Xi. A sketch of this product appears below.
• Probability distribution of the query nodes (the nodes other than the evidence nodes):
  Pr(X\E, E=e) = Π(Xi ∈ X\E) Pr(Xi | Pa(Xi))
  (\ : set difference)
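A minimal sketch of the chain-rule product for one full instantiation, on a hypothetical three-node chain A -> B -> C with made-up CPT entries.

# Sketch: Pr(X) as the product of each node's CPT entry given its parents,
# for one full instantiation x of the (hypothetical) network A -> B -> C.
cpts = {
    "A": lambda x: 0.6 if x["A"] else 0.4,                      # Pr(A)
    "B": lambda x: (0.7 if x["B"] else 0.3) if x["A"]           # Pr(B | A)
         else (0.1 if x["B"] else 0.9),
    "C": lambda x: (0.8 if x["C"] else 0.2) if x["B"]           # Pr(C | B)
         else (0.5 if x["C"] else 0.5),
}

def joint(x):
    """Pr(x) = product over nodes i of Pr(x_i | Pa(x_i))."""
    p = 1.0
    for cpt in cpts.values():
        p *= cpt(x)
    return p

print(joint({"A": True, "B": True, "C": False}))  # 0.6 * 0.7 * 0.2 = 0.084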

  7. TERMS IN IMPORTANCE SAMPLING, contd.
• The sample score is the ratio of the target distribution to the importance distribution the sample was drawn from (the arguments of Score in step 11 of slide 5):
  Score(Si) = Pr(Si, E=e) / Prk(Si)
• The revised importance distribution is an approximation to the posterior probability. In the Self-Importance Sampling (SIS) algorithm, this function is updated in step 7 as
  Prk+1(X\E) ∝ Prk(X\E) + Pr'(X\E)
  where Pr'(X\E) is the current score-based estimate; a sketch of this update follows.
• This periodic revision of the conditional probability tables (CPTs) makes the sampling distribution gradually approach the posterior distribution.
• Why is Self-Importance Sampling biased? The same data are used both to update the importance function and to compute the estimator, and this reuse introduces bias in the estimator.
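A toy sketch of the SIS update for a single binary node; the numbers are invented, and the exact blending and normalization choices vary across SIS variants.

# Sketch of the Self-Importance Sampling update (step 7) for one binary
# node: Pr_{k+1}(x) is proportional to Pr_k(x) plus the current estimate
# Pr'(x) built from the scores accumulated so far.
def sis_update(imp, score):
    """Blend the old importance function with the score-based estimate."""
    z = sum(score.values()) or 1.0
    est = {x: s / z for x, s in score.items()}       # normalized scores Pr'
    new = {x: imp[x] + est[x] for x in imp}          # Pr_k + Pr'
    total = sum(new.values())
    return {x: p / total for x in new}               # renormalize

imp = {1: 0.3, 0: 0.7}             # current Pr_k over one node's states
score = {1: 5.4, 0: 2.6}           # accumulated sample scores
print(sis_update(imp, score))      # {1: 0.4875, 0: 0.5125}

Reusing the accumulated scores both here and in the final estimator is exactly the source of the bias noted above.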

  8. INTRODUCING PARALLELIZATION IN SIS
• Different techniques:
• Using multiple threads for sample generation
  • If total samples = 100, then 10 samples per thread, with sampling interval = 10.
  • Problem: updating the importance distribution function, since that update is done after each sampling interval. (A thread-pool sketch follows this slide.)
  • *Already implemented in new code
• Calculating probabilities of independent nodes in parallel
  • Start from the root node and, for every sample generated, calculate the probabilities of conditionally independent nodes[1] simultaneously.
  • [1] Conditionally independent nodes: nodes that are not ancestors or descendants of each other.
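A sketch of the first technique using Python's standard thread pool; generate_batch, the per-thread sample counts, and the Bernoulli sampling target are all hypothetical. Note that CPython threads share the GIL, so a ProcessPoolExecutor would be the practical choice for CPU-bound sampling.

import random
from concurrent.futures import ThreadPoolExecutor

# Sketch: 100 total samples split across 10 workers, 10 samples each,
# matching the sampling interval. Each worker generates its batch with a
# private RNG; scores are merged after all batches finish.
def generate_batch(seed, n, p=0.3):
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n)]

total, workers = 100, 10
per_worker = total // workers   # 10 samples per thread
with ThreadPoolExecutor(max_workers=workers) as pool:
    batches = list(pool.map(generate_batch, range(workers),
                            [per_worker] * workers))

samples = [s for batch in batches for s in batch]
print(len(samples), sum(samples))

The synchronization problem from the slide shows up here: every importance-function update at the end of a sampling interval is a barrier that all worker batches must respect.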

  9. BAYESIAN NETWORK: PROBABILISTIC SEMANTICS
• Node: variable
• Edge: one axis of a conditional probability table (CPT)
• Conditional Independence: a variable (node) is conditionally independent of its non-descendants given its parents.
• Example [figure]: a seven-node network over Age (X1), Gender (X2), Exposure-To-Toxics (X3), Smoking (X4), Cancer (X5), Serum Calcium (X6), and Lung Tumor (X7).
• Result: the chain rule for probabilistic inference; the sketch below checks the independence property numerically on a small chain.
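A sketch (with hypothetical CPTs, not the cancer network above) verifying the conditional-independence semantics on a chain A -> B -> C: C is independent of the non-descendant A given its parent B.

# Checking the Markov condition on a chain A -> B -> C with made-up CPTs:
# C is independent of the non-descendant A given its parent B,
# i.e. Pr(C | B, A) == Pr(C | B).
p_a = {True: 0.6, False: 0.4}
p_b = {True: {True: 0.7, False: 0.3}, False: {True: 0.1, False: 0.9}}  # Pr(B|A)
p_c = {True: {True: 0.8, False: 0.2}, False: {True: 0.5, False: 0.5}}  # Pr(C|B)

def joint(a, b, c):
    # Chain rule for this network: Pr(a, b, c) = Pr(a) Pr(b | a) Pr(c | b)
    return p_a[a] * p_b[a][b] * p_c[b][c]

for a in (True, False):
    num = joint(a, True, True)
    den = sum(joint(a, True, c) for c in (True, False))
    print(f"Pr(C=T | B=T, A={a}) = {num / den}")  # both equal Pr(C=T | B=T) = 0.8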
