Understanding Bayesian Networks

Learn about Bayesian networks and how they can be used to model and analyze probabilistic relationships between variables. This lecture covers the basics of Bayesian networks and introduces concepts such as conditional probability tables and inference.


Presentation Transcript


  1. Bayesian Networks. Lecture 9. Edited by Dan Geiger from Nir Friedman’s slides.

  2. Bayesian Network. [Figure: a DAG over the variables V, S, T, L, B, A, X, D, annotated with the local distributions p(v), p(s), p(t|v), p(l|s), p(b|s), p(a|t,l), p(x|a), p(d|a,b).] Bayesian network = Directed Acyclic Graph (DAG), annotated with conditional probability distributions.

  3. Bayesian Network (cont.) [Figure: the same DAG with its local distributions.] Each Directed Acyclic Graph defines a factorization of the form: P(v,s,t,l,b,a,x,d) = p(v)·p(s)·p(t|v)·p(l|s)·p(b|s)·p(a|t,l)·p(x|a)·p(d|a,b); that is, the joint distribution is the product of each variable's distribution given its parents in the DAG.

  4. Local distributions. [Figure: Tuberculosis (Yes/No) and Lung Cancer (Yes/No) are parents of Abnormality in Chest (Yes/No) via p(A|T,L).] Conditional Probability Table: p(A=y|L=n, T=n) = 0.02; p(A=y|L=n, T=y) = 0.60; p(A=y|L=y, T=n) = 0.99; p(A=y|L=y, T=y) = 0.99.
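
To make the factorization and the local tables concrete, here is a minimal Python sketch. Only the p(A|T,L) entries come from slide 4; every other number is an illustrative placeholder, not a value from the lecture.

```python
# Each node maps to (parent names, table of P(node = True | parent values)).
# Only the "A" table is from slide 4; all other numbers are placeholders.
NETWORK = {
    "V": ((), {(): 0.01}),                                   # p(v): Visit to Asia
    "S": ((), {(): 0.50}),                                   # p(s): Smoking
    "T": (("V",), {(True,): 0.05, (False,): 0.01}),          # p(t|v)
    "L": (("S",), {(True,): 0.10, (False,): 0.01}),          # p(l|s)
    "B": (("S",), {(True,): 0.60, (False,): 0.30}),          # p(b|s)
    "A": (("T", "L"), {(False, False): 0.02, (True, False): 0.60,
                       (False, True): 0.99, (True, True): 0.99}),   # p(a|t,l)
    "X": (("A",), {(True,): 0.98, (False,): 0.05}),          # p(x|a)
    "D": (("A", "B"), {(False, False): 0.10, (False, True): 0.80,
                       (True, False): 0.70, (True, True): 0.90}),   # p(d|a,b)
}

def joint(assignment):
    """P(v,s,t,l,b,a,x,d) = p(v) p(s) p(t|v) p(l|s) p(b|s) p(a|t,l) p(x|a) p(d|a,b)."""
    prob = 1.0
    for node, (parents, table) in NETWORK.items():
        p_true = table[tuple(assignment[p] for p in parents)]
        prob *= p_true if assignment[node] else 1.0 - p_true
    return prob

# Probability of one full assignment: the product of the eight local terms.
print(joint({"V": True, "S": True, "T": False, "L": False,
             "B": True, "A": False, "X": False, "D": True}))
```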

  5. “Example.” [Figure: a genetic-linkage network over Locus 1–4, where Locus 2 is the disease locus, with variables Lijf/Lijm, Xij, Yi, and selectors Sijf/Sijm.] This model depicts the qualitative relations between the variables. We will now specify the joint distribution over these variables.

  6. The “Visit-to-Asia” Example. [Figure: DAG with nodes Visit to Asia, Smoking, Tuberculosis, Lung Cancer, Bronchitis, Abnormality in Chest, X-Ray, Dyspnea.]

  7. Queries. There are many types of queries; most involve evidence. Evidence e is an assignment of values to a set E of variables in the domain. Examples: P(Dyspnea = Yes | Visit_to_Asia = Yes, Smoking = Yes); P(Smoking = Yes | Dyspnea = Yes).

  8. Queries: A posteriori belief. The conditional probability of a variable given the evidence. This is the a posteriori belief in x, given evidence e. Often we compute the term P(x, e), from which we can recover the a posteriori belief by normalization: P(x|e) = P(x, e) / P(e) = P(x, e) / Σx’ P(x’, e). Examples were given in the previous slide.
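
Continuing the sketch above, a posteriori belief can be computed by brute-force enumeration of the hidden variables (fine at this eight-node scale; NETWORK and joint are from the earlier sketch):

```python
from itertools import product

VARS = list(NETWORK)

def prob_joint_evidence(var, val, evidence):
    """P(var = val, e): sum the joint over all variables not fixed by var or e."""
    fixed = dict(evidence, **{var: val})
    hidden = [v for v in VARS if v not in fixed]
    return sum(joint({**fixed, **dict(zip(hidden, vals))})
               for vals in product([True, False], repeat=len(hidden)))

def posterior(var, val, evidence):
    """A posteriori belief: normalize P(x, e) over the query's two values."""
    p_x = prob_joint_evidence(var, val, evidence)
    p_not_x = prob_joint_evidence(var, not val, evidence)
    return p_x / (p_x + p_not_x)

# e.g. P(Dyspnea = Yes | Visit_to_Asia = Yes, Smoking = Yes) from slide 7
print(posterior("D", True, {"V": True, "S": True}))
```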

  9. A posteriori belief. This query is useful in many other cases: • Prediction: what is the probability of an outcome given the starting condition? The target is a descendant of the evidence (e.g., does a visit to Asia lead to Tuberculosis?). • Diagnosis: what is the probability of a disease/fault given symptoms? The target is an ancestor of the evidence (e.g., do the X-ray results indicate a higher probability of Tuberculosis?).

  10. Example: Predictive + Diagnostic. Probabilistic inference can combine evidence from all parts of the network, diagnostic and predictive, regardless of the directions of edges in the model: P(T = Yes | Visit_to_Asia = Yes, Dyspnea = Yes).

  11. Queries: MAP. • Find the maximum a posteriori assignment for some variables of interest (say H1,…,Hl). • That is, h1,…,hl maximize the conditional probability P(h1,…,hl | e). • Equivalent to maximizing the joint P(h1,…,hl, e), since the normalizer P(e) does not depend on h1,…,hl.
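
A brute-force MAP sketch in the same setting; it maximizes P(h, e) directly, which by the equivalence above also maximizes P(h | e):

```python
from itertools import product

def map_query(hypothesis_vars, evidence):
    """Return the assignment to hypothesis_vars maximizing P(h, e)."""
    rest = [v for v in VARS if v not in hypothesis_vars and v not in evidence]
    best_h, best_p = None, -1.0
    for h_vals in product([True, False], repeat=len(hypothesis_vars)):
        h = dict(zip(hypothesis_vars, h_vals))
        # P(h, e): marginalize out the remaining variables
        p = sum(joint({**evidence, **h, **dict(zip(rest, r_vals))})
                for r_vals in product([True, False], repeat=len(rest)))
        if p > best_p:
            best_h, best_p = h, p
    return best_h, best_p

# e.g. the most likely (Tuberculosis, Lung Cancer) pair given a positive X-ray
print(map_query(["T", "L"], {"X": True}))
```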

  12. Queries: MAP. [Figure: a car-diagnosis network with fault nodes D1–D4 and symptom nodes S1–S4, labeled Bad alternator, Bad magneto, Not charging, Bad battery, Dead battery.] We can use MAP for: • Explanation: what is the most likely joint event, given the evidence (e.g., a set of likely diseases given the symptoms)? • What is the most likely scenario, given the evidence (e.g., a series of likely malfunctions that trigger a fault)?

  13. Complexity of Inference. Thm: Computing P(X = x) in a Bayesian network is NP-hard. This is not surprising, since we can simulate Boolean gates.

  14. Proof. We reduce 3-SAT to Bayesian network computation. Assume we are given a 3-SAT problem: • Q1,…,Qn are propositions, • φ1,…,φk are clauses, such that φi = li1 ∨ li2 ∨ li3, where each lij is a literal over Q1,…,Qn (e.g., Q1 = true), • φ = φ1 ∧ … ∧ φk. We will construct a Bayesian network s.t. P(X = t) > 0 iff φ is satisfiable.

  15. [Figure: the reduction network: proposition nodes Q1, Q2, …, Qn feed clause nodes φ1, φ2, …, φk, which feed a chain of AND gates A1, A2, …, Ak−2 ending in X.] P(Qi = true) = 0.5; P(φi = true | Qi, Qj, Ql) = 1 iff Qi, Qj, Ql satisfy the clause φi. A1, A2, … are simple binary AND gates.

  16. It is easy to check: • Polynomial number of variables. • Each Conditional Probability Table can be described by a small table (8 parameters at most). • P(X = true) > 0 if and only if there exists a satisfying assignment to Q1,…,Qn. • Conclusion: a polynomial reduction from 3-SAT.
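
A self-contained sketch of the reduction's key property: with uniform P(Qi) and deterministic gates, P(X = true) is the fraction of satisfying assignments, so it is positive iff φ is satisfiable. The integer literal encoding (j > 0 for Qj, j < 0 for ¬Qj) is my own convention, not from the lecture.

```python
from itertools import product

def p_x_true(clauses, n):
    """P(X = true) = (# satisfying assignments) / 2**n for a 3-CNF over n vars."""
    satisfying = sum(
        all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
            for clause in clauses)
        for bits in product([False, True], repeat=n))
    return satisfying / 2 ** n    # each assignment has probability (1/2)**n

# phi = (Q1 or Q2 or not Q3) and (not Q1 or Q3 or Q4)
print(p_x_true([(1, 2, -3), (-1, 3, 4)], n=4))  # > 0, so phi is satisfiable
```

This also previews the next slide: multiplying the result by 2^n recovers the number of satisfying assignments.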

  17. Inference is even #P-hard. • P(X = t) is the fraction of satisfying assignments to φ. • Hence 2^n · P(X = t) is the number of satisfying assignments to φ. • Thus, if we know how to compute P(X = t), we know how to count the number of satisfying assignments to φ. • Consequently, computing P(X = t) is #P-hard.

  18. Hardness - Notes. • We used deterministic relations in our construction. • The same construction works if we use (1−ε, ε) instead of (1, 0) in each gate, for any ε < 0.5. Homework: prove it. • Hardness does not mean we cannot solve inference. It implies that we cannot find a general procedure that works efficiently for all networks. • For particular families of networks, we can have provably efficient procedures (e.g., trees, HMMs). • Variable elimination algorithms are one such approach.

  19. Approximation. • Until now, we examined exact computation. • In many applications, approximations are sufficient. • Example: P(X = x|e) = 0.3183098861838. Maybe P(X = x|e) ≈ 0.3 is a good enough approximation, e.g., if we take action only when P(X = x|e) > 0.5. • Can we find good approximation algorithms?

  20. Types of Approximations: Absolute error. • An estimate q of P(X = x | e) has absolute error ε if P(X = x|e) − ε ≤ q ≤ P(X = x|e) + ε, equivalently q − ε ≤ P(X = x|e) ≤ q + ε. • Absolute error is not always what we want: • If P(X = x | e) = 0.0001, then an absolute error of 0.001 is unacceptable. • If P(X = x | e) = 0.3, then an absolute error of 0.001 is overly precise. [Figure: number line from 0 to 1 with an interval of width 2ε around q.]

  21. Types of Approximations: Relative error. • An estimate q of P(X = x | e) has relative error ε if P(X = x|e)(1 − ε) ≤ q ≤ P(X = x|e)(1 + ε), equivalently q/(1 + ε) ≤ P(X = x|e) ≤ q/(1 − ε). • Sensitivity of the approximation depends on the actual value of the desired result. [Figure: number line from 0 to 1 with the interval from q/(1 + ε) to q/(1 − ε) around q.]
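
The two error notions, written as predicates on an estimate q of the true value p = P(X = x | e); the demo echoes the 0.0001 example from the previous slide:

```python
def within_absolute(q, p, eps):
    """Absolute error eps: p - eps <= q <= p + eps."""
    return abs(q - p) <= eps

def within_relative(q, p, eps):
    """Relative error eps: p*(1 - eps) <= q <= p*(1 + eps)."""
    return p * (1 - eps) <= q <= p * (1 + eps)

# With p = 0.0001, an estimate 9x too large still passes the absolute test:
print(within_absolute(0.0009, 0.0001, 0.001))   # True, despite q = 9p
print(within_relative(0.0009, 0.0001, 0.001))   # False: relative error rejects it
```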

  22. Complexity • Exact inference is hard. • Is approximate inference any easier?

  23. Complexity: Relative Error. • Suppose that q is an ε-relative error estimate of P(X = t). • If φ is not satisfiable, then P(X = t) = 0. Hence, 0 = P(X = t)(1 − ε) ≤ q ≤ P(X = t)(1 + ε) = 0, namely q = 0. Thus, if q > 0, then φ is satisfiable. An immediate consequence: Thm: Given ε, finding an ε-relative error approximation is NP-hard.

  24. Complexity: Absolute error. • We can find absolute error approximations to P(X = x) with high probability (via sampling). • We will see such algorithms next class. • However, once we have evidence, the problem is harder. Thm: If ε < 0.5, then finding an estimate of P(X = x|e) with absolute error ε is NP-hard.

  25. Proof. • Recall our construction. [Figure: the same reduction network: Q1, Q2, …, Qn feed clause nodes φ1, …, φk, which feed AND gates A1, A2, … ending in X.]

  26. Proof (cont.) • Suppose we can estimate the posteriors with absolute error ε. • Let p1 ≈ P(Q1 = t | X = t). Assign q1 = t if p1 > 0.5, else q1 = f. • Let p2 ≈ P(Q2 = t | X = t, Q1 = q1). Assign q2 = t if p2 > 0.5, else q2 = f. • … • Let pn ≈ P(Qn = t | X = t, Q1 = q1, …, Qn−1 = qn−1). Assign qn = t if pn > 0.5, else qn = f.
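
A sketch of this rounding procedure, assuming a hypothetical oracle estimate_posterior(var, evidence) that returns an estimate of P(var = t | evidence) with absolute error ε < 0.5:

```python
def decode_assignment(n, estimate_posterior):
    """Fix Q1,...,Qn one at a time by thresholding the estimated posterior."""
    evidence = {"X": True}                       # condition on X = t throughout
    q = {}
    for i in range(1, n + 1):
        p_i = estimate_posterior(f"Q{i}", evidence)  # eps-estimate of P(Qi = t | ...)
        q[f"Q{i}"] = p_i > 0.5                   # round to the likelier value
        evidence[f"Q{i}"] = q[f"Q{i}"]           # condition later steps on the choice
    return q
```

The role of ε < 0.5: whenever the true conditional is 0 or 1, the estimate still lands on the correct side of 0.5, which is exactly what the induction on the next two slides exploits.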

  27. Proof (cont.) Claim: if φ is satisfiable, then q1,…,qn is a satisfying assignment. • Suppose φ is satisfiable. • By induction on i, there is a satisfying assignment with Q1 = q1, …, Qi = qi. • Base case: • If Q1 = t in all satisfying assignments, then P(Q1 = t | X = t) = 1, so p1 ≥ 1 − ε > 0.5, hence q1 = t. • If Q1 = f in all satisfying assignments, then similarly q1 = f. • Otherwise, the statement holds for any choice of q1.

  28. Proof (cont.) Claim: if φ is satisfiable, then q1,…,qn is a satisfying assignment. • Suppose φ is satisfiable. • By induction on i, there is a satisfying assignment with Q1 = q1, …, Qi = qi. • Induction step: • If Qi+1 = t in all satisfying assignments s.t. Q1 = q1, …, Qi = qi, then P(Qi+1 = t | X = t, Q1 = q1, …, Qi = qi) = 1, so pi+1 ≥ 1 − ε > 0.5, hence qi+1 = t. • If Qi+1 = f in all such satisfying assignments, then qi+1 = f.

  29. Proof (cont.) • We can efficiently check whether q1,…,qn is a satisfying assignment (linear time; see the sketch below). • If it is, then φ is satisfiable. • If it is not, then by the claim φ is not satisfiable. • Suppose we have an approximation procedure with absolute error ε < 0.5. ⇒ We can decide 3-SAT with n procedure calls. ⇒ The approximation is NP-hard.
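
The linear-time check mentioned in the first bullet, as a short sketch (same hypothetical integer literal encoding as in the reduction sketch above):

```python
def satisfies(clauses, q):
    """True iff assignment q (a dict Qi -> bool) satisfies every clause;
    runs in time linear in the total number of literals."""
    return all(any(q[f"Q{abs(lit)}"] == (lit > 0) for lit in clause)
               for clause in clauses)

# phi = (Q1 or Q2 or not Q3) and (not Q1 or Q3 or Q4)
print(satisfies([(1, 2, -3), (-1, 3, 4)],
                {"Q1": True, "Q2": False, "Q3": True, "Q4": False}))  # True
```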
