
Inference in Bayesian Networks



  1. Inference in Bayesian Networks

  2. Agenda • Efficient inference in Bayesian Networks • Reading off independence declarations • Variable elimination • Monte-Carlo methods

  3. BN from Last Lecture • Network: Burglary, Earthquake (causes) → Alarm → JohnCalls, MaryCalls (effects) • Intuitive meaning of an arc from x to y: “x has direct influence on y” • The graph is a directed acyclic graph

  4. BN from Last Lecture • Size of the CPT for a node with k parents: 2^k • This network needs only 10 probabilities, instead of the 31 required for the full joint distribution

  5. Top-Down Inference • Suppose we want to compute P(Alarm)

  6. Top-Down Inference • Suppose we want to compute P(Alarm) • P(Alarm) = Σb,e P(A,b,e) • P(Alarm) = Σb,e P(A|b,e)P(b)P(e)

  7. Top-Down Inference • Suppose we want to compute P(Alarm) • P(Alarm) = Σb,e P(A,b,e) • P(Alarm) = Σb,e P(A|b,e)P(b)P(e) • P(Alarm) = P(A|B,E)P(B)P(E) + P(A|B,¬E)P(B)P(¬E) + P(A|¬B,E)P(¬B)P(E) + P(A|¬B,¬E)P(¬B)P(¬E)

  8. Top-Down Inference • Suppose we want to compute P(Alarm) • P(A) = Σb,e P(A,b,e) • P(A) = Σb,e P(A|b,e)P(b)P(e) • P(A) = P(A|B,E)P(B)P(E) + P(A|B,¬E)P(B)P(¬E) + P(A|¬B,E)P(¬B)P(E) + P(A|¬B,¬E)P(¬B)P(¬E) • P(A) = 0.95·0.001·0.002 + 0.94·0.001·0.998 + 0.29·0.999·0.002 + 0.001·0.999·0.998 = 0.00252
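The arithmetic on this slide can be reproduced directly; the following sketch sums out Burglary and Earthquake using the CPT values quoted above:

```python
# Marginal P(Alarm=true) by summing out Burglary and Earthquake.
# CPT values from the slide: P(B)=0.001, P(E)=0.002, and P(A|b,e).
p_b = {True: 0.001, False: 0.999}
p_e = {True: 0.002, False: 0.998}
p_a_given = {(True, True): 0.95, (True, False): 0.94,
             (False, True): 0.29, (False, False): 0.001}

# P(A) = sum over b, e of P(A|b,e) P(b) P(e)
p_alarm = sum(p_a_given[(b, e)] * p_b[b] * p_e[e]
              for b in (True, False) for e in (True, False))
print(round(p_alarm, 5))  # 0.00252
```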

  9. Top-Down Inference • Now suppose we want to compute P(MaryCalls)

  10. Top-Down Inference • Now suppose we want to compute P(MaryCalls) • P(M) = P(M|A)P(A) + P(M|¬A)P(¬A)

  11. Top-Down Inference • Now suppose we want to compute P(MaryCalls) • P(M) = P(M|A)P(A) + P(M|¬A)P(¬A) • P(M) = 0.70·0.00252 + 0.01·(1 − 0.00252) = 0.0117
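Continuing the calculation, a short sketch that reuses the slide's numbers (P(M|A) = 0.70, P(M|¬A) = 0.01):

```python
# P(MaryCalls=true) by conditioning on Alarm.
# First recompute P(Alarm) as on the previous slide.
p_alarm = (0.95 * 0.001 * 0.002 + 0.94 * 0.001 * 0.998
           + 0.29 * 0.999 * 0.002 + 0.001 * 0.999 * 0.998)

# P(M) = P(M|A) P(A) + P(M|~A) P(~A)
p_m = 0.70 * p_alarm + 0.01 * (1 - p_alarm)
print(round(p_m, 4))  # 0.0117
```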

  12. Querying the BN • Network: Cavity → Toothache • The BN gives P(T|C) • What about P(C|T)?

  13. Bayes’ Rule • P(A∧B) = P(A|B) P(B) = P(B|A) P(A) • So… P(A|B) = P(B|A) P(A) / P(B) • A convenient way to manipulate probability equations

  14. Applying Bayes’ Rule • Let A be a cause, B be an effect, and let’s say we know P(B|A) and P(A) (conditional probability tables) • What’s P(B)?

  15. Applying Bayes’ Rule • Let A be a cause, B be an effect, and let’s say we know P(B|A) and P(A) (conditional probability tables) • What’s P(B)? • P(B) = Σa P(B, A=a) [marginalization] • P(B, A=a) = P(B|A=a)P(A=a) [conditional probability] • So, P(B) = Σa P(B|A=a) P(A=a)

  16. Applying Bayes’ Rule • Let A be a cause, B be an effect, and let’s say we know P(B|A) and P(A) (conditional probability tables) • What’s P(A|B)?

  17. Applying Bayes’ Rule • Let A be a cause, B be an effect, and let’s say we know P(B|A) and P(A) (conditional probability tables) • What’s P(A|B)? • P(A|B) = P(B|A)P(A)/P(B) [Bayes’ rule] • P(B) = Σa P(B|A=a) P(A=a) [last slide] • So, P(A|B) = P(B|A)P(A) / [Σa P(B|A=a) P(A=a)]
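A minimal numeric illustration of this recipe; the values for P(A) and P(B|A) below are invented for illustration (the slides give no numbers here):

```python
# Bayes' rule with marginalization: P(A|B) = P(B|A)P(A) / sum_a P(B|a)P(a).
# Hypothetical tables for a binary cause A and effect B.
p_a = {True: 0.2, False: 0.8}           # prior P(A)
p_b_given_a = {True: 0.9, False: 0.1}   # P(B=true | A=a)

# Denominator by marginalization over A.
p_b = sum(p_b_given_a[a] * p_a[a] for a in (True, False))
# Bayes' rule for P(A=true | B=true).
p_a_given_b = p_b_given_a[True] * p_a[True] / p_b
print(p_b, p_a_given_b)
```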

  18. How do we read this? • P(A|B) = P(B|A)P(A) / [Σa P(B|A=a) P(A=a)] • [An equation that holds for all values A can take on, and all values B can take on] • P(A=a|B=b) =

  19. How do we read this? • P(A|B) = P(B|A)P(A) / [Σa P(B|A=a) P(A=a)] • [An equation that holds for all values A can take on, and all values B can take on] • P(A=a|B=b) = P(B=b|A=a)P(A=a) / [Σa P(B=b|A=a) P(A=a)] • Are these the same a?

  20. How do we read this? • P(A|B) = P(B|A)P(A) / [Σa P(B|A=a) P(A=a)] • [An equation that holds for all values A can take on, and all values B can take on] • P(A=a|B=b) = P(B=b|A=a)P(A=a) / [Σa P(B=b|A=a) P(A=a)] • Are these the same a? NO!

  21. How do we read this? • P(A|B) = P(B|A)P(A) / [Σa P(B|A=a) P(A=a)] • [An equation that holds for all values A can take on, and all values B can take on] • P(A=a|B=b) = P(B=b|A=a)P(A=a) / [Σa′ P(B=b|A=a′) P(A=a′)] • Be careful about indices!

  22. Querying the BN • Network: Cavity → Toothache • The BN gives P(T|C) • What about P(C|T)? • P(Cavity|Toothache) = P(Toothache|Cavity) P(Cavity) / P(Toothache) [Bayes’ rule] • Querying a BN is just applying Bayes’ rule on a larger scale… • The denominator is computed by summing the numerator over Cavity and ¬Cavity

  23. Arcs do not necessarily encode causality! • Two BNs over A, B, C with oppositely directed arcs (the chain A→C→B and the reversed chain B→C→A) can encode the same joint probability distribution

  24. Reading off independence relationships • Network: A → B → C • Given B, does the value of A affect the probability of C? • P(C|B,A) = P(C|B)? • No! • C’s parent (B) is given, so C is independent of its non-descendants (A) • Independence is symmetric: C ⊥ A | B ⇒ A ⊥ C | B

  25. What does the BN encode? • Burglary ⊥ Earthquake • JohnCalls ⊥ MaryCalls | Alarm • JohnCalls ⊥ Burglary | Alarm • JohnCalls ⊥ Earthquake | Alarm • MaryCalls ⊥ Burglary | Alarm • MaryCalls ⊥ Earthquake | Alarm • A node is independent of its non-descendants, given its parents

  26. Reading off independence relationships • How about Burglary ⊥ Earthquake | Alarm? • No! Why?

  27. Reading off independence relationships • How about Burglary ⊥ Earthquake | Alarm? • No! Why? • P(B∧E|A) = P(A|B,E)P(B∧E)/P(A) = 0.00075 • P(B|A)P(E|A) = 0.086

  28. Reading off independence relationships • How about Burglary ⊥ Earthquake | JohnCalls? • No! Why? • Knowing JohnCalls affects the probability of Alarm, which makes Burglary and Earthquake dependent

  29. Independence relationships • Rough intuition (this holds for tree-like graphs, i.e., polytrees): • Evidence on the (directed) road between two variables makes them independent • Evidence on an “A” node makes descendants independent • Evidence on a “V” node, or below the V, makes the ancestors of the variables dependent (otherwise they are independent) • Formal property in the general case: d-separation ⇒ independence (see R&N)

  30. Performing Inference • Variables X • Evidence set E = e, query variable Q • Want to compute the posterior probability distribution over Q, given E = e • Let the non-evidence variables be Y (= X \ E) • Straightforward method: • Compute the joint P(Y, E=e) • Marginalize to get P(Q, E=e) • Divide by P(E=e) to get P(Q|E=e)

  31. Inference in the Alarm Example • P(J|M) = ? • Evidence E = e: MaryCalls • Query Q: JohnCalls

  32. Inference in the Alarm Example • P(x1, x2, …, xn) = Πi=1,…,n P(xi|parents(Xi)) — the full joint distribution table • P(J|MaryCalls) = ? • 1. P(J,A,B,E,MaryCalls) = P(J|A)P(MaryCalls|A)P(A|B,E)P(B)P(E) — a table with 2^4 entries

  33. Inference in the Alarm Example • P(J|MaryCalls) = ? • 1. P(J,A,B,E,MaryCalls) = P(J|A)P(MaryCalls|A)P(A|B,E)P(B)P(E) • 2. P(J,MaryCalls) = Σa,b,e P(J,A=a,B=b,E=e,MaryCalls) — 2 entries: one for JohnCalls, the other for ¬JohnCalls

  34. Inference in the Alarm Example • P(J|MaryCalls) = ? • 1. P(J,A,B,E,MaryCalls) = P(J|A)P(MaryCalls|A)P(A|B,E)P(B)P(E) • 2. P(J,MaryCalls) = Σa,b,e P(J,A=a,B=b,E=e,MaryCalls) • 3. P(J|MaryCalls) = P(J,MaryCalls)/P(MaryCalls) = P(J,MaryCalls)/(Σj P(j,MaryCalls))
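The three steps above can be sketched as a direct enumeration of the full joint. The CPTs for Burglary, Earthquake, Alarm, and MaryCalls are the ones quoted on earlier slides; the values P(J|A) = 0.9 and P(J|¬A) = 0.05 are assumed here (standard textbook numbers, not stated in these slides):

```python
from itertools import product

# Straightforward inference by enumerating the full joint (2^5 atomic events).
P_B, P_E = 0.001, 0.002
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}  # P(A=1|b,e)
P_J = {1: 0.90, 0: 0.05}   # P(J=1 | Alarm=a)  -- assumed values
P_M = {1: 0.70, 0: 0.01}   # P(M=1 | Alarm=a)

def joint(b, e, a, j, m):
    """Joint probability of one atomic event, via the chain-rule product."""
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

# Steps 1-3: sum out A, B, E, then normalize over J.
num = sum(joint(b, e, a, 1, 1) for b, e, a in product((0, 1), repeat=3))
den = sum(joint(b, e, a, j, 1) for b, e, a, j in product((0, 1), repeat=4))
print(round(num / den, 4))  # P(J=1 | M=1), about 0.1776
```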

  35. How expensive? • P(X) = P(x1, x2, …, xn) = Πi=1,…,n P(xi|parents(Xi)) • Straightforward method: • Use the above to compute P(Y, E=e) • P(Q, E=e) = Σy1 … Σyk P(Y, E=e) • P(E=e) = Σq P(Q, E=e) — a normalization factor, no big deal once we have P(Q, E=e) • Step 1 produces O(2^(n−|E|)) entries! • Can we do better?

  36. Variable Elimination • Consider the linear network X1 → X2 → X3 • P(X) = P(X1) P(X2|X1) P(X3|X2) • P(X3) = Σx1 Σx2 P(x1) P(x2|x1) P(X3|x2)

  37. Variable Elimination • Consider the linear network X1 → X2 → X3 • P(X) = P(X1) P(X2|X1) P(X3|X2) • P(X3) = Σx1 Σx2 P(x1) P(x2|x1) P(X3|x2) = Σx2 P(X3|x2) Σx1 P(x1) P(x2|x1) • Rearrange the equation…

  38. Variable Elimination • Consider the linear network X1 → X2 → X3 • P(X) = P(X1) P(X2|X1) P(X3|X2) • P(X3) = Σx1 Σx2 P(x1) P(x2|x1) P(X3|x2) = Σx2 P(X3|x2) Σx1 P(x1) P(x2|x1) = Σx2 P(X3|x2) P(x2) • The inner sum is computed once for each value of X2 • Cache P(x2) for both values of X3!

  39. Variable Elimination • Consider the linear network X1 → X2 → X3 • P(X) = P(X1) P(X2|X1) P(X3|X2) • P(X3) = Σx1 Σx2 P(x1) P(x2|x1) P(X3|x2) = Σx2 P(X3|x2) Σx1 P(x1) P(x2|x1) = Σx2 P(X3|x2) P(x2), where P(x2) is computed once for each value of X2 • How many × and + saved? • ×: 2·4·2 = 16 vs. 4+4 = 8 • +: 2·3 = 6 vs. 2+2 = 4 • Can lead to huge gains in larger networks
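The same elimination, written out on hypothetical CPTs (all numbers below are invented for illustration):

```python
# Variable elimination on the chain X1 -> X2 -> X3, all variables binary.
p_x1 = {0: 0.6, 1: 0.4}
p_x2_given_x1 = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}  # key (x2, x1)
p_x3_given_x2 = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.5, (1, 1): 0.5}  # key (x3, x2)

# Eliminate X1 first: cache the factor P(x2) = sum_x1 P(x1) P(x2|x1).
p_x2 = {x2: sum(p_x1[x1] * p_x2_given_x1[(x2, x1)] for x1 in (0, 1))
        for x2 in (0, 1)}

# Then P(X3) = sum_x2 P(X3|x2) P(x2) -- the cached factor is reused
# for both values of X3 instead of recomputing the inner sum.
p_x3 = {x3: sum(p_x3_given_x2[(x3, x2)] * p_x2[x2] for x2 in (0, 1))
        for x3 in (0, 1)}
print(p_x3)
```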

  40. VE in Alarm Example • P(E|j,m)=P(E,j,m)/P(j,m) • P(E,j,m) = ΣaΣb P(E) P(b) P(a|E,b) P(j|a) P(m|a)

  41. VE in Alarm Example • P(E|j,m)=P(E,j,m)/P(j,m) • P(E,j,m) = ΣaΣb P(E) P(b) P(a|E,b) P(j|a) P(m|a) = P(E) Σb P(b) Σa P(a|E,b) P(j|a) P(m|a)

  42. VE in Alarm Example • P(E|j,m)=P(E,j,m)/P(j,m) • P(E,j,m) = ΣaΣb P(E) P(b) P(a|E,b) P(j|a) P(m|a) = P(E) Σb P(b) Σa P(a|E,b) P(j|a) P(m|a)= P(E) Σb P(b) P(j,m|E,b) Compute for all values of E,b

  43. VE in Alarm Example • P(E|j,m)=P(E,j,m)/P(j,m) • P(E,j,m) = ΣaΣb P(E) P(b) P(a|E,b) P(j|a) P(m|a) = P(E) Σb P(b) Σa P(a|E,b) P(j|a) P(m|a)= P(E) Σb P(b) P(j,m|E,b)= P(E) P(j,m|E) Compute for all values of E
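The factoring on slides 40–43 can be sketched directly. The CPTs are the ones quoted earlier in the slides, except that P(J|A) = 0.9 and P(J|¬A) = 0.05 are assumed values (standard textbook numbers, not stated here):

```python
# VE for P(Earthquake | j=1, m=1), following the slides' factoring:
# P(E,j,m) = P(E) sum_b P(b) sum_a P(a|E,b) P(j|a) P(m|a)
P_B, P_E = 0.001, 0.002
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}  # P(A=1|b,e)
P_J = {1: 0.90, 0: 0.05}   # assumed
P_M = {1: 0.70, 0: 0.01}

def pb(b): return P_B if b else 1 - P_B
def pe(e): return P_E if e else 1 - P_E

def f(e, b):
    """Innermost factor: sum_a P(a|e,b) P(j=1|a) P(m=1|a)."""
    return sum((P_A[(b, e)] if a else 1 - P_A[(b, e)]) * P_J[a] * P_M[a]
               for a in (0, 1))

# P(E=e, j, m) for both values of e, then normalize.
unnorm = {e: pe(e) * sum(pb(b) * f(e, b) for b in (0, 1)) for e in (0, 1)}
z = sum(unnorm.values())
posterior = {e: unnorm[e] / z for e in (0, 1)}
print(round(posterior[1], 3))  # P(E=1 | j, m), about 0.176
```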

  44. What order to perform VE? • For tree-like BNs (polytrees), order so that parents are eliminated before children • The number of entries in each intermediate factor is 2^(# of parents of a node) • If the number of parents of each node is bounded, then VE runs in linear time! • Other networks: intermediate factors may become large

  45. Non-polytree networks • Network: A → B, A → C, B → D, C → D • P(D) = Σa Σb Σc P(a)P(b|a)P(c|a)P(D|b,c) = Σb Σc P(D|b,c) Σa P(a)P(b|a)P(c|a) • No more simplifications: the remaining factor couples B and C

  46. Approximate Inference Techniques • Based on the idea of Monte Carlo simulation • Basic idea: • To estimate the probability of a coin flipping heads, I can flip it a huge number of times and count the fraction of heads observed • Conditional simulation: • To estimate the probability P(H) that a coin picked out of bucket B flips heads, I can: • Pick a coin C out of B (occurs with probability P(C)) • Flip C and observe whether it flips heads (occurs with probability P(H|C)) • Put C back and repeat from step 1 many times • Return the fraction of heads observed (estimate of P(H))

  47. Approximate Inference: Monte-Carlo Simulation • Sample from the joint distribution • Example sample: B=0, E=0, A=0, J=1, M=0

  48. Approximate Inference: Monte-Carlo Simulation • As more samples are generated, the distribution of the samples approaches the joint distribution! • Samples: (B=0, E=0, A=0, J=1, M=0), (B=0, E=0, A=0, J=0, M=0), (B=0, E=0, A=0, J=0, M=0), (B=1, E=0, A=1, J=1, M=0)

  49. Approximate Inference: Monte-Carlo Simulation • Inference: given evidence E=e (e.g., J=1), remove the samples that conflict • Samples: (B=0, E=0, A=0, J=1, M=0) kept; (B=0, E=0, A=0, J=0, M=0) rejected; (B=0, E=0, A=0, J=0, M=0) rejected; (B=1, E=0, A=1, J=1, M=0) kept • The distribution of the remaining samples approximates the conditional distribution!
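The keep/reject procedure above is rejection sampling; a sketch for estimating P(J=1 | M=1) follows (the P(J|A) values 0.9 / 0.05 are assumed, not stated in these slides):

```python
import random

# Rejection sampling: draw from the joint in topological order,
# then keep only samples consistent with the evidence M=1.
random.seed(0)
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}

def bern(p):
    """Sample a Bernoulli(p) variable as 0/1."""
    return 1 if random.random() < p else 0

kept = hits = 0
for _ in range(200_000):
    b, e = bern(0.001), bern(0.002)
    a = bern(P_A[(b, e)])
    j = bern(0.9 if a else 0.05)    # assumed P(J|A) values
    m = bern(0.7 if a else 0.01)
    if m == 1:                      # discard samples that conflict with M=1
        kept += 1
        hits += j
print(hits / kept)  # close to the exact answer, about 0.178
```

Note that only the samples with M=1 survive, so the effective sample size is roughly 200,000 · P(M=1) ≈ 2,300; rare evidence makes rejection sampling expensive.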

  50. How many samples? • The error of the estimate, for n samples, is on average O(1/√n) • Variance-reduction techniques can reduce the constant
