
Reasoning Under Uncertainty: Independence and Inference


Presentation Transcript


  1. Reasoning Under Uncertainty: Independence and Inference CPSC 322 Textbook §6.3.1, 6.5, 6.5.1, 6.5.2 March 26, 2012

  2. Lecture Overview • Recap: Bayesian Networks • Network structure • Inference in General Bayesian Networks • Observations and Inference • Intro to variable elimination: Factors

  3. Recap: Conditional Independence

  4. Recap: Bayesian Networks, Definition

  5. How to build a Bayesian network • Define a total order over the random variables: (X1, …, Xn) • If we apply the chain rule, we have P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | X1, …, Xi-1) • Define as parents of random variable Xi in the Belief network a minimal set Parents(Xi) of its predecessors (in the total order defined over the variables) such that P(Xi | X1, …, Xi-1) = P(Xi | Parents(Xi)) • That is, Xi is conditionally independent of all its other predecessors given Parents(Xi) • Putting it all together, in a Belief network P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | Parents(Xi)) • A Belief network thus defines a factorization of the JPD of its variables, based on the existing conditional independencies among them
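
A minimal Python sketch of this factorization (the variable names and CPT values here are hypothetical, chosen only to illustrate the idea): store a parent set and a CPT per variable, and compute any joint probability as the product of local conditional probabilities.

    # Minimal sketch: a Belief network as parent sets plus CPTs.
    # Variables and probability values here are hypothetical.
    parents = {"A": [], "B": ["A"]}          # B's only parent is A
    cpt = {
        "A": {(): 0.3},                      # P(A=t)
        "B": {(True,): 0.9, (False,): 0.2},  # P(B=t | A)
    }

    def joint(assignment):
        """P(X1, ..., Xn) = product over i of P(Xi | Parents(Xi))."""
        p = 1.0
        for var, pa in parents.items():
            p_true = cpt[var][tuple(assignment[q] for q in pa)]
            p *= p_true if assignment[var] else 1.0 - p_true
        return p

    print(joint({"A": True, "B": False}))    # 0.3 * (1 - 0.9) = 0.03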

  6. Example for BN construction: Fire Diagnosis You want to diagnose whether there is a fire in a building • You receive a noisy report about whether everyone is leaving the building • If everyone is leaving, this may have been caused by a fire alarm • If there is a fire alarm, it may have been caused by a fire or by tampering • If there is a fire, there may be smoke

  7. Example for BN construction: Fire Diagnosis First you choose the variables. In this case, all are Boolean: • Tampering (T) is true when the alarm has been tampered with • Fire (F) is true when there is a fire • Alarm (A) is true when there is an alarm • Smoke (S) is true when there is smoke • Leaving (L) is true if there are lots of people leaving the building • Report (R) is true if the sensor reports that lots of people are leaving the building Then define a total ordering of the variables: • Let’s say we order variables following causal relationships: causes come before their effects in the ordering • Fire; Tampering; Alarm; Smoke; Leaving; Report.

  8. Example • Let’s add nodes to the network in the following order: F, T, A, S, L, R • Fire (F) is the first variable in the ordering, X1. It does not have parents. • Network so far: Fire

  9. Example • Let’s add nodes to the network in the following order: F, T, A, S, L, R • Tampering (T) is independent of Fire (learning that one is true would not change your beliefs about the probability of the other) • Network so far: Tampering, Fire

  10. Example • Let’s add nodes to the network in the following order: F, T, A, S, L, R • Alarm (A) depends on both Fire and Tampering: it could be caused by either or both • Network so far: Tampering, Fire, Alarm

  11. Example • Let’s add nodes to the network in the following order: F, T, A, S, L, R • Smoke (S) is caused by Fire, and so is independent of Tampering and Alarm given whether there is a Fire • Network so far: Tampering, Fire, Alarm, Smoke

  12. Example • Let’s add nodes to the network in the following order: F, T, A, S, L, R • Leaving (L) is caused by Alarm, and thus is independent of the other variables given Alarm • Network so far: Tampering, Fire, Alarm, Smoke, Leaving

  13. Example • Let’s add nodes to the network in the following order: F, T, A, S, L, R • Report (R) is caused by Leaving, and thus is independent of the other variables given Leaving • Network so far: Tampering, Fire, Alarm, Smoke, Leaving, Report

  14. Example for BN construction: Fire Diagnosis
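
As a sketch, the structure we just built can be written down as parent sets (Python; the node names and edges follow the construction above):

    # Parent sets for the fire-diagnosis network constructed above.
    parents = {
        "Fire":      [],
        "Tampering": [],
        "Alarm":     ["Fire", "Tampering"],
        "Smoke":     ["Fire"],
        "Leaving":   ["Alarm"],
        "Report":    ["Leaving"],
    }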

  15. Example for BN construction: Fire Diagnosis • We are not done yet: must specify the Conditional Probability Table (CPT) for each variable. All variables are Boolean. • How many probabilities do we need to specify for this Bayesian network?

  16. Example for BN construction: Fire Diagnosis • For each binary variable V, we only need to store P(V=t | parents(V)) • With k parents, that is 1 probability for each of the 2^k instantiations of the parents • We don’t need to store P(V=f | parents(V)), since the two probabilities sum to 1 • Each row of these tables is a conditional probability distribution

  17. Example for BN construction: Fire Diagnosis • We are not done yet: must specify the Conditional Probability Table (CPT) for each variable. All variables are Boolean. • How many probabilities do we need to specify for this Bayesian network? • In total: 1 + 1 + 4 + 2 + 2 + 2 = 12 (compared to 2^6 - 1 = 63 for the full JPD!)
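
The count can be checked mechanically; a sketch reusing the parents dictionary from the earlier snippet:

    # One number P(V=t | ...) per instantiation of V's parents:
    # a Boolean variable with k parents contributes 2**k numbers.
    n_params = sum(2 ** len(pa) for pa in parents.values())
    print(n_params)               # 1 + 1 + 4 + 2 + 2 + 2 = 12
    print(2 ** len(parents) - 1)  # 63 free entries in the full JPD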

  18. Example for BN construction: Fire Diagnosis Once we have the CPTs in the network, we can compute any entry of the JPD: P(Tampering=t, Fire=f, Alarm=t, Smoke=f, Leaving=t, Report=t) = P(Tampering=t) × P(Fire=f) × P(Alarm=t | Tampering=t, Fire=f) × P(Smoke=f | Fire=f) × P(Leaving=t | Alarm=t) × P(Report=t | Leaving=t) = 0.02 × (1-0.01) × 0.85 × (1-0.01) × 0.88 × 0.75 ≈ 0.011
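
The product can be checked with a one-liner (a sketch; the CPT values are the ones quoted on the slide):

    # JPD entry as a direct product of the CPT entries above.
    p = 0.02 * (1 - 0.01) * 0.85 * (1 - 0.01) * 0.88 * 0.75
    print(round(p, 4))  # 0.011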

  19. In Summary • A Belief network is such that the JPD of the variables involved is defined as the product of the local conditional distributions: P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | X1, …, Xi-1) = ∏_{i=1}^{n} P(Xi | Parents(Xi)) • Once we have the CPTs in the network, we can compute any entry of the JPD, e.g. P(Tampering=t, Fire=f, Alarm=t, Smoke=f, Leaving=t, Report=t) = 0.02 × (1-0.01) × 0.85 × (1-0.01) × 0.88 × 0.75 ≈ 0.011 (as on the previous slide) • Once we know the JPD, we can answer any query about any subset of the variables (see the Inference by Enumeration topic) • Thus, a Belief network allows one to answer any query on any subset of the variables • We will see how soon, but first let’s think a bit more about network structure

  20. Lecture Overview • Recap: Bayesian Networks • Network structure • Inference in General Bayesian Networks • Observations and Inference • Intro to variable elimination: Factors

  21. Compactness • A CPT for a Boolean variable Xi with k Boolean parents has 2^k rows for the combinations of parent values • If each variable has no more than k parents, the complete network requires specifying O(n · 2^k) numbers • For k << n, this is a substantial improvement: the numbers required grow linearly with n, vs. O(2^n) for the full joint distribution • E.g., for a Bnet with 30 Boolean variables, each with 5 parents • We need to specify 30 × 2^5 = 960 probabilities • But we would need 2^30 entries for the full JPD • Bnets are most useful in sparse (or locally structured) domains • Domains in which each component interacts with (is related to) only a small fraction of the other components
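
The 30-variable comparison, spelled out (a sketch):

    n, k = 30, 5
    print(n * 2 ** k)  # 960 numbers for the Bnet
    print(2 ** n)      # 1,073,741,824 entries for the full JPD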

  22. What if we use a different ordering? • Important for assignment 4, question 2 • Say we use the following order: Leaving; Tampering; Report; Smoke; Alarm; Fire • We end up with a completely different network structure! • Which of the two structures is better: the previous structure or this one?

  23. What if we use a different ordering? • Important for assignment 4, question 2 • Say we use the following order: Leaving; Tampering; Report; Smoke; Alarm; Fire • We end up with a completely different network structure! • Which of the two structures is better? • In the previous network, we had to specify 12 probabilities • Here?

  24. What if we use a different ordering? • Important for assignment 4, question 2 • Say we use the following order: Leaving; Tampering; Report; Smoke; Alarm; Fire • We end up with a completely different network structure! • Which of the two structures is better? • In the previous network, we had to specify 12 probabilities • Here: 1 + 2 + 2 + 2 + 8 + 8 = 23 • The causal structure typically leads to the most compact network • Compactness typically enables more efficient reasoning • Deciding conditional independence is harder in the non-causal direction • Causal models seem hardwired for humans
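
The tally for this ordering can be reproduced directly from the parent counts implied by the slide's sum (one root, three one-parent variables, two three-parent variables; a sketch):

    # Parameters needed, given the number of parents of each variable.
    parent_counts = [0, 1, 1, 1, 3, 3]
    print(sum(2 ** k for k in parent_counts))  # 1+2+2+2+8+8 = 23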

  25. Are there wrong network structures? • Important for assignment 4, question 2 • Some variable orderings yield more compact structures, some less compact ones • Compact ones are better • But all representations resulting from this process are correct • One extreme: the fully connected network is always correct but rarely the best choice • It is simply a representation of the chain rule, with no simplifications for conditional independencies • How can a network structure be wrong? • If it misses directed edges that are required • E.g., if a required edge is missing, the network wrongly asserts an independence such as Fire ⊥ Alarm | {Tampering, Smoke}

  26. BN construction: another small example

  27. Recap of BN construction with a small example • A two-node network over Disease and Symptom

  28. Recap of BN construction with a small example • Which (conditional) probability tables do we need? P(D)? P(D|S)? P(S|D)? P(D,S)?

  29. Recap of BN construction with a small example • Which conditional probability tables do we need? • P(D) and P(S|D) • In general, for each variable X in the network: P(X | Pa(X))

  30. Recap of BN construction with a small example • How about a different ordering: Symptom, Disease? • We need the distributions P(S) and P(D|S) • In general, for each variable X in the network: P(X | Pa(X))

  31. Structure (contd.) • In general, the direction of a direct dependency can always be changed using Bayes rule: P(a | b) = P(b | a) P(a) / P(b) • So the two simple Bnets (Disease → Symptom and Symptom → Disease) are equivalent as long as the CPTs are related via Bayes rule: P(S | D) = P(D | S) P(S) / P(D) • Which structure to choose depends, among other things, on which CPT is easier to specify
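
A small numeric sketch of this reversal (the values of P(D) and P(S | D) are hypothetical, chosen only to illustrate Bayes rule):

    # Reverse the edge Disease -> Symptom with Bayes rule.
    p_d = 0.01                              # P(D=t), hypothetical
    p_s_given_d = {True: 0.9, False: 0.05}  # P(S=t | D), hypothetical

    # P(S=t) by marginalizing over D
    p_s = p_s_given_d[True] * p_d + p_s_given_d[False] * (1 - p_d)

    # P(D=t | S=t) = P(S=t | D=t) P(D=t) / P(S=t)
    p_d_given_s = p_s_given_d[True] * p_d / p_s
    print(round(p_s, 4), round(p_d_given_s, 4))  # 0.0585 0.1538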

  32. “Where do the numbers (CPTs) come from?” • From experts • Tedious • Costly • Not always reliable • From data => Machine Learning • There are algorithms to learn both structures and numbers (CPSC 340, CPSC 422) • Can be hard to get enough data • Still, usually better than specifying the full JPD

  33. Lecture Overview • Recap: Bayesian Networks • Network structure • Inference in General Bayesian Networks • Observations and Inference • Intro to variable elimination: Factors

  34. Inference in Bayesian Networks Given: • A Bayesian Network BN, and • Observations of a subset of its variables E: E=e • A subset of its variables Y that is queried Compute: the conditional probability P(Y | E=e) • We denote observed variables as shaded • Example networks: UnderstoodMaterial → {AssignmentGrade, ExamGrade}, and {Fire, SmokingAtSensor} → Alarm

  35. Bayesian Networks: Types of Inference • Predictive: e.g., fire happens (F=t); query P(L | F=t) • Diagnostic: e.g., people are leaving (L=t); query P(F | L=t) • Intercausal: e.g., the alarm goes off (A=t) and a person smokes next to the sensor (S=t); query P(F | A=t, S=t) • Mixed: e.g., there is no fire (F=f) and people are leaving (L=t); query P(A | F=f, L=t) • We will use the same reasoning procedure for all of these types
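
One way to see that a single procedure covers all four types: a brute-force query by enumeration never looks at edge directions, only at entries of the joint. A sketch, reusing the joint() function from the earlier snippet:

    from itertools import product

    def query(target, evidence, variables, joint):
        """Brute-force P(target=t | evidence) by enumerating the JPD.
        Handles predictive, diagnostic, intercausal, and mixed queries
        alike, since it only ever sums entries of the joint."""
        p_true = p_false = 0.0
        hidden = [v for v in variables
                  if v != target and v not in evidence]
        for values in product([True, False], repeat=len(hidden)):
            a = dict(zip(hidden, values), **evidence)
            p_true += joint({**a, target: True})
            p_false += joint({**a, target: False})
        return p_true / (p_true + p_false)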

  36. Bayesian Networks - AISpace • Try similar queries with the Belief Networks applet in AISpace • Load the Fire Alarm example (ex. 6.10 in the textbook) • Compare the probability of each query node before and after the new evidence is observed • Try first to predict how they should change using your understanding of the domain • Make sure you explore and understand the various other example networks in the textbook and AISpace • Electrical Circuit example (textbook ex. 6.11) • Patient’s wheezing and coughing example (ex. 6.14) • Not in the applet, but you can easily insert it • Several other examples in AISpace

  37. Inference in Bayesian Networks Given: • A Bayesian Network BN, and • Observations of a subset of its variables E: E=e • A subset of its variables Y that is queried Compute: The conditional probability P(Y|E=e) • Remember? We can already do this • Compute any entry of the JPD given the CPTs in the network, • then do Inference by Enumeration • BUT that’s extremely inefficient - does not scale. • Variable Elimination (VE) is an algorithm to perform inference in Bayesian networks more efficiently • It manipulates conditional probabilities in the form of “factors”. • So first we have to introduce factors and the operations we can perform on them.
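
Factors are the next lecture's topic, but as a tiny preview sketch: a factor can be stored as a table from assignments of its variables to numbers, and conditioning on an observed value is just a row selection. The representation and values below are hypothetical:

    # A factor f(A, F) as a table: (A, F) assignment -> number.
    f = {
        (True, True): 0.99,  (True, False): 0.1,
        (False, True): 0.01, (False, False): 0.9,
    }

    def condition(factor, position, value):
        """Set one variable to an observed value: keep matching rows
        and drop that variable from the assignment keys."""
        return {k[:position] + k[position + 1:]: v
                for k, v in factor.items() if k[position] == value}

    print(condition(f, 1, True))  # f(A, F=t): {(True,): 0.99, (False,): 0.01}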

  38. Lecture Overview • Recap: Bayesian Networks • Network structure • Inference in General Bayesian Networks • Observations and Inference • Intro to variable elimination: Factors -> Next Time
