Reasoning Under Uncertainty: Independence and Inference CPSC 322 Textbook §6.3.1, 6.5, 6.5.1, 6.5.2 March 26, 2012
Lecture Overview • Recap: Bayesian Networks • Network structure • Inference in General Bayesian Networks • Observations and Inference • Intro to variable elimination: Factors
How to build a Bayesian network • Define a total order over the random variables: (X1, …, Xn) • If we apply the chain rule, we have P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | X1, …, Xi-1) • Define as parents of random variable Xi in the Belief network a minimal set Parents(Xi) of its predecessors in the total order such that P(Xi | X1, …, Xi-1) = P(Xi | Parents(Xi)), i.e., Xi is conditionally independent of all its other predecessors given Parents(Xi) • Putting it all together, in a Belief network P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | Parents(Xi)) • A Belief network thus defines a factorization of the JPD of its variables, based on the existing conditional independencies among these variables
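To make the factorization concrete, here is a minimal Python sketch (ours, not from the textbook) that takes a total ordering and a chosen parent set for each variable, checks that every parent precedes its child in the ordering, and prints the factorization the network defines. The variables are those of the fire-diagnosis example introduced on the next slides.

```python
# Minimal sketch: a total ordering plus chosen parent sets defines
# the Belief network's factorization of the JPD.
order = ["Fire", "Tampering", "Alarm", "Smoke", "Leaving", "Report"]
parents = {
    "Fire": [],
    "Tampering": [],
    "Alarm": ["Tampering", "Fire"],
    "Smoke": ["Fire"],
    "Leaving": ["Alarm"],
    "Report": ["Leaving"],
}

# Parents must be a subset of each variable's predecessors in the ordering.
for x in order:
    assert all(order.index(p) < order.index(x) for p in parents[x])

terms = [f"P({x} | {', '.join(parents[x])})" if parents[x] else f"P({x})"
         for x in order]
print("P(" + ", ".join(order) + ") =", " ".join(terms))
# P(Fire, Tampering, Alarm, Smoke, Leaving, Report) =
#   P(Fire) P(Tampering) P(Alarm | Tampering, Fire) P(Smoke | Fire)
#   P(Leaving | Alarm) P(Report | Leaving)
```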
Example for BN construction: Fire Diagnosis You want to diagnose whether there is a fire in a building • You receive a noisy report about whether everyone is leaving the building • If everyone is leaving, this may have been caused by a fire alarm • If there is a fire alarm, it may have been caused by a fire or by tampering • If there is a fire, there may be smoke
Example for BN construction: Fire Diagnosis First you choose the variables. In this case, all are Boolean: • Tampering (T) is true when the alarm has been tampered with • Fire (F) is true when there is a fire • Alarm (A) is true when there is an alarm • Smoke (S) is true when there is smoke • Leaving (L) is true if there are lots of people leaving the building • Report (R) is true if the sensor reports that lots of people are leaving the building Then define a total ordering of the variables: • Let’s say we order the variables following causal relationships: causes come before their effects in the ordering • Fire; Tampering; Alarm; Smoke; Leaving; Report.
Example • Let’s add nodes to the network in the following order: F, T, A, S, L, R • Fire (F) is the first variable in the ordering, X1. It does not have parents. [Network so far: Fire]
Example • Let’s add nodes to the network in the following order: F, T, A, S, L, R • Tampering (T) is independent of Fire (learning that one is true would not change your beliefs about the probability of the other) [Network so far: Tampering, Fire, no edges]
Example • Let’s add nodes to the network in the following order: F, T, A, S, L, R • Alarm (A) depends on both Fire and Tampering: it could be caused by either or both [Network so far: Tampering → Alarm ← Fire]
Example • Let’s add nodes to the network in the following order: F, T, A, S, L, R • Smoke (S) is caused by Fire, and so is independent of Tampering and Alarm given whether there is a Fire [Network so far: adds Fire → Smoke]
Example • Let’s add nodes to the network in the following order: F, T, A, S, L, R • Leaving (L) is caused by Alarm, and thus is independent of the other variables given Alarm [Network so far: adds Alarm → Leaving]
Example • Let’s add nodes to the network in the following order: F, T, A, S, L, R • Report (R) is caused by Leaving, and thus is independent of the other variables given Leaving [Final network: Tampering → Alarm ← Fire, Fire → Smoke, Alarm → Leaving, Leaving → Report]
Example for BN construction: Fire Diagnosis • We are not done yet: must specify the Conditional Probability Table (CPT) for each variable. All variables are Boolean. • How many probabilities do we need to specify for this Bayesian network?
Example for BN construction: Fire Diagnosis • For each binary variable V, we only need to store P(V=t | Parents(V)): with k parents, that is 1 probability for each of the 2^k instantiations of the parents • We don’t need to store P(V=f | Parents(V)), since the two probabilities sum to 1 • Each row of these tables is a conditional probability distribution
Example for BN construction: Fire Diagnosis • We are not done yet: must specify the Conditional Probability Table (CPT) for each variable. All variables are Boolean. • How many probabilities do we need to specify for this Bayesian network? • In total: 1+1+4+2+2+2 = 12 (compared to 2^6 - 1 = 63 for the full JPD!)
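The count can be reproduced mechanically: with all-Boolean variables, each variable contributes 2^k probabilities, where k is its number of parents in the network above. A quick Python check (variable names ours):

```python
# Probabilities needed per variable: 2**k for k parents (Boolean variables).
parent_counts = {"Fire": 0, "Tampering": 0, "Alarm": 2,
                 "Smoke": 1, "Leaving": 1, "Report": 1}

bn_size = sum(2 ** k for k in parent_counts.values())
jpd_size = 2 ** len(parent_counts) - 1  # full JPD over 6 Boolean variables

print(bn_size)   # 12  (= 1 + 1 + 4 + 2 + 2 + 2)
print(jpd_size)  # 63
```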
Example for BN construction: Fire Diagnosis Once we have the CPTs in the network, we can compute any entry of the JPD. For example: P(Tampering=t, Fire=f, Alarm=t, Smoke=f, Leaving=t, Report=t) = P(Tampering=t) × P(Fire=f) × P(Alarm=t | Tampering=t, Fire=f) × P(Smoke=f | Fire=f) × P(Leaving=t | Alarm=t) × P(Report=t | Leaving=t) = 0.02 × (1 - 0.01) × 0.85 × (1 - 0.01) × 0.88 × 0.75 ≈ 0.011
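As a sanity check, here is the same product spelled out in Python, using the CPT entries quoted on this slide (e.g. P(Tampering=t) = 0.02). Note the result cannot exceed 0.02, since that single factor already bounds the product:

```python
# JPD entry as a product of one CPT lookup per variable.
p_t = 0.02    # P(Tampering=t)
p_f = 0.01    # P(Fire=t)
p_a = 0.85    # P(Alarm=t | Tampering=t, Fire=f)
p_s = 0.01    # P(Smoke=t | Fire=f)
p_l = 0.88    # P(Leaving=t | Alarm=t)
p_r = 0.75    # P(Report=t | Leaving=t)

entry = p_t * (1 - p_f) * p_a * (1 - p_s) * p_l * p_r
print(round(entry, 4))  # 0.011
```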
In Summary • A Belief network is such that the JPD of the variables involved is defined as the product of the local conditional distributions: P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | X1, …, Xi-1) = ∏_{i=1}^{n} P(Xi | Parents(Xi)) • We can compute any entry of the JPD given the CPTs in the network, e.g. P(Tampering=t, Fire=f, Alarm=t, Smoke=f, Leaving=t, Report=t) = P(Tampering=t) × P(Fire=f) × P(Alarm=t | Tampering=t, Fire=f) × P(Smoke=f | Fire=f) × P(Leaving=t | Alarm=t) × P(Report=t | Leaving=t) = 0.02 × (1 - 0.01) × 0.85 × (1 - 0.01) × 0.88 × 0.75 ≈ 0.011 • Once we know the JPD, we can answer any query about any subset of the variables (see the Inference by Enumeration topic) • Thus, a Belief network allows one to answer any query on any subset of the variables • We will see how soon, but first let’s think a bit more about network structure
Lecture Overview • Recap: Bayesian Networks • Network structure • Inference in General Bayesian Networks • Observations and Inference • Intro to variable elimination: Factors
Compactness • A CPT for a Boolean variable Xi with k Boolean parents has 2^k rows for the combinations of parent values • If each variable has no more than k parents, the complete network requires specifying O(n · 2^k) numbers • For k << n, this is a substantial improvement: the numbers required grow linearly with n, vs. O(2^n) for the full joint distribution • E.g., if we have a Bnet with 30 Boolean variables, each with 5 parents, we need to specify 30 × 2^5 = 960 probabilities, but we would need 2^30 - 1 for the JPD • Bnets are most useful in sparse (or locally structured) domains: domains in which each component interacts with (is related to) only a small fraction of the other components
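The arithmetic for the 30-variable example, as a quick check (assuming every variable is Boolean with exactly 5 parents):

```python
n, k = 30, 5
print(n * 2 ** k)   # 960 probabilities for the Bnet
print(2 ** n - 1)   # 1073741823 (over a billion) for the full JPD
```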
What if we use a different ordering? • Important for assignment 4, question 2 • Say we use the following order: Leaving; Tampering; Report; Smoke; Alarm; Fire [diagram: the network induced by this ordering, over Leaving, Tampering, Report, Smoke, Alarm, Fire] • We end up with a completely different network structure! • Which of the two structures is better: the previous structure, or this structure?
What if we use a different ordering? • Important for assignment 4, question 2 • Say we use the following order: Leaving; Tampering; Report; Smoke; Alarm; Fire [diagram as above] • We end up with a completely different network structure! • Which of the two structures is better? • In the last network, we had to specify 12 probabilities • Here?
What if we use a different ordering? • Important for assignment 4, question 2 • Say we use the following order: Leaving; Tampering; Report; Smoke; Alarm; Fire [diagram as above] • We end up with a completely different network structure! • Which of the two structures is better? • In the last network, we had to specify 12 probabilities • Here: 1 + 2 + 2 + 2 + 8 + 8 = 23 • The causal structure typically leads to the most compact network, and compactness typically enables more efficient reasoning • Deciding conditional independence is harder in the non-causal direction • Causal models seem hardwired for humans
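The two totals follow directly from the parent counts each ordering induces; a short Python check (the per-variable parent counts for the non-causal ordering are inferred from the slide's sum 1+2+2+2+8+8):

```python
# Number of parents per variable under each ordering (all Boolean).
causal = {"Fire": 0, "Tampering": 0, "Alarm": 2,
          "Smoke": 1, "Leaving": 1, "Report": 1}
non_causal = {"Leaving": 0, "Tampering": 1, "Report": 1,
              "Smoke": 1, "Alarm": 3, "Fire": 3}

print(sum(2 ** k for k in causal.values()))      # 12
print(sum(2 ** k for k in non_causal.values()))  # 23 (= 1+2+2+2+8+8)
```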
Are there wrong network structures? • Important for assignment 4, question 2 • Some variable orderings yield more compact structures, some less compact ones • Compact ones are better, but all representations resulting from this process are correct • One extreme: the fully connected network is always correct but rarely the best choice: it is simply a representation of the chain rule, with no simplifications for conditional independencies • How can a network structure be wrong? If it misses directed edges that are required • E.g., an edge is missing in the network below, so it wrongly asserts Fire ╨ Alarm | {Tampering, Smoke} [diagram: network over Tampering, Leaving, Alarm, Report, Smoke, Fire with a required edge missing]
Recap of BN construction with a small example [diagram: Disease → Symptom]
Recap of BN construction with a small example • Which (conditional) probability tables do we need? P(D)? P(D|S)? P(S|D)? P(D,S)? [diagram: Disease → Symptom]
Recap of BN construction with a small example • Which conditional probability tables do we need? • P(D) and P(S|D) • In general: for each variable X in the network, P(X | Pa(X)) [diagram: Disease → Symptom]
Recap of BN construction with a small example • How about a different ordering? Symptom, Disease • We need the distributions P(S) and P(D|S) • In general: for each variable X in the network, P(X | Pa(X)) [diagram: Symptom → Disease]
Structure (contd.) • In general, the direction of a direct dependency can always be changed using Bayes rule: P(a | b) = P(b | a) P(a) / P(b) • So the two simple Bnets below are equivalent, as long as the CPTs are related via Bayes rule: P(S | D) = P(D | S) P(S) / P(D) [diagrams: Disease → Symptom and Symptom → Disease] • Which structure to choose depends, among other things, on which CPTs are easier to specify
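For instance, reversing the Disease → Symptom edge amounts to computing the marginal P(S) and then applying Bayes rule. A small numeric sketch (the probabilities here are made up for illustration):

```python
p_d = 0.01           # P(D=t): prior probability of the disease
p_s_given_d = 0.90   # P(S=t | D=t)
p_s_given_nd = 0.05  # P(S=t | D=f)

# Marginal P(S=t), needed for the reversed CPT:
p_s = p_s_given_d * p_d + p_s_given_nd * (1 - p_d)

# Bayes rule gives the CPT of the Symptom -> Disease network:
p_d_given_s = p_s_given_d * p_d / p_s
print(round(p_s, 4), round(p_d_given_s, 4))  # 0.0585 0.1538
```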
“Where do the numbers (CPTs) come from?” • From experts • Tedious • Costly • Not always reliable • From data => Machine Learning • There are algorithms to learn both structures and numbers (CPSC 340, CPSC 422) • Can be hard to get enough data • Still, usually better than specifying the full JPD
Lecture Overview • Recap: Bayesian Networks • Network structure • Inference in General Bayesian Networks • Observations and Inference • Intro to variable elimination: Factors
Inference in Bayesian Networks Given: • A Bayesian Network BN, and • Observations of a subset of its variables E: E=e • A subset of its variables Y that is queried Compute: the conditional probability P(Y | E=e) We denote observed variables as shaded. [Example networks: one over UnderstoodMaterial, ExamGrade, AssignmentGrade; one over Fire, Smoking At Sensor, Alarm]
Bayesian Networks: Types of Inference • Diagnostic (from effects to causes): people are leaving (L=t); query P(F | L=t) = ? • Predictive (from causes to effects): fire happens (F=t); query P(L | F=t) = ? • Intercausal (between causes of a common effect, “explaining away”): the alarm goes off (A=t) and a person smokes next to the sensor (S=t); query P(F | A=t, S=t) = ? • Mixed: there is no fire (F=f) and people are leaving (L=t); query P(A | F=f, L=t) = ? We will use the same reasoning procedure for all of these types
Bayesian Networks - AIspace • Try similar queries with the Belief Networks applet in AIspace • Load the Fire Alarm example (ex. 6.10 in the textbook) • Compare the probability of each query node before and after the new evidence is observed • Try first to predict how they should change, using your understanding of the domain • Make sure you explore and understand the various other example networks in the textbook and in AIspace • Electrical Circuit example (textbook ex. 6.11) • Patient’s wheezing and coughing example (ex. 6.14): not in the applet, but you can easily insert it • Several other examples in AIspace
Inference in Bayesian Networks Given: • A Bayesian Network BN, and • Observations of a subset of its variables E: E=e • A subset of its variables Y that is queried Compute: the conditional probability P(Y | E=e) • Remember? We can already do this: compute any entry of the JPD given the CPTs in the network, then do Inference by Enumeration • BUT that’s extremely inefficient: it does not scale • Variable Elimination (VE) is an algorithm to perform inference in Bayesian networks more efficiently • It manipulates conditional probabilities in the form of “factors” • So first we have to introduce factors and the operations we can perform on them (sketched below)
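As a preview, here is one possible Python sketch of a factor and the three operations variable elimination builds on: assigning observed evidence, multiplying factors, and summing a variable out. The representation and names are ours, not the textbook's; all variables are assumed Boolean, and the CPT numbers in the usage example are illustrative.

```python
from itertools import product

class Factor:
    """A table mapping assignments of Boolean variables to numbers."""
    def __init__(self, variables, table):
        self.variables = list(variables)  # e.g. ["S", "F"]
        self.table = dict(table)          # e.g. {(True, False): 0.01, ...}

    def restrict(self, var, value):
        """Observe var=value: keep matching rows and drop the variable."""
        i = self.variables.index(var)
        return Factor(self.variables[:i] + self.variables[i + 1:],
                      {k[:i] + k[i + 1:]: v
                       for k, v in self.table.items() if k[i] == value})

    def multiply(self, other):
        """Pointwise product over the union of the two variable sets."""
        new_vars = self.variables + [v for v in other.variables
                                     if v not in self.variables]
        new_table = {}
        for vals in product([True, False], repeat=len(new_vars)):
            assign = dict(zip(new_vars, vals))
            new_table[vals] = (
                self.table[tuple(assign[v] for v in self.variables)]
                * other.table[tuple(assign[v] for v in other.variables)])
        return Factor(new_vars, new_table)

    def sum_out(self, var):
        """Marginalize var away by summing over its two values."""
        i = self.variables.index(var)
        new_table = {}
        for k, v in self.table.items():
            key = k[:i] + k[i + 1:]
            new_table[key] = new_table.get(key, 0.0) + v
        return Factor(self.variables[:i] + self.variables[i + 1:], new_table)

# Usage: a factor f(S, F) representing P(S | F), then observe Fire=true.
f_sf = Factor(["S", "F"], {(True, True): 0.90, (False, True): 0.10,
                           (True, False): 0.01, (False, False): 0.99})
print(f_sf.restrict("F", True).table)  # {(True,): 0.9, (False,): 0.1}
```

VE alternates these operations to eliminate one variable at a time: restrict factors on the evidence, multiply the factors mentioning the variable to eliminate, then sum that variable out.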
Lecture Overview • Recap: Bayesian Networks • Network structure • Inference in General Bayesian Networks • Observations and Inference • Intro to variable elimination: Factors -> Next Time