
Exact Inference

Presentation Transcript


  1. Exact Inference • Eran Segal, Weizmann Institute

  2. Course Outline

  3. Inference • Markov networks and Bayesian networks represent a joint probability distribution • The network contains all the information needed to answer any query about the distribution • Inference is the process of answering such queries • Edge directions between variables do not restrict the queries that can be asked • Inference combines evidence from all parts of the network

  4. Likelihood Queries • Compute the probability (= likelihood) of the evidence • Evidence: subset of variables E and an assignment e • Task: compute P(E=e) • Computation: P(E=e) = Σ_w P(w, e), summing over all assignments w to W = U − E (the non-evidence variables)

  5. Conditional Probability Queries • Conditional probability queries • Evidence: subset of variables E and an assignment e • Query: a subset of variables Y • Task: compute P(Y | E=e) • Applications • Medical and fault diagnosis • Genetic inheritance • Computation: P(Y | E=e) = P(Y, e) / P(e) = Σ_z P(Y, z, e) / Σ_{y,z} P(y, z, e), where Z = U − Y − E

  6. Maximum A Posteriori Assignment • Maximum A Posteriori assignment (MAP) • Evidence: subset of variables E and an assignment e • Query: a subset of variables Y • Task: compute MAP(Y | E=e) = argmax_y P(Y=y | E=e) • Note 1: there may be more than one possible solution • Note 2: equivalent to computing argmax_y P(Y=y, E=e), since P(Y=y | E=e) = P(Y=y, E=e) / P(E=e) and P(E=e) does not depend on y • Computation: MAP(Y | E=e) = argmax_y Σ_z P(y, z, e), where Z = U − Y − E

  7. Most Probable Assignment: MPE • Most Probable Explanation (MPE) • Evidence: subset of variables E and an assignment e • Query: all other variables Y (Y = U − E) • Task: compute MPE(Y | E=e) = argmax_y P(Y=y | E=e) • Note: there may be more than one possible solution • Applications • Decoding messages: find the most likely transmitted bits • Diagnosis: find a single most likely consistent hypothesis

  8. Most Probable Assignment: MPE • Note: we are searching for the most likely joint assignment to all variables • This may differ from the most likely assignment to each RV separately • Example: network A → B with CPDs P(A) and P(B|A), joint values • P(a0, b0) = 0.04 • P(a0, b1) = 0.36 • P(a1, b0) = 0.3 • P(a1, b1) = 0.3 • Marginally, P(a1) = 0.6 > P(a0) = 0.4 ⇒ MAP(A) = a1 • But MPE(A,B) = {a0, b1}
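A minimal numpy check of this example (my own sketch, not from the slides; the joint table is the one on the slide, with array indices standing in for a0/a1 and b0/b1):

```python
import numpy as np

# Joint P(A, B) from the slide: rows index a0/a1, columns index b0/b1.
joint = np.array([[0.04, 0.36],
                  [0.30, 0.30]])

p_a = joint.sum(axis=1)                  # marginal P(A) = [0.4, 0.6]
map_a = int(np.argmax(p_a))              # argmax_a P(A=a) -> a1
mpe_ab = np.unravel_index(np.argmax(joint), joint.shape)  # -> (a0, b1)

print(f"MAP(A) = a{map_a}")                          # a1
print(f"MPE(A,B) = (a{mpe_ab[0]}, b{mpe_ab[1]})")    # (a0, b1)
```

The marginal argmax and the joint argmax disagree, exactly as the slide claims.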

  9. Exact Inference in Graphical Models • Graphical models can be used to answer • Conditional probability queries • MAP queries • MPE queries • Naïve approach • Generate the joint distribution • Depending on the query, compute sum/max • ⇒ Exponential blowup • Exploit independencies for efficient inference

  10. Complexity of Bayesnet Inference • Assume the encoding specifies the DAG structure • Assume CPDs are represented as table CPDs • Decision problem: given a network G, a variable X and a value x∈Val(X), decide whether PG(X=x) > 0

  11. Complexity of Bayesnet Inference • Theorem: the decision problem is NP-complete • Proof: • The decision problem is in NP: given an assignment e to all network variables, check whether X=x in e and P(e) > 0 • Reduction from 3-SAT • Binary-valued variables Q1,...,Qn • Clauses C1,...,Ck where Ci = Li,1 ∨ Li,2 ∨ Li,3 • Li,j for i=1,...,k and j=1,2,3 is a literal, i.e., some variable Qm or its negation ¬Qm • φ = C1 ∧ ... ∧ Ck • Decision problem: is there an assignment to Q1,...,Qn satisfying φ? • Construct a network such that P(X=1) > 0 iff φ is satisfiable

  12. Complexity of Bayesnet Inference • Network structure: each Qi is a root; each clause node Ci has the variables of its literals as parents; A1,...,Ak−2 and X form a chain aggregating C1,...,Ck • P(Qi=1) = 0.5 • P(Ci=1 | Pa(Ci)) = 1 iff the assignment to Pa(Ci) satisfies clause Ci (a deterministic OR of its literals) • The CPDs of A1,...,Ak−2, X are deterministic ANDs • P(X=1 | q1,...,qn) = 1 iff q1,...,qn satisfies φ • P(X=1) > 0 iff there is a satisfying assignment

  13. Complexity of Bayesnet Inference • Easy to check • Polynomial number of variables • CPDs can be described by small tables (at most 8 parameters each) • P(X=1) > 0 if and only if there exists a satisfying assignment to Q1,...,Qn • Conclusion: a polynomial-time reduction from 3-SAT • Implications • No general efficient procedure exists for all networks (unless P = NP) • Can find provably efficient procedures for particular families of networks • Exploit network structure and independencies • Dynamic programming
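To make the key property of the reduction concrete, here is a small sketch (not from the course materials; the two-clause formula is made up). Because the clause and AND CPDs are deterministic and each Qi is uniform, P(X=1) in the constructed network equals the fraction of assignments satisfying φ, so it is positive iff φ is satisfiable:

```python
from itertools import product

# phi = (Q1 v ~Q2 v Q3) ^ (~Q1 v Q2 v Q3); literals as signed indices.
clauses = [(1, -2, 3), (-1, 2, 3)]
n = 3

def satisfies(assignment, clause):
    # A clause is true if any of its literals is true under the assignment.
    return any(assignment[abs(l) - 1] == (l > 0) for l in clause)

# P(X=1) = (1 / 2^n) * #{q : q satisfies every clause}
count = sum(all(satisfies(q, c) for c in clauses)
            for q in product([False, True], repeat=n))
p_x1 = count / 2 ** n
print(f"P(X=1) = {p_x1}  (> 0 iff phi is satisfiable)")
```

Of course, this enumerates all 2^n assignments; the point of the reduction is precisely that no known method avoids such blowup in general.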

  14. Approximate Inference • Rather than computing the exact answer, compute an approximation to it • Approximation metrics for computing P(y|e) • Estimate p has absolute error ε if |P(y|e) − p| ≤ ε • Estimate p has relative error ε if p/(1+ε) ≤ P(y|e) ≤ p(1+ε) • Absolute error is not very useful in probability distributions since probabilities are often small

  15. Approximate Inference Complexity • Theorem: the following is NP-hard • Given a network G over n variables, a variable X and a value x∈Val(X), find a number p that has relative error ε(n) for the query PG(X=x) • Proof: • Based on the hardness result for exact inference • We showed that deciding PG(X=x) > 0 is hard • ⇒ An algorithm that returns an estimate p with relative error for the original query would return p > 0 iff PG(X=x) > 0 • ⇒ Approximate inference with relative error is as NP-hard as the original exact inference problem

  16. Approximate Inference Complexity • Theorem: the following is NP-hard for ε < 0.5 • Given a network G over n variables, a variable X and a value x∈Val(X), and an observation e∈Val(E) for variables E, find a number p that has absolute error ε for PG(X=x | E=e) • Proof: • Consider the same network construction as above • Strategy of proof: show that given such an approximation, we can determine satisfiability in polynomial time

  17. Approximate Inference Complexity • Proof cont.: • Construction • Use the approximate algorithm to compute the query P(Q1 | X=1) • Assign Q1 the value q with the higher estimated posterior probability • Generate a new network without Q1 and with modified CPDs • Repeat this process for all Qi • Claim: the assignment generated in this process satisfies φ iff φ has a satisfying assignment

  18. Approximate Inference Complexity • Proof cont.: proving the claim • Easy case: if φ has no satisfying assignment, then obviously the resulting assignment does not satisfy φ • Harder case: if φ has a satisfying assignment, we show that it has a satisfying assignment with Q1=q • If φ is satisfiable with both q and ¬q, we are done • If φ is not satisfiable with Q1=¬q, then P(Q1=¬q | X=1) = 0 and P(Q1=q | X=1) = 1; since our approximation has absolute error ε < 0.5, we necessarily choose q, which has a satisfying assignment • By induction on all Q variables, the assignment we find must satisfy φ • The construction process is polynomial

  19. Inference Complexity Summary • NP-hard • Exact inference • Approximate inference • with relative error • with absolute error ε < 0.5 (given evidence) • Hopeless? • No: we will see many network structures that have provably efficient algorithms, and cases where approximate inference works efficiently with high accuracy

  20. Exact Inference: Variable Elimination • Inference in a simple chain X1 → X2 → X3 • Computing P(X2): P(X2) = Σ_{x1} P(x1)·P(X2 | x1) • All the numbers needed for this computation are in the CPDs of the original Bayesian network • O(|X1|·|X2|) operations

  21. Exact Inference: Variable Elimination • Inference in a simple chain X1 → X2 → X3 • Computing P(X2) as above • Computing P(X3): P(X3) = Σ_{x2} P(x2)·P(X3 | x2) • P(X3|X2) is a given CPD • P(X2) was computed above • O(|X1|·|X2| + |X2|·|X3|) operations

  22. Exact Inference: Variable Elimination • Inference in a general chain X1 → X2 → ... → Xn • Computing P(Xn): compute each P(X_{i+1}) from P(X_i) • k² operations for each step (assuming |Xi| = k) • O(nk²) operations for the whole inference • Compare to the k^n operations required to sum over all entries of the joint distribution over X1,...,Xn • Inference in a general chain can be done in linear time!
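A short sketch of the chain computation (my own code; the CPDs are randomly generated placeholders, not numbers from the slides). Each step P(X_{i+1}) = Σ_{x_i} P(x_i)·P(X_{i+1}|x_i) is one vector-matrix product, so the whole pass is O(nk²):

```python
import numpy as np

k, n = 3, 5                                # k values per variable, n variables
rng = np.random.default_rng(0)

p = rng.dirichlet(np.ones(k))              # P(X1)
# cpds[i][a, b] = P(next = b | current = a); each row is a distribution.
cpds = [rng.dirichlet(np.ones(k), size=k) for _ in range(n - 1)]

for cpd in cpds:                           # n-1 steps, O(k^2) each
    p = p @ cpd                            # P(X_{i+1}) = sum_{x_i} P(x_i) P(. | x_i)

print("P(Xn) =", p, " sums to", p.sum())   # sums to 1.0
```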

  23. Exact Inference: Variable Elimination • Chain X1 → X2 → X3 → X4 • P(X4) = Σ_{x3} P(X4|x3) · Σ_{x2} P(x3|x2) · Σ_{x1} P(x1)·P(x2|x1) • Pushing summations inward = dynamic programming

  24. Inference With a Loop • Network over X1, X2, X3, X4 containing a loop • Computing P(X4)

  25. Efficient Inference in Bayesnets • Properties that allow us to avoid exponential blowup in the joint distribution • Bayesian network structure – some subexpressions depend on a small number of variables • Computing these subexpressions and caching the results avoids generating them exponentially many times

  26. Variable Elimination • The inference algorithm is defined in terms of factors • A factor is a function from value assignments of a set of random variables D to the non-negative reals ℝ+ • The set of variables D is the scope of the factor • Factors generalize the notion of CPDs • Thus, the algorithm we describe applies to both Bayesian networks and Markov networks

  27. Variable Elimination: Factors • Let X, Y, Z be three disjoint sets of RVs, and let φ1(X,Y) and φ2(Y,Z) be two factors • We define the factor product φ1 × φ2 to be the factor ψ: Val(X,Y,Z) → ℝ with ψ(X,Y,Z) = φ1(X,Y)·φ2(Y,Z)

  28. Variable Elimination: Factors • Let X be a set of RVs, Y ∉ X a RV, and φ(X,Y) a factor • We define the factor marginalization of Y in φ to be the factor ψ: Val(X) → ℝ with ψ(X) = Σ_Y φ(X,Y) • Also called summing out • In a Bayesian network, summing out all variables gives 1 • In a Markov network, summing out all variables gives the partition function

  29. Variable Elimination: Factors • Factor products are commutative • φ1 × φ2 = φ2 × φ1 • Summations commute: Σ_X Σ_Y φ(X,Y) = Σ_Y Σ_X φ(X,Y) • Products are associative • (φ1 × φ2) × φ3 = φ1 × (φ2 × φ3) • If X ∉ Scope[φ1] (we used this in the chain elimination above): • Σ_X (φ1 × φ2) = φ1 × Σ_X φ2
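These operations are easy to prototype. Below is a minimal sketch of a factor type with product and marginalization (my own representation, not the course's code; the variable names and random tables are arbitrary), which also checks the push-in identity from the last bullet:

```python
import numpy as np

class Factor:
    """A factor: an ordered scope of variable names plus a value table."""
    def __init__(self, scope, table):
        self.scope = list(scope)
        self.table = np.asarray(table, dtype=float)

    def product(self, other):
        # psi(X,Y,Z) = phi1(X,Y) * phi2(Y,Z), aligned via broadcasting.
        scope = self.scope + [v for v in other.scope if v not in self.scope]
        return Factor(scope, self._expand(scope) * other._expand(scope))

    def marginalize(self, var):
        # psi(X) = sum_Y phi(X,Y): sum out `var` and drop it from the scope.
        axis = self.scope.index(var)
        return Factor([v for v in self.scope if v != var],
                      self.table.sum(axis=axis))

    def _expand(self, scope):
        # Reorder our axes to match `scope`, inserting size-1 axes for
        # variables this factor does not mention (so arrays broadcast).
        order = sorted(range(len(self.scope)),
                       key=lambda i: scope.index(self.scope[i]))
        table = self.table.transpose(order)
        shape = [self.table.shape[self.scope.index(v)] if v in self.scope
                 else 1 for v in scope]
        return table.reshape(shape)

# Check sum_C (phi1 x phi2) == phi1 x (sum_C phi2), since C is not
# in Scope[phi1].
rng = np.random.default_rng(1)
phi1 = Factor(["A", "B"], rng.random((2, 3)))
phi2 = Factor(["B", "C"], rng.random((3, 2)))
lhs = phi1.product(phi2).marginalize("C")
rhs = phi1.product(phi2.marginalize("C"))
print(np.allclose(lhs.table, rhs.table))   # True
```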

  30. Inference in a Chain by Factors • Chain X1 → X2 → X3 → X4, with one factor per CPD • P(X4) = Σ_{x3} φ4(x3, X4) · Σ_{x2} φ3(x2, x3) · Σ_{x1} φ1(x1)·φ2(x1, x2) • The scopes of the factors over X3 and X4 do not contain X1, so Σ_{x1} can be pushed past them • The scope of the factor over X4 does not contain X2, so Σ_{x2} can be pushed past it

  31. Sum-Product Inference • Let Y be the query RVs and Z be all other RVs • The general inference task is P(Y) = Σ_z Π_{φ∈F} φ • Effective computation • Since factor scopes are limited, "push in" some of the summations and perform them over the product of only a subset of the factors

  32. Sum-Product Variable Elimination • Algorithm • Sum out the variables one at a time • When summing out a variable, multiply all the factors that mention it, generating a product factor • Sum out the variable from the combined factor, generating a new factor without the variable

  33. Sum-Product Variable Elimination • Theorem • Let X be a set of RVs • Let F be a set of factors such that for each φ∈F, Scope[φ] ⊆ X • Let Y ⊆ X be a set of query RVs • Let Z = X − Y • For any ordering over Z, the above algorithm returns a factor φ*(Y) such that φ*(Y) = Σ_z Π_{φ∈F} φ • Instantiation for a Bayesian network query PG(Y) • F consists of all CPDs in G • Each φ_{Xi} = P(Xi | Pa(Xi)) • Apply variable elimination to Z = U − Y
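A sketch of the algorithm itself, reusing the Factor class from the previous snippet (again my own code, following the slide's description rather than any official implementation; the tiny two-node network and its numbers are made up for illustration):

```python
def eliminate(factors, order):
    """Sum-product VE: for each variable in `order`, multiply the factors
    that mention it, sum it out, and return the result to the pool."""
    factors = list(factors)
    for var in order:
        touched = [f for f in factors if var in f.scope]
        if not touched:
            continue
        combined = touched[0]
        for f in touched[1:]:
            combined = combined.product(f)
        factors = [f for f in factors if var not in f.scope]
        factors.append(combined.marginalize(var))
    # Multiply whatever remains into a single factor over the query RVs.
    result = factors[0]
    for f in factors[1:]:
        result = result.product(f)
    return result

# Tiny BN A -> B (hypothetical numbers): F = {P(A), P(B|A)}; eliminate A.
p_a  = Factor(["A"], [0.4, 0.6])
p_ba = Factor(["A", "B"], [[0.1, 0.9],    # P(B | a0)
                           [0.5, 0.5]])   # P(B | a1)
print(eliminate([p_a, p_ba], ["A"]).table)  # P(B) = [0.34, 0.66]
```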

  34. A More Complex Network • Variables: C, D, I, G, S, L, J, H • Factors: P(C), P(D|C), P(I), P(G|I,D), P(S|I), P(L|G), P(J|L,S), P(H|G,J) • Goal: P(J) • Eliminate: C, D, I, H, G, S, L

  35. A More Complex Network • Goal: P(J) • Eliminate: C, D, I, H, G, S, L • Compute: τ1(D) = Σ_C P(C)·P(D|C)

  36. A More Complex Network • Goal: P(J) • Eliminate: D, I, H, G, S, L • Compute: τ2(G,I) = Σ_D P(G|I,D)·τ1(D)

  37. A More Complex Network • Goal: P(J) • Eliminate: I, H, G, S, L • Compute: τ3(G,S) = Σ_I P(I)·P(S|I)·τ2(G,I)

  38. A More Complex Network • Goal: P(J) • Eliminate: H, G, S, L • Compute: τ4(G,J) = Σ_H P(H|G,J)

  39. A More Complex Network • Goal: P(J) • Eliminate: G, S, L • Compute: τ5(J,L,S) = Σ_G τ3(G,S)·P(L|G)·τ4(G,J)

  40. A More Complex Network • Goal: P(J) • Eliminate: S, L • Compute: τ6(J,L) = Σ_S P(J|L,S)·τ5(J,L,S)

  41. A More Complex Network • Goal: P(J) • Eliminate: L • Compute: τ7(J) = Σ_L τ6(J,L)

  42. A More Complex Network • Goal: P(J) • Eliminate: G, I, S, L, H, C, D • Note: with this ordering the first intermediate factor is large: eliminating G multiplies P(G|I,D), P(L|G) and P(H|G,J), giving f1(I,D,L,J,H) = Σ_G P(G|I,D)·P(L|G)·P(H|G,J)

  43. Inference With Evidence • Let Y be the query RVs • Let E be the evidence RVs and e their assignment • Let Z be all other RVs (U − Y − E) • The general inference task is P(Y | E=e) = P(Y, e) / P(e), where P(Y, e) = Σ_z Π_{φ∈F} φ[E=e] (each factor reduced to the evidence e); dividing by P(e) = Σ_y P(y, e) renormalizes over Y
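One plausible way to implement this, continuing the Factor/eliminate sketch above (the reduction step is standard, but this particular code is mine): restrict every factor that mentions an evidence variable to the observed value, eliminate Z, then renormalize.

```python
def restrict(factor, var, value):
    """Reduce a factor to evidence var=value: slice the table, drop var."""
    axis = factor.scope.index(var)
    return Factor([v for v in factor.scope if v != var],
                  np.take(factor.table, value, axis=axis))

def query(factors, order, evidence):
    """P(Y | E=e): restrict factors to e, eliminate Z, renormalize."""
    for var, val in evidence.items():
        factors = [restrict(f, var, val) if var in f.scope else f
                   for f in factors]
    result = eliminate(factors, order)
    return Factor(result.scope, result.table / result.table.sum())

# P(A | B=b1) for the tiny network above; nothing is left to eliminate.
print(query([p_a, p_ba], [], {"B": 1}).table)   # [0.36, 0.30] / 0.66
```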

  44. Inference With Evidence • Goal: P(J | H=h, I=i) • Eliminate: C, D, G, S, L • Compute f(J, H=h, I=i) by variable elimination over the reduced factors; renormalizing over J gives P(J | H=h, I=i)

  45. Complexity of Variable Elimination • Variable elimination consists of • Generating the factors that will be summed out • Summing out • Generating the factor f_i = φ1 × ... × φ_{k_i} • Let X_i be the scope of f_i • Each entry requires k_i multiplications to generate • ⇒ Generating factor f_i is O(k_i · |Val(X_i)|) • Summing out • Addition operations, at most |Val(X_i)| • Per factor: O(kN), where N = max_i |Val(X_i)| and k = max_i k_i

  46. Complexity of Variable Elimination • Start with n factors (n = number of variables) • Generate exactly one new factor at each iteration • ⇒ there are at most 2n factors • Generating factors • At most Σ_i |Val(X_i)|·k_i ≤ N·Σ_i k_i ≤ N·2n (since each factor is multiplied in exactly once and there are at most 2n factors) • Summing out • Σ_i |Val(X_i)| ≤ N·n (since we perform n summing-outs) • Total work is linear in N and n • The exponential blowup can hide in N_i, which for factor i can be v^m if factor i has m variables with v values each

  47. VE as Graph Transformation • At each step we compute τ = Σ_X Π_φ φ over the factors φ that mention X • Draw a graph with an undirected edge between X and Y whenever X and Y appear in the same factor • Note: this is the Markov network of the distribution over the variables that have not been eliminated yet
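A scope-only sketch of this graph view (my own code; it ignores the tables entirely and assumes the student-network factor scopes listed below). Eliminating X connects all of X's current neighbors, adding "fill" edges, and the scope of the factor created at each step is X together with those neighbors; this reproduces the τ scopes from the example above:

```python
from itertools import combinations

def elimination_graphs(scopes, order):
    """X and Y are adjacent iff they share a factor; eliminating X
    connects all of X's neighbors (fill edges) and removes X."""
    adj = {v: set() for s in scopes for v in s}
    for s in scopes:
        for a, b in combinations(sorted(s), 2):
            adj[a].add(b); adj[b].add(a)
    for x in order:
        nbrs = adj.pop(x)
        for v in nbrs:
            adj[v].discard(x)
        for a, b in combinations(sorted(nbrs), 2):   # fill edges
            adj[a].add(b); adj[b].add(a)
        yield x, nbrs

# Factor scopes of the student-network example above.
scopes = [{"C"}, {"C", "D"}, {"I"}, {"G", "I", "D"}, {"S", "I"},
          {"L", "G"}, {"J", "L", "S"}, {"H", "G", "J"}]
for x, nbrs in elimination_graphs(scopes, list("CDIHGSL")):
    print(f"eliminate {x}: new factor over {x} and {sorted(nbrs)}")
```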

  48. VE as Graph Transformation • Goal: P(J) • Eliminate: C, D, I, H, G, S, L • Initial graph over C, D, I, G, S, L, J, H: connect every pair of variables that appear in a common factor

  49. VE as Graph Transformation • Goal: P(J) • Eliminate: C, D, I, H, G, S, L • Compute: τ1(D) = Σ_C P(C)·P(D|C), then remove C from the graph

  50. VE as Graph Transformation • Goal: P(J) • Eliminate: D, I, H, G, S, L • Compute: τ2(G,I) = Σ_D P(G|I,D)·τ1(D); eliminating D connects its neighbors G and I (a fill edge) and removes D
