
Exact Inference: Clique Trees


Presentation Transcript


  1. Exact Inference: Clique Trees • Eran Segal, Weizmann Institute

  2. Inference with Clique Trees • Exploits the factorization of the distribution for efficient inference, similar to variable elimination • Uses global data structures • Distribution given by the (possibly unnormalized) measure ∏φ∈F φ over the variables U • For Bayesian networks, factors are CPDs • For Markov networks, factors are potentials
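The factor operations that every later slide relies on can be made concrete with a minimal sketch (not from the slides): a hypothetical Factor class over discrete variables with product and marginalization, used to form the unnormalized measure as a product of factors.

from itertools import product

class Factor:
    """A discrete factor: a table over joint assignments of its scope."""
    def __init__(self, scope, card, table):
        self.scope = list(scope)   # ordered variable names
        self.card = dict(card)     # variable -> number of values
        self.table = dict(table)   # assignment tuple (in scope order) -> value

    def __mul__(self, other):
        """Factor product over the union of the two scopes."""
        scope = self.scope + [v for v in other.scope if v not in self.scope]
        card = {**self.card, **other.card}
        table = {}
        for vals in product(*(range(card[v]) for v in scope)):
            a = dict(zip(scope, vals))
            table[vals] = (self.table[tuple(a[v] for v in self.scope)] *
                           other.table[tuple(a[v] for v in other.scope)])
        return Factor(scope, card, table)

    def marginalize(self, var):
        """Sum out one variable."""
        scope = [v for v in self.scope if v != var]
        table = {}
        for vals, p in self.table.items():
            a = dict(zip(self.scope, vals))
            key = tuple(a[v] for v in scope)
            table[key] = table.get(key, 0.0) + p
        return Factor(scope, {v: self.card[v] for v in scope}, table)

# Example: P(X1) and P(X2|X1) as factors; their product is the joint measure.
f1 = Factor(["X1"], {"X1": 2}, {(0,): 0.6, (1,): 0.4})
f2 = Factor(["X1", "X2"], {"X1": 2, "X2": 2},
            {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8})
print((f1 * f2).marginalize("X1").table)   # marginal over X2: {(0,): 0.62, (1,): 0.38}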

  3. Variable Elimination & Clique Trees • Variable elimination • Each step creates a factor ψi through multiplication • A variable is then eliminated in ψi to generate a new factor τi • Process repeated until the product contains only query variables • Clique tree inference • Another view of the above computation • General idea: ψi is a computational data structure which takes "messages" τj generated by other factors ψj and generates a message τi which is used by another factor ψl

  4. Cluster Graph • Data structure providing a flowchart of the factor manipulation process • A cluster graph K for factors F is an undirected graph • Nodes are associated with subsets of variables Ci ⊆ U • The graph is family preserving: each factor φ ∈ F is associated with one node Ci such that Scope[φ] ⊆ Ci • Each edge Ci–Cj is associated with a sepset Si,j = Ci ∩ Cj • Key: variable elimination defines a cluster graph • Cluster Ci for each factor ψi used in the computation • Draw edge Ci–Cj if the factor τi generated from ψi is used in the computation of ψj

  5. Simple Example • Variable elimination on the chain X1 → X2 → X3 with factors P(X1), P(X2|X1), P(X3|X2) • Induced cluster graph: C1 = {X1,X2} and C2 = {X2,X3}, connected by the sepset S1,2 = {X2}

  6. A More Complex Example • (figure: the student network over C, I, D, G, S, L, J, H) • Goal: P(J) • Eliminate in order: C, D, I, H, G, S, L • Each elimination step multiplies factors and produces a new one; the induced cluster graph (figure) has clusters 1:{C,D}, 2:{G,I,D}, 3:{G,S,I}, 4:{H,G,J}, 5:{G,J,S,L}, 6:{J,S,L}, 7:{J,L} and sepsets 1–2:{D}, 2–3:{G,I}, 3–5:{G,S}, 4–5:{G,J}, 5–6:{J,S,L}, 6–7:{J,L}

  7. Properties of Cluster Graphs • (figure: the cluster graph from the previous slide; verify that it is a tree, is family preserving, and satisfies the running intersection property) • Cluster graphs are trees • In VE, each intermediate factor is used only once • Hence, each cluster "passes" an edge (message) to exactly one other cluster • Cluster graphs obey the running intersection property • If X ∈ Ci and X ∈ Cj then X is in each cluster on the (unique) path between Ci and Cj

  8. Running Intersection Property • Theorem: If T is a cluster tree induced by VE over factors F, then T obeys the running intersection property • Proof: • Let C and C' be two clusters that contain X • Let CX be the cluster where X is eliminated • ⇒ it suffices to show that X is present in each cluster on the path from C to CX (and, symmetrically, from C' to CX) • The computation at CX must occur after the computation at C • X is in C by assumption, and since X is not eliminated at C, X is in the factor generated by C • By definition, C's neighbor on the path multiplies the factor generated by C, and thus has X in its scope • By induction the same holds for all other clusters on the path • ⇒ X appears in all clusters between C and CX, and hence on the whole path from C to C'

  9. Clique Tree • A cluster tree over factors F that satisfies the running intersection property is called a clique tree • Clusters in a clique tree are also called cliques • We saw: variable elimination ⇒ clique tree • Now we will see: clique tree ⇒ variable elimination • Clique tree advantage: a data structure for caching computations, allowing multiple VE runs to be performed more efficiently than separate VE runs

  10. Clique Tree Inference • Goal: Compute P(J) • (figure: the student network and its clique tree with cliques 1:{C,D}, 2:{G,I,D}, 3:{G,S,I}, 4:{H,G,J}, 5:{G,J,S,L} and sepsets 1–2:{D}, 2–3:{G,I}, 3–5:{G,S}, 4–5:{G,J}; each CPD P(C), P(D|C), P(I), P(G|I,D), P(S|I), P(L|G), P(J|L,S), P(H|G,J) is assigned to a clique containing its scope) • Verify: tree and family preserving • Running intersection property

  11. Clique Tree Inference • Goal: Compute P(J) • Set the initial potential at each clique to the product of its assigned factors • C1: Eliminate C, sending δ1→2(D) to C2 • C2: Eliminate D, sending δ2→3(G,I) to C3 • C3: Eliminate I, sending δ3→5(G,S) to C5 • C4: Eliminate H, sending δ4→5(G,J) to C5 • C5: Obtain P(J) by summing out G, S, L • (figure: the clique tree from the previous slide)

  12. Clique Tree Inference • Goal: Compute P(J), this time ending at C4 • Set the initial potential at each clique to the product of its assigned factors • C1: Eliminate C, sending δ1→2(D) to C2 • C2: Eliminate D, sending δ2→3(G,I) to C3 • C3: Eliminate I, sending δ3→5(G,S) to C5 • C5: Eliminate S, L, sending δ5→4(G,J) to C4 • C4: Obtain P(J) by summing out H, G • (figure: the clique tree from the previous slide)

  13. Clique Tree Inference • (figure only: the student network and the clique tree with its initial CPD assignments, as on the previous slides)

  14. Clique Tree Message Passing • Let T be a clique tree and C1,...,Ck its cliques • Multiply the factors assigned to each clique, resulting in initial potentials πi⁰[Ci] = ∏φ:α(φ)=i φ; as each factor is assigned to some clique (by the assignment α), we have ∏i πi⁰[Ci] = ∏φ∈F φ • Define Cr as the root clique • Start from the tree leaves and move inward • For each clique Ci define Ni to be the neighbors of Ci • Let pr(i) be the upstream neighbor of Ci (on the path to Cr) • Each Ci performs a computation that sends a message δi→pr(i) to Cpr(i): • Multiply all incoming messages from downstream neighbors with the initial clique potential, resulting in a factor whose scope is the clique • Sum out all variables except those in the sepset Ci–Cpr(i) • Claim: the final clique potential πr[Cr] represents ∑U−Cr ∏φ∈F φ (a code sketch follows below)
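A minimal sketch of the upward pass just described, reusing the hypothetical Factor class from the sketch after slide 2. Here tree maps each clique id to its list of neighbors, init holds the initial potentials πi⁰, and sepsets maps each edge to its sepset; these names and data layouts are assumptions for the sketch, not notation from the slides.

def send_message(i, j, init, tree, sepsets, msgs):
    """Compute δ_{i->j}: multiply π_i^0 with all incoming messages except the
    one from j, then sum out every variable not in the sepset S_{i,j}."""
    psi = init[i]
    for k in tree[i]:
        if k != j:
            psi = psi * msgs[(k, i)]
    sep = sepsets[frozenset((i, j))]
    for var in [v for v in psi.scope if v not in sep]:
        psi = psi.marginalize(var)
    return psi

def upward_pass(root, init, tree, sepsets):
    """Send messages from the leaves toward `root`; return the root potential
    π_r (the unnormalized marginal over C_r) and all computed messages."""
    msgs = {}
    def collect(i, parent):
        for k in tree[i]:
            if k != parent:
                collect(k, i)
                msgs[(k, i)] = send_message(k, i, init, tree, sepsets, msgs)
    collect(root, None)
    beta = init[root]
    for k in tree[root]:
        beta = beta * msgs[(k, root)]
    return beta, msgs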

  15. Example • (figure: a clique tree over cliques C1,...,C6) • Root: C6 • Legal ordering I: 1, 2, 3, 4, 5, 6 • Legal ordering II: 2, 5, 1, 3, 4, 6 • Illegal ordering: 3, 4, 1, 2, 5, 6

  16. Clique Tree Inference Correctness • Theorem • Let Cr be the root clique in a clique tree • If πr is computed as above, then πr[Cr] = ∑U−Cr ∏φ∈F φ • Algorithm applies to Bayesian and Markov networks • For a Bayesian network G, if F consists of the CPDs reduced with some evidence e, then πr[Cr] = PG(Cr, e) • Probability obtained by normalizing the factor over Cr to sum to 1 • For a Markov network H, if F consists of the compatibility functions, then πr[Cr] is the unnormalized measure P̃H(Cr) • Probability obtained by normalizing the factor over Cr to sum to 1 • Partition function obtained by summing all entries of πr[Cr]

  17. Clique Tree Calibration • Assume we want to compute posterior distributions over n variables • With variable elimination, we perform n separate VE runs • With clique trees, we can do this much more efficiently • Idea 1: since the marginal over a variable can be computed at any clique that includes it once that clique is the root, perform k clique tree runs (k = # cliques) • Idea 2: we can do much better!

  18. Clique Tree Calibration • Observation: a message from Ci to Cj is unique • Consider two neighboring cliques Ci and Cj • If the root Cr is on Cj's side, Ci sends Cj a message • The message does not depend on the specific Cr (we only need Cr to be on Cj's side for Ci to send a message to Cj) • ⇒ The message from Ci to Cj will always be the same • ⇒ Each edge has two messages associated with it • One message for each direction of the edge • There are only 2(k−1) messages to compute • Can then readily compute the posterior over each variable

  19. Clique Tree Calibration • Compute the 2(k−1) messages by • Pick any node as the root • Upward pass: send messages toward the root • Terminate when the root has received all messages • Downward pass: send messages from the root back toward the leaves • Terminate when all leaves have received messages • (a code sketch follows below)
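A sketch of the two-pass schedule above, built on the hypothetical upward_pass and send_message helpers from the previous sketch; the returned beliefs are the calibrated potentials πi.

def calibrate(root, init, tree, sepsets):
    """Upward pass to `root`, then a downward pass back out, so both messages
    of every edge are available; finally combine them into clique beliefs."""
    _, msgs = upward_pass(root, init, tree, sepsets)
    def distribute(i, parent):
        for k in tree[i]:
            if k != parent:
                msgs[(i, k)] = send_message(i, k, init, tree, sepsets, msgs)
                distribute(k, i)
    distribute(root, None)
    beliefs = {}
    for i in tree:                       # π_i = π_i^0 * all incoming messages
        b = init[i]
        for k in tree[i]:
            b = b * msgs[(k, i)]
        beliefs[i] = b
    return beliefs, msgs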

  20. Clique Tree Calibration: Example • Root: C5 (first pass: messages sent toward the root) • (figure: the student network and clique tree with initial CPD assignments, as before)

  21. Clique Tree Calibration: Example • Root: C5 (second pass: messages sent from the root back toward the leaves) • (figure: the student network and clique tree with initial CPD assignments, as before)

  22. Clique Tree Calibration • Theorem: if πi is computed for each clique Ci as above, then πi[Ci] = ∑U−Ci ∏φ∈F φ • Important: avoid double-counting! • Each clique Ci computes the message to its neighbor Cj using its initial potential πi⁰ (times the messages from its other neighbors), not its updated potential πi, since πi already integrates the message from Cj, which would then be counted twice

  23. Clique Tree Calibration • The posterior of each variable X can be computed by eliminating the redundant variables from any clique that contains X • If X appears in multiple cliques, they must agree on its marginal • A clique tree with potentials πi[Ci] is said to be calibrated if for all neighboring cliques Ci and Cj: ∑Ci−Si,j πi[Ci] = ∑Cj−Si,j πj[Cj] • Key advantage: compute posteriors for all variables using only twice the computation of one upward pass
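A small check of the calibration condition above, under the same assumed data structures as the earlier sketches: every pair of neighboring cliques must agree on the marginal over their sepset.

def is_calibrated(beliefs, tree, sepsets, tol=1e-9):
    """True iff every neighboring pair of cliques has identical sepset marginals."""
    for i in tree:
        for j in tree[i]:
            sep = sepsets[frozenset((i, j))]
            mi, mj = beliefs[i], beliefs[j]
            for var in [v for v in mi.scope if v not in sep]:
                mi = mi.marginalize(var)
            for var in [v for v in mj.scope if v not in sep]:
                mj = mj.marginalize(var)
            for vals, v in mi.table.items():
                a = dict(zip(mi.scope, vals))
                if abs(v - mj.table[tuple(a[x] for x in mj.scope)]) > tol:
                    return False
    return True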

  24. Distribution of Calibrated Tree • For a calibrated tree, both cliques adjacent to a sepset give the same sepset marginal, e.g. μ1,2[B] = ∑A π1[A,B] = ∑C π2[B,C] • The joint distribution can thus be written as a product of clique potentials divided by sepset marginals, e.g. P(A,B,C) = P(A,B)·P(C|B) = π1[A,B]·π2[B,C] / μ1,2[B] • (figure: Bayesian network A → B → C and its clique tree {A,B} – {B,C} with sepset {B})

  25. Distribution of Calibrated Tree • The clique tree measure of a calibrated tree T is QT(U) = ∏i πi[Ci] / ∏(i–j)∈T μi,j[Si,j], where μi,j[Si,j] = ∑Ci−Si,j πi[Ci] is the sepset marginal

  26. Distribution of Calibrated Tree • Theorem: If T is a calibrated clique tree where for each Ci we have πi[Ci] = P(Ci), then P(U) = QT(U) • Proof: • Let Cr be some arbitrarily chosen root in the clique tree • Since the clique tree satisfies the running intersection property, each Ci is independent of the variables outside its subtree given its upstream sepset Si,pr(i) • Thus, P(U) = P(Cr) · ∏i≠r P(Ci | Si,pr(i)) • We can now write P(Ci | Si,pr(i)) = P(Ci) / P(Si,pr(i)) = πi[Ci] / μi,pr(i)[Si,pr(i)] • Since each clique and each sepset appears exactly once: P(U) = ∏i πi[Ci] / ∏(i–j)∈T μi,j[Si,j] = QT(U)

  27. Message Passing: Belief Propagation • Recall the clique tree calibration algorithm • Upon calibration the final potential at Ci is: πi[Ci] = πi⁰ · ∏k∈Ni δk→i • A message from Ci to Cj sums out the non-sepset variables from the product of the initial potential and all other incoming messages: δi→j = ∑Ci−Si,j πi⁰ · ∏k∈Ni−{j} δk→i • Can also be viewed as multiplying in all messages and dividing by the message from Cj to Ci: δi→j = (∑Ci−Si,j πi[Ci]) / δj→i

  28. Message Passing: Belief Propagation • Example: Bayesian network X1 → X2 → X3 → X4 and clique tree C1 = {X1,X2} – C2 = {X2,X3} – C3 = {X3,X4} • Root: C2 • C1 to C2 message: δ1→2(X2) = ∑X1 π1⁰[X1,X2] • C2 to C1 message: δ2→1(X2) = ∑X3 π2⁰[X2,X3] · δ3→2(X3) • Alternatively compute the calibrated potential π2[X2,X3] = π2⁰ · δ1→2 · δ3→2 • And then: δ2→1(X2) = ∑X3 π2[X2,X3] / δ1→2(X2) • ⇒ Thus, the two approaches are equivalent

  29. Message Passing: Belief Propagation • Based on the observation above, belief propagation uses a different message passing scheme • Each clique Ci maintains its fully updated belief πi • the product of its initial potential and the messages from its neighbors • Store at each sepset Si,j the previous message μi,j passed over it, regardless of the direction • When passing a message, divide by the previous μi,j • Claim: message passing is correct regardless of the clique that sent the last message • This is called belief update or belief propagation

  30. Message Passing: Belief Propagation • Initialize the clique tree • For each clique Ci set πi ← ∏φ:α(φ)=i φ • For each edge Ci–Cj set μi,j ← 1 • While uninformed cliques exist • Select an edge Ci–Cj • Send a message from Ci to Cj: • Marginalize the clique over the sepset: σi→j = ∑Ci−Si,j πi • Update the belief at Cj: πj ← πj · σi→j / μi,j • Update the sepset at Ci–Cj: μi,j ← σi→j • (a code sketch follows below)
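A sketch of one belief-update step as listed above, reusing the hypothetical Factor class from the earlier sketches; the elementwise divide helper and the all-ones initialization of the sepset messages are assumptions made for the sketch.

from itertools import product

def divide(num, den):
    """Elementwise division of factor num by den (den's scope ⊆ num's scope);
    0/0 is treated as 0, as usual in belief update."""
    table = {}
    for vals, v in num.table.items():
        a = dict(zip(num.scope, vals))
        d = den.table[tuple(a[x] for x in den.scope)]
        table[vals] = 0.0 if d == 0 else v / d
    return Factor(num.scope, num.card, table)

def init_sepset_messages(sepsets, cards):
    """Initial μ_{i,j} = 1: an all-ones factor over each sepset."""
    mus = {}
    for edge, sep in sepsets.items():
        scope = sorted(sep)
        mus[edge] = Factor(scope, {v: cards[v] for v in scope},
                           {vals: 1.0 for vals in
                            product(*(range(cards[v]) for v in scope))})
    return mus

def bu_message(i, j, beliefs, mus, sepsets):
    """One belief-update step: marginalize π_i over the sepset, divide by the
    stored message μ_{i,j}, absorb the result into π_j, and store the new message."""
    sep = sepsets[frozenset((i, j))]
    sigma = beliefs[i]
    for var in [v for v in sigma.scope if v not in sep]:
        sigma = sigma.marginalize(var)                 # σ_{i->j}
    beliefs[j] = beliefs[j] * divide(sigma, mus[frozenset((i, j))])
    mus[frozenset((i, j))] = sigma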

  31. Clique Tree Invariant • Belief propagation can be viewed as reparameterizing the joint distribution • Upon calibration we showed P̃(U) = ∏i πi[Ci] / ∏(i–j)∈T μi,j[Si,j] • Initially this invariant holds since πi = πi⁰, μi,j = 1, and ∏i πi⁰ = ∏φ φ = P̃(U) • At each update step the invariant is also maintained • A message only changes πj and μi,j, so most terms remain unchanged • We need to show that πj′ / μi,j′ = πj / μi,j • But this is exactly the message passing step: πj′ = πj · σi→j / μi,j and μi,j′ = σi→j • ⇒ Belief propagation reparameterizes P̃ at each step

  32. Answering Queries • Posterior distribution queries on a variable X • Sum out the irrelevant variables from any clique containing X • Posterior distribution queries on a family X, Pa(X) • Sum out the irrelevant variables from a clique containing X, Pa(X) (one exists by family preservation) • Introducing evidence Z=z • Compute the posterior of X when X appears in a clique with Z • Since the clique tree is calibrated, multiply the clique that contains X and Z with the indicator function I(Z=z) and sum out the irrelevant variables • Compute the posterior of X when X does not share a clique with Z • Introduce the indicator function I(Z=z) into some clique containing Z and propagate messages along the path to a clique containing X • Sum out the irrelevant variables from the clique containing X • (a code sketch follows below)
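A sketch of the evidence case described above, using the hypothetical Factor and calibrate helpers from the earlier sketches; for simplicity it multiplies the indicator into one clique containing Z and recalibrates the whole tree, rather than propagating only along the path to the query clique.

def indicator(var, value, card):
    """Factor over var that is 1 at the observed value and 0 elsewhere."""
    return Factor([var], {var: card},
                  {(v,): float(v == value) for v in range(card)})

def query_with_evidence(x, z, z_value, init, tree, sepsets, cards):
    """Return the normalized posterior P(x | z=z_value) as a table."""
    i = next(c for c in init if z in init[c].scope)    # a clique containing Z
    init = dict(init)
    init[i] = init[i] * indicator(z, z_value, cards[z])
    beliefs, _ = calibrate(i, init, tree, sepsets)     # re-run calibration
    j = next(c for c in beliefs if x in beliefs[c].scope)
    post = beliefs[j]
    for var in [v for v in post.scope if v != x]:
        post = post.marginalize(var)
    total = sum(post.table.values())
    return {vals: v / total for vals, v in post.table.items()}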

  33. Constructing Clique Trees • Using variable elimination • Create a cluster Ci for each factor used during a VE run • Create an edge between Ci and Cj when a factor generated by Ci is used directly by Cj (or vice versa) • ⇒ We showed that this cluster graph is a tree satisfying the running intersection property, and thus it is a legal clique tree

  34. Constructing Clique Trees • Goal: construct a tree that is family preserving and obeys the running intersection property • Triangulate the graph to construct a chordal graph H • NP-hard to find a triangulation where the largest clique in the resulting chordal graph has minimum size • Find the cliques in H and make each a node in the graph • Finding maximal cliques is NP-hard in general • Can start with families and grow greedily • Construct a tree over the clique nodes • Use a maximum spanning tree on an undirected graph whose nodes are the maximal cliques and whose edge weights are |Ci ∩ Cj| • Can show that the resulting tree obeys the running intersection property • (a code sketch follows below)
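A sketch of the construction recipe above, under simplifying assumptions: the input is the already-moralized undirected graph given as adjacency sets, triangulation uses a greedy min-fill elimination (a heuristic, since optimal triangulation is NP-hard), the elimination cliques stand in for the maximal cliques, and the tree is a maximum spanning tree on sepset sizes built with a plain Kruskal pass.

def build_clique_tree(adj):
    """adj: dict mapping each variable to the set of its neighbors."""
    adj = {v: set(ns) for v, ns in adj.items()}
    cliques, remaining = [], set(adj)
    while remaining:
        # min-fill heuristic: eliminate the node that adds the fewest fill edges
        def fill(v):
            nb = adj[v] & remaining
            return sum(1 for a in nb for b in nb if a < b and b not in adj[a])
        v = min(remaining, key=fill)
        nb = adj[v] & remaining
        for a in nb:                       # add the fill edges (triangulation)
            adj[a] |= nb - {a}
        cliques.append(frozenset(nb | {v}))
        remaining.remove(v)
    # keep only the maximal elimination cliques
    cliques = [c for c in set(cliques) if not any(c < d for d in cliques)]
    # maximum spanning tree on |Ci ∩ Cj| via Kruskal with a tiny union-find
    parent = {c: c for c in cliques}
    def find(c):
        while parent[c] != c:
            c = parent[c]
        return c
    edges = sorted(((len(a & b), a, b) for a in cliques for b in cliques if a != b),
                   key=lambda e: e[0], reverse=True)
    tree = []
    for w, a, b in edges:
        if w > 0 and find(a) != find(b):
            parent[find(a)] = find(b)
            tree.append((a, b))
    return cliques, tree

# e.g. the 4-cycle A-B-C-D needs one fill edge; two triangle cliques result
print(build_clique_tree({"A": {"B", "D"}, "B": {"A", "C"},
                         "C": {"B", "D"}, "D": {"A", "C"}}))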

  35. Example • (figures: the student network over C, I, D, G, S, L, J, H; its moralized graph; one possible triangulation; and the cluster graph over the maximal cliques {C,D}, {G,I,D}, {G,S,I}, {G,S,L}, {L,S,J}, {G,H}, with edge weights |Ci ∩ Cj| of 1 and 2 on the edges shown)
