
Statistical Methods in AI/ML


Presentation Transcript


  1. Statistical Methods in AI/ML: Bucket elimination. Vibhav Gogate

  2. Bucket Elimination: Initialization [Figure: buckets laid out along the elimination order A, E, D, F, B, C, with the pairwise functions (A,B), (A,C), (C,E), (E,F), (B,D), (C,D), (D,F) dropped into them.] • You put each function in exactly one bucket • How? • Along the order, find the first bucket such that one of the variables in the function's scope is the bucket variable • For this example: (A,B) and (A,C) go to bucket A, (C,E) and (E,F) to bucket E, and (B,D), (C,D), (D,F) to bucket D
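A minimal sketch of the initialization rule in Python (the scope-as-frozenset representation and the name initialize_buckets are illustrative assumptions, not the lecture's code):

```python
def initialize_buckets(order, scopes):
    """Place each function (represented here only by its scope) in the bucket
    of the first variable along the order that appears in that scope."""
    buckets = {var: [] for var in order}
    for scope in scopes:
        first = next(var for var in order if var in scope)
        buckets[first].append(scope)
    return buckets

order = ["A", "E", "D", "F", "B", "C"]
scopes = [frozenset(s) for s in ["AB", "AC", "CE", "EF", "BD", "CD", "DF"]]
print(initialize_buckets(order, scopes))
# bucket A gets (A,B) and (A,C); bucket E gets (C,E) and (E,F);
# bucket D gets (B,D), (C,D), (D,F); buckets F, B, C start empty
```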

  3. Bucket elimination: Processing Buckets • Process the buckets in order • Multiply all the functions in the bucket • Sum out the bucket variable • Put the new function in one of the later buckets, obeying the initialization constraint [Figure: for the order A, E, D, F, B, C, bucket A produces ψ(B,C) (placed in bucket B), bucket E produces ψ(C,F) (bucket F), bucket D produces ψ(B,C,F) (bucket F), bucket F produces ψ2(B,C) (bucket B), bucket B produces ψ(C) (bucket C), and processing bucket C yields the constant Z.]
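A small end-to-end sketch of the processing step, assuming binary variables and a (variables, table) factor representation of my own choosing; with all factor entries set to 1.0 it should return 2^6 = 64, the number of joint assignments:

```python
from itertools import product

def multiply_all(factors):
    """Multiply a list of factors into one factor over the union of scopes."""
    vars_ = tuple(dict.fromkeys(v for f in factors for v in f[0]))
    table = {}
    for asg in product([0, 1], repeat=len(vars_)):
        a = dict(zip(vars_, asg))
        val = 1.0
        for fv, ft in factors:
            val *= ft[tuple(a[v] for v in fv)]
        table[asg] = val
    return vars_, table

def sum_out(factor, var):
    """Sum the bucket variable out of a factor."""
    fv, ft = factor
    keep = tuple(v for v in fv if v != var)
    out = {}
    for asg, val in ft.items():
        key = tuple(x for v, x in zip(fv, asg) if v != var)
        out[key] = out.get(key, 0.0) + val
    return keep, out

def bucket_elimination(order, factors):
    """Initialize buckets, then process them along the order; returns Z."""
    buckets = {v: [] for v in order}
    for f in factors:
        first = next(v for v in order if v in f[0])   # initialization rule
        buckets[first].append(f)
    z = 1.0
    for i, var in enumerate(order):
        if not buckets[var]:
            continue
        new = sum_out(multiply_all(buckets[var]), var)
        # place the new function in the first later bucket sharing a variable
        dest = next((w for w in order[i + 1:] if w in new[0]), None)
        if dest is not None:
            buckets[dest].append(new)
        else:
            z *= new[1][()]                           # empty scope: a constant
    return z

# Example: the seven pairwise functions from the slides, all set to 1.0.
edges = ["AB", "AC", "CE", "EF", "BD", "CD", "DF"]
factors = [(tuple(e), {a: 1.0 for a in product([0, 1], repeat=2)}) for e in edges]
print(bucket_elimination(["A", "E", "D", "F", "B", "C"], factors))   # 64.0
```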

  4. Bucket elimination: Why it works? [Figure: the buckets for the order A, E, D, F, B, C, showing the original functions together with the generated functions ψ(B,C), ψ(C,F), ψ(B,C,F), ψ2(B,C), ψ(C), and the constant Z.]

  5. Bucket elimination: Why it works? [Figure: the same bucket diagram, advanced by one elimination step.]

  6. Bucket elimination: Why it works? [Figure: the same bucket diagram, advanced by one more step.]

  7. Bucket elimination: Why it works? [Figure: the same bucket diagram, advanced by one more step.]

  8. Bucket elimination: Why it works? [Figure: the same bucket diagram.] and so on.
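Written out algebraically for this example (φ denotes the original pairwise functions; this is the standard sum-pushing argument the animation is stepping through):

\begin{align*}
Z &= \sum_{C}\sum_{B}\sum_{F}\sum_{D}\sum_{E}\sum_{A}
     \varphi(A,B)\,\varphi(A,C)\,\varphi(C,E)\,\varphi(E,F)\,\varphi(B,D)\,\varphi(C,D)\,\varphi(D,F) \\
  &= \sum_{C}\sum_{B}\underbrace{\sum_{A}\varphi(A,B)\,\varphi(A,C)}_{\psi(B,C)}
     \;\sum_{F}\underbrace{\sum_{E}\varphi(C,E)\,\varphi(E,F)}_{\psi(C,F)}
     \;\underbrace{\sum_{D}\varphi(B,D)\,\varphi(C,D)\,\varphi(D,F)}_{\psi(B,C,F)} \\
  &= \sum_{C}\underbrace{\sum_{B}\psi(B,C)\underbrace{\sum_{F}\psi(C,F)\,\psi(B,C,F)}_{\psi_2(B,C)}}_{\psi(C)}
   \;=\; \sum_{C}\psi(C).
\end{align*}

Each ψ is exactly the function generated when the corresponding bucket is processed, which is why the bucket-by-bucket computation returns the correct Z.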

  9. Bucket elimination: Complexity [Figure: the buckets for the order A, E, D, F, B, C with per-bucket costs exp(3), exp(3), exp(4), exp(3), exp(2), exp(1), for a total of ≈ 6·exp(3).] • Complexity: O(n·exp(w)) • w: size of the scope of the largest function generated • n: number of variables

  10. Bucket elimination: Determining complexity graphically [Figure: the graph over A, B, C, D, E, F processed along the order A, E, D, F, B, C.] • Schematic operation on a graph • Process the nodes in order • Connect all children of a node (its neighbors that come later in the order) to each other
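A sketch of this schematic operation, assuming an adjacency-set graph representation; it connects each processed node's later neighbors and records the largest set of children seen along the way:

```python
def induced_width(order, edges):
    """Run the schematic (graph-only) version of bucket elimination and
    return the size of the largest set of children encountered."""
    pos = {v: i for i, v in enumerate(order)}
    adj = {v: set() for v in order}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    width = 0
    for v in order:
        children = {u for u in adj[v] if pos[u] > pos[v]}   # later neighbors
        width = max(width, len(children))
        for a in children:                                  # connect children
            for b in children:
                if a != b:
                    adj[a].add(b)
    return width

edges = [("A", "B"), ("A", "C"), ("C", "E"), ("E", "F"),
         ("B", "D"), ("C", "D"), ("D", "F")]
print(induced_width(["A", "E", "D", "F", "B", "C"], edges))   # 3 for this order
```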

  11. Bucket elimination: Complexity [Figure: the ordered graph over A, E, D, F, B, C.] • Complexity of processing a bucket "i": exp(|children_i|) • Complexity of bucket elimination: n·exp(max_i |children_i|)

  12. Treewidth and Tree Decompositions • Running schematic bucket elimination yields a chordal graph • Each cycle of length > 3 has a chord (an edge connecting two nodes that are not adjacent in the cycle) • Every chordal graph has a tree decomposition whose clusters are its maximal cliques

  13. Tree Decomposition of Chordal graphs [Figure: a tree decomposition of the induced chordal graph, with one cluster per bucket: ABC, EFC, DBCF, FBC, BC, and C; edges ABC–BC (separator BC), EFC–FBC (separator FC), DBCF–FBC (separator FBC), FBC–BC (separator BC), and BC–C (separator C).]

  14. Tree Decomposition and Treewidth: Definition • Given a network and its interaction graph • A tree decomposition is a set of subsets of variables connected by a tree such that: • Each variable is present in at least one subset • For each edge (function), some subset contains all of its variables • The set of subsets containing a variable "X" forms a connected sub-tree • Running intersection property • Width of a tree decomposition: cardinality of the largest subset minus 1 • Treewidth: minimum width over all possible tree decompositions
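As an illustration, a small check of the three conditions and the width on the running example; the dict/list encodings and the function name are my assumptions:

```python
def is_tree_decomposition(clusters, tree_edges, graph_edges):
    """clusters: dict name -> set of variables; tree_edges: edges of the tree."""
    variables = {v for c in clusters.values() for v in c}
    # 1. every variable appears in some cluster
    cond1 = all(any(v in c for c in clusters.values()) for v in variables)
    # 2. every graph edge has both endpoints together in some cluster
    cond2 = all(any({u, v} <= c for c in clusters.values()) for u, v in graph_edges)
    # 3. running intersection: the clusters containing a variable form a
    #    connected subtree (checked with a simple flood fill per variable)
    def connected(names):
        names = set(names)
        seen, stack = set(), [next(iter(names))]
        while stack:
            x = stack.pop()
            seen.add(x)
            stack += [b for a, b in tree_edges if a == x and b in names - seen]
            stack += [a for a, b in tree_edges if b == x and a in names - seen]
        return seen == names
    cond3 = all(connected([n for n, c in clusters.items() if v in c]) for v in variables)
    width = max(len(c) for c in clusters.values()) - 1
    return cond1 and cond2 and cond3, width

clusters = {"ABC": {"A", "B", "C"}, "EFC": {"E", "F", "C"},
            "DBCF": {"D", "B", "C", "F"}, "FBC": {"F", "B", "C"},
            "BC": {"B", "C"}, "C": {"C"}}
tree_edges = [("ABC", "BC"), ("EFC", "FBC"), ("DBCF", "FBC"), ("FBC", "BC"), ("BC", "C")]
graph_edges = [("A", "B"), ("A", "C"), ("C", "E"), ("E", "F"),
               ("B", "D"), ("C", "D"), ("D", "F")]
print(is_tree_decomposition(clusters, tree_edges, graph_edges))   # (True, 3)
```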

  15. Bucket elimination: Complexity • Best possible complexity: O(n·exp(w+1)), where w is the treewidth of the graph • Thus, we have a graph-based algorithm for determining the complexity of bucket elimination • If w is small, we can solve the problem efficiently!

  16. Generating Tree Decompositions • Computing treewidth is NP-hard • Branch-and-bound algorithm (Gogate & Dechter, 2004) • Best-first search algorithm (Dow and Korf, 2009) • Heuristics used in practice: • min-fill heuristic • min-degree heuristic

  17. Min-degree and min-fill • min-degree • At each point, select a variable with minimum degree (ties broken arbitrarily) • Connect the children of the variable to each other • min-fill • At each point, select the variable whose elimination adds the minimum number of fill edges to the current graph • Connect the children of the selected variable to each other
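A sketch of min-fill (min-degree is the same loop with the score replaced by the variable's current degree); the graph encoding and function name are assumptions:

```python
def min_fill_order(variables, edges):
    """Return an elimination order chosen greedily by the min-fill score."""
    adj = {v: set() for v in variables}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    order, remaining = [], set(variables)
    while remaining:
        def fill(v):   # edges that eliminating v would add among its neighbors
            nbrs = adj[v] & remaining
            return sum(1 for a in nbrs for b in nbrs if a < b and b not in adj[a])
        v = min(remaining, key=fill)        # ties broken arbitrarily
        nbrs = adj[v] & remaining
        for a in nbrs:                      # connect the children of v
            adj[a] |= nbrs - {a}
        remaining.remove(v)
        order.append(v)
    return order

edges = [("A", "B"), ("A", "C"), ("C", "E"), ("E", "F"),
         ("B", "D"), ("C", "D"), ("D", "F")]
print(min_fill_order(list("ABCDEF"), edges))   # one min-fill order for the example
```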

  18. Bucket Elimination: Implementation • Two basic operations: Sum-out and Product • Naïve implementation of these two operations will make your algorithm very slow • Factors: Use Arrays instead of Hashmaps! • Fast member functions for the following • Variable Assignment to Entry in the array • Entry in the array to Assignment
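A sketch of the flat-array factor layout with the two conversion routines the slide calls for; the row-major stride convention and the class/method names are my own choices:

```python
class Factor:
    """A factor stored as one flat array, with stride-based index arithmetic."""
    def __init__(self, variables, domain_sizes):
        self.variables = list(variables)
        self.sizes = list(domain_sizes)
        self.strides, s = [], 1
        for size in reversed(self.sizes):     # row-major: last variable fastest
            self.strides.insert(0, s)
            s *= size
        self.table = [0.0] * s                # one array, no hash map

    def index_of(self, assignment):
        """Assignment (one value per variable) -> entry in the array."""
        return sum(v * st for v, st in zip(assignment, self.strides))

    def assignment_of(self, index):
        """Entry in the array -> assignment."""
        return tuple((index // st) % size
                     for st, size in zip(self.strides, self.sizes))

f = Factor(["B", "C", "F"], [2, 2, 2])
print(f.index_of((1, 0, 1)))     # 5
print(f.assignment_of(5))        # (1, 0, 1)
```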

  19. Computing all Marginals • Bucket elimination computes • P(e) or Z • P(Xi|e) where “Xi” is the last variable eliminated • To compute all marginals P(Xi|e) for all variables Xi • Run bucket elimination “n” times • Efficient algorithm • Junction tree algorithm or bucket tree propagation • Requires only two passes to compute all marginals

  20. Junction tree algorithm: An exact message passing algorithm • Construct a tree decomposition T • Initialize the tree decomposition as in bucket elimination • Select an arbitrary node of T as root • Pass messages from the leaves to the root (upward pass) • Pass messages from the root to the leaves (downward pass)
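Just the scheduling part of the two passes, as a sketch on the running example's tree decomposition (the edge list and the helper name message_schedule are assumptions):

```python
def message_schedule(tree_edges, root):
    """List the (sender, receiver) messages of the upward and downward passes."""
    nbrs = {}
    for a, b in tree_edges:
        nbrs.setdefault(a, []).append(b)
        nbrs.setdefault(b, []).append(a)
    parent, order = {root: None}, []
    stack, seen = [root], {root}
    while stack:                          # DFS from the root orients the tree
        x = stack.pop()
        order.append(x)
        for y in nbrs[x]:
            if y not in seen:
                seen.add(y)
                parent[y] = x
                stack.append(y)
    upward = [(x, parent[x]) for x in reversed(order) if parent[x] is not None]
    downward = [(p, c) for c, p in reversed(upward)]
    return upward, downward

tree_edges = [("ABC", "BC"), ("EFC", "FBC"), ("DBCF", "FBC"), ("FBC", "BC"), ("BC", "C")]
up, down = message_schedule(tree_edges, root="EFC")
print(up)     # children send toward the root EFC first
print(down)   # then messages flow back out toward the leaves
```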

  21. Message passing Equations [Figure: a cluster S sending a message to its neighbor R.] • To compute the message from S to R: • Multiply all the functions in S • Multiply all messages S has received, except the one from R • Sum out all variables that are not in the separator between S and R
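In symbols, a standard form of this update (the notation vars(S), sep(S,R), nb(S) is mine, not the slide's):

\[
m_{S \to R} \;=\; \sum_{\mathrm{vars}(S)\,\setminus\,\mathrm{sep}(S,R)}
\Big(\prod_{f \in S} f\Big)\prod_{T \in \mathrm{nb}(S)\setminus\{R\}} m_{T \to S}.
\]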

  22. Computing all marginals [Figure: a cluster S and the marginal P(S) it computes.]
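After both passes, each cluster's marginal is available locally (again in my notation, following the standard junction tree result):

\[
P(S \mid e) \;\propto\; \Big(\prod_{f \in S} f\Big) \prod_{T \in \mathrm{nb}(S)} m_{T \to S},
\]

and any single-variable marginal P(Xi | e) follows by summing out the remaining variables of a cluster containing Xi.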

  23. Message passing Equations [Figure: the tree decomposition of the running example, with (A,B) and (A,C) assigned to cluster ABC, (C,E) and (E,F) to EFC, and (B,D), (C,D), (D,F) to DBCF; separators FC, FBC, BC, BC, C as before.] • Select "EFC" as root • Pass messages from the leaves to the root • Pass messages from the root to the leaves

  24. Architectures • Shenoy-Shafer architecture • Hugin architecture • Associates one function with each cluster (and each separator) • Requires division • Smaller time complexity, but higher space complexity
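For reference, a standard statement of the Hugin-style update that explains the division (this formulation is mine, not quoted from the slides): when cluster S sends to neighbor R over separator sep,

\[
\varphi_{\mathrm{sep}}^{\mathrm{new}} \;=\; \sum_{\mathrm{vars}(S)\setminus\mathrm{sep}} \varphi_S,
\qquad
\varphi_{R}^{\mathrm{new}} \;=\; \varphi_{R}\cdot\frac{\varphi_{\mathrm{sep}}^{\mathrm{new}}}{\varphi_{\mathrm{sep}}},
\]

so each cluster and separator keeps a single potential that is updated in place, trading extra storage for less recomputation.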
