Understanding Bucket Elimination in Statistical Methods for AI/ML

Statistical Methods in AI/ML: Bucket elimination, Vibhav Gogate

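Bucket Elimination: Initialization
• Put each function in exactly one bucket
• How? Along the order, find the first bucket whose variable is in the function's scope
[Figure: interaction graph over A, B, C, D, E, F with functions (A,B), (A,C), (C,E), (E,F), (D,F), (C,D), (B,D); buckets along the order A, E, D, F, B, C. Bucket A: (A,B), (A,C). Bucket E: (E,F), (C,E). Bucket D: (D,F), (C,D), (B,D). Buckets F, B, C: initially empty.]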
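The initialization rule can be sketched in a few lines of Python. This is a minimal illustration, not the lecturer's code: the function name `init_buckets` and the representation of a function by its scope tuple are assumptions made for the example.

```python
def init_buckets(order, scopes):
    """Place each function (given by its scope) in the bucket of the
    variable of its scope that comes first along `order`."""
    position = {var: i for i, var in enumerate(order)}
    buckets = {var: [] for var in order}
    for scope in scopes:
        first = min(scope, key=position.__getitem__)
        buckets[first].append(scope)
    return buckets

# Order and functions of the running example on the slide.
order = ["A", "E", "D", "F", "B", "C"]
scopes = [("A", "C"), ("C", "E"), ("C", "D"), ("E", "F"),
          ("A", "B"), ("B", "D"), ("D", "F")]
buckets = init_buckets(order, scopes)
for var in order:
    print(var, buckets[var])
```

Buckets F, B and C start empty here; they only receive functions once earlier buckets are processed.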

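Bucket elimination: Processing Buckets
• Process the buckets in order
• Multiply all the functions in the bucket
• Sum out the bucket variable
• Put the new function in one of the buckets, obeying the initialization constraint
[Figure: buckets along the order A, E, D, F, B, C. Bucket A: (A,B), (A,C) yields ψ(B,C). Bucket E: (E,F), (C,E) yields ψ(C,F). Bucket D: (D,F), (C,D), (B,D) yields ψ(B,C,F). Bucket F: ψ(C,F), ψ(B,C,F) yields ψ2(B,C). Bucket B: ψ(B,C), ψ2(B,C) yields ψ(C). Bucket C: ψ(C) yields Z.]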
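The processing loop can be sketched as follows. Everything here is an assumption made for illustration: variables are binary, the potentials are made-up numbers, and factors are stored as plain dictionaries for readability rather than speed. The result is checked against a brute-force sum over all assignments.

```python
from itertools import product

DOMAIN = (0, 1)  # assumption: every variable is binary

def make_factor(scope, fn):
    """A factor is a pair (scope, table); the table maps assignments to values."""
    return scope, {vals: fn(*vals) for vals in product(DOMAIN, repeat=len(scope))}

def multiply_and_sum_out(factors, var):
    """Multiply all factors of a bucket, then sum out the bucket variable."""
    union = sorted({v for scope, _ in factors for v in scope})
    out_scope = tuple(v for v in union if v != var)
    table = {}
    for vals in product(DOMAIN, repeat=len(union)):
        assignment = dict(zip(union, vals))
        weight = 1.0
        for scope, tab in factors:
            weight *= tab[tuple(assignment[v] for v in scope)]
        key = tuple(assignment[v] for v in out_scope)
        table[key] = table.get(key, 0.0) + weight
    return out_scope, table

def bucket_elimination(order, factors):
    position = {v: i for i, v in enumerate(order)}
    buckets = {v: [] for v in order}

    def place(factor):
        # Initialization constraint: first bucket variable found in the scope.
        buckets[min(factor[0], key=position.__getitem__)].append(factor)

    for factor in factors:
        place(factor)
    z = 1.0
    for var in order:
        if not buckets[var]:
            z *= len(DOMAIN)  # a variable in no function just contributes its domain size
            continue
        scope, table = multiply_and_sum_out(buckets[var], var)
        if scope:
            place((scope, table))
        else:
            z *= table[()]  # empty scope: a constant
    return z

def brute_force(variables, factors):
    total = 0.0
    for vals in product(DOMAIN, repeat=len(variables)):
        assignment = dict(zip(variables, vals))
        weight = 1.0
        for scope, tab in factors:
            weight *= tab[tuple(assignment[v] for v in scope)]
        total += weight
    return total

order = ["A", "E", "D", "F", "B", "C"]
scopes = [("A", "B"), ("A", "C"), ("C", "E"), ("E", "F"),
          ("D", "F"), ("C", "D"), ("B", "D")]
# Hypothetical potentials, chosen only so that the factors differ.
factors = [make_factor(s, lambda x, y, k=k: 1.0 + k * x + y)
           for k, s in enumerate(scopes)]
z = bucket_elimination(order, factors)
print(z, brute_force(order, factors))
```

The two printed numbers should agree; the brute-force sum touches all 2^6 assignments, while the bucket version never builds a table over more than four variables for this order.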

Bucket elimination: Why does it work?
• By the distributive law, a sum over a variable can be pushed inside the product, past every function that does not mention that variable
• The functions left under the sum are exactly the ones in that variable's bucket
• Apply this to A, then E, then D, and so on, until only Z remains
[Figure: three animation steps over the same bucket diagram, showing the functions (A,B), (A,C), (C,E), (E,F), (D,F), (C,D), (B,D) and the generated functions ψ(B,C), ψ(C,F), ψ(B,C,F), ψ2(B,C), ψ(C), Z.]
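For the running example, the argument can be written out as a short derivation. The symbol φ for the input functions is an assumption (the slides only show their scopes); the ψ names follow the figure.

```latex
\begin{align*}
Z &= \sum_{A,B,C,D,E,F} \phi(A,B)\,\phi(A,C)\,\phi(C,E)\,\phi(E,F)\,\phi(D,F)\,\phi(C,D)\,\phi(B,D) \\
  &= \sum_{C}\sum_{B}\sum_{F}\sum_{D} \phi(D,F)\,\phi(C,D)\,\phi(B,D)
     \sum_{E} \phi(E,F)\,\phi(C,E)
     \sum_{A} \phi(A,B)\,\phi(A,C) \\
\psi(B,C)   &= \sum_{A} \phi(A,B)\,\phi(A,C) \\
\psi(C,F)   &= \sum_{E} \phi(E,F)\,\phi(C,E) \\
\psi(B,C,F) &= \sum_{D} \phi(D,F)\,\phi(C,D)\,\phi(B,D) \\
\psi_2(B,C) &= \sum_{F} \psi(C,F)\,\psi(B,C,F) \\
\psi(C)     &= \sum_{B} \psi(B,C)\,\psi_2(B,C) \\
Z           &= \sum_{C} \psi(C)
\end{align*}
```

Each line after the second corresponds to processing one bucket along the order A, E, D, F, B, C.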

Bucket elimination: Complexity
• Complexity: O(n exp(w))
• w: scope size of the largest function generated
• n: number of variables
[Figure: per-bucket cost along the order A, E, D, F, B, C: exp(3), exp(3), exp(4), exp(3), exp(2), exp(1), so the total is ≈ 6 exp(3). Functions shown: (E,F), (A,B), (A,C), (C,E), (D,F), (C,D), (B,D), ψ(B,C,F), ψ(C,F), ψ2(B,C), ψ(B,C), ψ(C), Z.]

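Bucket elimination: Determining complexity graphically
• Schematic operation on a graph
• Process nodes in order
• Connect all children of a node (its neighbors that come later in the order) to each other
[Figure: interaction graph over A, B, C, D, E, F, processed along the order A, E, D, F, B, C.]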

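Bucket elimination: Complexity
• Complexity of processing bucket “i”: exp(children_i), where children_i is the number of children of node i in the schematic graph
• Complexity of bucket elimination: n exp(max_i children_i)
[Figure: interaction graph over A, B, C, D, E, F.]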
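The schematic operation can be sketched without touching any numbers. This sketch assumes that the "children" of a node are its neighbors that come later in the order; the function name `schematic_elimination` is hypothetical.

```python
def schematic_elimination(order, edges):
    """Process nodes in order, connecting the children of each node.
    Returns the largest number of children seen and the edges added."""
    position = {v: i for i, v in enumerate(order)}
    adj = {v: set() for v in order}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    max_children = 0
    fill_edges = []
    for node in order:
        # Assumption: children are the neighbors processed after `node`.
        children = sorted(n for n in adj[node] if position[n] > position[node])
        max_children = max(max_children, len(children))
        for i, u in enumerate(children):
            for v in children[i + 1:]:
                if v not in adj[u]:
                    adj[u].add(v)
                    adj[v].add(u)
                    fill_edges.append((u, v))
    return max_children, fill_edges

order = ["A", "E", "D", "F", "B", "C"]
edges = [("A", "B"), ("A", "C"), ("C", "E"), ("E", "F"),
         ("D", "F"), ("C", "D"), ("B", "D")]
max_children, fill_edges = schematic_elimination(order, edges)
print(max_children, fill_edges)
```

For this order the maximum is reached at D, which has the three children B, C and F, and three edges are added along the way: B-C, C-F and B-F.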

Treewidth and Tree Decompositions
• Running schematic bucket elimination yields a chordal graph
  • Each cycle of length > 3 has a chord (an edge connecting two nodes that are not adjacent in the cycle)
• Every chordal graph can be represented using a tree decomposition

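Tree Decomposition of Chordal graphs
[Figure: tree decomposition read off the buckets, one cluster per bucket together with the separator to the next cluster. A: ABC (separator BC). E: EFC (separator FC). D: DBCF (separator FBC). F: FBC (separator BC). B: BC (separator C). C: C.]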

Tree Decomposition and Treewidth: Definition
• Given a network and its interaction graph
• A tree decomposition is a set of subsets of variables connected by a tree such that:
  • Each variable is present in at least one subset
  • Each edge is present in at least one subset
  • The set of subsets containing a variable “X” forms a connected sub-tree (running intersection property)
• Width of a tree decomposition: cardinality of the largest subset minus 1
• Treewidth: minimum width over all possible tree decompositions
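The three conditions of the definition can be checked mechanically. The sketch below is illustrative only: the function name and the cluster labels are assumptions, the clusters and tree edges are the ones read off the bucket example, and the code takes for granted that `tree_edges` really forms a tree.

```python
def is_tree_decomposition(clusters, tree_edges, graph_edges):
    """Check the three conditions; `tree_edges` is assumed to be a tree."""
    variables = {v for edge in graph_edges for v in edge}
    # 1. Each variable is present in at least one subset.
    if not all(any(v in c for c in clusters.values()) for v in variables):
        return False
    # 2. Each edge is present in at least one subset.
    if not all(any(set(e) <= c for c in clusters.values()) for e in graph_edges):
        return False
    # 3. Running intersection: clusters containing v form a connected sub-tree.
    for v in variables:
        holders = {name for name, c in clusters.items() if v in c}
        start = next(iter(holders))
        seen, stack = {start}, [start]
        while stack:
            cur = stack.pop()
            for a, b in tree_edges:
                nxt = b if a == cur else a if b == cur else None
                if nxt in holders and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if seen != holders:
            return False
    return True

def width(clusters):
    return max(len(c) for c in clusters.values()) - 1

graph_edges = [("A", "B"), ("A", "C"), ("C", "E"), ("E", "F"),
               ("D", "F"), ("C", "D"), ("B", "D")]
clusters = {name: set(name) for name in ["ABC", "EFC", "DBCF", "FBC", "BC", "C"]}
tree_edges = [("ABC", "BC"), ("EFC", "FBC"), ("DBCF", "FBC"),
              ("FBC", "BC"), ("BC", "C")]
ok = is_tree_decomposition(clusters, tree_edges, graph_edges)
w = width(clusters)

# Dropping F from the middle cluster breaks the running intersection property.
broken = dict(clusters, FBC=set("BC"))
broken_ok = is_tree_decomposition(broken, tree_edges, graph_edges)
print(ok, w, broken_ok)
```

The width computed here is the width of this particular decomposition; the treewidth is the minimum over all decompositions and may be smaller.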

Bucket elimination: Complexity
• Best possible complexity: O(n exp(w+1)), where w is the treewidth of the graph
• Thus, we have a graph-based algorithm for determining the complexity of bucket elimination
• If w is small, we can solve the problem efficiently!

Generating Tree Decompositions
• Computing treewidth is NP-hard
• Branch and bound algorithm (Gogate & Dechter, 2004)
• Best-first search algorithm (Dow and Korf, 2009)
• Heuristics in practice
  • min-fill heuristic
  • min-degree heuristic

Min-degree and min-fill
• min-degree
  • At each point, select a variable with minimum degree (ties broken arbitrarily)
  • Connect the children of the variable to each other
• min-fill
  • At each point, select a variable whose elimination adds the minimum number of edges to the current graph
  • Connect the children of the selected variable to each other
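Both heuristics fit the same greedy loop and differ only in the score. In this sketch the function name `greedy_order` is hypothetical, ties are broken alphabetically (one arbitrary choice among many), and the neighbors still present in the current graph play the role of the children.

```python
def greedy_order(edges, heuristic="min-fill"):
    """Return an elimination order and the largest neighbor set met on the way."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    def fill_count(node):
        # Number of edges that eliminating `node` would add.
        nbrs = sorted(adj[node])
        return sum(1 for i, u in enumerate(nbrs)
                   for v in nbrs[i + 1:] if v not in adj[u])

    def degree(node):
        return len(adj[node])

    score = fill_count if heuristic == "min-fill" else degree
    order, width = [], 0
    while adj:
        node = min(sorted(adj), key=score)  # alphabetical tie-breaking
        nbrs = adj.pop(node)
        width = max(width, len(nbrs))
        for u in nbrs:
            adj[u].discard(node)
            adj[u] |= nbrs - {u}  # connect the remaining neighbors to each other
        order.append(node)
    return order, width

edges = [("A", "B"), ("A", "C"), ("C", "E"), ("E", "F"),
         ("D", "F"), ("C", "D"), ("B", "D")]
fill_order, fill_width = greedy_order(edges, "min-fill")
deg_order, deg_width = greedy_order(edges, "min-degree")
print(fill_order, fill_width)
print(deg_order, deg_width)
```

On the running example both heuristics, with this tie-breaking, find an order of width 2, which is better than the width 3 of the order A, E, D, F, B, C used on the earlier slides.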

Bucket Elimination: Implementation
• Two basic operations: Sum-out and Product
• A naïve implementation of these two operations will make your algorithm very slow
• Factors: use arrays instead of hashmaps!
• Fast member functions for the following
  • Variable assignment to entry in the array
  • Entry in the array to assignment
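One common way to get the two member functions is mixed-radix indexing into a flat array. The sketch below assumes a row-major layout (the last variable of the scope changes fastest); the function names are made up for the example and the slides do not prescribe a particular layout.

```python
def assignment_to_index(assignment, domain_sizes):
    """Map a tuple of values (one per scope variable) to a flat array index."""
    index = 0
    for value, size in zip(assignment, domain_sizes):
        index = index * size + value
    return index

def index_to_assignment(index, domain_sizes):
    """Inverse mapping: recover the tuple of values from a flat array index."""
    values = []
    for size in reversed(domain_sizes):
        index, value = divmod(index, size)
        values.append(value)
    return tuple(reversed(values))

# A factor over three variables with domain sizes 2, 3 and 2 has 12 entries.
sizes = (2, 3, 2)
table = [0.0] * (2 * 3 * 2)
idx = assignment_to_index((1, 2, 0), sizes)
table[idx] = 0.5
print(idx, index_to_assignment(idx, sizes))
```

With this layout a factor is a single contiguous list (or array), and Product and Sum-out reduce to index arithmetic instead of hash lookups.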

Computing all Marginals
• Bucket elimination computes
  • P(e) or Z
  • P(Xi|e), where “Xi” is the last variable eliminated
• To compute all marginals P(Xi|e) for all variables Xi
  • Run bucket elimination “n” times
• Efficient algorithm
  • Junction tree algorithm or bucket tree propagation
  • Requires only two passes to compute all marginals

Junction tree algorithm: An exact message passing algorithm
• Construct a tree decomposition T
• Initialize the tree decomposition as in bucket elimination
• Select an arbitrary node of T as root
• Pass messages from leaves to root (upward pass)
• Pass messages from root to leaves (downward pass)

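Message passing Equations
Message from cluster S to its neighbor R:
• Multiply all received messages except the one from R
• Multiply all functions in S
• Sum out all variables except the separator
[Figure: cluster S sending a message to cluster R.]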

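Computing all marginals
• After both passes, the marginal P(S) at a cluster S is obtained by multiplying the functions in S with all messages S has received, then normalizing
[Figure: cluster S.]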
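The equations on these two slides did not survive extraction. One standard way to write them, in notation assumed here rather than taken from the slides, uses Φ(S) for the functions placed in cluster S, nb(S) for the neighbors of S in the tree, and sep(S,R) for the variables shared by S and R.

```latex
\begin{align*}
m_{S \to R}\bigl(\mathrm{sep}(S,R)\bigr)
  &= \sum_{S \setminus \mathrm{sep}(S,R)}
     \;\prod_{\phi \in \Phi(S)} \phi
     \prod_{N \in \mathrm{nb}(S) \setminus \{R\}} m_{N \to S} \\
P(S)
  &\propto \prod_{\phi \in \Phi(S)} \phi
     \prod_{N \in \mathrm{nb}(S)} m_{N \to S}
\end{align*}
```

The normalizing constant of the second line is Z (or P(e) when evidence is present), and it is the same at every cluster.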

Message passing Equations (A,B) (A,C) ABC • Select “EFC” as root • Pass messages from leaves to root • Pass messages from root to leaves (E,F) (C,E) EFC (C,D) (D,F) FC DBCF (B,D) FBC FBC BC BC C C

Architectures
• Shenoy-Shafer architecture
• Hugin architecture
  • Associate one function with each cluster
  • Requires division
  • Smaller time complexity
  • Higher space complexity
