Create Presentation
Download Presentation

Download Presentation
## Extending Expectation Propagation for Graphical Models

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -

**Extending Expectation Propagation for Graphical Models**Yuan (Alan) Qi Joint work with Tom Minka**Motivation**• Graphical models are widely used in real-world applications, such as wireless communications and bioinformatics. • Inference techniques on graphical models often sacrifice efficiency for accuracy or sacrifice accuracy for efficiency. • Need a method that better balances the trade-off between accuracy and efficiency.**What we want**Motivation Current Techniques Error Computational Time**Outline**• Background on expectation propagation (EP) • Extending EP on Bayesian networks for dynamic systems • Poisson tracking • Signal detection for wireless communications • Tree-structured EP on loopy graphs • Conclusions and future work**Outline**• Background on expectation propagation (EP) • Extending EP on Bayesian networks for dynamic systems • Poisson tracking • Signal detection for wireless communications • Tree-structured EP on loopy graphs • Conclusions and future work**x1**x2 x1 x2 y1 y2 y1 y2 Graphical Models**Inference on Graphical Models**• Bayesian inference techniques: • Belief propagation (BP): Kalman filtering /smoothing, forward-backward algorithm • Monte Carlo: Particle filter/smoothers, MCMC • Loopy BP: typically efficient, but not accurate on general loopy graphs • Monte Carlo: accurate, but often not efficient**Expectation Propagation in a Nutshell**• Approximate a probability distribution by simpler parametric terms: For directed graphs: For undirected graphs: • Each approximation term lives in an exponential family (e.g. Gaussian)**EP in a Nutshell**• The approximate term minimizes the following KL divergence by moment matching: Where the leave-one-out approximation is**Limitations of Plain EP**• Can be difficult or expensive to analytically compute the needed moments in order to minimize the desired KL divergence. • Can be expensive to compute and maintain a valid approximation distribution q(x), which is coherent under marginalization. • Tree-structured q(x):**Three Extensions**1. Instead of choosing the approximate term to minimize the following KL divergence: use other criteria. 2. Use numerical approximation to compute moments: Quadrature or Monte Carlo. 3. Allow the tree-structured q(x) to be non-coherentduring the iterations. It only needs to be coherent in the end.**Efficiency vs. Accuracy**Loopy BP (Factorized EP) Error Extended EP ? Monte Carlo Computational Time**Outline**• Background on expectation propagation (EP) • Extending EP on Bayesian networks for dynamic systems • Poisson tracking • Signal detection for wireless communications • Tree-structured EP on loopy graphs • Conclusions and future work**Object Tracking**Guess the position of an object given noisy observations Object**x1**x2 xT y1 y2 yT Bayesian Network e.g. (random walk) want distribution of x’s given y’s**Approximation**Factorized and Gaussian in x**xt**yt Message Interpretation = (forward msg)(observation msg)(backward msg) Forward Message Backward Message Observation Message**EP on Dynamic Systems**• Filtering: t = 1, …, T • Incorporate forward message • Initialize observation message • Smoothing: t = T, …, 1 • Incorporate the backward message • Compute the leave-one-out approximation by dividing out the old observation messages • Re-approximate the new observation messages • Re-filtering: t = 1, …, T • Incorporate forward and observation messages**Extension of EP**• Instead of matching moments, use any method for approximate filtering. • Examples: statistical linearization, unscented Kalman filter (UKF), mixture of Kalman filters Turn any filtering method into a smoothing method! • All methods can be interpreted as finding linear/Gaussian approximations to original terms.**Example: Poisson Tracking**• is an integer valued Poisson variate with mean**Extension of EP: Approximate Observation Message**• is not Gaussian • Moments of x not analytic • Two approaches: • Gauss-Hermite quadrature for moments • Statistical linearization instead of moment-matching (Turn unscented Kalman filters into a smoothing method) • Both work well**Approximate vs. Exact Posterior**p(xT|y1:T) xT**Extended EP vs. Monte Carlo: Accuracy**Mean Variance**EP for Digital Wireless Communication**• Signal detection problem • Transmitted signal st = • vary to encode each symbol • Complex representation: Im Re**Binary Symbols, Gaussian Noise**• Symbols are 1 and –1 (in complex plane) • Received signal yt = • Optimal detection is easy**Fading Channel**• Channel systematically changes amplitude and phase: • changes over time**Benchmark: Differential Detection**• Classical technique • Use previous observation to estimate state • Binary symbols only**s2**sT s1 y1 yT y2 x1 xT x2 Bayesian network for Signal Detection**Extended-EP Joint Signal Detector and Channel Estimation**• Turn mixture of Kalman filters into a smoothing method • Smoothing over the last observations • Observations before act as prior for the current estimation**Computational Complexity**• Expectation propagation O(nLd2) • Stochastic mixture of Kalman filters O(LMd2) • Rao-blackwised particle smoothersO(LMNd2) • n: Number of EP iterations (Typically, 4 or 5) • d: Dimension of the parameter vector • L: Smooth window length • M: Number of samples in filtering (Often larger than 500) • N: Number of samples in smoothing (Larger than 50) • EP is about 5,000 times faster than Rao-blackwised particle smoothers.**Experimental Results**(Chen, Wang, Liu 2000) Signal-Noise-Ratio Signal-Noise-Ratio EP outperforms particle smoothers in efficiency with comparable accuracy.**e2**eT e1 y1 yT y2 x1 xT x2 Bayesian Networks for Adaptive Decoding The information bits et are coded by a convolutional error-correcting encoder.**EP Outperforms Viterbi Decoding**Signal-Noise-Ratio**Outline**• Background on expectation propagation (EP) • Extending EP on Bayesian networks for dynamic systems • Poisson tracking • Signal detection for wireless communications • Tree-structured EP on loopy graphs • Conclusions and future work**X1**X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 Inference on Loopy Graphs Problem: estimate marginal distributions of the variables indexed by the nodes in a loopy graph, e.g., p(xi), i = 1, . . . , 16.**4-node Loopy Graph**Joint distribution is product of pairwise potentials for all edges: Want to approximate by a simpler distribution**BP vs. TreeEP**TreeEP BP**Junction Tree Representation**p(x) q(x) Junction tree p(x) q(x) Junction tree**Two Kinds of Edges**• On-tree edges, e.g., (x1,x4):exactly incorporated into the junction tree • Off-tree edges, e.g., (x1,x2): approximated by projecting them onto the tree structure**KL Minimization**• KL minimization moment matching • Match single and pairwise marginals of • Reduces to exact inference on single loops • Use cutset conditioning and**x5 x7**x5 x7 x1 x2 x1 x2 x1 x3 x1 x4 x3 x6 x3 x5 x1 x3 x3 x5 x3 x6 x1 x4 x1 x2 x1 x3 x1 x4 x3 x5 x3 x4 x5 x7 x3 x6 x1 x2 x1 x2 x1 x3 x1 x4 x1 x3 x1 x4 x3 x5 x3 x5 x5 x7 x3 x6 x5 x7 x3 x6 x6 x7 Matching Marginals on Graph (1) Incorporate edge (x3 x4) (2) Incorporate edge (x6 x7)**Drawbacks of Global Propagation**• Update all the cliques even when only incorporating one off-tree edge • Computationally expensive • Store each off-tree data message as a whole tree • Require large memory size**Solution: Local Propagation**• Allow q(x) be non-coherentduring the iterations. It only needs to be coherent in the end. • Exploit the junction tree representation: only locallypropagate information within the minimal loop (subtree) that is directly connected to the off-tree edge. • Reduce computational complexity • Save memory**x1 x2**x5 x7 x1 x2 x1 x2 x5 x7 x5 x7 x1 x2 x1 x3 x1 x4 x3 x5 x1 x3 x3 x6 x3 x6 x3 x5 x1 x3 x1 x4 x1 x4 x3 x5 x3 x6 x1 x3 x1 x4 x3 x4 x3 x5 x3 x5 x5 x7 x3 x6 x6 x7 (1) Incorporate edge(x3 x4) (2) Propagate evidence On this simple graph, local propagation runs roughly 2 times faster and uses 2 times less memory to store messages than plain EP (3) Incorporate edge (x6 x7)**New Interpretation of TreeEP**• Marry EP with Junction algorithm • Can perform efficiently over hypertrees and hypernodes**4-node Graph**• TreeEP = the proposed method • GBP = generalized belief propagation on triangles • TreeVB = variational tree • BP = loopy belief propagation = Factorized EP • MF = mean-field**Fully-connected graphs**• Results are averaged over 10 graphs with randomly generated potentials • TreeEP performs the same or better than all other methods in both accuracy and efficiency!