Download
extending expectation propagation for graphical models n.
Skip this Video
Loading SlideShow in 5 Seconds..
Extending Expectation Propagation for Graphical Models PowerPoint Presentation
Download Presentation
Extending Expectation Propagation for Graphical Models

Extending Expectation Propagation for Graphical Models

155 Views Download Presentation
Download Presentation

Extending Expectation Propagation for Graphical Models

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

  1. Extending Expectation Propagation for Graphical Models Yuan (Alan) Qi Joint work with Tom Minka

  2. Motivation • Graphical models are widely used in real-world applications, such as wireless communications and bioinformatics. • Inference techniques on graphical models often sacrifice efficiency for accuracy or sacrifice accuracy for efficiency. • Need a method that better balances the trade-off between accuracy and efficiency.

  3. What we want Motivation Current Techniques Error Computational Time

  4. Outline • Background on expectation propagation (EP) • Extending EP on Bayesian networks for dynamic systems • Poisson tracking • Signal detection for wireless communications • Tree-structured EP on loopy graphs • Conclusions and future work

  5. Outline • Background on expectation propagation (EP) • Extending EP on Bayesian networks for dynamic systems • Poisson tracking • Signal detection for wireless communications • Tree-structured EP on loopy graphs • Conclusions and future work

  6. x1 x2 x1 x2 y1 y2 y1 y2 Graphical Models

  7. Inference on Graphical Models • Bayesian inference techniques: • Belief propagation (BP): Kalman filtering /smoothing, forward-backward algorithm • Monte Carlo: Particle filter/smoothers, MCMC • Loopy BP: typically efficient, but not accurate on general loopy graphs • Monte Carlo: accurate, but often not efficient

  8. Expectation Propagation in a Nutshell • Approximate a probability distribution by simpler parametric terms: For directed graphs: For undirected graphs: • Each approximation term lives in an exponential family (e.g. Gaussian)

  9. EP in a Nutshell • The approximate term minimizes the following KL divergence by moment matching: Where the leave-one-out approximation is

  10. Limitations of Plain EP • Can be difficult or expensive to analytically compute the needed moments in order to minimize the desired KL divergence. • Can be expensive to compute and maintain a valid approximation distribution q(x), which is coherent under marginalization. • Tree-structured q(x):

  11. Three Extensions 1. Instead of choosing the approximate term to minimize the following KL divergence: use other criteria. 2. Use numerical approximation to compute moments: Quadrature or Monte Carlo. 3. Allow the tree-structured q(x) to be non-coherentduring the iterations. It only needs to be coherent in the end.

  12. Efficiency vs. Accuracy Loopy BP (Factorized EP) Error Extended EP ? Monte Carlo Computational Time

  13. Outline • Background on expectation propagation (EP) • Extending EP on Bayesian networks for dynamic systems • Poisson tracking • Signal detection for wireless communications • Tree-structured EP on loopy graphs • Conclusions and future work

  14. Object Tracking Guess the position of an object given noisy observations Object

  15. x1 x2 xT y1 y2 yT Bayesian Network e.g. (random walk) want distribution of x’s given y’s

  16. Approximation Factorized and Gaussian in x

  17. xt yt Message Interpretation = (forward msg)(observation msg)(backward msg) Forward Message Backward Message Observation Message

  18. EP on Dynamic Systems • Filtering: t = 1, …, T • Incorporate forward message • Initialize observation message • Smoothing: t = T, …, 1 • Incorporate the backward message • Compute the leave-one-out approximation by dividing out the old observation messages • Re-approximate the new observation messages • Re-filtering: t = 1, …, T • Incorporate forward and observation messages

  19. Extension of EP • Instead of matching moments, use any method for approximate filtering. • Examples: statistical linearization, unscented Kalman filter (UKF), mixture of Kalman filters Turn any filtering method into a smoothing method! • All methods can be interpreted as finding linear/Gaussian approximations to original terms.

  20. Example: Poisson Tracking • is an integer valued Poisson variate with mean

  21. Poisson Tracking Model

  22. Extension of EP: Approximate Observation Message • is not Gaussian • Moments of x not analytic • Two approaches: • Gauss-Hermite quadrature for moments • Statistical linearization instead of moment-matching (Turn unscented Kalman filters into a smoothing method) • Both work well

  23. Approximate vs. Exact Posterior p(xT|y1:T) xT

  24. Extended EP vs. Monte Carlo: Accuracy Mean Variance

  25. Accuracy/Efficiency Tradeoff

  26. EP for Digital Wireless Communication • Signal detection problem • Transmitted signal st = • vary to encode each symbol • Complex representation: Im Re

  27. Binary Symbols, Gaussian Noise • Symbols are 1 and –1 (in complex plane) • Received signal yt = • Optimal detection is easy

  28. Fading Channel • Channel systematically changes amplitude and phase: • changes over time

  29. Benchmark: Differential Detection • Classical technique • Use previous observation to estimate state • Binary symbols only

  30. s2 sT s1 y1 yT y2 x1 xT x2 Bayesian network for Signal Detection

  31. Extended-EP Joint Signal Detector and Channel Estimation • Turn mixture of Kalman filters into a smoothing method • Smoothing over the last observations • Observations before act as prior for the current estimation

  32. Computational Complexity • Expectation propagation O(nLd2) • Stochastic mixture of Kalman filters O(LMd2) • Rao-blackwised particle smoothersO(LMNd2) • n: Number of EP iterations (Typically, 4 or 5) • d: Dimension of the parameter vector • L: Smooth window length • M: Number of samples in filtering (Often larger than 500) • N: Number of samples in smoothing (Larger than 50) • EP is about 5,000 times faster than Rao-blackwised particle smoothers.

  33. Experimental Results (Chen, Wang, Liu 2000) Signal-Noise-Ratio Signal-Noise-Ratio EP outperforms particle smoothers in efficiency with comparable accuracy.

  34. e2 eT e1 y1 yT y2 x1 xT x2 Bayesian Networks for Adaptive Decoding The information bits et are coded by a convolutional error-correcting encoder.

  35. EP Outperforms Viterbi Decoding Signal-Noise-Ratio

  36. Outline • Background on expectation propagation (EP) • Extending EP on Bayesian networks for dynamic systems • Poisson tracking • Signal detection for wireless communications • Tree-structured EP on loopy graphs • Conclusions and future work

  37. X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 Inference on Loopy Graphs Problem: estimate marginal distributions of the variables indexed by the nodes in a loopy graph, e.g., p(xi), i = 1, . . . , 16.

  38. 4-node Loopy Graph Joint distribution is product of pairwise potentials for all edges: Want to approximate by a simpler distribution

  39. BP vs. TreeEP TreeEP BP

  40. Junction Tree Representation p(x) q(x) Junction tree p(x) q(x) Junction tree

  41. Two Kinds of Edges • On-tree edges, e.g., (x1,x4):exactly incorporated into the junction tree • Off-tree edges, e.g., (x1,x2): approximated by projecting them onto the tree structure

  42. KL Minimization • KL minimization moment matching • Match single and pairwise marginals of • Reduces to exact inference on single loops • Use cutset conditioning and

  43. x5 x7 x5 x7 x1 x2 x1 x2 x1 x3 x1 x4 x3 x6 x3 x5 x1 x3 x3 x5 x3 x6 x1 x4 x1 x2 x1 x3 x1 x4 x3 x5 x3 x4 x5 x7 x3 x6 x1 x2 x1 x2 x1 x3 x1 x4 x1 x3 x1 x4 x3 x5 x3 x5 x5 x7 x3 x6 x5 x7 x3 x6 x6 x7 Matching Marginals on Graph (1) Incorporate edge (x3 x4) (2) Incorporate edge (x6 x7)

  44. Drawbacks of Global Propagation • Update all the cliques even when only incorporating one off-tree edge • Computationally expensive • Store each off-tree data message as a whole tree • Require large memory size

  45. Solution: Local Propagation • Allow q(x) be non-coherentduring the iterations. It only needs to be coherent in the end. • Exploit the junction tree representation: only locallypropagate information within the minimal loop (subtree) that is directly connected to the off-tree edge. • Reduce computational complexity • Save memory

  46. x1 x2 x5 x7 x1 x2 x1 x2 x5 x7 x5 x7 x1 x2 x1 x3 x1 x4 x3 x5 x1 x3 x3 x6 x3 x6 x3 x5 x1 x3 x1 x4 x1 x4 x3 x5 x3 x6 x1 x3 x1 x4 x3 x4 x3 x5 x3 x5 x5 x7 x3 x6 x6 x7 (1) Incorporate edge(x3 x4) (2) Propagate evidence On this simple graph, local propagation runs roughly 2 times faster and uses 2 times less memory to store messages than plain EP (3) Incorporate edge (x6 x7)

  47. New Interpretation of TreeEP • Marry EP with Junction algorithm • Can perform efficiently over hypertrees and hypernodes

  48. 4-node Graph • TreeEP = the proposed method • GBP = generalized belief propagation on triangles • TreeVB = variational tree • BP = loopy belief propagation = Factorized EP • MF = mean-field

  49. Fully-connected graphs • Results are averaged over 10 graphs with randomly generated potentials • TreeEP performs the same or better than all other methods in both accuracy and efficiency!

  50. 8x8 grids, 10 trials