
Introduction to Graphical Models



Presentation Transcript


  1. Introduction to Graphical Models Brookes Vision Lab Reading Group

  2. Graphical Models • To build a complex system using simpler parts. • System should be consistent • Parts are combined using probability • Undirected – Markov random fields • Directed – Bayesian Networks

  3. Overview • Representation • Inference • Linear Gaussian Models • Approximate inference • Learning

  4. Representation Causality : Sprinkler “causes” wet grass

  5. Conditional Independence • Independent of ancestors given parents • P(C,S,R,W) = P(C) P(S|C) P(R|C,S) P(W|C,S,R) • = P(C) P(S|C) P(R|C) P(W|S,R) • Space required for n binary nodes • O(2^n) without factorization • O(n 2^k) with factorization, k = maximum fan-in
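
The savings from factorization can be checked by counting table entries for the sprinkler network itself. A minimal sketch, counting one free probability per CPT row for a binary node:

```python
# Storage for the sprinkler network (C -> S, C -> R, {S, R} -> W),
# all nodes binary: a node with f parents needs a CPT with 2**f rows
# (one free probability per row), versus 2**n entries for the full joint.
fan_in = {"C": 0, "S": 1, "R": 1, "W": 2}

factored = sum(2 ** f for f in fan_in.values())   # 1 + 2 + 2 + 4 = 9
full_joint = 2 ** len(fan_in)                     # 2**4 = 16
print(factored, full_joint)                       # 9 16
```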

  6. Inference • Pr(S=1|W=1) = Pr(S=1,W=1)/Pr(W=1) = 0.2781/0.6471 = 0.430 • Pr(R=1|W=1) = Pr(R=1,W=1)/Pr(W=1) = 0.4581/0.6471 = 0.708

  7. Explaining Away • S and R “compete” to explain W=1 • S and R are conditionally dependent • Pr(S=1|R=1,W=1) = 0.1945
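
The numbers on slides 6 and 7 can be reproduced by brute-force enumeration. The CPT values below are an assumption: they are the standard cloudy/sprinkler/rain/wet-grass tables from Kevin Murphy's Bayes-net tutorial, which yield exactly the probabilities quoted above:

```python
from itertools import product

# Brute-force enumeration over the cloudy/sprinkler/rain/wet-grass network.
P_C = {0: 0.5, 1: 0.5}                                        # P(C)
P_S = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.9, 1: 0.1}}              # P(S|C)
P_R = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.2, 1: 0.8}}              # P(R|C)
P_W1 = {(0, 0): 0.0, (0, 1): 0.9, (1, 0): 0.9, (1, 1): 0.99}  # P(W=1|S,R)

def joint(c, s, r, w):
    pw1 = P_W1[(s, r)]
    return P_C[c] * P_S[c][s] * P_R[c][r] * (pw1 if w else 1 - pw1)

def prob(**evidence):
    # Marginal probability of the evidence, summing the full joint.
    total = 0.0
    for c, s, r, w in product([0, 1], repeat=4):
        assign = dict(C=c, S=s, R=r, W=w)
        if all(assign[k] == v for k, v in evidence.items()):
            total += joint(c, s, r, w)
    return total

print(round(prob(S=1, W=1) / prob(W=1), 3))             # 0.43
print(round(prob(R=1, W=1) / prob(W=1), 3))             # 0.708
print(round(prob(S=1, R=1, W=1) / prob(R=1, W=1), 4))   # 0.1945 (explaining away)
```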

  8. Inference

  9. Inference • Variable elimination • Choosing optimal ordering – NP-hard • Greedy methods work well • Computing several marginals • Dynamic programming avoids redundant computation • Sound familiar?
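
A minimal sketch of variable elimination on the sprinkler network, with factors as (variables, table) pairs. The CPT values are an assumption (Kevin Murphy's Bayes-net tutorial tables, matching the slides' numbers), and the elimination order C, S, R is hard-coded; a real implementation would choose it greedily:

```python
from itertools import product

def multiply(f, g):
    # Pointwise product of two factors over the union of their variables.
    fv, ft = f
    gv, gt = g
    vs = list(dict.fromkeys(fv + gv))          # order-preserving union
    table = {}
    for assign in product([0, 1], repeat=len(vs)):
        a = dict(zip(vs, assign))
        table[assign] = (ft[tuple(a[v] for v in fv)] *
                         gt[tuple(a[v] for v in gv)])
    return (vs, table)

def sum_out(f, var):
    # Marginalize one variable out of a factor.
    fv, ft = f
    vs = [v for v in fv if v != var]
    table = {}
    for assign, val in ft.items():
        key = tuple(a for v, a in zip(fv, assign) if v != var)
        table[key] = table.get(key, 0.0) + val
    return (vs, table)

# CPTs written as factors.
fC = (["C"], {(0,): 0.5, (1,): 0.5})
fS = (["C", "S"], {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.9, (1, 1): 0.1})
fR = (["C", "R"], {(0, 0): 0.8, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.8})
fW = (["S", "R", "W"],
      {(s, r, w): (p if w else 1 - p)
       for (s, r), p in {(0, 0): 0.0, (0, 1): 0.9,
                         (1, 0): 0.9, (1, 1): 0.99}.items()
       for w in (0, 1)})

f = sum_out(multiply(multiply(fC, fS), fR), "C")   # eliminate C
f = sum_out(multiply(f, fW), "S")                  # eliminate S
f = sum_out(f, "R")                                # eliminate R
vs, table = f
print(round(table[(1,)], 4))                       # P(W=1) = 0.6471
```

Each `sum_out` caches partial sums in its table, which is the dynamic-programming reuse the slide alludes to.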

  10. Bayes Balls for Conditional Independence

  11. A Unifying (Re)View • Basic model: Linear Gaussian Model (LGM) • Continuous-state LGM: FA, SPCA, PCA, LDS • Discrete-state LGM: Mixture of Gaussians, VQ, HMM

  12. Basic Model • State of a system is a k-vector x (unobserved) • Output of a system is a p-vector y (observed) • Often k << p • Basic model • xt+1 = A xt + w • yt = C xt + v • A is the k x k transition matrix • C is a p x k observation matrix • w = N(0, Q) • v = N(0, R) • Noise processes are essential • Zero mean w.l.o.g.
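
A short simulation of the basic model. The dimensions (k = 2, p = 4), the stable transition matrix A, and the noise levels are illustrative assumptions:

```python
import numpy as np

# Simulate x_{t+1} = A x_t + w, w ~ N(0, Q);  y_t = C x_t + v, v ~ N(0, R).
rng = np.random.default_rng(0)
k, p, T = 2, 4, 100

A = np.array([[0.9, 0.1],
              [0.0, 0.8]])                # k x k transition matrix (stable)
C = rng.standard_normal((p, k))           # p x k observation matrix
Q = np.eye(k)                             # state noise, identity w.l.o.g.
R = 0.1 * np.eye(p)                       # observation noise

x = np.zeros((T, k))
y = np.zeros((T, p))
x[0] = rng.multivariate_normal(np.zeros(k), Q)   # x_1, assuming mu_1 = 0
for t in range(T):
    y[t] = C @ x[t] + rng.multivariate_normal(np.zeros(p), R)
    if t + 1 < T:
        x[t + 1] = A @ x[t] + rng.multivariate_normal(np.zeros(k), Q)
```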

  13. Degeneracy in Basic Model • Structure in Q can be moved to A and C • W.l.o.g. Q = I • R cannot be restricted as yt are observed • Components of x can be reordered arbitrarily. • Ordering is based on norms of columns of C. • x1 = N(µ1, Q1) • A and C are assumed to have rank k. • Q, R, Q1 are assumed to be full rank.

  14. Probability Computation • P( xt+1 | xt ) = N(A xt, Q; xt+1) • P( yt | xt ) = N(C xt, R; yt) • P({x1,..,xT}, {y1,..,yT}) = P(x1) ∏ P(xt+1|xt) ∏ P(yt|xt) • Negative log probability

  15. Inference • Given model parameters {A, C, Q, R, µ1, Q1} • Given observations y • What can be inferred about hidden states x? • Total likelihood • Filtering: P (x(t) | {y(1), ... , y(t)}) • Smoothing: P (x(t) | {y(1), ... , y(T)}) • Partial smoothing: P (x(t) | {y(1), ... , y(t+t')}) • Partial prediction: P (x(t) | {y(1), ... , y(t-t')}) • Intermediate values of recursive methods for computing total likelihood.
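
Filtering can be sketched with a scalar (k = p = 1) Kalman filter; parameter names follow the slide's {A, C, Q, R, µ1, Q1} in lowercase, and the observation sequence in the usage line is made up:

```python
# Scalar Kalman filter: the filtered posterior P(x_t | y_1..y_t) for the
# model x_{t+1} = a x_t + w (var q), y_t = c x_t + v (var r),
# with prior x_1 ~ N(mu1, q1). A 1-D sketch of the matrix recursions.
def kalman_filter(y, a, c, q, r, mu1, q1):
    mu, P = mu1, q1
    means, variances = [], []
    for t, obs in enumerate(y):
        if t > 0:                        # predict: push posterior through dynamics
            mu, P = a * mu, a * P * a + q
        K = P * c / (c * P * c + r)      # Kalman gain
        mu = mu + K * (obs - c * mu)     # correct with the new observation
        P = (1 - K * c) * P
        means.append(mu)
        variances.append(P)
    return means, variances

means, variances = kalman_filter([1.0, 1.2, 0.9],
                                 a=0.9, c=1.0, q=0.1, r=0.1,
                                 mu1=0.0, q1=1.0)
```

Each loop iteration is exactly the "intermediate value" the slide mentions: the filtered posterior at time t, reused by the next step.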

  16. Learning • Unknown parameters {A, C, Q, R, µ1, Q1} • Given observations y • Log-likelihood L(Θ) • F(Q, Θ) – free energy

  17. EM algorithm • Alternate between maximizing F(Q, Θ) w.r.t. Q and Θ • F = L at the start of each M-step • E-step does not change Θ • Therefore, the likelihood never decreases.

  18. Continuous-State LGM Continuous-State LGM Static Data Modeling Time-series Modeling • No temporal dependence • Factor analysis • SPCA • PCA • Time ordering of data crucial • LDS (Kalman filter models)

  19. Static Data Modelling • A = 0 • x = w • y = C x + v • x1 = N(0, Q) • y = N(0, CQC'+R) • Degeneracy in model • Learning: EM • R restricted • Inference

  20. Factor Analysis • Restrict R to be diagonal. • Q = I • x – factors • C – factor loading matrix • R – uniqueness • Learning – EM , quasi-Newton optimization • Inference
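
Inference in factor analysis is linear once C and R are known. A sketch with assumed dimensions (p = 5, k = 2) and random C, R, using the standard Gaussian posterior formulas:

```python
import numpy as np

# Posterior over factors with Q = I: given loadings C and diagonal
# uniqueness R, the posterior is Gaussian with
#   E[x|y]   = C' (C C' + R)^{-1} y
#   Cov(x|y) = I - C' (C C' + R)^{-1} C   (independent of y).
rng = np.random.default_rng(4)
p, k = 5, 2
C = rng.standard_normal((p, k))           # factor loading matrix (assumed)
R = np.diag(rng.uniform(0.1, 0.5, p))     # diagonal uniqueness (assumed)

y = C @ rng.standard_normal(k) + rng.multivariate_normal(np.zeros(p), R)
beta = C.T @ np.linalg.inv(C @ C.T + R)   # k x p inference operator
x_hat = beta @ y                          # posterior mean of the factors
post_cov = np.eye(k) - beta @ C           # posterior covariance
```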

  21. SPCA • R = εI • ε – global noise level • Columns of C span the principal subspace. • Learning – EM algorithm • Inference

  22. PCA • R = lim ε→0 εI • Learning • Diagonalize sample covariance of data • Leading k eigenvalues and eigenvectors define C • EM determines leading eigenvectors without diagonalization • Inference • Noise becomes infinitesimal • Posterior collapses to a single point
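
The learning recipe on this slide, sketched directly. The 2-D data are an assumption, generated with one dominant direction of variance:

```python
import numpy as np

# PCA via diagonalization: the leading k eigenvectors of the sample
# covariance define the columns of C.
rng = np.random.default_rng(1)
Y = rng.standard_normal((500, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])

S = np.cov(Y, rowvar=False)        # sample covariance of the data
vals, vecs = np.linalg.eigh(S)     # eigh returns ascending eigenvalues
order = np.argsort(vals)[::-1]     # sort descending
k = 1
C = vecs[:, order[:k]]             # leading k eigenvectors -> columns of C
X = (Y - Y.mean(axis=0)) @ C       # zero-noise posterior: a point projection
```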

  23. Linear Dynamical Systems • Inference – Kalman filter • Smoothing – RTS (Rauch-Tung-Striebel) recursions • Learning – EM algorithm • C known – Shumway and Stoffer, 1982 • All unknown – Ghahramani and Hinton, 1995

  24. Discrete-State LGM • xt+1 = WTA[A xt + w] • yt = C xt + v • x1 = WTA[N(µ1,Q1)] • WTA[·] – winner-take-all nonlinearity: the largest component becomes 1, all others 0

  25. Discrete-State LGM • Static data modeling: Mixture of Gaussians, VQ • Time-series modeling: HMM

  26. Static Data Modelling • A = 0 • x = WTA[w] • w = N(µ,Q) • y = C x + v • πj = P(x = ej) • Nonzero µ for nonuniform πj • y = N(Cj, R) • Cj – jth column of C

  27. Mixture of Gaussians • Mixing coefficient of cluster j – πj • Mean – column Cj • Variance – R • Learning: EM (corresponds to ML competitive learning) • Inference
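
An EM sketch for a 1-D mixture of two Gaussians. The data and the deterministic quantile-based initialization are assumptions, chosen to keep the example reproducible:

```python
import numpy as np

# EM for a 1-D Gaussian mixture: pi plays the mixing coefficients,
# mu the columns C_j, and var the role of R.
def gmm_em(y, k=2, iters=50):
    pi = np.full(k, 1.0 / k)
    mu = np.quantile(y, (np.arange(k) + 0.5) / k)   # spread-out init
    var = np.full(k, y.var())
    for _ in range(iters):
        # E-step: responsibility of each cluster for each point
        logp = (-0.5 * (y[:, None] - mu) ** 2 / var
                - 0.5 * np.log(2 * np.pi * var) + np.log(pi))
        resp = np.exp(logp - logp.max(axis=1, keepdims=True))
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate pi, mu, var from the responsibilities
        n = resp.sum(axis=0)
        pi = n / len(y)
        mu = (resp * y[:, None]).sum(axis=0) / n
        var = (resp * (y[:, None] - mu) ** 2).sum(axis=0) / n
    return pi, mu, var

y = np.concatenate([np.random.default_rng(1).normal(-5, 1, 200),
                    np.random.default_rng(2).normal(5, 1, 200)])
pi, mu, var = gmm_em(y)
```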

  28. Vector Quantization • Observation noise becomes infinitesimal • Inference problem solved by 1NN rule • Euclidean distance for diagonal R • Mahalanobis distance for unscaled R • Posterior collapses to closest cluster • Learning with EM = batch version of k-means
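
The zero-noise limit as code: hard 1-NN assignment in the E-step turns EM into batch k-means. A 1-D sketch with assumed well-separated clusters and hand-picked initial centers:

```python
import numpy as np

# Batch k-means = EM with infinitesimal observation noise: the posterior
# collapses to the closest cluster (1-NN under Euclidean distance).
def kmeans(y, centers, iters=20):
    centers = centers.astype(float).copy()
    for _ in range(iters):
        # "E-step": hard-assign each point to its nearest center (1-NN rule)
        d = np.abs(y[:, None] - centers)
        assign = d.argmin(axis=1)
        # "M-step": move each center to the mean of its assigned points
        for j in range(len(centers)):
            if np.any(assign == j):
                centers[j] = y[assign == j].mean()
    return centers, assign

y = np.concatenate([np.random.default_rng(0).normal(0, 0.5, 100),
                    np.random.default_rng(3).normal(10, 0.5, 100)])
centers, assign = kmeans(y, np.array([1.0, 9.0]))
```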

  29. Time-series modelling

  30. HMM • Transition matrix T • Ti,j = P(xt+1 = ej | xt = ei) • For every T, there exist A and Q • Filtering: forward recursions • Smoothing: forward-backward algorithm • Learning: EM (called Baum-Welch reestimation) • MAP state sequences – Viterbi
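
The forward recursion for filtering, with per-step normalization so long sequences do not underflow. The 2-state matrices and the observation sequence below are illustrative assumptions:

```python
import numpy as np

# Forward algorithm: T[i, j] = P(x_{t+1} = e_j | x_t = e_i),
# B[j, y] = P(y_t = y | x_t = e_j), pi the initial state distribution.
# The per-step normalizers accumulate the total log-likelihood.
def forward(obs, T, B, pi):
    alpha = pi * B[:, obs[0]]
    loglik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for y in obs[1:]:
        alpha = (alpha @ T) * B[:, y]   # predict, then weight by emission
        loglik += np.log(alpha.sum())
        alpha = alpha / alpha.sum()
    return alpha, loglik                # filtered P(x_t | y_1..y_t), log P(y)

T = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])
alpha, loglik = forward([0, 0, 1], T, B, pi)
```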
