Variational Inference and Message Passing: Robotics & Vision Example
Explore probabilistic models, Bayesian inference, variational inference and variational message passing, with an example from robotics and vision.
Presentation Transcript
Variational Inference and Variational Message Passing John Winn, Microsoft Research, Cambridge. 12th November 2004, Robotics Research Group, University of Oxford
Overview • Probabilistic models & Bayesian inference • Variational Inference • Variational Message Passing • Vision example
Bayesian networks • Directed graph • Nodes represent variables • Links show dependencies • Conditional distributions at each node • Defines a joint distribution: P(C,L,S,I) = P(L) P(C) P(S|C) P(I|L,S) [Figure: network with nodes Object class C (prior P(C)), Lighting colour L (prior P(L)), Surface colour S (conditional P(S|C)) and Image colour I (conditional P(I|L,S))]
Bayesian inference • Observed variables V and hidden variables H. • Hidden variables include parameters and latent variables. • Learning/inference involves finding the posterior P(H1, H2, …| V). [Figure: in the network above, Object class C, Lighting colour L and Surface colour S are hidden; Image colour I is observed]
Bayesian inference vs. ML/MAP • Consider learning one parameter θ. How should we represent the posterior distribution over θ? [Plot: unnormalised posterior P(V|θ)P(θ) against θ]

Bayesian inference vs. ML/MAP • The MAP solution θMAP is the maximum of P(V|θ)P(θ): a single point estimate.

Bayesian inference vs. ML/MAP • A point estimate can mislead: θMAP lies where the probability density is high, but most of the probability mass may lie elsewhere in a broad or skewed posterior.

Bayesian inference vs. ML/MAP • Alternatively, rather than a single point such as θML, the posterior can be represented by samples drawn from it.

Bayesian inference vs. ML/MAP • Or by a variational approximation: a tractable distribution fitted to the posterior.
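The gap between high density and high mass can be made concrete. The sketch below uses an illustrative skewed (Gamma-shaped) posterior, not a distribution from the talk:

```python
import numpy as np

# Unnormalised 'posterior' over theta: an illustrative Gamma(2,1) shape.
theta = np.linspace(0.01, 10, 5000)
dtheta = theta[1] - theta[0]
density = theta * np.exp(-theta)        # mode (highest density) at theta = 1

theta_map = theta[np.argmax(density)]   # MAP / maximum-density point

p = density / (density.sum() * dtheta)  # normalise to a distribution
mean = (theta * p).sum() * dtheta       # where the mass sits: mean = 2

print(theta_map, mean)
```

The MAP point sits at θ = 1, yet the bulk of the probability mass lies around θ = 2, which is what a full Bayesian treatment captures.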
Overview • Probabilistic models & Bayesian inference • Variational Inference • Variational Message Passing • Vision example
Variational Inference (in three easy steps…) • Choose a family of variational distributions Q(H). • Use Kullback-Leibler divergence KL(Q||P) as a measure of ‘distance’ between P(H|V) and Q(H). • Find Q which minimises divergence.
KL Divergence [Figure: fitting a unimodal Q to a bimodal P] • Minimising KL(Q||P) is 'exclusive': Q locks onto a single mode of P. • Minimising KL(P||Q) is 'inclusive': Q spreads out to cover all of P's mass.
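The two behaviours can be checked numerically. This sketch (illustrative bimodal target and grid search, not from the talk) fits a single Gaussian Q to a two-mode P under each divergence:

```python
import numpy as np

def log_gauss(x, m, s):
    # Log density of N(m, s^2), evaluated elementwise.
    return -0.5 * np.log(2 * np.pi * s**2) - (x - m)**2 / (2 * s**2)

x = np.linspace(-8, 8, 2001)
dx = x[1] - x[0]

# Bimodal target P: equal mixture of N(-2, 0.5^2) and N(2, 0.5^2).
log_p = np.logaddexp(log_gauss(x, -2, 0.5), log_gauss(x, 2, 0.5)) + np.log(0.5)
p = np.exp(log_p)

best = {}
for name in ("KL(Q||P)", "KL(P||Q)"):
    scores = []
    for m in np.linspace(-3, 3, 61):          # grid search over Q's mean
        for s in np.linspace(0.3, 3, 28):     # ... and standard deviation
            log_q = log_gauss(x, m, s)
            q = np.exp(log_q)
            if name == "KL(Q||P)":
                kl = np.sum(q * (log_q - log_p)) * dx   # exclusive
            else:
                kl = np.sum(p * (log_p - log_q)) * dx   # inclusive
            scores.append((kl, m, s))
    best[name] = min(scores)

# Exclusive KL(Q||P): Q locks onto one mode (|m| near 2, s near 0.5).
# Inclusive KL(P||Q): Q covers both modes (m near 0, s around 2).
print(best)
```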
Minimising the KL divergence • For arbitrary Q(H), the log evidence decomposes as ln P(V) = L(Q) + KL(Q||P), where L(Q) = Σ_H Q(H) ln [P(H,V)/Q(H)]. • ln P(V) is fixed, so maximising the bound L(Q) minimises KL(Q||P). • We choose a family of Q distributions where L(Q) is tractable to compute.
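The decomposition can be verified exactly on a toy discrete model (all numbers below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy discrete model: hidden H with 4 states, joint P(H, V=v) for an observed v.
joint = rng.random(4) + 0.1      # unnormalised P(H=h, V=v)
p_v = joint.sum()                # P(V=v), the evidence
posterior = joint / p_v          # P(H | V=v)

# An arbitrary (wrong) variational distribution Q(H).
q = rng.random(4) + 0.1
q /= q.sum()

lower_bound = np.sum(q * np.log(joint / q))    # L(Q) = sum_H Q ln[P(H,V)/Q]
kl = np.sum(q * np.log(q / posterior))         # KL(Q || P)

# The decomposition ln P(V) = L(Q) + KL(Q||P) holds exactly:
print(np.log(p_v), lower_bound + kl)
```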
Minimising the KL divergence [Diagram: the fixed quantity ln P(V) shown split into a bar for L(Q) and a bar for KL(Q||P); as L(Q) is maximised, KL(Q||P) shrinks]
Factorised Approximation • Assume Q factorises: Q(H) = ∏i Qi(Hi). No further assumptions are required! • Optimal solution for one factor, holding the others fixed, is given by ln Qi*(Hi) = ⟨ln P(H,V)⟩ + const, where the expectation is taken under all the other factors Qj, j ≠ i.
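A minimal sketch of the resulting coordinate-ascent scheme, on a toy two-variable discrete posterior (table values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model: unnormalised P(H1, H2, V=v) over two discrete hiddens (3 x 4 table).
log_joint = np.log(rng.random((3, 4)) + 0.1)

q1 = np.full(3, 1 / 3)    # factor Q1(H1)
q2 = np.full(4, 1 / 4)    # factor Q2(H2)

def bound(q1, q2):
    # L(Q) = <ln P(H,V)> - <ln Q> under the factorised Q(H1)Q(H2).
    q = np.outer(q1, q2)
    return (np.sum(q * log_joint)
            - np.sum(q1 * np.log(q1)) - np.sum(q2 * np.log(q2)))

bounds = [bound(q1, q2)]
for _ in range(20):
    # ln Q1*(h1) = <ln P(h1, h2, V)>_{Q2} + const
    log_q1 = log_joint @ q2
    q1 = np.exp(log_q1 - log_q1.max()); q1 /= q1.sum()
    # ln Q2*(h2) = <ln P(h1, h2, V)>_{Q1} + const
    log_q2 = q1 @ log_joint
    q2 = np.exp(log_q2 - log_q2.max()); q2 /= q2.sum()
    bounds.append(bound(q1, q2))

# Each update can only increase the bound L(Q):
assert all(b2 >= b1 - 1e-9 for b1, b2 in zip(bounds, bounds[1:]))
```

The bound L(Q) increases monotonically and is always at most ln P(V), exactly as the decomposition on the previous slide requires.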
Example: Univariate Gaussian • Likelihood function: P(x1…xN | μ, γ) = ∏n N(xn | μ, γ⁻¹), with mean μ and precision γ. • Conjugate priors: Gaussian for the mean μ, Gamma for the precision γ. • Factorised variational distribution: Q(μ, γ) = Qμ(μ) Qγ(γ).
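A sketch of the resulting closed-form updates, assuming broad independent priors N(m0, 1/t0) on the mean and Gamma(a0, b0) on the precision (the prior values here are chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(3.0, 1 / np.sqrt(4.0), size=500)   # data: true mu=3, gamma=4
N = len(x)

# Broad priors (assumed): mu ~ N(m0, 1/t0), gamma ~ Gamma(a0, b0)
m0, t0, a0, b0 = 0.0, 1e-3, 1e-3, 1e-3

E_gamma = 1.0                          # initial moment of Q(gamma)
for _ in range(50):
    # Update Q(mu) = N(m, 1/t):
    t = t0 + N * E_gamma
    m = (t0 * m0 + E_gamma * x.sum()) / t
    # Update Q(gamma) = Gamma(a, b), using <(x_n - mu)^2> = (x_n - m)^2 + 1/t:
    a = a0 + N / 2
    b = b0 + 0.5 * np.sum((x - m)**2 + 1 / t)
    E_gamma = a / b

print(m, E_gamma)   # posterior means of mu and gamma
```

With these weak priors the variational posterior mean of μ essentially matches the sample mean, and ⟨γ⟩ approaches the sample precision.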
Overview • Probabilistic models & Bayesian inference • Variational Inference • Variational Message Passing • Vision example
Variational Message Passing • VMP makes it easier and quicker to apply factorised variational inference. • VMP carries out variational inference using local computations and message passing on the graphical model. • The modular algorithm allows models to be modified, extended or combined.
Local Updates • For factorised Q, the update for each factor depends only on variables in its Markov blanket (its parents, children and co-parents). • Updates can therefore be carried out locally at each node.
VMP I: The Exponential Family • Conditional distributions are expressed in exponential family form: ln P(X | θ) = θᵀ u(X) + g(θ) + f(X), where θ is the 'natural' parameter vector and u(X) is the sufficient statistics vector. • For example, the Gaussian distribution: ln P(X | μ, γ) = [γμ, −γ/2] · [X, X²] + (ln γ − γμ²)/2 − (ln 2π)/2.
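A quick check that the natural-parameter form reproduces the usual Gaussian log density:

```python
import numpy as np

def log_gauss_expfam(x, mu, gamma):
    # ln P(x | mu, gamma) = theta^T u(x) + g(theta) + f(x), with f(x) = 0 here.
    theta = np.array([gamma * mu, -gamma / 2])   # natural parameter vector
    u = np.array([x, x**2])                      # sufficient statistics vector
    g = 0.5 * (np.log(gamma) - gamma * mu**2) - 0.5 * np.log(2 * np.pi)
    return theta @ u + g

def log_gauss_direct(x, mu, gamma):
    # Standard log density of N(mu, 1/gamma).
    return 0.5 * np.log(gamma / (2 * np.pi)) - 0.5 * gamma * (x - mu)**2

# The two forms agree for any x:
print(log_gauss_expfam(1.3, 0.5, 2.0), log_gauss_direct(1.3, 0.5, 2.0))
```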
VMP II: Conjugacy • Parents and children are chosen to be conjugate, i.e. to have the same functional form: ln P(X | θ) = θᵀ u(X) + g(θ) + f(X) and ln P(Z | X, Y) = φ(Y, Z)ᵀ u(X) + g′(X) + f′(Y, Z), with the same sufficient statistics u(X) appearing in both. • Examples: Gaussian for the mean of a Gaussian; Gamma for the precision of a Gaussian; Dirichlet for the parameters of a discrete distribution.
VMP III: Messages • Conditionals: ln P(X | θ) = θᵀ u(X) + g(θ) + f(X) and ln P(Z | X, Y) = φ(Y, Z)ᵀ u(X) + g′(X) + f′(Y, Z). • Messages: parent to child (X → Z) sends the expected sufficient statistics ⟨u(X)⟩; child to parent (Z → X) sends ⟨φ(Y, Z)⟩, evaluated under Q using the current moments of the co-parent Y and of Z.
VMP IV: Update • Optimal Q(X) has the same form as P(X|θ) but with an updated parameter vector θ*, computed from the messages X receives from its parents and children. • These messages can also be used to calculate the bound on the evidence L(Q) – see Winn & Bishop, 2004.
VMP Example • Learning the parameters of a Gaussian from N data points. [Graph: mean μ and precision γ (inverse variance) are parents of the data x, in a plate over N]

VMP Example • Step 1: message from γ to all x (requires an initial Q(γ)).

VMP Example • Step 2: messages from each xn to μ; update the Q(μ) parameter vector.

VMP Example • Step 3: message from the updated μ to all x.

VMP Example • Step 4: messages from each xn to γ; update the Q(γ) parameter vector.
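The schedule above can be sketched directly in natural-parameter form (a minimal NumPy implementation with assumed broad priors; the message expressions follow the exponential-family forms given earlier):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(2.0, 0.5, size=200)          # data: true mu=2, gamma=4
N = len(x)

# Natural parameters of the (assumed, broad) priors:
#   Q(mu) Gaussian: eta_mu = [tau*m, -tau/2];  Q(gamma) Gamma: eta_g = [-b, a-1]
eta_mu_prior = np.array([0.0, -0.5e-3])     # mu ~ N(0, 1/1e-3)
eta_g_prior = np.array([-1e-3, 1e-3 - 1])   # gamma ~ Gamma(1e-3, 1e-3)

E_gamma = 1.0                               # initial Q(gamma) moment
for _ in range(50):
    # Messages x_n -> mu (each built from the gamma -> x message <gamma>):
    msgs_to_mu = np.stack([E_gamma * x, np.full(N, -E_gamma / 2)], axis=1)
    eta_mu = eta_mu_prior + msgs_to_mu.sum(axis=0)    # update Q(mu)
    tau = -2 * eta_mu[1]
    E_mu = eta_mu[0] / tau
    E_mu2 = E_mu**2 + 1 / tau

    # Messages x_n -> gamma (each built from the mu -> x message [<mu>, <mu^2>]):
    msgs_to_g = np.stack([-0.5 * (x**2 - 2 * x * E_mu + E_mu2),
                          np.full(N, 0.5)], axis=1)
    eta_g = eta_g_prior + msgs_to_g.sum(axis=0)       # update Q(gamma)
    a, b = eta_g[1] + 1, -eta_g[0]
    E_gamma = a / b

print(E_mu, E_gamma)
```

Because every message is just an expected statistics vector added into a natural-parameter vector, each node only ever touches its Markov blanket, which is what makes the algorithm local and modular.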
Features of VMP • Graph does not need to be a tree – it can contain loops (but not cycles). • Flexible message passing schedule – factors can be updated in any order. • Distributions can be discrete or continuous, multivariate, or truncated (e.g. rectified Gaussian). • Can have deterministic relationships (A = B + C). • Allows for point estimates, e.g. of hyper-parameters.
VMP Software: VIBES Free download from vibes.sourceforge.net
Overview • Probabilistic models & Bayesian inference • Variational Inference • Variational Message Passing • Vision example
Flexible sprite model • Proposed by Jojic & Frey (2001). • Set of images x, e.g. frames from a video. [Graph: observed images x in a plate over N]

Flexible sprite model • Sprite appearance f and sprite shape π. [Graph: f and π added as parents of x]

Flexible sprite model • Sprite transform T for this image (discretised) and mask m for this image. [Graph: T and m added within the plate]

Flexible sprite model • Background b and noise β. [Graph: b and β added, completing the model]
VMP applied to the full sprite model (variables π, f, b, T, m, β, x). Winn & Blake (NIPS 2004)
Results of VMP on hand video Original video Learned appearance and mask Learned transforms (i.e. motion)
Conclusions • Variational Message Passing allows approximate Bayesian inference for a wide range of models. • VMP simplifies the construction, testing, extension and comparison of models. • You can try VMP for yourself at vibes.sourceforge.net