
Variational Inference and Variational Message Passing




Presentation Transcript


  1. Variational Inference and Variational Message Passing John Winn, Microsoft Research, Cambridge. 12th November 2004, Robotics Research Group, University of Oxford

  2. Overview • Probabilistic models & Bayesian inference • Variational Inference • Variational Message Passing • Vision example

  3. Overview • Probabilistic models & Bayesian inference • Variational Inference • Variational Message Passing • Vision example

  4. Bayesian networks • Directed graph • Nodes represent variables • Links show dependencies • Conditional distributions at each node • Defines a joint distribution: P(C,L,S,I) = P(L) P(C) P(S|C) P(I|L,S) [Example network: object class C with prior P(C), lighting colour L with prior P(L), surface colour S with P(S|C), image colour I with P(I|L,S).]
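To make this factorisation concrete, here is a minimal Python sketch (not part of the slides); the conditional probability tables are made-up numbers over binary variables, and only the structure of the product matters.

    # Joint distribution P(C,L,S,I) = P(L) P(C) P(S|C) P(I|L,S)
    # with illustrative (made-up) tables over binary variables.
    P_C = {0: 0.7, 1: 0.3}                       # P(C): object class
    P_L = {0: 0.5, 1: 0.5}                       # P(L): lighting colour
    P_S_given_C = {0: {0: 0.9, 1: 0.1},          # P(S|C): surface colour
                   1: {0: 0.2, 1: 0.8}}
    P_I_given_LS = {(0, 0): {0: 0.95, 1: 0.05},  # P(I|L,S): image colour
                    (0, 1): {0: 0.30, 1: 0.70},
                    (1, 0): {0: 0.60, 1: 0.40},
                    (1, 1): {0: 0.10, 1: 0.90}}

    def joint(c, l, s, i):
        """Product of the local conditional distributions."""
        return P_L[l] * P_C[c] * P_S_given_C[c][s] * P_I_given_LS[(l, s)][i]

    # Sanity check: the joint sums to 1 over all 16 configurations.
    print(sum(joint(c, l, s, i)
              for c in (0, 1) for l in (0, 1)
              for s in (0, 1) for i in (0, 1)))   # 1.0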

  5. Bayesian inference • Observed variables V and hidden variables H. • Hidden variables include parameters and latent variables. • Learning/inference involves finding: P(H1, H2, … | V) [In the example network, object class C, lighting colour L and surface colour S are hidden; image colour I is observed.]

  6. Bayesian inference vs. ML/MAP • Consider learning one parameter θ • How should we represent this posterior distribution?

  7. Bayesian inference vs. ML/MAP • Consider learning one parameter θ [Plot: P(V|θ) P(θ) against θ, with the maximum marked at θMAP.]

  8. Bayesian inference vs. ML/MAP • Consider learning one parameter θ [Plot: P(V|θ) P(θ) against θ; θMAP sits at a point of high probability density, away from the region of high probability mass.]

  9. Bayesian inference vs. ML/MAP • Consider learning one parameter θ [Plot: P(V|θ) P(θ) against θ, with θML marked and samples drawn from the posterior.]

  10. Bayesian inference vs. ML/MAP • Consider learning one parameter θ [Plot: P(V|θ) P(θ) against θ, with θML marked and a variational approximation to the posterior.]

  11. Overview • Probabilistic models & Bayesian inference • Variational Inference • Variational Message Passing • Vision example

  12. Variational Inference (in three easy steps…) • Choose a family of variational distributions Q(H). • Use Kullback-Leibler divergence KL(Q||P) as a measure of ‘distance’ between P(H|V) and Q(H). • Find Q which minimises divergence.
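For reference, the 'distance' in step 2 is the standard Kullback-Leibler divergence between the approximation Q(H) and the true posterior P(H|V) (not written out on the slide):

    \mathrm{KL}(Q \,\|\, P) \;=\; \sum_{H} Q(H) \, \ln \frac{Q(H)}{P(H \mid V)} \;\ge\; 0,

with equality exactly when Q(H) = P(H|V).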

  13. KL Divergence [Two panels compare fitting a single-mode Q to a two-mode P:] • Minimising KL(Q||P) is exclusive: Q locks onto one mode of P. • Minimising KL(P||Q) is inclusive: Q spreads to cover all of P.

  14. Minimising the KL divergence • For arbitrary Q(H): ln P(V) = L(Q) + KL(Q||P), where L(Q) = Σ_H Q(H) ln [ P(H,V) / Q(H) ]. The left-hand side is fixed, so maximising L(Q) minimises KL(Q||P). • We choose a family of Q distributions where L(Q) is tractable to compute.
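This decomposition is a standard identity; the missing step is to substitute P(H|V) = P(H,V)/P(V) into the definition of the divergence:

    \mathrm{KL}(Q \,\|\, P)
      \;=\; \sum_{H} Q(H) \ln \frac{Q(H)\,P(V)}{P(H,V)}
      \;=\; \ln P(V) \;-\; \sum_{H} Q(H) \ln \frac{P(H,V)}{Q(H)}
      \;=\; \ln P(V) \;-\; L(Q).

Since ln P(V) does not depend on Q, maximising the tractable bound L(Q) is equivalent to minimising KL(Q||P).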

  15. Minimising the KL divergence [Bar diagram, animated over slides 15-19: the fixed total ln P(V) splits into L(Q) plus KL(Q||P); as L(Q) is maximised, the KL(Q||P) term shrinks.]


  20. Factorised Approximation • Assume Q factorises: Q(H) = ∏i Qi(Hi). • Optimal solution for one factor is given by the standard result below. • No further assumptions are required!
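The result the slide refers to is the optimal single-factor update (as in Winn & Bishop): the log of the optimal factor is the expected log joint, averaged over all the other factors,

    \ln Q_i^{*}(H_i) \;=\; \big\langle \ln P(H, V) \big\rangle_{\prod_{j \neq i} Q_j} \;+\; \text{const.}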

  21. Example: Univariate Gaussian • Likelihood function • Conjugate prior • Factorized variational distribution

  22. Initial Configuration [Contour plot of Q over (μ, γ).]

  23. After Updating Q(μ) [Contour plot over (μ, γ).]

  24. After Updating Q(γ) [Contour plot over (μ, γ).]

  25. Converged Solution [Contour plot over (μ, γ).]
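For the univariate Gaussian example on slides 21-25, the two factor updates take the following standard forms, assuming a Gaussian prior N(μ | m0, 1/β0) on the mean and a Gamma(a0, b0) prior on the precision γ (the prior-parameter symbols are labels introduced here, not taken from the slides):

    Q(\mu) = \mathcal{N}(\mu \mid m_N, \beta_N^{-1}), \qquad
      \beta_N = \beta_0 + N\langle\gamma\rangle, \quad
      m_N = \frac{\beta_0 m_0 + \langle\gamma\rangle \sum_n x_n}{\beta_N},

    Q(\gamma) = \mathrm{Gamma}(\gamma \mid a_N, b_N), \qquad
      a_N = a_0 + \tfrac{N}{2}, \quad
      b_N = b_0 + \tfrac{1}{2}\sum_n \big( x_n^2 - 2 x_n \langle\mu\rangle + \langle\mu^2\rangle \big).

Each update uses the current moments of the other factor (⟨γ⟩ = a_N/b_N, ⟨μ⟩ = m_N, ⟨μ²⟩ = m_N² + 1/β_N), which is exactly the alternation pictured on slides 22-25.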

  26. Lower Bound for GMM

  27. Variational Equations for GMM

  28. Overview • Probabilistic models & Bayesian inference • Variational Inference • Variational Message Passing • Vision example

  29. Variational Message Passing • VMP makes it easier and quicker to apply factorised variational inference. • VMP carries out variational inference using local computations and message passing on the graphical model. • Modular algorithm allows modifying, extending or combining models.

  30. Local Updates • For factorised Q, the update for each factor depends only on its Markov blanket. • Updates can therefore be carried out locally at each node.

  31. VMP I: The Exponential Family • Conditional distributions expressed in exponential family form: ln P(X | θ) = θᵀ u(X) + g(θ) + f(X), where θ is the 'natural' parameter vector and u(X) is the sufficient statistics vector. • For example, the Gaussian distribution: ln P(X | μ, γ) = [γμ, −γ/2]ᵀ [X, X²] + ½(ln γ − γμ² − ln 2π).

  32. VMP II: Conjugacy • Parents and children are chosen to be conjugate, i.e. of the same functional form: ln P(X | θ) = θᵀ u(X) + g(θ) + f(X) and ln P(Z | X, Y) = φ(Y, Z)ᵀ u(X) + g'(X) + f'(Y, Z) share the same u(X). [Graph: X and Y are the parents of Z.] • Examples: • Gaussian for the mean of a Gaussian • Gamma for the precision of a Gaussian • Dirichlet for the parameters of a discrete distribution
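As a concrete instance of this conjugacy, the Gaussian likelihood viewed as a function of its mean μ rearranges into the same exponential-family form as a Gaussian distribution over μ (a standard rearrangement, written out here):

    \ln \mathcal{N}(x \mid \mu, \gamma^{-1})
      = \begin{bmatrix} \gamma x \\ -\gamma/2 \end{bmatrix}^{\mathsf T}
        \begin{bmatrix} \mu \\ \mu^{2} \end{bmatrix}
        + \tfrac{1}{2}\big( \ln\gamma - \gamma x^{2} - \ln 2\pi \big),

so the sufficient statistics in μ are (μ, μ²), matching those of a Gaussian over μ; this is why a Gaussian prior is conjugate for the mean of a Gaussian.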

  33. VMP III: Messages [Graph: X and Y are the parents of Z.] • Conditionals: ln P(X | θ) = θᵀ u(X) + g(θ) + f(X) and ln P(Z | X, Y) = φ(Y, Z)ᵀ u(X) + g'(X) + f'(Y, Z). • Messages: parent to child (X→Z) and child to parent (Z→X); their forms are summarised after slide 34 below.

  34. VMP IV: Update • Optimal Q(X) has the same form as P(X|θ), but with an updated parameter vector θ* computed from the messages the node receives. • These messages can also be used to calculate the bound on the evidence L(Q) – see Winn & Bishop, 2004.
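For reference, the message forms in Winn & Bishop's VMP can be summarised as follows (notation as on slides 31-33; this is a paraphrase added for clarity, not text from the slides):

    m_{X \to Z} \;=\; \langle u_X(X) \rangle_{Q(X)}
      \quad \text{(parent to child: expected sufficient statistics of } X\text{)},

    m_{Z \to X} \;=\; \langle \varphi(Y, Z) \rangle
      \quad \text{(child to parent: expected natural parameter, using the current } Q \text{ over } Z \text{ and the co-parents } Y\text{)},

    \theta_X^{*} \;=\; \theta_X(\text{messages from the parents of } X)
      \;+\; \sum_{Z \,\in\, \mathrm{children}(X)} m_{Z \to X}.

Because every message is an expectation of a natural-parameter or sufficient-statistics vector, each node's update needs only messages from its Markov blanket, as stated on slide 30.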

  35. VMP Example • Learning parameters of a Gaussian from N data points. [Graphical model: mean μ and precision γ (inverse variance) are parents of the data x, in a plate of size N.]

  36. VMP Example • Message from γ to all x (needs an initial Q(γ)).

  37. VMP Example • Messages from each xₙ to μ. • Update Q(μ) parameter vector.

  38. VMP Example • Message from updated μ to all x.

  39. VMP Example • Messages from each xₙ to γ. • Update Q(γ) parameter vector.
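To see how the schedule on slides 36-39 plays out, here is a small Python sketch; the data and prior parameters (m0, beta0, a0, b0) are invented for illustration, and the messages are folded directly into the moment updates they produce rather than being passed as explicit vectors.

    import numpy as np

    # Model: x_n ~ N(mu, 1/gamma), Gaussian prior on mu, Gamma prior on gamma.
    rng = np.random.default_rng(0)
    x = rng.normal(loc=2.0, scale=0.5, size=100)   # synthetic data
    N = len(x)

    m0, beta0 = 0.0, 1e-3        # prior on mu: N(mu | m0, 1/beta0)
    a0, b0 = 1e-3, 1e-3          # prior on gamma: Gamma(a0, b0) (shape, rate)

    E_gamma = a0 / b0            # initial Q(gamma) (slide 36: "need initial Q(gamma)")

    for _ in range(50):
        # Messages from each x_n to mu, then update Q(mu) (slides 37-38).
        beta_N = beta0 + N * E_gamma
        m_N = (beta0 * m0 + E_gamma * x.sum()) / beta_N
        E_mu, E_mu2 = m_N, m_N**2 + 1.0 / beta_N

        # Messages from each x_n to gamma, then update Q(gamma) (slide 39).
        a_N = a0 + N / 2.0
        b_N = b0 + 0.5 * np.sum(x**2 - 2 * x * E_mu + E_mu2)
        E_gamma = a_N / b_N

    print("posterior mean of mu:", m_N)            # close to 2.0
    print("posterior mean of gamma:", E_gamma)     # close to 1/0.5**2 = 4.0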

  40. Features of VMP • Graph does not need to be a tree – it can contain loops (but not cycles). • Flexible message passing schedule – factors can be updated in any order. • Distributions can be discrete or continuous, multivariate, truncated (e.g. rectified Gaussian). • Can have deterministic relationships (A=B+C). • Allows for point estimates e.g. of hyper-parameters

  41. VMP Software: VIBES Free download from vibes.sourceforge.net

  42. Overview • Probabilistic models & Bayesian inference • Variational Inference • Variational Message Passing • Vision example

  43. Flexible sprite model • Proposed by Jojic & Frey (2001). • Set of images x, e.g. frames from a video (plate of size N).

  44. Flexible sprite model • Sprite appearance f and shape π.

  45. Flexible sprite model • Sprite transform T for this image (discretised). • Mask m for this image.

  46. Flexible sprite model • Background b. • Noise β.

  47. VMP [Full graphical model with nodes π, f, b, T, m, β and x in a plate of size N.] Winn & Blake (NIPS 2004)

  48. Results of VMP on hand video Original video Learned appearance and mask Learned transforms (i.e. motion)

  49. Conclusions • Variational Message Passing allows approximate Bayesian inference for a wide range of models. • VMP simplifies the construction, testing, extension and comparison of models. • You can try VMP for yourself at vibes.sourceforge.net

  50. That’s all folks!
