
Probabilistic Inference Lecture 3



Presentation Transcript


  1. Probabilistic Inference Lecture 3 M. Pawan Kumar pawan.kumar@ecp.fr Slides available online http://cvc.centrale-ponts.fr/personnel/pawan/

  2. Recap of lecture 1

  3. Exponential Family P(v) = exp{-ΣαθαΦα(v) - A(θ)} Φα: Sufficient Statistics, A(θ): Log-Partition Function, θα: Parameters. Random Variables V = {V1,V2,…,Vn}. Random Variable Va takes a value or label va, va ∈ L = {l1,l2,…,lh}. Labeling V = v

  4. Overcomplete Representation P(v) = exp{-ΣαθαΦα(v) - A(θ)} Φα: Sufficient Statistics, A(θ): Log-Partition Function, θα: Parameters. The representation is overcomplete if there exists a non-zero c such that Σα cαΦα(v) = Constant for all v

  5. Pairwise MRF P(v) = exp{-ΣαθαΦα(v) - A(θ)} Random Variables V = {V1, V2, …,Vn} Label set L = {l1, l2, …, lh} Neighborhood over variables specified by edges E Sufficient Statistics and Parameters: Ia;i(va) with θa;i for all Va ∈ V, li ∈ L; Iab;ik(va,vb) with θab;ik for all (Va,Vb) ∈ E, li, lk ∈ L

  6. Pairwise MRF P(v) = exp{-ΣaΣiθa;iIa;i(va) - Σa,bΣi,kθab;ikIab;ik(va,vb) - A(θ)} Random Variables V = {V1, V2, …,Vn} Label set L = {l1, l2, …, lh} Neighborhood over variables specified by edges E Probability P(v) = Πaψa(va) Π(a,b)ψab(va,vb) / Z, where A(θ) = log Z, ψa(li) = exp(-θa;i), ψab(li,lk) = exp(-θab;ik). Parameters θ are sometimes also referred to as potentials
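
The factorised form of P(v) on this slide can be written out directly; a minimal sketch (the list/dict layout of unary, pairwise, and edges is an assumption for illustration, not the lecture's notation):

```python
import math

def unnormalized_probability(f, unary, pairwise, edges):
    """Product of potentials: prod_a psi_a(v_a) * prod_(a,b) psi_ab(v_a, v_b),
    with psi_a(li) = exp(-theta_a;i) and psi_ab(li, lk) = exp(-theta_ab;ik).
    Dividing by Z = exp(A(theta)) would give the probability P(v)."""
    p = 1.0
    for a, i in enumerate(f):                         # unary potentials
        p *= math.exp(-unary[a][i])
    for (a, b) in edges:                              # pairwise potentials
        p *= math.exp(-pairwise[(a, b)][f[a]][f[b]])
    return p
```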

  7. Pairwise MRF P(v) = exp{-ΣaΣiθa;iIa;i(va) - Σa,bΣi,kθab;ikIab;ik(va,vb) - A(θ)} Random Variables V = {V1, V2, …,Vn} Label set L = {l1, l2, …, lh} Neighborhood over variables specified by edges E Labeling as a function f : {1, 2, … , n} → {1, 2, …, h} Variable Va takes a label lf(a)

  8. Pairwise MRF P(f) = exp{-Σaθa;f(a) - Σa,bθab;f(a)f(b) - A(θ)} Random Variables V = {V1, V2, …,Vn} Label set L = {l1, l2, …, lh} Neighborhood over variables specified by edges E Labeling as a function f : {1, 2, … , n} → {1, 2, …, h} Variable Va takes a label lf(a) Energy Q(f) = Σaθa;f(a) + Σa,bθab;f(a)f(b)

  9. Pairwise MRF P(f) = exp{-Q(f) - A(θ)} Random Variables V = {V1, V2, …,Vn} Label set L = {l1, l2, …, lh} Neighborhood over variables specified by edges E Labeling as a function f : {1, 2, … , n} → {1, 2, …, h} Variable Va takes a label lf(a) Energy Q(f) = Σaθa;f(a) + Σa,bθab;f(a)f(b)

  10. Inference maxv ( P(v) = exp{-ΣaΣiθa;iIa;i(va) -Σa,bΣi,kθab;ikIab;ik(va,vb) - A(θ)} ) Maximum a Posteriori (MAP) Estimation minf ( Q(f) = Σaθa;f(a) + Σa,bθab;f(a)f(b) ) Energy Minimization P(va = li) = ΣvP(v)δ(va = li) P(va = li, vb = lk) = ΣvP(v)δ(va = li)δ(vb= lk) Computing Marginals
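
All three inference problems on this slide can be spelled out by brute force for a tiny model (exponential in n, so purely illustrative; the data layout mirrors the sketch after slide 6 and the names are assumptions):

```python
import itertools
import math

def energy(f, unary, pairwise, edges):
    """Q(f) = sum_a theta_a;f(a) + sum_(a,b) theta_ab;f(a)f(b)."""
    q = sum(unary[a][f[a]] for a in range(len(f)))
    q += sum(pairwise[(a, b)][f[a]][f[b]] for (a, b) in edges)
    return q

def brute_force_inference(n, h, unary, pairwise, edges):
    """Enumerate all h^n labelings: MAP labeling and unary marginals."""
    labelings = list(itertools.product(range(h), repeat=n))
    energies = [energy(f, unary, pairwise, edges) for f in labelings]
    Z = sum(math.exp(-q) for q in energies)           # partition function
    map_f = labelings[energies.index(min(energies))]  # energy minimization
    # P(va = li): sum the probabilities of all labelings consistent with va = li
    marginals = [[sum(math.exp(-q) for f, q in zip(labelings, energies)
                      if f[a] == i) / Z for i in range(h)]
                 for a in range(n)]
    return map_f, marginals
```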

  11. Recap of lecture 2

  12. Definitions Energy Minimization: f* = arg minf Q(f; θ), Q(f; θ) = ∑a θa;f(a) + ∑(a,b) θab;f(a)f(b) Min-marginals: qa;i = min Q(f; θ) s.t. f(a) = i Reparameterization: θ’ such that Q(f; θ’) = Q(f; θ), for all f

  13. Belief Propagation (Pearl, 1988) General form of Reparameterization: θ’b;k = θb;k + Mab;k, θ’a;i = θa;i + Mba;i, θ’ab;ik = θab;ik - Mab;k - Mba;i Reparameterization of (a,b) in Belief Propagation: Mab;k = mini { θa;i + θab;ik }, Mba;i = 0
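
A small sketch of the message and reparameterization equations on this slide, with potentials stored as plain lists (an assumed layout); as stated above, belief propagation simply takes Mba;i = 0:

```python
def send_message(theta_a, theta_ab):
    """M_ab;k = min_i { theta_a;i + theta_ab;ik } for every label k."""
    h = len(theta_a)
    return [min(theta_a[i] + theta_ab[i][k] for i in range(h)) for k in range(h)]

def reparameterize(theta_a, theta_b, theta_ab, M_ab, M_ba):
    """theta'_b;k = theta_b;k + M_ab;k, theta'_a;i = theta_a;i + M_ba;i,
    theta'_ab;ik = theta_ab;ik - M_ab;k - M_ba;i (the energy is unchanged)."""
    h = len(theta_a)
    theta_b2 = [theta_b[k] + M_ab[k] for k in range(h)]
    theta_a2 = [theta_a[i] + M_ba[i] for i in range(h)]
    theta_ab2 = [[theta_ab[i][k] - M_ab[k] - M_ba[i] for k in range(h)]
                 for i in range(h)]
    return theta_a2, theta_b2, theta_ab2
```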

  14. Belief Propagation on Trees [Figure: tree over Va, Vb, Vc, Vd, Ve, Vg, Vh] Forward Pass: Leaf → Root Backward Pass: Root → Leaf All min-marginals are computed
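
A concrete sketch of the two passes for the simplest tree, a chain (the general tree case only changes the traversal order; the function and variable names are illustrative, not the lecture's):

```python
def chain_min_marginals(unary, pairwise):
    """Min-sum belief propagation on a chain V1 - V2 - ... - Vn.
    unary[a][i] = theta_a;i, pairwise[a][i][k] = theta between Va and Va+1.
    Returns q[a][i] = min over labelings with f(a) = i of Q(f)."""
    n, h = len(unary), len(unary[0])
    fwd = [[0.0] * h for _ in range(n)]   # messages arriving from the left
    bwd = [[0.0] * h for _ in range(n)]   # messages arriving from the right
    for a in range(1, n):                 # forward pass: leaf -> root
        fwd[a] = [min(unary[a - 1][i] + fwd[a - 1][i] + pairwise[a - 1][i][k]
                      for i in range(h)) for k in range(h)]
    for a in range(n - 2, -1, -1):        # backward pass: root -> leaf
        bwd[a] = [min(unary[a + 1][k] + bwd[a + 1][k] + pairwise[a][i][k]
                      for k in range(h)) for i in range(h)]
    return [[unary[a][i] + fwd[a][i] + bwd[a][i] for i in range(h)]
            for a in range(n)]
```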

  15. Computational Complexity • Each constant takes O(|L|) • Number of constants: O(|E||L|) • Total time: O(|E||L|²) • Memory required: O(|E||L|)

  16. Belief Propagation on Cycles [Figure: cycle over Va, Vb, Vc, Vd with unary potentials θa;i, θb;i, θc;i, θd;i] Remember my suggestion? Fix the label of Va

  17. Belief Propagation on Cycles [Figure: the same cycle with Va fixed to label 0] Equivalent to a tree-structured problem

  18. Belief Propagation on Cycles [Figure: the same cycle with Va fixed to label 1] Equivalent to a tree-structured problem

  19. Belief Propagation on Cycles [Figure: the full cycle again] Choose the minimum energy solution over all fixed labels of Va. This approach quickly becomes infeasible.
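
A sketch of this strategy: fix Va to each of its labels, solve the resulting tree-structured (here, chain-structured) problem exactly, and keep the best energy. It reuses the chain_min_marginals sketch from the tree slide; the data layout and names are assumptions:

```python
def cycle_min_energy_via_fixing(unary, pairwise_chain, pairwise_last):
    """Minimum energy on a cycle V1 - ... - Vn - V1, by fixing the label of V1,
    solving the remaining chain exactly, and keeping the best result.
    pairwise_chain[a] couples Va and Va+1; pairwise_last[k][i] couples Vn (= k)
    and V1 (= i), i.e. the edge that closes the cycle."""
    n, h = len(unary), len(unary[0])
    best = float("inf")
    for i in range(h):                                    # fix V1 to label li
        u = [list(row) for row in unary]
        u[0] = [unary[0][k] if k == i else float("inf")   # forbid other labels
                for k in range(h)]
        u[-1] = [u[-1][k] + pairwise_last[k][i]           # absorb the closing edge
                 for k in range(h)]
        q = chain_min_marginals(u, pairwise_chain)        # earlier chain sketch
        best = min(best, min(q[0]))                       # best energy for this fix
    return best
```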

  20. Vincent Algayres Algorithm [Figure: the cycle over Va, Vb, Vc, Vd] Compute zero cost paths from all labels of Va to all labels of Vd. Requires fixing Va.

  21. Speed-Ups for Special Cases θab;ik = 0, if i = k; = C, otherwise. Mab;k = mini { θa;i + θab;ik } Felzenszwalb and Huttenlocher, 2004
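
For this Potts-style pairwise term the minimisation over i collapses, so the whole message costs O(|L|) instead of the generic O(|L|²); a sketch (assuming C ≥ 0, names illustrative):

```python
def potts_message(theta_a, C):
    """Message for theta_ab;ik = 0 if i == k, C otherwise:
    M_ab;k = min( theta_a;k, C + min_i theta_a;i )."""
    m = min(theta_a)                       # best label of Va, computed once
    return [min(theta_a[k], C + m) for k in range(len(theta_a))]
```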

  22. Speed-Ups for Special Cases θab;ik = wab|i-k| Mab;k = mini { θa;i + θab;ik } Felzenszwalb and Huttenlocher, 2004

  23. Speed-Ups for Special Cases θab;ik = min{wab|i-k|, C} Mab;k = mini { θa;i + θab;ik } Felzenszwalb and Huttenlocher, 2004

  24. Speed-Ups for Special Cases θab;ik = min{wab(i-k)², C} Mab;k = mini { θa;i + θab;ik } Felzenszwalb and Huttenlocher, 2004
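
For the linear and truncated-linear cases on slides 22 and 23, the message can also be computed in O(|L|) with the distance-transform idea of Felzenszwalb and Huttenlocher; a sketch is below (the truncated-quadratic case of slide 24 needs the full lower-envelope algorithm and is not shown; names are illustrative):

```python
def linear_message(theta_a, w):
    """M_ab;k = min_i { theta_a;i + w*|i - k| } in O(|L|) via two passes."""
    h = len(theta_a)
    m = list(theta_a)
    for k in range(1, h):             # forward pass: propagate costs to the right
        m[k] = min(m[k], m[k - 1] + w)
    for k in range(h - 2, -1, -1):    # backward pass: propagate costs to the left
        m[k] = min(m[k], m[k + 1] + w)
    return m

def truncated_linear_message(theta_a, w, C):
    """M_ab;k = min_i { theta_a;i + min(w*|i - k|, C) }: the linear message
    capped at the constant C + min_i theta_a;i."""
    cap = C + min(theta_a)
    return [min(mk, cap) for mk in linear_message(theta_a, w)]
```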

  25. Lecture 3

  26. Ising Model P(v) = exp{-ΣαθαΦα(v) - A(θ)} Random Variables V = {V1, V2, …,Vn} Label set L = {0, 1} Neighborhood over variables specified by edges E Sufficient Statistics and Parameters: Ia;i(va) with θa;i for all Va ∈ V, li ∈ L; Iab;ik(va,vb) with θab;ik for all (Va,Vb) ∈ E, li, lk ∈ L Ia;i(va): indicator for va = li Iab;ik(va,vb): indicator for va = li, vb = lk

  27. Ising Model P(v) = exp{-ΣaΣiθa;iIa;i(va) - Σa,bΣi,kθab;ikIab;ik(va,vb) - A(θ)} Random Variables V = {V1, V2, …,Vn} Label set L = {0, 1} Neighborhood over variables specified by edges E Sufficient Statistics and Parameters: Ia;i(va) with θa;i for all Va ∈ V, li ∈ L; Iab;ik(va,vb) with θab;ik for all (Va,Vb) ∈ E, li, lk ∈ L Ia;i(va): indicator for va = li Iab;ik(va,vb): indicator for va = li, vb = lk

  28. Interactive Binary Segmentation Foreground histogram of RGB values FG Background histogram of RGB values BG ‘1’ indicates foreground and ‘0’ indicates background

  29. Interactive Binary Segmentation More likely to be foreground than background

  30. Interactive Binary Segmentation θa;0 proportional to -log(BG(da)) θa;1 proportional to -log(FG(da)) More likely to be background than foreground

  31. Interactive Binary Segmentation More likely to belong to same label

  32. Interactive Binary Segmentation θab;ik proportional to exp(-(da-db)²) if i ≠ k; θab;ik = 0 if i = k. Less likely to belong to same label
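
Putting slides 30 and 32 together, the potentials of the segmentation model could be assembled roughly as below. This is a hedged sketch: FG and BG are assumed to be callables returning histogram probabilities, da is a per-pixel value, and the proportionality constants are illustrative assumptions rather than the lecture's exact choices.

```python
import math

def unary_potentials(d, FG, BG, eps=1e-9):
    """theta_a;1 ~ -log FG(da), theta_a;0 ~ -log BG(da) for every pixel a."""
    theta0 = [-math.log(BG(da) + eps) for da in d]   # cost of labelling a as background
    theta1 = [-math.log(FG(da) + eps) for da in d]   # cost of labelling a as foreground
    return theta0, theta1

def pairwise_potential(da, db, w=1.0):
    """theta_ab;ik ~ w * exp(-(da - db)^2) if i != k, and 0 if i == k."""
    cost = w * math.exp(-(da - db) ** 2)
    return [[0.0, cost], [cost, 0.0]]                # indexed by [i][k]
```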

  33. Outline • Minimum Cut Problem • Two-Label Submodular Energy Functions • Move-Making Algorithms

  34. Directed Graph D = (N, A) [Figure: example digraph over nodes n1, n2, n3, n4 with arc lengths 10, 3, 2, 5] Two important restrictions: (1) Rational arc lengths (2) Positive arc lengths

  35. Cut D = (N, A) [Figure: the example digraph] • Let N1 and N2 be such that N1 “union” N2 = N and N1 “intersection” N2 = Φ • C is the set of arcs (n1,n2) ∈ A such that n1 ∈ N1 and n2 ∈ N2 C is a cut in the digraph D

  36. Cut D = (N, A) [Figure: the example digraph with a particular partition N1, N2] What is C? {(n1,n2),(n1,n4)}? {(n1,n4),(n3,n2)}? ✓ {(n1,n4)}?

  37. Cut D = (N, A) [Figure: the example digraph with a different partition N1, N2] What is C? {(n1,n2),(n1,n4),(n3,n2)}? {(n4,n3)}? ✓ {(n1,n4),(n3,n2)}?

  38. Cut D = (N, A) [Figure: the example digraph with yet another partition N1, N2] What is C? {(n1,n2),(n1,n4),(n3,n2)}? ✓ {(n3,n2)}? {(n1,n4),(n3,n2)}?

  39. Cut D = (N, A) [Figure: the example digraph] • Let N1 and N2 be such that N1 “union” N2 = N and N1 “intersection” N2 = Φ • C is the set of arcs (n1,n2) ∈ A such that n1 ∈ N1 and n2 ∈ N2 C is a cut in the digraph D

  40. Weight of a Cut D = (N, A) [Figure: the example digraph] Sum of the lengths of all arcs in C

  41. Weight of a Cut D = (N, A) [Figure: the example digraph] w(C) = Σ(n1,n2)∈C l(n1,n2)
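
The weight of a cut is just this sum of arc lengths; a minimal sketch. The assignment of the four lengths (10, 3, 2, 5) to arcs below is only an illustrative reading of the example figure, not a statement of the lecture's exact graph.

```python
def cut_weight(cut, length):
    """w(C) = sum of l(n1, n2) over all arcs (n1, n2) in C."""
    return sum(length[arc] for arc in cut)

# illustrative arc lengths for a digraph of the same shape as the example
length = {("n1", "n2"): 10, ("n1", "n4"): 3, ("n3", "n2"): 2, ("n4", "n3"): 5}
print(cut_weight([("n1", "n4"), ("n3", "n2")], length))  # 3 + 2 = 5
```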

  42. Weight of a Cut D = (N, A) [Figure: the example digraph with a particular partition N1, N2] What is w(C)? w(C) = 3

  43. Weight of a Cut D = (N, A) [Figure: the example digraph with a different partition N1, N2] What is w(C)? w(C) = 5

  44. Weight of a Cut D = (N, A) [Figure: the example digraph with yet another partition N1, N2] What is w(C)? w(C) = 15

  45. st-Cut D = (N, A) [Figure: the example digraph augmented with a source s and a sink t, with additional arc lengths 1, 2, 7, 3] A source “s” A sink “t” • C is a cut such that s ∈ N1 and t ∈ N2 C is an st-cut

  46. Weight of an st-Cut D = (N, A) [Figure: the st-graph] w(C) = Σ(n1,n2)∈C l(n1,n2)

  47. Weight of an st-Cut D = (N, A) [Figure: the st-graph with a particular st-cut] What is w(C)? w(C) = 3

  48. Weight of an st-Cut D = (N, A) [Figure: the st-graph with a different st-cut] What is w(C)? w(C) = 15

  49. Minimum Cut Problem D = (N, A) [Figure: the st-graph] Find a cut with the minimum weight !! C* = argminC w(C)
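
In practice the minimum st-cut is computed with an off-the-shelf max-flow solver rather than by enumerating cuts. A hedged sketch using networkx (assumed to be installed); the capacities are only an illustrative reading of the example figure, with 'capacity' playing the role of the arc length:

```python
import networkx as nx

# a small st-digraph of the same shape as the running example
G = nx.DiGraph()
G.add_edge("s", "n1", capacity=1)
G.add_edge("s", "n2", capacity=2)
G.add_edge("n1", "n2", capacity=10)
G.add_edge("n1", "n4", capacity=3)
G.add_edge("n3", "n2", capacity=2)
G.add_edge("n4", "n3", capacity=5)
G.add_edge("n3", "t", capacity=7)
G.add_edge("n4", "t", capacity=3)

# C* = argmin_C w(C): networkx returns the cut value and the partition (N1, N2)
cut_value, (N1, N2) = nx.minimum_cut(G, "s", "t")
print(cut_value, sorted(N1), sorted(N2))
```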

  50. Solvers for the Minimum-Cut Problem Augmenting Path and Push-Relabel [Table: running times, with n: #nodes, m: #arcs, U: maximum arc length] [Slide credit: Andrew Goldberg]
