This paper discusses a direct approach to the Cluster Variation Method (CVM) in graphs with discrete nodes. It covers the exponential family distributions, variational approximation, Gibbs sampling, handling undirected graphs, a new approach to Gibbs sampling, and revisiting belief propagation. The paper also dives into estimating marginals of discrete random variables using the CVM framework, discussing variational presentations of log-partition functions and CV approximations. It explores Gibbs Sampling, the Factorized Neighbors Algorithm, the implications of directed graphs, and the efficiency of different algorithms for various graph structures. The study concludes with numerical results and comparisons demonstrating the efficacy of the presented methods.
Direct Approach to Cluster Variation Method in Graphs with Discrete Nodes Michal Rosen-Zvi Computer Science Division, UC Berkeley Michael I. Jordan Computer Science Division and the Statistics Department, UC Berkeley
Outline • Introduction • The exponential family distributions • Variational approximation • Gibbs sampling • Undirected graphs • New approach to Gibbs sampling • Belief propagation revisited • The FNA • Directed graphs (if time allows)
Estimating marginals of discrete random variables
The exponential family form: P(x|θ) = exp[θᵀ f(x) - A(θ)], where A(θ) is the log-partition function.
The features in a quadratic model: f_ij = x_i x_j and f_i = x_i.
Some definitions
x is the set of random variables, θ is the set of parameters, and x_i ∈ {0,1} (binary).
Marginalizing over the m-th node: μ_m = Σ_{x\x_m} P(x|θ), and over a pair: μ_mn = Σ_{x\{x_m,x_n}} P(x|θ).
Variational presentation of the log-partition function
A(θ) = ln Σ_x exp[θᵀ f(x)]. It is a convex function, so (A*)* = A.
The conjugate dual: A*(μ) = sup_{θ∈R^|I|} {μᵀθ - A(θ)}, and A(θ) = sup_{μ∈M} {μᵀθ - A*(μ)}.
What is the set M???
Dual parameters = marginals
μ = E_θ[f(x)]. For discrete random variables, M is the marginal polytope, defined by
M := {μ ∈ R^|I| | ∃ p(·) s.t. Σ_x f(x) p(x) = μ}.
CV approximations: pseudo-marginals μ.
Mean field
Factorizing the joint probability distribution: P(x|μ) = Π_i P(x_i|μ_i) = Π_i [μ_i^{x_i} (1-μ_i)^{1-x_i}].
Only one Lagrange multiplier per node, but it is found by an iterative algorithm, so the numerical result might not be the optimum of the approximation.
The objective function is not concave; the pseudo-marginal set is convex and lies within M.
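The mean-field iteration on this slide can be sketched for a binary pairwise model. The update μ_i ← σ(θ_i + Σ_j θ_ij μ_j) is the standard naive mean-field fixed point; the function names and the dict encoding of couplings are illustrative choices, not the paper's.

```python
import math

def sigma(y):
    # logistic function sigma(y) = exp(y) / (1 + exp(y))
    return 1.0 / (1.0 + math.exp(-y))

def mean_field(theta_node, theta_edge, n_iters=100):
    """Naive mean-field coordinate updates for a binary pairwise model:
    mu_i <- sigma(theta_i + sum_j theta_ij * mu_j).
    Because the objective is not concave, the fixed point reached
    can depend on the initialization."""
    n = len(theta_node)
    mu = [0.5] * n
    for _ in range(n_iters):
        for i in range(n):
            field = theta_node[i]
            for (a, b), t in theta_edge.items():
                if a == i:
                    field += t * mu[b]
                elif b == i:
                    field += t * mu[a]
            mu[i] = sigma(field)
    return mu
```

With no couplings the fixed point is reached in one sweep (μ_i = σ(θ_i)); with a positive coupling both marginals are pushed above 1/2.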
Mean field (cont.)
P(x|μ) = Π_i P(x_i|μ_i) = Π_i [μ_i^{x_i} (1-μ_i)^{1-x_i}].
Approximating the canonical parameters and padding with zeros: A*(μ) = sup_{θ∈R^|I|} {μᵀθ - Σ_i A_i(θ_i)}.
For pairwise and single-node iterations: θ_i ← θ_i + Σ_j θ_ij μ_j / 2.
Substituting into A(θ) = sup_{μ∈M} {μᵀθ - A*(μ)} gives
A(θ) ≈ sup_μ {Σ_i μ_i θ_i + Σ_{ij} μ_i μ_j θ_ij - Σ_i [μ_i ln μ_i + (1-μ_i) ln(1-μ_i)]}.
The objective function is not concave; the pseudo-marginal set is convex and lies within M.
Gibbs Sampling
• Local updates according to the conditional probability: p(x_i=1) = Σ_{x_N(i)} p(x_N(i)) σ(Σ_{j∈N(i)} θ_ij x_j), where σ(y) = exp(y)/[1+exp(y)]
• The measure converges to the Gibbs distribution (the exponential form)
• All moments are calculated using samples from the equilibrium
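As a sketch, the local update above can be implemented for a binary pairwise model with zero external fields and symmetric couplings; the coupling-dict graph encoding and function names are assumptions for illustration.

```python
import random
import math

def sigma(y):
    # logistic function sigma(y) = exp(y) / (1 + exp(y))
    return 1.0 / (1.0 + math.exp(-y))

def gibbs_sample(theta, n_sweeps, seed=0):
    """Gibbs sampling for a binary pairwise model.

    theta: dict mapping node pairs (i, j) to couplings theta_ij.
    Each local update draws x_i = 1 with conditional probability
    sigma(sum_{j in N(i)} theta_ij * x_j), given the current neighbors.
    Returns per-node empirical marginals estimating p(x_i = 1)."""
    rng = random.Random(seed)
    nodes = sorted({i for pair in theta for i in pair})
    nbrs = {i: [] for i in nodes}
    for (i, j), w in theta.items():
        nbrs[i].append((j, w))
        nbrs[j].append((i, w))
    x = {i: rng.randint(0, 1) for i in nodes}
    counts = {i: 0 for i in nodes}
    for _ in range(n_sweeps):
        for i in nodes:
            field = sum(w * x[j] for j, w in nbrs[i])
            x[i] = 1 if rng.random() < sigma(field) else 0
        for i in nodes:
            counts[i] += x[i]
    return {i: counts[i] / n_sweeps for i in nodes}
```

On a small chain with positive couplings the sampled marginals sit above 1/2, as the ferromagnetic interactions suggest.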
Gibbs Sampling – dual space
p(x_i=1) = Σ_{x_N(i)} p(x_N(i)) σ(Σ_{j∈N(i)} θ_ij x_j), p(x_i=1, x_j=1) = …, p(x_i=1, x_j=1, x_k=1) = …
The dynamics of the marginals: p_{t+1}(x_i=1) = (1 - 1/N) p_t(x_i=1) + (1/N) Σ_{x_N(i)} p_t(x_N(i)) σ(Σ_{j∈N(i)} θ_ij x_j).
A set of 2^N fixed-point equations yields exact relations between marginals.
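The claim that these relations between marginals are exact can be verified by brute force on a small model: enumerate all states of p(x) ∝ exp(Σ θ_ij x_i x_j) and compare the direct marginal with the one produced by the conditional identity. All names here are illustrative, not from the paper.

```python
from itertools import product
import math

def sigma(y):
    # logistic function sigma(y) = exp(y) / (1 + exp(y))
    return 1.0 / (1.0 + math.exp(-y))

def marginals_two_ways(theta, n):
    """Compute p(x_i = 1) for p(x) proportional to exp(sum theta_ij x_i x_j)
    (a) directly by enumeration, and (b) via the identity
    p(x_i=1) = E_p[sigma(sum_{j in N(i)} theta_ij x_j)],
    which underlies the dual-space fixed-point equations."""
    states = list(product([0, 1], repeat=n))
    weights = [math.exp(sum(t * s[a] * s[b] for (a, b), t in theta.items()))
               for s in states]
    Z = sum(weights)
    direct, via_identity = [], []
    for i in range(n):
        def field(s):
            # local field on node i induced by its neighbors
            f = 0.0
            for (a, b), t in theta.items():
                if a == i:
                    f += t * s[b]
                elif b == i:
                    f += t * s[a]
            return f
        direct.append(sum(w for s, w in zip(states, weights) if s[i] == 1) / Z)
        via_identity.append(sum(w * sigma(field(s))
                                for s, w in zip(states, weights)) / Z)
    return direct, via_identity
```

Both routes agree to machine precision, confirming that the single-node fixed-point relation is exact rather than approximate.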
Gibbs Sampling and the Bethe approximation
Starting from p(x_i=1) = Σ_{x_N(i)} p(x_N(i)) σ(Σ_{j∈N(i)} θ_ij x_j) and approximating the neighborhood marginal in the Bethe form,
p(x_N(i)) ≈ Σ_{x_i} Π_{j∈N(i)} p(x_i, x_j) / p(x_i)^{|N(i)|-1},
gives μ_i = Σ_{x_N(i)} Σ_{x_i} Π_{j∈N(i)} [p(x_i, x_j) / p(x_i)^{|N(i)|-1}] σ(Σ_{j∈N(i)} θ_ij x_j) and μ_ij = f(μ_i, μ_ij', θ_ij'), where j' stands for all neighbors of i and j.
Gibbs Sampling and the Factorized Neighbors Algorithm
Starting from p(x_i=1) = Σ_{x_N(i)} p(x_N(i)) σ(Σ_{j∈N(i)} θ_ij x_j) and fully factorizing the neighborhood marginal: p(x_N(i)) ≈ Π_{j∈N(i)} p(x_j).
The FNA: μ_i = Σ_{x_N(i)} Π_{j∈N(i)} p(x_j) σ(Σ_{j∈N(i)} θ_ij x_j).
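A sketch of the FNA update for the same binary pairwise setting: each node averages its Gibbs conditional under a fully factorized belief over its neighbors. Enumerating the neighbor configurations also makes the exponential-in-degree per-node cost visible. The encoding is an illustrative assumption.

```python
from itertools import product
import math

def sigma(y):
    # logistic function sigma(y) = exp(y) / (1 + exp(y))
    return 1.0 / (1.0 + math.exp(-y))

def fna(theta, n, n_iters=50):
    """Factorized Neighbors Algorithm sketch: iterate
    mu_i = sum over neighbor configurations x_N(i) of
           prod_j [mu_j^{x_j} (1 - mu_j)^{1 - x_j}] * sigma(sum_j theta_ij x_j),
    i.e. the Gibbs conditional averaged under a factorized neighbor belief.
    The per-node cost is exponential in the number of neighbors."""
    nbrs = {i: [] for i in range(n)}
    for (a, b), t in theta.items():
        nbrs[a].append((b, t))
        nbrs[b].append((a, t))
    mu = [0.5] * n
    for _ in range(n_iters):
        for i in range(n):
            total = 0.0
            for cfg in product([0, 1], repeat=len(nbrs[i])):
                prob, field = 1.0, 0.0
                for (j, t), xj in zip(nbrs[i], cfg):
                    prob *= mu[j] if xj else 1.0 - mu[j]
                    field += t * xj
                total += prob * sigma(field)
            mu[i] = total
    return mu
```

On a symmetric two-node model with a positive coupling the iteration converges to equal marginals above 1/2; with no edges every marginal stays at σ(0) = 1/2.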
The FNA
• The approximation is less restricted than MF
• The algorithm is not exact on trees
• The approximation is more restricted than Bethe
Some comparisons for graphs with N nodes, M edges, and up to n neighbors per node.
Time complexity: MF O(N), Bethe O(M), FNA O(N exp(n)).
Space complexity: MF O(N), Bethe O(M), FNA O(N).
Directed Gibbs sampling and the factorized-parents approximation
• As soon as a node is chosen, all its descendants are updated
• The local updates are according to the parents' current state
• Factorized parents assumption: p(x_π(i)) = Π_{j∈π(i)} p(x_j), so p(x_i=1) = Σ_{x_π(i)} Π_{j∈π(i)} p(x_j) σ(Σ_{j∈π(i)} θ_ij x_j)
Directed Gibbs Sampling – dual space
p(x_i=1) = Σ_{x_π(i)} p(x_π(i)) σ(Σ_{j∈π(i)} θ_ij x_j).
The dynamics: p_{t+1}(x_i=1) = (1 - ν_i) p_t(x_i=1) + Σ_{x_π(i)} [(1/N) p_t(x_π(i)) + (1/N - ν_i) p_{t+1}(x_π(i))] σ(Σ_{j∈π(i)} θ_ij x_j).
Pairwise: p(x_i=1, x_j=1) = Σ_{x_π(i)\j, x_π(j)} p(x_π(i)\j, x_π(j)) σ(Σ_{k∈π(j)} θ_jk x_k) σ(Σ_{k∈π(i)\j} θ_ik x_k + θ_ij).
Back to the CVM approach
P(x|μ) = Π_i P(x_i, x_π(i) | μ_i, μ_π(i)) / …
Padding with zeros to a higher space: A*(μ) = the entropy of some approximate canonical set.
For pairwise and single-node iterations: θ_i = θ_i, θ_ij = θ_ij, θ_{i,π(i)} = 0.
A(θ) = sup_{μ∈M} {μᵀθ - A*(μ)}.
The objective function is concave; the pseudo-marginal set is not necessarily within M.
Numerical results – parents factorization
Evidence: x_17 = x_18 = x_19 = 1. The FPA makes use of the exact results in the evidence-free graph.