This paper discusses a direct approach to the Cluster Variation Method (CVM) in graphs with discrete nodes. It covers the exponential family distributions, variational approximation, Gibbs sampling, handling undirected graphs, a new approach to Gibbs sampling, and revisiting belief propagation. The paper also dives into estimating marginals of discrete random variables using the CVM framework, discussing variational presentations of log-partition functions and CV approximations. It explores Gibbs Sampling, the Factorized Neighbors Algorithm, the implications of directed graphs, and the efficiency of different algorithms for various graph structures. The study concludes with numerical results and comparisons demonstrating the efficacy of the presented methods.
Direct Approach to Cluster Variation Method in Graphs with Discrete Nodes Michal Rosen-Zvi Computer Science Division, UC Berkeley Michael I. Jordan Computer Science Division and the Statistics Department, UC Berkeley
Outline • Introduction • The exponential family distributions • Variational approximation • Gibbs sampling • Undirected graphs • New approach to Gibbs sampling • Belief propagation revisited • The FNA • Directed graphs (if time allows)
Estimating marginals of discrete random variables
The exponential family form: P(x|θ) = exp[θᵀ f(x) - A(θ)], where A(θ) is the log-partition function.
The features in a quadratic model: f_ij = x_i x_j and f_i = x_i.
Some definitions
x is the set of random variables, θ is the set of parameters, and x_i ∈ {0,1} (binary).
Marginalizing over the m-th node: μ_m = Σ_{x\x_m} P(x|θ), and over a pair: μ_mn = Σ_{x\{x_m,x_n}} P(x|θ).
Variational presentation of the log-partition function
A(θ) = ln Σ_x exp[θᵀ f(x)]. It is a convex function, so (A*)* = A.
The conjugate dual: A*(μ) = sup_{θ∈R^|I|} {μᵀθ - A(θ)}, and A(θ) = sup_{μ∈M} {μᵀθ - A*(μ)}.
What is the set M???
Dual parameters = marginals
μ = E_θ[f(x)]. For discrete random variables, M is the marginal polytope, defined by
M := {μ ∈ R^|I| | ∃ p(·) s.t. Σ_x f(x) p(x) = μ}.
CV approximations: pseudo-marginals μ.
Mean field
Factorizing the joint probability distribution: P(x|μ) = Π_i P(x_i|μ_i) = Π_i [μ_i^{x_i} (1-μ_i)^{1-x_i}].
Only one Lagrange multiplier per node, but it is found by an iterative algorithm, so the numerical result might not be the optimum of the approximation.
The objective function is not concave; the pseudo-marginal set is convex and lies within M.
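The mean-field iteration on this slide can be sketched for a binary pairwise model. The update μ_i ← σ(θ_i + Σ_j θ_ij μ_j) is the standard naive mean-field fixed point; the function names and the dict encoding of couplings are illustrative choices, not the paper's.

```python
import math

def sigma(y):
    # logistic function sigma(y) = exp(y) / (1 + exp(y))
    return 1.0 / (1.0 + math.exp(-y))

def mean_field(theta_node, theta_edge, n_iters=100):
    """Naive mean-field coordinate updates for a binary pairwise model:
    mu_i <- sigma(theta_i + sum_j theta_ij * mu_j).
    Because the objective is not concave, the fixed point reached
    can depend on the initialization."""
    n = len(theta_node)
    mu = [0.5] * n
    for _ in range(n_iters):
        for i in range(n):
            field = theta_node[i]
            for (a, b), t in theta_edge.items():
                if a == i:
                    field += t * mu[b]
                elif b == i:
                    field += t * mu[a]
            mu[i] = sigma(field)
    return mu
```

With no couplings the fixed point is reached in one sweep (μ_i = σ(θ_i)); with a positive coupling both marginals are pushed above 1/2.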
Mean field (cont.)
P(x|μ) = Π_i P(x_i|μ_i) = Π_i [μ_i^{x_i} (1-μ_i)^{1-x_i}].
Approximating the canonical parameters and padding with zeros: A*(μ) = sup_{θ∈R^|I|} {μᵀθ - Σ_i A_i(θ_i)}.
For pairwise and single-node iterations: θ_i ← θ_i + Σ_j θ_ij μ_j / 2.
Substituting into A(θ) = sup_{μ∈M} {μᵀθ - A*(μ)} gives
A(θ) ≈ sup_μ {Σ_i μ_i θ_i + Σ_{ij} μ_i μ_j θ_ij - Σ_i [μ_i ln μ_i + (1-μ_i) ln(1-μ_i)]}.
The objective function is not concave; the pseudo-marginal set is convex and lies within M.
Gibbs Sampling
• Local updates according to the conditional probability: p(x_i=1) = Σ_{x_N(i)} p(x_N(i)) σ(Σ_{j∈N(i)} θ_ij x_j), where σ(y) = exp(y)/[1+exp(y)]
• The measure converges to the Gibbs distribution (the exponential form)
• All moments are calculated using samples from the equilibrium
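As a sketch, the local update above can be implemented for a binary pairwise model with zero external fields and symmetric couplings; the coupling-dict graph encoding and function names are assumptions for illustration.

```python
import random
import math

def sigma(y):
    # logistic function sigma(y) = exp(y) / (1 + exp(y))
    return 1.0 / (1.0 + math.exp(-y))

def gibbs_sample(theta, n_sweeps, seed=0):
    """Gibbs sampling for a binary pairwise model.

    theta: dict mapping node pairs (i, j) to couplings theta_ij.
    Each local update draws x_i = 1 with conditional probability
    sigma(sum_{j in N(i)} theta_ij * x_j), given the current neighbors.
    Returns per-node empirical marginals estimating p(x_i = 1)."""
    rng = random.Random(seed)
    nodes = sorted({i for pair in theta for i in pair})
    nbrs = {i: [] for i in nodes}
    for (i, j), w in theta.items():
        nbrs[i].append((j, w))
        nbrs[j].append((i, w))
    x = {i: rng.randint(0, 1) for i in nodes}
    counts = {i: 0 for i in nodes}
    for _ in range(n_sweeps):
        for i in nodes:
            field = sum(w * x[j] for j, w in nbrs[i])
            x[i] = 1 if rng.random() < sigma(field) else 0
        for i in nodes:
            counts[i] += x[i]
    return {i: counts[i] / n_sweeps for i in nodes}
```

On a small chain with positive couplings the sampled marginals sit above 1/2, as the ferromagnetic interactions suggest.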
Gibbs Sampling – dual space
p(x_i=1) = Σ_{x_N(i)} p(x_N(i)) σ(Σ_{j∈N(i)} θ_ij x_j), p(x_i=1, x_j=1) = …, p(x_i=1, x_j=1, x_k=1) = …
The dynamics of the marginals: p_{t+1}(x_i=1) = (1 - 1/N) p_t(x_i=1) + (1/N) Σ_{x_N(i)} p_t(x_N(i)) σ(Σ_{j∈N(i)} θ_ij x_j).
A set of 2^N fixed-point equations yields exact relations between marginals.
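The claim that these relations between marginals are exact can be verified by brute force on a small model: enumerate all states of p(x) ∝ exp(Σ θ_ij x_i x_j) and compare the direct marginal with the one produced by the conditional identity. All names here are illustrative, not from the paper.

```python
from itertools import product
import math

def sigma(y):
    # logistic function sigma(y) = exp(y) / (1 + exp(y))
    return 1.0 / (1.0 + math.exp(-y))

def marginals_two_ways(theta, n):
    """Compute p(x_i = 1) for p(x) proportional to exp(sum theta_ij x_i x_j)
    (a) directly by enumeration, and (b) via the identity
    p(x_i=1) = E_p[sigma(sum_{j in N(i)} theta_ij x_j)],
    which underlies the dual-space fixed-point equations."""
    states = list(product([0, 1], repeat=n))
    weights = [math.exp(sum(t * s[a] * s[b] for (a, b), t in theta.items()))
               for s in states]
    Z = sum(weights)
    direct, via_identity = [], []
    for i in range(n):
        def field(s):
            # local field on node i induced by its neighbors
            f = 0.0
            for (a, b), t in theta.items():
                if a == i:
                    f += t * s[b]
                elif b == i:
                    f += t * s[a]
            return f
        direct.append(sum(w for s, w in zip(states, weights) if s[i] == 1) / Z)
        via_identity.append(sum(w * sigma(field(s))
                                for s, w in zip(states, weights)) / Z)
    return direct, via_identity
```

Both routes agree to machine precision, confirming that the single-node fixed-point relation is exact rather than approximate.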
Gibbs Sampling and the Bethe approximation
Starting from p(x_i=1) = Σ_{x_N(i)} p(x_N(i)) σ(Σ_{j∈N(i)} θ_ij x_j) and approximating the neighborhood marginal in the Bethe form,
p(x_N(i)) ≈ Σ_{x_i} Π_{j∈N(i)} p(x_i, x_j) / p(x_i)^{|N(i)|-1},
gives μ_i = Σ_{x_N(i)} Σ_{x_i} Π_{j∈N(i)} [p(x_i, x_j) / p(x_i)^{|N(i)|-1}] σ(Σ_{j∈N(i)} θ_ij x_j) and μ_ij = f(μ_i, μ_ij', θ_ij'), where j' stands for all neighbors of i and j.
Gibbs Sampling and the Factorized Neighbors Algorithm
Starting from p(x_i=1) = Σ_{x_N(i)} p(x_N(i)) σ(Σ_{j∈N(i)} θ_ij x_j) and fully factorizing the neighborhood marginal: p(x_N(i)) ≈ Π_{j∈N(i)} p(x_j).
The FNA: μ_i = Σ_{x_N(i)} Π_{j∈N(i)} p(x_j) σ(Σ_{j∈N(i)} θ_ij x_j).
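A sketch of the FNA update for the same binary pairwise setting: each node averages its Gibbs conditional under a fully factorized belief over its neighbors. Enumerating the neighbor configurations also makes the exponential-in-degree per-node cost visible. The encoding is an illustrative assumption.

```python
from itertools import product
import math

def sigma(y):
    # logistic function sigma(y) = exp(y) / (1 + exp(y))
    return 1.0 / (1.0 + math.exp(-y))

def fna(theta, n, n_iters=50):
    """Factorized Neighbors Algorithm sketch: iterate
    mu_i = sum over neighbor configurations x_N(i) of
           prod_j [mu_j^{x_j} (1 - mu_j)^{1 - x_j}] * sigma(sum_j theta_ij x_j),
    i.e. the Gibbs conditional averaged under a factorized neighbor belief.
    The per-node cost is exponential in the number of neighbors."""
    nbrs = {i: [] for i in range(n)}
    for (a, b), t in theta.items():
        nbrs[a].append((b, t))
        nbrs[b].append((a, t))
    mu = [0.5] * n
    for _ in range(n_iters):
        for i in range(n):
            total = 0.0
            for cfg in product([0, 1], repeat=len(nbrs[i])):
                prob, field = 1.0, 0.0
                for (j, t), xj in zip(nbrs[i], cfg):
                    prob *= mu[j] if xj else 1.0 - mu[j]
                    field += t * xj
                total += prob * sigma(field)
            mu[i] = total
    return mu
```

On a symmetric two-node model with a positive coupling the iteration converges to equal marginals above 1/2; with no edges every marginal stays at σ(0) = 1/2.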
The FNA
• The approximation is less restricted than MF
• The algorithm is not exact on trees
• The approximation is more restricted than Bethe
Some comparisons for graphs with N nodes, M edges, and up to n neighbors per node.
Time complexity: MF O(N), Bethe O(M), FNA O(N exp(n)).
Space complexity: MF O(N), Bethe O(M), FNA O(N).
Directed Gibbs sampling and the factorized-parents approximation
• As soon as a node is chosen, all its descendants are updated
• The local updates are according to the parents' current state
• Factorized parents assumption: p(x_π(i)) = Π_{j∈π(i)} p(x_j), so p(x_i=1) = Σ_{x_π(i)} Π_{j∈π(i)} p(x_j) σ(Σ_{j∈π(i)} θ_ij x_j)
Directed Gibbs Sampling – dual space
p(x_i=1) = Σ_{x_π(i)} p(x_π(i)) σ(Σ_{j∈π(i)} θ_ij x_j).
The dynamics: p_{t+1}(x_i=1) = (1 - ν_i) p_t(x_i=1) + Σ_{x_π(i)} [(1/N) p_t(x_π(i)) + (1/N - ν_i) p_{t+1}(x_π(i))] σ(Σ_{j∈π(i)} θ_ij x_j).
Pairwise: p(x_i=1, x_j=1) = Σ_{x_π(i)\j, x_π(j)} p(x_π(i)\j, x_π(j)) σ(Σ_{k∈π(j)} θ_jk x_k) σ(Σ_{k∈π(i)\j} θ_ik x_k + θ_ij).
Back to the CVM approach
P(x|μ) = Π_i P(x_i, x_π(i) | μ_i, μ_π(i)) / …
Padding with zeros to a higher space: A*(μ) = the entropy of some approximate canonical set.
For pairwise and single-node iterations: θ_i = θ_i, θ_ij = θ_ij, θ_{i,π(i)} = 0.
A(θ) = sup_{μ∈M} {μᵀθ - A*(μ)}.
The objective function is concave; the pseudo-marginal set is not necessarily within M.
Numerical results – parents factorization
Evidence: x_17 = x_18 = x_19 = 1. The FPA makes use of the exact results in the evidence-free graph.