Distributed Parameter Estimation via Pseudo-likelihood … Qiang Liu Alexander Ihler Department of Computer Science, University of California, Irvine Non-zero elements zero elements Choosing the Optimal Weights (cont.) Motivation ADMM for Joint Optimization Consensus • Optimal weights for linear consensus : • If corr(siα, sjα) = 0for i ≠ j, then the optimal weights of is • Optimal weights for max consensus is • If corr(siα, sjα) = 1for i ≠ j, then max consensus with achieves the performance of the best linear consensus. • Graphical models in exponential family form: • Task:distributed algorithms for estimating parameters • given i.i.d. data, . • Example: wireless sensor network as MRF, • Limited computational power • and memory on local sensors. • High communication cost. • Task: calculate the partition function Z, or • Important: probability of evidence, parameter estimation • #P-complete in general graphs • Approximations and bounds are needed Joint optimization consensus can be solved distributedlyvia alternating direction method of multipliers (ADMM): • Let ; reform the optimization, Sum of rows of Vα-1 • Augmented Lagrangian: • ADMM: M-Estimators • ‘’Iterative” linear consensus • See similar algorithm in Wiesel & Hero 2012 • M-estimator: • Asymptotic consistency and normality: if • Intuition: • Maximum likelihood estimator (MLE): • Maximum Pseudo-likelihood (PL) estimator (MPLE): • “Anytime” property: once is initialized to be asymp. consistent (and ), then remains asymp. consistent at every iteration. // recall that max consensus are special cases of linear consensus. (Sandwich formula) Experiments Fisher information: , Hessian: . Choosing the Optimal Weights • Binary Two-node Toy Model: . • Estimating (true ) as and (both known) are varied. • Matrix Consensus (introducing for theoretically purpose): • Matrix Consensus reduces to linear consensus when Wi are diagonal matrices. • Asymptotic covariance of : • Joint optimization consensus is asymptotically equivalent to matrix consensus with Wi=Hi: • The optimal weights should minimize the asymptotic mean square error (MSE): • The optimal weights of matrix consensus are given by • requires calculating partition function, NP-hard. • Star Graphs (unbalanced degrees, max consensus preferred): • Important: each term of PL only involves local data and parameters. A Distributed Paradigm • Perform local estimators on sensor nodes: • Combine the local estimations: • 4X4 Grid (balanced degrees, joint MPLE preferred): • Joint Optimization Consensus (joint MPLE): • Linear Consensus: • Max Consensus: • Max consensus is a special linear consensus. • Under mild conditions, all these consensus estimators are asymp. consistent if the local estimators are asymp. Consistent. ADMM Iteration • Large-scale Models (100 nodes, similar trends as small models): • quadratic programming, solvable, but requires global calculation. • If corr(si, sj) = 0 for i ≠ j, then (joint MPLE, Wi =Hi) achieves the optimum asymptotic MSE.