
The Diagonalized Newton Algorithm for Non-negative Matrix Factorization






  1. The Diagonalized Newton Algorithm for Non-negative Matrix Factorization Hugo Van Hamme Reporter: Yi-Ting Wang 2013/3/26

  2. Outline • Introduction • NMF formulation • The Diagonalized Newton Algorithm for KLD-NMF • Experiments • Conclusions

  3. Introduction • Non-negative matrix factorization (NMF) approximates a non-negative data matrix V ∈ ℝ₊^{M×N} by a product of two non-negative factors, V ≈ WH with W ∈ ℝ₊^{M×R} and H ∈ ℝ₊^{R×N}. • In this paper, the closeness of the reconstruction WH to its target V is measured by their (generalized) Kullback-Leibler divergence: D(V‖WH) = Σ_{m,n} [ V_{mn} log(V_{mn}/(WH)_{mn}) − V_{mn} + (WH)_{mn} ] (1)
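
For reference, a minimal NumPy sketch of the divergence in (1); the function name and the small eps guard against log(0) are illustrative choices, not part of the paper.

```python
import numpy as np

def kl_divergence(V, W, H, eps=1e-12):
    """Generalized Kullback-Leibler divergence D(V || WH) of Eq. (1)."""
    WH = W @ H
    # eps guards the log and the ratio against exact zeros in V or WH.
    return np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH)
```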

  4. Introduction • The multiplicative updates(MU) algorithm solves exactly this problem in an iterative manner. • Its simplicity and the availability of many implementations make it a popular algorithm to date to solve NMF problems. • There are some drawbacks. • Firstly, it only converges locally and is not guaranteed to yield the global minimum of the cost function. It is hence sensitive to the choice of the initial guesses for and . • Secondly, MU is very slow to converge. The goal of this paper is to speed up the convergence while the local convergence property is retained.

  5. Introduction • The resulting Diagonalized Newton Algorithm (DNA) is proposed, showing faster convergence while the implementation remains simple and suitable for high-rank problems. • Animation of Newton's iteration: http://zh.wikipedia.org/wiki/File:NewtonIteration_Ani.gif

  6. NMF formulation • To induce sparsity on the matrix factors, the KL-divergence is often regularized, i.e. one seeks to minimize: C(W, H) = D(V‖WH) + λ_W Σ_{m,r} W_{mr} + λ_H Σ_{r,n} H_{rn} (2) • Minimizing (2) can be achieved by alternating updates of W and H for which the cost is non-increasing. The updates for this form of block coordinate descent are: W ← argmin_{W≥0} C(W, H) (3) and H ← argmin_{H≥0} C(W, H) (4). A sketch of this alternation is given below.
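
A hedged NumPy sketch of the alternation in (3)-(4), using a single multiplicative update per factor as the inner step (the paper replaces this inner step with DNA); the initialization, λ values, iteration count and eps guard are illustrative assumptions.

```python
import numpy as np

def nmf_kld_mu(V, R, n_iter=200, lam_W=0.0, lam_H=0.0, eps=1e-12):
    """Block coordinate descent on Eq. (2) with one MU step per factor per iteration."""
    M, N = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((M, R)) + eps
    H = rng.random((R, N)) + eps
    for _ in range(n_iter):
        # Approximate step towards (4): update H with W fixed.
        H *= (W.T @ (V / (W @ H + eps))) / (W.sum(axis=0)[:, None] + lam_H)
        # Approximate step towards (3): update W with H fixed.
        W *= ((V / (W @ H + eps)) @ H.T) / (H.sum(axis=1)[None, :] + lam_W)
    return W, H
```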

  7. NMF formulation • Let v denote any column of V and let h denote the corresponding column of H, then the following is the core minimization problem to be considered: min_{h≥0} D(v‖Wh) + λ 1ᵀh (5), where 1 denotes a vector of ones of appropriate length and λ the relevant regularization weight. The solution of (5) should satisfy the KKT conditions, i.e. for all r: h_r ≥ 0 and ∂C/∂h_r ≥ 0, with h_r ∂C/∂h_r = 0.

  8. NMF formulation • If h_r = 0 at a solution, the partial derivative ∂C/∂h_r must be non-negative; if h_r > 0, it must vanish. • Hence the product of h_r and the partial derivative is always zero for a solution of (5), i.e. for all r: h_r ( Σ_m W_mr + λ − Σ_m W_mr v_m/(Wh)_m ) = 0 (6) • Since all-zero columns of W do not contribute to the reconstruction WH, it can be assumed that the column sums of W are non-zero, so the above can be recast as: h_r ( 1 − (1/ω_r) Σ_m W_mr v_m/(Wh)_m ) = 0

  9. NMF formulation • where ω_r = Σ_m W_mr + λ. To facilitate the derivations below, the following notation is introduced: p_r(h) = (1/ω_r) Σ_m W_mr v_m/ŷ_m (7) • which is a function of h via the reconstruction ŷ = Wh. The KKT conditions are hence recast as h_r (1 − p_r(h)) = 0 for all r (8) • Finally, summing (6) over r yields Σ_r ω_r h_r = Σ_m v_m (9) • which is satisfied by any solution, and which any guess h can be made to satisfy by renormalizing: h ← h · (Σ_m v_m) / (Σ_r ω_r h_r) (10)
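
As a numeric companion to (7)-(10), a small sketch computing ω, p(h) and the rescaling of (10); the helper names and the eps guard are illustrative, and the formulas follow the reconstruction above.

```python
import numpy as np

def ratios(h, W, v, lam=0.0, eps=1e-12):
    """omega_r and p_r(h) of Eq. (7) for one column problem."""
    omega = W.sum(axis=0) + lam                  # omega_r = sum_m W_mr + lambda
    p = (W.T @ (v / (W @ h + eps))) / omega      # p_r(h)
    return omega, p

def renormalize(h, W, v, lam=0.0):
    """Rescale h so that Eq. (9), sum_r omega_r h_r = sum_m v_m, holds (Eq. (10))."""
    omega = W.sum(axis=0) + lam
    return h * (v.sum() / (omega @ h))
```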

  10. Multiplicative updates • For the KL-divergence, the MU are identical to a fixed-point update of (6): h_r ← h_r p_r(h) (11) • Update (11) has two fixed points: h_r = 0 and p_r(h) = 1. • In the former case, the KKT conditions imply that p_r(h) − 1 is non-positive.
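
For reference, the per-column fixed-point update (11) in NumPy, using the same illustrative names as the sketch above.

```python
import numpy as np

def mu_update(h, W, v, lam=0.0, eps=1e-12):
    """One multiplicative (fixed-point) update of Eq. (11) for a single column h."""
    omega = W.sum(axis=0) + lam
    p = (W.T @ (v / (W @ h + eps))) / omega
    return h * p
```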

  11. Newton updates • To find the stationary points of (2), equations (8) need to be solved for h. Newton's update then states: h ← h − J⁻¹ f(h), with J_rs = ∂f_r/∂h_s the Jacobian of the system f(h) = 0 being solved (12) • Applied to equations (8), the Jacobian consists of a diagonal term and a last term containing the full matrix Σ_m W_mr W_ms v_m / ŷ_m² (13)

  12. Newton updates • To avoid the matrix inversion in update (12), the last term in (13) is diagonalized (only the entries with s = r are kept), which is equivalent to solving the r-th equation in (8) for h_r with all other components fixed. With q_r = Σ_m W_mr² v_m / ŷ_m² (14) • which is always positive, an element-wise Newton update for h_r is obtained: h_r ← h_r + ω_r (p_r(h) − 1) / q_r (15)
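
A sketch of the element-wise Newton step (14)-(15) as reconstructed above; the final clipping to zero is only a crude stand-in for the step size safeguards of the next slides, and all names are illustrative.

```python
import numpy as np

def newton_update(h, W, v, lam=0.0, eps=1e-12):
    """One diagonalized (element-wise) Newton step for a single column h."""
    Wh = W @ h + eps
    omega = W.sum(axis=0) + lam                  # omega_r
    n = W.T @ (v / Wh)                           # omega_r * p_r(h)
    q = (W ** 2).T @ (v / Wh ** 2)               # q_r of Eq. (14), always positive
    h_new = h + (n - omega) / (q + eps)          # Eq. (15): n - omega = omega * (p - 1)
    return np.maximum(h_new, 0.0)                # crude non-negativity clip
```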

  13. Step size limitation • To respect non-negativity and to avoid the singularity, it is bounded below by a function with the same local behavior around zero: (16) • Hence, whenever this bound would be violated, the following update is used instead: (17)

  14. Non-increase of the cost • Despite the step size limitation of section 2.3, the divergence can still increase. • A very safe option is to also compute the EM (multiplicative) update. • If the EM update is better, the Newton update is rejected and the EM update is taken instead. • This guarantees non-increase of the cost function, since the EM update itself is non-increasing. • The computational cost of this operation is dominated by evaluating the KL-divergence, not by computing the update itself.
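
A hedged sketch of this safeguard, reusing the mu_update and newton_update sketches from the previous slides: both candidates are evaluated on the per-column cost of (5) and the better one is kept.

```python
import numpy as np

def column_cost(h, W, v, lam=0.0, eps=1e-12):
    """Regularized per-column KL divergence of Eq. (5)."""
    Wh = W @ h + eps
    return np.sum(v * np.log((v + eps) / Wh) - v + Wh) + lam * h.sum()

def safeguarded_update(h, W, v, lam=0.0):
    """Take the Newton candidate only if it beats the multiplicative (EM) candidate."""
    h_newton = newton_update(h, W, v, lam)
    h_mu = mu_update(h, W, v, lam)
    if column_cost(h_newton, W, v, lam) <= column_cost(h_mu, W, v, lam):
        return h_newton
    return h_mu
```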

  15. The Diagonalized Newton Algorithm for KLD-NMF
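
The slide presents the full algorithm; as a rough stand-in (not the paper's exact pseudo-code), the per-column sketches above can be chained into a driver loop. The update of W reuses the same column solver through the transposed problem Vᵀ ≈ HᵀWᵀ; the initialization and iteration count are illustrative.

```python
import numpy as np

def dna_nmf(V, R, n_iter=100, lam=0.0):
    """Alternate safeguarded per-column updates of H and (by transposition) W."""
    M, N = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((M, R)) + 1e-6
    H = rng.random((R, N)) + 1e-6
    for _ in range(n_iter):
        for n in range(N):                       # columns of H against dictionary W
            H[:, n] = safeguarded_update(H[:, n], W, V[:, n], lam)
        for m in range(M):                       # rows of W against dictionary H^T
            W[m, :] = safeguarded_update(W[m, :], H.T, V[m, :], lam)
    return W, H
```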

  16. Experiments: Dense data matrices

  17. Experiments: Sparse data matrices

  18. Conclusions • Depending on the case and the matrix sizes, DNA iterations are 2 to 3 times slower than MU iterations. • In most cases, the diagonal approximation is good enough that faster convergence is observed and a net gain results. • Since Newton updates cannot in general ensure a monotonic decrease of the cost function, the step size was controlled with a brute-force strategy of falling back to MU whenever the cost would otherwise increase. • More refined step damping methods could speed up DNA by avoiding evaluations of the cost function, which is next on the research agenda.
