
Direct Robust Matrix Factorization


Presentation Transcript


  1. Direct Robust Matrix Factorization
  Liang Xiong, Xi Chen, Jeff Schneider
  Presented by xxx
  School of Computer Science, Carnegie Mellon University

  2. Matrix Factorization
  • Extremely useful…
  • Assumes the data matrix is of low rank.
  • PCA/SVD, NMF, Collaborative Filtering…
  • Simple, effective, and scalable.
  • For Anomaly Detection:
  • Assumption: the normal data is of low rank, and anomalies are poorly approximated by the factorization.
  DRMF: Liang Xiong, Xi Chen, Jeff Schneider

  3. Robustness Issue
  • Usually not robust (sensitive to outliers).
  • Because of the L2 (Frobenius) measure they use:
    min_L ||X – L||_F  s.t. rank(L) ≤ K
    (minimize the approximation error, subject to L being low-rank)
  • For anomaly detection, of course we have outliers.

  4. Why outliers matter
  • Simulation: we use SVD to find the first basis of 10 sine signals.
  • To make it more fun, let us turn one point of one signal into a spike (the outlier).
  [Figure: input signals and the output basis. No outlier: cool. Moderate outlier: disturbed. Wild outlier: totally lost.]
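The slide's simulation is easy to reproduce. Below is a minimal sketch; the signal amplitudes, grid size, and spike magnitude are our own assumptions, not the paper's exact settings:

```python
import numpy as np

# 10 scaled sine signals: a rank-1 matrix whose first SVD basis
# should be the sine shape itself.
t = np.linspace(0, 2 * np.pi, 100)
X = np.outer(np.linspace(0.5, 1.5, 10), np.sin(t))

def first_basis(M):
    # First right-singular vector = the first SVD/PCA basis of the rows.
    return np.linalg.svd(M, full_matrices=False)[2][0]

clean = first_basis(X)            # essentially the sine shape

X_spiked = X.copy()
X_spiked[0, 50] += 100.0          # one wild outlier entry
dirty = first_basis(X_spiked)

# Both vectors are unit-norm, so |dot| is their cosine similarity;
# with a wild spike it collapses toward 0 ("totally lost").
similarity = abs(clean @ dirty)
```

With a spike this large, the leading singular direction aligns with the spiked coordinate rather than the sine shape, which is exactly the failure the slide illustrates.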

  5. Direct Robust Matrix Factorization (DRMF)
  • Throw outliers out of the factorization, and the problem is solved!
  • Mathematically, this is DRMF:
    min_{L,S} ||X – L – S||_F  s.t. rank(L) ≤ K, ||S||_0 ≤ e
  • ||S||_0: number of non-zeros in S.
  • S is the "trash can" for outliers; the constraint ||S||_0 ≤ e says there should be only a small number of outliers.

  6. DRMF Algorithm
  • Input: data X. Output: low-rank L; outliers S.
  • Iterate (block coordinate descent):
  • Let C = X – S. Do rank-K SVD: L = SVD(C, K).
  • Let E = X – L. Do thresholding: S_ij = E_ij if |E_ij| ≥ t, and S_ij = 0 otherwise.
  • t: the e-th largest element in {|E_ij|}.
  • That's it! Anyone can try this at home.
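The two alternating steps above translate almost line-for-line into NumPy. This is a minimal sketch (the function name, fixed iteration count, and lack of a convergence check are our own choices):

```python
import numpy as np

def drmf(X, K, e, n_iter=20):
    """Sketch of the DRMF block coordinate descent.

    X : (m, n) data matrix
    K : target rank of L
    e : allowed number of outlier entries in S
    """
    S = np.zeros_like(X)
    for _ in range(n_iter):
        # Step 1: rank-K SVD of the cleaned matrix C = X - S.
        C = X - S
        U, s, Vt = np.linalg.svd(C, full_matrices=False)
        L = (U[:, :K] * s[:K]) @ Vt[:K]
        # Step 2: hard-threshold the residual E = X - L, keeping only
        # the e largest-magnitude entries as outliers.
        E = X - L
        t = np.sort(np.abs(E), axis=None)[-e]   # e-th largest |E_ij|
        S = np.where(np.abs(E) >= t, E, 0.0)
    return L, S
```

On a noiseless low-rank matrix with a few large spiked entries, the spikes end up in S and L recovers the underlying low-rank part.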

  7. Related Work
  • Nuclear norm minimization (NNM): effective methods with nice theoretical properties from compressive sensing.
  • NNM is the convex relaxation of DRMF: the rank constraint is relaxed to the nuclear norm ||L||_*, and the ||S||_0 constraint to the L1 norm ||S||_1.
  • A parallel work, GoDec by Zhou et al., appeared in ICML'11.
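To make the relaxation concrete, here is a tiny numeric illustration of the two surrogate norms (the matrices are arbitrary examples of ours, not from the paper):

```python
import numpy as np

# rank(L) is relaxed to the nuclear norm ||L||_* (sum of singular values);
# ||S||_0 is relaxed to the L1 norm ||S||_1 (sum of absolute entries).
L = np.diag([3.0, 2.0, 0.0])
nuclear_norm = np.linalg.svd(L, compute_uv=False).sum()  # 3 + 2 + 0 = 5

S = np.array([[0.0, -2.0],
              [1.0,  0.0]])
l1_norm = np.abs(S).sum()  # 2 + 1 = 3
```

Both surrogates are convex, which is what makes the relaxed problem tractable with guarantees, at the price of solving a different problem than DRMF's exact rank and cardinality constraints.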

  8. Pros & Cons
  • Pros:
  • No compromise/relaxation => high quality.
  • Efficient.
  • Easy to implement and use.
  • Cons:
  • Difficult theory, because of the rank and the L0 norm…
  • Non-convex: local minima exist, but they can be greatly mitigated by initializing with its convex relaxation, NNM.

  9. Highly Extensible
  • Structured outliers: outlier rows instead of entries? Just use structured measurements.
  • Sparse input / missing data: useful for recommendation and matrix completion.
  • Non-negativity as in NMF: still readily solvable with the constraints.
  • Large-scale problems: use approximate SVD solvers.

  10. Simulation Study
  • Factorize noisy low-rank matrices to find entry outliers.
  • SVD: plain SVD. RPCA, SPCP: two representative NNM methods.
  [Plots: error of recovering normal entries; detection rate of outlier entries; running time (log scale).]

  11. Simulation Study
  • Sensitivity to outliers: we examine the recovery errors as the outlier amplitude grows.
  • Noiseless case: all assumptions required by RPCA hold.

  12. Find Stranger Digits
  • The USPS dataset is used. We mix a few '7's into many '1's, and then ask DRMF to find those '7's. Unsupervised.
  • Treat each digit as a row in the matrix.
  • Rank the digits by reconstruction errors.
  • Use the structured extension of DRMF: row outliers.
  • [Figure: resulting ranked list.]
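A simplified version of this ranking can be sketched as follows. Note this uses plain SVD reconstruction error per row as a stand-in for the full structured (row-outlier) DRMF solver, and the function name is our own:

```python
import numpy as np

def rank_rows_by_error(X, K):
    # Fit a rank-K approximation via SVD, then score each row by its
    # residual norm; the most anomalous rows come first.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    L = (U[:, :K] * s[:K]) @ Vt[:K]
    row_err = np.linalg.norm(X - L, axis=1)
    return np.argsort(row_err)[::-1]
```

In the slide's experiment the rows are vectorized USPS digit images; any matrix whose majority rows share a low-rank pattern (the '1's) while a few rows deviate (the '7's) behaves the same way.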

  13. Conclusion
  • DRMF is a direct and intuitive solution to the robust factorization problem.
  • Easy to implement and use.
  • Highly extensible.
  • Good empirical performance.
  Please direct questions to Liang Xiong (lxiong@cs.cmu.edu)
