Local Linear Matrix Factorization for Document Modeling • Lu Bai, Jiafeng Guo, Yanyan Lan, Xueqi Cheng • Institute of Computing Technology, Chinese Academy of Sciences • bailu@software.ict.ac.cn
Outline • Introduction • Our approach • Experimental results • Conclusion
Background • Low-dimensional document representations can be produced by decomposing the document-word matrix into low-rank matrices (see the sketch below) • Preserving local geometric relations can improve the low-dimensional representation by • Smoothing the low-dimensional representation • Improving the model's generalization • Avoiding over-fitting
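As a minimal illustration of this background (not from the slides), the sketch below decomposes a toy document-word count matrix into two low-rank factors with scikit-learn's NMF; the matrix values and the rank are made up for the example.

```python
# A minimal sketch: low-dimensional document representations from factorizing
# a small document-word count matrix. Toy data, rank 2 chosen arbitrarily.
import numpy as np
from sklearn.decomposition import NMF

# Toy document-word matrix X (4 documents x 6 terms); entries are term counts.
X = np.array([
    [2, 1, 0, 0, 0, 1],
    [1, 2, 1, 0, 0, 0],
    [0, 0, 1, 2, 1, 0],
    [0, 0, 0, 1, 2, 1],
], dtype=float)

# Decompose X ~= U @ V into two low-rank nonnegative factors.
model = NMF(n_components=2, init="nndsvda", random_state=0)
U = model.fit_transform(X)   # documents in the low-dimensional space
V = model.components_        # topic-word matrix
print(U)                     # each row is a 2-d document representation
```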
Previous work • Our proposal: a new method for mining low-dimensional representations that better exploits the geometric relationships among documents
Our approach • Basic ideas
Local Linear Matrix Factorization (LLMF) • Factorizing the document-term matrix as in NMF: $\min_{U,V \ge 0} \|\tilde{X} - UV^\top\|_F^2 + \lambda(\|U\|_F^2 + \|V\|_F^2)$ • The $\lambda\|U\|_F^2$, $\lambda\|V\|_F^2$ terms are used for reducing over-fitting • Factorizing the matrix with neighbors: $\min_W \|\tilde{X} - W\tilde{X}\|_F^2 + \beta\|W\|_1$, s.t. $W_{ii} = 0$ • $\tilde{X}$ denotes the normalized document-word matrix; normalization avoids the bias of long documents • $W$ denotes the linear combination weights over neighboring documents • $\beta$ weights the $\ell_1$ norm of $W$ • The $\ell_1$ penalty both picks document neighbors and learns salient combination weights (see the sketch below)
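The two building blocks can be written down directly. In the sketch below the variable names (`X_tilde`, `U`, `V`, `W`, `lam`, `beta`) follow the notation reconstructed above, not any code released with the paper.

```python
# Hedged numpy sketch of the two LLMF sub-objectives defined on this slide.
import numpy as np

def nmf_loss(X, U, V, lam):
    """||X - U V^T||_F^2 + lam * (||U||_F^2 + ||V||_F^2)."""
    residual = X - U @ V.T
    return np.sum(residual**2) + lam * (np.sum(U**2) + np.sum(V**2))

def local_linear_loss(X_tilde, W, beta):
    """||X_tilde - W X_tilde||_F^2 + beta * ||W||_1, with zero diagonal on W."""
    W = W - np.diag(np.diag(W))          # a document never reconstructs itself
    residual = X_tilde - W @ X_tilde
    return np.sum(residual**2) + beta * np.sum(np.abs(W))
```

The $\ell_1$ penalty drives most entries of each row of $W$ to zero, so the surviving nonzero weights simultaneously select a document's neighbors and give their combination weights.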
Cont' • Combining matrix factorization and local neighbor factorization: the same $W$ that reconstructs documents in word space also reconstructs them in the latent space • Final objective function: $\min_{U,V,W} \|\tilde{X} - UV^\top\|_F^2 + \alpha\|U - WU\|_F^2 + \beta\|W\|_1 + \lambda(\|U\|_F^2 + \|V\|_F^2)$, s.t. $W_{ii} = 0$ (a sketch of this objective follows)
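A sketch of the combined objective as reconstructed above; `alpha` trades the local linear constraint on $U$ off against the factorization error, and all names are my notation rather than the authors'.

```python
# Hedged sketch: the full LLMF objective value for given factors and weights.
import numpy as np

def llmf_objective(X_tilde, U, V, W, alpha, beta, lam):
    fact = np.sum((X_tilde - U @ V.T) ** 2)          # matrix factorization error
    local = np.sum((U - W @ U) ** 2)                 # neighbor reconstruction in latent space
    sparse = beta * np.sum(np.abs(W))                # l1 picks few, salient neighbors
    ridge = lam * (np.sum(U ** 2) + np.sum(V ** 2))  # Frobenius terms against over-fitting
    return fact + alpha * local + sparse + ridge
```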
LLMF vs. others • Compared with models without geometric information (e.g., NMF, PLSA, LDA) • LLMF smooths each document representation with its neighbors • Compared with models with geometric constraints (e.g., LapPLSA, LTM) • LLMF is free of an explicit similarity measure and neighborhood threshold • LLMF is more robust at preserving local geometric structure under unbalanced data distributions
Model fitting • Estimating $W$ first • Not differentiable, because of the $\ell_1$ norm • Solved with OWL-QN • Estimating $U$, $V$ next • The objective is bi-convex in $U$ and $V$ • Solved with coordinate gradient descent (see the sketch below)
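The toy fitting sketch below follows one consistent reading of the slides: $W$ is estimated first from the neighbor-reconstruction objective, then $U$ and $V$ with $W$ fixed. The slides use OWL-QN for the non-differentiable $\ell_1$ step; OWL-QN has no SciPy implementation, so this sketch swaps in ISTA (proximal gradient with soft-thresholding), which targets the same kind of $\ell_1$ objective. Step sizes and defaults are hypothetical.

```python
# Hedged alternating-fit sketch for LLMF; ISTA stands in for OWL-QN here.
import numpy as np

def soft_threshold(A, t):
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)

def estimate_W(X, beta, lr=1e-3, iters=500):
    """Step 1: min_W ||X - W X||_F^2 + beta * ||W||_1 with zero diagonal."""
    n = X.shape[0]
    W = np.zeros((n, n))
    for _ in range(iters):
        grad = -2.0 * (X - W @ X) @ X.T          # gradient of the smooth part
        W = soft_threshold(W - lr * grad, lr * beta)
        np.fill_diagonal(W, 0.0)                 # no self-reconstruction
    return W

def estimate_UV(X, W, k, alpha, lam, lr=1e-3, iters=500, seed=0):
    """Step 2: bi-convex in U and V, so alternate coordinate gradient steps."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = rng.random((n, k))
    V = rng.random((m, k))
    I = np.eye(n)
    for _ in range(iters):
        R = X - U @ V.T
        grad_U = -2 * R @ V + 2 * alpha * (I - W).T @ ((I - W) @ U) + 2 * lam * U
        U = np.maximum(U - lr * grad_U, 0.0)     # keep NMF-style nonnegativity
        R = X - U @ V.T
        grad_V = -2 * R.T @ U + 2 * lam * V
        V = np.maximum(V - lr * grad_V, 0.0)
    return U, V
```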
Experimental settings • Datasets • 20news & la1 (from Weka) • Word stemming • Stop-word removal (a preprocessing sketch follows)
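A hedged preprocessing sketch matching the slide: stemming plus stop-word removal. It uses NLTK's `PorterStemmer` and scikit-learn's built-in English stop-word list; the slides do not say which implementations the authors actually used.

```python
# Stemming + stop-word removal wired into a scikit-learn vectorizer.
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer, ENGLISH_STOP_WORDS

stemmer = PorterStemmer()

def stem_analyzer(doc):
    # Tokenize with the default analyzer, drop stop words, then stem.
    base = CountVectorizer().build_analyzer()
    return [stemmer.stem(t) for t in base(doc) if t not in ENGLISH_STOP_WORDS]

vectorizer = CountVectorizer(analyzer=stem_analyzer)
X = vectorizer.fit_transform(["Dogs are running", "A dog runs quickly"])
print(vectorizer.get_feature_names_out())  # e.g. ['dog', 'quickli', 'run']
```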
Cont' • Baseline methods • PLSA, LDA, NMF, LapPLSA • Parameter settings • Latent dimension • $\lambda$, $\alpha$ for the Frobenius-norm terms • $\beta$ for the $\ell_1$ norm • Document classification • LibSVM, linear kernel • Training set : testing set = 3 : 2 (protocol sketched below)
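The classification protocol is straightforward to reproduce. In the sketch below, `SVC(kernel="linear")` is scikit-learn's wrapper around libsvm, matching the slide's setup; `U` and `y` are random placeholders standing in for the learned document representations and their class labels.

```python
# Linear-kernel libsvm classification on low-dimensional document features,
# with a 3:2 train/test split as stated on the slide.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

U = np.random.rand(100, 10)            # stand-in for learned document vectors
y = np.random.randint(0, 4, size=100)  # stand-in for class labels

# Training set : testing set = 3 : 2  ->  test_size = 0.4
U_tr, U_te, y_tr, y_te = train_test_split(U, y, test_size=0.4, random_state=0)
clf = SVC(kernel="linear").fit(U_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(U_te)))
```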
Cont' • Document classification results • LapPLSA and LLMF outperform NMF, PLSA, and LDA • LLMF achieves the highest accuracy of all methods on both datasets • LLMF is consistently better than pure NMF across different parameter settings
Conclusion • Conclusions • We propose LLMF, a novel method for learning low-dimensional document representations with local linear constraints • LLMF captures the rich geometric information among documents better than methods based on independent pairwise relationships • Experiments on the 20news and la1 benchmarks show that LLMF learns better semantic representations than the baseline methods • Future work • Extending LLMF to parallel and distributed settings • Applying LLMF to recommender systems
References • D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. JMLR, 3:993–1022, 2003. • D. Cai, X. He, and J. Han. Locally consistent concept factorization for document clustering. TKDE, 23(6):902–913, 2011. • D. Cai, Q. Mei, J. Han, and C. Zhai. Modeling hidden topics on document manifold. In CIKM '08, pages 911–920, New York, NY, USA, 2008. ACM. • T. Hofmann. Unsupervised learning by probabilistic latent semantic analysis. Machine Learning, 42(1–2):177–196, 2001. • S. Huh and S. E. Fienberg. Discriminative topic modeling based on manifold learning. In KDD '10, pages 653–662, New York, NY, USA, 2010. ACM.
Thanks!! Q&A