
Local Linear Matrix Factorization for Document Modeling

Lu Bai, Jiafeng Guo, Yanyan Lan, Xueqi Cheng. Institute of Computing Technology, Chinese Academy of Sciences. bailu@software.ict.ac.cn


Presentation Transcript


  1. Local Linear Matrix Factorization for Document Modeling • Lu Bai, Jiafeng Guo, Yanyan Lan, Xueqi Cheng • Institute of Computing Technology, Chinese Academy of Sciences • bailu@software.ict.ac.cn

  2. Outline • Introduction • Our approach • Experimental results • Conclusion

  3. Introduction

  4. Background • Low-dimensional representations can be produced by decomposing the document-word matrix into low-rank matrices • Preserving local geometric relations can improve the low-dimensional representation by: • smoothing the low-dimensional representation • improving the model's generalization • avoiding overfitting
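As a minimal sketch of the first bullet, the following factorizes a small document-word count matrix into two non-negative low-rank matrices using Lee and Seung style multiplicative updates. The toy matrix, the rank `k`, and the function name `nmf` are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def nmf(D, k, n_iter=200, eps=1e-9, seed=0):
    """Approximate non-negative D (docs x words) as U @ V with U, V >= 0."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = D.shape
    U = rng.random((n_docs, k)) + eps   # document representations
    V = rng.random((k, n_words)) + eps  # topic-word loadings
    for _ in range(n_iter):
        # Multiplicative updates: ratios of non-negative terms keep U, V >= 0
        V *= (U.T @ D) / (U.T @ U @ V + eps)
        U *= (D @ V.T) / (U @ V @ V.T + eps)
    return U, V

# Hypothetical toy corpus: docs 0-1 use words 0-1, docs 2-3 use words 2-3
D = np.array([[3, 2, 0, 0],
              [2, 3, 0, 0],
              [0, 0, 4, 1],
              [0, 0, 1, 4]], dtype=float)
U, V = nmf(D, k=2)
err = np.linalg.norm(D - U @ V)  # Frobenius reconstruction error
```

Each row of `U` is the low-dimensional representation of one document; nothing in this plain factorization ties a document's row to those of similar documents, which is the gap the local geometric constraint is meant to fill.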

  5. Previous work • A new method for mining low-dimensional representations that better exploits the geometric relationships among documents

  6. Our approach • Basic ideas

  7. Local Linear Matrix Factorization (LLMF) • Factorizing the document-term matrix as in NMF • Two regularization terms are used to reduce over-fitting • Factorizing the matrix with neighbors • The normalized document-word matrix is used; normalization avoids the bias of long documents • A weight matrix holds the linear combination weights of each document's neighbors • A coefficient weights the norm of the combination weights • Picking document neighbors • Learning salient combination weights
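A minimal sketch of the neighbor step, under several assumptions not stated on the slide: documents are scaled to unit L2 norm, the norm on the combination weights is the L1 norm (suggested by the later mention of OWL-QN), and the problem is solved with plain proximal gradient descent (ISTA) rather than OWL-QN, for brevity. The names `neighbor_weights`, `mu`, and the toy matrix are hypothetical.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def neighbor_weights(D, mu=0.05, n_iter=500):
    """Row i of W reconstructs normalized document i from the other documents.

    Minimizes 0.5 * ||X - W X||_F^2 + mu * ||W||_1 subject to diag(W) = 0.
    """
    # Assumed normalization: unit L2 norm per document (removes length bias)
    norms = np.linalg.norm(D, axis=1, keepdims=True)
    X = D / np.maximum(norms, 1e-12)
    n = X.shape[0]
    G = X @ X.T                          # Gram matrix between documents
    step = 1.0 / np.linalg.norm(G, 2)    # 1 / Lipschitz constant of the smooth part
    W = np.zeros((n, n))
    for _ in range(n_iter):
        grad = W @ G - G                 # gradient of 0.5 * ||X - W X||_F^2
        W = soft_threshold(W - step * grad, step * mu)
        np.fill_diagonal(W, 0.0)         # a document may not reconstruct itself
    return W

# Hypothetical toy corpus: docs 0-1 share vocabulary, docs 2-3 share vocabulary
D = np.array([[3, 2, 0, 0],
              [2, 3, 0, 0],
              [0, 0, 4, 1],
              [0, 0, 1, 4]], dtype=float)
W = neighbor_weights(D)
```

On this toy matrix document 0 receives a positive weight on document 1 and zero weight on documents 2 and 3, so the sparsity penalty selects neighbors without any explicit similarity threshold.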

  8. Cont'd • Combining matrix factorization and local neighbor factorization • Final objective function
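One way to write a combined objective that is consistent with the bullets on slides 7 and 8 is sketched below. All notation is assumed rather than taken from the slides: $D$ is the document-word matrix, $U$ and $V$ are its low-rank factors, $W$ holds the neighbor combination weights learned beforehand, and $\alpha$, $\lambda_1$, $\lambda_2$ are hypothetical trade-off and regularization coefficients.

```latex
\min_{U \ge 0,\; V \ge 0}\;
  \lVert D - UV \rVert_F^2
  + \alpha \,\lVert U - WU \rVert_F^2
  + \lambda_1 \lVert U \rVert_F^2
  + \lambda_2 \lVert V \rVert_F^2
```

The second term asks each document's low-dimensional representation to be reproduced by the same linear combination of neighbors that reproduces the document in word space.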

  9. Graphical Model of LLMF

  10. LLMF vs. Others • Compared with models without geometric information • E.g. NMF, PLSA, LDA • LLMF smooths each document representation with its neighbors • Compared with models with geometric constraints • E.g. LapPLSA, LTM • LLMF needs no similarity measure or neighborhood threshold • LLMF is more robust in preserving local geometric structure under unbalanced data distributions

  11. Model fitting • Estimating the combination weights first • Not differentiable, because of the norm • OWL-QN • Estimating the two factor matrices • The objective is bi-convex in the two factor matrices • Coordinate gradient descent
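The second stage can be sketched as alternating projected gradient descent, one simple variant of the coordinate-wise scheme named on the slide. Everything specific here is an assumption: the objective (squared fit error, a local-linearity term `alpha * ||U - W U||^2`, and L2 penalties weighted by `lam1` and `lam2`), the non-negativity of the factors, the hand-set weight matrix `W`, and the function names.

```python
import numpy as np

def objective(D, U, V, W, alpha, lam1, lam2):
    """Assumed LLMF-style objective: fit + local-linearity + L2 penalties."""
    fit = np.linalg.norm(D - U @ V) ** 2
    local = np.linalg.norm(U - W @ U) ** 2
    return (fit + alpha * local
            + lam1 * np.linalg.norm(U) ** 2 + lam2 * np.linalg.norm(V) ** 2)

def fit_factors(D, W, k, alpha=1.0, lam1=0.1, lam2=0.1, n_iter=300, seed=0):
    rng = np.random.default_rng(seed)
    n_docs, n_words = D.shape
    U = rng.random((n_docs, k))
    V = rng.random((k, n_words))
    R = np.eye(n_docs) - W
    M = R.T @ R                       # ||U - W U||_F^2 = trace(U^T M U)
    history = [objective(D, U, V, W, alpha, lam1, lam2)]
    for _ in range(n_iter):
        # U-step with V fixed: convex in U, step = 1 / Lipschitz constant
        lip_u = 2.0 * (np.linalg.norm(V @ V.T, 2)
                       + alpha * np.linalg.norm(M, 2) + lam1)
        grad_u = 2.0 * ((U @ V - D) @ V.T + alpha * (M @ U) + lam1 * U)
        U = np.maximum(U - grad_u / lip_u, 0.0)   # project onto U >= 0
        # V-step with U fixed: convex in V
        lip_v = 2.0 * (np.linalg.norm(U.T @ U, 2) + lam2)
        grad_v = 2.0 * (U.T @ (U @ V - D) + lam2 * V)
        V = np.maximum(V - grad_v / lip_v, 0.0)   # project onto V >= 0
        history.append(objective(D, U, V, W, alpha, lam1, lam2))
    return U, V, history

# Hypothetical toy data: docs 0 and 1 are neighbors, docs 2 and 3 are neighbors
D = np.array([[3, 2, 0, 0],
              [2, 3, 0, 0],
              [0, 0, 4, 1],
              [0, 0, 1, 4]], dtype=float)
W = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
U, V, history = fit_factors(D, W, k=2)
```

Because each sub-problem is convex once the other factor is fixed, a gradient step of size one over the Lipschitz constant never increases the objective, so `history` is non-increasing even though the joint problem is not convex.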

  12. Experimental Settings • Data sets • 20news and la1 (from Weka) • Word stemming • Stop-word removal

  13. Cont'd • Baseline methods • PLSA, LDA, NMF, LapPLSA • Parameter setting • Low dimension • Coefficients for the norm penalties • Document classification • LIBSVM, linear kernel • Training set : testing set = 3:2
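The 3:2 split can be reproduced with a shuffled index partition, sketched below. The function name `split_3_to_2`, the random seed, and the use of a plain (non-stratified) shuffle are assumptions; the slides do not say how the split was drawn.

```python
import numpy as np

def split_3_to_2(n_docs, seed=0):
    """Return (train_idx, test_idx) with a 3:2 train/test ratio."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_docs)   # shuffle document indices
    n_train = (3 * n_docs) // 5       # three fifths go to training
    return order[:n_train], order[n_train:]

train_idx, test_idx = split_3_to_2(10)
```

The learned low-dimensional document representations indexed by `train_idx` would then be used to train the linear-kernel classifier, and those indexed by `test_idx` to measure accuracy.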

  14. Experimental Results

  15. Cont'd • Document classification • LapPLSA and LLMF are better than NMF, PLSA, and LDA • LLMF achieves higher accuracy than all baseline methods on both datasets • LLMF with different parameter settings is consistently better than pure NMF

  16. Conclusion • Conclusions • We propose a novel method, LLMF, for learning low-dimensional representations of documents with local linear constraints. • LLMF captures the rich geometric information among documents better than methods based on independent pairwise relationships. • Experiments on the 20news and la1 benchmarks show that the proposed approach learns better semantic representations than the baseline methods. • Future work • We will extend LLMF to parallel and distributed settings • It is promising to apply LLMF to recommendation systems

  17. References • D. M. Blei, A. Y. Ng, M. I. Jordan, and J. Lafferty. Latent Dirichlet allocation. JMLR, 3, 2003. • D. Cai, X. He, and J. Han. Locally consistent concept factorization for document clustering. TKDE, 23(6):902–913, 2011. • D. Cai, Q. Mei, J. Han, and C. Zhai. Modeling hidden topics on document manifold. CIKM '08, pages 911–920, New York, NY, USA, 2008. ACM. • T. Hofmann. Unsupervised learning by probabilistic latent semantic analysis. Machine Learning, 2001. • S. Huh and S. E. Fienberg. Discriminative topic modeling based on manifold learning. KDD '10, pages 653–662, New York, NY, USA, 2010. ACM.

  18. Thanks!! Q&A
