
CoNMF: Exploiting User Comments for Clustering Web2.0 Items

CoNMF: Exploiting User Comments for Clustering Web2.0 Items. Presenter: He Xiangnan 28 June 2013 Email: xiangnan@comp.nus.edu.sg School of Computing National University of Singapore. Introduction. Motivations: Users comment on items based on their own interests.

Presentation Transcript


  1. CoNMF: Exploiting User Comments for Clustering Web2.0 Items
     Presenter: He Xiangnan
     28 June 2013
     Email: xiangnan@comp.nus.edu.sg
     School of Computing, National University of Singapore

  2. Introduction
     • Motivations:
       • Users comment on items based on their own interests.
       • Most users’ interests are limited.
       • The categories of items can be inferred from the comments.
     • Proposed problem:
       • Clustering items by exploiting user comments.
     • Applications:
       • Improve search diversity.
       • Automatic tag generation from comments.
       • Group-based recommendation.
     WING, NUS

  3. Challenges
     • Traditional solution:
       • Represent items in a feature space.
       • Apply any clustering algorithm, e.g. k-means.
     • Key challenges:
       • Items have heterogeneous features:
         • Own features (e.g. words for articles, pixels for images)
         • Comments
           • Usernames
           • Textual contents
       • Simply concatenating all features does not perform well.
       • How can the heterogeneous views be combined meaningfully to produce better clustering (i.e. multi-view clustering)?

  4. Proposed solution
     • Extend NMF (Nonnegative Matrix Factorization) to support multi-view clustering…

  5. NMF (Non-negative Matrix Factorization)
     • Factorize the data matrix V (#doc×#words) as: $V \approx WH$
       • where W is #doc×k and H is k×#words, and every entry of W and H is non-negative.
     • The goal is to minimize the objective function: $J = \|V - WH\|_F^2$
       • where $\|\cdot\|_F$ denotes the Frobenius norm.
     • Alternating optimization:
       • Using Lagrange multipliers, differentiate with respect to W and H in turn.
       • This reaches a local optimum, not a global one!
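The alternating optimization on this slide can be sketched in a few lines of NumPy. The slide does not state the update rules, so this sketch assumes the standard multiplicative updates for the Frobenius objective; the function names `nmf` and `frobenius_loss`, the toy matrix, and the argmax cluster assignment are illustrative choices, not taken from the presentation.

```python
import numpy as np

def nmf(V, k, n_iter=200, eps=1e-9, seed=0):
    """Factorize a non-negative matrix V (n x m) as W (n x k) @ H (k x m).

    Assumption: multiplicative updates for the squared Frobenius loss.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps  # random non-negative initialization
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # Alternate: update H with W fixed, then W with H fixed.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def frobenius_loss(V, W, H):
    """Squared Frobenius reconstruction error ||V - WH||_F^2."""
    return float(np.linalg.norm(V - W @ H, "fro") ** 2)

# Toy usage: 6 "documents", 5 "words", 2 latent clusters.
rng = np.random.default_rng(1)
V = rng.random((6, 5))
W, H = nmf(V, k=2)
labels = W.argmax(axis=1)  # assign each row to its strongest latent factor
```

Because the result is only a local optimum, a different `seed` can give a different factorization, which is why initialization matters for NMF-based methods.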

  6. Characteristics of NMF
     • Matrix factorization with a non-negativity constraint.
     • Reduces the dimension of the data; derives the latent space.
     • Difference from SVD (LSI):
       • Theoretically proven suitable for clustering (Ding et al. 2005).
       • Empirically shown to outperform SVD and k-means in document clustering (Xu et al. 2003).

  7. Extensions of NMF
     • Relationships with other clustering algorithms:
       • K-means: Orthogonal NMF = K-means
       • PLSI: KL-Divergence NMF = PLSI
       • Spectral clustering
     • Extensions:
       • Tri-factor NMF (V = WSH) (Ding et al. 2006)
       • NMF with sparsity constraints (Hoyer 2004)
       • NMF with graph regularization (Cai et al. 2011)
     • However, studies on NMF-based multi-view clustering approaches are quite limited (Liu et al. 2013).
     • My proposal:
       • Extend NMF to support multi-view clustering.

  8. Proposed solution - CoNMF
     • Idea:
       • Couple the factorization process of NMF.
     • Example:
       • Single NMF:
         • Factorization equation: $V \approx WH$
         • Objective function: $J = \|V - WH\|_F^2$
         • Constraints: all entries of W and H are non-negative.
       • 2-view CoNMF:
         • Factorization equation:
         • Objective function:

  9. CoNMF Framework
     • Coupling the factorization process of multiple matrices (i.e. views) via regularization.
     • Objective function:
     • A similar alternating optimization with Lagrange multipliers can solve it.
     • Different options of regularization:
       • Centroid-based (Liu et al. 2013):
       • Mutual-based:
         • Point-wise:
         • Cluster-wise:
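The formulas on this slide did not survive in the transcript, so the following is only a sketch of what coupling via regularization could look like, not the presenter's exact formulation. It assumes a point-wise mutual regularizer that penalizes the squared Frobenius distance between the item factors of every pair of views, a single hypothetical weight `lam`, and multiplicative updates derived from that assumed objective; the names `conmf_pointwise` and `conmf_objective` are illustrative.

```python
import numpy as np

def conmf_objective(views, Ws, Hs, lam=1.0):
    """Assumed objective: sum_s ||V_s - W_s H_s||_F^2 + lam * sum_{s<t} ||W_s - W_t||_F^2."""
    recon = sum(np.linalg.norm(V - W @ H, "fro") ** 2
                for V, W, H in zip(views, Ws, Hs))
    coupling = sum(np.linalg.norm(Ws[s] - Ws[t], "fro") ** 2
                   for s in range(len(Ws)) for t in range(s + 1, len(Ws)))
    return float(recon + lam * coupling)

def conmf_pointwise(views, k, lam=1.0, n_iter=200, eps=1e-9, seed=0):
    """Jointly factorize views V_s (n x m_s) that share the same n items."""
    rng = np.random.default_rng(seed)
    n = views[0].shape[0]
    n_views = len(views)
    Ws = [rng.random((n, k)) + eps for _ in views]
    Hs = [rng.random((k, V.shape[1])) + eps for V in views]
    for _ in range(n_iter):
        for s, V in enumerate(views):
            W, H = Ws[s], Hs[s]
            # H_s only appears in its own reconstruction term.
            H *= (W.T @ V) / (W.T @ W @ H + eps)
            # W_s is additionally pulled towards the other views' item factors.
            others = sum(Ws[t] for t in range(n_views) if t != s)
            W *= (V @ H.T + lam * others) / (
                W @ H @ H.T + lam * (n_views - 1) * W + eps)
    return Ws, Hs

# Toy usage: two views of the same 8 items.
rng = np.random.default_rng(1)
views = [rng.random((8, 6)), rng.random((8, 4))]
Ws, Hs = conmf_pointwise(views, k=3, lam=0.5)
labels = sum(Ws).argmax(axis=1)  # one simple consensus rule (an assumption)
```

Setting `lam=0` reduces this to running an independent NMF on each view, which makes the effect of the coupling term easy to compare.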

  10. Experiments
     • Last.fm dataset:
       • 3-views:
       • Ground-truth:
         • Music type of each artist provided by Last.fm
     • Evaluation metrics:
       • Accuracy and F1
       • Average performance of 20 runs.
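The slide names the metrics but not their definitions. As a hedged illustration, the sketch below assumes clustering accuracy under the best one-to-one mapping from clusters to ground-truth classes and a pairwise F1 over item pairs; both are common choices for clustering evaluation, but whether they match the definitions used in these experiments is an assumption. The brute-force search over mappings only suits a handful of clusters; the Hungarian algorithm is the usual choice for larger problems.

```python
from itertools import combinations, permutations

def clustering_accuracy(true_labels, pred_labels):
    """Accuracy under the best one-to-one cluster-to-class mapping (brute force)."""
    classes = sorted(set(true_labels))
    clusters = sorted(set(pred_labels))
    # Pad with None so every cluster gets a target even if clusters outnumber classes.
    targets = classes + [None] * max(0, len(clusters) - len(classes))
    best = 0
    for perm in permutations(targets, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[p] == t for t, p in zip(true_labels, pred_labels))
        best = max(best, hits)
    return best / len(true_labels)

def pairwise_f1(true_labels, pred_labels):
    """F1 over item pairs: a pair is positive if both items share a cluster."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        tp += same_true and same_pred
        fp += (not same_true) and same_pred
        fn += same_true and (not same_pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy usage: cluster ids are arbitrary, so they are matched to classes first.
true = [0, 0, 1, 1, 2, 2]
pred = [1, 1, 0, 0, 2, 0]
acc = clustering_accuracy(true, pred)  # 5 of 6 items agree under the best mapping
f1 = pairwise_f1(true, pred)
```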

  11. Statistics of datasets
     [Figures: statistics of #items/user; statistics of #clusters/user]
     • P(T<=3) = 0.6229
     • P(T<=5) = 0.8474
     • P(T<=10) = 0.9854
     • This verifies our assumption: each user usually comments on a limited number of music types.
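Probabilities of this kind can be computed as an empirical cumulative distribution. The sketch below assumes that T is the number of distinct music types among the items a user has commented on, which the slide suggests but does not define; the function name `types_per_user_cdf` and the toy data are hypothetical and unrelated to the Last.fm figures above.

```python
from collections import defaultdict

def types_per_user_cdf(comments, item_type, thresholds=(3, 5, 10)):
    """Return {t: P(T <= t)} where T is the number of distinct types per user.

    comments:  iterable of (user, item) pairs
    item_type: dict mapping each item to its ground-truth type
    """
    types_by_user = defaultdict(set)
    for user, item in comments:
        types_by_user[user].add(item_type[item])
    counts = [len(types) for types in types_by_user.values()]
    return {t: sum(c <= t for c in counts) / len(counts) for t in thresholds}

# Toy usage with made-up data.
item_type = {"a": "rock", "b": "rock", "c": "jazz", "d": "pop", "e": "folk"}
comments = [
    ("u1", "a"), ("u1", "b"),                            # u1: 1 type
    ("u2", "a"), ("u2", "c"),                            # u2: 2 types
    ("u3", "a"), ("u3", "c"), ("u3", "d"), ("u3", "e"),  # u3: 4 types
]
cdf = types_per_user_cdf(comments, item_type, thresholds=(1, 2, 3, 4))
```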

  12. Experimental results (Accuracy)
     (1) Users>Comm.>Desc., while combined is best.
     (2) SVD performs badly on users (non-textual).
     (3) Users>Comm.>Desc., while combined does worse.
     (4) Initialization is important for NMF.
     (5) CoNMF-point performs best.
     (6) Other two state-of-the-art baselines.

  13. Experimental results (F1)

  14. Conclusions
     • Comments benefit clustering.
     • Mining different views from the comments is important:
       • The two views (commenting words and users) contribute differently to clustering.
       • For this Last.fm dataset, the users view is more useful.
     • Combining all views works best.
     • For NMF-based methods, initialization is important.

  15. Ongoing
     • More experiments on other datasets.
     • Improve the CoNMF framework by adding sparseness constraints.
     • Study the influence of normalization on CoNMF.

  16. Thanks! Q&A?

  17. References (I)
     • Chris Ding, Xiaofeng He, and Horst D. Simon. 2005. On the equivalence of nonnegative matrix factorization and spectral clustering. In Proc. of SIAM Data Mining Conf. 2005.
     • Wei Xu, Xin Liu, and Yihong Gong. 2003. Document clustering based on non-negative matrix factorization. In Proc. of SIGIR 2003.
     • Chris Ding, Tao Li, and Wei Peng. 2006. Orthogonal nonnegative matrix tri-factorizations for clustering. In Proc. of SIGKDD 2006.
     • Patrik O. Hoyer. 2004. Non-negative matrix factorization with sparseness constraints. Journal of Machine Learning Research, 2004.
     • Deng Cai, Xiaofei He, Jiawei Han, and Thomas S. Huang. 2011. Graph regularized nonnegative matrix factorization for data representation. IEEE Trans. Pattern Anal. Mach. Intell., 2011.
     • Jialu Liu, Chi Wang, Jing Gao, and Jiawei Han. 2013. Multi-view clustering via joint nonnegative matrix factorization. In Proc. of SIAM Data Mining Conf. (SDM’13).
