CoNMF: Exploiting User Comments for Clustering Web2.0 Items Presenter: He Xiangnan 28 June 2013 Email: xiangnan@comp.nus.edu.sg School of Computing National University of Singapore
Introduction • Motivations: • Users comment on items based on their own interests. • Most users have a limited range of interests. • The categories of items can therefore be inferred from the comments. • Proposed problem: • Clustering items by exploiting user comments. • Applications: • Improve search diversity. • Automatic tag generation from comments. • Group-based recommendation. WING, NUS
Challenges • Traditional solution: • Represent items in a feature space. • Apply any clustering algorithm, e.g. k-means. • Key challenges: • Items have heterogeneous features: • Own features (e.g. words for articles, pixels for images) • Comments • Usernames • Textual contents • Simply concatenating all features does not perform well. • How to meaningfully combine the heterogeneous views to produce better clustering (i.e. multi-view clustering)? WING, NUS
Proposed solution • Extend NMF (Nonnegative Matrix Factorization) to support multi-view clustering… WING, NUS
NMF (Non-negative Matrix Factorization) • Factorize the data matrix V (#doc × #words) as V ≈ WH, • where W is #doc × k, H is k × #words, and every entry of W and H is non-negative. • Goal: minimize the objective function J = ||V − WH||_F², • where || ||_F denotes the Frobenius norm. • Alternating optimization: • With Lagrange multipliers, differentiate with respect to W and H in turn, yielding multiplicative update rules. Local optimum, not global! WING, NUS
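To make the alternating optimization concrete, here is a minimal NumPy sketch of the standard multiplicative update rules for this objective; the names V, W, H and the rank k follow the slide, while the iteration count, epsilon, and random initialization are illustrative assumptions rather than the presenter's actual settings.

```python
import numpy as np

def nmf(V, k, n_iter=200, eps=1e-9, seed=0):
    """Minimal NMF via multiplicative updates: V (n_doc x n_word) ~ W (n_doc x k) @ H (k x n_word)."""
    rng = np.random.default_rng(seed)
    n_doc, n_word = V.shape
    W = rng.random((n_doc, k))   # non-negative random initialization
    H = rng.random((k, n_word))
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative and
        # monotonically decrease ||V - WH||_F^2 (Lee & Seung style).
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Usage: the cluster of document i is the argmax of row i of W, e.g.
# W, H = nmf(V, k=10); labels = W.argmax(axis=1)
```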
Characteristics of NMF • Matrix factorization with a non-negativity constraint • Reduces the dimensionality of the data and derives a latent space • Differences from SVD (LSI): • Theoretically proven suitable for clustering (Ding et al. 2005) • Practically shown to outperform SVD and k-means in document clustering (Xu et al. 2003)
Extensions of NMF • Relationships with other clustering algorithms: • K-means: orthogonal NMF is equivalent to k-means • PLSI: NMF with KL-divergence loss is equivalent to PLSI • Spectral clustering • Extensions: • Tri-factorization of NMF (V = WSH) (Ding et al. 2006) • NMF with sparseness constraints (Hoyer 2004) • NMF with graph regularization (Cai et al. 2011) • However, studies on NMF-based multi-view clustering are quite limited (Liu et al. 2013). • My proposal: • Extend NMF to support multi-view clustering WING, NUS
Proposed solution - CoNMF • Idea: • Couple the factorization processes of NMF across views. • Example: • Single NMF: • Factorization equation: V ≈ WH • Objective function: J = ||V − WH||_F² • Constraints: all entries of W and H are non-negative. • 2-view CoNMF: • Factorization equations: V1 ≈ W1H1, V2 ≈ W2H2 • Objective function: the two reconstruction errors plus a coupling term that ties W1 and W2 together (see the sketch after this slide). WING, NUS
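As an illustration of how the two factorizations could be coupled point-wise, here is one possible form of the 2-view objective, assuming the coupling is a Frobenius-norm penalty between the two item-factor matrices; the weights λ1, λ2, λ12 are illustrative and not necessarily the exact formulation from the slides:

```latex
\min_{W_1, W_2, H_1, H_2 \,\ge\, 0}\;
\lambda_1 \,\| V_1 - W_1 H_1 \|_F^2
\;+\; \lambda_2 \,\| V_2 - W_2 H_2 \|_F^2
\;+\; \lambda_{12} \,\| W_1 - W_2 \|_F^2
```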
CoNMF Framework • Idea: coupling the factorization processes of multiple matrices (i.e. views) via regularization. • Objective function: the sum of per-view reconstruction errors plus regularization terms that couple the views (see the sketch after this slide). • A similar alternating optimization with Lagrange multipliers can solve it. • Different options of regularization: • Mutual-based: • Point-wise • Cluster-wise • Centroid-based (Liu et al. 2013) WING, NUS
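Below is a minimal NumPy sketch of such a coupled factorization for two views under the assumed point-wise penalty ||W1 − W2||_F² from the sketch above; the multiplicative-style updates with the extra coupling terms are an illustrative derivation, not the presenter's exact algorithm.

```python
import numpy as np

def conmf_pointwise(V1, V2, k, lams=(1.0, 1.0, 0.5), n_iter=200, eps=1e-9, seed=0):
    """Illustrative 2-view coupled NMF: V1 ~ W1 @ H1, V2 ~ W2 @ H2,
    plus an assumed point-wise penalty lam12 * ||W1 - W2||_F^2
    that ties the two item-factor matrices together."""
    lam1, lam2, lam12 = lams
    rng = np.random.default_rng(seed)
    n = V1.shape[0]                      # both views index the same items
    W1 = rng.random((n, k))
    H1 = rng.random((k, V1.shape[1]))
    W2 = rng.random((n, k))
    H2 = rng.random((k, V2.shape[1]))
    for _ in range(n_iter):
        # Standard multiplicative updates for the per-view basis matrices.
        H1 *= (W1.T @ V1) / (W1.T @ W1 @ H1 + eps)
        H2 *= (W2.T @ V2) / (W2.T @ W2 @ H2 + eps)
        # Coupled updates: the extra terms pull W1 and W2 toward each other
        # while keeping all entries non-negative.
        W1 *= (lam1 * (V1 @ H1.T) + lam12 * W2) / (lam1 * (W1 @ H1 @ H1.T) + lam12 * W1 + eps)
        W2 *= (lam2 * (V2 @ H2.T) + lam12 * W1) / (lam2 * (W2 @ H2 @ H2.T) + lam12 * W2 + eps)
    return W1, W2, H1, H2

# Item clusters can be read off the coupled factors, e.g.
# labels = ((W1 + W2) / 2).argmax(axis=1)
```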
Experiments • Last.fm dataset: • 3 views: item descriptions (Desc.), comment words (Comm.), and commenting users (Users). • Ground truth: • Music type of each artist, provided by Last.fm. • Evaluation metrics: • Accuracy and F1. • Average performance over 20 runs. WING, NUS
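For reference, clustering accuracy is commonly computed by finding the best one-to-one mapping between predicted clusters and ground-truth classes; the sketch below uses SciPy's Hungarian-algorithm solver. The function name is ours, and this is the common definition rather than necessarily the exact evaluation script used in the work.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    """Clustering accuracy: best one-to-one matching between predicted
    clusters and ground-truth classes (Hungarian algorithm), then the
    fraction of items that fall in a matched (cluster, class) pair."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    classes = np.unique(y_true)
    clusters = np.unique(y_pred)
    # Contingency table: rows = predicted clusters, columns = true classes.
    counts = np.zeros((len(clusters), len(classes)), dtype=int)
    for i, c in enumerate(clusters):
        for j, t in enumerate(classes):
            counts[i, j] = np.sum((y_pred == c) & (y_true == t))
    row, col = linear_sum_assignment(-counts)   # maximize matched counts
    return counts[row, col].sum() / len(y_true)
```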
Statistics of datasets • [Figures: distributions of #items/user and #clusters/user] • P(T ≤ 3) = 0.6229, P(T ≤ 5) = 0.8474, P(T ≤ 10) = 0.9854, where T is the number of clusters (music types) a user comments on. • This verifies our assumption: each user usually comments on a limited number of music types. WING, NUS
Experimental results (Accuracy) • [Annotations on the accuracy table, which is not reproduced here:] • 1. Users > Comm. > Desc., while the combined views are best. • 2. SVD performs badly on the users view (non-textual). • 3. Users > Comm. > Desc., while the combined views do worse. • 4. Initialization is important for NMF. • 5. CoNMF-point performs best. • 6. The other two are state-of-the-art baselines. WING, NUS
Experimental results (F1) WING, NUS
Conclusions • Comments benefit clustering. • Mining different views from the comments is important: • The two views (commenting words and users) contribute differently to clustering. • For this Last.fm dataset, the users view is more useful. • Combining all views works best. • For NMF-based methods, initialization is important. WING, NUS
Ongoing work • More experiments on other datasets. • Improve the CoNMF framework by adding sparseness constraints. • Study the influence of normalization on CoNMF. WING, NUS
Thanks! Q&A? WING, NUS
References (I) • Chris Ding, Xiaofeng He, and Horst D. Simon. 2005. On the equivalence of nonnegative matrix factorization and spectral clustering. In Proc. of SDM 2005. • Wei Xu, Xin Liu, and Yihong Gong. 2003. Document clustering based on non-negative matrix factorization. In Proc. of SIGIR 2003. • Chris Ding, Tao Li, and Wei Peng. 2006. Orthogonal nonnegative matrix tri-factorizations for clustering. In Proc. of SIGKDD 2006. • Patrik O. Hoyer. 2004. Non-negative matrix factorization with sparseness constraints. Journal of Machine Learning Research 2004. • Deng Cai, Xiaofei He, Jiawei Han, and Thomas S. Huang. 2011. Graph regularized nonnegative matrix factorization for data representation. IEEE Trans. Pattern Anal. Mach. Intell. 2011. • Jialu Liu, Chi Wang, Jing Gao, and Jiawei Han. 2013. Multi-view clustering via joint nonnegative matrix factorization. In Proc. of SDM 2013. WING, NUS