
UTOPIAN: User-Driven Topic Modeling Based on Interactive Nonnegative Matrix Factorization

Jaegul Choo 1*, Changhyun Lee 1, Chandan K. Reddy 2, and Haesun Park 1
1 Georgia Institute of Technology, 2 Wayne State University
*e-mail: jaegul.choo@cc.gatech.edu



  1. UTOPIAN: User-Driven Topic Modeling Based on Interactive Nonnegative Matrix Factorization
     Jaegul Choo 1*, Changhyun Lee 1, Chandan K. Reddy 2, and Haesun Park 1
     1 Georgia Institute of Technology, 2 Wayne State University
     *e-mail: jaegul.choo@cc.gatech.edu

  2. Intro: Topic Modeling
     [Figure: Documents 1 to 4 and the keywords brain, evolve, dna, genetic, gene, nerve, neuron, life, organism]

  3. Intro: Topic Modeling
     Topic: a distribution over keywords
     [Figure: Documents 1 to 4; Topics 1 to 3; keywords brain, evolve, dna, genetic, gene, nerve, neuron, life, organism]

  4. Intro: Topic Modeling
     Topic: a distribution over keywords
     Document: a distribution over topics
     [Figure: Documents 1 to 4; Topics 1 to 3; keywords brain, evolve, dna, genetic, gene, nerve, neuron, life, organism]

  5. Latent Dirichlet Allocation (LDA) in Visual Analytics
     • LDA has been widely used in visual analytics.
     • TIARA [Wei et al. KDD10], iVisClustering [Lee et al. EuroVis12], ParallelTopics [Dou et al. VAST12], TopicViz [Eisenstein et al. CHI-WIP12], …
     *Image courtesy of original papers.

  6. Overview of Our Work
     • Proposes nonnegative matrix factorization (NMF) for topic modeling.
     • Highlights advantages of NMF over LDA in visual analytics.
     • Presents UTOPIAN, an NMF-based interactive topic modeling system.
     [Figure panels: Keyword-induced topic creation, Topic merging, Doc-induced topic creation, Topic splitting]

  7. What is Nonnegative Matrix Factorization?

  8. Nonnegative Matrix Factorization (NMF)
     Lower-rank approximation with nonnegativity constraints: $A \approx WH$
     $$\min_{W \ge 0,\, H \ge 0} \| A - WH \|_F$$
     Why nonnegativity?
     • Easy interpretation and semantically meaningful output
     Algorithm
     • Alternating nonnegativity-constrained least squares [Kim et al., 2008]
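The factorization on this slide can be sketched in a few lines of NumPy. This is a minimal illustration only: it uses the simpler multiplicative-update rule rather than the alternating nonnegativity-constrained least squares algorithm cited on the slide, and the function name `nmf_multiplicative`, the toy matrix `A`, and the convention that rows of `A` are keywords and columns are documents are all assumptions made for the example.

```python
import numpy as np

def nmf_multiplicative(A, k, n_iter=200, seed=0, eps=1e-10):
    """Approximate A (m x n, nonnegative) as W @ H with W, H >= 0.

    Uses multiplicative updates, a simple stand-in for the ANLS
    algorithm named on the slide (assumption: not the authors' solver).
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k)) + eps  # m x k, strictly positive start
    H = rng.random((k, n)) + eps  # k x n, strictly positive start
    for _ in range(n_iter):
        # Each update multiplies by a nonnegative ratio, so W and H stay >= 0.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical 5-keyword x 4-document matrix with two disjoint blocks.
A = np.array([
    [2.0, 4.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 6.0],
    [0.0, 0.0, 1.0, 2.0],
])

W, H = nmf_multiplicative(A, k=2)
err = np.linalg.norm(A - W @ H, "fro")  # the quantity minimized on the slide
print("reconstruction error:", err)
```

Because both factors remain nonnegative, each column of `W` can be read as an additive group of keywords, which is the interpretability argument the slide makes.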

  9. NMF as Topic Modeling
     $A \approx WH$
     Topic: a distribution over keywords
     Document: a distribution over topics
     [Figure: Documents 1 to 4; Topics 1 to 3; keywords brain, evolve, dna, genetic, gene, nerve, neuron, life, organism]
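To make the topic-model reading of the two factors concrete, the sketch below lists the top keywords per topic and normalizes each document's topic weights. The keyword list comes from the slide; the numeric values of `W` and `H`, the helper names `top_keywords` and `column_normalize`, and the assumption that columns of `W` are topics and columns of `H` are documents are illustrative only.

```python
import numpy as np

vocab = ["brain", "evolve", "dna", "genetic", "gene",
         "nerve", "neuron", "life", "organism"]

# Hypothetical keyword x topic factor (9 keywords, 3 topics).
W = np.array([
    [0.9, 0.0, 0.0],  # brain
    [0.0, 0.0, 0.8],  # evolve
    [0.0, 0.9, 0.0],  # dna
    [0.0, 0.8, 0.0],  # genetic
    [0.0, 0.7, 0.1],  # gene
    [0.7, 0.0, 0.0],  # nerve
    [0.8, 0.0, 0.0],  # neuron
    [0.0, 0.1, 0.7],  # life
    [0.0, 0.0, 0.9],  # organism
])

# Hypothetical topic x document factor (3 topics, 4 documents).
H = np.array([
    [4.0, 1.0, 0.0, 3.0],
    [1.0, 7.0, 1.0, 3.0],
    [0.0, 2.0, 9.0, 4.0],
])

def top_keywords(W, vocab, n=3):
    """Return the n highest-weighted keywords for each topic (column of W)."""
    order = np.argsort(-W, axis=0)[:n]  # shape (n, number of topics)
    return [[vocab[i] for i in order[:, t]] for t in range(W.shape[1])]

def column_normalize(M):
    """Scale every column to sum to 1 so it reads as a distribution."""
    return M / M.sum(axis=0, keepdims=True)

topics = top_keywords(W, vocab)
doc_topics = column_normalize(H)  # column j: topic mix of document j
print(topics)
print(doc_topics[:, 0])
```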

  10. Why NMF in Visual Analytics?

  11. Advantages of NMF in Visual Analytics
      • Reliable algorithmic behaviors
      • Flexible support for user interactions

  12. NMF vs. LDA: Consistency from Multiple Runs
      [Figure: documents’ topical membership changes among 10 runs, for the InfoVis/VAST paper data set and the 20 Newsgroups data set]
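One possible way to quantify how much documents' topical membership changes between two runs is the Rand index, which is unaffected by topics being numbered differently from run to run. This is an assumption for illustration; the slide does not state which measure was used, and the function name `rand_index` is hypothetical.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of document pairs on which two runs agree.

    A pair agrees when both runs put the two documents in the same
    topic, or both runs put them in different topics.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both runs must label the same documents")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Same grouping with swapped topic ids: fully consistent.
same = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# A different grouping: only 2 of the 6 pairs agree.
different = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(same, different)
```

Averaging this score over all pairs of runs would give a single consistency number per method and data set.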

  13. NMF vs. LDA: Empirical Convergence
      [Figure: documents’ topical membership changes between iterations, InfoVis/VAST paper data set; annotations: 48 seconds (NMF), 10 minutes (LDA)]

  14. NMF vs. LDA: Topic Summary (Top Keywords)
      InfoVis/VAST paper data set
      • Topics are more consistent in NMF than in LDA.
      • Topic quality is comparable between NMF and LDA.

  15. Advantages of NMF in Visual Analytics
      • Reliable algorithmic behaviors
      • Flexible support for user interactions

  16. Weakly Supervised NMF [Choo et al., DMKD, accepted with rev.]
      $$\min_{W \ge 0,\, H \ge 0} \| A - WH \|_F^2 + \alpha \| (W - W_r) M_W \|_F^2 + \beta \| M_H (H - D_H H_r) \|_F^2$$
      • $W_r$, $H_r$: reference matrices for $W$ and $H$
      • $M_W$, $M_H$: diagonal matrices for weighting/masking columns/rows of $W$ and $H$
      • Provides flexible yet intuitive means for user interaction.
      • Maintains the same computational complexity as original NMF.
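The objective on this slide can be evaluated directly, which helps show how the masks switch supervision on and off per topic. The sketch below only computes the objective value, not a solver. The function name `ws_nmf_objective`, the toy matrices, and the shape choices (`H` as topics x documents, `M_W`, `M_H`, and `D_H` all as k x k diagonal matrices so the products are conformable) are assumptions made for the example.

```python
import numpy as np

def ws_nmf_objective(A, W, H, W_r, H_r, M_W, M_H, D_H, alpha, beta):
    """Value of ||A-WH||_F^2 + alpha||(W-W_r)M_W||_F^2 + beta||M_H(H-D_H H_r)||_F^2."""
    fit = np.linalg.norm(A - W @ H, "fro") ** 2
    w_penalty = np.linalg.norm((W - W_r) @ M_W, "fro") ** 2
    h_penalty = np.linalg.norm(M_H @ (H - D_H @ H_r), "fro") ** 2
    return fit + alpha * w_penalty + beta * h_penalty

# Hypothetical toy problem: 3 keywords, 2 topics, 2 documents.
W = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
H = np.array([[1.0, 2.0], [3.0, 4.0]])
A = W @ H                      # perfect fit, so the first term is 0

W_r = W.copy()
W_r[:, 0] += 1.0               # reference disagrees with W on topic 0 only
H_r = H.copy()                 # reference agrees with H
I = np.eye(2)

# Supervise topic 0 only: the mismatch in column 0 is penalized.
on = ws_nmf_objective(A, W, H, W_r, H_r, np.diag([1.0, 0.0]), I, I, 2.0, 1.0)
# Supervise topic 1 only: the mismatch in column 0 is masked out.
off = ws_nmf_objective(A, W, H, W_r, H_r, np.diag([0.0, 1.0]), I, I, 2.0, 1.0)
print(on, off)
```

In this toy case the three mismatched entries of column 0 each contribute 1, so the penalized value is `alpha * 3`, and masking that column removes the penalty entirely.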

  17. UTOPIAN: User-Driven Topic Modeling Based on Interactive NMF
      [Figure panels: Topic merging, Keyword-induced topic creation, Doc-induced topic creation, Topic splitting]

  18. UTOPIAN Overview
      Supervised t-distributed stochastic neighbor embedding (t-SNE)
      User interactions supported
      • Keyword refinement
      • Topic merging/splitting
      • Keyword-/document-induced topic creation
      Real-time interaction via PIVE (Per-Iteration Visualization Environment)
      [Figure panels: Keyword-induced topic creation, Topic merging, Doc-induced topic creation, Topic splitting]

  19. Supervised t-SNE
      Original t-SNE
      • Documents are often too noisy to work with.
      Supervised t-SNE
      • $d(x_i, x_j) \leftarrow \alpha \cdot d(x_i, x_j)$ if $x_i$ and $x_j$ belong to the same topic cluster.
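The distance rule on this slide amounts to rescaling a pairwise distance matrix before it is handed to t-SNE. A minimal sketch follows, assuming that `alpha` is below 1 so that documents in the same topic cluster are pulled together; the function name `supervise_distances` and the toy distances are hypothetical, and the t-SNE step itself is not shown.

```python
import numpy as np

def supervise_distances(D, labels, alpha=0.5):
    """Scale d(x_i, x_j) by alpha when i and j share a topic cluster."""
    labels = np.asarray(labels)
    same_cluster = labels[:, None] == labels[None, :]  # n x n boolean mask
    return np.where(same_cluster, alpha * D, D)

# Hypothetical pairwise distances among three documents.
D = np.array([
    [0.0, 2.0, 4.0],
    [2.0, 0.0, 6.0],
    [4.0, 6.0, 0.0],
])
labels = [0, 0, 1]  # documents 0 and 1 share a topic cluster

D_sup = supervise_distances(D, labels, alpha=0.5)
print(D_sup)
```

The adjusted matrix could then be passed to any t-SNE implementation that accepts precomputed distances.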

  20. PIVE (Per-Iteration Visualization Environment) for Real-time Interaction [Choo et al., under revision]
      [Figure panels: Standard approach, PIVE approach]
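The per-iteration idea can be sketched with a generator that hands back the intermediate factors after every update, so a front end can redraw without waiting for convergence. This is only an illustration of the general pattern, not the PIVE implementation: the generator name `nmf_iterations`, the multiplicative-update rule, the toy matrix, and the list `snapshots` standing in for a rendering call are all assumptions.

```python
import numpy as np

def nmf_iterations(A, k, n_iter=20, seed=0, eps=1e-10):
    """Yield (iteration, W, H) after every NMF update step."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for it in range(1, n_iter + 1):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
        # Copies let the consumer keep a snapshot while updates continue.
        yield it, W.copy(), H.copy()

# Hypothetical 5-keyword x 4-document matrix.
A = np.array([
    [2.0, 4.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 6.0],
    [0.0, 0.0, 1.0, 2.0],
])

snapshots = []
for it, W, H in nmf_iterations(A, k=2, n_iter=20):
    labels = H.argmax(axis=0)                # current topic of each document
    snapshots.append((it, labels.tolist()))  # stand-in for redrawing the view

print(snapshots[0], snapshots[-1])
```

A standard pipeline would instead run the loop to completion and draw once; consuming the generator step by step is what lets a user see and react to early results.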

  21. Demo Video: http://tinyurl.com/UTOPIAN2013

  22. Usage Scenario: Hyundai Genesis Review Data
      [Figure panels: Initial result, After interaction]

  23. Summary
      • Presented UTOPIAN, a user-driven topic modeling system based on interactive NMF.
      • Highlighted the advantages of NMF over LDA in visual analytics.
        • Reliable algorithmic behaviors
          • Consistency from multiple runs
          • Early empirical convergence
        • Flexible support for user interactions
          • Keyword refinement
          • Topic merging/splitting
          • Keyword-/document-induced topic creation

  24. More in the Paper & Ongoing Work
      More in the paper
      • A general taxonomy of user interactions with computational methods
        • Keyword-based vs. document-based
        • Template-based vs. from-scratch-based
      • Algorithmic details about supported user interactions
      • Implementation details
      • More usage scenarios
      Ongoing work
      • Scaling up the system with parallel distributed NMF

  25. Thank you!
      Jaegul Choo, jaegul.choo@cc.gatech.edu, http://www.cc.gatech.edu/~joyfull/
      Demo video: http://tinyurl.com/UTOPIAN2013
      For more details, please find me at ‘Meet the Candidate’, A601 + A602, 6 PM today.
      [Figure panels: Topic merging, Keyword-induced topic creation, Doc-induced topic creation, Topic splitting]
