
Generative Topic Models for Community Analysis


Presentation Transcript


  1. Generative Topic Models for Community Analysis Pilfered from: Ramesh Nallapati http://www.cs.cmu.edu/~wcohen/10-802/lda-sep-18.ppt

  2. Objectives • Cultural literacy for ML: • Q: What are “topic models”? • A1: popular indoor sport for machine learning researchers • A2: a particular way of applying unsupervised learning of Bayes nets to text • Quick historical survey of some sample papers in the area

  3. Outline • Part I: Introduction to Topic Models • Naive Bayes model • Mixture Models • Expectation Maximization • PLSA • LDA • Variational EM • Gibbs Sampling • Part II: Topic Models for Community Analysis • Citation modeling with PLSA • Citation Modeling with LDA • Author Topic Model • Author Topic Recipient Model • Modeling influence of Citations • Mixed membership Stochastic Block Model

  4. Introduction to Topic Models • Multinomial Naïve Bayes • For each document d = 1,…,M: generate class C_d ~ Mult(· | π) • For each position n = 1,…,N_d: generate w_n ~ Mult(· | β, C_d) [Plate diagram: class node C governing word nodes W1 … WN, repeated over M documents, with word parameters β]
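
As a concrete illustration, here is a minimal NumPy sketch of this generative story; the class prior pi, the word distributions beta, and all sizes are toy values of my own, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

V, C, M, N_d = 5, 2, 3, 8                 # vocab size, classes, documents, doc length (toy)
pi = np.array([0.6, 0.4])                 # class prior (made up)
beta = rng.dirichlet(np.ones(V), size=C)  # one word distribution per class

docs = []
for d in range(M):
    c = rng.choice(C, p=pi)                     # C_d ~ Mult(. | pi)
    words = rng.choice(V, size=N_d, p=beta[c])  # each w_n ~ Mult(. | beta, C_d)
    docs.append((c, words))
```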

  5. Introduction to Topic Models • Naïve Bayes model: compact representation [Plate diagrams: the expanded model with word nodes W1 … WN collapses to a single W node inside an N-plate, nested in the M-document plate, with parameters β]

  6. Introduction to Topic Models • Mixture model: unsupervised naïve Bayes model • Joint probability of words and classes: P(d, c) = π_c ∏_{n=1..N_d} β_{c,w_n} • But classes are not visible, so learning maximizes the marginal P(d) = Σ_c π_c ∏_{n=1..N_d} β_{c,w_n} [Plate diagram: latent class z generating words w, plates N and M, parameter β]
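
Because the class is latent, learning (by EM) maximizes this marginal likelihood. A small sketch of the marginal for one document, computed stably in log space, with toy numbers (all names are mine):

```python
import numpy as np

def doc_log_likelihood(words, pi, beta):
    """log p(w_1..w_N) with the class marginalized out:
    log sum_c pi_c * prod_n beta[c, w_n]."""
    log_p_given_c = np.log(beta[:, words]).sum(axis=1)  # log p(doc | c) for each class
    return np.logaddexp.reduce(np.log(pi) + log_p_given_c)

# toy check: 2 classes over a 3-word vocabulary
pi = np.array([0.5, 0.5])
beta = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.2, 0.7]])
print(doc_log_likelihood([0, 0, 2], pi, beta))
```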

  7. Introduction to Topic Models

  8. Introduction to Topic Models • Probabilistic Latent Semantic Analysis (PLSA) model • Select document d ~ Mult(π) • For each position n = 1,…,N_d: generate z_n ~ Mult(· | θ_d), then generate w_n ~ Mult(· | φ_{z_n}) [Plate diagram: d → z → w inside the N and M plates; θ_d is the per-document topic distribution]
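
A matching sketch of PLSA's generative story; note that `theta` is a lookup table indexed by training document, which is exactly why PLSA cannot generate a new document (toy sizes, assumed names):

```python
import numpy as np

rng = np.random.default_rng(0)
V, K, M, N_d = 5, 3, 4, 6                  # toy sizes
pi = np.ones(M) / M                        # document prior
theta = rng.dirichlet(np.ones(K), size=M)  # topic mixture per *training* document
phi = rng.dirichlet(np.ones(V), size=K)    # word distribution per topic

d = rng.choice(M, p=pi)                    # select document d ~ Mult(pi)
words = []
for n in range(N_d):
    z = rng.choice(K, p=theta[d])          # z_n ~ Mult(. | theta_d)
    words.append(rng.choice(V, p=phi[z]))  # w_n ~ Mult(. | phi_{z_n})
```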

  9. Introduction to Topic Models • Probabilistic Latent Semantic Analysis model • Learning using EM • Not a complete generative model • Has a distribution π over the training set of documents: no new document can be generated! • Nevertheless, more realistic than the mixture model • Documents can discuss multiple topics!

  10. Introduction to Topic Models • PLSA topics (TDT-1 corpus)

  11. Introduction to Topic Models

  12. Introduction to Topic Models • Latent Dirichlet Allocation (LDA) • For each document d = 1,…,M: generate θ_d ~ Dir(· | α) • For each position n = 1,…,N_d: generate z_n ~ Mult(· | θ_d), then generate w_n ~ Mult(· | φ_{z_n}) [Plate diagram: α → θ → z → w inside the N and M plates]
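
The same sketch for LDA; the only change from the PLSA sketch above is that θ_d is now drawn from a Dirichlet prior instead of looked up per training document, so arbitrary new documents can be generated (toy values, assumed names):

```python
import numpy as np

rng = np.random.default_rng(0)
V, K, M, N_d, alpha = 5, 3, 4, 6, 0.1
phi = rng.dirichlet(np.ones(V), size=K)          # word distribution per topic

corpus = []
for d in range(M):
    theta_d = rng.dirichlet(alpha * np.ones(K))  # theta_d ~ Dir(alpha): per-document draw
    words = []
    for n in range(N_d):
        z = rng.choice(K, p=theta_d)             # z_n ~ Mult(. | theta_d)
        words.append(rng.choice(V, p=phi[z]))    # w_n ~ Mult(. | phi_{z_n})
    corpus.append(words)
```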

  13. Introduction to Topic Models • Latent Dirichlet Allocation • Overcomes the issues with PLSA • Can generate new, unseen documents • Parameter learning: • Variational EM • Numerical approximation using lower bounds • Results in biased solutions • Convergence has numerical guarantees • Gibbs sampling • Stochastic simulation • Unbiased solutions • Stochastic convergence

  14. Introduction to Topic Models • Variational EM for LDA • Approximate the true posterior by a simpler, factorized distribution q • Maximize the resulting lower bound on the likelihood, a convex function in each parameter!
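
The equations on this slide were lost in extraction; the standard mean-field bound from Blei et al. (2003), presumably what the slide shows, is:

```latex
\log p(\mathbf{w} \mid \alpha, \beta)
  \;\ge\;
  \mathbb{E}_q\bigl[\log p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)\bigr]
  - \mathbb{E}_q\bigl[\log q(\theta, \mathbf{z})\bigr],
\qquad
q(\theta, \mathbf{z}) = q(\theta \mid \gamma) \prod_{n=1}^{N} q(z_n \mid \phi_n)
```

EM then alternates: maximize the bound over the per-document variational parameters (γ, φ_n) in the E-step, then over the model parameters (α, β) in the M-step.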

  15. Introduction to Topic Models • Gibbs sampling • Applicable when the joint distribution is hard to sample from directly but each variable's conditional distribution is known • The sequence of samples comprises a Markov chain • The stationary distribution of the chain is the joint distribution
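
For LDA in particular, the conditionals have a closed form, giving the well-known collapsed Gibbs sampler. A minimal sketch, assuming count tables already initialized from a random assignment (all variable names are mine, not from the slides):

```python
import numpy as np

def gibbs_sweep(docs, z, n_dk, n_kw, n_k, alpha, beta, rng):
    """One sweep of collapsed Gibbs sampling for LDA.
    docs: list of documents, each a list of word ids.
    z:    parallel list of current topic assignments.
    n_dk, n_kw, n_k: doc-topic, topic-word, and per-topic count tables."""
    K, V = n_kw.shape
    for d, words in enumerate(docs):
        for i, w in enumerate(words):
            k = z[d][i]
            # remove token i from all count tables
            n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
            # p(z_i = k | z_-i, w) up to a constant:
            # (n_dk + alpha) * (n_kw + beta) / (n_k + V*beta)
            p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
            k = rng.choice(K, p=p / p.sum())
            # add the token back under its new topic
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
            z[d][i] = k
```

Each full sweep is one step of the Markov chain; after burn-in, the samples of z approximate the posterior over topic assignments.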

  16. Introduction to Topic Models • LDA topics

  17. Introduction to Topic Models • LDA’s view of a document

  18. Introduction to Topic Models • Perplexity comparison of various models (lower is better) [Plot: perplexity curves for the unigram model, mixture model, PLSA, and LDA]

  19. Outline • Part I: Introduction to Topic Models • Naive Bayes model • Mixture Models • Expectation Maximization • PLSA • LDA • Variational EM • Gibbs Sampling • Part II: Topic Models for Community Analysis • Citation modeling with PLSA • Citation Modeling with LDA • Author Topic Model • Author Topic Recipient Model • Modeling influence of Citations • Mixed membership Stochastic Block Model

  20. Hyperlink modeling using PLSA

  21. Hyperlink modeling using PLSA [Cohn and Hofmann, NIPS 2001] • Select document d ~ Mult(π) • For each position n = 1,…,N_d: generate z_n ~ Mult(· | θ_d), then generate w_n ~ Mult(· | φ_{z_n}) • For each citation j = 1,…,L_d: generate z_j ~ Mult(· | θ_d), then generate c_j ~ Mult(· | γ_{z_j}) [Plate diagram: the document's topic distribution θ_d feeds both a word plate of size N and a citation plate of size L, with parameters φ and γ]
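
A hedged sketch of this extended generative story; `gamma` maps topics to distributions over citable documents (all names and sizes are illustrative assumptions, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
V, K, M, N_d, L_d = 5, 3, 4, 6, 2
theta = rng.dirichlet(np.ones(K), size=M)  # per-document topic mixtures
phi = rng.dirichlet(np.ones(V), size=K)    # topic -> word distributions
gamma = rng.dirichlet(np.ones(M), size=K)  # topic -> cited-document distributions

d = rng.choice(M)                          # d ~ Mult(pi), uniform here for simplicity
words = [rng.choice(V, p=phi[rng.choice(K, p=theta[d])]) for _ in range(N_d)]    # z_n then w_n
cites = [rng.choice(M, p=gamma[rng.choice(K, p=theta[d])]) for _ in range(L_d)]  # z_j then c_j
```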

  22. Hyperlink modeling using PLSA [Cohn and Hofmann, NIPS 2001] • PLSA likelihood: L = ∏_d ∏_{n=1..N_d} Σ_z P(z | θ_d) P(w_n | z) • New likelihood adds a matching citation term: ∏_d ∏_{j=1..L_d} Σ_z P(z | θ_d) P(c_j | z) • Learning using EM

  23. Hyperlink modeling using PLSA [Cohn and Hofmann, NIPS 2001] • Heuristic: weight the content likelihood by α and the hyperlink likelihood by (1 − α), where 0 ≤ α ≤ 1 determines the relative importance of content and hyperlinks
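
In code the heuristic is just a convex combination of the two log-likelihood terms; a trivial sketch (function and argument names are mine):

```python
def joint_log_likelihood(ll_content, ll_links, alpha=0.5):
    """Convex combination of content and hyperlink log-likelihoods;
    alpha = 1 ignores links, alpha = 0 ignores words."""
    assert 0.0 <= alpha <= 1.0
    return alpha * ll_content + (1.0 - alpha) * ll_links
```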

  24. Hyperlink modeling using PLSA [Cohn and Hofmann, NIPS 2001] • Classification performance [Plots: performance using content alone vs. hyperlinks alone]

  25. Hyperlink modeling using LDA

  26. Hyperlink modeling using LDA [Erosheva, Fienberg, Lafferty, PNAS, 2004] • For each document d = 1,…,M: generate θ_d ~ Dir(· | α) • For each position n = 1,…,N_d: generate z_n ~ Mult(· | θ_d), then generate w_n ~ Mult(· | φ_{z_n}) • For each citation j = 1,…,L_d: generate z_j ~ Mult(· | θ_d), then generate c_j ~ Mult(· | γ_{z_j}) • Learning using variational EM [Plate diagram: α → θ_d, with a word plate of size N and a citation plate of size L per document]
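
Relative to the Cohn-Hofmann sketch above, the only generative change is the Dirichlet draw for θ_d; a minimal illustration (assumed names and toy sizes):

```python
import numpy as np

rng = np.random.default_rng(0)
V, K, M, N_d, L_d, alpha = 5, 3, 4, 6, 2, 0.1
phi = rng.dirichlet(np.ones(V), size=K)      # topic -> word distributions
gamma = rng.dirichlet(np.ones(M), size=K)    # topic -> cited-document distributions

theta_d = rng.dirichlet(alpha * np.ones(K))  # theta_d ~ Dir(alpha): now a proper prior
words = [rng.choice(V, p=phi[rng.choice(K, p=theta_d)]) for _ in range(N_d)]
cites = [rng.choice(M, p=gamma[rng.choice(K, p=theta_d)]) for _ in range(L_d)]
```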

  27. Hyperlink modeling using LDA[Erosheva, Fienberg, Lafferty, PNAS, 2004]

  28. Author-Topic Model for Scientific Literature

  29. Author-Topic Model for Scientific Literature [Rosen-Zvi, Griffiths, Steyvers, Smyth, UAI 2004] • For each author a = 1,…,A: generate θ_a ~ Dir(· | α) • For each topic k = 1,…,K: generate φ_k ~ Dir(· | β) • For each document d = 1,…,M, for each position n = 1,…,N_d: generate an author x ~ Unif(· | a_d) from the document's author list, generate z_n ~ Mult(· | θ_x), generate w_n ~ Mult(· | φ_{z_n}) [Plate diagram: author plate A, topic plate K, word plate N inside document plate M; hyperparameters α, β]
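
A minimal sketch of the Author-Topic generative story for one document; the author list, sizes, and names are toy assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
V, K, A, N_d = 5, 3, 2, 6
theta = rng.dirichlet(np.ones(K), size=A)  # theta_a ~ Dir(alpha), one per author
phi = rng.dirichlet(np.ones(V), size=K)    # phi_k ~ Dir(beta), one per topic

a_d = [0, 1]                               # the document's author list (toy)
words = []
for n in range(N_d):
    x = rng.choice(a_d)                    # x ~ Unif(a_d): pick a responsible author
    z = rng.choice(K, p=theta[x])          # z_n ~ Mult(. | theta_x)
    words.append(rng.choice(V, p=phi[z]))  # w_n ~ Mult(. | phi_{z_n})
```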

  30. Author-Topic Model for Scientific Literature [Rosen-Zvi, Griffiths, Steyvers, Smyth, UAI 2004] • Learning: Gibbs sampling [Same plate diagram as the previous slide]

  31. Author-Topic Model for Scientific Literature [Rosen-Zvi, Griffiths, Steyvers, Smyth, UAI 2004] • Topic-Author visualization

  32. Author-Topic-Recipient model for email data [McCallum, Corrada-Emmanuel, Wang, IJCAI’05]

  33. Author-Topic-Recipient model for email data [McCallum, Corrada-Emmanuel, Wang, IJCAI’05] • Gibbs sampling

  34. Author-Topic-Recipient model for email data [McCallum, Corrada-Emmanuel, Wang, IJCAI’05] • Datasets • Enron email data: 23,488 messages between 147 users • McCallum’s personal email: 23,488(?) messages with 128 authors

  35. Author-Topic-Recipient model for email data [McCallum, Corrada-Emmanuel, Wang, IJCAI’05] • Topic Visualization: Enron set

  36. Author-Topic-Recipient model for email data [McCallum, Corrada-Emmanuel, Wang, IJCAI’05] • Topic Visualization: McCallum’s data

  37. Modeling Citation Influences

  38. Modeling Citation Influences[Dietz, Bickel, Scheffer, ICML 2007] • Citation influence model

  39. Modeling Citation Influences[Dietz, Bickel, Scheffer, ICML 2007] • Citation influence graph for LDA paper

  40. Modeling Citation Influences[Dietz, Bickel, Scheffer, ICML 2007] • Words in LDA paper assigned to citations

  41. Link-PLSA-LDA: Topic Influence in Blogs (ICWSM 2008) Ramesh Nallapati, Amr Ahmed, Eric Xing
