
Modeling Political Blog Posts with Response


Presentation Transcript


  1. Modeling Political Blog Posts with Response Tae Yano Carnegie Mellon University taey@cs.cmu.edu IBM SMiLe Open House Yorktown Heights, NY October 8, 2009

  2. This talk is about how we are designing topic models for online political discussion

  3. Political blogs Why (should we) study political blogs? • An influential social phenomenon. • An important venue for civil discourse. • Blog text is relatively understudied. • Interest in text analysis from social/political science researchers • Monroe et al., 2009; Hopkins and King, 2009; many others

  4. Political blogs Why (should we) study political blogs? A different, interesting type of text we don’t usually deal with in NLP • Spontaneous text: often ungrammatical, with copious misspellings and colloquialisms • Elusive information needs (“popularity”, “influence”, “trustworthiness”) • Difficult and costly for a classical supervised approach • The text is composed of a mixture of diverse linguistic styles

  5. Political blogs - Illustration

  6. Political blogs - Illustration Posts are often coupled with comment sections. Comment style is casual, creative, and less carefully edited.

  7. Political blogs - Illustration Comments often meander across several themes: on topic (“If the President gets health care”, “Taxes and Fees”), tangents (“The rock that keeps things off the table”), and ranting?

  8. Political blogs - Illustration Posts tend to discuss multiple themes: House Republicans? Government neglect? Energy policy? Oil companies?

  9. Political blogs - Illustration Comments can be constructive and formal (“I am in total agreement … In contrast … My understanding is….”) …or subjective and conversational (“Iowa-Shiowa”)

  10. Political blogs - Illustration Comments can be very long …or quite terse (“Absurd”)

  11. Political blogs - Illustration How should we approach this sort of data? Our approach is to treat it as an instance of topic modeling: Latent Dirichlet Allocation, or LDA (Blei, Ng, and Jordan, 2003)

  12. Topic modeling What does this approach buy us? • A natural way to express the idea that a text is composed of several distinctive components: • A post and its reactions (comments) • A mixture of different themes within one post • Diverse personal styles and pet peeves • A convenient choice for corpora with uncertainty • We can encode hypotheses, and have the model learn from data • Modularity makes it easy to change the model
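For readers who want a concrete picture of the vanilla LDA starting point, here is a minimal sketch using gensim. It is purely illustrative: the talk's experiments use a custom Gibbs sampler (see slide 21 and the HBC reference), and the post tokens and topic count below are placeholders.

```python
# A minimal sketch of fitting vanilla LDA to blog post text with gensim.
# Illustrative only: `post_texts` (one token list per post) and the number
# of topics are placeholders, not the talk's data or settings.
from gensim import corpora, models

post_texts = [
    ["health", "care", "reform", "senate"],
    ["energy", "policy", "oil", "companies"],
    # ... one tokenized document per blog post
]

dictionary = corpora.Dictionary(post_texts)
bow_corpus = [dictionary.doc2bow(doc) for doc in post_texts]

# The number of topics is an arbitrary choice for illustration.
lda = models.LdaModel(bow_corpus, id2word=dictionary, num_topics=5, passes=10)

for topic_id, words in lda.show_topics(num_topics=5, num_words=8, formatted=False):
    print(topic_id, [w for w, _ in words])
```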

  13. CommentLDA Modeling political blogs Our proposed political blog model: z, z′ = topic; w = word (in post); w′ = word (in comments); u = user. D = # of documents; N = # of words in the post; M = # of words in the comments

  14. ß d a zi wi Nd D CommentLDA Modeling political blogs Our proposed political blog model: LHS is vanilla LDA D = # of documents; N = # of words in post; M = # of words in comments

  15. CommentLDA Modeling political blogs Our proposed political blog model: the right-hand side captures the generation of the reaction separately from the post body. The two chambers share the same topic mixture but use two separate sets of word distributions. D = # of documents; N = # of words in the post; M = # of words in the comments

  16. CommentLDA Modeling political blogs Our proposed political blog model: the user IDs of the commenters are generated as part of the comment text, alongside the words in the comment section. D = # of documents; N = # of words in the post; M = # of words in the comments
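Putting slides 13–16 together, the generative story can be sketched as below. The hyperparameters, vocabulary sizes, and variable names are placeholders chosen for illustration, not the settings or code from the paper; the point is only that post and comments share one topic mixture, while words and user IDs come from separate per-topic distributions.

```python
# A sketch of the CommentLDA generative story as described on these slides:
# post and comments share one per-document topic mixture theta, but use
# separate word distributions, and each comment token also emits a user ID.
# All sizes and hyperparameters are placeholders, not the paper's settings.
import numpy as np

K, V_post, V_comm, U = 15, 5000, 5000, 200       # topics, vocab sizes, users
alpha, eta = 0.1, 0.01
rng = np.random.default_rng(0)

beta_post = rng.dirichlet([eta] * V_post, size=K)   # topic -> post words
beta_comm = rng.dirichlet([eta] * V_comm, size=K)   # topic -> comment words
gamma_user = rng.dirichlet([eta] * U, size=K)       # topic -> commenters

def generate_document(n_post_words, n_comment_tokens):
    theta = rng.dirichlet([alpha] * K)              # shared topic mixture
    post = []
    for _ in range(n_post_words):
        z = rng.choice(K, p=theta)
        post.append(rng.choice(V_post, p=beta_post[z]))
    comments = []
    for _ in range(n_comment_tokens):
        z = rng.choice(K, p=theta)
        u = rng.choice(U, p=gamma_user[z])          # commenter ID
        w = rng.choice(V_comm, p=beta_comm[z])      # comment word
        comments.append((u, w))
    return post, comments
```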

  17. CommentLDA Modeling political blogs Three variations on user ID generation: • “Verbosity” (original model): M = # of words in all comments; L = 1 • “Comment frequency”: M = # of comments on the post; L = # of words in the comment • “Response”: M = # of participants in the post; L = # of words by one participant
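One way to read these three variants is as three ways of building the list of user tokens the model must explain for each post. The sketch below uses a hypothetical `comments` structure (a list of (user, word_list) pairs); the real data structures in the paper's implementation may differ.

```python
# How the three counting schemes change the user tokens the model sees for
# one post.  `comments` is a hypothetical list of (user, word_list) pairs.

def user_tokens(comments, scheme):
    tokens = []
    if scheme == "verbosity":             # one user token per comment word
        for user, words in comments:
            tokens.extend([user] * len(words))
    elif scheme == "comment_frequency":   # one user token per comment
        tokens = [user for user, _ in comments]
    elif scheme == "response":            # one user token per distinct participant
        tokens = sorted({user for user, _ in comments})
    return tokens

comments = [("alice", ["taxes", "fees", "health"]),
            ("bob", ["absurd"]),
            ("alice", ["energy", "policy"])]

for scheme in ("verbosity", "comment_frequency", "response"):
    print(scheme, user_tokens(comments, scheme))
# verbosity weighs alice 5x, comment_frequency 2x, response 1x
```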

  18. [Illustration: the same comment thread counted under Verbosity, Comment frequency, and Response] Think of this as encoding a hypothesis about which type of user ought to weigh more!

  19. CommentLDA Modeling political blogs Another model we tried: we took the words out of the comment section, giving a model that is agnostic to the words in the comments. D = # of documents; N = # of words in the post; M = # of words in the comments

  20. Modeling political blogs Another model we tried: LinkLDA. The model is structurally (but not semantically) equivalent to Link LDA (Erosheva et al., 2004; Nallapati and Cohen, 2008). D = # of documents; N = # of words in the post; M = # of words in the comments

  21. Topic discovery What topics did the models discover? What differences are there between the post and the comments? • Data sets: 5 major US blogs collected over a year; this data is available on our website (http://www.ark.cs.cmu.edu/blog-data). • Each site has 1000 to 2000 training posts; details about the data sets are in Yano, Cohen, and Smith (2009). • Inference is implemented with Gibbs sampling. • The following are some topics from the Matthew Yglesias site.
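For intuition about the inference step, below is a minimal collapsed Gibbs sampler for vanilla LDA in the style of Griffiths and Steyvers (2004), cited in the references. This is a generic illustration only: the experiments use an HBC-compiled sampler for the full CommentLDA model, not this code.

```python
# A minimal collapsed Gibbs sampler for vanilla LDA, following the update in
# Griffiths & Steyvers (2004).  Generic illustration, not the HBC-compiled
# sampler used for the experiments.  `docs` is a list of word-ID lists.
import numpy as np

def gibbs_lda(docs, K, V, alpha=0.1, beta=0.01, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), K))          # document-topic counts
    n_kw = np.zeros((K, V))                  # topic-word counts
    n_k = np.zeros(K)                        # topic totals
    z = [rng.integers(K, size=len(d)) for d in docs]
    for d, doc in enumerate(docs):           # initialize counts
        for i, w in enumerate(doc):
            k = z[d][i]
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                  # remove current assignment
                n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
                # p(z_i = k | rest) ~ (n_dk + alpha) * (n_kw + beta) / (n_k + V*beta)
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k                  # add the new assignment back
                n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    return n_dk, n_kw
```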

  22. Topic discovery

  23. Topic discovery

  24. Topic discovery

  25. Comment prediction A guessing game: can we predict which users will react to an unseen post? • Infer the topic mixture for each test post using the fitted model • Rank users according to p(user | post, model) • Envisioned as useful for personalized blog filtering or recommendation systems
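A sketch of the ranking step: given the inferred topic mixture θ for a test post and the fitted topic-to-user distribution, score each user by p(user | post) = Σ_k θ_k · p(user | topic k) and keep the top n. The array names and toy numbers below are illustrative, not the paper's code or results.

```python
# Sketch of the user-ranking step for comment prediction: given an inferred
# topic mixture theta for an unseen post and a fitted topic->user matrix
# gamma_user (K x U), score users by p(user | post) and take the top n.
import numpy as np

def rank_users(theta, gamma_user, top_n=10):
    scores = theta @ gamma_user      # sum_k theta_k * p(user | topic k)
    return np.argsort(-scores)[:top_n]

# Toy example: theta would come from inference on the post words,
# gamma_user from the fitted model.
theta = np.array([0.6, 0.3, 0.1])
gamma_user = np.array([[0.5, 0.4, 0.1],
                       [0.1, 0.2, 0.7],
                       [0.3, 0.3, 0.4]])
print(rank_users(theta, gamma_user, top_n=2))   # indices of the 2 most likely commenters
```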

  26. Comment prediction Precision at top 5, 10, 20, 30 user prediction (bars from left to right: LinkLDA -v, -r, -c; CommentLDA -v, -r, -c). Best results: MY — CommentLDA (R, C): 27.54, 20.54, 14.83, 12.56; RS — LinkLDA (R): 25.19, 16.92, 12.14, 9.82. CommentLDA performs consistently better for the MY site, while LinkLDA is a much better option for RS. Does our model lack the expressive power to reflect site differences? Our models perform at least as well as a word-based Naïve Bayes baseline.
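Precision at top n is read here as the fraction of the n highest-ranked users who actually commented on the held-out post, averaged over posts; a tiny sketch of that metric (this reading of the slide's metric is an assumption, not a quote from the paper):

```python
# Precision at top n for user prediction: of the n highest-ranked users,
# the fraction who actually commented on the held-out post.
def precision_at_n(ranked_users, actual_commenters, n):
    top = ranked_users[:n]
    return sum(u in actual_commenters for u in top) / n

ranked = ["alice", "bob", "carol", "dave", "erin"]
actual = {"bob", "erin", "frank"}
print(precision_at_n(ranked, actual, n=5))   # 0.4
```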

  27. Comment prediction Verbosity vs. Response (CommentLDA on MY; LinkLDA on RS; bars from left to right: cutoff n = 5, 10, 20, and 30 top-ranked users). Variation in user counting does make a difference: giving more weight to verbose users does not help for this task.

  28. Future work What forecasting tasks can our model do? Using CommentLDA to predict the topics of the post given its comments: useful for automatic text categorization or text search when the post has no searchable text.

  29. Future work Can we automatically adjust how much the words influence the topics given the site? • Better comment prediction? • Inferential questions involving multiple sites

  30. Future work Can we guess which posts will collect more responses (number of comments, volume of comments)? • A variant of sLDA (Blei and McAuliffe, 2007) with comments • A LinkLDA-type model is also possible.

  31. Summary Political blogs are an exciting new domain for language and learning research. Topic modeling is a viable framework for analyzing the text of online political discussions. It is convenient and competitive in tasks that have potential uses in real applications.

  32. End of presentation

  33. References • Our published version of this work includes a detailed profile of our data set, as well as more experiments. http://www.aclweb.org/anthology/N/N09/N09-1054.pdf • Please refer back to the original LDA paper for the complete picture. http://www.cs.princeton.edu/~blei/papers/BleiNgJordan2003.pdf • The Gibbs sampling for LDA is detailed in Griffiths & Steyvers, 2004. http://www.pnas.org/cgi/reprint/0307752101v1.pdf • Hierarchical Bayesian Compiler (HBC) used for Gibbs sampling: http://www.cs.utah.edu/~hal/HBC

  34. Comment prediction Precision at top 10 user prediction (bars from left to right: LinkLDA -v, -r, -c; CommentLDA -v, -r, -c; baselines: Freq, NB). Best results: MY — CommentLDA (R): 20.54%; RS — LinkLDA (R): 16.92%; CB — LinkLDA (C): 32.06%. Modest performance (16% to 32% precision), but it compares favorably to the Naïve Bayes baseline.
