
Modeling Political Blog Posts with Response


Presentation Transcript


  1. Modeling Political Blog Posts with Response Tae Yano Carnegie Mellon University taey@cs.cmu.edu IBM SMiLe Open House Yorktown Heights, NY October 8, 2009

  2. This talk is about how we are designing topic models for online political discussion

  3. Political blogs Why (should we) study political blogs? • An influential social phenomenon. • An important venue for civil discourse. • Blog text is relatively understudied. • Interest in text analysis from social/political science researchers • Monroe et al., 2009; Hopkins and King, 2009; many others

  4. Political blogs Why (should we) study political blogs? A different, interesting type of text we don’t usually deal with in NLP • Spontaneous text: often ungrammatical, with copious misspellings and colloquialisms • Elusive information needs (“popularity”, “influence”, “trustworthiness”) • Difficult and costly for a classical supervised approach • The text is composed of a mixture of diverse linguistic styles

  5. Political blogs - Illustration

  6. Political blogs - Illustration Posts are often coupled with comment sections. Comment style is casual, creative, and less carefully edited.

  7. Political blogs - Illustration Comments often meander across several themes: on topic (“If the President gets health care”, “Taxes and Fees”), tangents (“The rock that keeps things off the table”), and ranting?

  8. Political blogs - Illustration Posts tend to discuss multiple themes: House Republicans? Government neglect? Energy policy? Oil companies?

  9. Political blogs - Illustration Comments can be constructive and formal (“I am in total agreement … In contrast … My understanding is….”) …or subjective and conversational (“Iowa-Shiowa”)

  10. Political blogs - Illustration Comments can be very long …or quite terse (“Absurd”)

  11. Political blogs - Illustration How should we approach this sort of data? Our approach is to treat it as an instance of topic modeling: Latent Dirichlet Allocation, or LDA (Blei, Ng, and Jordan, 2003)

  12. Topic modeling What does this approach buy us? • A natural way to express the idea that a text is composed of several distinctive components: • A post and its reactions (comments) • A mixture of different themes within one post • Diverse personal styles and pet peeves • A convenient choice for corpora with uncertainty • We can encode hypotheses, and have the model learn from data • Modularity makes it easy to change the model
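For readers who want a concrete picture of the vanilla LDA starting point, here is a minimal sketch using gensim. It is purely illustrative: the talk's experiments use a custom Gibbs sampler (see slide 21 and the HBC reference), and the post tokens and topic count below are placeholders.

```python
# A minimal sketch of fitting vanilla LDA to blog post text with gensim.
# Illustrative only: `post_texts` (one token list per post) and the number
# of topics are placeholders, not the talk's data or settings.
from gensim import corpora, models

post_texts = [
    ["health", "care", "reform", "senate"],
    ["energy", "policy", "oil", "companies"],
    # ... one tokenized document per blog post
]

dictionary = corpora.Dictionary(post_texts)
bow_corpus = [dictionary.doc2bow(doc) for doc in post_texts]

# The number of topics is an arbitrary choice for illustration.
lda = models.LdaModel(bow_corpus, id2word=dictionary, num_topics=5, passes=10)

for topic_id, words in lda.show_topics(num_topics=5, num_words=8, formatted=False):
    print(topic_id, [w for w, _ in words])
```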

  13. CommentLDA Modeling political blogs Our proposed political blog model: z, z′ = topic; w = word (in post); w′ = word (in comments); u = user. D = # of documents; N = # of words in the post; M = # of words in the comments

  14. ß d a zi wi Nd D CommentLDA Modeling political blogs Our proposed political blog model: LHS is vanilla LDA D = # of documents; N = # of words in post; M = # of words in comments

  15. CommentLDA Modeling political blogs Our proposed political blog model: the right-hand side captures the generation of the reaction separately from the post body. The two chambers share the same topic mixture but use two separate sets of word distributions. D = # of documents; N = # of words in the post; M = # of words in the comments

  16. CommentLDA Modeling political blogs Our proposed political blog model: the user IDs of the commenters are generated as part of the comment text, alongside the words in the comment section. D = # of documents; N = # of words in the post; M = # of words in the comments
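Putting slides 13–16 together, the generative story can be sketched as below. The hyperparameters, vocabulary sizes, and variable names are placeholders chosen for illustration, not the settings or code from the paper; the point is only that post and comments share one topic mixture, while words and user IDs come from separate per-topic distributions.

```python
# A sketch of the CommentLDA generative story as described on these slides:
# post and comments share one per-document topic mixture theta, but use
# separate word distributions, and each comment token also emits a user ID.
# All sizes and hyperparameters are placeholders, not the paper's settings.
import numpy as np

K, V_post, V_comm, U = 15, 5000, 5000, 200       # topics, vocab sizes, users
alpha, eta = 0.1, 0.01
rng = np.random.default_rng(0)

beta_post = rng.dirichlet([eta] * V_post, size=K)   # topic -> post words
beta_comm = rng.dirichlet([eta] * V_comm, size=K)   # topic -> comment words
gamma_user = rng.dirichlet([eta] * U, size=K)       # topic -> commenters

def generate_document(n_post_words, n_comment_tokens):
    theta = rng.dirichlet([alpha] * K)              # shared topic mixture
    post = []
    for _ in range(n_post_words):
        z = rng.choice(K, p=theta)
        post.append(rng.choice(V_post, p=beta_post[z]))
    comments = []
    for _ in range(n_comment_tokens):
        z = rng.choice(K, p=theta)
        u = rng.choice(U, p=gamma_user[z])          # commenter ID
        w = rng.choice(V_comm, p=beta_comm[z])      # comment word
        comments.append((u, w))
    return post, comments
```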

  17. CommentLDA Modeling political blogs Three variations on user ID generation: • “Verbosity” (original model): M = # of words in all comments; L = 1 • “Comment frequency”: M = # of comments on the post; L = # of words in the comment • “Response”: M = # of participants in the post; L = # of words by one participant
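One way to read these three variants is as three ways of building the list of user tokens the model must explain for each post. The sketch below uses a hypothetical `comments` structure (a list of (user, word_list) pairs); the real data structures in the paper's implementation may differ.

```python
# How the three counting schemes change the user tokens the model sees for
# one post.  `comments` is a hypothetical list of (user, word_list) pairs.

def user_tokens(comments, scheme):
    tokens = []
    if scheme == "verbosity":             # one user token per comment word
        for user, words in comments:
            tokens.extend([user] * len(words))
    elif scheme == "comment_frequency":   # one user token per comment
        tokens = [user for user, _ in comments]
    elif scheme == "response":            # one user token per distinct participant
        tokens = sorted({user for user, _ in comments})
    return tokens

comments = [("alice", ["taxes", "fees", "health"]),
            ("bob", ["absurd"]),
            ("alice", ["energy", "policy"])]

for scheme in ("verbosity", "comment_frequency", "response"):
    print(scheme, user_tokens(comments, scheme))
# verbosity weighs alice 5x, comment_frequency 2x, response 1x
```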

  18. [Illustration: the same comment thread counted under Verbosity, Comment frequency, and Response] Think of this as encoding a hypothesis about which type of user ought to weigh more!

  19. CommentLDA Modeling political blogs Another model we tried: we took the words out of the comment section, giving a model that is agnostic to the words in the comments. D = # of documents; N = # of words in the post; M = # of words in the comments

  20. Modeling political blogs Another model we tried: LinkLDA. The model is structurally (but not semantically) equivalent to Link LDA (Erosheva et al., 2004; Nallapati and Cohen, 2008). D = # of documents; N = # of words in the post; M = # of words in the comments

  21. Topic discovery What topics did the models discover? What differences are there between the post and the comments? • Data sets: 5 major US blogs collected over a year; this data is available on our website (http://www.ark.cs.cmu.edu/blog-data). • Each site has 1000 to 2000 training posts; details about the data sets are in Yano, Cohen, and Smith (2009). • Inference is implemented with Gibbs sampling. • The following are some topics from the Matthew Yglesias site.
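For intuition about the inference step, below is a minimal collapsed Gibbs sampler for vanilla LDA in the style of Griffiths and Steyvers (2004), cited in the references. This is a generic illustration only: the experiments use an HBC-compiled sampler for the full CommentLDA model, not this code.

```python
# A minimal collapsed Gibbs sampler for vanilla LDA, following the update in
# Griffiths & Steyvers (2004).  Generic illustration, not the HBC-compiled
# sampler used for the experiments.  `docs` is a list of word-ID lists.
import numpy as np

def gibbs_lda(docs, K, V, alpha=0.1, beta=0.01, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), K))          # document-topic counts
    n_kw = np.zeros((K, V))                  # topic-word counts
    n_k = np.zeros(K)                        # topic totals
    z = [rng.integers(K, size=len(d)) for d in docs]
    for d, doc in enumerate(docs):           # initialize counts
        for i, w in enumerate(doc):
            k = z[d][i]
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                  # remove current assignment
                n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
                # p(z_i = k | rest) ~ (n_dk + alpha) * (n_kw + beta) / (n_k + V*beta)
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k                  # add the new assignment back
                n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    return n_dk, n_kw
```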

  22. Topic discovery

  23. Topic discovery

  24. Topic discovery

  25. Comment prediction A guessing game: can we predict which users will react to an unseen post? • Infer the topic mixture for each test post using the fitted model • Rank users according to p(user | post, model) • Envisioned as useful for personalized blog filtering or recommendation systems
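A sketch of the ranking step: given the inferred topic mixture θ for a test post and the fitted topic-to-user distribution, score each user by p(user | post) = Σ_k θ_k · p(user | topic k) and keep the top n. The array names and toy numbers below are illustrative, not the paper's code or results.

```python
# Sketch of the user-ranking step for comment prediction: given an inferred
# topic mixture theta for an unseen post and a fitted topic->user matrix
# gamma_user (K x U), score users by p(user | post) and take the top n.
import numpy as np

def rank_users(theta, gamma_user, top_n=10):
    scores = theta @ gamma_user      # sum_k theta_k * p(user | topic k)
    return np.argsort(-scores)[:top_n]

# Toy example: theta would come from inference on the post words,
# gamma_user from the fitted model.
theta = np.array([0.6, 0.3, 0.1])
gamma_user = np.array([[0.5, 0.4, 0.1],
                       [0.1, 0.2, 0.7],
                       [0.3, 0.3, 0.4]])
print(rank_users(theta, gamma_user, top_n=2))   # indices of the 2 most likely commenters
```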

  26. Comment prediction Precision at top 5, 10, 20, 30 user prediction (bars from left to right: LinkLDA -v, -r, -c; CommentLDA -v, -r, -c). Best results: MY — CommentLDA (R, C): 27.54, 20.54, 14.83, 12.56; RS — LinkLDA (R): 25.19, 16.92, 12.14, 9.82. CommentLDA performs consistently better for the MY site, while LinkLDA is a much better option for RS. Does our model lack the expressive power to reflect site differences? Our models perform at least as well as a word-based Naïve Bayes baseline.
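Precision at top n is read here as the fraction of the n highest-ranked users who actually commented on the held-out post, averaged over posts; a tiny sketch of that metric (this reading of the slide's metric is an assumption, not a quote from the paper):

```python
# Precision at top n for user prediction: of the n highest-ranked users,
# the fraction who actually commented on the held-out post.
def precision_at_n(ranked_users, actual_commenters, n):
    top = ranked_users[:n]
    return sum(u in actual_commenters for u in top) / n

ranked = ["alice", "bob", "carol", "dave", "erin"]
actual = {"bob", "erin", "frank"}
print(precision_at_n(ranked, actual, n=5))   # 0.4
```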

  27. Comment prediction Verbosity vs. Response (CommentLDA on MY; LinkLDA on RS; bars from left to right: cutoff n = 5, 10, 20, and 30 top-ranked users). Variation in user counting does make a difference: giving more weight to verbose users does not help for this task.

  28. Future work What forecasting tasks can our model do? Using CommentLDA to predict the topics of the post given its comments: useful for automatic text categorization or text search when the post has no searchable text.

  29. Future work Can we automatically adjust how much the words influence the topics given the site? • Better comment prediction? • Inferential questions involving multiple sites

  30. Future work Can we guess which posts will collect more responses (number of comments, volume of comments)? • A variant of sLDA (Blei and McAuliffe, 2007) with comments • A LinkLDA-type model is also possible.

  31. Summary Political blogs are an exciting new domain for language and learning research. Topic modeling is a viable framework for analyzing the text of online political discussions. It is convenient and competitive in tasks that have potential uses in real applications.

  32. End of presentation

  33. References • Our published version of this work includes a detailed profile of our data set, as well as more experiments. http://www.aclweb.org/anthology/N/N09/N09-1054.pdf • Please refer back to the original LDA paper for the complete picture. http://www.cs.princeton.edu/~blei/papers/BleiNgJordan2003.pdf • The Gibbs sampling for LDA is detailed in Griffiths & Steyvers, 2004. http://www.pnas.org/cgi/reprint/0307752101v1.pdf • Hierarchical Bayesian Compiler (HBC) used for Gibbs sampling: http://www.cs.utah.edu/~hal/HBC

  34. Comment prediction Precision at top 10 user prediction (bars from left to right: LinkLDA -v, -r, -c; CommentLDA -v, -r, -c; baselines: Freq, NB). Best results: MY — CommentLDA (R): 20.54%; RS — LinkLDA (R): 16.92%; CB — LinkLDA (C): 32.06%. Modest performance (16% to 32% precision), but it compares favorably to the Naïve Bayes baseline.
