Who Needs Polls? Gauging Public Opinion from Twitter Data


Presentation Transcript


  1. Who Needs Polls? Gauging Public Opinion from Twitter Data. David Cummings, Haruki Oh, Ningxuan (Jason) Wang

  2. From Tweets to Poll Numbers • Motivation: People spend millions of dollars on polling every year: politics, economy, entertainment • Millions of posts on Twitter every day • Can we model public opinion using tweets? • Data: 476 million tweets from June to December 2009, courtesy of Jure Leskovec • Public polls from The Gallup Organization (presidential approval, economic confidence) and Rasmussen Reports (generic Congressional ballot) • Goal: high correlation with public opinion polls • All correlation figures are for a 6-day smoothing window
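  A minimal sketch of the evaluation described above, assuming a hypothetical daily Twitter-derived series and a matching daily poll series as inputs: smooth the Twitter signal over a 6-day window, then report its Pearson correlation with the poll.

     # Evaluation sketch: 6-day moving average of a daily Twitter metric,
     # then Pearson correlation with a daily poll series.
     import numpy as np

     def smooth(series, window=6):
         """Moving average over `window` consecutive days."""
         kernel = np.ones(window) / window
         return np.convolve(np.asarray(series, dtype=float), kernel, mode="valid")

     def correlation_with_poll(twitter_metric, poll_values, window=6):
         """Correlation between the smoothed metric and the aligned poll values."""
         smoothed = smooth(twitter_metric, window)
         aligned_poll = np.asarray(poll_values, dtype=float)[window - 1:]
         return np.corrcoef(smoothed, aligned_poll)[0, 1]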

  3. Approach 1: Volume • The simplest metric: the percentage of tweets that mention a given topic in a certain time window • Moderate negative correlation (-36.3%, -35.7%) for the economy and the Congressional ballot: people mention things they want to complain about more often • Higher correlation (52.4%) for Obama
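  The volume metric fits in a one-function sketch; the keyword matching below is an assumption, since the slides do not specify how topic mentions were detected.

     def topic_volume(tweets, keywords):
         """Fraction of tweets in a window that mention any topic keyword."""
         keywords = [k.lower() for k in keywords]
         hits = sum(1 for text in tweets
                    if any(k in text.lower() for k in keywords))
         return hits / len(tweets) if tweets else 0.0

     # Illustrative usage: topic_volume(one_day_of_tweets, ["obama"])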

  4. Approach 2: Generic Sentiment • Can we distinguish between positive and negative sentiment in tweets? • University of Pennsylvania OpinionFinder subjective polarity lexicon • “conceited” strong negative -10 • “ironic” weak negative -5 • “trendy” weak positive +5 • “illuminating” strong positive +10 • Sum the word scores for a tweet to classify it as positive, negative, or neutral; then subtract the negative count from the positive count and normalize over the window
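  A sketch of the lexicon-based scoring, using only the example entries quoted on the slide as a stand-in for the full lexicon; whitespace tokenization is an assumption.

     # Toy lexicon built from the slide's examples; the real lexicon is far larger.
     LEXICON = {"conceited": -10, "ironic": -5, "trendy": 5, "illuminating": 10}

     def classify_tweet(text, lexicon=LEXICON):
         """Label a tweet by the sign of its summed word scores."""
         score = sum(lexicon.get(word, 0) for word in text.lower().split())
         return "positive" if score > 0 else "negative" if score < 0 else "neutral"

     def window_sentiment(tweets, lexicon=LEXICON):
         """(positive count - negative count), normalized by window size."""
         labels = [classify_tweet(t, lexicon) for t in tweets]
         pos, neg = labels.count("positive"), labels.count("negative")
         return (pos - neg) / len(tweets) if tweets else 0.0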

  5. Approach 2: Generic Sentiment • Good results on economic confidence: 60.4% correlation, or 70.1% on a 15-day window • Poor performance on presidential approval and the Congressional ballot: -24.5% and 21.5% correlation, respectively • Is sentiment about politics expressed differently?

  6. Approach 3: LM-based Classification • Train three language models (positive, negative, and neutral) on hand-classified data • Classify each tweet according to the language model that assigns it the highest probability • Applied to the Obama case: manually classified 3,633 tweets • “can we all talk about how awesome Obama is?” • “that Obama sticker on your car might as well say ‘Yes I’m stupid’ #tcot #iamthemob #teaparty #glennbeck” • We then tested the language models: the best performer was a linearly interpolated bigram model
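  A sketch of the classifier described above: one linearly interpolated bigram model per class, with a new tweet assigned to the class whose model gives it the highest log-probability. The interpolation weight and add-one smoothing below are assumptions; the slides only name the model type.

     import math
     from collections import Counter

     class InterpolatedBigramLM:
         def __init__(self, lam=0.7):       # lam = weight on the bigram estimate
             self.lam = lam
             self.unigrams = Counter()
             self.bigrams = Counter()
             self.total = 0

         def train(self, tweets):
             for text in tweets:
                 tokens = ["<s>"] + text.lower().split()
                 self.unigrams.update(tokens)
                 self.bigrams.update(zip(tokens, tokens[1:]))
                 self.total += len(tokens)

         def log_prob(self, text):
             tokens = ["<s>"] + text.lower().split()
             vocab = len(self.unigrams) + 1
             logp = 0.0
             for prev, word in zip(tokens, tokens[1:]):
                 # Add-one smoothed unigram and bigram estimates, linearly interpolated.
                 p_uni = (self.unigrams[word] + 1) / (self.total + vocab)
                 p_bi = (self.bigrams[(prev, word)] + 1) / (self.unigrams[prev] + vocab)
                 logp += math.log(self.lam * p_bi + (1 - self.lam) * p_uni)
             return logp

     def classify(text, models):
         """models: dict of class label -> trained InterpolatedBigramLM."""
         return max(models, key=lambda label: models[label].log_prob(text))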

  7. Approach 3: LM-based Classification • Much-improved results on presidential approval: 49.4% correlation • Throwing out retweets and duplicate tweets helps a little more: 55.9% correlation • Finally, combining both volume and LM-based sentiment gives the best results: 63.3% correlation, or 69.6% on a 15-day window
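  The slides do not say how volume and LM-based sentiment were combined; one plausible reading, sketched here purely as an assumption, is a least-squares fit of the poll on the two smoothed daily series.

     import numpy as np

     def fit_combined(volume, sentiment, poll):
         """Least-squares fit of poll ~ a*volume + b*sentiment + c;
         returns the coefficients and the correlation of the fitted values."""
         X = np.column_stack([volume, sentiment, np.ones(len(volume))])
         coeffs, *_ = np.linalg.lstsq(X, np.asarray(poll, dtype=float), rcond=None)
         predicted = X @ coeffs
         return coeffs, np.corrcoef(predicted, poll)[0, 1]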
