
Inferring User Political Preferences from Streaming Communications

Svitlana Volkova (1), Glen Coppersmith (2) and Benjamin Van Durme (1, 2). (1) Center for Language and Speech Processing; (2) Human Language Technology Center of Excellence. ACL 2014, Baltimore.


Presentation Transcript


  1. Inferring User Political Preferences from Streaming Communications. Svitlana Volkova (1), Glen Coppersmith (2) and Benjamin Van Durme (1, 2). (1) Center for Language and Speech Processing; (2) Human Language Technology Center of Excellence. ACL 2014, Baltimore

  2. Motivation • Personalized, diverse and timely data • Can reveal user interests, preferences and opinions • Examples: DemographicsPro (http://www.demographicspro.com/), WolframAlpha Analytics (http://www.wolframalpha.com/facebook/)

  3. Applications • Large-scale passive polling and real-time live polling • Online advertising • Healthcare analytics • Personalized recommendation systems and search

  4. User Attribute Prediction [Figure: user attributes predicted from communications]

  5. Existing Approaches • ~1K tweets* • Tweets as a document • Does an average Twitter user produce thousands of tweets? *Rao et al., 2010; Conover et al., 2011; Pennacchiotti and Popescu, 2011a; Burger et al., 2011; Zamal et al., 2012; Nguyen et al., 2013

  6. How Active are Twitter Users? http://www.digitalbuzzblog.com/visualizing-twitter-statistics-x100/

  7. Real-World Predictions • 10%: active users, 1,000+ tweets • 50%: average Twitter users, median = 10 tweets per day • 20%: private users, no content • 20%: not active users, no or limited content

  8. Our Approach • Take advantage of user local neighborhoods • Incremental dynamic real-time predictions [Diagram labels: real-world batch predictions; streaming predictions]

  9. Our Approach • Take advantage of user local neighborhoods • Incremental dynamic real-time predictions [Diagram label: real-world batch predictions]

  10. Attributed Social Network • User local neighborhoods, a.k.a. social circles

  11. Twitter Network Data • Code, data and trained models for gender, age and political preference prediction: http://www.cs.jhu.edu/~svitlana/

  12. Twitter Social Graph • Candidate-Centric: 1,031 users of interest • Geo-Centric: 270 users • Politically Active*: 371 users • 10-20 neighbors of each type per user • ~50K nodes, ~60K edges • What types of neighbors lead to the best attribute prediction for a given user? *Pennacchiotti and Popescu, 2011; Zamal et al., 2012; Cohen and Ruths, 2013 • Code, data and trained models for gender, age and political preference prediction: http://www.cs.jhu.edu/~svitlana/

  13. Experiments • Log-linear binary unigram models: (I) Users vs. (II) Neighbors and (III) Both • Evaluate the relative utility of different neighborhood types, varying neighborhood size n = [1, 2, 5, 10] and content amount t = [5, 10, 15, 25, 50, 100, 200] • 10-fold cross-validation with 100 random restarts for every n and t parameter combination
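The experimental setup on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes "log-linear binary unigram model" means logistic regression over word-presence features, trains it with plain stochastic gradient ascent, and uses invented toy tweets and generic 0/1 labels. The names `build_features`, `train` and `predict_proba` are hypothetical, and cross-validation and random restarts are omitted.

```python
import math
import random

# Parameter grid from the slide: neighborhood size n and content amount t.
NEIGHBORHOOD_SIZES = [1, 2, 5, 10]
CONTENT_AMOUNTS = [5, 10, 15, 25, 50, 100, 200]
GRID = [(n, t) for n in NEIGHBORHOOD_SIZES for t in CONTENT_AMOUNTS]

def binary_unigrams(tweets):
    """Set of distinct lowercase tokens (presence only, no counts)."""
    return {tok for tweet in tweets for tok in tweet.lower().split()}

def build_features(user_tweets, neighbor_tweets, mode, n, t):
    """mode is 'user', 'neighbors' or 'both'; keeps n neighbors, t tweets each."""
    feats = set()
    if mode in ("user", "both"):
        feats |= binary_unigrams(user_tweets[:t])
    if mode in ("neighbors", "both"):
        for tweets in neighbor_tweets[:n]:
            feats |= binary_unigrams(tweets[:t])
    return feats

def sigmoid(score):
    score = max(-30.0, min(30.0, score))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-score))

def predict_proba(model, feats):
    weights, bias = model
    return sigmoid(bias + sum(weights.get(f, 0.0) for f in feats))

def train(examples, epochs=200, lr=0.5, seed=0):
    """examples: list of (feature_set, label) pairs with label in {0, 1}."""
    rng = random.Random(seed)
    weights, bias = {}, 0.0
    data = list(examples)
    for _ in range(epochs):
        rng.shuffle(data)
        for feats, label in data:
            grad = label - predict_proba((weights, bias), feats)
            bias += lr * grad
            for f in feats:
                weights[f] = weights.get(f, 0.0) + lr * grad
    return weights, bias

# Invented toy data: the labels are generic classes, not real users.
TOY_USERS = [
    {"user": ["red one"], "neighbors": [["red apple", "red pear"], ["red plum"]], "label": 1},
    {"user": ["blue one"], "neighbors": [["blue apple"], ["blue fig"]], "label": 0},
    {"user": ["red two"], "neighbors": [["red fig"], ["red kiwi"]], "label": 1},
    {"user": ["blue two"], "neighbors": [["blue pear", "blue kiwi"], ["blue plum"]], "label": 0},
]

def featurize(user, mode, n, t):
    return build_features(user["user"], user["neighbors"], mode, n, t)

model = train([(featurize(u, "neighbors", n=2, t=5), u["label"]) for u in TOY_USERS])
```

Switching `mode` between `"user"`, `"neighbors"` and `"both"` corresponds to models (I), (II) and (III), and looping over `GRID` would cover the 28 combinations of n and t.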

  14. Neighborhood Comparison [Figure: two panels, 1 Neighbor and 10 Neighbors; y-axis: Accuracy; x-axis: Tweets per Neighbor]

  15. Optimizing Twitter API Calls • Cand-Centric Graph: Friend Circle

  16. Optimizing Twitter API Calls • Cand-Centric Graph: Friend Circle

  17. Optimizing Twitter API Calls • Cand-Centric Graph: Friend Circle

  18. Optimizing Twitter API Calls • Cand-Centric Graph: Friend Circle

  19. Summary: Batch Real-World Predictions with Limited User Data • More data is better. How to get it? More neighbors per user > additional content from the existing neighbors • What kind of data? Follower, friend, @mention, retweet • Users who recently joined Twitter • No or limited access to user tweets • Real-world predictions: no or very limited content!

  20. Our Approach • Take advantage of user local neighborhoods • Incremental dynamic real-time predictions [Diagram label: streaming predictions]

  21. Iterative Bayesian Predictions [Figure: belief about an unknown user attribute (?) updated over time]
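A minimal sketch of what an iterative Bayesian update over a tweet stream could look like, assuming a two-class attribute and a smoothed unigram likelihood per tweet. The slide does not specify the likelihood model, so `unigram_likelihood`, the `floor` value and the toy word probabilities are illustrative assumptions rather than the authors' method.

```python
import math

def update_belief(belief, likelihood):
    """One Bayes step: posterior is proportional to prior times P(tweet | label)."""
    unnorm = {label: belief[label] * likelihood[label] for label in belief}
    total = sum(unnorm.values())
    return {label: p / total for label, p in unnorm.items()}

def unigram_likelihood(tweet, word_probs, floor=1e-3):
    """P(tweet | label) as a product of per-word probabilities.

    Words unseen for a label get the small `floor` probability (an assumption
    made here so that no likelihood is ever exactly zero).
    """
    tokens = tweet.lower().split()
    return {
        label: math.prod(probs.get(tok, floor) for tok in tokens)
        for label, probs in word_probs.items()
    }

# Invented toy model: class "A" favors the word "x", class "B" favors "y".
WORD_PROBS = {"A": {"x": 0.6, "y": 0.1}, "B": {"x": 0.1, "y": 0.6}}

def run_stream(tweets, prior):
    """Return the belief after each tweet, starting from `prior`."""
    belief, history = dict(prior), []
    for tweet in tweets:
        belief = update_belief(belief, unigram_likelihood(tweet, WORD_PROBS))
        history.append(belief)
    return history

history = run_stream(["x x", "x y", "x"], prior={"A": 0.5, "B": 0.5})
```

Each tweet's posterior becomes the prior for the next tweet, so the belief can be reported at any point in the stream without retraining on the full history.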

  22. Cand-Centric Graph: Belief Updates [Figure: two panels showing belief about an unknown attribute (?) over time]

  23. Cand-Centric Graph: Prediction Time [Figure labels: User Stream vs. User-Neighbor; confidence levels 95% and 75%; Cand graph, 100 users, 75% confidence]
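One plausible reading of "prediction time" is the number of tweets that must be observed before the belief in either class reaches a confidence threshold such as 75% or 95%. The sketch below works under that assumption for a binary attribute, accumulating log-odds from per-tweet likelihood pairs. The function name `tweets_until_confident` and the toy likelihoods are hypothetical.

```python
import math

def tweets_until_confident(likelihood_pairs, threshold, prior_a=0.5):
    """Count tweets needed until max(P(A), P(B)) >= threshold.

    likelihood_pairs: iterable of (P(tweet | A), P(tweet | B)) tuples.
    Returns the 1-based tweet count, or None if the stream ends first.
    """
    log_odds = math.log(prior_a / (1.0 - prior_a))
    for count, (lik_a, lik_b) in enumerate(likelihood_pairs, start=1):
        log_odds += math.log(lik_a) - math.log(lik_b)
        # Confidence in whichever class is currently favored.
        confidence = 1.0 / (1.0 + math.exp(-abs(log_odds)))
        if confidence >= threshold:
            return count
    return None

# Toy stream: every tweet is twice as likely under class A as under class B.
stream = [(0.2, 0.1)] * 10
at_75 = tweets_until_confident(stream, threshold=0.75)
at_95 = tweets_until_confident(stream, threshold=0.95)
```

With this toy stream the belief in class A after k tweets is 2^k / (2^k + 1), so the 75% threshold is reached after 2 tweets and the 95% threshold after 5. Running the same function on a user-only stream and on a combined user-neighbor stream would give the kind of comparison the slide's labels suggest.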

  24. Batch vs. Online Performance

  25. Summary • Neighborhood content is useful* • Neighborhoods constructed from friends, user mentions and retweets are most effective • Signal is distributed in the neighborhood • Streaming models > batch models *Pennacchiotti and Popescu, 2011a, 2011b; Conover et al., 2011a, 2011b; Golbeck et al., 2011; Zamal et al., 2012

  26. Thank you! • Labeled Twitter network data for gender, age and political preference prediction: http://www.cs.jhu.edu/~svitlana/ • Code and pre-trained models available upon request: svitlana@jhu.edu
