
Modeling Information Seeking Behavior in Social Media

Modeling Information Seeking Behavior in Social Media. Eugene Agichtein, Emory University. Yandong Liu (2nd-year PhD), Intelligent Information Access Lab (IRLab). Topics: text and data mining, modeling information seeking behavior, Web search and social media search.




Presentation Transcript


  1. Modeling Information Seeking Behavior in Social Media. Eugene Agichtein, Emory University

  2. Yandong Liu (2nd-year PhD), Intelligent Information Access Lab (IRLab) • Text and data mining • Modeling information seeking behavior • Web search and social media search • Tools for medical informatics and public health. External collaborators: Beth Buffalo (Neurology), Charlie Clarke (Waterloo), Ernie Garcia (Radiology), Phil Wolff (Psychology), Hongyuan Zha (GaTech). Also: Ablimit Aji (1st-year PhD), Qi Guo (2nd-year PhD). Supported by:

  3. Online Behavior and Interactions • Information sharing: blogs, forums, discussions • Search logs: queries, clicks • Client-side behavior: gaze tracking, mouse movement, scrolling

  4. Research Overview: discover models of behavior (machine learning / data mining), applied to intelligent search, social media, cognitive diagnostics, and health informatics

  5. Applications that Affect Millions • Search: ranking, evaluation, advertising, search interfaces, medical search (clinicians, patients) • Collaboratively generated content: searcher intent, success, expertise, content quality • Health informatics: self reporting of drug side effects, co-morbidity, outreach/education • Automatic cognitive diagnostics: stress, frustration, Alzheimer’s, Parkinson's, ADHD, ….

  7. (Text) Social Media Today • Published: 4GB/day • Social media: 10GB/day • Technorati + BlogPulse: 120M blogs, 2M posts/day • Twitter (since 11/07): 2M users, 3M msgs/day • Facebook/MySpace: 200-300M users, avg 19 min/day • Yahoo! Answers: 90M users, 20M questions, 400M answers. “Yes, we could read your blog. Or, you could tell us about your day.” [Data from Andrew Tomkins, SSM 2008 keynote]

  8. Total time: 7-10 minutes, active “work”

  9. Someone must know this…

  10. +1 minute

  11. +7 hours: perfect answer

  12. Update (2/15/2009)

  13. http://answers.yahoo.com/question/index;_ylt=3?qid=20071008115118AAh1HdO

  14. Finding Information Online (Revisited) • Next generation of search: algorithmically mediated information exchange • CQA (collaborative question answering): realistic information exchange, searching archives, training NLP/IR/QA systems, studying social behavior and norms • Current and future work: content quality, asker satisfaction

  15. (Some) Related Work • Adamic et al., WWW 2007, WWW 2008: expertise sharing, network structure • Elsas et al., SIGIR 2008: blog search • Glance et al.: BlogPulse, popularity, information sharing • Harper et al., CHI 2008, 2009: answer quality across multiple CQA sites • Kraut et al.: community participation • Kumar et al., WWW 2004, KDD 2008, …: information diffusion in blogspace, network evolution. See also: SIGIR 2009 Workshop on Searching Social Media, http://ir.mathcs.emory.edu/SSM2009/

  16. Finding High Quality Content in SM (E. Agichtein, C. Castillo, D. Donato, A. Gionis, and G. Mishne, Finding High Quality Content in Social Media, WSDM 2008). As judged by professional editors: well-written, interesting, relevant (answer), factually correct. Popular? Provocative? Useful?

  17. Social Media Content Quality

  19. How do Question and Answer Quality relate? (figure slides)

  24. Community

  25. Link Analysis for Authority Estimation — users, questions, and answers form a graph in which the asker acts as a hub and the answerer as an authority
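The hub/authority view of the asker–answerer graph sketched on this slide can be computed with standard HITS power iteration. A minimal sketch, assuming a toy edge list (the users and edges below are hypothetical, not data from the talk):

```python
# Sketch: HITS-style hub/authority estimation on a question-answer graph.
# Edges point from askers (hubs) to the users who answered them (authorities).

def hits(edges, iterations=50):
    """Run HITS power iteration; returns (hub, authority) score dicts."""
    nodes = {u for e in edges for u in e}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of askers pointing at this user
        auth = {n: sum(hub[u] for (u, v) in edges if v == n) for n in nodes}
        # Hub score: sum of authority scores of users this asker points at
        hub = {n: sum(auth[v] for (u, v) in edges if u == n) for n in nodes}
        # Normalize so scores stay bounded
        a_norm = sum(x * x for x in auth.values()) ** 0.5 or 1.0
        h_norm = sum(x * x for x in hub.values()) ** 0.5 or 1.0
        auth = {n: x / a_norm for n, x in auth.items()}
        hub = {n: x / h_norm for n, x in hub.items()}
    return hub, auth

# Hypothetical toy graph: user1 and user2 ask; user3 and user4 answer.
edges = [("user1", "user3"), ("user1", "user4"), ("user2", "user3")]
hub, auth = hits(edges)
# user3 answered questions from both askers, so it gets the top authority score.
assert max(auth, key=auth.get) == "user3"
```

The "HITS effective / HITS ineffective" observations on the next slide suggest this works when answering activity actually correlates with expertise, and fails when it does not.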

  26. Qualitative Observations: HITS effective vs. HITS ineffective

  27. Random forest classifier

  28. Result 1: Identifying High Quality Questions

  29. Top Features for Question Classification • Asker popularity (“stars”) • Punctuation density • Question category • Page views • KL divergence from reference LM
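The KL-divergence feature compares a question's unigram language model against a reference LM. A minimal sketch with add-alpha smoothing; the reference text here is a made-up stand-in for the large background corpus a real feature would use:

```python
# Sketch: KL divergence between a question's unigram LM and a reference LM,
# one of the question-quality features listed above. Texts are hypothetical.
import math
from collections import Counter

def unigram_lm(text, vocab, alpha=0.1):
    """Maximum-likelihood unigram model with add-alpha smoothing over vocab."""
    counts = Counter(text.lower().split())
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def kl_divergence(p, q):
    """KL(p || q) in nats; assumes p and q are defined on the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)

reference = "how do i find good answers to my question online"
question = "how do i find a good restaurant"
vocab = set(reference.split()) | set(question.lower().split())
kl = kl_divergence(unigram_lm(question, vocab), unigram_lm(reference, vocab))
assert kl > 0.0  # the question's word distribution deviates from the reference
```

Intuitively, well-formed questions stay close to the reference model, while spam or gibberish diverges sharply.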

  30. Identifying High Quality Answers

  31. Top Features for Answer Classification • Answer length • Community ratings • Answerer reputation • Word overlap • Kincaid readability score
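Slide 27 names the classifier (a random forest) and this slide names the answer features, so the pipeline can be sketched end to end. The feature values and labels below are synthetic illustrations, not the paper's data:

```python
# Sketch: a random forest over answer features like those listed above.
# All numbers here are made up for illustration.
from sklearn.ensemble import RandomForestClassifier

# Feature order: [answer_length, community_rating, answerer_reputation,
#                 question_answer_word_overlap, kincaid_readability]
X = [
    [250, 4.5, 0.9, 0.30, 8.0],   # long, well-rated answer
    [300, 4.0, 0.8, 0.25, 7.5],
    [12,  1.0, 0.1, 0.02, 3.0],   # short, poorly rated answer
    [20,  1.5, 0.2, 0.05, 2.5],
]
y = [1, 1, 0, 0]  # 1 = high quality, 0 = low quality

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
prediction = clf.predict([[280, 4.2, 0.85, 0.28, 7.8]])[0]
assert prediction == 1  # resembles the high-quality training answers
```

`feature_importances_` on the fitted forest gives a ranking analogous to the "top features" lists on these slides.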

  32. Finding Information Online (Revisited) • Next generation of search: • human-machine-human • CQA: a case study in complex IR • Content quality • Asker satisfaction • Understanding the interactions

  33. Dimensions of “Quality” — as judged by the asker (or community): well-written, interesting, relevant (answer), factually correct. Popular? Timely? Provocative? Useful?

  34. Are Editor Labels “Meaningful” for CGC? • Information seeking process: users want to find useful information about a topic despite incomplete knowledge • N. Belkin: “anomalous states of knowledge” • Goal: model directly whether the user found satisfactory information • A specific (amenable) case: CQA

  35. Yahoo! Answers: The Good News • Active community of millions of users in many countries and languages • Effective for subjective information needs • Great forum for socialization/chat • Can be invaluable for hard-to-find information not available on the web

  36. Yahoo! Answers: The Bad News • May have to wait a long time to get a satisfactory answer • May never obtain a satisfying answer. [Chart: time to close a question (hours), by category: 1. FIFA World Cup 2. Optical 3. Poetry 4. Football (American) 5. Soccer 6. Medicine 7. Winter Sports 8. Special Education 9. General Health Care 10. Outdoor Recreation]

  37. Predicting Asker Satisfaction (Y. Liu, J. Bian, and E. Agichtein, SIGIR 2008; with Jiang Bian and Yandong Liu). Given a question submitted by an asker in CQA, predict whether the user will be satisfied with the answers contributed by the community. “Satisfied”: the asker has closed the question AND selected a best answer AND rated the best answer >= 3 “stars” (the exact number is not important). Otherwise: “Unsatisfied”.

  38. ASP: Asker Satisfaction Prediction — a classifier predicting “asker is satisfied” vs. “asker is not satisfied” from feature groups: question, answer text, asker history, answerer history, category, and external sources (Wikipedia, news)

  39. Experimental Setup: Data. Crawled from Yahoo! Answers in early 2008; “anonymized” dataset available at http://ir.mathcs.emory.edu/shared/. As of 1/2009, Yahoo! Webscope “Comprehensive” Answers dataset: ~5M questions & answers.

  40. Satisfaction by Topic

  41. Satisfaction Prediction: Human Judges. Ground truth: the asker’s own rating, on a random sample of 130 questions. Researchers: agreement 0.82, F1 0.45 (F1 = 2PR/(P+R)). Amazon Mechanical Turk (five workers per question): agreement 0.90, F1 0.61; best when at least 4 out of 5 raters agree.
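The F1 figures on this slide combine precision and recall with the harmonic mean given in the slide's formula. A minimal sketch (the P and R values below are illustrative, not the study's):

```python
# F1 = 2PR / (P + R), the metric used for the human-judge comparison above.
def f1(precision: float, recall: float) -> float:
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

assert f1(1.0, 1.0) == 1.0
assert abs(f1(0.6, 0.3) - 0.4) < 1e-9  # harmonic mean sits below the average
```

Because the harmonic mean punishes imbalance, a judge with high agreement can still score a low F1 on the minority "satisfied" class, which is how humans end up below the random baseline on the next slide.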

  42. Performance: ASP vs. Humans (F1, Satisfied) Human F1 is lower than the random baseline! ASP is significantly more effective than humans

  43. Top Features by Information Gain: 0.14 — Q: Asker’s previous rating; 0.14 — Q: Average past rating by asker; 0.10 — UH: Member since (interval); 0.05 — UH: Average # answers for asker’s past questions; 0.05 — UH: Previous question resolved for the asker; 0.04 — CA: Average asker rating for the category; 0.04 — UH: Total number of answers received; …
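The scores in this ranking are information gain values, i.e. the reduction in label entropy from conditioning on a feature. A minimal sketch on a made-up binary feature (the data below is purely illustrative):

```python
# Sketch: information gain = H(label) - H(label | feature), the ranking
# criterion behind the feature table above. Toy data, not the paper's.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """H(labels) minus the feature-value-weighted conditional entropy."""
    n = len(labels)
    cond = 0.0
    for v in set(feature_values):
        subset = [l for f, l in zip(feature_values, labels) if f == v]
        cond += len(subset) / n * entropy(subset)
    return entropy(labels) - cond

# Hypothetical binary feature: "asker's previous rating was high"
prev_rating_high = [1, 1, 1, 0, 0, 0]
satisfied =        [1, 1, 1, 0, 0, 1]
print(round(information_gain(prev_rating_high, satisfied), 3))  # → 0.459
```

A feature that perfectly split the labels would have gain equal to H(labels); the small values in the table (0.04–0.14) show that no single feature comes close, which is why the classifier combines many of them.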

  44. “Offline” vs. “Online” Prediction. Offline prediction (AFTER answers arrive): all features (question, answer, asker & category), F1 = 0.77. Online prediction (BEFORE the question is posted): no answer features, only asker history and question features (stars, #comments, sum of votes, …), F1 = 0.74.

  45. Personalized Prediction of Satisfaction Same information != same usefulness for different searchers! Personalization vs. “Groupization”? Y. Liu and E. Agichtein, You've Got Answers: Personalized Models for Predicting Success in Community Question Answering, ACL 2008

  46. Example Personalized Models

  47. Outline • Next generation of search: • Algorithmically mediated information exchange • CQA: a case study in complex IR • Content quality • Asker satisfaction
