
Topical search in the Twitter OSN



  1. Topical search in the Twitter OSN Saptarshi Ghosh Collaborators: Naveen Sharma, Parantapa Bhattacharya, Niloy Ganguly (IITKGP) Muhammad Bilal Zafar, Krishna Gummadi (MPI-SWS)

  2. Topical search in Twitter • Twitter has emerged as an important source of information & real-time news • Search for breaking news and trending topics • Topical search • Searching for topical experts • Searching for information on specific topics • Primary requirement: Identify topical expertise of users

  3. Profile of a Twitter user

  4. Example tweets

  5. Prior approaches to find topic experts • Research studies: Pal et al. (WSDM 2011) uses 15 features from tweets and the social network to identify topical experts; Weng et al. (WSDM 2010) uses a machine-learning approach • Application systems: Twitter Who To Follow (WTF), Wefollow, … Methodology not fully public, but reported to utilize several features

  6. Prior approaches use features extracted from • User profiles – screen-name, bio, … • Tweets posted by a user – hashtags, others retweeting a given user, … • Social graph of a user – number of followers, PageRank, …

  7. Problems with prior approaches • User profiles (screen-name, bio, …) – bio often does not give meaningful information • Tweets posted by a user – tweets mostly contain day-to-day conversation • Social graph of a user (number of followers, PageRank) – helps to identify authoritative users, but does not provide topical information

  8. We propose … • Use a completely different feature to infer topics of expertise for an individual Twitter user • Utilize social annotations • How does the Twitter crowd describe a user? • Social annotations obtained through Twitter Lists • Approach essentially relies on crowdsourcing

  9. Twitter Lists • Primarily an organizational feature • Used to organize the people one is following • Create a named list, add an optional List description • Add related users to the List • Tweets posted by these users will be grouped together as a separate stream
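
For orientation, a List amounts to little more than a name, an optional description, and a set of member accounts. A minimal sketch of that shape (a hypothetical data model for illustration, not Twitter's actual API schema), which the mining sketches on later slides build on:

```python
from dataclasses import dataclass, field

@dataclass
class TwitterList:
    """Minimal model of a Twitter List: a named, optionally described
    collection of accounts whose tweets appear as one merged stream."""
    name: str                                        # e.g. "tech-journalists"
    description: str = ""                            # optional free-text description
    members: set[str] = field(default_factory=set)   # user IDs added to the List
```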

  10. How do Lists work?

  11. Using Lists to infer topics for users • If U is an expert / authority in a certain topic • U likely to be included in several Lists • List names / descriptions provide valuable semantic cues to the topics of expertise of U

  12. Inferring topical attributes of users

  13. Dataset • Collected Lists of 55 million Twitter users who joined before or in 2009 • 88 million Lists collected in total • All studies consider the 1.3 million users who are included in 10 or more Lists • Most List names / descriptions are in English, but a significant fraction are also in French, Portuguese, …

  14. Mining Lists to infer expertise • Collect the Lists containing a given user U • Collect the List names / descriptions into a ‘topic document’ for the user • Identify U’s topics from the document: ignore domain-specific stopwords, identify nouns and adjectives, and unify similar words based on edit distance, e.g., journalists and jornalistas, politicians and politicos (not unified by stemming)
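
A minimal sketch of this mining step, assuming the hypothetical TwitterList model above; the stopword list and the edit-distance threshold are illustrative assumptions, and the noun/adjective (POS) filtering is omitted for brevity:

```python
import re
from collections import Counter

# Illustrative domain-specific stopwords; the actual list is not given on the slides.
STOPWORDS = {"twitter", "list", "lists", "people", "follow", "my", "favorite"}

def edit_distance(a, b):
    """Classic Levenshtein distance via a one-row dynamic program."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

def topic_document(lists_containing_u):
    """Concatenate List names and descriptions into one tokenized 'topic document'."""
    text = " ".join(l.name + " " + l.description for l in lists_containing_u).lower()
    return [w for w in re.findall(r"[a-z]+", text) if w not in STOPWORDS]

def unify(tokens, max_dist=3):
    """Fold each word into its most frequent near-match, so variants like
    'jornalistas' map onto 'journalists'. The threshold is a guess."""
    counts = Counter(tokens)
    by_freq = sorted(counts, key=counts.get, reverse=True)
    mapping = {}
    for w in counts:
        for c in by_freq:                      # most frequent candidates first
            if counts[c] > counts[w] and edit_distance(w, c) <= max_dist:
                mapping[w] = c
                break
        mapping.setdefault(w, w)
    return [mapping[w] for w in tokens]
```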

  15. Mining Lists to infer expertise • Unigrams and bigrams are considered as topics • Extracted from the topic document of U: the topics for user U, and the frequencies of those topics in the document
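
Continuing the sketch, unigram and bigram topics and their frequencies might be extracted as below (the helper names come from the sketch above; the example output is illustrative):

```python
from collections import Counter

def extract_topics(tokens, top_k=30):
    """Count unigram and adjacent-bigram topics in a user's topic document."""
    unigrams = Counter(tokens)
    bigrams = Counter(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return (unigrams + bigrams).most_common(top_k)

# Usage:
#   topics = extract_topics(unify(topic_document(lists_containing_u)))
#   -> e.g. [("politics", 42), ("senator", 17), ("pop culture", 9), ...]
```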

  16. Topics inferred from Lists • politics, senator, congress, government, republicans, Iowa, gop, conservative • politics, senate, government, congress, democrats, Missouri, progressive, women • celebs, actors, famous, movies, comedy, funny, music, hollywood, pop culture • linux, tech, open, software, libre, gnu, computer, developer, ubuntu, unix

  17. Lists vs. other features • Profile bio • Most common words from tweets: love, daily, people, time, GUI, movie, video, life, happy, game, cool • Most common words from Lists: celeb, actor, famous, movie, stars, comedy, music, Hollywood, pop culture

  18. Lists vs. other features • Profile bio • Most common words from tweets: Fallon, happy, love, fun, video, song, game, hope, #fjoln, #fallonmono • Most common words from Lists: celeb, funny, humor, music, movies, laugh, comics, television, entertainers

  19. Evaluation of inferred topics – 1 • Evaluated through a user-survey • Evaluator shown the top 30 topics for a chosen user • Are the inferred attributes (i) accurate, (ii) informative? • Binary response for both queries • More than 93% of evaluators judged the topics to be both accurate and informative • The few negative judgments were a result of subjectivity

  20. Evaluation of inferred topics – 2 • Comparison with topics identified by Twitter WTF • Obtained the top 20 WTF results for about 200 queries → 3495 distinct users • Topics inferred by us from Lists include the query-topic for 2916 users (83.4%) • For the rest: Case 1 – inferred topics include semantically very similar words, but not the exact query-word (18%); Case 2 – wrong results by WTF, unrelated to the query (58%)

  21. Comparison with Twitter WTF • Case 1: restaurant dineLA for query “dining” – inferred topics: food, restaurant, recipes, los angeles • Case 1: space explorer HubbleHugger77 for query “hubble” – inferred topics: science, tech, space, cosmology, nasa • Case 2: comedian jimmyfallon for query “astrophysicist” – inferred topics: celebs, comedy, humor, actor • Case 2: web developer ScreenOrigami for query “origami” – inferred topics: webdesign, html, designers

  22. Who-is-who service • Developed a Who-is-Who service for Twitter • Shows a word-cloud of the major topics for a user • http://twitter-app.mpi-sws.org/who-is-who/ Inferring Who-is-who in the Twitter Social Network, WOSN 2012 (highest-rated paper in the workshop)

  23. Identifying topical experts

  24. Topical experts in Twitter • 400 million tweets posted daily • The quality of tweets posted by different users varies widely • News, pointless babble, conversational tweets, spam, … • Challenge: to find topical experts • Sources of authoritative information on specific topics

  25. Basic methodology • Given a query (topic) • Identify experts on the topic using Lists • Discussed earlier • Rank the identified experts w.r.t. expertise on the given topic • Need a suitable ranking algorithm • Commonly used ranking metrics such as number of followers or PageRank do not consider the topic

  26. Ranking experts • Two components of ranking user U w.r.t. query Q: relevance of U to Q, and popularity of U • Relevance of user to query: cover density ranking between the topic document TU of user U and Q • Cover density ranking is preferred for short queries • Popularity of user: number of Lists including the user • Final score: relevance(TU, Q) × log(#Lists including U)
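
A sketch of this scoring rule, assuming a user record with topic_doc and num_lists fields (hypothetical names). The relevance function below is a simplified proximity heuristic standing in for full cover density ranking, which scores minimal covers more carefully:

```python
import itertools
import math

def relevance(topic_doc, query_terms):
    """Simplified stand-in for cover density ranking: documents containing
    more query terms score higher, and tighter covers score higher still."""
    positions = {t: [i for i, w in enumerate(topic_doc) if w == t]
                 for t in query_terms}
    hits = [t for t in query_terms if positions[t]]
    if not hits:
        return 0.0
    if len(hits) < len(query_terms):
        return len(hits) / len(query_terms)        # partial match: coordination only
    # Shortest window containing one occurrence of every term
    # (brute force; fine for the short queries this ranking targets).
    best = min(max(combo) - min(combo) + 1
               for combo in itertools.product(*(positions[t] for t in query_terms)))
    return 1.0 + len(query_terms) / best           # full match always beats partial

def score(user, query_terms):
    """Relevance x log-popularity, per the formula on the slide.
    num_lists >= 10 for every user in the dataset (slide 13), so log() is safe."""
    return relevance(user.topic_doc, query_terms) * math.log(user.num_lists)
```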

  27. Cognos • Search system for topical experts in Twitter • Publicly deployed at http://twitter-app.mpi-sws.org/whom-to-follow/ Cognos: Crowdsourcing Search for Topic Experts in Microblogs, ACM International SIGIR Conference 2012

  28. Cognos results for “politics”

  29. Cognos results for “stem cell”

  30. Cognos results for “earthquake”

  31. Evaluation of Cognos • System evaluated ‘in-the-wild’ • People were asked to try the system and give feedback • Evaluators were students & researchers from the authors’ home institutes • Advantage – a lot of varied queries were tried • Disadvantage – subjectivity in relevance judgments

  32. User-evaluation of Cognos

  33. Sample queries for evaluation

  34. Evaluation results • Overall, 2136 relevance judgments over 55 queries • 1680 judged relevant (78.7%) • Large amount of subjectivity in evaluations • The same result for the same query received both relevant and non-relevant judgments • E.g., for the query “cloud computing”, Werner Vogels got 4 relevant and 6 non-relevant judgments

  35. Cognos vs Twitter Who-to-follow • Evaluator shown the top 10 results by both systems, with result-sets anonymized • Evaluator judges which is better / both good / both bad • Queries chosen by the evaluators themselves • 27 distinct queries were asked at least twice (93 times in total) • Judgment by majority voting

  36. Cognos vs Twitter WTF • Cognos judged better on 12 queries: computer science, Linux, mac, Apple, ipad, India, internet, windows phone, photography, political journalist, … • Twitter WTF judged better on 11 queries: music, Sachin Tendulkar, Angelina Jolie, Harry Potter, metallica, cloud computing, IIT Kharagpur, … – mostly names of individuals or organizations • Tie on 4 queries: Microsoft, Dell, Kolkata, Sanskrit as an official language

  37. Topical content search

  38. Challenges in topical content search • Services today are limited to keyword search • Search for ‘politics’ → get only tweets which contain the word ‘politics’ • Knowing which keywords to search for is itself an issue • Individual tweets are too small to deduce topics • Scalability: 400M tweets posted per day • Tweets may contain spam / rumors / phishing URLs

  39. Our approach • Look at tweets posted by a selected set of topical experts • Infer the topic of tweets from the tweeters’ expertise • Problem: a large fraction of the tweets posted by experts is day-to-day conversation • Solution: if multiple experts on a topic tweet about something, it is most likely related to the topic

  40. Sampling Tweets from Experts • We capture all tweets from 585K topical experts • Identified through Lists • Expertise in a wide variety of topics • The experts generate 1.46 million tweets per day • 0.268% of all tweets on Twitter → scalable • Trustworthiness • Experts are not likely to post spam / phishing URLs • Less chance of rumors in what is posted by several experts

  41. Methodology at a Glance • Gather tweets from experts on the given topic • Group tweets on the same news-story • We use a group of hashtags to represent a news-story • Multi-level clustering (cluster: news-story) • Cluster tweets based on the hashtags they contain • Cluster hashtags based on co-occurrence • Rank news-stories by popularity • Number of distinct experts tweeting on the story • Number of tweets on the story
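
One plausible realization of this two-level grouping, sketched with union-find over co-occurring hashtags; the tweet tuple shape and the co-occurrence threshold are assumptions for illustration, not the deployed system's exact parameters:

```python
from collections import defaultdict

def cluster_stories(tweets, min_cooccur=5):
    """tweets: iterable of (expert_id, text, hashtag_set) triples (assumed shape).
    Level 1: merge hashtags that co-occur in >= min_cooccur tweets (union-find).
    Level 2: group tweets under the merged hashtag clusters (news-stories)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]      # path halving
            x = parent[x]
        return x

    # Count pairwise hashtag co-occurrence within tweets.
    cooccur = defaultdict(int)
    for _, _, tags in tweets:
        tags = sorted(tags)
        for i in range(len(tags)):
            for j in range(i + 1, len(tags)):
                cooccur[(tags[i], tags[j])] += 1
    for (a, b), n in cooccur.items():
        if n >= min_cooccur:
            parent[find(a)] = find(b)          # union the two hashtags

    # Assign each tweet to the story (hashtag cluster) of one of its hashtags.
    stories = defaultdict(list)
    for tweet in tweets:
        for tag in tweet[2]:
            stories[find(tag)].append(tweet)
            break

    # Rank stories by (#distinct experts, #tweets), as on the slide.
    return sorted(stories.values(),
                  key=lambda ts: (len({e for e, _, _ in ts}), len(ts)),
                  reverse=True)
```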

  42. Results for the last week on Politics (a popular topic)

  43. Hashtags which co-occur frequently are grouped together; related tweets are grouped together by their common hashtags; the most popular tweet in each story is shown

  44. Our system especially excels for niche topics.

  45. Evaluation – Relevance • Evaluated using human feedback • Used Amazon Mechanical Turk for the user evaluation • Evaluated the top 10 clusters for 20 topics • Users judge whether the tweet shown is relevant to the given topic • Options are Relevant / Not Relevant / Can’t Say

  46. Evaluating Tweet Relevance • We obtained 3150 judgments • 80% of tweets marked relevant by majority judgment • Non-relevant results primarily due to • Global events that were discussed by experts across all topics, e.g., Hurricane Sandy in the USA • Sometimes the topic is too specific and several experts tweet on a broader topic (e.g., baseball and ESPN Sports Update)

  47. Effect of global events • Experts on all topics tweeting on #sandy • Most of these got negative judgments

  48. Diversity of topics in Twitter

  49. Topics in Twitter • Discovering thousands of experts on diverse topics → characterizing the Twitter platform as a whole • On what topics is expert content available in Twitter? • Popular view – a few topics such as politics, sports, music, celebs, … • We find – lots of niche topics along with the popular ones
