
Local/Global Term Analysis for Discovering Community Differences in Social Networks

David Fuhry, Yiye Ruan, and Srinivasan Parthasarathy. Data Mining Research Laboratory, Dept. of Computer Science and Engineering, The Ohio State University.

Presentation Transcript


  1. Local/Global Term Analysis for Discovering Community Differences in Social Networks
     David Fuhry, Yiye Ruan, and Srinivasan Parthasarathy
     Data Mining Research Laboratory, Dept. of Computer Science and Engineering, The Ohio State University

  2. Communities in Social Networks
     • Observations:
       • Social networks consist of many interacting communities of users.
       • Each community can be characterized by the content which its members generate.
     • Motivating questions:
       • Given a community, how can we determine what its members are talking about, relative to the entire social network?
       • Given two communities, how can we determine the difference between them?

  3. Methodology
     • A community’s users mention relevant terms frequently.
     • Many works look at #hashtags or most frequent terms.
     • But not all frequent terms are relevant.
     • Desiderata:
       • Consider all content terms
       • Interpretable
       • Scalable to million-user social networks

  4. Four-step Process
     • Four-step process for determining community differences:
       • Community Discovery
       • Term Extraction & Aggregation
       • Visualization
       • Handling Time Varying Data
     [Figure labels: Network, Content]

  5. 1. Community Discovery (I)
     • Keyword-search-based identification of candidate users
     • Extract underlying network of users
     • Local community identification
       • Graph clustering (e.g. METIS [KARYPIS’99], Graclus [DHILLON’07], MLR-MCL [SATULURI’09], Localized Clustering (L-Spar) [SATULURI’11])
       • Modularity [NEWMAN’04]
       • Content-Sensitive Viewpoint Neighborhoods [Asur’09]
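The first two bullets of this slide (keyword search for candidate users, then extracting the network among them) could look roughly like the sketch below. The function names, the whitespace tokenization, and the toy data are all illustrative assumptions, not the authors' implementation; the local community identification step would then run one of the cited clustering methods on the resulting edge list.

```python
def candidate_users(messages, keywords):
    """Return users who posted at least one message containing a keyword.

    messages: iterable of (user, text) pairs (hypothetical input format).
    """
    keywords = {k.lower() for k in keywords}
    return {user for user, text in messages
            if keywords & set(text.lower().split())}


def induced_network(edges, users):
    """Keep only the edges whose two endpoints are both candidate users."""
    return [(u, v) for u, v in edges if u in users and v in users]


# Toy data, invented for illustration.
messages = [
    ("ann", "love my canon"),
    ("bob", "canon or nikon"),
    ("cat", "rainy day"),
]
edges = [("ann", "bob"), ("bob", "cat"), ("ann", "cat")]

users = candidate_users(messages, {"canon"})
network = induced_network(edges, users)
```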

  6. 1. Community Discovery (II)
     • Start with the network of all users
     • Extract candidate communities
       • Using any community discovery algorithm
     • Filter candidate communities by keyword strength
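The slide does not define "keyword strength". As a minimal sketch, the code below assumes it is the fraction of a community's members who mention at least one keyword, and keeps communities above a threshold. The names `keyword_strength`, `filter_communities`, and `min_strength` are hypothetical.

```python
def keyword_strength(community, messages_by_user, keywords):
    """Assumed definition: share of members with a message mentioning a keyword."""
    keywords = {k.lower() for k in keywords}
    if not community:
        return 0.0
    hits = 0
    for user in community:
        texts = messages_by_user.get(user, [])
        if any(keywords & set(text.lower().split()) for text in texts):
            hits += 1
    return hits / len(community)


def filter_communities(communities, messages_by_user, keywords, min_strength=0.5):
    """Keep candidate communities whose keyword strength meets the threshold."""
    return [c for c in communities
            if keyword_strength(c, messages_by_user, keywords) >= min_strength]


# Toy data, invented for illustration.
messages_by_user = {
    "ann": ["my nikon lens arrived"],
    "bob": ["nikon vs canon again"],
    "cat": ["lunch was great"],
    "dan": ["traffic is terrible"],
}
candidates = [{"ann", "bob", "cat"}, {"cat", "dan"}]
kept = filter_communities(candidates, messages_by_user, {"nikon"})
```

Here the first candidate community is kept (two of its three members mention the keyword) and the second is dropped.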

  7. 2. Term Extraction & Aggregation
     • Extract terms from each message and weight them
       • Term Frequency
       • TF/IDF
       • Domain-dependent semantic importance
     • Merge terms
       • Combine synonyms
       • Handle hypernyms
     • Aggregate them by user
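One way to read this step is sketched below: tokenize each message, map synonyms to a canonical term, count terms per user, and optionally reweight with TF-IDF. The synonym table, the tokenizer, and the choice to treat each user as one "document" for the IDF part are assumptions made for illustration; hypernym handling and domain-dependent weights are not shown.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical synonym table; a real one would be domain-specific.
SYNONYMS = {"pic": "photo", "pics": "photo", "photos": "photo"}


def extract_terms(message):
    """Lowercase, tokenize (keeping #hashtags and @mentions), merge synonyms."""
    tokens = re.findall(r"[#@]?\w+", message.lower())
    return [SYNONYMS.get(t, t) for t in tokens]


def aggregate_by_user(messages):
    """messages: iterable of (user, text). Returns {user: Counter of term counts}."""
    per_user = defaultdict(Counter)
    for user, text in messages:
        per_user[user].update(extract_terms(text))
    return dict(per_user)


def tf_idf(per_user):
    """Weight terms by TF * log(N / df), treating each user as one document."""
    n_users = len(per_user)
    df = Counter()
    for counts in per_user.values():
        df.update(counts.keys())
    return {
        user: {t: tf * math.log(n_users / df[t]) for t, tf in counts.items()}
        for user, counts in per_user.items()
    }


# Toy data, invented for illustration.
messages = [
    ("ann", "New photos from my Nikon"),
    ("ann", "More pics soon"),
    ("bob", "Nikon firmware update"),
]
per_user = aggregate_by_user(messages)
weights = tf_idf(per_user)
```

In this toy input "photos" and "pics" merge into a single term for ann, while "nikon", used by every user, receives a TF-IDF weight of zero.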

  8. 3. Visualization
     • Plot terms by frequency across two axes.
       • Global (all users) on Y-axis
       • Local (community users) on X-axis.
     • Terms on the regression line are equifrequent in both groups
     • Terms off the regression line are relatively more frequent in one group
     • Support for multiple scales of local community identification
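The local/global comparison can be made concrete with a small numeric sketch: put each term's local relative frequency on X and its global relative frequency on Y, fit a line, and read off each term's signed distance from it. The slide does not say which regression or frequency scale is used, so the ordinary least-squares fit on raw relative frequencies here is an assumption, as are all function names and counts.

```python
def relative_freq(term_counts):
    """Convert raw term counts to relative frequencies."""
    total = sum(term_counts.values())
    return {t: c / total for t, c in term_counts.items()}


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all local frequencies are identical; cannot fit a line")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def local_global_residuals(local_counts, global_counts):
    """Signed vertical distance of each term from the regression line.

    Negative: the term sits below the line (relatively more frequent locally).
    Positive: the term sits above the line (relatively more frequent globally).
    """
    local = relative_freq(local_counts)
    glob = relative_freq(global_counts)
    terms = sorted(set(local) | set(glob))
    xs = [local.get(t, 0.0) for t in terms]  # X-axis: local (community users)
    ys = [glob.get(t, 0.0) for t in terms]   # Y-axis: global (all users)
    slope, intercept = fit_line(xs, ys)
    return {t: y - (slope * x + intercept) for t, x, y in zip(terms, xs, ys)}


# Toy counts, invented for illustration.
local_counts = {"photo": 50, "camera": 30, "health": 10, "lens": 10}
global_counts = {"photo": 50, "camera": 30, "health": 15, "lens": 5}
residuals = local_global_residuals(local_counts, global_counts)
most_local = min(residuals, key=residuals.get)
```

With these toy counts, "photo" and "camera" fall on the line, "lens" falls below it (relatively more local), and "health" falls above it (relatively more global).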

  9. 4. Handling Time Varying Data
     • Time range divided into batches
     • Perform steps 1 to 3 for each batch
     • Visualize results
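A minimal sketch of the batching step, assuming fixed-width batches and messages given as (timestamp, user, text) tuples; the slide does not specify how the time range is divided, and `batch_messages` is a hypothetical helper. Steps 1 to 3 would then run once per batch.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def batch_messages(messages, start, batch_width):
    """Group (timestamp, user, text) messages into fixed-width time batches.

    Returns {batch_index: [(user, text), ...]} with indices in ascending order.
    Batches with no messages are simply absent; messages before `start`
    would get negative indices.
    """
    batches = defaultdict(list)
    for timestamp, user, text in messages:
        index = (timestamp - start) // batch_width  # timedelta // timedelta -> int
        batches[index].append((user, text))
    return dict(sorted(batches.items()))


# Toy data, invented for illustration.
start = datetime(2012, 1, 1)
messages = [
    (datetime(2012, 1, 1, 10), "ann", "a"),
    (datetime(2012, 1, 1, 23), "bob", "b"),
    (datetime(2012, 1, 3, 1), "ann", "c"),
]
batches = batch_messages(messages, start, timedelta(days=1))
```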

  10. Experimental Results
      Using a dataset of 1M tweets, we look at groups discussing Canon, Nikon, and Olympus cameras. Comparing the Nikon and Olympus communities, the Olympus community talks more about blogs.

  11. Experimental Results
      Comparing the camera community with the global community, the camera community talks less about health, teeth, and success.

  12. Experimental Results
      Using a dataset of 2M tweets about the “Occupy” movement, we compare “Occupy Oakland” to the entire “Occupy” movement. Occupy Oakland talks less about NYPD, p2 (a group of progressives using social media), and tcot (“Top Conservatives On Twitter”).

  13. Filter and Zoom

  14. Conclusions
      • Four-part visual analytic framework for discovering differences between communities in social networks.
        • Simple
        • Scalable
      • Qualitative and quantitative results.
      • Future
        • Temporal
        • More quantitative measures
        • Automatically determine best scale

  15. Thank You!
