1 / 24

Generating and Tracking Communities Based on Implicit Affinities

Generating and Tracking Communities Based on Implicit Affinities. Matthew Smith – smitty@byu.edu BYU Data Mining Lab April 2007. Introduction. Online Communities Continually emerging – many sites are adding this aspect Like offline communities, they are complex and dynamic Examples

gratia
Télécharger la présentation

Generating and Tracking Communities Based on Implicit Affinities

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Generating and Tracking Communities Based on Implicit Affinities Matthew Smith – smitty@byu.edu BYU Data Mining Lab April 2007

  2. Introduction • Online Communities • Continually emerging – many sites are adding this aspect • Like offline communities, they are complex and dynamic • Examples • USENET (1980), Google Groups, Wikipedia • LinkedIn, Flickr, YouTube, MySpace, Facebook, etc. • Medical Communities (e.g., DailyStrength, NAAF) • Political Communities • Blogosphere – focus of experiments

  3. Motivation Explicit Links Explicit Social Network (ESN) Links: Friends, Web Links, etc.

  4. Motivation Explicit Links Implicit Affinities cancer bald smoke ESN and Implicit Affinity Network (IAN) Applications: Medical, Blogosphere, etc.

  5. Implicit Affinity • Affinity: • The overlapping of attributes-values for any common attribute • Community: • Set of individuals characterized by attributes • Linked by affinities rather than explicit relationships

  6. IAN Community Generation • Individuals – nodes • characterized by attributes • Affinities – edges • unlike traditional social networks where links represent explicit relationships, the links in our approach are based strictly on affinities • Connections emerge naturally

  7. Affinity Scoring Affinity score for a particular attribute Affinity score for all attributes

  8. Affinity Network Building IAN

  9. Social Capital for Community Tracking Social Capital: The advantage available through connections between individuals within a particular network Bonding and Bridging Metrics

  10. Preliminary Experiments & Observations

  11. Scobleizer’s Blog List • Robert Scoble (“Scobleizer”) • Blogger and book author • Technical evangelist (formerly with Microsoft) • Data Set Details: • Scobleizer’s reading list at Bloglines.com • 570 blogs • 2380 bloggers

  12. Data Set Statistics – Blog posts per day Lack of data for all bloggers during first few days We observe fewer posts during the weekend (Friday & Saturday)

  13. Single Attribute: Companies • Motivation • Many bloggers talk about various companies and what they are doing • Methodology • Whenever a company is mentioned in a blogger’s post, it becomes a feature of the blogger • Static company list used as attributes • 1,914 company names

  14. Cyclic Feature Usage

  15. Power-law Behavior – Features Observations • Few companies • mentioned by many • Many companies • mentioned by few

  16. Blog Community Evolution Observations • Weekend bonding? • Bridging indicates • newly used features • new bloggers • Overall bonding (expected) • static set of features • no decay • blogosphere is full of buzz

  17. Blog-based IAN – Feb. 24 niche sub-communities exist

  18. Conclusions • Blog posts were cyclic within this community • Posted more during the week and less during the weekends • Interestingly, bonding occurs during the weekends • Companies were mentioned in a power-law way • Few companies are mentioned often • Most companies are mentioned rarely • Niche sub-communities • Bloggers focusing on long-tail companies were identified • Blog-based IAN • Appears to follow power-law connectivity like ESNs

  19. Future Work (In Progress) • Compare IAN and ESN of the same community • Analyze evolution (social capital vs. density) • Compare snapshots • Identify and report similarities and differences • Develop hybrid sub-community identification • Experiment on domain-specific communities • Medical – patient communities • Political – jump start grass-roots campaigns

  20. More Future Work • Refine implicit attribute extraction • Allow for dynamic feature extraction • Allow features to naturally decay with time • Use LDA to extract “concepts” • Putnam’s puzzle • Consider adapting Social Capital measures to allow for uncorrelated bonding and bridging

  21. Questions ?

  22. Affinity Score Distribution

  23. Blog-based IANs– Filtered by Threshold Affinity Scores GTE 0.5 Affinity Score of 1.0

  24. Blog-based IAN – Filtered by Thresholds Affinity Thresholds Score GTE 0.5Count GTE 3 2/15 – 3/15

More Related