
You Are What You Link

Lada Adamic, Eytan Adar. WWW 10, May 2001.

Presentation Transcript


  1. You Are What You Link. Lada Adamic, Eytan Adar. WWW 10, May 2001

  2. Outline
     • Graph structures of social networks
     • How person-to-person links on the web create observable social networks
     • Understanding and predicting links
       • Additional online information (text, links, email subscriptions) gives context to social links
       • Predicting social links even where there is no explicit hyperlink
     • Understanding communities through links

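  3. [Figure: two mock personal homepages, Julie's ("Hi, I'm Julie!") and Becky's ("Hey, I'm Becky."). Each has the usual self-description ("I study...", "I'm studying...", "I live in...", "I like...", "My favorite books are...", "Here are some photos...", "My friends are...", "My favorite links:"), and each page links to the other: one to "my roomie Becky", the other to "my best friend Julie".]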

  4. Becky and Julie aren’t the only ones to link to each other

  5. Stanford Social Web

  6. Graph Structure of Social Networks

  7. Differences in cohesiveness of communities [Figure panels: Stanford, MIT]

  8. Links among personal homepages at MIT and Stanford

  9. The number of links per person is uneven, which makes for interesting social network analysis
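Unevenness in links per person is usually examined through the degree distribution, which counts how many people have each number of links. The sketch below is minimal. It assumes the homepage graph is an undirected adjacency dict, and the toy graph is hypothetical, not data from the study:

```python
from collections import Counter

def degree_distribution(adj):
    """Map each degree k to the number of people with exactly k links."""
    # Assumption: adj maps each person to the set of people they are linked with,
    # with links treated as undirected.
    counts = Counter(len(neighbors) for neighbors in adj.values())
    return dict(sorted(counts.items()))

# Hypothetical toy graph: one well-linked "hub" and several people with few links.
toy = {
    "hub": {"a", "b", "c", "d"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub", "d"},
    "d": {"hub", "c"},
}
print(degree_distribution(toy))  # {1: 2, 2: 2, 4: 1}
```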

  10. Largest connected component: MIT 86%, Stanford 58%
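A percentage like this is the share of people who fall in the largest connected component. The sketch below computes it with breadth-first search. It assumes an undirected, symmetric adjacency dict, and the graph is a hypothetical example:

```python
from collections import deque

def largest_component_fraction(adj):
    """Fraction of people who belong to the largest connected component."""
    # Assumption: adj is symmetric (if b is in adj[a], then a is in adj[b]).
    seen = set()
    largest = 0
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects one whole component.
        size = 0
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        largest = max(largest, size)
    return largest / len(adj) if adj else 0.0

# Hypothetical graph: a component of 3 people, one of 2, and an isolated person.
toy = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": {"e"}, "e": {"d"}, "f": set()}
print(largest_component_fraction(toy))  # 0.5
```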

  11. Shortest path from one person to another, on average: MIT 6.4 hops, Stanford 9.2 hops
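An average hop count like this can be computed by running breadth-first search from every person and averaging the distances to everyone they can reach. The sketch below makes one assumption: unreachable pairs are skipped, though the study may have restricted the average differently, for example to the largest component. The chain graph is hypothetical:

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distance from source to every reachable person."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def average_shortest_path(adj):
    """Mean hop count over ordered pairs of distinct people who can reach each other."""
    # Assumption: pairs with no connecting path are simply left out of the average.
    total, pairs = 0, 0
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0

# Hypothetical chain a - b - c - d.
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(average_shortest_path(chain))  # 20 / 12, about 1.667
```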

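  12. Clustering coefficient, for a person with k neighbors:
      $C = \frac{\#\text{ links among neighbors}}{\max \#\text{ links among neighbors}} = \frac{\#\text{ links among neighbors}}{k(k-1)/2}$
      Example: 4 neighbors with 3 links among them gives $C = \frac{3}{4 \cdot 3/2} = \frac{1}{2}$
      MIT: 0.22, Stanford: 0.21, 70x that of a random graph!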
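The clustering coefficient formula translates directly into code. The sketch below reproduces the slide's worked example: 4 neighbors with 3 links among them. It assumes an undirected adjacency dict. It also scores people with fewer than 2 neighbors as 0, which is a convention choice, since some analyses exclude them from the average instead:

```python
from itertools import combinations

def clustering_coefficient(adj, node):
    """Links among node's neighbors divided by the maximum possible, k(k-1)/2."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # undefined for k < 2; treated as 0 here (an assumption)
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return links / (k * (k - 1) / 2)

def average_clustering(adj):
    """Mean clustering coefficient over all people in the graph."""
    if not adj:
        return 0.0
    return sum(clustering_coefficient(adj, n) for n in adj) / len(adj)

# The slide's example: x has 4 neighbors, and 3 of the 6 neighbor pairs are linked.
example = {
    "x": {"a", "b", "c", "d"},
    "a": {"x", "b"},
    "b": {"x", "a", "c"},
    "c": {"x", "b", "d"},
    "d": {"x", "c"},
}
print(clustering_coefficient(example, "x"))  # 0.5
```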

  13. Understanding and Predicting Links

  14. Information available online [Figure: two users connected through a shared email list, outlinks, inlinks, and common text]

  15. How information was collected
      • Users' web directories were crawled
      • Outlinks were extracted
      • Text was passed through ThingFinder to extract entities such as people, places, and companies
      • Mailing list subscriptions were obtained from the mailing list servers (95% public for Stanford, internal to MIT)
      • Inlinks were obtained by querying search engines: Google for Stanford, AltaVista for MIT (equivalent URLs)
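The outlink-extraction step can be sketched with Python's standard `html.parser`. This is not the study's crawler. It only shows how `<a href>` targets on a fetched homepage might be collected and resolved to absolute URLs. The page snippet and URLs below are hypothetical:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class OutlinkParser(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags on one homepage."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.outlinks = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.outlinks.append(urljoin(self.base_url, value))

def extract_outlinks(html, base_url):
    parser = OutlinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.outlinks

# Hypothetical homepage snippet and URL.
page = '<p>My friends: <a href="/~becky/">Becky</a>, <a href="http://www.mit.edu/">MIT</a></p>'
print(extract_outlinks(page, "http://www.stanford.edu/~julie/"))
# ['http://www.stanford.edu/~becky/', 'http://www.mit.edu/']
```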

  16. Comparison with traditional means of gathering information on social networks
      Advantages
      • Easily and automatically gathered (no phone, live, or mail surveys)
      • Data sets are orders of magnitude larger
      • Information is already public
      Disadvantages
      • Data sets are incomplete, i.e., you don't get to ask the questions, just take down the answers

  17. Friends have more in common

  18. http://negotiation.parc.xerox.com/web10/

  19. So can we guess who's friends with whom from the information gathered online?
      • Choose person A
      • Rank everybody else according to their likeness to that person
      • See how "friends" (people who are linked to A) were ranked
      • Evaluate for text, outlinks, inlinks, and mailing lists separately
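The evaluation steps above can be sketched as follows. The slides don't spell out the likeness measure. This sketch uses inverse-log-frequency weighting, in which each shared item z counts 1/log(n_z), where n_z is the number of users who have z. That weighting is now known as the Adamic/Adar index and is generally credited to this paper, but treat the exact formula as an assumption here. The users and items are hypothetical:

```python
import math
from collections import Counter

def friend_ranks(user_items, links, a):
    """Rank everyone else by similarity to user a; return the 1-based rank of each friend of a."""
    # n_z: how many users have item z (a phrase, outlink, inlink, or mailing list).
    item_counts = Counter(z for items in user_items.values() for z in items)

    def similarity(b):
        # Assumed measure: shared items weighted by 1 / log(n_z), so rare items count more.
        # Any shared item has n_z >= 2, so log(n_z) > 0.
        return sum(1.0 / math.log(item_counts[z]) for z in user_items[a] & user_items[b])

    others = [u for u in user_items if u != a]
    # Highest similarity first; ties broken by name so the ranking is deterministic.
    ranked = sorted(others, key=lambda u: (-similarity(u), u))
    position = {u: i + 1 for i, u in enumerate(ranked)}
    return {f: position[f] for f in links.get(a, set()) if f in position}

# Hypothetical users and items (not data from the study).
user_items = {
    "julie": {"cs", "stanford", "theater", "hiking"},
    "becky": {"stanford", "theater", "hiking"},
    "carl": {"stanford", "cs"},
    "dana": {"stanford"},
}
links = {"julie": {"becky", "dana"}}
print(sorted(friend_ranks(user_items, links, "julie").items()))  # [('becky', 1), ('dana', 3)]
```

To evaluate each data source separately, as the slide describes, call `friend_ranks` once per source, each time with a different `user_items` mapping (text phrases, outlinks, inlinks, or mailing lists).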

  20. Example: top matches for a particular user (annaken): Clifford Hsiang Chao

  21. Coverage in the ability to predict user-user links, i.e., the fraction of friends who had at least one item in common
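Coverage, read as the fraction of linked pairs that share at least one item, can be computed as below. The sketch assumes links are treated as undirected, so a link in either direction counts once. The study may have counted directed links separately. The data are hypothetical:

```python
def coverage(user_items, links):
    """Fraction of linked pairs (friends) who share at least one item."""
    # Assumption: links are treated as undirected, so a-b and b-a count as one pair.
    pairs = {frozenset((a, b)) for a, friends in links.items() for b in friends if a != b}
    if not pairs:
        return 0.0
    covered = 0
    for pair in pairs:
        a, b = tuple(pair)
        if user_items[a] & user_items[b]:
            covered += 1
    return covered / len(pairs)

# Hypothetical data: julie shares "hiking" with becky but nothing with carl.
user_items = {"julie": {"theater", "hiking"}, "becky": {"hiking"}, "carl": {"chess"}}
links = {"julie": {"becky", "carl"}, "becky": {"julie"}}
print(coverage(user_items, links))  # 0.5
```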

  22. Performance of the friend-matching algorithm [Figure panels: Stanford, MIT]: the most common ranking for a friend is #1

  23. [Figure: Stanford] We don't have that much in common with our friends' friends' friends
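One way to check this pattern is to group people by their hop distance from a given person and average how much each group has in common with that person. In the sketch below, the plain count of shared items is a hypothetical stand-in for whatever similarity score the slide's figure plots. The graph and items are also hypothetical:

```python
from collections import deque, defaultdict

def mean_shared_items_by_distance(adj, user_items, source):
    """Average number of items shared with `source`, grouped by hop distance from `source`."""
    # Breadth-first search gives hop distances from source.
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    totals = defaultdict(int)
    counts = defaultdict(int)
    for person, d in dist.items():
        if person != source:
            # Assumed similarity: the number of shared items.
            totals[d] += len(user_items[source] & user_items[person])
            counts[d] += 1
    return {d: totals[d] / counts[d] for d in sorted(counts)}

# Hypothetical chain a - b - c - d, where overlap with a shrinks with distance.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
user_items = {"a": {"x", "y", "z"}, "b": {"x", "y"}, "c": {"x"}, "d": set()}
print(mean_shared_items_by_distance(adj, user_items, "a"))  # {1: 2.0, 2: 1.0, 3: 0.0}
```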

  24. Understanding Communities Through Links

  25. What are good and bad link predictors?
      • What you would expect…
      • Very rare things are relevant only to individuals
      • Very general things ("MIT", "Stanford") are relevant to everyone
      • Some top 10 lists…
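The slides don't say how items were scored for the top-10 lists. One hypothetical way to quantify whether an item is a good predictor is precision: among all pairs of users who share the item, the fraction who are actually linked. Under this measure, very general items shared by everyone score low, and items held by only one person produce no pairs at all. The metric and data below are illustrative assumptions, not the paper's method:

```python
from itertools import combinations

def item_link_precision(user_items, links):
    """For each item, the fraction of user pairs sharing that item who are actually linked."""
    # Hypothetical metric; links treated as undirected, so either direction counts once.
    linked = {frozenset((a, b)) for a, friends in links.items() for b in friends if a != b}
    holders = {}
    for user, items in user_items.items():
        for z in items:
            holders.setdefault(z, []).append(user)
    scores = {}
    for z, users in holders.items():
        pairs = [frozenset(p) for p in combinations(users, 2)]
        if pairs:  # an item held by only one person cannot predict any link
            scores[z] = sum(p in linked for p in pairs) / len(pairs)
    return scores

# Hypothetical data: everyone lists "stanford"; only a and b list "chess club".
user_items = {
    "a": {"stanford", "chess club"},
    "b": {"stanford", "chess club"},
    "c": {"stanford"},
    "d": {"stanford"},
}
links = {"a": {"b"}}
scores = item_link_precision(user_items, links)
print(scores["chess club"], round(scores["stanford"], 3))  # 1.0 0.167
```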

  26. Text-Based Predictors
      • Bad phrases: general organizations, cities (Oakland, Cambridge, etc.), departments (CS)

  27. Outlink-Based Predictors
      • The worst-ranked sites are search engines and portals (AltaVista, Lycos, Yahoo, etc.) and top-level homepages such as www.mit.edu and www.stanford.edu

  28. Inlink-Based Predictors
      • The top predictors are almost exclusively individual homepages pointing to lists of friends
      • Poor predictors: long lists (all homepages, department listings)

  29. Mailing-List-Based Predictors
      • Bad lists: general announcement lists at MIT, non-housing-based activities (theater), job lists

  30. Future Work
      • Use other pieces of available information
        • Demographic information (where people live, department, year, etc.)
        • Combine information
      • Label structures (Flake et al. 2000)
        • Given structures determined by graph algorithms
        • Label them using extracted information

  31. Summary
      • Homepage graph structure varies depending on community
      • Possible to predict (to some degree) where links will exist
      • Good predictors seem unique to communities
