You Are What You Link

Lada Adamic, Eytan Adar
WWW 10, May 2001

Outline
• Graph structures of social networks
• How person-to-person links on the web create observable social networks
• Understanding and predicting links
• Additional online info (text, links, email subscriptions) gives context to social links
• Predict social links even where there is no explicit hyperlink
• Understanding communities through links

[Figure: two personal homepages that link to each other. Julie's page ("Hi, I'm Julie! I study... I live in... My favorite books are... Here are some photos...") links to "my roomie Becky"; Becky's page ("Hey, I'm Becky. I'm studying... I like... My friends are... My favorite links:") links to "my best friend Julie".]

Becky and Julie aren’t the only ones to link to each other

Stanford Social Web

Graph Structure of Social Networks

Differences in cohesiveness of communities (figure panels: Stanford, MIT)

Links among personal homepages at MIT and Stanford

The number of links per person is uneven
Interesting social network analysis

Largest connected component
MIT: 86%
Stanford: 58%
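As a minimal sketch of how such a figure could be computed, the following finds the share of people who fall in the largest connected component. It assumes links are treated as undirected and stored in a hypothetical dictionary `links` mapping each person to a set of neighbors; the function name and the toy data are illustrative, not taken from the talk.

```python
from collections import deque


def largest_component_fraction(links):
    """Return the fraction of people in the largest connected component.

    links: dict mapping each person to the set of people they are
    connected to (assumed undirected, every neighbor is also a key).
    """
    seen = set()
    best = 0
    for start in links:
        if start in seen:
            continue
        # Breadth-first search over one component, counting its members.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbor in links[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        best = max(best, size)
    return best / len(links) if links else 0.0


# Toy graph (made up): one component of 3 people, one of 2.
toy = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b"},
    "d": {"e"}, "e": {"d"},
}
fraction = largest_component_fraction(toy)
```

On the toy graph, three of the five people are in the largest component.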

Shortest path from one person to another
MIT: 6.4 hops
Stanford: 9.2 hops
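A hop count like this can be obtained with breadth-first search. The sketch below assumes the reported number is an average over all pairs that can reach each other, and that links are undirected; both are assumptions, and `links`, `bfs_distances`, and `average_shortest_path` are hypothetical names used only for illustration.

```python
from collections import deque


def bfs_distances(links, source):
    """Hop counts from source to every person reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in links[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def average_shortest_path(links):
    """Mean hop count over all ordered pairs that are connected."""
    total = 0
    pairs = 0
    for source in links:
        for target, hops in bfs_distances(links, source).items():
            if target != source:
                total += hops
                pairs += 1
    return total / pairs if pairs else 0.0


# Toy chain a - b - c - d (made up for illustration).
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
avg_hops = average_shortest_path(chain)
```

Running one search per person is quadratic in the worst case, which is workable for graphs of a few thousand homepages.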

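Clustering Coefficient

$$C = \frac{\text{\# of links among neighbors}}{\text{max \# of links among neighbors}}$$

Example: a node with 4 neighbors and 3 links among them has

$$C = \frac{3}{4 \cdot 3 / 2} = \frac{1}{2}$$

MIT: 0.22
Stanford: 0.21
70x that of a random graph!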
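The ratio of actual to possible links among a node's neighbors can be sketched as follows. The dictionary `example` is a made-up graph that reproduces the slide's case of 4 neighbors with 3 links among them; the function name and data layout (undirected links, person mapped to a set of neighbors) are assumptions.

```python
from itertools import combinations


def clustering_coefficient(links, node):
    """Links among a node's neighbors divided by the maximum possible."""
    neighbors = links[node]
    k = len(neighbors)
    if k < 2:
        # With fewer than two neighbors no neighbor pair exists.
        return 0.0
    actual = sum(1 for u, v in combinations(neighbors, 2) if v in links[u])
    possible = k * (k - 1) / 2
    return actual / possible


# "me" has 4 neighbors; n1, n2, n3 are linked to each other (3 links).
example = {
    "me": {"n1", "n2", "n3", "n4"},
    "n1": {"me", "n2", "n3"},
    "n2": {"me", "n1", "n3"},
    "n3": {"me", "n1", "n2"},
    "n4": {"me"},
}
c_me = clustering_coefficient(example, "me")
```

Here `c_me` is 3 / (4*3/2) = 0.5. A network-wide value is typically obtained by averaging over all nodes.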

Understanding and Predicting Links

Information available online (figure labels: email list, outlinks, inlinks, common text)

How information was collected
• Users' web directories were crawled
• Outlinks were extracted
• Text was passed through ThingFinder to extract things like people, places, companies
• Mailing list subscriptions were obtained from the mailing list servers (95% public for Stanford, internal to MIT)
• Inlinks were obtained by querying search engines: Google for Stanford, AltaVista for MIT (equivalent URLs)

Comparison with traditional means of gathering information on social networks
Advantages
• Easily and automatically gathered (no phone, live, or mail surveys)
• Data sets are orders of magnitude larger
• Information is already public
Disadvantages
• Data sets are incomplete, i.e. you don't get to ask the questions, just take down the answers

Friends have more in common

http://negotiation.parc.xerox.com/web10/

So can we guess who's friends with whom from the information gathered online?
• Choose person A
• Rank everybody else according to their likeness to that person
• See how "friends" (people who are linked to A) were ranked
• Evaluate for text, outlinks, inlinks, mailing lists separately
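The ranking procedure above can be sketched in a few lines. The slides do not state the likeness measure, so the weighting used here (each shared item counts 1/log of the number of people who have it, so rare shared items count more) is an assumption for illustration. The names `rank_by_likeness`, `friend_ranks`, and the mailing-list data are hypothetical.

```python
import math
from collections import Counter


def rank_by_likeness(person, items):
    """Rank everybody else by likeness to `person`, most alike first.

    items: dict mapping each person to a set of items of one type
    (phrases, outlinks, inlinks, or mailing lists).
    """
    freq = Counter(item for owned in items.values() for item in owned)
    scores = {}
    for other, owned in items.items():
        if other == person:
            continue
        shared = items[person] & owned
        # A shared item belongs to at least 2 people, so log(freq) > 0.
        scores[other] = sum(1.0 / math.log(freq[item]) for item in shared)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda p: (-scores[p], p))


def friend_ranks(person, items, friends):
    """Position (1 = best match) at which each friend was ranked."""
    ranking = rank_by_likeness(person, items)
    return {f: ranking.index(f) + 1 for f in friends if f in ranking}


# Made-up mailing list subscriptions for four people.
lists = {
    "A": {"chess-club", "dorm-3", "announce"},
    "B": {"chess-club", "dorm-3", "announce"},
    "C": {"announce"},
    "D": {"announce", "dorm-3"},
}
ranks = friend_ranks("A", lists, friends={"B"})
```

To evaluate each information type separately, the same functions would be called once per type with a different `items` dictionary.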

Example: top matches for a particular user
annaken: Clifford Hsiang Chao

Coverage in ability to predict user-user links, i.e. friends had at least one item in common
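Read this way, coverage is the share of linked pairs for which the two people have at least one item in common. A minimal sketch, assuming friend links are given as pairs and items as sets per person (the names and data below are made up):

```python
def coverage(friend_pairs, items):
    """Fraction of friend pairs that share at least one item.

    friend_pairs: list of (person, person) tuples that are linked.
    items: dict mapping each person to a set of items of one type.
    """
    if not friend_pairs:
        return 0.0
    covered = sum(
        1 for a, b in friend_pairs
        if items.get(a, set()) & items.get(b, set())
    )
    return covered / len(friend_pairs)


# Made-up example: two of the three linked pairs share an item.
sample_items = {"A": {"x", "y"}, "B": {"y"}, "C": {"z"}, "D": {"z"}}
sample_pairs = [("A", "B"), ("A", "C"), ("C", "D")]
share = coverage(sample_pairs, sample_items)
```

A pair with nothing in common cannot be predicted from that information type, so coverage bounds how many links the matching can find.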

Performance of friend matching algorithm (figure panels: Stanford, MIT)
The most common ranking for a friend is #1

Stanford: we don't have that much in common with our friends' friends' friends

Understanding Communities Through Links

What are good and bad link predictors?
• What you would expect…
• Very unique things are only relevant to individuals
• Very general things ("MIT", "Stanford") are relevant to everyone
• Some top 10 lists…

Text Based Predictors
• Bad phrases: general organizations, cities (Oakland, Cambridge, etc.), departments (CS)

Out-link Based Predictors
• Worst-ranked sites are search engines and portals (AltaVista, Lycos, Yahoo, etc.) and top-level homepages such as www.mit.edu and www.stanford.edu

In-link Based Predictors
• The top predictors are almost exclusively individual home pages pointing to lists of friends
• Poor predictors: long lists (all homepages, department listings)

Mailing List Based Predictors
• Bad lists: general announcement lists at MIT, non-housing based activities (theater), job lists

Future Work
• Use other pieces of available information
  • Demographic information (where people live, department, year, etc.)
  • Combine information
• Label structures (Flake et al. 2000)
  • Given structures determined by graph algorithms
  • Label them using extracted information

Summary
• Homepage graph structure varies depending on community
• Possible to predict (to some degree) where links will exist
• Good predictors seem unique to communities
