Predicting Positive and Negative Links in Online Social Networks

Jure Leskovec (Stanford), Daniel Huttenlocher and Jon Kleinberg (Cornell) Predicting Positive and Negative Links inOnline Social Networks

Social Interaction on the Web • Rich social structure in online computing applications • Such structures are modeled by networks • Most social network analyses view links as positive • Friends • Fans • Followers • But generally links can convey either friendship or antagonism

Networks with signed edges + • Our plan • Study social interactions on the Web that have positive and negative relationships • Questions • How do edge signs and network structure interact? • Approach • Edge sign prediction problem • Given a network and signs on all but one edge, predict the missing sign • Applications • Friend recommendation for social media • Easier to predict whether you know someone vs. to predict what you think of them – – ? + – + – – + – – + + +

Datasets for sign prediction + • Each link AB is explicitly tagged with a sign: • Epinions: Trust/Distrust • Does A trust B’s product reviews? (only positive links are visible) • Wikipedia: Support/Oppose • Does A support B to becomeWikipedia administrator? • Slashdot: Friend/Foe • Does A like B’s comments? • Other examples: • World of Warcraft[Szell et al. 2010] – – + + – + – – + – – + + +

Summary of our findings • Edge signs can be predicted with ~90% accuracy using only the local network structure • No need for global trust-propagation mechanisms • Data oriented justification of classical theories from social psychology • Our models align with theories of Balance and Status • Near perfect generalization: • Same underlying mechanism of signed edge creation • Can train on “how people vote” and predict trust as well as the model trained on trust itself

Predict edge sign: A ML formulation + – – Machine Learning formulation: • Predict sign of edge (u,v) • Class label: • +1: positive edge • -1: negative edge • Learning method: • Logistic regression • Evaluation: • Accuracy and ROC curves • Dataset: • Original: 80% +edges • Balanced: 50% +edges • Features for learning: • Next slide + – ? + – – + – – + + + v u

Features for learning For each edge (u,v) create features: • Triad counts (16): • Counts of signed triads edge uv takes part in • Node degree (7 features): • Signed degree: • d+out(u), d-out(u), d+in(v), d-in(v) • Total degree: • dout(u), din(v) • Embeddednessof edge (u,v) - + + + u v - - + -

Edge sign prediction Epin • Error rates: • Epinions: 6.5% • Slashdot: 6.6% • Wikipedia: 19% • Signs can be modeled from local network structure alone • Trust propagation model of [Guha et al. ‘04] has 14% error on Epinions • Triad features perform less well for less embedded edges • Wikipedia is harder to model: • Votes are publicly visible Slash Wiki

Connection to social theories • Our goal is not just to predict signs but also to derive insights into usage of signed edges • Logistic regression learns a weight bi for each feature xi: • Connection to theories from social psychology: • Structural balance • Theory of status which both give predictions onthe sign of the edge (u,v) basedon the triad it is embedded into - + u v

Theory of Structural Balance Consider edges as undirected • Start with intuition [Heider ’46]: • Friend of my friend is my friend • Enemy of enemy is my friend • Enemy of friend is my enemy • Look at connected triples of nodes that are consistent with this logic: + + + - - - - + +

Theory of Status • Status theory [Davis-Leinhardt ‘68, Guha et al. ’04, Leskovec et al. ‘10] • Link u v means: v has higher status than u • Link u v means: v has lower status than u • Based on signs/directions of links from/to nodex make a prediction • Status and balance can make differentpredictions: + – - x x x - + + + - u v u v u v Balance: + Status: – LogReg: – Balance: + Status: – LogReg: – Balance: – Status: – LogReg: –

Balance and Status: Complete model + - + - + - - + + + - + - + - + + - - + + - + - - + - + - + - -

Balance and Status: Observations • Both theories agree well with learned models • Further observations: • Backward-backward triads have smaller weights than forward and mixed direction triads • Balance is in better agreement with Epinions and Slashdot while Status is with Wikipedia • Balance consistently disagrees with “enemy of my enemy is my friend” x u v

Balance theory • Balance based and learned coefficients: - + + - - + - + - + - + Model if signs would be created purely based on Balance theory

Status theory • Status based and learned coefficients: + x + u v x + – u v x + – u v x – – u v Triads where u > x > v Model if signs would be created purely based on Status theory

Learned vs. Deterministic models Epin • Deterministic models compare well to Learned models • Epinions and Slashdot: • More embedded edges are easier to predict • Wikipedia: • Status outperforms balance • Learned balance performs nearly as well as the full model Slash Wiki

Generalization • Do people use these very different linking systems by obeying the same principles? • How generalizable are the results across the datasets? • Train on row “dataset”, predict on “column” • Almost perfect generalization of the models even though networks come from very different applications

Does negative information help? + + • Suppose we are only interested in predicting whether there is a trust edge or no edge • Does knowing negative edges help? – – ? ? + + – + + – – + + – – + + YES! + + + + Vs.

From Local to Global Structure • Both theories make predictions about the global structure of the network • Structural balance – Factions • Put nodes into groups such that the number of in group “+” and between group “-” edges is maximized • Status theory – Global Status • Flip direction and sign of negative edges • Assign each node a unique status value so that most edges point from low to high + + - 5 3 1

From Local to Global Structure • Fraction of edges of the network that satisfy Balance and Status? • Observations: • No evidence for global balance beyond the random baselines • Real data is 80% consistent vs. 80% consistency under random baseline • Evidence for global status beyond the random baselines • Real data is 80% consistent, but 50% consistency under random baseline

Conclusion • Signed networks provide insight into how social computing systems are used: • Status vs. Balance • Sign of relationship can be reliably predicted from the local network context • ~90% accuracy sign of the edge • More evidence that networks are globally organized based on status • People use signed edges consistently regardless of particular application • Near perfect generalization of models across datasets • Negative information helps in predicting positive edges

THANKS! Data + Code: http://snap.stanford.edu Jure Leskovec

Heuristic predictors • Heuristic predictors for (u,v): • Balance: Chose sign that makes majority of the triads balanced • Status: Predict in based on status(v)=d+in(u)+d-out(u)-d+out(v)-d-in(v) • Out-sign of v: majority sign • In-sign of u: majority sign • Observations: • Triadic models do better with increasing embeddedness

Predicting Positive and Negative Links in Online Social Networks

Predicting Positive and Negative Links in Online Social Networks

Presentation Transcript

Positive and negative politeness

Positive and Negative Feedback

Networks with Positive and Negative Edges

Positive and Negative Numbers

Predicting Social Security Numbers from Online Social Networks

Positive and Negative Exponents

Positive and negative interactions

Negative and Positive Words

Negative and Positive Rights

Positive and Negative Influences

Predicting Emerging Social Conventions in Online Social Networks

Positive and Negative Rules

Positive and Negative Expression

Positive and Negative

Predicting Positive and Negative Links in Online Social Networks

Positive and Negative Numbers

Positive and Negative Integers

Supervised Random Walks: Predicting and Recommending Links in Social Networks

Positive and Negative Numbers.

Positive and Negative Space

Positive and Negative Numbers

Positive and Negative Numbers