
Information Diffusion in Social Media

Information Diffusion in Social Media. Kristina Lerman University of Southern California. Information diffusion on Twitter follower graph. Diffusion on networks.


Presentation Transcript


  1. Information Diffusion in Social Media • Kristina Lerman, University of Southern California • CS 599: Social Media Analysis

  2. Information diffusion on Twitter follower graph

  3. Diffusion on networks • The spread of disease, ideas, behaviors, … on a network can be described as a contagion process where an active node (infected/informed/adopted) activates its non-active neighbors with some probability • … creates a cascade on a network • How large do cascades become? • What determines their growth?

  4. Diffusion models • Complex response: infection requires multiple exposures. • Non-monotonic exposure response • [Figure: exposure response functions for complex contagion and for the threshold model; x-axis: number of infected neighbors; y-axis: infection probability, up to 1]

  5. Epidemic diffusion model • Infected nodes propagate contagion to susceptible neighbors with probability m (transmissibility or virality of contagion) • [Figure: exposure response function for a node exposed to infected neighbors; x-axis: number of infected neighbors; y-axis: infection probability, up to 1]

  6. Epidemic threshold • Epidemic threshold t: • For m < t, localized cascades (epidemic dies out) • For m > t, global cascades • Epidemic threshold depends on topology only: largest eigenvalue of adjacency matrix of the network • True for any network • [Figure: cascade size (0 to N) versus transmissibility m, with the epidemic threshold marked]
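
The dependence of the threshold on the largest eigenvalue can be sketched numerically. The slide only says the threshold is determined by that eigenvalue; taking the threshold to be its inverse, 1/λ_max, is the standard result assumed here, not something the slide states. The helper names and the two toy graphs are hypothetical.

```python
import math

def largest_eigenvalue(adj, iterations=500):
    """Estimate the largest eigenvalue of a symmetric 0/1 adjacency matrix.

    Power iteration is run on (A + I) rather than A so that it also
    converges on bipartite graphs, where A has eigenvalues +λ and -λ of
    equal magnitude. Assumes the graph is connected.
    """
    n = len(adj)
    v = [1.0] * n
    for _ in range(iterations):
        # w = (A + I) v
        w = [v[i] + sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T A v for the unit vector v
    return sum(v[i] * adj[i][j] * v[j] for i in range(n) for j in range(n))

def epidemic_threshold(adj):
    """Assumed form of the threshold: inverse of the largest eigenvalue."""
    return 1.0 / largest_eigenvalue(adj)

def star_graph(leaves):
    """Hub node 0 connected to `leaves` leaf nodes."""
    n = leaves + 1
    adj = [[0] * n for _ in range(n)]
    for leaf in range(1, n):
        adj[0][leaf] = adj[leaf][0] = 1
    return adj

def complete_graph(n):
    return [[0 if i == j else 1 for j in range(n)] for i in range(n)]

if __name__ == "__main__":
    print("star, 4 leaves:", epidemic_threshold(star_graph(4)))
    print("complete, 5 nodes:", epidemic_threshold(complete_graph(5)))
```

Under this assumption, denser graphs have a larger leading eigenvalue and therefore a lower threshold, so a contagion with a given transmissibility m spreads globally more easily on them.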

  7. Differences in the Mechanics of Information Diffusion across Topics: Idioms, Political Hashtags and Complex Contagion on Twitter Daniel M Romero, Brendan Meeder and Jon Kleinberg Presentation by Aswin Rajkumar

  8. Motivation and Contribution • Information Diffusion and Topics - E.g., controversial political topics have high information diffusion. - Scientific study of the variation in diffusion mechanics across topics. • Contribution of the paper - Empirical analysis of real-world data - Observation that the mechanics of spread can be described using two variables, stickiness and persistence. - Confirmation of sociological theories found in the offline world – diffusion of innovations

  9. The Study – How? • Twitter – Dataset, a snapshot covering a large number of tweets over a period of several months (Aug 09 to Jan 10) • 3 billion messages from over 60 million users • #Hashtag – Tokens, Top 500 Hashtags • @Mention – Network, Neighbor Set: t mentions from X to Y, t = 3 • Why? Shows X’s attention to Y.
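
A minimal sketch of how the @mention neighbor sets could be built from raw mention pairs. The input format (a list of (X, Y) tuples, one per mention) and the function name are hypothetical, and reading the rule as "at least t mentions" is an assumption about the slide's wording.

```python
from collections import Counter

def build_neighbor_sets(mentions, t=3):
    """Map each user X to the set of users Y that X mentioned at least t times.

    `mentions` is an iterable of (source, target) pairs, one per @mention.
    The "at least t" reading of the slide's rule is an assumption.
    """
    counts = Counter(mentions)  # (X, Y) -> number of mentions from X to Y
    neighbor_sets = {}
    for (source, target), count in counts.items():
        if count >= t:
            neighbor_sets.setdefault(source, set()).add(target)
    return neighbor_sets

if __name__ == "__main__":
    # Toy data: a mentions b three times and c twice; b mentions a four times.
    toy_mentions = [("a", "b")] * 3 + [("a", "c")] * 2 + [("b", "a")] * 4
    print(build_neighbor_sets(toy_mentions))
```

The edge is directed: Y belongs to X's neighbor set because X pays attention to Y, which does not imply the reverse.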

  10. The Study – What? • Adoption and Spread of Hashtags - Diffusion • Topics – Politics, Celebrity, Music, Movies, Games, Idioms, Sports and Technology • Stickiness - the probability that a piece of information will pass from a person who knows or mentions it to another person who is exposed to it. • Persistence and “Complex Contagion”, a principle from sociology. Persistence - the relative extent to which repeated exposures to a hashtag continue to have significant marginal effects on adoption. Rate of decay.

  11. Complex Contagion Complex contagion refers to the phenomenon in social networks in which multiple sources of exposure to an innovation are required before an individual adopts the change of behavior. - Wikipedia

  12. P(K) • Stickiness • Persistence

  13. Analysis – Stickiness and Persistence • Take the top 500 hashtags • Classify them into 8 topics or categories • Construct p(k) curves for each hashtag and average them separately within each category • Compare the shapes • Political Hashtags – High Stickiness and Persistence • Twitter Idioms – High Stickiness, Low Persistence • Example hashtags: #mw2, #mafiawars; #lost, #newmoon; #mj, #brazilwantsjb; #pandora, #thisiswar; #obama, #hcr; #cricket, #nhl; #photoshop, #digg

  14. Twitter Idioms #cantlivewithout #musicmonday #iloveitwhen #followfriday

  15. Analysis – Subgraph Structure • Interconnections among early adopters • Subgraphs for political hashtags - High in-degree, large number of triangles. • Tie Strength – Strong, Weak. Credit : Bridge-talent.com

  16. Exposure Curve - Definitions • K-exposed – A user is k-exposed to a tag h if he has not used h, but is connected to k other users who have used h in the past. • What’s the probability that a k-exposed user u will use hashtag h in the future? • 1) Ordinal Time Estimate: probability of a k-exposed user u using hashtag h before becoming (k+1)-exposed. P(k) = I(k) / E(k), where E(k) is the number of k-exposed users and I(k) is the number of k-exposed users who used h before becoming (k+1)-exposed. • 2) Snapshot Estimate: similar, but based on time. E(k) is the number of users k-exposed at t1, and I(k) is the number of users k-exposed at t1 who used h before t2. P(k) = I(k) / E(k) -> Exposure Curve
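
The ordinal time estimate can be sketched for a single hashtag as follows. This is an illustration under stated assumptions, not the authors' code: the input format (a time-ordered list of adopters plus each user's neighbor set) and all names are hypothetical, and users who adopt with zero prior exposures are left out of the curve.

```python
from collections import defaultdict

def ordinal_exposure_curve(adoption_order, neighbor_sets):
    """Return {k: P(k)} with P(k) = I(k) / E(k) for one hashtag.

    adoption_order: users in the order they first used the hashtag.
    neighbor_sets:  user -> set of users that this user pays attention to.
    """
    # watchers[v] = users who have v in their neighbor set
    watchers = defaultdict(set)
    for user, neighbors in neighbor_sets.items():
        for neighbor in neighbors:
            watchers[neighbor].add(user)

    exposure = defaultdict(int)  # current exposure count of each non-adopter
    exposed = defaultdict(int)   # E(k): users who were ever k-exposed
    infected = defaultdict(int)  # I(k): users who adopted while k-exposed
    adopted = set()

    for user in adoption_order:
        if user in adopted:
            continue
        k = exposure[user]
        if k > 0:
            infected[k] += 1  # adopted before becoming (k+1)-exposed
        adopted.add(user)
        for watcher in watchers[user]:
            if watcher not in adopted:
                exposure[watcher] += 1
                exposed[exposure[watcher]] += 1  # watcher is now k-exposed

    return {k: infected[k] / exposed[k] for k in sorted(exposed)}

if __name__ == "__main__":
    toy_neighbors = {
        "b": {"a"},
        "c": {"a", "b"},
        "d": {"a", "b", "c"},
        "e": {"a"},
    }
    print(ordinal_exposure_curve(["a", "b", "c"], toy_neighbors))
```

In the toy data, four users become 1-exposed when a adopts and one of them (b) adopts at that level, so P(1) = 1/4; two users reach 2 exposures and one (c) adopts, so P(2) = 1/2; d reaches 3 exposures and never adopts, so P(3) = 0.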

  17. Comparison Parameters • Persistence Parameter: F(P) = A(P) / R(P) - A(P) – Area under the P curve. - R(P) – Area of the rectangle of length K and height max(P(k)). • Curve comparisons - Increases rapidly and falls vs. increases slowly and saturates - Increases slowly and saturates vs. rapid increase • Stickiness Parameter: M(P) = Max(P(K))
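
A small sketch of the two comparison parameters for an exposure curve given as a list P(1), …, P(K). How the area is measured is an assumption here: the curve is treated as K bars of unit width, so that A(P) is the sum of the values and R(P) = K · max P(k). The function names and the two toy curves are hypothetical.

```python
def stickiness(curve):
    """M(P): the peak value of the exposure curve."""
    return max(curve)

def persistence(curve):
    """F(P) = A(P) / R(P), with the area taken as a sum of unit-width bars.

    A(P) = sum of P(k); R(P) = K * max P(k). This discretization is an
    assumption; other area estimates (e.g. trapezoids) would change the
    numbers slightly but not the ordering of clearly different shapes.
    """
    peak = max(curve)
    if peak == 0:
        return 0.0
    return sum(curve) / (len(curve) * peak)

if __name__ == "__main__":
    rises_then_falls = [0.5, 0.25, 0.125, 0.125]  # peaks early, decays fast
    stays_high = [0.5, 0.5, 0.5, 0.5]             # repeated exposures keep mattering
    for name, curve in [("falls", rises_then_falls), ("flat", stays_high)]:
        print(name, "stickiness:", stickiness(curve), "persistence:", persistence(curve))
```

Both toy curves have the same stickiness, but the curve that decays after its peak has the lower persistence, which is the contrast the slide draws between idioms and political hashtags.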

  18. Plots F(P) = A(P) / R(P) -> Persistence Parameter M(P) = Max(P(K)) -> Stickiness Parameter

  19. Improvements and Related Work • @Mention network is not very representative. Also, attention should be from Y to X. • Considers only average persistence. Median and variance should be analyzed too. • Other types of networks. Eg: Blogs. [Gruhl, Guha, Nowell, Tomkins - Information Diffusion through Blogspace]. • Influence on Online Behavior. Eg: Games. [Woo, Kang, Kim – The Contagion of Malicious Behaviors in Online Games] • Network structure is dynamic in real life. [Bano, Holthoefer, Wang, Moreno, Bailon – Diffusion Dynamics with Changing Network Composition]

  20. Conclusion • Hashtags of different topics exhibit different mechanics of spread. Politically controversial hashtags have the highest diffusion. • Information diffusion depends on the probability of users adopting a hashtag after repeated exposure to it. It depends on the magnitude of the probabilities as well as their rate of decay. • Confirms the sociological theory of complex contagion • Higher in-degree and stronger ties result in better spread.

  21. Questions?

  22. What Stops Social Epidemics? (Ver Steeg et al.) • Why do information cascades in social media • Grow quickly initially • But remain much smaller than predicted by epidemic models? • Information cascades differ from viral contagion: • Response to repeated exposure is important on Digg (and Twitter) • Drastically alters predictions about size of epidemics

  23. Social news: Digg • Users submit or vote for (are infected by) news stories • Social network: users follow ‘friends’ to see stories friends submit and stories friends vote for • Trending stories: Digg promotes most popular stories to its Top News page

  24. How large are cascades in social media? • Number of people who share a message (with a URL) • Twitter: 70K URLs, 700K users, 36M edges • Digg: 3.5K URLs, 258K users, 1.7M edges • Most cascades less than 1% of total network size! [Lerman et al. “Social Contagion: An Empirical Study of Information Spread on Digg and Twitter Follower Graphs” arXiv:1202.3162]

  25. Why are these cascades so small? • Standard model of epidemic growth (heterogeneous mean field theory, SIR model, same degree distribution as Digg) • Most cascades fall in this range • Transmissibility of almost all Digg stories falls within width of this line?! • [Figure: epidemic model prediction plotted against transmissibility m]

  26. Maybe graph structure is responsible? • [Figure: curves plotted against transmissibility m, with the epidemic threshold marked: mean field prediction (same degree dist.), simulated cascades on a random graph with same degree dist., and simulated cascades on the observed Digg graph] • Clustering reduces epidemic threshold and cascade size, but not enough!

  27. What about the spreading mechanism? Infected Not Infected ?

  28. Are repeat exposures a big effect? Yes, more than half of the users are exposed to the same information more than once!

  29. How do people respond to repeated exposure? • Not much. • We have similar results for Twitter • Also noted by Romero et al., WWW 2011 • [Figure: exposure response]

  30. Big consequences for cascade growth • Most people are exposed to a story more than once • Repeated exposures have little effect • Growth of epidemics is severely curtailed (especially compared to the Independent Cascade Model)
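
The contrast with the Independent Cascade Model can be illustrated with a toy simulation. This is a sketch under strong assumptions, not the simulation from the talk: "little effect" is pushed to the extreme of "no effect" (only a user's first exposure can infect them), the graph is a small complete graph, and every name and parameter value is hypothetical.

```python
import random

def simulate_cascade(neighbors, seed_node, m, repeats_matter, rng):
    """Return the number of infected nodes in one simulated cascade.

    neighbors[u] lists the nodes that see what u shares.
    repeats_matter=True  -> independent cascade: every newly infected
                            neighbor gets its own chance m to infect v.
    repeats_matter=False -> only v's first exposure can infect it
                            (assumed extreme of "little effect").
    """
    infected = {seed_node}
    exposed = set()
    frontier = [seed_node]
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in neighbors[u]:
                if v in infected:
                    continue
                if v in exposed and not repeats_matter:
                    continue  # later exposures are ignored
                exposed.add(v)
                if rng.random() < m:
                    infected.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(infected)

def complete_graph(n):
    return {u: [v for v in range(n) if v != u] for u in range(n)}

def mean_size(neighbors, m, repeats_matter, runs=200, seed=0):
    """Average cascade size over several runs with a fixed random seed."""
    rng = random.Random(seed)
    sizes = [simulate_cascade(neighbors, 0, m, repeats_matter, rng) for _ in range(runs)]
    return sum(sizes) / runs

if __name__ == "__main__":
    graph = complete_graph(50)
    print("independent cascade:", mean_size(graph, 0.05, True))
    print("first exposure only:", mean_size(graph, 0.05, False))
```

On a densely connected graph most users are exposed many times, so discarding the effect of repeat exposures removes most of the chances a story has to spread; that is the qualitative point of the slide.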

  31. Weak response to repeated exposures suppresses outbreaks • Take effect of repeat exposure into account: • Epidemic threshold unchanged • [Figure: actual Digg cascades and result of simulations; x-axis: λ* m*, Transmissibility]

  32. How Limited Visibility and Divided Attention Constrain Social Contagion (Hodas & Lerman, 2012) • Questions • How do people respond to exposures to information by friends on social media? • What role does content play in information diffusion? • Findings • Users have finite ability to process information • Most recently received messages are retweeted, the rest are overlooked • Highly connected users (hubs) are far less likely to retweet any message they receive than poorly connected people • Reduced susceptibility of hubs to “infections” explains why cascades are small

  33. Mechanics of information diffusion • User must see an item and find it interesting before he/she can spread it (e.g., by retweeting it, voting for or liking it, …) • [Diagram labels: Cognitive, Tastes, Retweet, Content, Interface]

  34. Cognitive factors: Position bias • People pay more attention to items at the top of the screen or a list of items [Payne, The Art of Asking Questions (1951)] [Counts & Fisher ICWSM’11] [Buscher et al, CHI’09] • … limits how far down the list/page the user navigates

  35. Measuring position bias • Amazon Mechanical Turk experiments • Users were asked to recommend science stories • We controlled the order in which stories were presented to users • Position bias: stories at top list positions received more recommendations [Lerman & Hogg (2014) “Leveraging position bias to improve peer recommendation” in PLoS ONE]

  36. Position bias creates “limited attention” • New post at top of user’s screen • Post near the top is most likely to be seen • [Figure: post visibility; x-axis: position; y-axis: prob. to view post]

  37. Position bias creates “limited attention” • … some time later: newer posts appear at the top • Post is less likely to be seen • [Figure: x-axis: position; y-axis: prob. to view post]

  38. Position bias and number of friends • … some time later: newer posts appear at the top • Few friends: post is less likely to be seen • Many friends: same age post is even less visible to a highly connected user
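
A toy sketch of why a post of the same age is less visible to a highly connected user. Both the posting rate per friend and the geometric decay of view probability with position are assumptions made only for illustration; the slides give no functional form, and the names below are hypothetical.

```python
def expected_position(num_friends, age_hours, posts_per_friend_per_hour=0.5):
    """Expected number of newer posts stacked above a post of the given age.

    Assumes every friend posts at the same constant rate (hypothetical).
    """
    return num_friends * posts_per_friend_per_hour * age_hours

def view_probability(position, decay=0.9):
    """Assumed position bias: each step down the list multiplies the
    chance of being seen by `decay`. Position 0 is the top of the screen."""
    return decay ** position

if __name__ == "__main__":
    for friends in (10, 100, 1000):
        position = expected_position(friends, age_hours=1.0)
        print(friends, "friends -> position", position,
              "view prob.", view_probability(position))
```

Under these assumptions, the number of friends enters only through how quickly a post is pushed down the stream, which is enough to make hubs much less likely to see any given post.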

  39. Friends are a source of distraction • Users with more friends are more active • Users with more friends are distracted by more content • Limited attention makes hubs less susceptible to ‘infection’ • [Figure axis label: nf]

  40. Users retweet most recent messages • [Figure: “Time Response Function” for high connectivity users and low connectivity users] • Users retweet newest messages (at the top of their screen) • Hubs are much less likely to retweet an older message

  41. Does content matter? • [Figure labels: visibility; probability to tweet a message; “virality”; estimated virality]

  42. Do “viral” messages spread farther? • … “viral” messages can reach many or few people • [Figure axis label: ln(“virality”)]

  43. How do people respond to multiple exposures? • [Figure: exposure response versus number of tweeting friends] • Is this evidence for complex contagion?

  44. “Complex contagion” - artifact of heterogeneity • Breaking down exposure response by different sub-populations (low connectivity users, high connectivity users), separated according to number of friends they follow, reveals simple, monotonic response
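
The aggregation effect can be reproduced with made-up numbers. In the sketch below each sub-population has a strictly increasing exposure response, yet the pooled curve rises and then drops, because low connectivity users dominate at small k and high connectivity users dominate at large k. All response values and user counts are hypothetical and chosen only to show the effect.

```python
def pooled_response(groups):
    """Pool per-group exposure responses, weighting by users at each k.

    groups: list of (response, counts) pairs, where response[k] is the
    group's adoption probability at k exposures and counts[k] is the
    number of group members observed at k exposures.
    """
    ks = sorted({k for _, counts in groups for k in counts})
    pooled = {}
    for k in ks:
        users = sum(counts.get(k, 0) for _, counts in groups)
        adopters = sum(response[k] * counts[k]
                       for response, counts in groups if k in counts)
        pooled[k] = adopters / users
    return pooled

# Hypothetical sub-populations, both with monotonically increasing response.
low_connectivity = (
    {1: 0.02, 2: 0.04, 3: 0.06},            # response per exposure
    {1: 1000, 2: 500, 3: 10},               # rarely reach many exposures
)
high_connectivity = (
    {1: 0.001, 2: 0.002, 3: 0.003, 4: 0.004},
    {1: 100, 2: 500, 3: 1000, 4: 1000},     # often reach many exposures
)

if __name__ == "__main__":
    pooled = pooled_response([low_connectivity, high_connectivity])
    for k, p in pooled.items():
        print(k, round(p, 5))
```

The pooled curve alone would suggest a non-monotonic response of the kind attributed to complex contagion, even though no individual group behaves that way.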

  45. Summary • “A meme is not a virus” • Information spread ≠ Disease spread • Big consequences for modeling information spread in social media • Highly connected people (hubs) act as firewalls to information spread • They have a hard time finding messages in their stream • People have a finite capacity to process information; the more messages they receive, the less likely they are to respond to any given one • Information overload actually reduces the size of information cascades
