Phenomenology of Social Media

Kristina Lerman, University of Southern California. CS 599: Social Media Analysis.

Phenomenology and phenomenological models • What phenomena can be observed in social media? • Look for patterns and regularities in aggregate behavior of a large population of users • Average behavior? Distribution of behavior • What mechanisms explain the observed phenomena? • Simple rules • Express rules through mathematical models to reproduce observed regularities • Link simple rules to psychological or sociological theories

Characteristic response to news events [Figure: three panels, each with y-axis “% of all tweets”. Twitter data from trendistic]

Response to news on Digg [Figure panels: votes per hour received by a story; popularity (total votes) over time]

Dynamics of response on Digg and Twitter [Figure: response to three stories on each site] • Digg • 1: U.S. Government Asks Twitter to Stay Up for #IranElection • 2: Western Corporations Helped Censor Iranian Internet • 3: Iranian clerics defy ayatollah, join protests • Twitter • 1: US gov asks twitter to stay up • 2: Iran Has Built a Censorship Monster with help of west tech • 3: Clerics join Iran’s anti-government protests - CNN.com

Topics covered • Social information processing in social news, by Lerman • Can simple models explain dynamics of popularity of news stories? • Strong regularities in online peer production, by Wilkinson • Common properties of distribution of user and topic activity across different systems • Influence and Correlation in Social Networks, by Anagnostopoulos, Kumar and Mahdian, KDD 2008 • Do people influence the behavior of others? How can we tell?

The Wizards of Buzz: A new kind of Web site is turning ordinary people into hidden influencers, shaping what we read, watch and buy. By JAMIN WARREN and JOHN JURGENSEN, February 10, 2007; Page P1 … A new generation of hidden influencers is taking root online, fueled by a growing love affair among Web sites with letting users vote on their favorite submissions. These sites are the next wave in the social-networking craze -- popularized by MySpace and Facebook. Digg is one of the most prominent of these sites, which are variously labeled social bookmarking or social news. Others include Reddit.com (recently purchased by Condé Nast), Del.icio.us (bought by Yahoo), Newsvine.com and StumbleUpon.com. Netscape relaunched last June with a similar format. The opinions of these key users have implications for advertisers shelling out money for Internet ads, trend watchers trying to understand what's cool among young people, and companies whose products or services get plucked for notice. It's even sparking a new form of payola, as marketers try to buy votes.

Social news on Digg • Front page: 100 stories promoted daily • Upcoming stories: 25,000+ submitted daily (2009)

Social networks: follow friends to get relevant news • Stories friends voted on • Stories friends submitted

Top users • Digg ranked users by the number of submitted stories that were promoted to the front page • Displayed Top Users List to motivate users to contribute

Troubles in Diggville Michael Arrington. 09/06/2006 The incredibly successful news site Digg has hit a few speed bumps recently… A number of people have recently complained about the ability for groups of users to get a story to the home page by acting as a group. [One] blogger analyzed Digg and concluded that a small group of powerful Digg users, acting together, control a large percentage of total home page stories. To some this is troubling because… unlike newspapers like the New York Times, where a small group of editors decide what is “news,” Digg is a more democratic process where the readers actually decide what is newsworthy. …Others respond that these groups are just hard core Digg users that spend much of their day scouring the web for good stories to promote on Digg. Digg ranks users based on how successful their submitted stories become, and a handful of users are hyper-competitive about their Digg ranking. The argument is that these users are simply more proficient at finding stories. Today Digg responded to these complaints. …it will soon be implementing a new algorithm that weighs a diversified group of Diggers more heavily than groups acting together.

User success correlated with social network size • Observation • Users with more friends and followers have more stories promoted to the front page • Conspiracy? Or natural outcome of social voting? • Conspiracy • Users conspire to promote each other’s stories • Social voting • Users look at friends’ posts to discover interesting stories [Figure: success (fraction of user’s stories promoted to front page) vs social network size (followers)]

Social voting • Claim: Users tend to digg (vote for) stories friends submit • We will prove it by showing that it is highly unlikely to observe as many follower votes purely by chance • Could this happen purely by chance? [Figure: average number of followers who vote for stories a user submits, <k>, vs the number of followers the user has, K]

Urn model: voting as a stochastic process • Assume there are N balls in an urn, K of which are white. Suppose n balls are picked at random from the urn, without replacement. What is the probability that k are white? • The answer is the hypergeometric distribution: $P(k) = \binom{K}{k}\binom{N-K}{n-k} / \binom{N}{n}$

Urn model: voting as a stochastic process • Assume there are N users, K of whom follow the story submitter. Suppose n users vote for the story. What is the probability that k of them happen to be submitter’s followers? [Figure: average number of follower votes, <k>, vs number of followers, K, compared with the probability that k of the first n votes are from submitter’s followers] → For submitters with K>100 followers, it is highly unlikely to observe that many votes from followers by chance. Therefore, users vote for stories friends submit.
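
The urn argument above can be checked numerically. The following is a minimal sketch of the hypergeometric calculation; the function names and the example numbers (N, K, n and the observed count) are hypothetical and are not taken from the Digg data.

```python
from math import comb


def prob_k_follower_votes(N, K, n, k):
    """Hypergeometric probability that exactly k of n voters are followers.

    N: total users, K: followers of the submitter, n: votes cast.
    """
    if k < 0 or k > n or k > K or n - k > N - K:
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def prob_at_least(N, K, n, k_min):
    """Probability that k_min or more of the n voters are followers."""
    upper = min(n, K)
    return sum(prob_k_follower_votes(N, K, n, k) for k in range(k_min, upper + 1))


# Hypothetical example: 100,000 users, a submitter with 200 followers,
# and 3 or more of the first 10 votes coming from followers.
p_chance = prob_at_least(100_000, 200, 10, 3)
print(f"P(at least 3 follower votes by chance) = {p_chance:.2e}")
```

A very small value of `p_chance` is what the slide means by "highly unlikely by chance": random voting would almost never produce that many follower votes.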

Dynamics of social voting [Figure panels: story popularity; user interface] Despite differences, each story (colored line) has similar dynamics of popularity

Mathematical model of social news [Diagram: a user navigates to the front page, the friends interface, or the upcoming pages; browses pages 1, 2, … of that list; views a story; and votes if it is interesting] • Votes arrive through three channels, each labeled with r in the diagram: the probability to view the story on the front page, in the social stream, and on upcoming pages • N = number of Digg users • vf = visibility on front page • r = story interestingness • Model has only one adjustable parameter (r). Other parameters are measured from data.

Probability to view the story on the front page • Newer stories push a given story down the page, and on to page 2, 3, … • A given story is less likely to be seen over time [phenomenological] [Figure labels: upcoming, promoted story, front page]
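
The push-down effect can be illustrated with simple arithmetic. This sketch uses the promotion rate quoted earlier (100 stories promoted daily) and assumes a hypothetical page size of 15 stories; the function name is illustrative only.

```python
def page_of_story(hours_since_promotion, promoted_per_day=100, stories_per_page=15):
    """Front-page page number of a story, counting from 1.

    Assumes stories are listed newest first, so each newer promoted story
    pushes this one down by one slot. stories_per_page is a hypothetical value.
    """
    newer_stories = int(promoted_per_day * hours_since_promotion / 24)
    return newer_stories // stories_per_page + 1


for hours in (0, 4, 12, 24):
    print(hours, "hours after promotion: page", page_of_story(hours))
```

Under these assumptions a story leaves page 1 within a few hours, which is why its chance of being seen drops over time.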

Dynamics of social voting: model prediction • Data: evolution of popularity of six real Digg stories. S is the number of the submitter’s followers • Model: predictions. Values of story interestingness (r) are estimated from data

Popular submitter advantage [Figure: promoted and not-promoted stories relative to the promotion threshold, 2006 data] → Less interesting (lower r) stories submitted by popular users (many followers) will be promoted to the front page (no need for conspiracy theories)

Predict popularity [Figure: votes vs time (hours), with prediction time t marked] • Estimate how interesting a story is based on early votes • Solve model for later times to predict future votes [Hogg & Lerman, “Social Dynamics of Digg” in EPJ Data Science, 2012]
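
The two prediction steps can be sketched with a toy one-parameter model. This is not the model from the cited paper: it assumes expected votes equal r times the cumulative number of views, and it uses a hypothetical exponentially decaying view rate in place of the visibility measured from data.

```python
import math


def cumulative_views(t_hours, initial_views_per_hour=1000.0, decay_hours=6.0):
    """Hypothetical visibility: views per hour decay exponentially with age.

    Returns the expected number of users who have viewed the story by t_hours.
    """
    return initial_views_per_hour * decay_hours * (1.0 - math.exp(-t_hours / decay_hours))


def estimate_interestingness(early_votes, t_early):
    """Step 1: estimate r as the fraction of early viewers who voted."""
    return early_votes / cumulative_views(t_early)


def predict_votes(early_votes, t_early, t_later):
    """Step 2: solve the toy model at a later time using the estimated r."""
    r = estimate_interestingness(early_votes, t_early)
    return r * cumulative_views(t_later)


# Hypothetical story with 90 votes after 2 hours.
print(round(predict_votes(90, 2.0, 24.0)), "votes predicted after 24 hours")
```

Only r is fitted here; the view-rate parameters stand in for quantities that the slides say are measured from data.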

Summary • People use their social networks to find interesting content • E.g., see stories friends post • This affects how popular stories become and how successful users are in having their stories promoted to the front page • Popular submitter advantage • Simple phenomenological model explains dynamics of social voting • Story visibility (on front page, upcoming stories page, social stream): all parameters measured from data • Story interestingness: only adjustable parameter • Model explains and predicts story popularity

Strong regularities in social media (Wilkinson, 2008) • Questions • Are there regular patterns in the collective behavior of social media users? • Are there simple explanations of these regularities? • Findings • Heterogeneous distribution of user activity • Small number of active users make most of the contributions • Activity depends on level of effort • Regularities can arise from simple dynamical rules

Social systems are complex but predictable • Social systems are complex • Many users • High degree of variability in people’s decisions to participate • Many possible interactions • High degree of variability in people’s reactions to others • Low barriers to interaction • Social systems are predictable • Macroscopic (large-scale) regularities in collective behavior of large population • Simple dynamical rules explain regularities • Not psychological or sociological principles • Distinguish between general and system-level trends • Lots of data for empirical analysis!

Systems and data • Wikipedia: online encyclopedia • Articles (topics), non-robot edits (contributions) • Bugzilla: open source software development service • Reported bugs (topics), discussion comments (contributions) • Digg: social news aggregator • New articles (topics), votes (contributions) • Essembly: online political forum • Political resolves (topics), votes (contributions)

User participation: distribution of the number of contributions [Figure panels: Bugzilla comments & Essembly resolve submissions; Digg & Essembly votes; Wikipedia edits & Digg story submissions]

Power law distribution of contribution • Power law behavior • Number of users who made k contributions: $N(k) = C k^{-\alpha}$ • Participation “momentum”: the probability that a user quits after the kth contribution is proportional to 1/k • The more contributions made, the harder to quit • Exponent α represents barrier to participation
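
The link between the momentum rule and a power law can be checked numerically. In this sketch the quit probability after the kth contribution is assumed to be c/k for a hypothetical constant c; under that assumption the distribution of total contributions falls off as k^-(c+1) for large k.

```python
import math


def contribution_distribution(c, k_max):
    """P(user makes exactly k contributions) for k = 1..k_max.

    Assumes a user who has made k contributions quits with probability c/k.
    """
    probs = []
    still_active = 1.0  # probability of reaching the kth contribution
    for k in range(1, k_max + 1):
        quit_prob = min(1.0, c / k)
        probs.append(still_active * quit_prob)
        still_active *= 1.0 - quit_prob
    return probs


def loglog_slope(probs, k1, k2):
    """Slope of log P(k) against log k between k1 and k2."""
    return (math.log(probs[k2 - 1]) - math.log(probs[k1 - 1])) / (math.log(k2) - math.log(k1))


c = 0.5  # hypothetical constant
probs = contribution_distribution(c, 10_000)
print("log-log slope:", round(loglog_slope(probs, 1_000, 10_000), 3))
```

A straight line on a log-log plot is the signature of a power law, and a larger c (users quit more readily) gives a steeper slope.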

Contribution effort and power law exponent • The larger the value of α, the greater the effort required to contribute • Easy • Digg and Essembly voting requires little time or personal investment • Moderately difficult • Making a Bugzilla comment or submitting a new resolve on Essembly • Difficult • Submitting a new Digg story, or editing a Wikipedia page

Contribution effort and power law exponent [Figure: power law exponent by contribution type, on a scale from difficult to easy]

Topic activity • How much activity does a single topic generate? → Distribution is log-normal (normal distribution of log(x)) [Figure panels: number of votes for an Essembly resolve; number of edits of a Wikipedia article]

Where does log-normal come from? • Multiplicative reinforcement as a model for log-normal distribution • Amount of new activity proportional to amount of existing activity • E.g., popularity (amount of activity) raises visibility, creating new activity • Phenomenological mathematical model: $dN_t = N_t\,(\mu\,dt + \sigma\,dB_t)$ • $N_t$: number of contributions on a topic until time t • μ: average rate of contribution (independent of topic, time) • $\sigma\,dB_t$: stochastic noise accounting for fluctuations in human behavior, with variance $\sigma^2\,dt$
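
A short simulation shows how multiplicative reinforcement produces a log-normal. This sketch integrates the model with simple Euler steps; the parameter values are hypothetical. For this process, log N_t is normally distributed with mean (μ − σ²/2)t and standard deviation σ√t when N_0 = 1, which the printed values should approximate.

```python
import math
import random
import statistics


def simulate_log_activity(mu, sigma, total_time, steps, topics, seed=0):
    """Return log(N_T) for many independent topics, each starting at N_0 = 1.

    Each step multiplies activity by (1 + mu*dt + sigma*sqrt(dt)*z), z ~ N(0, 1).
    """
    rng = random.Random(seed)
    dt = total_time / steps
    logs = []
    for _ in range(topics):
        n = 1.0
        for _ in range(steps):
            n *= 1.0 + mu * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        logs.append(math.log(n))
    return logs


# Hypothetical parameters.
mu, sigma, total_time = 0.5, 0.3, 4.0
logs = simulate_log_activity(mu, sigma, total_time, steps=200, topics=1000)
print("mean of log N:", round(statistics.mean(logs), 2),
      "expected about", round((mu - sigma ** 2 / 2) * total_time, 2))
print("std of log N:", round(statistics.stdev(logs), 2),
      "expected about", round(sigma * math.sqrt(total_time), 2))
```

Because the random shocks multiply rather than add, it is the logarithm of activity that accumulates a sum of independent terms and therefore tends to a normal distribution.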

Summary Macroscopic properties of diverse social media systems where people create, rate and share content are very similar and can be explained in terms of simple dynamical rules • User participation described by a power law • Explained by “momentum” associated with participation, where probability of quitting is inversely proportional to the number of previous contributions • Power law exponent related to effort required to contribute • Topic activity described by a log normal • Explained by a multiplicative reinforcement mechanism in which contributions increase popularity • Systems depend on heavy contributors and popular topics

Influence and correlation in social networks (Anagnostopoulos et al.) • Questions • Do social networks shape user behavior? • How can we identify social influence and distinguish it from other factors, such as homophily or other confounding variables? • Contributions • Statistical test to identify influence as source of social correlation • Findings • Correlation in tagging behavior on Flickr cannot be attributed to social influence

Correlation in social networks • Online social networks are important in shaping user behavior. As a result, social behavior is often correlated. • What is the source of correlation? • Influence? Homophily? Confounding? [Figure: user A applies the tag “donut” at time t1; friend B applies the same tag at time t2]

Sources of social correlation • Homophily: A and B became friends because they are similar to each other; therefore, they perform similar actions • Influence: B’s action is caused by A’s action. If correlation is caused by influence, we can leverage it to amplify diffusion • Confounding: correlation through external (environmental) factors X, e.g., users posted pictures of the same place since they live in the same city

Correlation models • Confounding & homophily • Network G, set of active users W • Select (G, W) according to a joint probability distribution • Time of activation of users in W is picked from distribution T • Influence • At each time step, a non-active user becomes active with probability p(a), where a is the number of her active friends [Figure: timeline from 0 to T; p(a) vs a = # active friends]

Measuring social correlation • What is the form of p(a)? Empirically, from Flickr tags data, a logistic function with parameters α and β • Parameter α measures the amount of social correlation • Estimate α, β using maximum likelihood logistic regression • Let $Y_{a,t}$ be the number of users with a active friends who performed the action at time t, and $Y_a = \sum_t Y_{a,t}$ • Let $N_{a,t}$ be the number of such users who did not perform the action, and $N_a = \sum_t N_{a,t}$ • Choose the values of α, β that maximize the likelihood $\prod_a p(a)^{Y_a} (1 - p(a))^{N_a}$ • a, $Y_a$, $N_a$ are observed
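
The maximum likelihood step can be sketched on aggregated counts. This example assumes the logistic form p(a) = 1 / (1 + exp(-(α ln(a+1) + β))); the ln(a+1) feature is an assumption here, not something the slide states. The fit uses Newton's method, and the counts are synthetic.

```python
import math


def p_active(a, alpha, beta):
    """Assumed logistic form for the activation probability."""
    z = alpha * math.log(a + 1) + beta
    return 1.0 / (1.0 + math.exp(-z))


def fit_alpha_beta(Y, N, iterations=50):
    """Maximize prod_a p(a)^Y[a] * (1 - p(a))^N[a] with Newton's method.

    Y[a]: activations observed with a active friends.
    N[a]: non-activations observed with a active friends.
    """
    alpha, beta = 0.0, 0.0
    for _ in range(iterations):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for a in Y:
            x = math.log(a + 1)
            total = Y[a] + N[a]
            p = p_active(a, alpha, beta)
            residual = Y[a] - total * p      # gradient contribution
            weight = total * p * (1.0 - p)   # curvature contribution
            g_a += residual * x
            g_b += residual
            h_aa += weight * x * x
            h_ab += weight * x
            h_bb += weight
        det = h_aa * h_bb - h_ab * h_ab
        if det == 0.0:
            break
        alpha += (h_bb * g_a - h_ab * g_b) / det
        beta += (h_aa * g_b - h_ab * g_a) / det
    return alpha, beta


# Synthetic expected counts generated from known (hypothetical) parameters.
true_alpha, true_beta, per_bucket = 0.8, -3.0, 10_000
Y = {a: per_bucket * p_active(a, true_alpha, true_beta) for a in range(6)}
N = {a: per_bucket - Y[a] for a in range(6)}
alpha_hat, beta_hat = fit_alpha_beta(Y, N)
print(round(alpha_hat, 3), round(beta_hat, 3))
```

Since the synthetic counts are exact expectations, the fit should recover the generating parameters; with real data the estimate of α is the measured social correlation.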

The shuffle test • Does influence give rise to the observed series of user actions? • Estimate social correlation α using maximum likelihood • Shuffle actions in time • Estimate social correlation α’ using maximum likelihood [Figure: users A to J with activation times t1 to t10, before and after the times are shuffled among users] → There is no social influence if α ≈ α’
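
The data preparation for the shuffle test can be sketched as follows. The graph representation, function names, and tiny example are hypothetical; the sketch assumes a friend counts as active for a user at step t if the friend activated strictly before t.

```python
import random


def exposure_counts(friends, act_time, horizon):
    """Count activations (Y) and non-activations (N) by number of active friends.

    friends: dict mapping each user to the set of users they follow.
    act_time: dict mapping activated users to their activation time step.
    """
    Y, N = {}, {}
    for t in range(1, horizon + 1):
        for user in friends:
            t_user = act_time.get(user)
            if t_user is not None and t_user < t:
                continue  # already active, no longer at risk
            a = sum(1 for f in friends[user] if act_time.get(f, horizon + 1) < t)
            bucket = Y if t_user == t else N
            bucket[a] = bucket.get(a, 0) + 1
    return Y, N


def shuffle_times(act_time, rng):
    """Randomly reassign the observed activation times among the activated users."""
    users = list(act_time)
    times = [act_time[u] for u in users]
    rng.shuffle(times)
    return dict(zip(users, times))


friends = {"A": set(), "B": {"A"}, "C": {"A", "B"}, "D": {"C"}}
act_time = {"A": 1, "B": 2, "C": 3}
original = exposure_counts(friends, act_time, horizon=3)
shuffled_times = shuffle_times(act_time, random.Random(0))
shuffled = exposure_counts(friends, shuffled_times, horizon=3)
print(original, shuffled)
```

The test then compares the correlation estimated from the original counts with the one estimated from the shuffled counts: shuffling keeps who acted and when actions happened, but breaks the order in which friends acted.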

Edge reversal test • Alternate statistical test • Estimate parameter α • Reverse direction of edges in a (directed friendship) graph • Estimate α’ [Figure: users A to J with activation times t1 to t10, shown with original and with reversed edges]
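
The graph manipulation in this test is small enough to sketch directly. The dictionary-of-sets representation and the function name are hypothetical choices for illustration.

```python
def reverse_edges(follows):
    """Return a copy of a directed graph with every edge u -> v turned into v -> u.

    follows: dict mapping each user to the set of users they point to.
    """
    reversed_graph = {user: set() for user in follows}
    for user, targets in follows.items():
        for target in targets:
            reversed_graph.setdefault(target, set()).add(user)
    return reversed_graph


graph = {"A": {"B"}, "B": set(), "C": {"A", "B"}}
print(reverse_edges(graph))
```

The correlation parameter is estimated once on the original graph and once on the reversed graph, with the activation times left unchanged, and the two estimates are compared.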

Validation on synthetic data • Generate activations (users adopting a new tag) according to specific rules • No-correlation model • At each time step, pick new users of a tag uniformly at random • Influence model • At each time step, an inactive user becomes active with probability p(a), where a is the number of active friends • Probability parameterized by α • Correlation model (no influence) • Select S users, and add their neighbors and neighbors of neighbors to S • Select active users randomly from S, as in the no-correlation model
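
Two of the generators above can be sketched in a few lines. The function names, graph representation, and parameter values are hypothetical, and the logistic form with ln(a+1) used for p(a) is an assumption.

```python
import math
import random


def simulate_no_correlation(users, new_per_step, steps, seed=0):
    """No-correlation model: each step, activate new users chosen uniformly at random."""
    rng = random.Random(seed)
    act_time = {}
    for t in range(1, steps + 1):
        inactive = [u for u in users if u not in act_time]
        for user in rng.sample(inactive, min(new_per_step, len(inactive))):
            act_time[user] = t
    return act_time


def simulate_influence(friends, alpha, beta, steps, seed=0):
    """Influence model: an inactive user activates with probability p(a),
    where a is the number of friends active before the current step."""
    rng = random.Random(seed)
    act_time = {}
    for t in range(1, steps + 1):
        newly_active = []
        for user in friends:
            if user in act_time:
                continue
            a = sum(1 for f in friends[user] if f in act_time)
            p = 1.0 / (1.0 + math.exp(-(alpha * math.log(a + 1) + beta)))
            if rng.random() < p:
                newly_active.append(user)
        for user in newly_active:  # apply after the sweep so the step is simultaneous
            act_time[user] = t
    return act_time


friends = {"A": {"B", "C"}, "B": {"A"}, "C": {"A", "B"}, "D": {"C"}}
print(simulate_no_correlation(list(friends), new_per_step=1, steps=3))
print(simulate_influence(friends, alpha=2.0, beta=-2.0, steps=5))
```

Running a statistical test on data generated this way shows whether it reports influence only when influence was actually built into the generator.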

Measuring correlation strength in synthetic data • Frequency distribution (histogram) of α measured from data [Figure panels: correlation model; influence model; no correlation]

Distinguishing influence: shuffle test • Measured α of original and shuffled tagging time steps [Figure panels: correlation model; influence model] → Values of α are close: no influence

Experiments on Flickr data • Tagging behavior of Flickr users over a period of 16 months • 340K users tagged a photo at least once • 160K of these were connected • 2.8M edges • Rest are isolated • Selected 1.7K of 10K tags these users used • Most were used by more than 1K users • “halloween”, “katrina”, “photos”, “moon”, etc.

Correlation and influence on Flickr • Measuring correlation • Correlation exists: α > 0 • Distinguishing influence: α of original vs shuffled time step for each tag • Correlation cannot be attributed to influence

Summary • Proposed statistical analysis to identify and measure social influence as a source of correlation between the actions of individuals with social ties. • Distinguishing correlation from causation • Availability of time-resolved data about human behavior enables us to tackle this difficult problem • Applied to data from a large social system • There is correlation, but it cannot be explained by influence
