

  1. Production and Evaluation in social media

  2. Social media: Knowledge Sharing • Online encyclopedia • Online question-answer forum • Crowdsourcing

  3. Questions • User-contributed content • E.g., how does the coverage and content of Wikipedia grow? • What prompts the choice of the next articles to be written? • What strategies do users use in choosing which crowdsourced tasks to contribute to? • User evaluations • What are the mechanisms behind user evaluations? • Are there possibly simpler explanations of user-user or user-content evaluations? • Design questions • What are some principles on which such sites could be better designed? • Ex: How to reward user-user evaluation or user-content evaluation?

  4. Intro - Wikipedia • Free multilingual encyclopedia launched in 2001 • Operated by the non-profit Wikimedia Foundation • Contains 2,610,291 articles in English and 10 million in total • 236 active language editions • Content written by volunteers • No formal peer review; changes take effect immediately • New articles are created by registered users but can be edited by anyone

  5. Intro - Wikipedia • [Figure: Wikipedia article count, Jan 2001 to Sep 2007. Source: Wikipedia]

  6. The Collaborative Organization of Knowledge • Studies a dump of revisions spanning 6 years • Inflationary/deflationary hypothesis • Does the number of links to nonexistent articles increase at a higher rate than new article creation? • Inflationary → possibly unusable at some point • Deflationary → growth stops at some point • How do existing entries foster the development of new entries? • Growth models: • http://en.wikipedia.org/wiki/Wikipedia:Modelling_Wikipedia's_growth
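
  A toy sketch of the inflationary/deflationary question (purely illustrative; the update rule and numbers are assumptions, not the paper's model): each new article resolves one link to a nonexistent article but may introduce several new ones, and that balance decides which scenario we end up in.

      # Purely illustrative: track articles vs. links to nonexistent articles
      # ("red links"). If each new article adds more red links on average than
      # it resolves, coverage is inflationary; otherwise growth eventually stalls.
      import random

      rng = random.Random(0)
      articles, red_links = 1, 10
      for _ in range(1000):
          if red_links == 0:        # nothing left to write about: growth stops
              break
          red_links -= 1            # one red link becomes a real article
          articles += 1
          red_links += rng.choice([0, 1, 2, 3])   # new references, some to missing pages

      print(articles, red_links)    # ~1.5 new red links per article here => inflationary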

  7. Wikipedia growth • *"Incomplete" includes nonexistent articles and stubs • Wikipedia sits at a midpoint between the two scenarios (thin coverage vs. a decline in the growth rate)

  8. References lead to Definitions • Most articles are written in the first month after being referenced • The mean number of references to a nonexistent article rose exponentially until the article was created • Once an article is created, references rise linearly or level off

  9. Other findings • Growth of Wikipedia is partly attributed to the splitting of articles (depth in articles translates into breadth) • Deeply collaborative • Only in 3% of the cases is the reference creator the same as the article writer • Growth is limited by the number of contributors, not by individual contributors! • Hypothesis: • Articles are more likely to be written because they are popular (have many references leading to them) than because a contributor is interested • What kind of coverage does this growth pattern lead to? • Vandalism and reverts • 4% of article revisions were reverts • The average time to revert a vandalized page is 13 hours • 11% of pages that were reverted at least once had been vandalized at least once

  10. Strategic users

  11. Taskcn.com • 1.7 million registered users • In less than 2 years, solutions were requested for nearly 3,100 tasks and about 543,000 solutions were proposed • A user offers a monetary award for a question or task, and other users provide solutions to compete for the award. • The website plays the role of a third party, collecting the money from the requester and distributing the award to the winner(s), who is (are) decided by the requester. • The website takes a small portion of the award as a service fee. • Socially stable: a core group of users repeatedly proposes and wins

  12. User strategies develop over time • Some strategies used by winners: • Submitting later, even though they cannot see other submissions • Choosing less popular tasks • Choosing tasks with higher winning odds • Participating in tasks of increasing expected reward
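
  A minimal sketch of the intuition behind "less popular tasks" and "higher winning odds"; the crude formula (award divided by the number of competing submissions) and the numbers are illustrative assumptions, not taken from the study.

      # Illustrative only: expected payoff per task if every submission were
      # equally likely to win (actual users need not reason exactly this way).
      def expected_payoff(award, n_submissions):
          return award / max(n_submissions, 1)

      print(expected_payoff(500, 200))   # popular task:       2.5 per submission
      print(expected_payoff(100, 10))    # less popular task: 10.0 per submission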

  13. Selecting less popular tasks

  14. Summary • Users (“winners”) seem to be learning strategic behavior • No significant change in winning rate occurs for the “average” user • However, there is a core of winners who improve their chance of winning with more participation, i.e., leading to a quickening in the succession of wins

  15. User evaluations

  16. Evaluations • Evaluating items • Movie & product reviews • Other users • Epinions, Wikipedia voting • Items created by other users • Q&A websites, helpfulness votes on reviews

  17. Wikipedia voting • Admission to adminship in Wikipedia is through voting • What drives user-user evaluations? • Status: whether the voter V is of higher/lower status than the target T • Level of recognition, merit, reputation in the community • Wikipedia: #edits, #barnstars • Similarity: whether the users are similar • Overlapping topical interests of V and T • Wikipedia: similarity of articles edited
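
  One plausible way to operationalize “similarity of articles edited” is the overlap between the two users' edit histories; this Jaccard sketch is an assumed representation, not necessarily the exact measure used in the study.

      # Sketch: topical similarity between voter V and target T as the Jaccard
      # overlap of the sets of articles each has edited (assumed representation).
      def edit_similarity(edits_v, edits_t):
          v, t = set(edits_v), set(edits_t)
          return len(v & t) / len(v | t) if (v | t) else 0.0

      print(edit_similarity({"PageRank", "Graph theory"}, {"PageRank", "Wikipedia"}))  # 0.33...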

  18. Difference in status is more important than target status • When target status is high, we get a flat line • Different lines correspond to different Δ (Δ = difference in status)

  19. Effect of similarity • Prior interaction/similarity boosts positive voting

  20. Status vs. similarity • Status is used as a “proxy” when users do not know about each other

  21. Summary • Evaluations are an important part of social media • Important aspects • Status & similarity • Similarity breeds positive feedback • Status controls who shows up to give feedback as well as the feedback itself • So much so that the result of a Wikipedia election can be predicted simply by who shows up! • How can we make user evaluations truthful?

  22. Eliciting feedback: Challenges • No reporting • “inconvenience” cost of contributing • Dishonesty • niceness, fear of retaliation • conflicting motivations

  23. Eliciting feedback: Challenges • No reporting • “inconvenience” cost of contributing • Dishonesty • niceness, fear of retaliation • conflicting motivations • Reward systems • motivate participation, honest feedback • monetary (prestige, privilege, pure competition)

  24. Overcoming Dishonesty • Need to distinguish “good” from “bad” reports • explicit reward systems require objective outcome, public knowledge • stock, weather forecasting • But what if … • subjective? (product quality/taste) • private? (breakdown frequency, seller reputability)

  25. Overcoming Dishonesty • Need to distinguish “good” from “bad” reports • explicit reward systems require objective outcome, public knowledge • stock, weather forecasting • But what if … • subjective? (product quality/taste) • private? (breakdown frequency, seller reputability) • Naive solution: reward peer agreement • Information cascade, herding

  26. Peer Prediction: basic idea • reports determine a probability distribution over other raters’ reports • reward is based on the “predictive power” of a user’s report for a reference rater’s report • by taking advantage of proper scoring rules, honest reporting is a Nash equilibrium

  27. Information Flow - Model • [Diagram: a PRODUCT of type t sends a signal S to a rater; the rater sends an announcement a to the CENTER and receives a transfer τ(a)]

  28. Information Flow - Model • [Diagram, continued: a second rater also receives a signal S from the same PRODUCT of type t, announces a to the CENTER, and receives a transfer τ(a)]

  29. Information Flow - Example • [Diagram: PLUMBER with type ∈ {H, L} and signal ∈ {h (high), l (low)}; three raters observe signals h, h, l, announce h, h, l, and receive transfers τ(a) of $1, $1, $0 under an “agreement” scheme]

  30. Assumptions - Model • PRODUCT of type t ∈ {1, …, T} with signal distribution f(s | t) • common prior: distribution p(t) • common knowledge: distribution f(s | t) • linear utility • stochastic relevance • fixed type • finite T

  31. Stochastic Relevance • Informally • same product, so signals dependent • certain observation (realization) should change posterior on type p(t), and thus on signal distribution f(s | t) • Rolex v. Faux-lex • generically satisfied if different types yield different signal distributions

  32. Stochastic Relevance • Informally • same product, so signals are dependent • a certain observation (realization) should change the posterior on the type p(t), and thus the signal distribution f(s | t) • Rolex v. Faux-lex • generically satisfied if different types yield different signal distributions • Formally • Si is stochastically relevant for Sj iff: • the distribution of (Si | Sj) is different for different realizations of Sj • i.e., there are realizations sj ≠ sj′ and some si such that Pr[Si = si | Sj = sj] ≠ Pr[Si = si | Sj = sj′]

  33. Assumptions - Example • finite T: plumber is either H or L • fixed type: plumber quality does not change • common prior: p(t) • p(H) = p(L) = .5 • stochastic relevance: the good plumber’s signal distribution must differ from the bad plumber’s • common knowledge: f(s|t) • p(h | H) = .85, p(h | L) = .45 • note these give p(h) and p(l)
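
  A quick check of the last bullet (a minimal sketch; the variable names are mine, the numbers are the slide's): the common prior and f(s|t) determine the marginal signal probabilities p(h) and p(l).

      # Marginal signal probabilities implied by the example's prior and f(s|t)
      p_H = p_L = 0.5                    # common prior p(t)
      p_h_H, p_h_L = 0.85, 0.45          # f(h | H), f(h | L)

      p_h = p_H * p_h_H + p_L * p_h_L    # = 0.65
      p_l = 1 - p_h                      # = 0.35
      # Stochastic relevance holds because 0.85 != 0.45: the two types
      # induce different signal distributions.
      print(p_h, p_l)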

  34. Definitions - Example • [Diagram: PLUMBER with type ∈ {H, L} and signal ∈ {h, l}; three raters announce h, h, l and receive transfers τ(a) of $1, $1, $0] • 2 types, 2 signals, 3 raters • signals: S = (h, h, l) • announcements: a = (h, h, l) • transfers: τ(a) = (τ1(a), τ2(a), τ3(a)) • announcement strategy for player 2: a2 = (a2h, a2l) • total set of strategies: (h, h), (h, l), (l, h), (l, l)

  35. Best Responses - Example • Player 1 receives signal h • Player 2’s strategy is to report a2 • Player 1 reporting signal h is a best response if E[τ1(h, a2(S2)) | S1 = h] ≥ E[τ1(l, a2(S2)) | S1 = h] • [Diagram: PLUMBER sends S1 = h to player 1 and an unseen S2 to player 2] • Nash equilibrium if this holds for all users

  36. Peer Prediction • Find a reward mechanism that induces honest reporting • i.e., one where ai = Si for all i is a Nash equilibrium • Will need proper scoring rules

  37. Proper Scoring Rules • Definition: • a scoring rule assigns to a probability vector S a score for each realization a: R(S | a) • Expected payoff is maximized if S equals the true probabilities of {a} • Ensures truth-telling • What if there’s no public signal?

  38. Applying Scoring Rules • Definition: • a scoring rule assigns to a probability vector S a score for each realization a: R(S | a) • Expected payoff is maximized if S equals the true probabilities of {a} • Ensures truth-telling • What if there’s no public signal? Use other peers • Now: predictive peers • Si = my signal, Sj = your signal, ai = my report • R(your report | my report)
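
  A small sketch of why a proper scoring rule rewards truth-telling, using the logarithmic rule that appears later in the slides; the “true” distribution here is a made-up example.

      import math

      def log_score(report, realization):
          # logarithmic scoring rule: R(report | realization) = log report[realization]
          return math.log(report[realization])

      true_p = {"h": 0.7, "l": 0.3}      # hypothetical true probabilities

      def expected_score(report):
          return sum(true_p[x] * log_score(report, x) for x in true_p)

      print(expected_score(true_p))                  # ~ -0.61  (truthful report)
      print(expected_score({"h": 0.9, "l": 0.1}))    # ~ -0.76  (skewed report does worse)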

  39. How it Works • For each rater i, we choose a different reference rater r(i) • Rater i is rewarded for predicting rater r(i)’s announcement • τ*i(ai, ar(i)) = R(ar(i) | ai) • based on updated beliefs about r(i)’s announcement given i’s announcement • Claim: for any strictly proper scoring rule R, a reward system with payments τ*i makes truthful reporting a strict Nash equilibrium
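
  A minimal sketch of the transfer rule with assumed helper names: the center maps rater i’s announcement to a posterior over the reference rater’s announcement and scores that posterior against what r(i) actually announced.

      import math

      def transfer(a_i, a_ref, posterior_given):
          # posterior_given(a_i) -> dict mapping each possible announcement of the
          # reference rater to Pr[reference announces it | a_i], computed by the
          # center from the common prior p(t) and f(s | t).
          # With logarithmic scoring: tau*_i = log Pr[a_ref | a_i]
          return math.log(posterior_given(a_i)[a_ref])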

  40. Peer Prediction Example • Player 1 observes low and must decide a1 ∈ {h, l} • Using logarithmic scoring • τ1(a1, a2) = R(a2 | a1) = log[ p(a2 | a1) ] • What announcement maximizes expected payoff? • Note that rewarding peer agreement would incentivize dishonesty (announcing h) • Setup: PLUMBER with p(H) = p(L) = .5, p(h | H) = .85, p(h | L) = .45; player 2 reports honestly (a2 = S2) • Posteriors: Pr[ h | l ] = 0.54, Pr[ l | l ] = 0.46

  41. Peer Prediction Example • Player 1 observes low and must decide a1 ∈ {h, l} • Assume logarithmic scoring • τ1(a1, a2) = R(a2 | a1) = log[ p(a2 | a1) ] • Setup as before: p(H) = p(L) = .5, p(h | H) = .85, p(h | L) = .45, a2 = S2 • a1 = l (honest) yields expected transfer: -.69 • a1 = h (false) yields expected transfer: -.75
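
  A short script recomputing the example (the helper names are mine; the probabilities are the slide’s). It reproduces the comparison above, with the honest report earning more in expectation; the exact values agree with the slide’s up to rounding.

      import math

      prior = {"H": 0.5, "L": 0.5}        # common prior p(t)
      f = {"H": {"h": 0.85, "l": 0.15},   # f(s | t) from the example
           "L": {"h": 0.45, "l": 0.55}}

      def signal_posterior(observed):
          # Pr[S2 = s | S1 = observed]: update the type posterior, then predict S2
          w = {t: prior[t] * f[t][observed] for t in f}
          norm = sum(w.values())
          return {s: sum(w[t] / norm * f[t][s] for t in f) for s in ("h", "l")}

      p_s2 = signal_posterior("l")        # Pr[h | l] ~ 0.54, Pr[l | l] ~ 0.46

      def expected_transfer(a1):
          # tau1(a1, a2) = log p(a2 | a1), with player 2 reporting honestly (a2 = S2)
          pred = signal_posterior(a1)
          return sum(p_s2[s] * math.log(pred[s]) for s in ("h", "l"))

      print(expected_transfer("l"))       # ~ -0.69  honest announcement
      print(expected_transfer("h"))       # ~ -0.76  false announcement does worse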

  42. Things to Note • Players don’t have to perform complicated Bayesian reasoning if they: • trust the center to accurately compute posteriors • believe other players will report honestly • Not a unique equilibrium • there could be other, mixed-strategy equilibria • collusion

  43. Primary Practical Concerns • Examples • inducing effort: fixed cost c > 0 of reporting • better information: users seek multiple samples • budget balancing • Basic idea: • an affine rescaling (a·x + b) of the transfers overcomes the obstacle • preserves the honesty incentive • Follow-up work tries to balance the budget as much as possible
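
  A tiny numeric sketch of why an affine rescaling keeps the honesty incentive (the scale, shift, and cost values are made up): multiplying by a positive a preserves the ordering of expected transfers, and b can absorb a reporting cost or shrink total payments.

      # Illustrative rescaling of the example's expected transfers (a, b, c assumed)
      a, b, c = 10.0, 8.0, 0.5            # positive scale, shift, reporting cost
      honest, false = -0.69, -0.75        # expected transfers from the example

      def net_payment(x):
          return a * x + b - c            # affine rescaling minus the cost of reporting

      print(net_payment(honest), net_payment(false))   # 0.6 > 0.0: honesty is still
                                                       # preferred and now covers the cost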

  44. Limitations • Collusion • could a subset of raters gain higher transfers? higher balanced transfers? • can such strategies: • overcome random pairings • avoid suspicious patterns • Understanding/trust in the system • complicated Bayesian reasoning, payoff rules • rely on experts to ensure public confidence

  45. Discussion • Is the common priors assumption reasonable? • How might we relax it and keep some positive results? • What are the most serious challenges to implementation? • Can you envision a(n online) system that rewards feedback? • How would the dynamics differ from a reward-less system? • How would we extend such formalization to more general feedback systems? • creating + feedback • Can we incorporate observed user behaviors?

  46. Thanks

  47. Common Prior Assumption • Practical concern: how do we know p(t)? • Theoretical concern: are p(t), f(s|t) public? • raters trust the center to compute appropriate posterior distributions for the reference rater’s signal • a rater with private information has no guarantee that the center will report posterior beliefs matching the rater’s own • such a rater might skew the report to reflect the appropriate posteriors • remedy: report both the private information and the announcement • two scoring mechanisms, one for the distribution implied by the private priors, another for the distribution implied by the announcement

  48. Best Responses - Model • Each player decides an announcement strategy ai • ai is a best response to the other strategies a−i if, conditional on Si = sm, it maximizes the rater’s expected transfer with respect to the other raters’ signals: E[τi(ai(sm), a−i(S−i)) | Si = sm] ≥ E[τi(a′, a−i(S−i)) | Si = sm] for every signal sm and every alternative announcement a′ • Nash equilibrium if the condition holds for all i

  49. Definitions - Model • T types, M signals, N raters • signals: S = (S1, …, SN), where Si ∈ {s1, …, sM} • announcements: a = (a1, …, aN), where ai ∈ {s1, …, sM} • transfers: τ(a) = (τ1(a), …, τN(a)) • announcement strategy for player i: ai = (ai1, …, aiM)
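
  A compact sketch of the model’s objects under assumed names: a hidden type is drawn once from the common prior, and each of the N raters observes a conditionally independent signal (the probabilities reuse the plumber example).

      import random

      TYPES = ["H", "L"]                        # t in {1, ..., T}
      SIGNALS = ["h", "l"]                      # s in {s1, ..., sM}
      prior = {"H": 0.5, "L": 0.5}              # common prior p(t)
      f = {"H": {"h": 0.85, "l": 0.15},         # common knowledge f(s | t)
           "L": {"h": 0.45, "l": 0.55}}

      def draw(n_raters, rng=random.Random(0)):
          t = rng.choices(TYPES, weights=[prior[x] for x in TYPES])[0]   # fixed type
          signals = [rng.choices(SIGNALS, weights=[f[t][s] for s in SIGNALS])[0]
                     for _ in range(n_raters)]
          return t, signals                     # the type and one private signal per rater

      print(draw(3))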
