Vote Calibration in Community Question-Answering Systems
Bee-Chung Chen (LinkedIn), Anirban Dasgupta (Yahoo! Labs), Xuanhui Wang (Facebook), Jie Yang (Google)
SIGIR 2012
This work was conducted when all authors were affiliated with Yahoo!.
Why I Present This Paper
• Vote bias exists on many social media platforms
• This paper tackles a problem in a relatively old context, CQA, from a new perspective: crowdsourced identification of quality content
Outline
• Motivation
• Related Work
• Data Set
• Vote Calibration Model
• Exploratory Analysis
• Features
• Experimental Results
• Conclusion
Community Question Answering
A crowdsourced alternative to search engines for providing information
Community Question Answering
• Commercial spam: can mostly be tackled by conventional machine learning
• Low-quality content: difficult for machines to detect!
• Crowdsourcing quality-content identification
Voting Mechanism
• Content quality
• User expertise
Votes in Yahoo! Answers
• The asker votes for the best answer
• If the asker does not vote for a best answer within a certain period, other users in the community vote
• Thumb-up or thumb-down votes on each individual answer
• However… are users' votes always unbiased?
Potential Bias
• Voting more positively for friends' answers
• Using votes to show appreciation instead of to identify high-quality content
• Gaming the system to obtain high status: multiple accounts voting for one another
• On opinion questions, voting for answers that share the same opinion
• …
Potential Bias
• Trained human editors judged answers based on a set of well-defined guidelines
• Raw user votes have low correlation with the editorial judgments
Motivation
• Propose the problem of vote calibration in CQA systems
• Based on exploratory data analysis, identify a variety of potential factors that bias the votes
• Develop a supervised, content-agnostic model for vote calibration
Related Work
• Predicting the user-voted best answer
  • Assumption: readily available user-voted best answers are ground truth
• Predicting editorial judgments
  • User votes are used as features, but calibration of each individual vote has not been studied
• Content-agnostic user expertise estimation
Dataset
• Editorial data
  • Sampled questions and answers from Yahoo! Answers
  • Each answer is given a quality grade according to a pre-determined set of editorial guidelines: excellent, good, fair, bad
  • 21,525 editorially judged answers on 7,372 questions
Dataset
• The distribution of editorial grades for best answers is not very different from that for non-best answers: low correlation between users' best-answer votes and answer quality
  • A significant percentage (>70%) of best answers are not even good
  • Many non-best answers are actually good or excellent
Dataset
• Numeric quality scores: excellent = 1, good = 0.5, fair = 0, bad = −0.5
• Voting data: 1.3M questions, 7.0M answers, 0.5M asker best-answer votes, 2.1M community best-answer votes, 9.1M thumb-up/down votes
Vote Calibration Model
• Three types of votes
  • Asker votes: best-answer votes by the asker
    • +1 for the best answer
    • −1 for other answers
  • CBA votes: community best-answer votes
    • +1 from a voter for the answer they vote best
    • −1 from that voter for other answers
  • Thumb votes: thumb-up and thumb-down
    • +1 for thumb-up
    • −1 for thumb-down
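The signed encoding above can be sketched as a small tally over raw vote records. A minimal sketch, assuming a hypothetical (answer_id, vote_type, positive) record layout that is not from the paper:

```python
from collections import defaultdict

def signed(positive):
    """+1 for a positive signal (chosen as best answer / thumb-up),
    -1 for a negative one (not chosen / thumb-down)."""
    return 1 if positive else -1

def tally(votes):
    """Sum signed vote values per (answer, vote type) pair.

    `votes` is an iterable of (answer_id, vote_type, positive) records,
    with vote_type one of "asker", "cba", "thumb" (assumed layout)."""
    totals = defaultdict(int)
    for answer_id, vote_type, positive in votes:
        totals[(answer_id, vote_type)] += signed(positive)
    return dict(totals)
```

Keeping the three vote types separate, rather than mixing them into one sum, is what lets the model weight each type differently later.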
Average Vote of an Answer
• The average type-t vote value of an answer is smoothed with pseudo votes that act as a prior
• [equation not preserved in the slide text]
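The smoothing named on this slide, pseudo votes acting as a prior on the per-type average, can be sketched as follows; the parameter names and exact form are assumptions, not the paper's equation:

```python
def smoothed_avg(vote_values, prior_mean=0.0, pseudo_count=1.0):
    """Average of an answer's calibrated type-t vote values, pulled toward
    prior_mean by pseudo_count pseudo votes (assumed form of the smoothing)."""
    return (prior_mean * pseudo_count + sum(vote_values)) / (
        pseudo_count + len(vote_values)
    )
```

With no votes the estimate falls back to the prior; as real votes accumulate, they dominate the pseudo votes.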
Quality Prediction Function
• Calibrated vote aggregation model with a bias term, an answer-level component, and a user-level component [equation not preserved in the slide text]
• Quality prediction: a weighted sum of the answer-level and user-level average vote values of all vote types on an answer
Training Algorithm
• Determine the model parameters by minimizing a loss function over the editorial quality scores [equation not preserved in the slide text]
• Use gradient descent to find the parameters
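A generic version of such a training loop, assuming a plain squared-error loss between a weighted-sum prediction and the editorial scores (excellent = 1 … bad = −0.5); the paper's full model also calibrates individual votes, which this sketch omits:

```python
def predict(weights, bias, features):
    """Quality prediction: bias plus a weighted sum of per-type
    average-vote features for one answer."""
    return bias + sum(w * x for w, x in zip(weights, features))

def train(data, n_features, lr=0.05, epochs=200):
    """Fit weights and bias by stochastic gradient descent on a
    squared-error loss (a generic sketch, not the paper's exact loss).

    `data` is a list of (feature_vector, editorial_score) pairs."""
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        for features, target in data:
            err = predict(weights, bias, features) - target
            bias -= lr * err                      # gradient step on the bias
            for i, x in enumerate(features):
                weights[i] -= lr * err * x        # gradient step on each weight
    return weights, bias
```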
Self Voting
• Self votes account for 33% of all CBA votes
• For users who cast at least 20 votes, the percentage of self votes exceeds 40%
Interaction Bias
• A chi-squared statistic and a randomization test show that past interactions could be useful features for vote calibration
Features
• Voter features
• Relation features
Feature Transformation
• For each count feature C, consider log(1 + C) as an additional feature
• For each ratio feature R, include a quadratic term R²
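The two transformations can be sketched directly (the feature ordering here is illustrative):

```python
import math

def expand_features(counts, ratios):
    """Augment each count feature C with log(1 + C) and each ratio
    feature R with a quadratic term R**2."""
    feats = []
    for c in counts:
        feats += [c, math.log1p(c)]   # log(1 + C) tames heavy-tailed counts
    for r in ratios:
        feats += [r, r * r]           # R**2 lets a linear model bend on ratios
    return feats
```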
Experimental Results
• User-level expert ranking: how well we rank users based on the predicted user-level scores
• Answer ranking: how well we rank answers based on the predicted answer-level scores
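The slide does not name the ranking metrics; one standard way to score a predicted ranking against editorial scores is Kendall's tau, sketched here for illustration (not necessarily the paper's metric):

```python
def kendall_tau(predicted, editorial):
    """Kendall rank correlation between predicted and editorial scores
    over the same items: (#concordant - #discordant) / #pairs."""
    n = len(predicted)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            # A pair is concordant if both score lists order it the same way.
            sign = (predicted[i] - predicted[j]) * (editorial[i] - editorial[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

Identical orderings give 1.0, fully reversed orderings give −1.0, and ties shrink the magnitude toward 0.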
Conclusion
• Introduced the vote calibration problem for CQA
• Proposed a set of features that capture bias, based on an analysis of users' voting behavior
• Supervised calibrated models outperform their non-calibrated counterparts
Thanks • Q & A