Expertise Network Analysis in Online Communities

Expertise Networks in Online Communities: Structure and Algorithms. Jun Zhang, Mark S. Ackerman, Lada Adamic. University of Michigan. WWW 2007, May 8–12, 2007, Banff, Alberta, Canada.

Outline • Automatically identify users with expertise. • Analyze the Java Forum. • Test various network-based ranking algorithms such as HITS and PageRank. • Use simulation rules to evaluate how other algorithms perform on the Java Forum. • Evaluate performance in communities with different characteristics.

Introduction • Expertise finder: a system that helps find others with appropriate expertise to answer a question. • Current expertise finders rely on modern information retrieval (IR) techniques: each person's expertise is represented as a term vector, and expertise queries are matched using standard IR techniques. • Problem: this reflects whether a person knows about a topic but does not distinguish people's relative expertise levels. • Solution: use network-based ranking algorithms together with content analysis.

Expertise Network • Usually has a discussion-thread structure. • Not a network focused on social relationships. • Users reply because of interest in the content. • CEN (Community Expertise Network): the distribution of expertise together with the network of responses. • Structural prestige is closely related: receiving more positive choices is prestigious.

Empirical Study: Java Forum • People come to ask questions. • 87 sub-forums with a large diversity of users. • 333,314 messages in 49,888 threads. • 13,739 nodes and 55,761 edges. • Human raters evaluated 135 selected users, omitting users who posted fewer than 10 times.

Characterizing the Network • Bow-tie structure analysis. • Degree distribution: captures the level of interaction. • Scale-free: highly uneven distribution of participation. • Degree correlations: • Indegree: how many people a given user helps. • Indegree alone does not reveal a user's own tendency in providing help, e.g. replying only to newbies or talking only to people of a similar expertise level. • For each asker-replier pair, compare the indegree of the replier with that of the asker.
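The degree measures above can be sketched in a few lines. This is only an illustration on made-up data: the `edges` list, the user names, and the helper names `degrees` and `degree_correlation_pairs` are assumptions, not part of the original study. An edge points from the asker to the replier, so a user's indegree is the number of distinct people they helped.

```python
from collections import defaultdict

# Hypothetical toy data: one (asker, replier) pair per reply relationship.
edges = [("ann", "bob"), ("cat", "bob"), ("ann", "dan"), ("bob", "dan")]

def degrees(edges):
    """Return (indegree, outdegree) dicts for an asker -> replier edge list."""
    helped = defaultdict(set)   # replier -> distinct askers they helped
    helpers = defaultdict(set)  # asker -> distinct repliers who helped them
    for asker, replier in edges:
        helped[replier].add(asker)
        helpers[asker].add(replier)
    users = set(helped) | set(helpers)
    indeg = {u: len(helped[u]) for u in users}
    outdeg = {u: len(helpers[u]) for u in users}
    return indeg, outdeg

def degree_correlation_pairs(edges):
    """For each distinct edge, pair the replier's indegree with the asker's."""
    indeg, _ = degrees(edges)
    return [(indeg[replier], indeg[asker]) for asker, replier in set(edges)]
```

Plotting or averaging the pairs from `degree_correlation_pairs` would show whether frequent helpers tend to answer newcomers or other frequent helpers.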

Expertise Ranking Algorithms • Simple statistical measures: • Answering a lot suggests the user knows the topic well. • Caveat: spammers and users with inflammatory or disruptive posts also post a lot. • Handling the problem: users' relevance feedback. • AnswerNum: number of questions answered. • Also count the number of users a user helped, which indicates broader or greater expertise.
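A minimal sketch of the two counting measures, assuming a reply log of (asker, replier) pairs. The `replies` data and the function names are hypothetical. The point is that AnswerNum counts every reply, while the second measure counts each helped user only once.

```python
from collections import Counter, defaultdict

# Hypothetical reply log: one (asker, replier) pair per answered question.
replies = [("ann", "bob"), ("ann", "bob"), ("cat", "bob"), ("ann", "dan")]

def answer_num(replies):
    """AnswerNum: number of questions each user answered."""
    return Counter(replier for _, replier in replies)

def users_helped(replies):
    """Number of distinct users each replier helped."""
    helped = defaultdict(set)
    for asker, replier in replies:
        helped[replier].add(asker)
    return {replier: len(askers) for replier, askers in helped.items()}
```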

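Z-Score Measures • Replying to many questions suggests high expertise. • Asking many questions suggests a lack of expertise on the topic. • The Z-score combines both: with $q$ questions asked and $a$ answers posted, $n = q + a$. • It measures how different a user is from a random user. • A random user posts answers with $p = 0.5$, so the expected number of replies is $np = n/2$. • Standard deviation: $\sqrt{np(1-p)} = \sqrt{n}/2$. • $z = \frac{a - n/2}{\sqrt{n}/2} = \frac{a - q}{\sqrt{a + q}}$. • Asking and answering equally often gives $z \approx 0$; answering more gives $z > 0$.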
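The Z-score arithmetic can be checked with a small function. This sketch takes the number of answers and questions for one user and standardizes the answer count against a random user who answers with probability 0.5. The function name and the zero-post convention are assumptions made for illustration.

```python
import math

def z_score(answers, questions):
    """Z-score of a user's answer count against a p = 0.5 random user."""
    n = answers + questions
    if n == 0:
        return 0.0  # assumption: a user with no posts is treated as neutral
    expected = n / 2             # n * p
    std_dev = math.sqrt(n) / 2   # sqrt(n * p * (1 - p))
    return (answers - expected) / std_dev
```

For example, `z_score(12, 4)` gives 2.0 and `z_score(8, 8)` gives 0.0, matching the simplified form (a - q) / sqrt(a + q).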

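ExpertiseRank Algorithm • Problem with counting posts: a user who answered 100 newbie questions is ranked as equally expert as one who answered 100 advanced users' questions. • Adopt a method similar to PageRank. • Intuition: if B can answer A's questions and C can answer B's questions, C's expertise should be boosted. • $ER(A) = (1 - d) + d \left( \frac{ER(U_1)}{C(U_1)} + \dots + \frac{ER(U_n)}{C(U_n)} \right)$, where $U_1, \dots, U_n$ are the users helped by A. • $C(U_i)$: total number of users helping $U_i$. • $d$: damping factor, set to 0.85. • Edges could also be weighted, e.g. by $w_{iA}$, the number of times $U_i$ was helped by A. • In this study, weighting did not improve accuracy.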
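A minimal, unweighted sketch of a PageRank-style expertise score, assuming the usual PageRank form in which each asker U passes ER(U) / C(U) to every user who helped them. The function name, the fixed iteration count, and the initial score of 1.0 are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict

def expertise_rank(edges, d=0.85, iterations=50):
    """Iteratively score users from (asker, replier) edges."""
    helpers = defaultdict(set)  # asker -> users who helped them; C(U) = len
    helped = defaultdict(set)   # replier -> users they helped
    for asker, replier in edges:
        helpers[asker].add(replier)
        helped[replier].add(asker)
    users = set(helpers) | set(helped)
    er = {u: 1.0 for u in users}  # assumed starting score
    for _ in range(iterations):
        # Synchronous update: every new score uses the previous round's scores.
        er = {
            a: (1 - d) + d * sum(er[u] / len(helpers[u]) for u in helped[a])
            for a in users
        }
    return er
```

On the chain `[("ann", "bob"), ("bob", "cat")]`, where bob answers ann and cat answers bob, cat ends up with the highest score, which is the boosting behaviour described in the intuition bullet.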

Evaluations • 2 raters, both Java programming experts. • Five levels of expertise rating.

Statistical Metrics • Frequently used correlation measures: • Spearman's rho: does not handle weak orderings (i.e. multiple items in a ranking such that neither item is preferred over the other). • Kendall's tau: gives equal weight to any interchange of equal distance, no matter where it occurs, e.g. between ranks 1 and 2 or between ranks 101 and 102. • TopK: calculates Kendall's tau only for the highest 20 ranks.
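A sketch of the rank-agreement measures, assuming two equal-length score lists indexed by user. It implements the simple tau-a variant, in which tied pairs count as neither concordant nor discordant. Choosing the top k by the reference (human) scores is an assumption, since the slide does not say which ranking defines the top 20.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a between two equal-length score lists."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / pairs if pairs else 0.0

def top_k_tau(reference, predicted, k=20):
    """Kendall's tau restricted to the k items ranked highest by `reference`."""
    order = sorted(range(len(reference)), key=lambda i: reference[i], reverse=True)
    top = order[:k]
    return kendall_tau([reference[i] for i in top], [predicted[i] for i in top])
```

Because every swap of adjacent ranks costs the same in `kendall_tau`, a mistake near the bottom of the list hurts as much as one near the top; `top_k_tau` ignores everything below the cut-off.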

Performance of the various algorithms under the different statistical metrics.

Simulations • Why they are needed: • Understanding the human dynamics that shape an online community. • This helps select an appropriate algorithm for communities whose dynamics differ from the Java Forum's. • 2 models: Best Preferred network and Just Better network.

Best Preferred Network • Many experts answer others' questions and seldom ask questions themselves. • Very similar to the Java Forum. • The probability of replying increases exponentially with the difference in expertise level between the two users.

Just Better Network • E.g. within an organization, experts may be under time constraints and choose to answer only the questions that make the best use of their expertise. • Users with a slightly better level of expertise answer. • U’s probability of answering a’s question

Contd… • Users make the best use of their time. • They are more selective in answering. • ExpertiseRank propagates expertise scores from newbies to the intermediate users who answer their questions, • and from them to experts. • In general, ExpertiseRank outperforms the others.

Network generated from both the models.
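The two models can be sketched as a small generator of asker-replier edges. The exact probability functions are not given in the slides, so the forms below are assumptions chosen to match the descriptions: in Best Preferred the reply weight grows as exp(replier level - asker level); in Just Better only users above the asker's level may reply, and the weight decays as the gap grows. Askers are drawn uniformly here, which is a further simplification. All names are hypothetical.

```python
import math
import random

def reply_weight(model, asker_level, replier_level):
    """Unnormalized chance that a replier answers an asker (assumed forms)."""
    diff = replier_level - asker_level
    if model == "best_preferred":
        return math.exp(diff)  # larger expertise gap, more likely to reply
    if model == "just_better":
        # Only strictly better users reply; the closest ones are preferred.
        return math.exp(-diff) if diff > 0 else 0.0
    raise ValueError(f"unknown model: {model}")

def simulate(levels, model, questions=1000, seed=0):
    """levels: dict user -> expertise level. Returns (asker, replier) edges."""
    rng = random.Random(seed)
    users = sorted(levels)
    edges = []
    for _ in range(questions):
        asker = rng.choice(users)  # assumption: askers chosen uniformly
        others = [u for u in users if u != asker]
        weights = [reply_weight(model, levels[asker], levels[u]) for u in others]
        if sum(weights) == 0:
            continue  # nobody qualifies to answer under this model
        replier = rng.choices(others, weights=weights, k=1)[0]
        edges.append((asker, replier))
    return edges
```

The resulting edge lists can be fed to any of the ranking measures and compared against the known `levels`, which is how a simulation makes it possible to test algorithms on communities with different dynamics.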

Summary & Future Work • Structural information can be used to evaluate an expertise network in an online setting. • Relative expertise can be found using social network-based algorithms. • These algorithms did nearly as well as human raters. • In future, combine content information, which differentiates specific knowledge, with structural information.

THANK YOU !!!
