8th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing, October 14–17, 2012, Pittsburgh, Pennsylvania, United States. Robust Expert Ranking in Online Communities - Fighting Sybil Attacks. Khaled A. N. Rashed, Cristina Balasoiu, Ralf Klamma
8th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing, October 14–17, 2012, Pittsburgh, Pennsylvania, United States • Robust Expert Ranking in Online Communities - Fighting Sybil Attacks • Khaled A. N. Rashed, Cristina Balasoiu, Ralf Klamma • RWTH Aachen University, Advanced Community Information Systems (ACIS) • {rashed|balsoiu|klamma}@dbis.rwth-aachen.de
Advanced Community Information Systems (ACIS) Web Analytics Web Engineering Requirements Engineering
Agenda • Introduction and motivation • Related work • Our Approach • Expert ranking algorithm • Robustness of the expert ranking algorithm • Evaluation • Conclusions and outlook
Introduction • Expert search and ranking refers to finding a group of authoritative users with special skills and knowledge in a specific category • The task is very important in online collaborative systems • Problems: openness and misbehaviour • No attention has been paid to the trust and reputation of experts • Solution: leveraging trust
Motivation Examples • Tidal bores presented as the Indian Ocean tsunami • Published as: 2004 Indian Ocean Tsunami • Proved to be tidal bores during a four-day government-sponsored tourist festival in China • Manipulating the truth for war propaganda • Published as: British soldiers abusing prisoners in Iraq • Proved to be fake by Brigadier Geoff Sheldon, who said the vehicle featured in the photo had never been to Iraq • Expert knowledge, analysis and witnesses are needed to identify the fake!
A Case Study: Collaborative Fake Multimedia Detection System • Collaborative activities (rating, tagging and commenting) • Provide new means of search, retrieval and media authenticity evaluation • Explicit ratings and tags are used for evaluating the authenticity of multimedia items • Reliability: not all of the submitted ratings are reliable • No centralized control mechanism • Vulnerability to attacks • Three types of users • Honest users • Experts • Malicious users
Research Questions and Goals • Research questions • How can users' expertise be measured in collaborative media sharing and evaluation systems, and how can users be ranked? • What are the implications of trust? • Robustness: how can the robustness of the ranking algorithm be ensured? • Goals • Improve multimedia evaluation • Reduce the impact of malicious users
Related Work • Probabilistic models, e.g. [Tu et al. 2010] • Voting models [Macdonald and Ounis 2006] [Macdonald et al. 2008] • Link-based approaches: PageRank [Brin and Page 1998], HITS [Kleinberg 1999] and their variations, SPEAR algorithm [Noll et al. 2009], ExpertRank [Jiao et al. 2009] • TREC enterprise track: find the associations between candidates and documents, e.g. [Balog 2006, Balog 2007] • Machine learning algorithms, e.g. [Bian and Liu 2008, Li et al. 2009]
Our Approach • Assumptions • Expert users tend to have many authenticity ratings • Correctly evaluated media are rated by users of high expertise • Following expert users provides more benefits • Expert definition • Rates a large number of media files authentically with respect to a topic and is highly trusted by the users directly connected to them • Should be trustworthy in evaluating multimedia
Expert Ranking Methods • Domain knowledge driven method • Considers the tags that users assign to media files • User profile: built by merging the tags a user submitted to the media files in the system • Similarity coefficient between the candidate profile and the tags assigned to a specific resource • Used to reorder the users who voted on a media file according to the tag profile • Domain knowledge independent method • Uses the connections between users and resources to decide on the expertise of the users • A modified version of the HITS algorithm • Mutual reinforcement of users' expertise and media
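The domain knowledge driven method can be illustrated with a minimal Python sketch. The slides do not say which similarity coefficient is used, so the Jaccard coefficient here is an assumption, and the names `tag_profile`, `jaccard` and `rank_voters` are hypothetical helpers, not part of the authors' system.

```python
def tag_profile(user_tags):
    """Merge all tags a user submitted (one list per media file) into one set."""
    return {tag.lower() for tags in user_tags for tag in tags}


def jaccard(a, b):
    """Jaccard coefficient |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_voters(voter_tags, resource_tags):
    """Order the voters of one media file by profile/resource tag similarity.

    voter_tags: {user: [[tags for file 1], [tags for file 2], ...]}
    resource_tags: tags assigned to the media file under consideration.
    Assumption: Jaccard stands in for the unspecified similarity coefficient.
    """
    resource = {t.lower() for t in resource_tags}
    scores = {u: jaccard(tag_profile(tags), resource)
              for u, tags in voter_tags.items()}
    # Highest similarity first; ties broken by user name for determinism.
    return sorted(scores, key=lambda u: (-scores[u], u))
```

A user whose merged tag profile overlaps strongly with the tags on a media file is moved ahead of the other voters on that file.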
MHITS: Expert Ranking Algorithm • MHITS: expert ranking algorithm for online collaborative systems • Link-based approach, based on the HITS algorithm • HITS • Authorities: pages that are pointed to by good pages • Hubs: pages that point to good pages • Reinforcement between hubs and authorities • MHITS • Users act as hubs (correctly evaluated media are rated by them) • Media files act as authorities • Mutual reinforcement between users and media files • Local trust values between users are assigned • Considers the ratings of the users
MHITS: Expert Ranking Algorithm • One network for users and ratings • One network for users only (the trust network) • Trust values lie in the range [0, 1] • Ratings: 0.5 for a fake vote, 1 for an authentic vote
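The mutual reinforcement between users (hubs) and media files (authorities) can be sketched as a HITS-style iteration. The slides do not give the MHITS update equations, so this is only an illustration under stated assumptions: rating weights multiply the scores along user-media links, and each user additionally receives the trust-weighted hub score of the users who trust them. The function name `mhits_sketch` is hypothetical.

```python
import math


def _normalize(scores):
    """Scale scores to unit Euclidean length (left unchanged if all zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    return {k: v / norm for k, v in scores.items()} if norm else scores


def mhits_sketch(ratings, trust, iterations=50):
    """HITS-style iteration over a rating network and a trust network.

    ratings: {(user, media): weight}, e.g. 0.5 for a fake vote, 1 for authentic.
    trust:   {(truster, trustee): value in [0, 1]}.
    Returns (user expertise scores, media scores).
    """
    users = {u for u, _ in ratings} | {u for edge in trust for u in edge}
    media = {m for _, m in ratings}
    hub = {u: 1.0 for u in users}
    for _ in range(iterations):
        # Authority step: a media file is scored by the users who rated it.
        auth = {m: 0.0 for m in media}
        for (u, m), w in ratings.items():
            auth[m] += w * hub[u]
        auth = _normalize(auth)
        # Hub step: a user is scored by the media files they rated ...
        new_hub = {u: 0.0 for u in users}
        for (u, m), w in ratings.items():
            new_hub[u] += w * auth[m]
        # ... plus (assumption) trust received from directly connected users.
        for (truster, trustee), t in trust.items():
            new_hub[trustee] += t * hub[truster]
        hub = _normalize(new_hub)
    return hub, auth
```

Sorting the returned user scores in descending order gives an expert ranking; how closely this matches the authors' Java implementation is not something the slides allow us to confirm.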
Robustness of the MHITS Algorithm • Compromising techniques • Sybil attack [Douc02], reputation theft, whitewashing attack, etc. • Compromising the input and the output of the algorithm • Sybil attack • Fundamental problem in online collaborative systems • A malicious user creates many fake accounts (Sybils) which all reference the user to boost their reputation (the attacker's goal is to rise in the rankings) • Countermeasures against the Sybil attack
SumUp • Centralized approach • Aims to aggregate votes in a Sybil-resilient manner • Key idea: an adaptive vote flow technique that appropriately assigns and adjusts link capacities in the trust graph to collect the votes for an object • New: we integrate SumUp with the MHITS Java implementation, using our own data structure based on Java sparse arrays • SumUp steps • Assign the source node and the number of votes per media file • Level assignment • Pruning step • Capacity assignment • Max-flow computation: collect votes on each resource • Leverage user history to penalize adversarial nodes
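The level assignment, pruning and max-flow steps can be sketched in Python as follows. This is a simplified illustration, not SumUp itself: it assumes a uniform capacity on every kept link instead of SumUp's adaptive capacity assignment, it omits the user-history penalty, and the function name `collect_votes` is hypothetical. It still shows the key effect: votes from a Sybil region are bounded by the capacity of the attack edges.

```python
from collections import defaultdict, deque


def collect_votes(trust_edges, collector, voters, capacity=1):
    """Count the votes that can flow to `collector` over the trust graph.

    trust_edges: undirected trust links, each listed once as (a, b).
    Assumption: every kept link gets the same `capacity`.
    """
    # Level assignment: BFS distance from the vote collector.
    neighbours = defaultdict(set)
    for a, b in trust_edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    level = {collector: 0}
    queue = deque([collector])
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node]:
            if nxt not in level:
                level[nxt] = level[node] + 1
                queue.append(nxt)

    cap = defaultdict(int)
    adj = defaultdict(set)

    def add_edge(u, v, c):
        cap[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)  # residual direction for the max-flow search

    # Pruning: keep only links that lead one level closer to the collector.
    for a, b in trust_edges:
        for u, v in ((a, b), (b, a)):
            if u in level and v in level and level[u] == level[v] + 1:
                add_edge(u, v, capacity)
    # Each voter may cast one vote, injected from a virtual source node.
    source = object()
    for voter in voters:
        if voter in level and voter != collector:
            add_edge(source, voter, 1)

    # Max-flow (Edmonds-Karp): augment along shortest residual paths.
    flow = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and collector not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if collector not in parent:
            return flow
        bottleneck, v = float("inf"), collector
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[(parent[v], v)])
            v = parent[v]
        v = collector
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= bottleneck
            cap[(v, u)] += bottleneck
            v = u
        flow += bottleneck
```

With uniform capacities, honest voters who share a link near the collector also compete for it; SumUp's capacity assignment gives links close to the collector more capacity to avoid exactly that.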
Evaluation • Experimental Setup • Barabási-Albert model for generating the network • 300 users • 20 media files (10 known to be fake and 10 known to be authentic) • 800 ratings • 3000 trust edges
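A trust network of this kind can be generated with a short preferential-attachment sketch in Python. The slides do not state the attachment parameter, so `m = 10` below is an assumption chosen because it yields 2,900 edges for 300 users, close to the 3,000 trust edges listed; the function name `barabasi_albert` is a hypothetical helper using only the standard library.

```python
import random


def barabasi_albert(n, m, seed=None):
    """Return the undirected edges of a Barabási-Albert graph.

    n: number of nodes (users); m: links added by each new node.
    Produces (n - m) * m edges with no self-loops or duplicates.
    """
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    edges = []
    targets = list(range(m))  # the first new node links to the m initial nodes
    repeated = []             # each node appears once per incident edge
    for new in range(m, n):
        edges.extend((new, t) for t in targets)
        repeated.extend(targets)
        repeated.extend([new] * m)
        # Preferential attachment: sampling from `repeated` picks a node
        # with probability proportional to its current degree.
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(repeated))
        targets = list(chosen)
    return edges


# Assumed parameters: 300 users as on the slide, m = 10 (not stated).
trust_edges = barabasi_albert(300, 10, seed=42)
```

Trust values in [0, 1] could then be attached to each edge, for example by drawing them at random; how the authors assigned them is not described on the slide.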
Evaluation • Evaluation metrics • Precision@K • Spearman's rank correlation coefficient: $\rho_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$, with $-1 \le \rho_s \le 1$ • $d_i$: the difference between the rank of $x_i$ and the rank of $y_i$ • $n$: the number of data points in the sample (total number of observations) • $\rho_s = -1$ or $\rho_s = 1$: a high degree of correlation between $x$ and $y$ • $\rho_s = 0$: a lack of association between the two rankings • Scale: $-1$ perfect negative correlation, $0$ no correlation, $+1$ perfect positive correlation
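Both metrics can be sketched in a few lines of Python. The Spearman function uses the standard tie-free formula, rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), and assumes both rankings cover the same items without ties; the function names are hypothetical helpers.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked users that are in the reference set."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for user in ranked[:k] if user in relevant) / k


def ranks_from_order(order):
    """Map each item to its 1-based position in an ordered list."""
    return {item: position for position, item in enumerate(order, start=1)}


def spearman(rank_x, rank_y):
    """Spearman's rank correlation for two tie-free rankings of the same items.

    rank_x, rank_y: {item: rank}. Requires at least two items.
    """
    if rank_x.keys() != rank_y.keys():
        raise ValueError("both rankings must cover the same items")
    n = len(rank_x)
    if n < 2:
        raise ValueError("need at least two items")
    d_squared = sum((rank_x[i] - rank_y[i]) ** 2 for i in rank_x)
    return 1 - 6 * d_squared / (n * (n * n - 1))
```

For example, comparing an algorithm's ranking with a reference ranking via `spearman(ranks_from_order(a), ranks_from_order(b))` gives 1.0 when the two orders agree completely and -1.0 when one is the exact reverse of the other.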
Experimental Results I • No Sybils • Results are compared with the ranking of the users according to the number of fair ratings each of them had in the system
Experimental Results II • 10% Sybils • 4 attack edges
Experimental Results III: Precision@K • 10% Sybils (one group) and 8 attack edges • 20% Sybils (one group) and 24 attack edges
Further evaluation • Number of Sybil votes increased from 3% to 17% of the total number of fair votes • Expertise ranking does not change • Number of attack edges increased from 9 to 14 and then to 24, keeping the number of Sybil votes at 17% of the number of fair votes and the number of Sybils constant (50) • Precision does not change • Number of Sybil votes increased from 17% to 50% and then to 100%, keeping the number of attack edges (24) and the number of Sybils constant
Conclusions and Future Work • Conclusions • Proposed an expertise ranking algorithm for collaborative systems (fake multimedia detection systems) • Leveraged trust and showed its implications • Combined expert ranking with Sybil-resistant algorithms • Future Work • Applying the algorithm to real data and to different data sets • Temporal analysis: time series analysis • Integrating the domain knowledge driven method