Collusion-Resistant Misbehaving User Detection Schemes Speaker: Jing-Kai Lou
Outline • Introduction • What's the problem? • Does it matter? • Previous work: What have I done • Community-based scheme • Current analysis: What am I doing • HITS • Random walk scheme
The Rise of User Generated Content • Most of the fastest-growing sites on the internet now are based on user-generated content (UGC). Customer Reviews Increase Web Sales --- eMarketer
Inappropriate UGC • Misbehaving users post inappropriate UGC • Hiring many official moderators is the typical solution • But such a high labor cost is a great burden on the service provider • There is another choice …
Social Moderation System • A user-assisted moderation: every user is a reviewer • You report what you see while viewing (blogs, albums, videos), and an official moderator then inspects only the reported content
Social Moderation Effect • Advantages of a social moderation system: • Fewer official moderators • Inappropriate content is detected quickly • But the number of reports is still large: even if only 1% of the photos uploaded to Flickr are problematic, there are still about 43,200 reports each day • An automated scheme is needed to filter the reports
Automated Filter for Reports • Sort the reports by their number of accusations • Photos reported more than N = 20 times (e.g., 37 or 47 reports) are flagged; photos reported no more than N = 20 times (e.g., 3 reports) are not
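The count-based filter above can be sketched in a few lines. This is an illustrative implementation, not the deck's code; the item names and the threshold N = 20 follow the slide's example.

```python
# Count-based report filter: flag content accused more than N times.
def flag_reported_content(report_counts, threshold=20):
    """Return the ids of items whose accusation count exceeds the threshold."""
    return {cid for cid, n in report_counts.items() if n > threshold}

# Example matching the slide: items reported 37, 3, and 47 times.
counts = {"photo_a": 37, "photo_b": 3, "photo_c": 47}
flagged = flag_reported_content(counts)  # photo_a and photo_c exceed N = 20
```

The weakness shown on the next slides is exactly that this filter trusts every accusation equally.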
Not All Users Are Trustable • While most users report responsibly, colluders submit fake accusations to gain some benefit
The Objective • To develop a collusion-resistant scheme • that automatically infers whether accusations are fair or malicious • The scheme therefore distinguishes misbehaving users from victims
Our Work: Graph Theory Approach • Using the report (accusation) relation only • Previous work: Community-based Scheme • Submitted to 3rd ACM workshop on Scalable Trust Computing (STC 2008) • Extended work: • Propose new schemes • Analyzing new schemes…
Community-based Scheme • Achieves an accuracy rate higher than 90% • Protects at least 90% of victims from collusion attacks
Idea of Community-based Scheme • The accusation relation is modeled as an accusing graph (figure: accusation relation → accusing graph)
Ideal Patterns (figure legend: colluder, normal user, victim, misbehaving user)
Accusing Community • Users with similar accusing behavior tend to be in the same community; edges between communities are inter-community edges
Designing Features for Each User • To find accusations NOT from colluders • Based on the communities, we design two features for each user k • Incoming Accusation, e.g., IA(k) = 2 • Outgoing Accusation, e.g., OA(k) = 5
Community-based Algorithm • Partition the accusing graph into communities • Compute the feature pair (IA, OA) for each user • Cluster users by their (IA, OA) pairs, and label users in the cluster with large (IA, OA) values as misbehaving users
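The feature-computation step can be sketched as follows. This is a hedged illustration: the community labels are assumed to come from a separate community-detection step (the slides do not fix a particular algorithm), IA/OA are counted over inter-community edges per the previous slide, and the cutoff values here are illustrative, not the paper's.

```python
def ia_oa(edges, community):
    """Count inter-community Incoming (IA) and Outgoing (OA) accusations per user.

    edges: iterable of (accuser, accused) pairs (the accusing graph).
    community: dict mapping each user to a community label.
    """
    ia = {u: 0 for u in community}
    oa = {u: 0 for u in community}
    for accuser, accused in edges:
        if community[accuser] != community[accused]:  # inter-community edge only
            oa[accuser] += 1
            ia[accused] += 1
    return ia, oa

def label_misbehaving(edges, community, ia_cut=2, oa_cut=0):
    """Label users whose (IA, OA) pair is large; cutoffs are illustrative."""
    ia, oa = ia_oa(edges, community)
    return {u for u in community if ia[u] >= ia_cut and oa[u] >= oa_cut}
```

In place of the clustering step on the slide, this sketch uses simple thresholds on (IA, OA); any 2-D clustering of the feature pairs would serve the same role.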
Evaluation Metric • What we care about are false negatives • Misidentifying victims as misbehaving users • Collusion resistance
Effect of #(Misbehaving users) (plot: our method vs. count-based method)
Effect of #(Colluders) (plot: our method vs. count-based method)
Effect of Accusation Density (plot: our method vs. count-based method)
Weakness of Community-based Scheme • In our simulation, the colluders only accuse the victims • Realistically, the colluders may sometimes also accuse some genuinely misbehaving users • We should consider smart colluders
Smart Colluder Behavior • Behavior := the probability that a colluder votes against genuinely misbehaving users, ranging from 0 to 100 • (scale: naïve colluder at 0, normal user at 100, smart colluders in between)
Inspiration: HITS • A link analysis algorithm that rates Web pages, developed by Jon Kleinberg • It determines two values for a page: • its authority, which estimates the value of the content of the page • and its hub value, which estimates the value of its links to other pages
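A minimal HITS iteration on a directed accusation graph can be sketched as below. This is a plain textbook implementation for illustration (power iteration with L2 normalization), not the deck's code; the edge direction is assumed to be accuser → accused.

```python
def hits(edges, iters=50):
    """Run the HITS power iteration; returns (hub, authority) score dicts.

    edges: iterable of directed (source, target) pairs.
    """
    nodes = {n for e in edges for n in e}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iters):
        # Authority: sum of hub scores of nodes pointing in.
        auth = {n: sum(hub[a] for a, b in edges if b == n) for n in nodes}
        norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub: sum of authority scores of nodes pointed to.
        hub = {n: sum(auth[b] for a, b in edges if a == n) for n in nodes}
        norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth
```

On an accusing graph, a group of colluders all accusing one victim makes the victim the top authority and the colluders the top hubs, which is exactly the mapping the next slide exploits.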
Ideal Mapping • High authority score → victim • High hub value → colluder • Example parameters: • Number of users = 150 • Misbehaving user ratio = 10%, i.e., 15 • Colluder ratio = 20%, i.e., 30 • Behavior = 20%
When Behavior Increases • Parameters: • Number of users = 150 • Misbehaving user ratio = 10%, i.e., 15 • Colluder ratio = 20%, i.e., 30 • Behavior = 50%
Main Idea • Focus on content accused by many reviewers • Create an undirected graph C to describe these items and their relations • Shape C into a directed graph D that satisfies the goal • Goal: if many walkers take several steps on D, most of them end up on "victims"
Co-Voter Graph, C • Define a co-voter graph C(V, E) to describe the relation between all accused items • V(C): the accused items • E(C): if the sets of accusers of items i and j intersect, then (i, j) ∈ E(C) • weight w(i, j) = |intersection of the accuser sets|
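The construction of C can be sketched directly from that definition. This is an illustrative implementation; the input format (a dict from accused item to its set of accusers) is an assumption.

```python
from itertools import combinations

def covoter_graph(accusers):
    """Build the co-voter graph's weighted edge set.

    accusers: dict mapping an accused item to the set of users who accused it.
    Returns {(i, j): weight} where weight = |accusers[i] & accusers[j]| > 0.
    """
    weights = {}
    for i, j in combinations(sorted(accusers), 2):
        shared = accusers[i] & accusers[j]
        if shared:  # edge only when the accuser sets intersect
            weights[(i, j)] = len(shared)
    return weights

# Items A and B share accusers {2, 3}; C shares none with either.
g = covoter_graph({"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {9}})
# g == {("A", "B"): 2}
```

Items accused by the same colluder group thus end up tightly connected with large weights, which the directed-walk construction below then exploits.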
A snapshot of a co-voter graph (figure: accused items A–F, each labeled with its set of accusers, e.g., {1, 2, 3, 4, 5, 6, 7, 8}, {1, 12, 13, 14}, {5, 6, 7, 8}, …)
Making the Ideal Tendency (directing the graph) • (figure: key nodes M and V with strong and weak edges to neighbors M' and V') • GOAL: from M, the walk should prefer moving away toward V; from V, the walk should prefer staying among victims
Goal 1: Intersection Ratio • (figure: from a node, the probability of stepping to V should exceed the probability of stepping to M or M')
Goal 2: Alpha of the Target • Hopefully Alpha(M) < Alpha(V) • (figure: from node b, Prob. to M ∝ Alpha(M) and Prob. to V ∝ Alpha(V))
What should Alpha be? • [Version N(eighborhood)]: Alpha(T) := the number of co-voters shared between T and all its neighbors; colluders tend to share more co-voters with their collusion group • [Version H(ub)]: Alpha(T) := sum of the hub scores of T's voters
Weight Formula Options • Directed weight formula: w(a, b) = Alpha(b) × |A(a) ∩ A(b)| / |A(a) ∪ A(b)|, where A(x) is the accuser set of x • Then set the leaving probability at each node by normalizing its outgoing weights • (figure example: node X with outgoing weights 0.8 to A, 0.4 to B, 0.8 to C yields Pr(X→A) = 0.4, Pr(X→B) = 0.2, Pr(X→C) = 0.4)
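The directed weight formula and the normalization step can be sketched together. This is a hedged illustration: `alpha` is supplied as a plain per-node dict (computed by either Version N or Version H from the previous slide), and the input format mirrors the co-voter construction above.

```python
def transition_probs(accusers, alpha):
    """Random-walk transition probabilities on the directed co-voter graph.

    w(a, b) = alpha[b] * |A(a) & A(b)| / |A(a) | A(b)|  (Jaccard-style ratio),
    then each node's outgoing weights are normalized to probabilities.
    """
    probs = {}
    for a in accusers:
        w = {}
        for b in accusers:
            if a == b:
                continue
            inter = accusers[a] & accusers[b]
            if inter:
                w[b] = alpha[b] * len(inter) / len(accusers[a] | accusers[b])
        total = sum(w.values())
        probs[a] = {b: v / total for b, v in w.items()} if total else {}
    return probs
```

Running many walkers for several steps under these probabilities and counting where they accumulate is then the detection step: nodes where most walkers end up are labeled victims.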
Evaluation • Parameters: • Number of users = 250 • Misbehaving user ratio = 10%, i.e., 25 • Colluder ratio = 20%, i.e., 50 • Behavior = 50%
Conclusion • Are there new factors we should consider? • Any ideas to improve the random walk scheme or the HITS scheme? • Any NEW ideas?