Counting Triangles and the Curse of the Last Reducer

Siddharth Suri, Sergei Vassilvitskii, Yahoo! Research. Presentation: Nikos Stasinopoulos

The Social Network

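Using Clustering Coefficient to identify cliques • In social networks (SN), nodes tend to cluster together (Holland and Leinhardt, 1971; Watts and Strogatz, 1998). • Assume an undirected graph G = (V, E); Γ(v) is v’s neighborhood and d_v = |Γ(v)| is its degree. • The clustering coefficient cc(v) is the fraction of pairs of v’s neighbors that are themselves neighbors: cc(v) = |{(u, w) ∈ E : u, w ∈ Γ(v)}| / C(d_v, 2), where C(d_v, 2) = d_v(d_v − 1)/2.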

Calculating CC by counting edges • cc(A) = 1 • cc(B) = 1 • cc(C) = 1/3 • cc(D) = N/A

Calculating CC by counting triangles • cc(A) = 1 • cc(B) = 1 • cc(C) = 1/3 • cc(D) = N/A • Again,…
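
The values on the two slides above can be reproduced with a short sketch. The figure itself is not in the transcript, so the four-node edge list below is an assumption, chosen only because it is consistent with the listed cc values; the function names are illustrative.

```python
from itertools import combinations

# Hypothetical example graph, chosen to match the cc values on the slides.
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]

def neighborhoods(edges):
    """Build the neighborhood Γ(v) of every node of an undirected graph."""
    gamma = {}
    for u, v in edges:
        gamma.setdefault(u, set()).add(v)
        gamma.setdefault(v, set()).add(u)
    return gamma

def clustering_coefficient(gamma, v):
    """Fraction of pairs of v's neighbors that are adjacent; None if d_v < 2."""
    d = len(gamma[v])
    if d < 2:
        return None  # "N/A": no pair of neighbors exists
    # Each adjacent pair of neighbors closes one triangle through v.
    linked = sum(1 for u, w in combinations(sorted(gamma[v]), 2) if w in gamma[u])
    return linked / (d * (d - 1) / 2)

gamma = neighborhoods(EDGES)
cc = {v: clustering_coefficient(gamma, v) for v in sorted(gamma)}
print(cc)
```

Counting edges among the neighbors of v and counting triangles through v give the same numerator, which is why both slides list identical values.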

How to count triangles – Naive approach • A sequential algorithm, NodeIterator • Pivot around each node • Examine every pair of its neighbors • Counts each triangle 6 times • Quadratic in the degree, even for a single high-degree node • Running time: O(Σ_{v∈V} d_v²)
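
A minimal sketch of the naive pivot-and-check loop, assuming the graph fits in memory as an edge list. It iterates over ordered pairs of neighbors so that every triangle is seen 6 times, as stated on the slide; the edge list is the same hypothetical four-node example.

```python
from itertools import permutations

EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]  # hypothetical example

def node_iterator(edges):
    """Count triangles by pivoting on every node (naive NodeIterator)."""
    gamma = {}
    for u, v in edges:
        gamma.setdefault(u, set()).add(v)
        gamma.setdefault(v, set()).add(u)
    hits = 0
    for v in gamma:
        # Ordered pairs of neighbors: d_v * (d_v - 1) checks per pivot.
        for u, w in permutations(gamma[v], 2):
            if w in gamma[u]:
                hits += 1
    # Each triangle is found twice at each of its three vertices.
    return hits // 6

print(node_iterator(EDGES))
```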

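Improve upon the NodeIterator • The improved version, NodeIterator++: pivot around low-degree nodes, so that the lowest-degree node of each triangle is responsible for counting it • Result: counts each triangle only once and, more importantly, considers far fewer 2-paths • Running time: O(m^{3/2}), which is optimal [Schank, T., 2007]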
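
The improved version can be sketched as follows. The ordering by (degree, node id), with the id used only to break ties, is an assumption of this sketch; the function also returns the number of 2-paths it examined so the saving is visible.

```python
from itertools import combinations

EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]  # hypothetical example

def node_iterator_pp(edges):
    """Return (triangles, 2-paths examined), pivoting on the lowest-ranked node."""
    gamma = {}
    for u, v in edges:
        gamma.setdefault(u, set()).add(v)
        gamma.setdefault(v, set()).add(u)

    def rank(v):
        # Total order on nodes: by degree, ties broken by node id.
        return (len(gamma[v]), v)

    triangles = paths = 0
    for v in gamma:
        # Only neighbors ranked above the pivot are paired up.
        higher = [u for u in gamma[v] if rank(u) > rank(v)]
        for u, w in combinations(higher, 2):
            paths += 1
            if w in gamma[u]:
                triangles += 1  # found exactly once, at its lowest-ranked vertex
    return triangles, paths

print(node_iterator_pp(EDGES))
```

On the four-node example only one 2-path is examined, whereas pivoting on every node checks every pair of neighbors at A, B and C.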

Why parallelize algorithms? • The graph data structure doesn’t fit in the memory of a single machine. • A sample Twitter graph has 42 million nodes and 2.4 billion edges, ~4.5 GB of compressed data. • Inside the algorithm, the computation of 2-paths explodes the memory demand to petabytes.

MapReduce Framework

Advantages of MapReduce • Runs on commodity hardware • Tolerates non-critical failures • Widely used at Yahoo!, Google, Facebook, MS (12/10) • Provided by cloud services such as Amazon Web Services • Open source

MR-NodeIterator • Round 1: Generate all possible 2-paths starting from each node • Round 2: Check whether each 2-path and its starting node form a triangle

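MR-NodeIterator
• Round 1: • Map 1: for each edge (u, v), emit ⟨u; v⟩ to Reducer 1 (splits the input among the reducers) • Reduce 1: input ⟨v; S ⊆ Γ(v)⟩ (formulates the 2-paths) • Output: all possible (u, w); v where u, w ∈ S. Example: (A,B);C - (A,D);C - (B,D);C
• Round 2: • Map 2: send the 2-paths (u, w); v and the edges (u, w); $ to Reducer 2, where the symbol $ denotes the existence of the edge between the two neighbors • Reduce 2: input ⟨(u, w); S ⊆ V ∪ {$}⟩ • Output: if $ exists in S, then count one triangle for every v in S • Example: (A,B);C,$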
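
The two rounds can be simulated in memory to see how records flow. This is a sketch, not a Hadoop job: `run_round` is a hypothetical stand-in for the map, shuffle and reduce phases, the record tags `"path"` and `"edge"` are illustrative, and the edge list is the hypothetical four-node example.

```python
from collections import defaultdict
from itertools import combinations

EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]  # hypothetical example
EDGE = "$"  # marker: "this pair is an edge of G" (assumes no node is named "$")

def run_round(records, mapper, reducer):
    """Simulate one MapReduce round: map, group by key, reduce."""
    groups = defaultdict(list)
    for rec in records:
        for key, value in mapper(rec):
            groups[key].append(value)
    out = []
    for key, values in groups.items():
        out.extend(reducer(key, values))
    return out

def map1(edge):
    u, v = edge
    yield u, v  # each endpoint learns about its neighbor
    yield v, u

def reduce1(v, nbrs):
    # All 2-paths pivoting on v, keyed later by their endpoints.
    for u, w in combinations(sorted(nbrs), 2):
        yield ("path", (u, w), v)

def map2(rec):
    if rec[0] == "path":
        _, pair, v = rec
        yield pair, v
    else:
        _, (u, v) = rec
        yield tuple(sorted((u, v))), EDGE

def reduce2(pair, values):
    if EDGE in values:  # the closing edge exists
        for v in values:
            if v != EDGE:
                yield v, 1  # one triangle through pivot v

def mr_node_iterator(edges):
    paths = run_round(edges, map1, reduce1)
    hits = run_round(paths + [("edge", e) for e in edges], map2, reduce2)
    # Unordered pairs: every triangle is reported once per pivot, i.e. 3 times.
    return sum(c for _, c in hits) // 3

print(mr_node_iterator(EDGES))
```

The per-pivot records emitted by `reduce2` are the per-node triangle counts that the clustering coefficient needs; the division by 3 is only for the global total.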

MR-NodeIterator++ • Pivot around the node with lower degree • Input to each Reducer 1 is O(√m) • Output of Reduce 1 is O(m^{3/2}) • Reducer 2 input contains the entire edge list and is O(m^{3/2}) • Result: count each triangle only once
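
Only the first map step needs to change. The sketch below compares the Reducer 1 inputs produced by a naive mapper and by a degree-ordered mapper on a hypothetical star graph (one hub, six leaves). It assumes the degree table is already available to the mapper, for example from an earlier pass; all names are illustrative.

```python
from collections import defaultdict

# Hypothetical star graph: one hub with six neighbors (not from the slides).
EDGES = [("hub", f"n{i}") for i in range(6)]

def degree_table(edges):
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def map1_naive(edge, deg):
    u, v = edge
    yield u, v
    yield v, u

def map1_pp(edge, deg):
    u, v = edge
    # Emit the edge only to its lower-ranked endpoint (degree, then id).
    if (deg[u], u) < (deg[v], v):
        yield u, v
    else:
        yield v, u

def reducer1_inputs(edges, mapper):
    """Group the mapper output by key, as the shuffle would."""
    deg = degree_table(edges)
    groups = defaultdict(list)
    for edge in edges:
        for key, value in mapper(edge, deg):
            groups[key].append(value)
    return groups

def two_paths(groups):
    """Number of 2-paths Reduce 1 would emit for these inputs."""
    return sum(len(s) * (len(s) - 1) // 2 for s in groups.values())

naive = reducer1_inputs(EDGES, map1_naive)
improved = reducer1_inputs(EDGES, map1_pp)
print(two_paths(naive), two_paths(improved))
```

With the naive mapper the hub's reducer receives all six neighbors and emits 15 2-paths; with the ordered mapper the hub receives nothing, since every neighbor ranks below it.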

Data Skew – The Curse • In context, there exist nodes with a very high degree. • The reducer handling node @BarackObama (~10M followers) has to check 100 trillion 2-paths using the naive approach. • Natural graphs commonly follow a power-law degree distribution! • The curse of “99% complete”: the job cannot finish until the last reducer does.
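
As a quick check of the figure above, assuming the naive reducer examines ordered pairs among roughly 10^7 neighbors:

```latex
d_v^2 = (10^7)^2 = 10^{14} \quad \text{(100 trillion ordered pairs)},
\qquad
\binom{d_v}{2} = \frac{10^7\,(10^7 - 1)}{2} \approx 5 \times 10^{13} \quad \text{(unordered pairs)}.
```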

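Lifting the Curse • Split the nodes into low-degree (L, d_v ≤ √m) and high-degree (H, d_v > √m), i.e. NodeIterator++ • |L| is at most n and each low-degree node v generates at most d_v² ≤ √m · d_v paths, O(m^{3/2}) over all of L • |H| is at most 2√m and each high-degree node generates O(m) paths • Finally, total work is O(m^{3/2})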
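
The bound can be worked through as follows. This is a reconstruction of the standard argument under the threshold √m, not a copy of the slide's formulas; it uses the fact that the degrees sum to 2m.

```latex
\begin{align*}
\text{Low degree: } & \sum_{v \in L} d_v^2 \;\le\; \sqrt{m} \sum_{v \in V} d_v \;=\; 2\,m^{3/2}. \\
\text{High degree: } & |H| \sqrt{m} \;<\; \sum_{v \in H} d_v \;\le\; 2m
  \;\Longrightarrow\; |H| < 2\sqrt{m}. \\
& \text{A pivot } v \in H \text{ pairs only higher-ranked neighbors, all of which lie in } H, \\
& \text{so it generates at most } \binom{|H|}{2} < 2m \text{ paths, and }
  \sum_{v \in H} \binom{|H|}{2} \;<\; 2\sqrt{m} \cdot 2m \;=\; 4\,m^{3/2}. \\
\text{Total: } & O\!\left(m^{3/2}\right).
\end{align*}
```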

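Tackling the Curse – Graph Partitioning • The authors suggest partitioning the graph: V is split into ρ disjoint subsets V_1, …, V_ρ. • G_ijk is the induced subgraph on V_i ∪ V_j ∪ V_k, for i < j < k • Each G_ijk contains a 3/ρ fraction of the vertices and, in expectation, O(m/ρ²) edges • Every triangle appears in at least one subgraph, possibly in more, so weights are introduced to scale the count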

In how many subgraphs does a triangle appear? • Assume G is divided into ρ = 4 subsets • Case 1: the triangle’s vertices lie in distinct subsets; it appears once • Case 2: two vertices lie in the same subset; the triangle appears ρ − 2 times • Case 3: all three vertices lie in one subset; the triangle appears C(ρ − 1, 2) times, see line #15
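
The three cases can be checked by brute force. The sketch assumes the subgraphs are indexed by triples i < j < k of subset indices and that a triangle appears in a subgraph exactly when all of its vertices' subsets are among the three; `appearances` is an illustrative name.

```python
from itertools import combinations
from math import comb

def appearances(subsets_of_vertices, rho):
    """Count the triples i < j < k whose union contains all three vertices.

    subsets_of_vertices holds the subset index of each triangle vertex.
    """
    needed = set(subsets_of_vertices)
    return sum(1 for triple in combinations(range(rho), 3) if needed <= set(triple))

rho = 4
print(appearances((0, 1, 2), rho))  # Case 1: distinct subsets
print(appearances((0, 0, 1), rho))  # Case 2: two vertices share a subset
print(appearances((0, 0, 0), rho))  # Case 3: all three in one subset
```

For ρ = 4 the three cases give 1, 2 and 3 appearances, matching 1, ρ − 2 and C(ρ − 1, 2).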

MR-GraphPartition • A hash function distributes the vertices to ρ buckets • Total size of Map output is O(ρm) • Input size for each Reducer is O(m/ρ²) • Each triangle’s count is scaled by its number of appearances (Case 1, Case 2, Case 3)
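
An in-memory sketch of the partition idea, assuming ρ ≥ 3 and a caller-supplied hash function `h` that maps a vertex to a bucket in 0..ρ−1. Each subgraph is counted independently, as a reducer would, and every triangle found contributes 1 divided by its number of appearances. The helper names and the K5 test graph are illustrative, not taken from the slides.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations
from math import comb

def list_triangles(edges):
    """Yield each triangle once, from its lowest-ranked vertex."""
    gamma = defaultdict(set)
    for u, v in edges:
        gamma[u].add(v)
        gamma[v].add(u)
    rank = lambda v: (len(gamma[v]), v)
    for v in list(gamma):
        higher = [u for u in gamma[v] if rank(u) > rank(v)]
        for u, w in combinations(higher, 2):
            if w in gamma[u]:
                yield (v, u, w)

def graph_partition_count(edges, rho, h):
    """Sum weighted triangle counts over all subgraphs G_ijk, i < j < k."""
    assert rho >= 3, "need at least three buckets to form a triple"
    total = Fraction(0)
    for triple in combinations(range(rho), 3):
        # "Map": an edge goes to every triple containing both endpoint buckets.
        sub = [(u, v) for u, v in edges if h(u) in triple and h(v) in triple]
        # "Reduce": count triangles in the subgraph, scaled by appearances.
        for tri in list_triangles(sub):
            buckets = len({h(x) for x in tri})
            copies = {3: 1, 2: rho - 2, 1: comb(rho - 1, 2)}[buckets]
            total += Fraction(1, copies)
    return total

K5 = list(combinations(range(5), 2))  # complete graph on 5 nodes: 10 triangles
print(graph_partition_count(K5, 4, lambda v: v % 4))
```

Exact fractions are used so that the weighted contributions of one triangle sum to exactly 1, whichever buckets its vertices fall into.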

The partitioning (ρ) parameter • Total size of Map output is O(ρm) • Input size for each Reducer is O(m/ρ²) • This calls for a tradeoff: increasing the total disk space used by the Mappers greatly decreases the RAM required by each Reducer. • Again, total work is O(m^{3/2}), distributed among the Reducers.
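
To get a feel for the tradeoff, the two bounds can be tabulated for a few values of ρ. The sketch counts edge records, drops all constant factors, and uses the 2.4 billion edges of the Twitter sample mentioned earlier; the numbers are orders of magnitude, not measurements.

```python
def partition_costs(m, rho):
    """Return (map output, per-reducer input) in edge records, constants ignored."""
    return rho * m, m / rho**2

M_TWITTER = 2_400_000_000  # edges in the sample Twitter graph

for rho in (1, 10, 100):
    map_out, reducer_in = partition_costs(M_TWITTER, rho)
    print(f"rho={rho:>3}  map output ~ {map_out:.1e}  reducer input ~ {reducer_in:.1e}")
```

Map output grows linearly in ρ while the per-reducer input shrinks quadratically, which is why a moderate ρ already relieves the reducers considerably.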

Results • Completion time distributes “normally” across the runtime spectrum

ρ Tradeoff

Contributions • Applies MapReduce to counting triangles, even for the naive approach • Provides a Graph Partition MR algorithm, extendable to subgraphs other than triangles • Implements some of Schank’s work in the context of social networks • Explores challenges in real-world data (data skew) • Results are exact, not approximations

Thank you!
