Discover the role of graphs in web, social, and knowledge networks, cybersecurity, IoT, bioinformatics, and analytics, with a focus on algorithms and challenges like scalability. Explore Pregel, PageRank, and SSSP in graph processing.
One Trillion Edges: Graph Processing at Facebook Scale
Avery Ching, Sergey Edunov, Maja Kabiljo, Dionysios Logothetis, Sambhavi Muthukrishnan (Facebook)
Presented by: Swapnil Gandhi, 21st November 2018
Königsberg* Bridge Problem
The problem asks for a walk through the city that crosses each of its seven bridges exactly once. Its negative resolution by Leonhard Euler in 1736 laid the foundations of graph theory.
* Located in the Kingdom of Prussia (now in Russia)
Graphs are common
• Web & social networks: the web graph, citation networks, Twitter, Facebook, the Internet
• Knowledge networks & relationships: Google's Knowledge Graph, CMU's NELL
• Cybersecurity: telecom call logs, financial transactions, malware
• Internet of Things: transport, power, and water networks
• Bioinformatics: gene sequencing, gene expression networks
Graph Algorithms
• Traversals (paths & flows between different parts of the graph): breadth-first search, shortest path, minimum spanning tree, Eulerian paths, max-cut
• Clustering (closeness between sets of vertices): community detection & evolution, connected components, k-means clustering, maximum independent set
• Centrality (relative importance of vertices): PageRank, betweenness centrality
Graphs are Central to Analytics
[Diagram: an analytics pipeline over raw Wikipedia XML. Hyperlinks feed PageRank to produce the top 20 pages; a term-document graph feeds a topic model (LDA) to produce word topics; the editor graph feeds community detection, yielding user/community and community/topic-discussion tables.]
But graphs can be challenging
• Shared-memory algorithms don't scale!
• They do not fit naturally into Hadoop/MapReduce
  ‣ Multiple MR jobs (iterative MR)
  ‣ Topology & data written to HDFS on every iteration
  ‣ Tuple-centric, rather than graph-centric, abstraction
• Lots of work on parallel graph libraries for HPC
  ‣ Boost Graph Library, Graph500
  ‣ Storage & compute are (loosely) coupled, not fault tolerant
  ‣ But not everyone has a supercomputer!
  ‣ If you do own a supercomputer, stick with HPC algorithms
PageRank using MR
MapReduce: https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf
PageRank using MR
• MR will run for multiple iterations (typically around 30)
• The mapper will
  ‣ Initially, load the adjacency list and initialize a default PageRank
  ‣ In subsequent iterations, load the adjacency list and the new PageRank
  ‣ Emit two types of messages from Map: the vertex's adjacency list, and a PageRank contribution addressed to each out-neighbour
• The reducer will
  ‣ Reconstruct the adjacency list for each vertex
  ‣ Update the vertex's PageRank based on its neighbours' PR messages
  ‣ Write the adjacency list and new PR values to HDFS, to be used by the next Map iteration
SQL vs. MapReduce: http://www.science.smith.edu/dftwiki/images/6/6a/ComparisonOfApproachesToLargeScaleDataAnalysis.pdf
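As a concrete illustration of the loop above, here is a minimal single-process C++ sketch of one map/reduce-style PageRank iteration. The toy graph, damping factor, and function names are illustrative assumptions only; a real job would shuffle the emitted key/value pairs through HDFS between the map and reduce phases.

// Single-process sketch of one MapReduce-style PageRank iteration.
// "Map": each vertex emits a PR contribution to every out-neighbour.
// "Reduce": each vertex sums the contributions addressed to it.
#include <cstdio>
#include <map>
#include <vector>

using Graph = std::map<int, std::vector<int>>;   // vertex -> adjacency list

std::map<int, double> PageRankIteration(const Graph& g,
                                        const std::map<int, double>& pr,
                                        double d = 0.85) {
  std::map<int, double> contributions;            // vertex -> sum of incoming PR
  for (const auto& [v, neighbours] : g) {         // map over vertices
    double share = pr.at(v) / neighbours.size();
    for (int u : neighbours) contributions[u] += share;   // "emit" to neighbour u
  }
  std::map<int, double> next;                     // reduce: combine per vertex
  for (const auto& kv : g)
    next[kv.first] = (1.0 - d) / g.size() + d * contributions[kv.first];
  return next;
}

int main() {
  Graph g = {{0, {1, 2}}, {1, {2}}, {2, {0}}};    // toy 3-vertex graph
  std::map<int, double> pr = {{0, 1.0 / 3}, {1, 1.0 / 3}, {2, 1.0 / 3}};
  for (int i = 0; i < 30; ++i) pr = PageRankIteration(g, pr);   // ~30 iterations
  for (const auto& [v, rank] : pr) std::printf("vertex %d: %.4f\n", v, rank);
}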
Two Birds
• Less data movement over the network
• Fault tolerance
[Image: a half-caf double espresso; credits: Google Images]
One Stone: Pregel
A play on the English proverb "kill two birds with one stone".
Credits: Google Images
Pregel
To overcome these challenges, Google came up with Pregel.
Valiant's BSP
[Diagram: processors P1–P4 run Superstep 1 (computation), exchange messages (communication), then wait at a barrier synchronization before Superstep 2 begins.]
Barrier synchronization is "often expensive and should be used as sparingly as possible".
Vertex State Machine
In superstep 0, every vertex is in the active state. A vertex deactivates itself by voting to halt, and is reactivated when it receives an (external) message. The algorithm terminates once every vertex has voted to halt and no messages are in flight.
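To make the termination rule concrete, here is a minimal sequential C++ sketch of the state machine and its driver loop; the Vertex struct, inbox, and loop are illustrative stand-ins, not the Pregel or Giraph API.

// Sequential sketch of the vertex state machine and termination rule.
#include <cstdio>
#include <vector>

struct Vertex {
  bool active = true;                  // every vertex starts active in superstep 0
  std::vector<int> inbox;              // messages delivered for this superstep
  void Compute() {
    // ... application logic would read the inbox, update state, send messages ...
    active = false;                    // vote to halt when there is nothing left to do
  }
};

int main() {
  std::vector<Vertex> vertices(4);
  int supersteps = 0;
  while (true) {
    bool any_computed = false;
    for (auto& v : vertices) {
      if (!v.inbox.empty()) v.active = true;      // an incoming message reactivates a vertex
      if (v.active) { v.Compute(); any_computed = true; }
      v.inbox.clear();
    }
    // Terminate once every vertex has voted to halt and no messages are pending.
    if (!any_computed) break;
    ++supersteps;
  }
  std::printf("all vertices halted after %d superstep(s)\n", supersteps);
}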
Vertex-Centric Programming
• Vertex-centric programming model
  ‣ Logic is written from the perspective of a single vertex
  ‣ Executed on all vertices
• Vertices know about
  ‣ Their own value(s)
  ‣ Their outgoing edges
Finding the Largest Value in a Graph using Pregel
[Diagram: a chain of vertices with values 3, 6, 2, 1 on one worker. In each superstep, active vertices send their values along their edges and adopt the maximum they receive; by superstep 3 every vertex holds the value 6 and has voted to halt.]
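A Compute method for this example, written against the same Pregel C++ Vertex API as the SSSP code later in the deck; the class name and details are illustrative, not taken verbatim from the Pregel paper.

class MaxValueVertex : public Vertex<int, void, int> {
  void Compute(MessageIterator* msgs) {
    int maxval = GetValue();
    // In superstep 0 every vertex announces its value; afterwards only
    // vertices that learn a larger value keep propagating.
    bool changed = (superstep() == 0);
    for (; !msgs->Done(); msgs->Next()) {
      if (msgs->Value() > maxval) {
        maxval = msgs->Value();
        changed = true;
      }
    }
    if (changed) {
      *MutableValue() = maxval;
      OutEdgeIterator iter = GetOutEdgeIterator();
      for (; !iter.Done(); iter.Next())
        SendMessageTo(iter.Target(), maxval);
    }
    VoteToHalt();   // reactivated automatically if a larger value arrives later
  }
};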
Advantages
• Makes distributed programming easy
  ‣ No locks, semaphores, or race conditions
  ‣ Separates the computation phase from the communication phase
• Vertex-level parallelization
  ‣ Bulk message passing for efficiency
• Stateful (in-memory)
  ‣ Only messages & checkpoints hit disk
Lifecycle of a Pregel Program
Source: Apache Giraph, Claudio Martella, Hadoop Summit, Amsterdam, April 2014
SSSP
In the 0th superstep, only the source vertex will update its value.

class ShortestPathVertex : public Vertex<int, int, int> {
  void Compute(MessageIterator* msgs) {
    // Start from 0 at the source, infinity everywhere else.
    int mindist = IsSource(vertex_id()) ? 0 : INF;
    // Take the smallest tentative distance received this superstep.
    for (; !msgs->Done(); msgs->Next())
      mindist = min(mindist, msgs->Value());
    if (mindist < GetValue()) {
      // Improved distance: update the vertex and relax all outgoing edges.
      *MutableValue() = mindist;
      OutEdgeIterator iter = GetOutEdgeIterator();
      for (; !iter.Done(); iter.Next())
        SendMessageTo(iter.Target(), mindist + iter.GetValue());
    }
    VoteToHalt();   // halt until a shorter distance arrives
  }
};
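The Pregel paper pairs SSSP with a combiner that collapses all messages headed to one destination into a single minimum, sharply cutting message traffic. Below is a sketch in the same style; the Combiner base class and the Output call are written from memory of the paper's example, so treat the exact signatures as approximate.

class MinIntCombiner : public Combiner<int> {
  virtual void Combine(MessageIterator* msgs) {
    // Keep only the smallest tentative distance destined for this vertex.
    int mindist = INF;
    for (; !msgs->Done(); msgs->Next())
      mindist = min(mindist, msgs->Value());
    Output("combined_source", mindist);   // emit the single combined message
  }
};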
SSSP Walkthrough (1/6–6/6)
[Diagrams: an eight-vertex weighted graph (A–H) partitioned across four workers.
(1/6) Input graph with edge weights.
(2/6) Superstep 0: the source vertex is initialized to 0; every other vertex is ∞.
(3/6) Superstep 1: the source's neighbours receive messages and update their distances.
(4/6) Superstep 2: distances propagate a further hop across workers.
(5/6) Superstep 3: remaining tentative distances are lowered as shorter paths are found.
(6/6) No distances change and no messages are sent: the algorithm has converged.]
Platform Improvements (1/2)
• Efficient memory management (MM)
  ‣ Vertex and edge data stored as serialized byte arrays
  ‣ Better memory management means less garbage collection (GC)
• Support for multi-threading
  ‣ Maximizes resource utilization
  ‣ Linear speed-up for CPU-bound applications such as k-means clustering
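The optimization itself targets the JVM (fewer live objects means less GC pressure), but the layout idea can be illustrated with a small, self-contained C++ sketch in which edges are serialized into one flat byte buffer instead of one object per edge. The struct and encoding below are illustrative only, not Giraph code.

// Illustrative only: edges packed into one flat byte buffer
// ([target id][edge value]...) instead of one heap object per edge.
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

struct PackedEdges {
  static constexpr size_t kStride = sizeof(int64_t) + sizeof(int32_t);
  std::vector<uint8_t> buf;                       // all edges serialized back-to-back
  void Add(int64_t target, int32_t value) {
    size_t off = buf.size();
    buf.resize(off + kStride);
    std::memcpy(&buf[off], &target, sizeof(target));
    std::memcpy(&buf[off + sizeof(target)], &value, sizeof(value));
  }
  template <typename Fn>
  void ForEach(Fn fn) const {                     // deserialize on the fly while iterating
    for (size_t off = 0; off < buf.size(); off += kStride) {
      int64_t target; int32_t value;
      std::memcpy(&target, &buf[off], sizeof(target));
      std::memcpy(&value, &buf[off + sizeof(target)], sizeof(value));
      fn(target, value);
    }
  }
};

int main() {
  PackedEdges edges;
  edges.Add(42, 7);
  edges.Add(99, 3);
  edges.ForEach([](int64_t t, int32_t v) {
    std::printf("-> %lld (weight %d)\n", (long long)t, v);
  });
}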
Platform Improvements (2/2)
• Flexible I/O formats
  ‣ Reduce pre-processing
  ‣ Allow vertex and edge data to be loaded from different sources
• Sharded aggregators
  ‣ Aggregator responsibilities are balanced across workers
  ‣ Different aggregators can be assigned to different workers
Beyond Pregel
• Master compute
  ‣ Allows centralized execution of computation
  ‣ Refer to the classroom discussion
• Worker phases
  ‣ Special methods that bypass the Pregel model but add ease of use
  ‣ Applicability is application-specific
• Composable computation
  ‣ Decouples the vertex from the computation
  ‣ An existing computation implementation can be reused across multiple applications
Superstep Splitting
• The master runs the same "message-heavy" superstep for a fixed number of iterations
• In each iteration:
  ‣ A vertex's computation is invoked only if the vertex passes hash function H
  ‣ A message is sent to a destination only if the destination passes hash function H'
• Applicable to computations whose messages are not "aggregatable"
  ‣ If messages can be aggregated (commutative and associative), stick with combiners
• Example: friends-of-friends computation
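Purely as an illustration of the two hash gates above, here is a self-contained C++ sketch: the same message-heavy superstep is re-run several times, and hashes H (on sources) and H' (on destinations) decide which fraction of the messages each run carries. The toy graph, the hash choices, and the exact iteration scheme (one run per source/destination slice pair) are assumptions for illustration, not Giraph's actual implementation.

// Illustrative sketch of superstep splitting with hash gates H and H'.
#include <cstdio>
#include <map>
#include <vector>

using Graph = std::map<int, std::vector<int>>;

int H(int v) { return v; }          // gate on the sending vertex
int Hp(int v) { return v * 31; }    // gate on the message destination

void RunSplitSuperstep(const Graph& g, int splits) {
  for (int i = 0; i < splits; ++i) {        // slice of sources that compute
    for (int j = 0; j < splits; ++j) {      // slice of destinations that receive
      for (const auto& [src, neighbours] : g) {
        if (H(src) % splits != i) continue;         // vertex computation gated by H
        for (int dst : neighbours) {
          if (Hp(dst) % splits != j) continue;      // message gated by H'
          std::printf("run (%d,%d): message %d -> %d\n", i, j, src, dst);
        }
      }
    }
  }
}

int main() {
  Graph g = {{0, {1, 2, 3}}, {1, {0, 2}}, {2, {3}}, {3, {0}}};
  RunSplitSuperstep(g, /*splits=*/2);   // each run carries roughly a quarter of the messages
}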
Key Take-aways
• Usability, performance, and scalability improvements to Apache Giraph
• Code available as open source to try out!
• A memoir detailing Facebook's experience of using Giraph for production applications
• Headline grabber: "Scales to a trillion-edge graph"