Distributed Systems CS 15-440

Programming Models – Part IV
Lecture 17, Oct 27, 2014
Mohammad Hammoud

Today…
• Last Session:
  • Programming Models – Part III: MapReduce
• Today’s Session:
  • Programming Models – Part IV: Pregel
• Announcements:
  • Project 3 will be posted today. It is due on Wednesday, Nov 12, 2014 by midnight
  • PS4 will be posted today. It is due on Saturday, Nov 15, 2014 by midnight

Objectives: Discussion on Programming Models
• Why parallelize our programs?
• Parallel computer architectures
• Traditional models of parallel programming
• Types of parallel programs
• Message Passing Interface (MPI)
• MapReduce, Pregel and GraphLab (cont’d)

The Pregel Analytics Engine
• Motivation & Definition
• The Computation & Programming Models
• Input and Output
• Architecture & Execution Flow
• Fault-Tolerance

Motivation for Pregel
• How can we implement algorithms that process big graphs? The options include:
  • Create a custom distributed infrastructure for each new algorithm (difficult!)
  • Rely on existing distributed analytics engines like MapReduce (inefficient and cumbersome!)
  • Use a single-computer graph algorithm library like BGL, LEDA, NetworkX, etc. (big graphs might be too large to fit on a single machine!)
  • Use a parallel graph processing system like Parallel BGL or CGMGraph (not suited for large-scale distributed systems!)

What is Pregel?
• Pregel is a large-scale, graph-parallel, distributed analytics engine
• Some characteristics:
  • In-memory (as opposed to MapReduce, which is disk-based)
  • High scalability
  • Automatic fault-tolerance
  • Flexibility in expressing graph algorithms
  • Message-passing programming model
  • Tree-style, master-slave architecture
  • Synchronous
• Pregel is inspired by Valiant’s Bulk Synchronous Parallel (BSP) model

The BSP Model
[Figure: three CPUs, each holding its own data, compute in parallel within a super-step and exchange data; all CPUs then synchronize at a barrier before the next super-step begins (Super-Step 1, Super-Step 2 and Super-Step 3, each ending in a barrier).]
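To make the super-step/barrier structure concrete, here is a minimal sequential simulation in C++ (the language of the Pregel API used later in this lecture). It is a sketch under simplifying assumptions: `BspRing` and its ring-shaped communication pattern are hypothetical, the "processors" run one after another rather than in parallel, and the barrier is modeled by swapping message buffers.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sequential simulation of the BSP model (illustrative, not Pregel itself).
// Each "processor" owns one integer. In every super-step it (1) computes on
// local data, adopting the last value it was sent, and (2) sends its value to
// the next processor in a ring. Messages are buffered and delivered only at
// the barrier, so they become visible in the following super-step.
struct BspRing {
  std::vector<int> local;               // local data, one entry per processor
  std::vector<std::vector<int>> inbox;  // messages readable in this super-step
  int superstep = 0;

  explicit BspRing(std::vector<int> init)
      : local(std::move(init)), inbox(local.size()) {}

  void RunSuperstep() {
    const std::size_t p = local.size();
    std::vector<std::vector<int>> outbox(p);  // sent now, read next super-step
    for (std::size_t i = 0; i < p; ++i) {
      if (!inbox[i].empty()) local[i] = inbox[i].back();  // local computation
      outbox[(i + 1) % p].push_back(local[i]);            // communication
    }
    inbox.swap(outbox);  // barrier: all messages are delivered at once
    ++superstep;
  }
};
```

Starting from `BspRing r({1, 2, 3})`, the first super-step leaves the values unchanged because nothing has been delivered yet; after the second, the values read `{3, 1, 2}`.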

Entities and Super-Steps
• The computation is described in terms of vertices, edges and a sequence of super-steps
• You give Pregel a directed graph consisting of vertices and edges
  • Each vertex is associated with a modifiable, user-defined value
  • Each edge is associated with a source vertex, a value and a destination vertex
• During a super-step S:
  • A user-defined function F is executed at each vertex V
  • F can read messages sent to V in super-step S - 1 and send messages to other vertices that will be received in super-step S + 1
  • F can modify the state of V and its outgoing edges
  • F can alter the topology of the graph
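The entities and the "read from S - 1, deliver in S + 1" rule can be sketched with single-source shortest paths playing the role of F. This is an illustrative choice of algorithm, and the names `Edge`, `VertexState`, `Superstep` and `RunShortestPaths` are hypothetical, not part of Pregel's API; edge values serve as edge weights, and the source of each edge is the vertex that owns it.

```cpp
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

struct Edge {
  std::size_t dest;  // destination vertex
  double weight;     // edge value
};

struct VertexState {
  double dist = std::numeric_limits<double>::infinity();  // vertex value
  std::vector<Edge> out_edges;  // the owning vertex is the edge source
};

using Inboxes = std::vector<std::vector<double>>;

// One super-step: F runs at every vertex, reading messages from the previous
// super-step and returning the messages to be delivered in the next one.
Inboxes Superstep(std::vector<VertexState>& g, const Inboxes& inbox, int step,
                  std::size_t source) {
  Inboxes outbox(g.size());
  for (std::size_t v = 0; v < g.size(); ++v) {
    double best = (step == 0 && v == source)
                      ? 0.0
                      : std::numeric_limits<double>::infinity();
    for (double d : inbox[v]) if (d < best) best = d;
    if (best < g[v].dist) {
      g[v].dist = best;  // F modifies the state of V...
      for (const Edge& e : g[v].out_edges)  // ...and messages its neighbors
        outbox[e.dest].push_back(best + e.weight);
    }
  }
  return outbox;
}

// Runs super-steps until no messages are in flight; returns how many ran.
int RunShortestPaths(std::vector<VertexState>& g, std::size_t source) {
  Inboxes inbox(g.size());
  int step = 0;
  while (true) {
    Inboxes next = Superstep(g, inbox, step, source);
    ++step;
    bool any = false;
    for (const auto& box : next) any = any || !box.empty();
    if (!any) return step;
    inbox = std::move(next);
  }
}
```

For edges 0→1 (4), 0→2 (1), 2→1 (2) and 1→3 (1) with source 0, the distances settle at 0, 3, 1 and 4 after four super-steps; vertex 1 first receives the longer distance 4 and is corrected one super-step later.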

Topology Mutations
• The graph structure can be modified during any super-step
  • Vertices and edges can be added or deleted
• Mutating graphs can create conflicting requests, where multiple vertices in the same super-step try to alter the same edge/vertex
• Conflicts are resolved using partial orderings and handlers
• Partial orderings:
  • Removals are applied before additions
  • Edges are removed before vertices
  • Vertices are added before edges
  • Mutations performed in super-step S take effect only in super-step S + 1
  • All mutations are applied before the actual computations of that super-step
• Handlers:
  • By default, among multiple conflicting requests, one request is selected arbitrarily; users can supply a custom handler to resolve conflicts differently
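A sketch of how deferred mutations and the partial ordering could be applied, assuming a toy graph stored as a map from vertex ID to out-neighbor IDs. `Graph` and `MutationBuffer` are hypothetical; duplicate vertex additions simply collapse into one here, standing in for a conflict handler.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Graph = std::map<std::string, std::set<std::string>>;  // id -> out-neighbors

// Requests issued during super-step S are buffered here and applied together
// before super-step S + 1 begins.
struct MutationBuffer {
  std::vector<std::pair<std::string, std::string>> add_edges, remove_edges;
  std::vector<std::string> add_vertices, remove_vertices;

  void Apply(Graph& g) {
    // 1. Edge removals.
    for (const auto& [src, dst] : remove_edges) {
      auto it = g.find(src);
      if (it != g.end()) it->second.erase(dst);
    }
    // 2. Vertex removals (this drops the vertex's out-edges; in-edges stored
    //    at other vertices are left untouched in this sketch).
    for (const auto& v : remove_vertices) g.erase(v);
    // 3. Vertex additions; adding an existing vertex is a no-op here.
    for (const auto& v : add_vertices) g.try_emplace(v);
    // 4. Edge additions, which can now refer to vertices added in step 3.
    for (const auto& [src, dst] : add_edges) {
      auto it = g.find(src);
      if (it != g.end()) it->second.insert(dst);
    }
    add_edges.clear();
    remove_edges.clear();
    add_vertices.clear();
    remove_vertices.clear();
  }
};
```

Because vertices are added before edges, a super-step may request both a new vertex `c` and an edge out of `c`; both take effect together before the next super-step.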

Algorithm Termination
• Algorithm termination is based on every vertex voting to halt
  • In super-step 0, every vertex is active
  • All active vertices participate in the computation of any given super-step
  • A vertex deactivates itself by voting to halt and enters an inactive state
  • An inactive vertex returns to the active state if it receives a message
• A Pregel program terminates when all vertices are simultaneously inactive and there are no messages in transit
[Figure: vertex state machine. Active → Inactive on “Vote to Halt”; Inactive → Active on “Message Received”.]
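The termination rule can be illustrated with a token passed along a directed chain of `n` vertices. The chain, the token and `RunTokenChain` are hypothetical; the point is that halted vertices are skipped until a message reactivates them, and the loop stops only when no vertex is active and no message is pending.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

enum class State { kActive, kInactive };

// Chain 0 -> 1 -> ... -> n-1. Vertex 0 emits a token in super-step 0; a
// vertex holding the token forwards it, and every vertex votes to halt.
// Returns the number of super-steps executed before termination.
int RunTokenChain(std::size_t n) {
  std::vector<State> state(n, State::kActive);  // all active in super-step 0
  std::vector<int> inbox(n, 0), outbox(n, 0);   // pending tokens per vertex
  int step = 0;
  while (true) {
    // Terminate only if every vertex is inactive AND no message is in transit.
    bool done = true;
    for (std::size_t v = 0; v < n; ++v)
      if (state[v] == State::kActive || inbox[v] > 0) done = false;
    if (done) return step;

    for (std::size_t v = 0; v < n; ++v) {
      if (inbox[v] > 0) state[v] = State::kActive;  // message received
      if (state[v] == State::kInactive) continue;   // skipped this super-step
      const bool has_token = (step == 0 && v == 0) || inbox[v] > 0;
      if (has_token && v + 1 < n) outbox[v + 1] += 1;
      state[v] = State::kInactive;                  // vote to halt
    }
    inbox.swap(outbox);
    std::fill(outbox.begin(), outbox.end(), 0);
    ++step;
  }
}
```

`RunTokenChain(4)` executes four super-steps (0 to 3): in super-step 0 every vertex runs and votes to halt, and in each later super-step exactly one vertex is reactivated by the token, forwards it and halts again.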

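Finding the Max Value in a Graph
[Figure: four vertices with initial values 3, 6, 2 and 1, shown over super-steps S to S + 3. Blue arrows are messages; blue vertices have voted to halt. A vertex that receives a larger value adopts it and forwards it to its neighbors, while a vertex that learns nothing new votes to halt. The value 6 spreads until every vertex holds 6 and all vertices have halted.]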

The Programming Model
• Pregel adopts the message-passing programming model
  • Messages can be passed from any vertex to any other vertex in the graph
  • Any number of messages can be passed
  • Message order is not guaranteed
  • Messages will not be duplicated
• Combiners can be used to reduce the number of messages passed between super-steps (they are only safe for commutative and associative operations, e.g., sum or min)
• Aggregators are available for global reduction operations (e.g., sum, min, and max); values contributed in super-step S are visible to all vertices in super-step S + 1
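A sketch of both mechanisms under stated assumptions: `CombineMin` merges messages bound for the same destination using min, and `MaxAggregator` reduces contributions with max. Both names are hypothetical. The combiner relies on min being commutative and associative, since Pregel does not guarantee which messages (if any) are combined or in what order.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <map>
#include <utility>
#include <vector>

// Combiner: collapses all (destination, value) messages for one destination
// into a single message before they are sent.
std::map<std::size_t, double> CombineMin(
    const std::vector<std::pair<std::size_t, double>>& outgoing) {
  std::map<std::size_t, double> combined;
  for (const auto& [dest, value] : outgoing) {
    auto [it, inserted] = combined.try_emplace(dest, value);
    if (!inserted) it->second = std::min(it->second, value);
  }
  return combined;
}

// Aggregator: vertices contribute during super-step S; the reduced value is
// published at the barrier and read by every vertex in super-step S + 1.
class MaxAggregator {
 public:
  void Contribute(double v) { pending_ = std::max(pending_, v); }
  void EndSuperstep() {
    visible_ = pending_;
    pending_ = kNone;
  }
  double Value() const { return visible_; }  // result of the previous super-step

 private:
  static constexpr double kNone = -std::numeric_limits<double>::infinity();
  double pending_ = kNone;
  double visible_ = kNone;
};
```

For example, four messages destined for vertices 1, 2, 1 and 1 shrink to two messages, one per destination, which is exactly the saving in network traffic that combiners target.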

The Pregel API in C++
• A Pregel program is written by sub-classing the Vertex class:

    // The template parameters define the types of vertex values, edge values and messages
    template <typename VertexValue, typename EdgeValue, typename MessageValue>
    class Vertex {
     public:
      // Override Compute() to define the computation at each super-step
      virtual void Compute(MessageIterator* msgs) = 0;

      const string& vertex_id() const;
      int64 superstep() const;

      const VertexValue& GetValue();   // get the value of the current vertex
      VertexValue* MutableValue();     // modify the value of the vertex
      OutEdgeIterator GetOutEdgeIterator();

      // Pass messages to other vertices
      void SendMessageTo(const string& dest_vertex, const MessageValue& message);

      void VoteToHalt();
    };
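To see how a subclass interacts with this API, here is a hypothetical single-process stand-in. Only the method names (`Compute`, `vertex_id`, `superstep`, `GetValue`, `MutableValue`, `SendMessageTo`, `VoteToHalt`) come from the slide. Everything else is an assumption made for illustration: `MessageIterator` is templated here, the edge-value parameter and `GetOutEdgeIterator` are omitted, and `RunMiniEngine` and `InDegreeVertex` are invented, not Google's implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

template <typename M>
class MessageIterator {
 public:
  explicit MessageIterator(const std::vector<M>& msgs) : msgs_(msgs) {}
  bool Done() const { return pos_ >= msgs_.size(); }
  void Next() { ++pos_; }
  const M& Value() const { return msgs_[pos_]; }

 private:
  const std::vector<M>& msgs_;
  std::size_t pos_ = 0;
};

template <typename VertexValue, typename MessageValue>
class Vertex {
 public:
  virtual ~Vertex() = default;
  virtual void Compute(MessageIterator<MessageValue>* msgs) = 0;
  const std::string& vertex_id() const { return id_; }
  std::int64_t superstep() const { return *superstep_; }
  const VertexValue& GetValue() const { return value_; }
  VertexValue* MutableValue() { return &value_; }
  void SendMessageTo(const std::string& dest, const MessageValue& msg) {
    (*outbox_)[dest].push_back(msg);
  }
  void VoteToHalt() { active_ = false; }

  // Wiring used by RunMiniEngine (public only to keep the sketch short).
  std::string id_;
  VertexValue value_{};
  std::vector<std::string> out_edges_;  // destination IDs of out-edges
  bool active_ = true;
  const std::int64_t* superstep_ = nullptr;
  std::map<std::string, std::vector<MessageValue>>* outbox_ = nullptr;
};

// Runs super-steps until all vertices have halted and no messages are
// pending; returns the number of super-steps executed.
template <typename V, typename M>
std::int64_t RunMiniEngine(std::vector<Vertex<V, M>*>& vertices) {
  std::int64_t step = 0;
  std::map<std::string, std::vector<M>> inbox, outbox;
  for (auto* v : vertices) { v->superstep_ = &step; v->outbox_ = &outbox; }
  while (true) {
    bool work = !inbox.empty();
    for (auto* v : vertices) work = work || v->active_;
    if (!work) return step;
    const std::vector<M> none{};
    for (auto* v : vertices) {
      auto it = inbox.find(v->id_);
      if (it != inbox.end()) v->active_ = true;  // a message reactivates v
      if (!v->active_) continue;
      MessageIterator<M> msgs(it != inbox.end() ? it->second : none);
      v->Compute(&msgs);
    }
    inbox = std::move(outbox);  // barrier: deliver messages for next step
    outbox.clear();
    ++step;
  }
}

// Example vertex program: each vertex learns its in-degree.
class InDegreeVertex : public Vertex<int, int> {
 public:
  void Compute(MessageIterator<int>* msgs) override {
    if (superstep() == 0) {
      for (const std::string& dst : out_edges_) SendMessageTo(dst, 1);
    } else {
      for (; !msgs->Done(); msgs->Next()) *MutableValue() += msgs->Value();
    }
    VoteToHalt();
  }
};
```

For edges a→b, a→c and b→c, `RunMiniEngine` runs two super-steps and leaves in-degrees 0, 1 and 2 in `a`, `b` and `c`. The pattern is typical of vertex programs: send in one super-step, read in the next, and vote to halt so that only incoming messages wake the vertex up.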

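Pregel Code for Finding the Max Value

    class MaxFindVertex : public Vertex<double, void, double> {
     public:
      virtual void Compute(MessageIterator* msgs) {
        // Take the largest of the current value and all incoming messages
        double currMax = GetValue();
        for ( ; !msgs->Done(); msgs->Next()) {
          if (msgs->Value() > currMax) currMax = msgs->Value();
        }
        // Broadcast in super-step 0, and afterwards only when a larger value
        // was learned; staying silent otherwise lets the computation terminate
        if (superstep() == 0 || currMax > GetValue()) {
          *MutableValue() = currMax;
          SendMessageToAllNeighbors(currMax);
        }
        // Halt; any incoming message will reactivate this vertex
        VoteToHalt();
      }
    };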
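The max-value algorithm can be checked without a Pregel installation by simulating it sequentially. `PropagateMax` and `MaxResult` are hypothetical helpers: `adj[v]` lists v's out-neighbors, super-steps are modeled by swapping message buffers, and voting to halt becomes clearing an active flag. A vertex broadcasts in super-step 0 and afterwards only when it learns a larger value; if it broadcast unconditionally, every active vertex would keep reactivating its neighbors and messages would never stop flowing.

```cpp
#include <cstddef>
#include <vector>

struct MaxResult {
  std::vector<double> values;  // final vertex values
  int supersteps;              // number of super-steps executed
};

MaxResult PropagateMax(const std::vector<std::vector<std::size_t>>& adj,
                       std::vector<double> values) {
  const std::size_t n = values.size();
  std::vector<std::vector<double>> inbox(n), outbox(n);
  std::vector<bool> active(n, true);  // every vertex is active in super-step 0
  int step = 0;
  while (true) {
    bool work = false;
    for (std::size_t v = 0; v < n; ++v)
      work = work || active[v] || !inbox[v].empty();
    if (!work) break;  // all halted and no messages in transit
    for (std::size_t v = 0; v < n; ++v) {
      if (!inbox[v].empty()) active[v] = true;  // messages reactivate
      if (!active[v]) continue;
      double curr = values[v];
      for (double m : inbox[v]) if (m > curr) curr = m;
      if (step == 0 || curr > values[v]) {
        values[v] = curr;
        for (std::size_t w : adj[v]) outbox[w].push_back(curr);
      }
      active[v] = false;  // vote to halt
    }
    inbox.swap(outbox);
    for (auto& box : outbox) box.clear();
    ++step;
  }
  return {values, step};
}
```

On a chain 0-1-2-3 with edges in both directions and initial values 3, 6, 2 and 1, every vertex ends with 6 after four super-steps.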

Input, Graph Flow and Output
• The input graph in Pregel is stored in a distributed storage layer (e.g., GFS or Bigtable)
• The input graph is divided into partitions consisting of vertices and their outgoing edges
  • The default partitioning function is hash(ID) mod N, where N is the number of partitions
• Partitions are kept in the workers’ memory for the duration of the computation (hence, an in-memory model and not a disk-based one)
• Outputs in Pregel are typically graphs isomorphic to the input graphs (or mutated versions of them)
  • Outputs can also be aggregated statistics mined from the input graphs (depending on the graph algorithm)
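A minimal sketch of hash(ID) mod N partitioning. `VertexRecord`, `PartitionOf` and `PartitionGraph` are hypothetical names, and `std::hash` stands in for Pregel's actual hash function, which the slide does not specify (`std::hash` values are implementation-defined, so partition assignments may differ between standard libraries).

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// A vertex together with its outgoing edges: the unit that is partitioned.
struct VertexRecord {
  std::string id;
  std::vector<std::string> out_edges;  // destination vertex IDs
};

// partition = hash(ID) mod N
std::size_t PartitionOf(const std::string& id, std::size_t num_partitions) {
  return std::hash<std::string>{}(id) % num_partitions;
}

// Splits the input into N partitions; the master would then assign one or
// more partitions to each worker, which keeps them in memory.
std::vector<std::vector<VertexRecord>> PartitionGraph(
    const std::vector<VertexRecord>& input, std::size_t num_partitions) {
  std::vector<std::vector<VertexRecord>> parts(num_partitions);
  for (const VertexRecord& v : input)
    parts[PartitionOf(v.id, num_partitions)].push_back(v);
  return parts;
}
```

Because the function depends only on the vertex ID, any worker can compute which partition (and hence which worker) owns a message's destination without consulting the master.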

The Architectural Model
• Pregel assumes a tree-style network topology and a master-slave architecture
[Figure: a core switch connects two rack switches; the racks host the master and Worker1 to Worker5. The master pushes work (i.e., partitions) to all workers, and each worker sends a completion signal back to the master.]
• When the master receives the completion signal from every worker in super-step S, it starts super-step S + 1
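A hypothetical sketch of the master's side of this barrier: it counts completion signals for the current super-step and advances only once every worker has reported. `SuperstepBarrier` is invented for illustration; in a real deployment the signals would arrive as network messages, possibly late or repeated, which is why stale and duplicate signals are ignored here.

```cpp
#include <cstddef>
#include <set>

class SuperstepBarrier {
 public:
  explicit SuperstepBarrier(std::size_t num_workers)
      : num_workers_(num_workers) {}

  // Records a completion signal. Returns true if this signal completed the
  // barrier, i.e. the master may now start the next super-step.
  bool OnWorkerDone(std::size_t worker_id, long superstep) {
    if (superstep != current_) return false;  // stale or early signal
    done_.insert(worker_id);                  // duplicates are ignored
    if (done_.size() < num_workers_) return false;
    done_.clear();
    ++current_;
    return true;
  }

  long current() const { return current_; }

 private:
  std::size_t num_workers_;
  long current_ = 0;
  std::set<std::size_t> done_;  // workers that finished the current super-step
};
```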

The Execution Flow
• Steps of program execution in Pregel:
  • Copies of the program code are distributed across all machines
    • One copy is designated as the master, and every other copy is deemed a worker/slave
  • The master partitions the graph and assigns each worker one or more partitions, along with portions of the input graph data
  • Every worker executes the user-defined function on each vertex
  • Workers can communicate with each other

The Execution Flow (Cont’d)
• Steps of program execution in Pregel (cont’d):
  • The master coordinates the execution of super-steps
  • After each super-step, the master calculates the number of inactive vertices and signals the workers to terminate if all vertices are inactive (and no messages are in transit)
  • Once the computation halts, the master may instruct each worker to save its portion of the graph
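The master's termination check can be sketched as a reduction over per-worker reports. `WorkerReport`, `MasterDecision` and `Decide` are hypothetical; the sketch assumes each worker reports how many of its vertices are still active and how many messages it sent for the next super-step.

```cpp
#include <cstddef>
#include <vector>

struct WorkerReport {
  std::size_t active_vertices;  // vertices that did not vote to halt
  std::size_t messages_sent;    // messages to be delivered next super-step
};

enum class MasterDecision { kRunNextSuperstep, kTerminate };

// Terminate only when no vertex is active and no message is in transit;
// pending messages would reactivate their (halted) destination vertices.
MasterDecision Decide(const std::vector<WorkerReport>& reports) {
  std::size_t active = 0, in_transit = 0;
  for (const WorkerReport& r : reports) {
    active += r.active_vertices;
    in_transit += r.messages_sent;
  }
  return (active == 0 && in_transit == 0) ? MasterDecision::kTerminate
                                          : MasterDecision::kRunNextSuperstep;
}
```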

Fault Tolerance in Pregel
• Fault-tolerance is achieved through checkpointing
  • At the start of a super-step, the master may instruct the workers to save the state of their partitions (vertex values, edge values and incoming messages) to stable storage
• The master uses “ping” messages to detect worker failures
• If a worker fails, the master reassigns its vertices and input graph data to the available workers, and the computation restarts from the super-step of the most recent checkpoint
  • The workers reload their partition states from the most recent available checkpoint, so any super-steps completed after that checkpoint are repeated
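A minimal sketch of checkpoint-and-rollback, assuming a hypothetical `CheckpointStore` held by the master and a partition state reduced to a vector of vertex values (a real checkpoint would also hold edge values and incoming messages, and would live in stable storage rather than memory).

```cpp
#include <vector>

using PartitionState = std::vector<double>;  // simplified: vertex values only

class CheckpointStore {
 public:
  // Called at the start of super-step `step` when the master decides to
  // checkpoint; how often to do so is a policy choice.
  void Save(long step, const std::vector<PartitionState>& partitions) {
    step_ = step;
    saved_ = partitions;
  }

  // On failure, every partition (not only the failed worker's) is rolled
  // back; the returned super-step is where execution resumes.
  long Restore(std::vector<PartitionState>& partitions) const {
    partitions = saved_;
    return step_;
  }

 private:
  long step_ = 0;
  std::vector<PartitionState> saved_;
};
```

Rolling back all partitions keeps the graph state consistent, at the cost of repeating every super-step executed since the checkpoint.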

How Does Pregel Compare to MapReduce?

Pregel versus MapReduce

Next Class: GraphLab

Back-up Slides

PageRank
• PageRank is a link analysis algorithm
• The rank value indicates the importance of a particular web page
• A hyperlink to a page counts as a vote of support
• A page that is linked to by many pages with high PageRank receives a high rank itself
• When ranks are normalized to form a probability distribution, a PageRank of 0.5 means there is a 50% chance that a person clicking on random links will arrive at that page

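PageRank (Cont’d)
• Iterate:
$$R[i] = \alpha + (1 - \alpha) \sum_{j \text{ links to } i} \frac{R[j]}{L[j]}$$
• Where:
  • R[i] is the rank of page i
  • α is the random reset probability
  • L[j] is the number of links on page j
[Figure: example web graph with six pages, numbered 1 to 6]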
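A minimal sketch that iterates R[i] = α + (1 - α) Σ_{j links to i} R[j] / L[j] for a fixed number of rounds, assuming the graph is given as out-link lists (`links[j]` holds the pages that page j links to). `PageRank` is a hypothetical helper, and pages without out-links are simply skipped, a simplification the slide does not address. In a Pregel version, each vertex would send R[j] / L[j] along its out-edges every super-step and sum its incoming messages.

```cpp
#include <cstddef>
#include <vector>

std::vector<double> PageRank(const std::vector<std::vector<std::size_t>>& links,
                             double alpha, int iterations) {
  const std::size_t n = links.size();
  std::vector<double> rank(n, 1.0);  // initial rank of every page
  for (int it = 0; it < iterations; ++it) {
    std::vector<double> next(n, alpha);  // random-reset term
    for (std::size_t j = 0; j < n; ++j) {
      if (links[j].empty()) continue;  // dangling page: rank not spread here
      const double share = (1.0 - alpha) * rank[j] / links[j].size();
      for (std::size_t i : links[j]) next[i] += share;  // j links to i
    }
    rank.swap(next);
  }
  return rank;
}
```

With this non-normalized form, ranks sum to the number of pages (when no page is dangling) rather than to 1; on a simple cycle every page converges to rank 1.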
