
Spring 2015 Implementing Parallel Graph Algorithms Lecture 2: Introduction

This lecture introduces parallel graph algorithms. It covers the operator formulation of graph algorithms (operator, schedule, and delta), implementation considerations for sequential graph programs, and optimistic parallelization in the Galois system. Describing an algorithm in these terms enables multiple implementations suited to different inputs and architectures.





Presentation Transcript


  1. Spring 2015: Implementing Parallel Graph Algorithms. Lecture 2: Introduction. Roman Manevich, Ben-Gurion University

  2. Graph Algorithms are Ubiquitous
     • Computer graphics
     • Computational biology
     • Social networks

  3. Agenda
     • Operator formulation of graph algorithms
     • Implementation considerations for sequential graph programs
     • Optimistic parallelization of graph algorithms
     • Introduction to the Galois system

  4. Operator formulation of graph algorithms

  5. Main Idea
     • Define a high-level abstraction of graph algorithms in terms of:
         • Operator
         • Schedule
         • Delta
     • Given a new algorithm, describe it as a composition of these elements
     • This enables many implementations
         • Find one suitable for the typical input and architecture

  6. Example: Single-Source Shortest-Path
     [Figure: weighted graph with source S and nodes A to G, illustrating the relaxation of edge (A,C)]
     • Problem formulation
         • Compute the shortest distance from source node S to every other node
     • Many algorithms
         • Bellman-Ford (1957)
         • Dijkstra (1959)
         • Chaotic relaxation (Miranker 1969)
         • Delta-stepping (Meyer et al. 1998)
     • Common structure
         • Each node has a label dist with the known shortest distance from S
     • Key operation
         • relax-edge(u,v), shown here for edge (A,C) with weight W_AC:
               if dist(A) + W_AC < dist(C)
                 dist(C) = dist(A) + W_AC
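
The relax-edge operation can be sketched in Python as follows. The dictionary-based graph representation, the function name `relax_edge`, and the two-edge example are assumptions made for illustration; they are not taken from the lecture.

```python
import math

def relax_edge(dist, weight, u, v):
    """Apply relax-edge(u, v); return True if dist[v] was lowered."""
    candidate = dist[u] + weight[(u, v)]
    if candidate < dist[v]:
        dist[v] = candidate
        return True
    return False

# Hypothetical example: S -> A with weight 2, A -> C with weight 1.
weight = {("S", "A"): 2, ("A", "C"): 1}
dist = {"S": 0, "A": math.inf, "C": math.inf}

changed_sa = relax_edge(dist, weight, "S", "A")     # lowers dist[A] to 2
changed_ac = relax_edge(dist, weight, "A", "C")     # lowers dist[C] to 3
changed_again = relax_edge(dist, weight, "A", "C")  # guard fails, no change
```

The return value signals whether the label changed, which is the information a scheduler needs to decide whether more work has become available.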

  7. Dijkstra's Algorithm
     [Figure: the example graph with dist labels on the nodes and priority-queue entries <B,5>, <C,3>, <B,5>, <E,6>, <B,5>, <D,7>]
     Scheduling of relaxations:
     • Use a priority queue of nodes, ordered by label dist
     • Iterate over nodes u in priority order
     • On each step: relax all neighbors v of u
         • Apply relax-edge to all (u,v)

  8. Chaotic Relaxation
     [Figure: the example graph with a worklist containing edges (C,D), (B,C), (S,A), (C,E)]
     Scheduling of relaxations:
     • Use an unordered set of edges
     • Iterate over edges (u,v) in any order
     • On each step:
         • Apply relax-edge to edge (u,v)

  9. Algorithms as Scheduled Operators
     • Graph Algorithm = Operator(s) + Schedule
     Dijkstra-style:
         Q = PQueue[Node]
         Q.enqueue(S)
         while Q ≠ ∅ {
           u = Q.pop
           foreach (u,v,w) {
             if d(u) + w < d(v)
               d(v) := d(u) + w
               Q.enqueue(v)
           }
         }
     Chaotic-Relaxation (w denotes the weight of edge (u,v)):
         W = Set[Edge]
         W ∪= (S,y) : y ∈ Nbrs(S)
         while W ≠ ∅ {
           (u,v) = W.get
           if d(u) + w < d(v)
             d(v) := d(u) + w
             foreach y ∈ Nbrs(v)
               W.add(v,y)
         }
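
As a rough illustration of "Graph Algorithm = Operator(s) + Schedule", the sketch below runs one shared relaxation operator under a priority-queue schedule and under an unordered edge worklist. The adjacency-list format, the function names, and the small example graph are assumptions for illustration only.

```python
import heapq
import math

def relax(dist, u, v, w):
    """Shared operator: lower dist[v] through edge (u, v, w) if possible."""
    if dist[u] + w < dist[v]:
        dist[v] = dist[u] + w
        return True
    return False

def sssp_dijkstra_style(graph, source):
    """Schedule 1: process nodes in order of their current dist label."""
    dist = {n: math.inf for n in graph}
    dist[source] = 0
    queue = [(0, source)]
    while queue:
        d, u = heapq.heappop(queue)
        if d > dist[u]:
            continue  # stale queue entry, a better label was found already
        for v, w in graph[u]:
            if relax(dist, u, v, w):
                heapq.heappush(queue, (dist[v], v))
    return dist

def sssp_chaotic(graph, source):
    """Schedule 2: process edges from an unordered worklist."""
    dist = {n: math.inf for n in graph}
    dist[source] = 0
    worklist = {(source, v, w) for v, w in graph[source]}
    while worklist:
        u, v, w = worklist.pop()  # arbitrary order
        if relax(dist, u, v, w):
            for y, wy in graph[v]:
                worklist.add((v, y, wy))
    return dist

# Hypothetical graph: node -> list of (neighbor, non-negative weight).
graph = {
    "S": [("A", 5), ("B", 2)],
    "A": [("C", 1)],
    "B": [("A", 1), ("C", 7)],
    "C": [],
}
by_priority = sssp_dijkstra_style(graph, "S")
by_worklist = sssp_chaotic(graph, "S")
```

Both schedules reach the same fixpoint; they differ only in how many relaxations they perform along the way and in how much ordering they impose.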

  10. Deconstructing Schedules
      • Graph Algorithm = Operators (what should be done) + Schedule (how it should be done)
      • Operators
          • Unordered/ordered algorithms ("TAO of parallelism", PLDI'11)
      • Schedule
          • Order activity processing
              • Static schedule: code structure (loops)
              • Dynamic schedule: priority in work queue
          • Identify new activities
              • Operator delta
      [Figure legend: marker denoting an activity]

  11. Example
      • Graph Algorithm = Operators + Schedule
      • Schedule: order activity processing (static, dynamic) and identify new activities
      Dijkstra-style:
          Q = PQueue[Node]
          Q.enqueue(S)
          while Q ≠ ∅ {
            u = Q.pop
            foreach (u,v,w) {
              if d(u) + w < d(v)
                d(v) := d(u) + w
                Q.enqueue(v)
            }
          }
      Chaotic-Relaxation (w denotes the weight of edge (u,v)):
          W = Set[Edge]
          W ∪= (S,y) : y ∈ Nbrs(S)
          while W ≠ ∅ {
            (u,v) = W.get
            if d(u) + w < d(v)
              d(v) := d(u) + w
              foreach y ∈ Nbrs(v)
                W.add(v,y)
          }

  12. SSSP in Elixir
      Graph type:
          Graph [ nodes(node : Node, dist : int)
                  edges(src : Node, dst : Node, wt : int) ]
      Operator:
          relax = [ nodes(node a, dist ad)
                    nodes(node b, dist bd)
                    edges(src a, dst b, wt w)
                    bd > ad + w ] ➔
                  [ bd = ad + w ]
      Fixpoint statement:
          sssp = iterate relax ≫ schedule

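  13. Operators
          Graph [ nodes(node : Node, dist : int)
                  edges(src : Node, dst : Node, wt : int) ]
          relax = [ nodes(node a, dist ad)
                    nodes(node b, dist bd)
                    edges(src a, dst b, wt w)
                    bd > ad + w ] ➔
                  [ bd = ad + w ]
          sssp = iterate relax ≫ schedule
      • Redex pattern: nodes(node a, dist ad), nodes(node b, dist bd), edges(src a, dst b, wt w)
      • Guard: bd > ad + w
      • Update: bd = ad + w
      [Figure: edge a → b with weight w; the labels (ad, bd) are rewritten to (ad, ad + w) if bd > ad + w]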

  14. Fixpoint Statement
          Graph [ nodes(node : Node, dist : int)
                  edges(src : Node, dst : Node, wt : int) ]
          relax = [ nodes(node a, dist ad)
                    nodes(node b, dist bd)
                    edges(src a, dst b, wt w)
                    bd > ad + w ] ➔
                  [ bd = ad + w ]
          sssp = iterate relax ≫ schedule
      • iterate relax: apply the operator until fixpoint
      • schedule: scheduling expression

  15. Scheduling Examples
          Graph [ nodes(node : Node, dist : int)
                  edges(src : Node, dst : Node, wt : int) ]
          relax = [ nodes(node a, dist ad)
                    nodes(node b, dist bd)
                    edges(src a, dst b, wt w)
                    bd > ad + w ] ➔
                  [ bd = ad + w ]
          sssp = iterate relax ≫ schedule
      • Dijkstra-style: metric ad ≫ group b
      • Locality-enhanced label-correcting: group b ≫ unroll 2 ≫ approx metric ad
          q = new PrQueue
          q.enqueue(SRC)
          while (! q.empty) {
            a = q.dequeue
            for each e = (a,b,w) {
              if dist(a) + w < dist(b) {
                dist(b) = dist(a) + w
                q.enqueue(b)
              }
            }
          }

  16. Implementation considerations for sequential graph programs

  17. Operator Delta Inference
      [Diagram: Parallel Graph Algorithm = Operators + Schedule; Schedule = order activity processing (static schedule, dynamic schedule) + identify new activities]

  18. Finding the Operator delta

  19. Problem Statement
      • Many graph programs have the form:
            until no change do { apply operator }
      • Naïve implementation: keep looking for places where the operator can be applied to make a change
          • Problem: too slow
      • Incremental implementation: after applying an operator, find the smallest set of future active elements and schedule them (add them to the worklist)
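
The difference between the two strategies can be sketched by counting guard evaluations for SSSP relaxation. This is a minimal illustration under assumptions: the function names and the chain-shaped input are hypothetical, and the incremental version schedules all outgoing edges of the updated node, which is a safe superset of the smallest delta rather than the smallest set itself.

```python
import math

def naive_fixpoint(edges, dist):
    """Rescan every edge until a full pass makes no change."""
    checks = 0
    changed = True
    while changed:
        changed = False
        for u, v, w in edges:
            checks += 1
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
    return checks

def incremental_fixpoint(edges, dist, source):
    """After updating dist[v], schedule only the outgoing edges of v."""
    out = {}
    for u, v, w in edges:
        out.setdefault(u, []).append((u, v, w))
    worklist = list(out.get(source, []))
    checks = 0
    while worklist:
        u, v, w = worklist.pop()
        checks += 1
        if dist[u] + w < dist[v]:
            dist[v] = dist[u] + w
            worklist.extend(out.get(v, []))  # the delta of this update
    return checks

# Hypothetical input: a chain 0 -> 1 -> ... -> 5, with edges listed in
# reverse order so that each naive pass makes progress on only one edge.
edges = [(i, i + 1, 1) for i in reversed(range(5))]

def fresh_labels():
    return [0] + [math.inf] * 5

naive_dist = fresh_labels()
naive_checks = naive_fixpoint(edges, naive_dist)

incremental_dist = fresh_labels()
incremental_checks = incremental_fixpoint(edges, incremental_dist, 0)
```

On this input the naive loop needs six passes over five edges, while the worklist version evaluates the guard once per edge.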

  20. Identifying the Delta of an Operator
      [Figure: edge (a,b) with operator relax1; neighboring edges are marked "?"]

  21. Delta Inference Example
      [Figure: relax1 on edge (a,b) with weight w1; relax2 on edge (c,b) with weight w2]
      Query program sent to the SMT solver:
          assume (da + w1 < db)
          assume ¬(dc + w2 < db)
          db_post = da + w1
          assert ¬(dc + w2 < db_post)
      Result: (c,b) does not become active

  22. Delta Inference Example – Active
      [Figure: relax1 on edge (a,b) with weight w1; relax2 on edge (b,c) with weight w2]
      Query program sent to the SMT solver:
          assume (da + w1 < db)
          assume ¬(db + w2 < dc)
          db_post = da + w1
          assert ¬(db_post + w2 < dc)
      Result: apply relax on all outgoing edges (b,c) such that dc > db + w2 and c ≠ a
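
The two query programs can be explored without an SMT solver by exhaustively enumerating a small range of values. This swaps the solver for brute-force search, so it only checks a bounded domain and is not a proof over all integers; the domain `VALUES` and the function names are assumptions for illustration.

```python
from itertools import product

VALUES = range(0, 6)  # small hypothetical domain for distances and weights

def incoming_edge_counterexample():
    """Slide 21 query: can edge (c,b) become active after relax1 on (a,b)?"""
    for da, db, dc, w1, w2 in product(VALUES, repeat=5):
        if da + w1 < db and not (dc + w2 < db):  # the two assumptions
            db_post = da + w1                    # effect of relax1
            if dc + w2 < db_post:                # assertion violated
                return (da, db, dc, w1, w2)
    return None

def outgoing_edge_counterexample():
    """Slide 22 query: can edge (b,c) become active after relax1 on (a,b)?"""
    for da, db, dc, w1, w2 in product(VALUES, repeat=5):
        if da + w1 < db and not (db + w2 < dc):
            db_post = da + w1
            if db_post + w2 < dc:
                return (da, db, dc, w1, w2)
    return None

incoming = incoming_edge_counterexample()
outgoing = outgoing_edge_counterexample()
```

No violation exists for the incoming edge, since lowering db can only make `dc + w2 < db` harder to satisfy. For the outgoing edge the search finds values that violate the assertion, which is why outgoing edges of b belong to the delta.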

  23. Influence Patterns
      [Figure: patterns of shared endpoints between edges (a,b) and (c,d), labeled b=c; a=d; a=c; a=c and b=d; b=d; a=d and b=c]

  24. Implementing the operator

  25. Example: Triangle Counting
      • How many triangles exist in a graph
          • Or for each node
      • Useful for estimating the community structure of a network

  26. Triangles Pseudo-code
          …
          for a : nodes do
            for b : nodes do
              for c : nodes do
                …
                if edges(a,b)
                  if edges(b,c)
                    if edges(c,a)
                      if a < b
                        if b < c
                          if a < c
                            triangles++
                          fi
          …

  27. Example: Triangles
      • Iterators: for a : nodes, for b : nodes, for c : nodes
      • Graph conditions: edges(a,b), edges(b,c), edges(c,a)
      • Scalar conditions: a < b, b < c, a < c
          for a : nodes do
            for b : nodes do
              for c : nodes do
                if edges(a,b)
                  if edges(b,c)
                    if edges(c,a)
                      if a < b
                        if b < c
                          if a < c
                            triangles++
                          fi
          …

  28. Triangles: Reordering
      • Iterators
      • Graph conditions
      • Scalar conditions
          for a : nodes do
            for b : nodes do
              if edges(a,b)
                if a < b
                  for c : nodes do
                    if edges(b,c)
                      if a < c
                        if b < c
                          if edges(c,a)
                            triangles++
                          fi
          …

  29. Triangles: Implementation Selection
      • Iterators
      • Graph conditions
      • Scalar conditions
      Tile:
          for x : nodes do
            if edges(x,y)
          ⇩
          for x : Succ(y) do
      Reordered code:
          for a : nodes do
            for b : nodes do
              if edges(a,b)
                if a < b
                  for c : nodes do
                    if edges(b,c)
                      if a < c
                        if b < c
                          if edges(c,a)
                            triangles++
                          fi
          …
      Reordering + Implementation Selection:
          for a : nodes do
            for b : Succ(a) do
              if a < b
                for c : Succ(b) do
                  if a < c
                    if b < c
                      if edges(c,a)
                        triangles++
                      fi
          …
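
The effect of implementation selection can be sketched in Python by comparing the triple loop over all nodes with a version that iterates over successor sets. The graph representation (a symmetric edge set and a `succ` dictionary) and the example graph are assumptions for illustration.

```python
def count_triangles_naive(nodes, edges):
    """Triple loop over all nodes, with graph and scalar conditions inside."""
    count = 0
    for a in nodes:
        for b in nodes:
            for c in nodes:
                if (a, b) in edges and (b, c) in edges and (c, a) in edges:
                    if a < b and b < c and a < c:
                        count += 1
    return count

def count_triangles_succ(nodes, succ):
    """Iterate over successors instead of all nodes, testing a < b early."""
    count = 0
    for a in nodes:
        for b in succ[a]:
            if a < b:
                for c in succ[b]:
                    if a < c and b < c and a in succ[c]:  # edges(c,a)
                        count += 1
    return count

# Hypothetical undirected graph with two triangles: (0,1,2) and (2,3,4).
nodes = range(5)
undirected = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4)]
edges = set(undirected) | {(v, u) for u, v in undirected}
succ = {n: set() for n in nodes}
for u, v in edges:
    succ[u].add(v)

naive_count = count_triangles_naive(nodes, edges)
succ_count = count_triangles_succ(nodes, succ)
```

Both versions count each triangle once because of the ordering constraints a < b < c; the successor-based version simply visits far fewer (a, b, c) combinations on sparse graphs.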

  30. Optimistic parallelization of graph programs

  31. Parallelism is Everywhere
      • Texas Advanced Computing Center
      • Laptops
      • Cell phones

  32. Example: Boruvka's algorithm for MST

  33. Minimum Spanning Tree Problem
      [Figure: weighted graph on nodes a, b, c, d, e, f, g]

  34. Minimum Spanning Tree Problem
      [Figure: weighted graph on nodes a, b, c, d, e, f, g]

  35. Boruvka's Minimum Spanning Tree Algorithm
      [Figure: the example graph before and after node a is merged with its lightest neighbor lt into the single node "a,c"]
      • Build MST bottom-up
            repeat {
              pick arbitrary node 'a'
              merge with lightest neighbor 'lt'
              add edge 'a-lt' to MST
            } until graph is a single node
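
A sequential version of this loop can be sketched in Python with edge contraction on a dictionary-of-dictionaries graph. The representation, the function name, and the four-node example are assumptions for illustration; the sketch returns only the total MST weight and assumes a connected undirected graph without self-loops.

```python
def boruvka_mst_weight(adj):
    """adj: {node: {neighbor: weight}}, symmetric and connected."""
    g = {u: dict(nbrs) for u, nbrs in adj.items()}  # work on a copy
    total = 0
    while len(g) > 1:
        a = next(iter(g))               # pick an arbitrary node
        lt = min(g[a], key=g[a].get)    # lightest neighbor of a
        total += g[a][lt]               # add edge a-lt to the MST
        del g[a][lt]
        # Merge lt into a, keeping the lighter of any parallel edges.
        for n, w in g[lt].items():
            if n == a:
                continue
            g[a][n] = min(w, g[a].get(n, w))
            g[n][a] = g[a][n]
            del g[n][lt]
        del g[lt]
    return total

# Hypothetical graph; its MST uses edges c-d (1), a-c (2), a-b (3).
adj = {
    "a": {"b": 3, "c": 2},
    "b": {"a": 3, "c": 4, "d": 5},
    "c": {"a": 2, "b": 4, "d": 1},
    "d": {"b": 5, "c": 1},
}
mst_weight = boruvka_mst_weight(adj)
```

Each iteration reads and writes a, lt, and the neighbors of lt, which is the neighborhood that matters when iterations are later run in parallel.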

  36. Parallelism in Boruvka
      [Figure: weighted graph on nodes a, b, c, d, e, f, g]
      • Build MST bottom-up
            repeat {
              pick arbitrary node 'a'
              merge with lightest neighbor 'lt'
              add edge 'a-lt' to MST
            } until graph is a single node

  37. Non-conflicting Iterations
      [Figure: weighted graph on nodes a, b, c, d, e, f, g]
      • Build MST bottom-up
            repeat {
              pick arbitrary node 'a'
              merge with lightest neighbor 'lt'
              add edge 'a-lt' to MST
            } until graph is a single node

  38. Non-conflicting Iterations
      [Figure: the graph after two merges, with merged nodes "a,c" and "f,g"]
      • Build MST bottom-up
            repeat {
              pick arbitrary node 'a'
              merge with lightest neighbor 'lt'
              add edge 'a-lt' to MST
            } until graph is a single node

  39. Conflicting Iterations
      [Figure: weighted graph on nodes a, b, c, d, e, f, g]
      • Build MST bottom-up
            repeat {
              pick arbitrary node 'a'
              merge with lightest neighbor 'lt'
              add edge 'a-lt' to MST
            } until graph is a single node

  40. Optimistic parallelization of graph algorithms

  41. How to parallelize graph algorithms
      • "The TAO of Parallelism in Graph Algorithms", PLDI 2011
      • Optimistic parallelization
      • Implemented by the Galois system

  42. Operator Formulation of Algorithms
      • Active element
          • Site where computation is needed
      • Operator
          • Computation at an active element
          • Activity: application of the operator to an active element
      • Neighborhood
          • Set of nodes/edges read/written by an activity
          • Usually distinct from the neighbors in the graph
      • Ordering: scheduling constraints on the execution order of activities
          • Unordered algorithms: no semantic constraints, but performance may depend on the schedule
          • Ordered algorithms: problem-dependent order
      • Amorphous data-parallelism
          • Multiple active elements can be processed in parallel, subject to neighborhood and ordering constraints
      • Parallel program = Operator + Schedule + Parallel data structure
          • What is that? Who implements it?
      [Figure legend: active node; neighborhood]

  43. Optimistic Parallelization in Galois
      [Figure: graph with activities i1, i2, i3]
      • Programming model
          • Client code has sequential semantics
          • Library of concurrent data structures
      • Parallel execution model
          • Activities executed speculatively
      • Runtime conflict detection
          • Each node/edge has an associated exclusive lock
          • Graph operations acquire locks on read/written nodes/edges
          • Lock owned by another thread → conflict → iteration rolled back
          • All locks released at the end
      • Runtime book-keeping (source of overhead)
          • Locking
          • Undo actions
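
The conflict-detection and rollback mechanism can be illustrated with a small single-threaded simulation of one interleaving. This is a sketch under assumptions, not the Galois API: the `Activity` class, the lock table, and the node names are all hypothetical.

```python
class Conflict(Exception):
    """Raised when an activity touches a node locked by another activity."""

class Activity:
    def __init__(self, name, locks, data):
        self.name, self.locks, self.data = name, locks, data
        self.held = []       # locks acquired by this activity
        self.undo_log = []   # (node, old value) pairs, oldest first

    def write(self, node, value):
        owner = self.locks.get(node)
        if owner not in (None, self.name):
            raise Conflict(node)
        if owner is None:
            self.locks[node] = self.name
            self.held.append(node)
        self.undo_log.append((node, self.data[node]))
        self.data[node] = value

    def release(self):
        """Commit: drop the undo log and release all locks."""
        self.undo_log.clear()
        for node in self.held:
            del self.locks[node]
        self.held.clear()

    def rollback(self):
        """Abort: apply reverse actions in reverse order, then release."""
        for node, old in reversed(self.undo_log):
            self.data[node] = old
        self.release()

data = {"x": 0, "y": 0, "z": 0}
locks = {}
i1 = Activity("i1", locks, data)
i2 = Activity("i2", locks, data)

i1.write("x", 1)
i1.write("y", 1)          # i1 is still running and holds x and y
try:
    i2.write("z", 2)
    i2.write("y", 2)      # y is owned by i1, so this raises Conflict
    aborted = False
except Conflict:
    i2.rollback()         # restores z to its old value
    aborted = True
state_after_abort = dict(data)

i1.release()              # i1 commits
i2.write("z", 2)          # retry of i2 now succeeds
i2.write("y", 2)
i2.release()
```

The undo log and the lock table are exactly the book-keeping the slide lists as sources of overhead.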

  44. Avoiding rollbacks

  45. Cautious Operators
      • When an iteration aborts before completing its work, we need to undo all of its changes
          • Log each change to the graph and, upon abort, apply reverse actions in reverse order
          • Expensive to maintain
          • Not supported by Galois systems for C++
      • How can we avoid maintaining rollback data?
      • An operator is cautious if it never performs changes before acquiring all locks
          • In this case, upon abort there are no changes to be undone
      • Can ensure an operator is cautious by adding code to acquire locks before making any changes
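
A cautious operator can be sketched as a function that acquires every lock in its neighborhood before its first write, so that an abort never needs an undo log. The function `run_cautious`, the dictionary lock table, and the example data are assumptions for illustration, and the neighborhood is assumed to contain no duplicate nodes.

```python
def run_cautious(name, neighborhood, update, locks, data):
    """Acquire all locks first; only then apply `update` to `data`."""
    acquired = []
    for node in neighborhood:
        if node in locks:            # held by another activity: conflict
            for held in acquired:    # abort by releasing; nothing to undo
                del locks[held]
            return False
        locks[node] = name
        acquired.append(node)
    # All locks are held here, so the writes below cannot be aborted.
    update(data)
    for held in acquired:
        del locks[held]
    return True

def set_both(d):
    d["x"] = 1
    d["y"] = 1

data = {"x": 0, "y": 0}
locks = {"y": "other"}               # y is held by some other activity

first_try = run_cautious("i1", ["x", "y"], set_both, locks, data)
data_after_abort = dict(data)        # unchanged, although no log was kept
locks_after_abort = dict(locks)

del locks["y"]                       # the other activity finishes
second_try = run_cautious("i1", ["x", "y"], set_both, locks, data)
```

The aborted attempt leaves the data untouched because the conflict is detected before any write happens.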

  46. Failsafe Points
          foreach (Node a : wl) {
            Set<Node> aNghbrs = g.neighbors(a);
            Node lt = null;
            for (Node n : aNghbrs) {
              minW,lt = minWeightEdge((a,lt), (a,n));
            }
            g.removeEdge(a, lt);
            Set<Node> ltNghbrs = g.neighbors(lt);
            for (Node n : ltNghbrs) {
              Edge e = g.getEdge(lt, n);
              Weight w = g.getEdgeData(e);
              Edge an = g.getEdge(a, n);
              if (an != null) {
                Weight wan = g.getEdgeData(an);
                if (wan.compareTo(w) < 0)
                  w = wan;
                g.setEdgeData(an, w);
              } else {
                g.addEdge(a, n, w);
              }
            }
            g.removeNode(lt);
            mst.add(minW);
            wl.add(a);
          }
      Schematic structure of an iteration:
          foreach (Node a : wl) {
            …    (lockset grows)
                 (failsafe point)
            …    (lockset stable)
          }
      • Program point P is failsafe if, for every future program point Q, the lock set at Q is already contained in the lock set at P:
            ∀Q : Reaches(P,Q) ⇒ Locks(Q) ⊆ ACQ(P)

  47. Is this Code Cautious?
          foreach (Node a : wl) {
            Set<Node> aNghbrs = g.neighbors(a);
            Node lt = null;
            for (Node n : aNghbrs) {
              minW,lt = minWeightEdge((a,lt), (a,n));
            }
            g.removeEdge(a, lt);
            Set<Node> ltNghbrs = g.neighbors(lt);
            for (Node n : ltNghbrs) {
              Edge e = g.getEdge(lt, n);
              Weight w = g.getEdgeData(e);
              Edge an = g.getEdge(a, n);
              if (an != null) {
                Weight wan = g.getEdgeData(an);
                if (wan.compareTo(w) < 0)
                  w = wan;
                g.setEdgeData(an, w);
              } else {
                g.addEdge(a, n, w);
              }
            }
            g.removeNode(lt);
            mst.add(minW);
            wl.add(a);
          }
      • Answer: No
      [Figure: nodes a and lt with the annotations "Lockset Grows", "Failsafe", and "Lockset Stable"]

  48. Rewrite as Cautious Operator
          foreach (Node a : wl) {
            Set<Node> aNghbrs = g.neighbors(a);
            Node lt = null;
            for (Node n : aNghbrs) {
              minW,lt = minWeightEdge((a,lt), (a,n));
            }
            g.neighbors(lt);
            g.removeEdge(a, lt);
            Set<Node> ltNghbrs = g.neighbors(lt);
            for (Node n : ltNghbrs) {
              Edge e = g.getEdge(lt, n);
              Weight w = g.getEdgeData(e);
              Edge an = g.getEdge(a, n);
              if (an != null) {
                Weight wan = g.getEdgeData(an);
                if (wan.compareTo(w) < 0)
                  w = wan;
                g.setEdgeData(an, w);
              } else {
                g.addEdge(a, n, w);
              }
            }
            g.removeNode(lt);
            mst.add(minW);
            wl.add(a);
          }
      [Figure: nodes a and lt with the annotations "Lockset Grows", "Failsafe", and "Lockset Stable"]
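
The failsafe condition ∀Q : Reaches(P,Q) ⇒ Locks(Q) ⊆ ACQ(P) can be checked mechanically on a simplified straight-line model of the operator. In the sketch below each step is a tuple (label, locks needed, whether it writes); the lock sets assigned to the graph calls are assumptions chosen for illustration, not the documented behavior of the Galois library.

```python
def first_failsafe_point(steps):
    """Return the smallest i such that the point just before steps[i] is
    failsafe: every later step needs only locks acquired by steps[:i]."""
    for i in range(len(steps) + 1):
        acquired = set().union(*(locks for _, locks, _ in steps[:i]))
        if all(locks <= acquired for _, locks, _ in steps[i:]):
            return i

def is_cautious(steps):
    """Cautious: no step writes before the first failsafe point."""
    p = first_failsafe_point(steps)
    return not any(writes for _, _, writes in steps[:p])

# Assumed lock sets: "a" is the active node, "lt" its lightest neighbor,
# and "n" stands for the neighbors of lt.
original = [
    ("g.neighbors(a)",      {"a", "lt"}, False),
    ("g.removeEdge(a, lt)", {"a", "lt"}, True),
    ("g.neighbors(lt)",     {"lt", "n"}, False),
    ("g.addEdge(a, n, w)",  {"a", "n"},  True),
    ("g.removeNode(lt)",    {"lt"},      True),
]
# The rewrite adds an extra g.neighbors(lt) call before the first write.
rewritten = original[:1] + [("g.neighbors(lt)", {"lt", "n"}, False)] + original[1:]

original_point = first_failsafe_point(original)
rewritten_point = first_failsafe_point(rewritten)
original_cautious = is_cautious(original)
rewritten_cautious = is_cautious(rewritten)
```

Under these assumed lock sets, the original operator writes before its lockset is stable, while the rewritten one reaches its failsafe point right after the added call and before any modification of the graph.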

  49. So far
      • Operator formulation of graph algorithms
      • Implementation considerations for sequential graph programs
      • Optimistic parallelization of graph algorithms
      • Introduction to the Galois system

  50. Next steps
      • Divide into groups
      • Algorithm proposal
          • Due date: 15/4
          • Phrase the algorithm in terms of the operator formulation
          • Define the delta if necessary
          • Submit a proposal with a description of the algorithm + pseudo-code
          • LaTeX template will be on the web-site soon
      • Lecture on 15/4 on implementing your algorithm via Galois
