The Next Generation of the GraphLab Abstraction
Joseph Gonzalez, joint work with Yucheng Low, Aapo Kyrola, Danny Bickson, Carlos Guestrin, Guy Blelloch, Joe Hellerstein, David O’Hallaron, and Alex Smola


Presentation Transcript


  1. The Next Generation of the GraphLab Abstraction. Joseph Gonzalez, joint work with Yucheng Low, Aapo Kyrola, Danny Bickson, Carlos Guestrin, Guy Blelloch, Joe Hellerstein, David O’Hallaron, and Alex Smola.

  2. How will we design and implement parallel learning systems?

  3. ... a popular answer: Map-Reduce / Hadoop. Build learning algorithms on top of high-level parallel abstractions.

  4. MapReduce – Map Phase. [Figure: image feature vectors partitioned across CPU 1–4.] Embarrassingly parallel, independent computation; no communication needed.

  5. MapReduce – Map Phase. [Figure: each CPU independently extracts image features from its next batch of images.]

  6. MapReduce – Map Phase. [Figure: feature extraction continues independently on each CPU.] Embarrassingly parallel, independent computation; no communication needed.

  7. MapReduce – Reduce Phase. [Figure: CPU 1 and CPU 2 aggregate the per-image features into attractive-face and ugly-face statistics.]
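
The numbers scattered on slides 4–7 stand in for per-image feature vectors being extracted and then aggregated. Below is a minimal, single-process sketch of that map/reduce pattern; the FeatureVector type, the labels, and the mean-statistic reducer are illustrative assumptions, not Hadoop code or anything taken from the deck.

    #include <iostream>
    #include <map>
    #include <string>
    #include <utility>
    #include <vector>

    struct FeatureVector { double brightness; double symmetry; };

    // Map: in a real job this would turn raw pixels into features; here it
    // simply tags an already-extracted feature vector with its label as the key.
    std::pair<std::string, FeatureVector> map_image(const std::string& label,
                                                    const FeatureVector& features) {
      return {label, features};
    }

    // Reduce: aggregate all feature vectors sharing a key into a mean statistic.
    FeatureVector reduce_features(const std::vector<FeatureVector>& vals) {
      FeatureVector mean{0.0, 0.0};
      for (const FeatureVector& v : vals) {
        mean.brightness += v.brightness;
        mean.symmetry += v.symmetry;
      }
      mean.brightness /= vals.size();
      mean.symmetry /= vals.size();
      return mean;
    }

    int main() {
      std::vector<std::pair<std::string, FeatureVector>> images = {
          {"attractive", {4.2, 3.2}}, {"attractive", {1.3, 2.5}},
          {"ugly", {8.4, 3.1}},       {"ugly", {8.4, 4.1}}};

      // Map phase: embarrassingly parallel, no communication needed.
      std::map<std::string, std::vector<FeatureVector>> shuffled;
      for (const auto& [label, img] : images)
        shuffled[map_image(label, img).first].push_back(img);

      // Reduce phase: one aggregation per key.
      for (const auto& [label, vals] : shuffled) {
        FeatureVector stats = reduce_features(vals);
        std::cout << label << " faces: " << stats.brightness << ", "
                  << stats.symmetry << "\n";
      }
      return 0;
    }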

  8. Map-Reduce for Data-Parallel ML. Excellent for large data-parallel tasks! But is there more to machine learning? Data-Parallel (Map Reduce): feature extraction, cross validation, computing sufficient statistics. Graph-Parallel: label propagation, Lasso, belief propagation, kernel methods, tensor factorization, PageRank, neural networks, deep belief networks.

  9. Concrete Example: Label Propagation.

  10. Label Propagation Algorithm. Social Arithmetic: my interest estimate is a weighted sum of my profile and my friends' estimates: 50% what I list on my profile (50% cameras, 50% biking), 40% what Sue Ann likes (80% cameras, 20% biking), and 10% what Carlos likes (30% cameras, 70% biking), giving "I Like: 60% cameras, 40% biking". Recurrence Algorithm: each user's estimate is the weighted sum of their neighbors' estimates; iterate until convergence. Parallelism: compute all Likes[i] in parallel.
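
As a quick check of the social arithmetic above, here is the weighted sum worked out in a few lines of C++. The weights and estimates are the ones on the slide; the code itself is only a sketch.

    #include <cstdio>

    int main() {
      double w_profile = 0.5, w_sueann = 0.4, w_carlos = 0.1;
      double profile[2] = {0.5, 0.5};   // {cameras, biking} from my profile
      double sueann[2]  = {0.8, 0.2};   // Sue Ann's current estimate
      double carlos[2]  = {0.3, 0.7};   // Carlos's current estimate
      double me[2];
      for (int k = 0; k < 2; ++k)
        me[k] = w_profile * profile[k] + w_sueann * sueann[k] + w_carlos * carlos[k];
      // Prints: I like: 60% cameras, 40% biking
      std::printf("I like: %.0f%% cameras, %.0f%% biking\n", 100 * me[0], 100 * me[1]);
      return 0;
    }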

  11. Properties of Graph-Parallel Algorithms: a dependency graph, factored computation (what I like depends on what my friends like), and iterative computation.

  12. Map-Reduce for Data-Parallel ML. Excellent for large data-parallel tasks! Data-Parallel (Map Reduce): feature extraction, cross validation, computing sufficient statistics. Graph-Parallel (Map Reduce?): label propagation, Lasso, belief propagation, kernel methods, tensor factorization, PageRank, neural networks, deep belief networks.

  13. Why not use Map-Reduce for Graph-Parallel Algorithms?

  14. Data Dependencies. Map-Reduce does not efficiently express dependent data: the user must code substantial data transformations, and the result is costly data replication. (Map-Reduce assumes independent data rows.)

  15. Iterative Algorithms. Map-Reduce does not efficiently express iterative algorithms. [Figure: data is re-partitioned across CPU 1–3 at every iteration, and a barrier at the end of each iteration forces everyone to wait for the slowest processor.]

  16. MapAbuse: Iterative MapReduce. Often only a subset of the data needs computation in each iteration, yet every map/reduce pass touches all of it. [Figure: iterations separated by barriers, with all data blocks reprocessed each time.]

  17. MapAbuse: Iterative MapReduce. The system is not optimized for iteration: each pass pays a job startup penalty and a disk penalty for re-reading and re-writing the data. [Figure: iterations separated by barriers, with startup and disk penalties at each stage.]
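
A sketch of what slides 15–17 describe: an iterative algorithm driven by repeated MapReduce jobs. The driver functions here (read_all_from_disk, run_mapreduce_job, and so on) are hypothetical stubs, not a real Hadoop API; they exist only to make the startup, disk, and barrier penalties visible in code.

    #include <vector>

    struct Record { double value = 0.0; };

    // Hypothetical stand-ins for a real driver, stubbed so the sketch compiles.
    std::vector<Record> read_all_from_disk() { return std::vector<Record>(8); }
    void write_all_to_disk(const std::vector<Record>&) {}
    std::vector<Record> run_mapreduce_job(const std::vector<Record>& in) { return in; }
    bool converged(const std::vector<Record>&, const std::vector<Record>&) { return true; }

    void iterate_until_convergence() {
      std::vector<Record> data = read_all_from_disk();       // disk penalty
      while (true) {
        // Each iteration pays the job startup penalty, touches ALL records even
        // if only a few still need work, and ends at a barrier where fast
        // workers wait for the slowest processor.
        std::vector<Record> next = run_mapreduce_job(data);  // startup penalty
        write_all_to_disk(next);                             // disk penalty
        if (converged(data, next)) break;                    // barrier, then test
        data = read_all_from_disk();                         // disk penalty again
      }
    }

    int main() { iterate_until_convergence(); }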

  18. Map-Reduce for Data-Parallel ML. Excellent for large data-parallel tasks! Data-Parallel (Map Reduce): feature extraction, cross validation, computing sufficient statistics. Graph-Parallel (Pregel/Giraph? Map Reduce?): SVM, Lasso, belief propagation, kernel methods, tensor factorization, PageRank, neural networks, deep belief networks.

  19. Pregel (Giraph). Bulk Synchronous Parallel model: compute, communicate, barrier.
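
A minimal sketch of the Bulk Synchronous Parallel pattern behind Pregel/Giraph: each superstep is compute on all active vertices, exchange messages, then a global barrier. The Vertex, Message, and compute() interface below is an illustrative assumption, not the Pregel or Giraph API.

    #include <cstddef>
    #include <vector>

    struct Message { int dst; double value; };

    struct Vertex {
      bool active = true;
      double state = 0.0;
      // User compute(): read the inbox, update state, emit messages, or halt.
      std::vector<Message> compute(const std::vector<Message>& inbox) {
        for (const Message& m : inbox) state += m.value;
        if (inbox.empty()) active = false;   // vote to halt when nothing arrives
        return {};                           // no outgoing messages in this toy
      }
    };

    void run_bsp(std::vector<Vertex>& vertices) {
      std::vector<std::vector<Message>> inbox(vertices.size());
      std::vector<std::vector<Message>> next_inbox(vertices.size());
      bool any_active = true;
      while (any_active) {
        // Compute phase: every active vertex runs once per superstep.
        for (std::size_t v = 0; v < vertices.size(); ++v)
          if (vertices[v].active)
            for (const Message& m : vertices[v].compute(inbox[v]))
              next_inbox[m.dst].push_back(m);   // communicate phase
        // Barrier: no vertex starts superstep t+1 until all finish superstep t.
        inbox.swap(next_inbox);
        for (auto& box : next_inbox) box.clear();
        any_active = false;
        for (const Vertex& v : vertices) any_active = any_active || v.active;
      }
    }

    int main() {
      std::vector<Vertex> g(4);
      run_bsp(g);
      return 0;
    }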

  20. Problem: bulk synchronous computation can be highly inefficient. Example: loopy belief propagation.

  21. Problem with Bulk Synchronous. Example algorithm: if a neighbor is red, turn red. Bulk synchronous computation evaluates the condition on all vertices in every phase: 4 phases with 9 computations each gives 36 computations. Asynchronous (wave-front) computation evaluates the condition only when a neighbor changes: 4 phases with 2 computations each gives 8 computations. [Figure: the red front spreading across the graph from time 0 to time 4.]
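
A sketch of the two evaluation strategies for the rule "if a neighbor is red, turn red", on an assumed 9-vertex chain. The exact counts depend on the graph and on what is counted as one evaluation, so they will not match the slide's 36 vs. 8, but the bulk synchronous sweep does strictly more work than the asynchronous wave-front.

    #include <cstddef>
    #include <cstdio>
    #include <deque>
    #include <vector>

    using Graph = std::vector<std::vector<int>>;   // adjacency lists

    // Bulk synchronous: every phase re-evaluates the rule on every vertex.
    int sweep_all(std::vector<bool> red, const Graph& adj) {
      int evaluations = 0;
      bool changed = true;
      while (changed) {
        changed = false;
        std::vector<bool> next = red;
        for (std::size_t v = 0; v < adj.size(); ++v) {
          ++evaluations;                       // counted even if nothing changes
          for (int u : adj[v]) if (red[u]) next[v] = true;
          if (next[v] != red[v]) changed = true;
        }
        red = next;
      }
      return evaluations;
    }

    // Asynchronous wave-front: evaluate a vertex only when a neighbor changed.
    int wavefront(std::vector<bool> red, const Graph& adj) {
      int evaluations = 0;
      std::deque<int> work;
      for (std::size_t v = 0; v < adj.size(); ++v) if (red[v]) work.push_back(v);
      while (!work.empty()) {
        int v = work.front(); work.pop_front();
        for (int u : adj[v]) {
          ++evaluations;
          if (!red[u]) { red[u] = true; work.push_back(u); }  // reschedule on change
        }
      }
      return evaluations;
    }

    int main() {
      // A chain of 9 vertices with vertex 0 initially red.
      Graph adj(9);
      for (int v = 0; v + 1 < 9; ++v) { adj[v].push_back(v + 1); adj[v + 1].push_back(v); }
      std::vector<bool> red(9, false);
      red[0] = true;
      std::printf("bulk synchronous evaluations: %d\n", sweep_all(red, adj));
      std::printf("wave-front evaluations:       %d\n", wavefront(red, adj));
      return 0;
    }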

  22. Data-Parallel Algorithms Can Be Inefficient. [Figure: runtime comparison of optimized in-memory bulk synchronous BP versus asynchronous Splash BP.] The limitations of the Map-Reduce abstraction can lead to inefficient parallel algorithms.

  23. The Need for a New Abstraction. Map-Reduce is not well suited for graph-parallelism. Data-Parallel (Map Reduce): feature extraction, cross validation, computing sufficient statistics. Graph-Parallel (Pregel/Giraph): belief propagation, kernel methods, SVM, tensor factorization, PageRank, Lasso, neural networks, deep belief networks.

  24. What is GraphLab?

  25. The GraphLab Framework: graph-based data representation, update functions (user computation), scheduler, and consistency model.

  26. Data Graph. A graph with arbitrary data (C++ objects) associated with each vertex and edge. Graph: a social network. Vertex data: user profile text and current interest estimates. Edge data: similarity weights.
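
A sketch of what the vertex and edge data for this example might look like as C++ objects. The commented graph-construction calls are assumptions about a GraphLab-like API, not code from a specific release.

    #include <string>
    #include <vector>

    // Vertex data: arbitrary C++ object stored on each user in the social network.
    struct vertex_data {
      std::string profile_text;           // user profile text
      std::vector<double> interests;      // current interest estimates, e.g. {cameras, biking}
    };

    // Edge data: arbitrary C++ object stored on each friendship edge.
    struct edge_data {
      double similarity;                  // similarity weight W_ij between two users
    };

    int main() {
      // With a GraphLab-style graph type this would look roughly like:
      //   graphlab::graph<vertex_data, edge_data> g;
      //   auto me  = g.add_vertex(vertex_data{"...", {0.5, 0.5}});
      //   auto sue = g.add_vertex(vertex_data{"...", {0.8, 0.2}});
      //   g.add_edge(me, sue, edge_data{0.4});
      // (the method names above are assumptions, not a specific GraphLab release)
      vertex_data me{"loves photography", {0.5, 0.5}};
      vertex_data sue{"avid cyclist", {0.8, 0.2}};
      edge_data w{0.4};
      (void)me; (void)sue; (void)w;
      return 0;
    }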

  27. Update Functions. An update function is a user-defined program which, when applied to a vertex, transforms the data in the scope of the vertex.

    label_prop(i, scope) {
      // Get neighborhood data
      (Likes[i], Wij, Likes[j]) ← scope;
      // Update the vertex data
      // Reschedule neighbors if needed
      if Likes[i] changes then
        reschedule_neighbors_of(i);
    }
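
A filled-in, runnable sketch of the label_prop update above, with the vertex-update body spelled out as the weighted sum from slide 10. The Neighbor type and the convention of returning "should reschedule" to the caller are assumptions, not the GraphLab scope API.

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    struct Neighbor { double weight; std::vector<double> likes; };

    // Returns true if vertex i changed enough that the caller should invoke
    // reschedule_neighbors_of(i) on the scheduler.
    bool label_prop_update(std::vector<double>& likes_i,
                           const std::vector<Neighbor>& neighbors,
                           double epsilon = 1e-5) {
      std::vector<double> old = likes_i;
      std::fill(likes_i.begin(), likes_i.end(), 0.0);
      for (const Neighbor& n : neighbors)            // gather weighted neighbor estimates
        for (std::size_t k = 0; k < likes_i.size(); ++k)
          likes_i[k] += n.weight * n.likes[k];
      double change = 0.0;
      for (std::size_t k = 0; k < likes_i.size(); ++k)
        change += std::fabs(likes_i[k] - old[k]);
      return change > epsilon;
    }

    int main() {
      // Vertex "me", with the profile treated as a fixed pseudo-neighbor (slide 10).
      std::vector<double> me = {0.5, 0.5};
      std::vector<Neighbor> nbrs = {{0.5, {0.5, 0.5}},    // my profile
                                    {0.4, {0.8, 0.2}},    // Sue Ann
                                    {0.1, {0.3, 0.7}}};   // Carlos
      bool changed = label_prop_update(me, nbrs);
      std::printf("Likes[me] = {%.2f, %.2f}, reschedule = %d\n", me[0], me[1], changed);
      return 0;
    }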

  28. The Scheduler. The scheduler determines the order in which vertices are updated. [Figure: a queue of scheduled vertices feeding CPU 1 and CPU 2; updated vertices may push their neighbors back onto the queue.] The process repeats until the scheduler is empty.
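
A sketch of that loop: pull the next scheduled vertex, run the update on it, and push whatever the update wants rescheduled. The FIFO queue here is only one possible scheduling policy, and the interface is an assumption rather than GraphLab's scheduler API.

    #include <deque>
    #include <vector>

    using vertex_id = int;

    // The update function returns the neighbors it wants rescheduled.
    using update_fn = std::vector<vertex_id> (*)(vertex_id);

    void run_scheduler(std::deque<vertex_id> schedule, update_fn update) {
      while (!schedule.empty()) {            // repeat until the scheduler is empty
        vertex_id v = schedule.front();
        schedule.pop_front();
        for (vertex_id u : update(v))        // the update may reschedule neighbors
          schedule.push_back(u);
      }
    }

    // Toy update: vertices below 3 reschedule their successor once.
    std::vector<vertex_id> toy_update(vertex_id v) {
      return v < 3 ? std::vector<vertex_id>{v + 1} : std::vector<vertex_id>{};
    }

    int main() {
      run_scheduler({0, 5, 2}, &toy_update);
      return 0;
    }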

  29. The GraphLab Framework: graph-based data representation, update functions (user computation), scheduler, and consistency model.

  30. Ensuring Race-Free Code. How much can computation overlap?

  31. GraphLab Ensures Sequential Consistency. For each parallel execution, there exists a sequential execution of update functions which produces the same result. [Figure: a parallel schedule on CPU 1 and CPU 2 alongside an equivalent sequential schedule on a single CPU.]

  32. Consistency Rules. [Figure: update-function scopes over the data graph.] Full consistency guarantees sequential consistency for all update functions.

  33. Full Consistency. [Figure: full-consistency scopes; concurrently executing update functions may not share any vertex or edge data.]

  34. Obtaining More Parallelism. [Figure: relaxing from full consistency to edge consistency allows more update functions to run concurrently.]

  35. Edge Consistency. [Figure: under edge consistency, CPU 1 and CPU 2 can safely read adjacent vertices while each updates its own center vertex and adjacent edges.]

  36. Consistency Through R/W Locks. Read/write locks implement both models: full consistency write-locks the entire scope, while edge consistency write-locks the center vertex and read-locks the adjacent vertices. Locks are acquired in a canonical ordering to avoid deadlock.
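
A sketch of edge consistency via read/write locks with a canonical ordering, under the assumptions that vertices are identified by integer ids and that "canonical" means increasing id order; the locking discipline mirrors the slide, but the code is not GraphLab's implementation.

    #include <algorithm>
    #include <shared_mutex>
    #include <vector>

    // Write-lock the center vertex, read-lock its neighbors, always acquiring in
    // increasing vertex-id order (the canonical ordering) to avoid deadlock.
    void lock_edge_consistent_scope(int center, std::vector<int> neighbors,
                                    std::vector<std::shared_mutex>& locks) {
      neighbors.push_back(center);
      std::sort(neighbors.begin(), neighbors.end());   // canonical lock ordering
      for (int v : neighbors) {
        if (v == center) locks[v].lock();              // write lock on the center
        else             locks[v].lock_shared();       // read locks on neighbors
      }
    }

    void unlock_edge_consistent_scope(int center, const std::vector<int>& neighbors,
                                      std::vector<std::shared_mutex>& locks) {
      for (int v : neighbors) locks[v].unlock_shared();
      locks[center].unlock();
    }

    int main() {
      std::vector<std::shared_mutex> locks(8);
      lock_edge_consistent_scope(3, {1, 5, 7}, locks);
      // ... run the update function on vertex 3's scope ...
      unlock_edge_consistent_scope(3, {1, 5, 7}, locks);
      return 0;
    }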

  37. GraphLab for Natural Graphs: the Achilles' heel.

  38. Problem: High-Degree Vertices. Graphs with high-degree vertices are common: power-law graphs (social networks), which affect algorithms like label propagation, and probabilistic graphical models, where hyper-parameters couple large sets of data and connectivity structure is induced by natural phenomena. High-degree vertices kill parallelism: they pull a large amount of state, require heavy locking, and are processed sequentially.

  39. Four Proposed Solutions. (1) Associative commutative update functors: transition from stateless to stateful update functions. (2) Decomposable update functors: expose greater parallelism by further factoring update functions. (3) Abelian group caching (concurrent revisions): allows for controllable races through diff operations. (4) Stochastic scopes: reduce degree through sampling.

  40. Associative Commutative Update Functors. Associative, commutative functions, i.e., 1 + (2 + 3) = (1 + 2) + 3. Subsumes message passing, the traveler model, and Pregel (Giraph); function composition is equivalent to a combiner in Pregel.

    LabelPropUF(i, scope, Δ) {
      // Get neighborhood data
      (OldLikes[i], Likes[i], Wij) ← scope;
      // Update the vertex data
      Likes[i] ← Likes[i] + Δ;
      // Reschedule neighbors if needed
      if |Likes[i] - OldLikes[i]| > ε then
        for all neighbors j:
          Δ ← (Likes[i] - OldLikes[i]) * Wij;
          reschedule(j, Δ);
        OldLikes[i] ← Likes[i];
    }

Update functor composition is achieved by defining an operator+ for update functors, and the composition of update functors may be computed in parallel.
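
A sketch of such an update functor in C++, showing the state (an accumulated delta) and the operator+ used to compose two functors queued for the same vertex. The interface is an assumption for illustration, not GraphLab's functor API.

    #include <cstddef>
    #include <vector>

    // Stateful update functor for label propagation. Functors queued for the same
    // vertex can be merged with +=, which is what makes composition,
    // de-duplication, and parallel combination possible.
    struct label_prop_functor {
      std::vector<double> delta;   // accumulated change pushed from neighbors

      label_prop_functor& operator+=(const label_prop_functor& other) {
        if (delta.size() < other.delta.size()) delta.resize(other.delta.size(), 0.0);
        for (std::size_t k = 0; k < other.delta.size(); ++k)
          delta[k] += other.delta[k];        // associative and commutative
        return *this;
      }

      // Applying the functor adds the accumulated delta to the vertex estimate;
      // the caller decides whether to reschedule neighbors with fresh deltas.
      void apply(std::vector<double>& likes_i) const {
        for (std::size_t k = 0; k < delta.size() && k < likes_i.size(); ++k)
          likes_i[k] += delta[k];
      }
    };

    int main() {
      label_prop_functor a{{0.10, -0.10}};
      label_prop_functor b{{0.02, 0.05}};
      a += b;                                // composed before the vertex runs once
      std::vector<double> likes = {0.6, 0.4};
      a.apply(likes);
      return 0;
    }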

  41. Advantages of Update Functors. Eliminates the need to read neighbor values and reduces locking overhead. Enables task de-duplication and hinting: the user-defined (+) can choose to eliminate or compose functors, and can dynamically control execution parameters such as priority(), consistency_level(), and is_decomposable(). Needed to enable further extensions (decomposable update functors, etc.). Simplifies schedulers: many update functions become a single update functor.

  42. Disadvantages of Update Functors. Typically constructed sequentially and requires access to edge weights (problems that exist in Pregel/Giraph as well). The change required substantial engineering effort and affected 2/3 of the GraphLab library.

    lpF(i, scope, Δ) {
      // Update the vertex data
      Likes[i] ← Likes[i] + Δ;
      // Reschedule neighbors if needed
      if |Likes[i] - OldLikes[i]| > ε then
        for all neighbors j:
          Δ ← (Likes[i] - OldLikes[i]) * Wij;
          reschedule(j, Δ);
        OldLikes[i] ← Likes[i];
    }

  43. Decomposable Update Functors: breaking computation over the edges of the graph.

  44. Decomposable Update Functors. Decompose update functions into three user-defined phases: Gather (computed per edge, producing a Δ, combined with a parallel sum Δ1 + Δ2 + ... + Δn), Apply (apply the accumulated value Δ to the center vertex), and Scatter (update adjacent edges and vertices). Locks are acquired only for the region within each phase's scope, giving relaxed consistency.

  45. Decomposable Update Functors. Implementing label propagation with factorized update functions: implemented using the update functor's state as the accumulator, and using the same (+) operator to merge partial gathers.

    Gather(i, j, scope) {
      // Get neighborhood data (Likes[i], Wij, Likes[j]) from scope
      // Emit accumulator
      emit Wij * Likes[j] as Δ;
    }

    Apply(i, scope, Δ) {
      // Get neighborhood data (Likes[i]) from scope
      // Update the vertex
      Likes[i] ← Δ;
    }

    Scatter(i, scope) {
      // Get neighborhood data (Likes[i], Wij, Likes[j]) from scope
      // Reschedule if changed
      if Likes[i] changed then reschedule(j);
    }
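
A C++ sketch of the same gather/apply/scatter decomposition, with an explicit accumulator type whose (+) merges partial gathers. The signatures and names are illustrative assumptions; only the three-phase structure and the merge-by-(+) idea come from the slides.

    #include <cstddef>
    #include <vector>

    using Estimate = std::vector<double>;

    // Accumulator merged with (+), mirroring the slide's parallel sum of partial gathers.
    struct Accum {
      Estimate sum;
      Accum operator+(const Accum& other) const {
        Accum out = *this;
        if (out.sum.size() < other.sum.size()) out.sum.resize(other.sum.size(), 0.0);
        for (std::size_t k = 0; k < other.sum.size(); ++k) out.sum[k] += other.sum[k];
        return out;
      }
    };

    // Gather: evaluated once per edge, possibly in parallel.
    Accum gather(double w_ij, const Estimate& likes_j) {
      Accum a{Estimate(likes_j.size())};
      for (std::size_t k = 0; k < likes_j.size(); ++k) a.sum[k] = w_ij * likes_j[k];
      return a;
    }

    // Apply: the merged accumulator becomes the center vertex's new estimate.
    void apply(Estimate& likes_i, const Accum& acc) { likes_i = acc.sum; }

    // Scatter: evaluated per edge; decide whether the neighbor should be rescheduled.
    bool scatter_reschedule(const Estimate& old_likes, const Estimate& new_likes,
                            double epsilon = 1e-5) {
      double change = 0.0;
      for (std::size_t k = 0; k < old_likes.size(); ++k)
        change += new_likes[k] > old_likes[k] ? new_likes[k] - old_likes[k]
                                              : old_likes[k] - new_likes[k];
      return change > epsilon;
    }

    int main() {
      Estimate me = {0.5, 0.5}, old_me = me;
      Accum acc = gather(0.5, {0.5, 0.5}) + gather(0.4, {0.8, 0.2}) + gather(0.1, {0.3, 0.7});
      apply(me, acc);
      bool resched = scatter_reschedule(old_me, me);
      (void)resched;
      return 0;
    }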

  46. Decomposable Functors. Fits many algorithms: loopy belief propagation, label propagation, PageRank, and more. Addresses the earlier concerns: large state becomes a parallel gather and scatter, heavy locking becomes fine-grained locking, and sequential processing becomes distributed gather and scatter. Problem: high-degree vertices will still get locked frequently and will need to be up to date.

  47. Abelian Group Caching: enabling eventually consistent data races.

  48. Abelian Group Caching. Issue: all earlier methods maintain a sequentially consistent view of data across all processors. Proposal: try to split the data instead of the computation. How can we split the graph without changing the update function? [Figure: a high-degree vertex replicated across CPU 1–4.]

  49. Insight from the WSDM Paper. Answer: allow eventually consistent data races. High-degree vertices admit slightly "stale" values: changes in a few elements have a negligible effect. High-degree vertex updates are typically a form of "sum" operation, which has an "inverse" (examples: counts, averages, sufficient statistics; counter-example: max). Goal: lazily synchronize duplicate data, similar to a version control system; intermediate values are partially consistent, but the final value at termination must be consistent.
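
A sketch of the caching scheme this suggests for a single "sum-like" value: each processor works against a local replica and lazily reconciles with a master by pushing the difference since its last synchronization. The MasterValue and CachedReplica types are illustrative assumptions, not the WSDM paper's or GraphLab's implementation.

    #include <mutex>

    // Master copy of a high-degree vertex's value; accepts deltas from replicas.
    struct MasterValue {
      double value = 0.0;
      std::mutex m;
      double merge(double delta) {            // apply a replica's delta, return the fresh value
        std::lock_guard<std::mutex> g(m);
        value += delta;
        return value;
      }
    };

    // Per-processor cached replica: updated freely on its own processor,
    // reconciled lazily with the master.
    struct CachedReplica {
      double cached = 0.0;      // possibly stale local copy
      double snapshot = 0.0;    // what the cache held at the last synchronization

      void add(double x) { cached += x; }     // local update, no global locking

      void sync(MasterValue& master) {
        double delta = cached - snapshot;     // "diff" computed with the group inverse
        cached = master.merge(delta);         // push our changes, pull everyone else's
        snapshot = cached;
      }
    };

    int main() {
      MasterValue master;
      master.value = 10.0;
      CachedReplica p1{10.0, 10.0}, p2{10.0, 10.0};
      p1.add(3.0);              // processor 1 works against its stale copy
      p2.add(-1.0);             // processor 2 does the same concurrently
      p1.sync(master);          // master becomes 13
      p2.sync(master);          // master becomes 12; values agree at termination
      return 0;
    }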

  50. Example. Every processor initially has a copy of the same central value. [Figure: a master value of 10 with current copies of 10 on processors 1–3.]
