Domain decomposition in parallel computing



1. Domain decomposition in parallel computing
   COT 5410 – Spring 2004
   Ashok Srinivasan, www.cs.fsu.edu/~asriniva
   Florida State University

2. Outline
   • Background
   • Geometric partitioning
   • Graph partitioning
     • Static
     • Dynamic
   • Important points

3. Background
   • Tasks in a parallel computation need access to certain data
   • The same datum may be needed by multiple tasks
     • Example: in matrix-vector multiplication c = A b, the entry b_2 is needed to compute every c_i, 1 ≤ i ≤ n, since c_i = Σ_j a_ij b_j
   • If a process does not "own" a datum needed by its task, it has to get it from a process that does
     • This communication is expensive
   • Aims of domain decomposition
     • Distribute the data so that the communication required is minimized
     • Ensure that the computational loads on processes are balanced
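
To make the data-sharing point concrete, here is a minimal Python sketch (not from the slides; the matrix, block sizes, and names are illustrative) of a row-distributed matrix-vector product, in which each simulated process computes its own rows of c but must read all of b:

```python
import numpy as np

# Simulate p "processes", each owning a block of rows of A. Every row's
# dot product reads all of b, so b_2 (and every other entry of b) must be
# made available to every process that does not own it.
n, p = 8, 4
A = np.arange(1.0, n * n + 1).reshape(n, n)
b = np.linspace(1.0, 2.0, n)

c = np.empty(n)
for rows in np.array_split(np.arange(n), p):
    c[rows] = A[rows, :] @ b        # this "process" needs ALL of b

assert np.allclose(c, A @ b)        # same result as the serial product
print(c)
```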

4. Domain decomposition example
   • Finite difference computation
     • New value of a node depends on old values of its neighbors
   • We want to divide the nodes amongst the processes so that
     • Communication is minimized (the measure of partition quality)
     • Computational load is evenly balanced

5. Geometric partitioning
   • Partition a set of points
     • Uses only coordinate information
   • Balances the load; the heuristic tries to ensure that communication costs are low
   • Algorithms are typically fast, but the partitions are not of high quality
   • Examples
     • Orthogonal recursive bisection
     • Inertial
     • Space filling curves

6. Orthogonal recursive bisection
   • Recursively bisect orthogonal to the longest dimension
     • Assumes communication is proportional to the surface area of the domain and aligned with the coordinate axes
   • Recursive bisection
     • Divide into two pieces, keeping the load balanced
     • Apply recursively until the desired number of partitions is obtained
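
A minimal Python sketch of ORB under the slide's assumptions (cut orthogonal to the longest extent, loads kept balanced); the function name `orb` and the way `nparts` is split recursively are illustrative choices, not a reference implementation:

```python
import numpy as np

def orb(points, nparts):
    """Orthogonal recursive bisection: sort along the longest coordinate
    axis and split so each side's load matches the parts it must yield."""
    if nparts == 1:
        return [points]
    extents = points.max(axis=0) - points.min(axis=0)
    axis = int(np.argmax(extents))             # longest dimension
    order = np.argsort(points[:, axis])
    left_parts = nparts // 2
    cut = len(points) * left_parts // nparts   # balanced split point
    return (orb(points[order[:cut]], left_parts) +
            orb(points[order[cut:]], nparts - left_parts))

parts = orb(np.random.rand(1000, 2), 4)
print([len(p) for p in parts])                 # near-equal part sizes
```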

7. Inertial
   • ORB may not be effective if cuts along the x, y, or z directions are not good ones
   • Inertial method
     • Recursively bisect orthogonal to the inertial axis
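
A possible Python sketch of one inertial bisection step, assuming the inertial axis is taken as the principal eigenvector of the point set's covariance and the cut is placed at the median projection (both standard but not spelled out on the slide):

```python
import numpy as np

def inertial_bisect(points):
    """Bisect with a cut orthogonal to the inertial (principal) axis."""
    centered = points - points.mean(axis=0)
    # Inertial axis = eigenvector of the covariance with the largest
    # eigenvalue; np.linalg.eigh returns eigenvalues in ascending order.
    _, vecs = np.linalg.eigh(centered.T @ centered)
    proj = centered @ vecs[:, -1]   # position of each point along the axis
    cut = np.median(proj)           # median cut keeps the load balanced
    return points[proj <= cut], points[proj > cut]

# An elongated diagonal cloud: the cut follows the cloud's own axis,
# even though no axis-aligned (ORB-style) cut would be a good one.
pts = np.random.randn(1000, 2) @ np.array([[3.0, 1.0], [1.0, 1.0]])
a, b = inertial_bisect(pts)
print(len(a), len(b))
```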

8. Space filling curves
   • A space filling curve is a continuous curve that fills the space
   • Order the points based on their relative position on the curve
   • Choose a curve that preserves proximity
     • Points that are close in space should be close in the ordering too
   • Example: Hilbert curve

9. Hilbert curve
   [Figure: successive approximations H_1, H_2, ..., H_i, H_{i+1}; the Hilbert curve is the limit of H_n as n → ∞]
   • Sources
     • http://www.dcs.napier.ac.uk/~andrew/hilbert.html
     • http://www.fractalus.com/kerry/tutorials/hilbert/hilbert-tutorial.html

10. Domain decomposition with a space filling curve
   • Order points based on their position on the curve
   • Divide into P parts, where P is the number of processes
   • Space filling curves can be used in adaptive computations too
   • They also extend to higher dimensions
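
A hedged Python sketch of this scheme: `hilbert_index` is the standard bit-manipulation conversion from grid cell (x, y) to position along the Hilbert curve over an n-by-n grid (n a power of two, an assumption here), and the partitioner simply sorts by index and cuts the ordering into P contiguous chunks:

```python
def hilbert_index(n, x, y):
    """Position of grid cell (x, y) along the Hilbert curve over an
    n-by-n grid, n a power of two (classic xy-to-index conversion)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                  # rotate the quadrant appropriately
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s //= 2
    return d

# Partition: sort points by Hilbert index, cut into P contiguous chunks.
points = [(x, y) for x in range(8) for y in range(8)]
points.sort(key=lambda p: hilbert_index(8, p[0], p[1]))
P = 4
size = len(points) // P
parts = [points[i * size:(i + 1) * size] for i in range(P)]
print([len(p) for p in parts])       # equal loads, spatially compact parts
```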

11. Graph partitioning
   • Model the problem as graph partitioning
     • Graph G = (V, E)
     • Each task is represented by a vertex; a vertex weight can represent the computational effort
     • An edge exists between two tasks if one needs data owned by the other; weights can be associated with edges too
   • Goal
     • Partition the vertices into P parts such that each part has equal vertex weight
     • Minimize the total weight of the edges cut
     • The problem is NP-hard
   • Edge cut metric
     • Judge the quality of the partitioning by the number of edges cut
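
The edge cut metric is easy to state in code; a tiny Python sketch (the edge-list and dict representations are illustrative choices):

```python
def edge_cut(edges, part):
    """Number of edges whose endpoints lie in different parts."""
    return sum(1 for u, v in edges if part[u] != part[v])

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
part = {0: 0, 1: 0, 2: 1, 3: 1}      # vertices {0,1} vs {2,3}
print(edge_cut(edges, part))          # (1,2), (3,0), (1,3) are cut -> 3
```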

12. Static graph partitioning
   • Combinatorial
     • Levelized nested dissection
     • Kernighan-Lin / Fiduccia-Mattheyses
   • Spectral partitioning
   • Multi-level methods

13. Combinatorial partitioning
   • Uses only connectivity information
   • Examples
     • Levelized nested dissection
     • Kernighan-Lin / Fiduccia-Mattheyses

14. Levelized nested dissection (LND)
   • The idea is similar to the geometric methods, but coordinate information cannot be used
   • Instead of projecting vertices along the longest axis, order them by their distance from a vertex that may be at one extreme of the longest dimension of the graph
   • Pseudo-peripheral vertex
     • Perform a breadth-first search, starting from an arbitrary vertex
     • The vertex encountered last may be a good approximation to a peripheral vertex

15. LND example: finding a pseudo-peripheral vertex
   [Figure: BFS from an arbitrary initial vertex labels the graph with levels 1-4; a vertex at the deepest level (4) is taken as the pseudo-peripheral vertex]

16. LND example: partitioning
   [Figure: BFS levels 1-6 from the chosen pseudo-peripheral vertex; the closer half of the vertices forms one partition]
   • Recursively bisect the subgraphs
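
A minimal Python sketch of the LND steps just illustrated, assuming an adjacency-list graph; the double BFS to find a pseudo-peripheral vertex follows the heuristic on slide 14, and the names are illustrative:

```python
from collections import deque

def bfs_levels(adj, start):
    """BFS distances from start over adjacency lists {vertex: neighbors}."""
    level = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    return level

def lnd_bisect(adj):
    """Order vertices by BFS distance from a pseudo-peripheral vertex
    and split the ordering in half."""
    first = bfs_levels(adj, next(iter(adj)))
    root = max(first, key=first.get)     # last-found vertex approximates
    levels = bfs_levels(adj, root)       # a peripheral vertex
    order = sorted(levels, key=levels.get)
    half = len(order) // 2
    return set(order[:half]), set(order[half:])

# A 2x4 grid graph: the bisection cuts it roughly down the middle.
adj = {0: [1, 4], 1: [0, 2, 5], 2: [1, 3, 6], 3: [2, 7],
       4: [0, 5], 5: [1, 4, 6], 6: [2, 5, 7], 7: [3, 6]}
print(lnd_bisect(adj))
```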

17. Kernighan-Lin / Fiduccia-Mattheyses
   • Refines an existing partition
   • Kernighan-Lin
     • Consider pairs of vertices from different partitions
     • Choose the pair whose swap would give the best improvement in partition quality
       • The best available "improvement" may actually be a worsening
     • Perform several passes, then choose the best partition among those encountered
   • Fiduccia-Mattheyses
     • Similar, but more efficient
   • Boundary Kernighan-Lin
     • Consider only boundary vertices as swap candidates
   • ... and many other variants

18. Kernighan-Lin example
   [Figure: an existing partition with edge cut 4; swapping the indicated pair of vertices yields a better partition with edge cut 3]
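
A small Python sketch of the gain computation behind such swaps, for unit edge weights; gain = D_a + D_b - 2 c_ab is the standard Kernighan-Lin gain formula, and the exhaustive `best_swap` search is for illustration only (practical implementations use bucket data structures):

```python
def swap_gain(adj, part, a, b):
    """Gain of swapping a and b across the cut, for unit edge weights:
    gain = D_a + D_b - 2*c_ab, where D_v = (external degree of v) -
    (internal degree of v) and c_ab = 1 if {a, b} is an edge."""
    def D(v):
        return sum(+1 if part[u] != part[v] else -1 for u in adj[v])
    c_ab = 1 if b in adj[a] else 0
    return D(a) + D(b) - 2 * c_ab

def best_swap(adj, part):
    """One Kernighan-Lin step: the pair with the largest gain (possibly
    negative, which is what lets KL climb out of local minima)."""
    left = [v for v in part if part[v] == 0]
    right = [v for v in part if part[v] == 1]
    return max(((swap_gain(adj, part, a, b), a, b)
                for a in left for b in right), key=lambda t: t[0])

adj = {0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2}}   # a 4-cycle
part = {0: 0, 1: 0, 2: 1, 3: 1}                      # cut = 2 (optimal)
print(best_swap(adj, part))                          # best gain is 0 here
```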

19. Spectral method
   • Based on the observation that a Fiedler vector of a graph contains connectivity information
   • Laplacian of a graph: L
     • L_ii = d_i (degree of vertex i)
     • L_ij = -1 if edge {i, j} exists, otherwise 0
   • The smallest eigenvalue of L is 0, with the all-ones eigenvector
   • All other eigenvalues are positive for a connected graph
   • Fiedler vector
     • Eigenvector corresponding to the second smallest eigenvalue

20. Fiedler vector
   • Consider a partitioning of V into A and B
   • Let y_i = 1 if v_i ∈ A, and y_i = -1 if v_i ∈ B
   • For load balance, Σ_i y_i = 0
   • Also Σ_{{i,j} ∈ E} (y_i - y_j)² = 4 × (number of edges across partitions)
   • Also y^T L y = Σ_i d_i y_i² - 2 Σ_{{i,j} ∈ E} y_i y_j = Σ_{{i,j} ∈ E} (y_i - y_j)²

21. Optimization problem
   • The optimal partition is obtained by solving
     • Minimize y^T L y
     • Constraints: y_i ∈ {-1, 1} and Σ_i y_i = 0
     • This is NP-hard
   • Relaxed problem
     • Minimize y^T L y
     • Constraint: Σ_i y_i = 0
     • Add a constraint on a norm of y, for example ‖y‖_2 = n^(1/2)
   • Note
     • (1, 1, ..., 1)^T is an eigenvector with eigenvalue 0
     • For a connected graph, all other eigenvalues are positive and their eigenvectors are orthogonal to this one, which implies Σ_i y_i = 0
     • The objective function is minimized by a Fiedler vector

22. Spectral algorithm
   • Find a Fiedler vector of the Laplacian of the graph
     • Note that the Fiedler value (the second smallest eigenvalue) yields a lower bound on the communication cost when the load is balanced
   • Bisect the graph using the Fiedler vector
     • Vertices whose components in the Fiedler vector are greater than the median go in one part, the rest in the other
   • Recursively apply this to each partition
   • Note: finding the Fiedler vector of a large graph can be time consuming
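
A compact Python sketch of one spectral bisection using dense linear algebra; this is fine only for small graphs, which is exactly the time-consumption caveat above (large graphs need iterative eigensolvers):

```python
import numpy as np

def spectral_bisect(A):
    """Bisect at the median entry of a Fiedler vector of the Laplacian."""
    L = np.diag(A.sum(axis=1)) - A          # L_ii = d_i, L_ij = -1 on edges
    vals, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    fiedler = vecs[:, 1]                    # second smallest eigenpair
    return fiedler > np.median(fiedler)     # True/False part labels

# A 6-cycle: the spectral cut typically splits it into two paths of three.
A = np.zeros((6, 6))
for i in range(6):
    A[i, (i + 1) % 6] = A[(i + 1) % 6, i] = 1.0
print(spectral_bisect(A))
```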

23. Multilevel methods
   • Idea: it takes time to partition a large graph, so partition a small graph instead!
   • Three phases
     • Graph coarsening
       • Combine vertices to create a smaller graph (for example, by contracting a suitable matching)
       • Apply this recursively until a suitably small graph is obtained
     • Partitioning
       • Use spectral or another partitioning algorithm on the small graph
     • Multilevel refinement
       • Uncoarsen the graph to get a partitioning of the original graph
       • At each level, perform some graph refinement

24-28. Multilevel example (without refinement)
   [Figures: a 16-vertex graph is coarsened in stages by matching and merging vertices, with vertex and edge weights accumulated at each level; the small graph is bisected; the partition is then projected back onto the original graph]
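
A rough Python sketch of one coarsening level by random matching, in the spirit of the example above; the greedy matching order is an arbitrary choice, and the vertex/edge weights that the figures accumulate are omitted here for brevity:

```python
import random

def coarsen_once(adj):
    """Greedily match each unmatched vertex with a random unmatched
    neighbor, then merge each matched pair into one coarse vertex."""
    match = {}
    for u in adj:
        if u in match:
            continue
        candidates = [v for v in adj[u] if v not in match and v != u]
        if candidates:
            v = random.choice(candidates)
            match[u], match[v] = v, u
    coarse_id, nxt = {}, 0
    for u in adj:                        # matched pairs share a coarse id
        if u not in coarse_id:
            coarse_id[u] = nxt
            if u in match:
                coarse_id[match[u]] = nxt
            nxt += 1
    coarse = {i: set() for i in range(nxt)}
    for u in adj:                        # coarse adjacency, self-loops dropped
        for v in adj[u]:
            if coarse_id[u] != coarse_id[v]:
                coarse[coarse_id[u]].add(coarse_id[v])
    return coarse, coarse_id

adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}   # a 4-cycle
print(coarsen_once(adj))                              # two coarse vertices
```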

29. Dynamic partitioning
   • We have an initial partitioning, and then the graph changes
   • Determine a good new partition, fast
   • Also minimize the number of vertices that need to be moved
   • Examples
     • PLUM
     • Jostle
     • Diffusion

30. PLUM
   • Partition based on the initial mesh; only the vertex and edge weights change
   • Map partitions to processors
     • Use more partitions than processors, which ensures finer granularity
     • Compute a similarity matrix based on the data already on each process
       • Measures the savings in data redistribution cost for each (process, partition) pair
     • Choose an assignment of partitions to processors
       • Example: maximum weight matching, duplicating each processor (# of partitions)/P times
       • Alternative: a greedy approximation algorithm that assigns in order of maximum similarity value
   • http://citeseer.nj.nec.com/oliker98plum.html
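
A sketch of the greedy alternative in Python, assuming `similarity[p][k]` holds the similarity value for (processor p, partition k) and each processor should receive the same number of partitions; the data and names are made up for illustration:

```python
def greedy_assign(similarity, parts_per_proc):
    """Assign partitions to processors in decreasing order of similarity,
    capping each processor at parts_per_proc partitions."""
    nproc, nparts = len(similarity), len(similarity[0])
    entries = sorted(((similarity[p][k], p, k)
                      for p in range(nproc) for k in range(nparts)),
                     reverse=True)
    owner, load = {}, [0] * nproc
    for sim, p, k in entries:
        if k not in owner and load[p] < parts_per_proc:
            owner[k] = p          # partition k stays with processor p
            load[p] += 1
    return owner

# 2 processors, 4 partitions (made-up similarity values).
S = [[9, 1, 5, 2],
     [3, 8, 2, 7]]
print(greedy_assign(S, 2))        # -> {0: 0, 1: 1, 3: 1, 2: 0}
```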

31. JOSTLE
   • Uses Hu and Blake's scheme for load balancing
     • Solve L x = b using Conjugate Gradient
       • L = Laplacian of the processor graph, b_i = (weight on process P_i) - (average weight)
     • Move max(x_i - x_j, 0) weight between P_i and P_j
   • Leads to a balanced load
     • Equivalent to P_i sending x_i load to each neighbor, and each neighbor P_j sending x_j back to P_i
     • Net loss in load for P_i = d_i x_i - Σ_{neighbors j} x_j = L_(i) x = b_i, where L_(i) is row i of L and d_i is the degree of i
     • New load for P_i = (weight on P_i) - b_i = average weight
   • Using max(x_i - x_j, 0) minimizes the L2 norm of the load moved
   • Select the vertices to move based on relative gain
   • http://citeseer.nj.nec.com/walshaw97parallel.html
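
A small numeric sketch of Hu and Blake's scheme in Python; the slides solve L x = b with Conjugate Gradient, while this illustration uses least squares as a stand-in (L is singular, but only the differences x_i - x_j matter for the flows):

```python
import numpy as np

def hu_blake_flow(L, weights):
    """Solve L x = b with b_i = weight_i - average; the load to ship from
    P_i to neighbor P_j is then max(x_i - x_j, 0)."""
    b = weights - weights.mean()
    # L has the all-ones nullspace; lstsq picks one consistent solution.
    x, *_ = np.linalg.lstsq(L, b, rcond=None)
    return x

# Path graph of 3 processors with loads 6, 3, 0: each edge ships 3 units.
L = np.array([[1., -1., 0.],
              [-1., 2., -1.],
              [0., -1., 1.]])
x = hu_blake_flow(L, np.array([6., 3., 0.]))
print(max(x[0] - x[1], 0.0), max(x[1] - x[2], 0.0))   # -> 3.0, 3.0
```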

32. Diffusion
   • Involves only communication with neighbors
   • A simple scheme
     • Processor P_i repeatedly sends α w_i weight to each neighbor, where w_i is the weight on P_i
     • w^k = (I - α L) w^(k-1), where w^k is the weight vector at iteration k
   • Simple criteria exist for choosing α to ensure convergence
     • Example: α = 0.5 / (max_i d_i)
   • More sophisticated schemes exist
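
A direct Python transcription of this iteration; the 4-cycle processor graph and the starting load are made-up test data, and α = 0.25 follows the slide's rule 0.5 / (max_i d_i) with all degrees equal to 2:

```python
import numpy as np

def diffuse(L, w, alpha, iters):
    """Iterate w <- (I - alpha * L) w; each step exchanges alpha * w_i
    with every neighbor, converging toward a uniform load."""
    M = np.eye(len(w)) - alpha * L
    for _ in range(iters):
        w = M @ w
    return w

# 4-cycle of processors, all degrees 2, so alpha = 0.5 / 2 = 0.25.
L = np.array([[2., -1., 0., -1.],
              [-1., 2., -1., 0.],
              [0., -1., 2., -1.],
              [-1., 0., -1., 2.]])
print(diffuse(L, np.array([8., 0., 0., 0.]), 0.25, 50))  # -> ~[2, 2, 2, 2]
```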

33. Important points
   • Goals of domain decomposition
     • Balance the load
     • Minimize communication
   • Space filling curves
   • Graph partitioning model
   • Spectral method
     • Relax the NP-hard integer optimization to a real-valued one, then discretize to get an approximate integer solution
   • Multilevel methods
     • Three phases
   • Dynamic partitioning has additional requirements
     • Use the old solution to find the new one fast
     • Minimize the number of vertices moved
