Unstructured grids, commonly used in finite element methods, feature variable-sized elements such as triangles and tetrahedra. This presentation explores the challenges of computing on and partitioning these grids across multiple processors. It discusses the underlying graph structure, attributes attached to nodes and elements, and advances in partitioning techniques. Key topics include load balancing, communication strategies, and existing tools such as METIS for optimizing performance in parallel computing environments.
Application Paradigms: Unstructured Grids
CS433, Spring 2001
Laxmikant Kale
Unstructured Grids
• Typically arise in the finite element method:
  • E.g., space is tiled with variable-size-and-shape triangles
  • In 3D: may be tetrahedra or hexahedra
  • Allows one to adjust the resolution in different regions
• The base data structure is a graph
  • Often represented as a bipartite graph:
  • E.g., triangles (elements) and nodes
Unstructured grid computations
• Typically:
  • Attributes (stresses, strains, pressure, temperature, velocities) are attached to nodes and elements
  • Programs loop over elements and over nodes, separately
• Each time you “visit” an element:
  • Need to access, and possibly modify, all nodes connected to it
• Each time you visit a node:
  • Typically, access and modify only node attributes
  • Rarely: access/modify attributes of elements connected to it
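A minimal C++ sketch of this layout and the two loop patterns (all names and the update formulas are illustrative placeholders, not from any course code): elements store node indices, the element loop gathers from and scatters to connected nodes, and the node loop touches only node attributes.

```cpp
#include <vector>

struct Node    { double x, y; double temperature = 0.0; };  // attributes on nodes
struct Element { int node[3]; double stress = 0.0; };       // and on elements (triangles)

struct Mesh {
    std::vector<Node>    nodes;      // bipartite representation:
    std::vector<Element> elements;   // elements reference nodes by index
};

// Element loop: visiting an element touches all nodes connected to it.
void elementLoop(Mesh& m) {
    for (Element& e : m.elements) {
        double avg = (m.nodes[e.node[0]].temperature +
                      m.nodes[e.node[1]].temperature +
                      m.nodes[e.node[2]].temperature) / 3.0;
        e.stress = avg;                                      // placeholder update
        for (int i = 0; i < 3; ++i)
            m.nodes[e.node[i]].temperature += 0.1 * e.stress;  // scatter to nodes
    }
}

// Node loop: typically touches only the node's own attributes.
void nodeLoop(Mesh& m) {
    for (Node& n : m.nodes) n.temperature *= 0.99;           // placeholder update
}
```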
Unstructured grids: parallelization issues
• Two concerns:
  • The unstructured grid graph must be partitioned across processors (vprocs: virtual processors, in general)
  • Boundary values must be shared
• What to partition and what to duplicate (at the boundaries)?
  • Partition elements (so each element belongs to exactly one vproc)
  • Share nodes at the boundary
  • Each node potentially has several ghost copies
  • Why is this better than partitioning nodes and sharing elements?
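One way this choice shows up in a chunk's data layout (a sketch with assumed names): each vproc owns its elements outright, while boundary nodes are replicated, one copy per chunk that touches them.

```cpp
#include <vector>

// Per-chunk (vproc) view of a partitioned mesh: elements are owned exclusively;
// boundary nodes are duplicated ("ghost" copies) on every chunk that uses them.
struct Chunk {
    std::vector<int>  elements;   // global element ids owned by this chunk
    std::vector<int>  nodes;      // global node ids present here (owned + ghost)
    std::vector<bool> isGhost;    // parallel to `nodes`: true if another chunk owns it
};
```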
Partitioning unstructured grids
• Not as simple as for structured grids
  • “By rows”, “by columns”, “rectangular”, … don’t work
• Geometric?
  • Applicable only if each node has coordinates
  • Even when applicable, may not lead to good performance
• What performance metrics to use?
  • Load balance: the number of elements in each partition
  • Communication:
    • Number of shared nodes (total)
    • Maximum number of shared nodes for any one partition
    • Maximum number of “neighbor partitions” for any partition
      • Why? Per-message cost
  • Geometric methods: difficult to optimize both
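These metrics are straightforward to compute once a partition is given. A sketch (assuming, for each node, we know the set of partitions touching it, and each partition's element count):

```cpp
#include <algorithm>
#include <set>
#include <vector>

struct Metrics { int maxElems, totalShared, maxSharedPerPart, maxNeighbors; };

Metrics evaluate(const std::vector<std::set<int>>& partsOfNode,
                 const std::vector<int>& elemsPerPart) {
    int nparts = (int)elemsPerPart.size();
    std::vector<int> shared(nparts, 0);          // shared-node count per partition
    std::vector<std::set<int>> nbrs(nparts);     // neighbor partitions per partition
    int totalShared = 0;
    for (const auto& ps : partsOfNode) {
        if (ps.size() < 2) continue;             // interior node: not shared
        ++totalShared;
        for (int p : ps) {
            ++shared[p];
            for (int q : ps) if (q != p) nbrs[p].insert(q);
        }
    }
    Metrics m{0, totalShared, 0, 0};
    m.maxElems = *std::max_element(elemsPerPart.begin(), elemsPerPart.end());
    for (int p = 0; p < nparts; ++p) {
        m.maxSharedPerPart = std::max(m.maxSharedPerPart, shared[p]);
        m.maxNeighbors     = std::max(m.maxNeighbors, (int)nbrs[p].size());
    }
    return m;
}
```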
MP issues:
• Charm++ help:
  • Today (Wed, 2/21), 2:00 pm to 5:30 pm
  • 2504, 2506, 2508 DCL (Parallel Programming Laboratory)
• My office hours for this week:
  • Thursday, 10:00 am to 12:00 noon
Grid partitioning
• When communication costs are relatively low:
  • Either because the data set is large or the computation per element is large
• Geometric methods can be used:
  • Orthogonal Recursive Bisection (ORB)
  • Basic idea: recursively divide sets into two
  • Keep shapes squarish as long as possible
• For each set:
  • Find the bounding box (Xmax, Xmin, Ymax, Ymin, …)
  • Find the longer dimension (X or Y or …)
  • Find a cut along the longer dimension that divides the set equally
    • Doesn’t have to be at the midpoint of the box
  • Partition the elements into the two sets based on the cut
  • Repeat for each set
• Variation: non-power-of-two processors (divide in proportion to the number of processors assigned to each side)
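A 2-D sketch of ORB (names are illustrative; a real code would cut on element centroids and carry weights, and the power-of-two assumption noted in the comments is the simple case): find the bounding box, pick the longer axis, split at the median, recurse on both halves.

```cpp
#include <algorithm>
#include <vector>

struct Pt { double x, y; };

// Orthogonal Recursive Bisection over element centroids (2-D sketch).
// Assigns partition ids firstPart .. firstPart+nparts-1; nparts is a power
// of two here (the non-power-of-two variation splits in proportion instead).
void orb(const std::vector<Pt>& pts, std::vector<int>& part,
         std::vector<int> idx, int firstPart, int nparts) {
    if (nparts == 1) { for (int i : idx) part[i] = firstPart; return; }

    // Bounding box of this set.
    double xmin = 1e300, xmax = -1e300, ymin = 1e300, ymax = -1e300;
    for (int i : idx) {
        xmin = std::min(xmin, pts[i].x); xmax = std::max(xmax, pts[i].x);
        ymin = std::min(ymin, pts[i].y); ymax = std::max(ymax, pts[i].y);
    }
    bool cutX = (xmax - xmin) >= (ymax - ymin);  // cut along the longer dimension

    // Median split: equal counts on each side, not necessarily the midpoint.
    auto mid = idx.begin() + idx.size() / 2;
    std::nth_element(idx.begin(), mid, idx.end(), [&](int a, int b) {
        return cutX ? pts[a].x < pts[b].x : pts[a].y < pts[b].y;
    });

    std::vector<int> lo(idx.begin(), mid), hi(mid, idx.end());
    orb(pts, part, lo, firstPart,              nparts / 2);
    orb(pts, part, hi, firstPart + nparts / 2, nparts / 2);
}
```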
Grid partitioning: quad/oct trees
• Another geometric technique:
  • At each step, divide the set into 2^D subsets, where D is the number of physical dimensions
  • In 2-D: 4 quadrants
  • The dividing lines go through the geometric midpoint of the box
  • The bounding box is NOT recalculated each time in the recursion
• Comparison with ORB
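A sketch of one quadtree step in 2-D: unlike ORB, the cut goes through the fixed geometric midpoint of the current box rather than through a data median, so the four quadrants may hold unequal numbers of points.

```cpp
#include <vector>

struct Pt { double x, y; };

// One quadtree subdivision step: bucket points into the 4 (= 2^D, D = 2)
// quadrants around the midpoint of the current box; the box is NOT refit
// to the data, the recursion just halves it.
void quadSplit(const std::vector<Pt>& pts, double xmid, double ymid,
               std::vector<Pt> quadrant[4]) {
    for (const Pt& p : pts) {
        int q = (p.x >= xmid ? 1 : 0) | (p.y >= ymid ? 2 : 0);
        quadrant[q].push_back(p);
    }
}
```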
Grid partitioning: graph partitioners
• CHACO and METIS are well-known programs
  • Optimize both load imbalance and communication overhead
  • But often ignore per-message cost, or the maximum-per-partition costs
• Earlier algorithm: Kernighan-Lin (KL)
• METIS first coarsens the graph, applies KL to the coarse graph, and then refines the partition as it uncoarsens
  • This is done not just once, but as a k-level coarsening-and-refinement scheme
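For reference, a call into METIS looks roughly like this. The sketch uses the METIS 5 API (earlier versions differ) and a toy 4-vertex path graph in CSR form; in a mesh code the graph would be the element-connectivity graph.

```cpp
#include <metis.h>
#include <cstdio>

int main() {
    // Toy graph in CSR form: a path 0-1-2-3.
    idx_t nvtxs = 4, ncon = 1, nparts = 2, objval;
    idx_t xadj[]   = {0, 1, 3, 5, 6};
    idx_t adjncy[] = {1, 0, 2, 1, 3, 2};
    idx_t part[4];

    // Multilevel k-way partitioning: coarsen, partition, refine on uncoarsening.
    METIS_PartGraphKway(&nvtxs, &ncon, xadj, adjncy,
                        NULL, NULL, NULL,          // unit vertex/edge weights
                        &nparts, NULL, NULL, NULL, // default targets and options
                        &objval, part);

    for (int i = 0; i < 4; ++i)
        printf("vertex %d -> part %d\n", i, (int)part[i]);
    return 0;
}
```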
Crack Propagation
• Explicit FEM code
• Zero-volume cohesive elements inserted near the crack
• As the crack propagates, more cohesive elements are added near the crack, which leads to severe load imbalance
• Framework handles:
  • Partitioning elements into chunks
  • Communication between chunks
  • Load balancing

Figure: decomposition into 16 chunks (left) and 128 chunks, 8 for each PE (right); the middle area contains cohesive elements. Both decompositions obtained using METIS. Pictures: S. Breitenfeld and P. Geubelle.
Unstructured grid: managing communication
• Suppose triangles A, B, and C are on different processors
• Node 1 is shared between all 3 processors
  • Must have a copy on all 3 processors
• When values need to be added up:
  • Option 1 (star): let A (say) be the “owner” of node 1
    • B and C send their copies of “1” to A
    • A combines them (usually, just adding them up)
    • A sends the updated value back to B and C
  • Option 2 (symmetric): each sends its copy of node 1 to both of the others
• Which one is better?

Figure: triangles A, B, and C meeting at shared node 1.
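A sketch of the star scheme for one shared node, with plain function calls standing in for the actual messages: the owner collects the partial values, adds them up, and the total is what it would send back.

```cpp
#include <vector>

// Star scheme for one shared node: chunk A owns it; B and C hold ghost copies.
// In a real code the gather and the broadcast back are messages.
double combineAtOwner(double ownerPartial, const std::vector<double>& ghostPartials) {
    double total = ownerPartial;
    for (double v : ghostPartials) total += v;  // owner adds up all copies
    return total;                               // result is sent back to B and C
}
```

On the tradeoff: with k sharers, the star scheme uses 2(k-1) messages but two communication phases, while the symmetric scheme uses k(k-1) messages in a single phase, trading message count for latency.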
Unstructured grid: managing communication
• In either scheme:
  • Each vproc maintains a list of neighboring vprocs
  • For each neighbor, it maintains a list of shared nodes
    • Each node has a local index (“my 5th node”)
  • The same list works in both directions:
    • Send
    • Receive
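A sketch of such a per-neighbor list (names assumed): because both chunks store the shared nodes in an agreed order, the same index list drives both packing values for a send and unpacking a received message.

```cpp
#include <cstddef>
#include <vector>

// Communication list one vproc keeps per neighboring vproc.
struct NbrList {
    int nbr;                     // neighbor vproc id
    std::vector<int> localIdx;   // local indices of the shared nodes, stored
};                               // in an order both sides agree on

// Pack node values for a send using the list...
std::vector<double> pack(const std::vector<double>& nodeVal, const NbrList& l) {
    std::vector<double> msg;
    for (int i : l.localIdx) msg.push_back(nodeVal[i]);
    return msg;
}

// ...and use the SAME list to unpack (here: accumulate) a received message.
void unpack(std::vector<double>& nodeVal, const NbrList& l,
            const std::vector<double>& msg) {
    for (std::size_t k = 0; k < l.localIdx.size(); ++k)
        nodeVal[l.localIdx[k]] += msg[k];
}
```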
Adaptive variations: structured grids
• Suppose you need a different level of refinement at different places in the grid:
  • Adaptive Mesh Refinement (AMR)
  • Quad- and oct-trees can be used
• Neighboring regions may have resolutions that differ by one level
  • Requiring (possibly complex) interpolation algorithms
• The fact that you have to do the refinement in the middle of a parallel computation makes a difference
  • Again and again, but often not every step
  • Adjust your communication list
• Alternatively, put a layer of software in the middle to do the interpolations
  • So each square chunk thinks it has exactly one neighbor on each side
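In the simplest case the interpolation at a one-level resolution jump is just an average, as in this sketch of a "hanging" node value (assuming linear interpolation along the coarse edge):

```cpp
// At a one-level resolution jump, the fine side has a hanging node halfway
// along a coarse edge; with linear interpolation its value is the average
// of the two coarse endpoint values.
double hangingNodeValue(double coarseA, double coarseB) {
    return 0.5 * (coarseA + coarseB);
}
```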
Adaptive variations: unstructured grids
• The mesh may be refined in places, dynamically:
  • This is much harder to do (even sequentially) than for structured grids
  • Think about triangles:
    • Quality restriction: avoid long, skinny triangles
• From the parallel computing point of view:
  • Need to change the list of shared nodes
  • Load balance may shift
• Load balancing:
  • Abandon the partitioning and repartition from scratch, or
  • Incrementally adjust (typically with virtualization)