Unstructured grids, commonly used in finite element methods, feature variable-sized elements such as triangles and tetrahedra. This presentation explores the challenges of computing on and partitioning these grids across multiple processors. It discusses the underlying graph structure, attributes attached to nodes and elements, and advances in partitioning techniques. Key topics include load balancing, communication strategies, and existing tools such as METIS for optimizing performance in parallel computing environments.
Application Paradigms: Unstructured Grids
CS433, Spring 2001
Laxmikant Kale
Unstructured Grids
• Typically arise in the finite element method:
  • E.g., space is tiled with variable-size-and-shape triangles
  • In 3D: may be tetrahedra or hexahedra
  • Allows one to adjust the resolution in different regions
• The base data structure is a graph
  • Often represented as a bipartite graph:
  • E.g., triangles (elements) and nodes
Unstructured grid computations
• Typically:
  • Attributes (stresses, strains, pressure, temperature, velocities) are attached to nodes and elements
  • Programs loop over elements and over nodes, separately
• Each time you “visit” an element:
  • Need to access, and possibly modify, all nodes connected to it
• Each time you visit a node:
  • Typically, access and modify only node attributes
  • Rarely: access/modify attributes of elements connected to it
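A minimal C++ sketch of this layout and the two loop patterns (all names and the update formulas are illustrative placeholders, not from any course code): elements store node indices, the element loop gathers from and scatters to connected nodes, and the node loop touches only node attributes.

```cpp
#include <vector>

struct Node    { double x, y; double temperature = 0.0; };  // attributes on nodes
struct Element { int node[3]; double stress = 0.0; };       // and on elements (triangles)

struct Mesh {
    std::vector<Node>    nodes;      // bipartite representation:
    std::vector<Element> elements;   // elements reference nodes by index
};

// Element loop: visiting an element touches all nodes connected to it.
void elementLoop(Mesh& m) {
    for (Element& e : m.elements) {
        double avg = (m.nodes[e.node[0]].temperature +
                      m.nodes[e.node[1]].temperature +
                      m.nodes[e.node[2]].temperature) / 3.0;
        e.stress = avg;                                      // placeholder update
        for (int i = 0; i < 3; ++i)
            m.nodes[e.node[i]].temperature += 0.1 * e.stress;  // scatter to nodes
    }
}

// Node loop: typically touches only the node's own attributes.
void nodeLoop(Mesh& m) {
    for (Node& n : m.nodes) n.temperature *= 0.99;           // placeholder update
}
```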
Unstructured grids: parallelization issues
• Two concerns:
  • The unstructured grid graph must be partitioned across processors (vprocs: virtual processors, in general)
  • Boundary values must be shared
• What to partition and what to duplicate (at the boundaries)?
  • Partition elements (so each element belongs to exactly one vproc)
  • Share nodes at the boundary
  • Each node potentially has several ghost copies
  • Why is this better than partitioning nodes and sharing elements?
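One way this choice shows up in a chunk's data layout (a sketch with assumed names): each vproc owns its elements outright, while boundary nodes are replicated, one copy per chunk that touches them.

```cpp
#include <vector>

// Per-chunk (vproc) view of a partitioned mesh: elements are owned exclusively;
// boundary nodes are duplicated ("ghost" copies) on every chunk that uses them.
struct Chunk {
    std::vector<int>  elements;   // global element ids owned by this chunk
    std::vector<int>  nodes;      // global node ids present here (owned + ghost)
    std::vector<bool> isGhost;    // parallel to `nodes`: true if another chunk owns it
};
```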
Partitioning unstructured grids
• Not as simple as for structured grids
  • “By rows”, “by columns”, “rectangular”, … don’t work
• Geometric?
  • Applicable only if each node has coordinates
  • Even when applicable, may not lead to good performance
• What performance metrics to use?
  • Load balance: the number of elements in each partition
  • Communication:
    • Number of shared nodes (total)
    • Maximum number of shared nodes for any one partition
    • Maximum number of “neighbor partitions” for any partition
      • Why? Per-message cost
  • Geometric methods: difficult to optimize both
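These metrics are straightforward to compute once a partition is given. A sketch (assuming, for each node, we know the set of partitions touching it, and each partition's element count):

```cpp
#include <algorithm>
#include <set>
#include <vector>

struct Metrics { int maxElems, totalShared, maxSharedPerPart, maxNeighbors; };

Metrics evaluate(const std::vector<std::set<int>>& partsOfNode,
                 const std::vector<int>& elemsPerPart) {
    int nparts = (int)elemsPerPart.size();
    std::vector<int> shared(nparts, 0);          // shared-node count per partition
    std::vector<std::set<int>> nbrs(nparts);     // neighbor partitions per partition
    int totalShared = 0;
    for (const auto& ps : partsOfNode) {
        if (ps.size() < 2) continue;             // interior node: not shared
        ++totalShared;
        for (int p : ps) {
            ++shared[p];
            for (int q : ps) if (q != p) nbrs[p].insert(q);
        }
    }
    Metrics m{0, totalShared, 0, 0};
    m.maxElems = *std::max_element(elemsPerPart.begin(), elemsPerPart.end());
    for (int p = 0; p < nparts; ++p) {
        m.maxSharedPerPart = std::max(m.maxSharedPerPart, shared[p]);
        m.maxNeighbors     = std::max(m.maxNeighbors, (int)nbrs[p].size());
    }
    return m;
}
```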
MP issues:
• Charm++ help:
  • Today (Wed, 2/21), 2:00 pm to 5:30 pm
  • 2504, 2506, 2508 DCL (Parallel Programming Laboratory)
• My office hours for this week:
  • Thursday, 10:00 am to 12:00 noon
Grid partitioning
• When communication costs are relatively low:
  • Either because the data set is large or the computation per element is large
• Geometric methods can be used:
  • Orthogonal Recursive Bisection (ORB)
  • Basic idea: recursively divide sets into two
  • Keep shapes squarish as long as possible
• For each set:
  • Find the bounding box (Xmax, Xmin, Ymax, Ymin, …)
  • Find the longer dimension (X or Y or …)
  • Find a cut along the longer dimension that divides the set equally
    • Doesn’t have to be at the midpoint of the box
  • Partition the elements into the two sets based on the cut
  • Repeat for each set
• Variation: non-power-of-two processors (divide in proportion to the number of processors assigned to each side)
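A 2-D sketch of ORB (names are illustrative; a real code would cut on element centroids and carry weights, and the power-of-two assumption noted in the comments is the simple case): find the bounding box, pick the longer axis, split at the median, recurse on both halves.

```cpp
#include <algorithm>
#include <vector>

struct Pt { double x, y; };

// Orthogonal Recursive Bisection over element centroids (2-D sketch).
// Assigns partition ids firstPart .. firstPart+nparts-1; nparts is a power
// of two here (the non-power-of-two variation splits in proportion instead).
void orb(const std::vector<Pt>& pts, std::vector<int>& part,
         std::vector<int> idx, int firstPart, int nparts) {
    if (nparts == 1) { for (int i : idx) part[i] = firstPart; return; }

    // Bounding box of this set.
    double xmin = 1e300, xmax = -1e300, ymin = 1e300, ymax = -1e300;
    for (int i : idx) {
        xmin = std::min(xmin, pts[i].x); xmax = std::max(xmax, pts[i].x);
        ymin = std::min(ymin, pts[i].y); ymax = std::max(ymax, pts[i].y);
    }
    bool cutX = (xmax - xmin) >= (ymax - ymin);  // cut along the longer dimension

    // Median split: equal counts on each side, not necessarily the midpoint.
    auto mid = idx.begin() + idx.size() / 2;
    std::nth_element(idx.begin(), mid, idx.end(), [&](int a, int b) {
        return cutX ? pts[a].x < pts[b].x : pts[a].y < pts[b].y;
    });

    std::vector<int> lo(idx.begin(), mid), hi(mid, idx.end());
    orb(pts, part, lo, firstPart,              nparts / 2);
    orb(pts, part, hi, firstPart + nparts / 2, nparts / 2);
}
```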
Grid partitioning: quad/oct trees
• Another geometric technique:
  • At each step, divide the set into 2^D subsets, where D is the number of physical dimensions
  • In 2-D: 4 quadrants
  • The dividing lines go through the geometric midpoint of the box
  • The bounding box is NOT recalculated each time in the recursion
• Comparison with ORB
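A sketch of one quadtree step in 2-D: unlike ORB, the cut goes through the fixed geometric midpoint of the current box rather than through a data median, so the four quadrants may hold unequal numbers of points.

```cpp
#include <vector>

struct Pt { double x, y; };

// One quadtree subdivision step: bucket points into the 4 (= 2^D, D = 2)
// quadrants around the midpoint of the current box; the box is NOT refit
// to the data, the recursion just halves it.
void quadSplit(const std::vector<Pt>& pts, double xmid, double ymid,
               std::vector<Pt> quadrant[4]) {
    for (const Pt& p : pts) {
        int q = (p.x >= xmid ? 1 : 0) | (p.y >= ymid ? 2 : 0);
        quadrant[q].push_back(p);
    }
}
```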
Grid partitioning: graph partitioners
• CHACO and METIS are well-known programs
  • Optimize both load imbalance and communication overhead
  • But often ignore per-message cost, or the maximum-per-partition costs
• Earlier algorithm: Kernighan-Lin (KL)
• METIS first coarsens the graph, applies KL to the coarse graph, and then refines the partition as it uncoarsens
  • This is done not just once, but as a k-level coarsening-and-refinement scheme
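For reference, a call into METIS looks roughly like this. The sketch uses the METIS 5 API (earlier versions differ) and a toy 4-vertex path graph in CSR form; in a mesh code the graph would be the element-connectivity graph.

```cpp
#include <metis.h>
#include <cstdio>

int main() {
    // Toy graph in CSR form: a path 0-1-2-3.
    idx_t nvtxs = 4, ncon = 1, nparts = 2, objval;
    idx_t xadj[]   = {0, 1, 3, 5, 6};
    idx_t adjncy[] = {1, 0, 2, 1, 3, 2};
    idx_t part[4];

    // Multilevel k-way partitioning: coarsen, partition, refine on uncoarsening.
    METIS_PartGraphKway(&nvtxs, &ncon, xadj, adjncy,
                        NULL, NULL, NULL,          // unit vertex/edge weights
                        &nparts, NULL, NULL, NULL, // default targets and options
                        &objval, part);

    for (int i = 0; i < 4; ++i)
        printf("vertex %d -> part %d\n", i, (int)part[i]);
    return 0;
}
```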
Crack Propagation
• Explicit FEM code
• Zero-volume cohesive elements inserted near the crack
• As the crack propagates, more cohesive elements are added near the crack, which leads to severe load imbalance
• Framework handles:
  • Partitioning elements into chunks
  • Communication between chunks
  • Load balancing

Figure: decomposition into 16 chunks (left) and 128 chunks, 8 for each PE (right); the middle area contains cohesive elements. Both decompositions obtained using METIS. Pictures: S. Breitenfeld and P. Geubelle.
Unstructured grid: managing communication
• Suppose triangles A, B, and C are on different processors
• Node 1 is shared between all 3 processors
  • Must have a copy on all 3 processors
• When values need to be added up:
  • Option 1 (star): let A (say) be the “owner” of node 1
    • B and C send their copies of “1” to A
    • A combines them (usually, just adding them up)
    • A sends the updated value back to B and C
  • Option 2 (symmetric): each sends its copy of node 1 to both of the others
• Which one is better?

Figure: triangles A, B, and C meeting at shared node 1.
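A sketch of the star scheme for one shared node, with plain function calls standing in for the actual messages: the owner collects the partial values, adds them up, and the total is what it would send back.

```cpp
#include <vector>

// Star scheme for one shared node: chunk A owns it; B and C hold ghost copies.
// In a real code the gather and the broadcast back are messages.
double combineAtOwner(double ownerPartial, const std::vector<double>& ghostPartials) {
    double total = ownerPartial;
    for (double v : ghostPartials) total += v;  // owner adds up all copies
    return total;                               // result is sent back to B and C
}
```

On the tradeoff: with k sharers, the star scheme uses 2(k-1) messages but two communication phases, while the symmetric scheme uses k(k-1) messages in a single phase, trading message count for latency.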
Unstructured grid: managing communication
• In either scheme:
  • Each vproc maintains a list of neighboring vprocs
  • For each neighbor, it maintains a list of shared nodes
    • Each node has a local index (“my 5th node”)
  • The same list works in both directions:
    • Send
    • Receive
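A sketch of such a per-neighbor list (names assumed): because both chunks store the shared nodes in an agreed order, the same index list drives both packing values for a send and unpacking a received message.

```cpp
#include <cstddef>
#include <vector>

// Communication list one vproc keeps per neighboring vproc.
struct NbrList {
    int nbr;                     // neighbor vproc id
    std::vector<int> localIdx;   // local indices of the shared nodes, stored
};                               // in an order both sides agree on

// Pack node values for a send using the list...
std::vector<double> pack(const std::vector<double>& nodeVal, const NbrList& l) {
    std::vector<double> msg;
    for (int i : l.localIdx) msg.push_back(nodeVal[i]);
    return msg;
}

// ...and use the SAME list to unpack (here: accumulate) a received message.
void unpack(std::vector<double>& nodeVal, const NbrList& l,
            const std::vector<double>& msg) {
    for (std::size_t k = 0; k < l.localIdx.size(); ++k)
        nodeVal[l.localIdx[k]] += msg[k];
}
```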
Adaptive variations: structured grids
• Suppose you need a different level of refinement at different places in the grid:
  • Adaptive Mesh Refinement (AMR)
  • Quad- and oct-trees can be used
• Neighboring regions may have resolutions that differ by one level
  • Requiring (possibly complex) interpolation algorithms
• The fact that you have to do the refinement in the middle of a parallel computation makes a difference
  • Again and again, but often not every step
  • Adjust your communication list
• Alternatively, put a layer of software in the middle to do the interpolations
  • So each square chunk thinks it has exactly one neighbor on each side
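In the simplest case the interpolation at a one-level resolution jump is just an average, as in this sketch of a "hanging" node value (assuming linear interpolation along the coarse edge):

```cpp
// At a one-level resolution jump, the fine side has a hanging node halfway
// along a coarse edge; with linear interpolation its value is the average
// of the two coarse endpoint values.
double hangingNodeValue(double coarseA, double coarseB) {
    return 0.5 * (coarseA + coarseB);
}
```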
Adaptive variations: unstructured grids
• The mesh may be refined in places, dynamically:
  • This is much harder to do (even sequentially) than for structured grids
  • Think about triangles:
    • Quality restriction: avoid long, skinny triangles
• From the parallel computing point of view:
  • Need to change the list of shared nodes
  • Load balance may shift
• Load balancing:
  • Abandon the partitioning and repartition from scratch, or
  • Incrementally adjust (typically with virtualization)