Stencil Pattern
Presentation Transcript

  1. Stencil Pattern 6b.1 ITCS 4/5145 Parallel computing, UNC-Charlotte, B. Wilkinson Oct 24, 2013 slides6b.ppt

  2. Stencil Pattern
     • A stencil describes a 2- or 3-dimensional layout of processes, with each process able to communicate only with its neighbors.
     • Appears in simulating many real-life situations.
     • Examples: solving partial differential equations using discretized methods, which may be for:
       • Modeling engineering structures
       • Weather forecasting (see intro to course, slides1a-1)
       • Particle dynamics simulations
       • Modeling chemical and biological structures

  3. Stencil pattern
     On each iteration, each node communicates with its neighbors to get stored computed values.
     [Figure: compute nodes joined by two-way connections, with a source/sink.]

  4. (Iterative synchronous) stencil pattern
     Often globally synchronous and iterative:
     1. Processes compute and communicate only with their neighbors, exchanging results.
     2. Check the termination condition.
     3. Repeat from step 1 if the condition is not met; otherwise stop.

  5. Seeds Stencil Pattern
     The (iterative synchronous) stencil pattern is implemented in Seeds, where it is simply called the stencil pattern. Seeds module interface:

         public abstract class Stencil extends BasicLayerInterface {
             public abstract boolean OneIterationCompute(StencilData data);
             public abstract StencilData DiffuseData(int segment);
             public abstract void GatherData(int segment, StencilData dat);
             public abstract int getCellCount();
         }

     For more details see “Seeds Framework Stencil Template Tutorial”, Jeremy Villalobos, http://coit-grid01.uncc.edu/seeds/tutorials.php

  6. Application example of stencil pattern: Solving Laplace’s Equation
     Solve for f over the two-dimensional x-y space. For a computer solution, finite difference methods are appropriate: the two-dimensional solution space is “discretized” into a large number of solution points.

  7. Finite Difference Method
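
As a worked form of the discretization described above, Laplace's equation and its central-difference approximation can be written as follows. The symbol Δ (the spacing between neighboring solution points in x and y) is a name assumed here, and this is the standard textbook derivation rather than a copy of the slide.

```latex
\begin{align}
\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} &= 0 \\
\frac{\partial^2 f}{\partial x^2} &\approx \frac{1}{\Delta^2}\bigl[f(x+\Delta,y) - 2f(x,y) + f(x-\Delta,y)\bigr] \\
\frac{\partial^2 f}{\partial y^2} &\approx \frac{1}{\Delta^2}\bigl[f(x,y+\Delta) - 2f(x,y) + f(x,y-\Delta)\bigr] \\
f(x,y) &= \frac{1}{4}\bigl[f(x-\Delta,y) + f(x,y-\Delta) + f(x+\Delta,y) + f(x,y+\Delta)\bigr]
\end{align}
```

The last line comes from substituting the two approximations into Laplace's equation and solving for f(x,y): each point is the average of its four neighbors.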

  8. Question: Do you recognize this?

  9. Heat Distribution Problem (Steady State Heat Equation)
     Finding the static distribution of heat in a space, here a 2-dimensional space, although it could be 3-dimensional. An area has known temperatures along each of its borders (boundary conditions). Find the temperature distribution within. Each point is taken to be the average of its four neighboring points.

  10. Natural ordering
      For convenience, the edges are also represented by points. These points have fixed values and are used in computing the internal values.

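  11. Relationship with a General System of Linear Equations
      Using natural ordering, the ith point is computed from the ith equation, which is a linear equation with five unknowns (fewer for points next to the boundary, whose boundary neighbors have fixed values). Collecting the equations for all points gives a general system of linear equations in the unknown point values.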
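
The relationship can be sketched in symbols as follows. The names here are assumptions: x_i is the value at the ith point in natural (row-by-row) ordering, n is the number of points per row, and a_{i,j} are the coefficients of the general linear system.

```latex
\begin{align}
x_i &= \frac{x_{i-n} + x_{i-1} + x_{i+1} + x_{i+n}}{4} \\
x_{i-n} + x_{i-1} - 4x_i + x_{i+1} + x_{i+n} &= 0 \\
a_{i,i-n}\,x_{i-n} + a_{i,i-1}\,x_{i-1} + a_{i,i}\,x_i + a_{i,i+1}\,x_{i+1} + a_{i,i+n}\,x_{i+n} &= 0
\end{align}
```

Here a_{i,i} = -4 and a_{i,i-n} = a_{i,i-1} = a_{i,i+1} = a_{i,i+n} = 1. For points next to the boundary, the fixed boundary values move to the right-hand side as constants.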

  12. Question: Will a Jacobi iterative method converge?

  13. Sequential Code
      Using a fixed number of iterations:

          for (iteration = 0; iteration < limit; iteration++) {
              for (i = 1; i < n; i++)
                  for (j = 1; j < n; j++)
                      g[i][j] = 0.25 * (h[i-1][j] + h[i+1][j] + h[i][j-1] + h[i][j+1]);
              for (i = 1; i < n; i++)          /* update points */
                  for (j = 1; j < n; j++)
                      h[i][j] = g[i][j];
          }

      using the original numbering system (n x n array). Note that the loops read indices 0 through n, so rows and columns 0 and n must exist in h and hold the fixed boundary values. Earlier we saw this can be improved by using a single 3-dimensional array.
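
Slide 4 describes checking a termination condition rather than running a fixed number of iterations. A minimal C sketch of that variant is given below. The function name `jacobi_solve`, the row-major storage, the (n+1) x (n+1) grid size, and the "largest change below `tol`" stopping test are all assumptions of this sketch, not part of the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: Jacobi sweeps on an (n+1) x (n+1) grid stored
 * row-major in h, with g as scratch space. Rows and columns 0 and n hold
 * fixed boundary values. Stops when the largest change in one sweep is
 * below tol, or after limit sweeps. Returns the number of sweeps done. */
int jacobi_solve(double *h, double *g, int n, double tol, int limit)
{
    assert(n >= 2 && tol > 0.0 && limit >= 0);
    const size_t w = (size_t)n + 1;              /* row width */
    int iteration;
    for (iteration = 0; iteration < limit; iteration++) {
        double max_change = 0.0;
        for (int i = 1; i < n; i++)
            for (int j = 1; j < n; j++) {
                size_t k = (size_t)i * w + (size_t)j;
                g[k] = 0.25 * (h[k - w] + h[k + w] + h[k - 1] + h[k + 1]);
                double change = g[k] - h[k];
                if (change < 0.0) change = -change;
                if (change > max_change) max_change = change;
            }
        for (int i = 1; i < n; i++)              /* update points */
            for (int j = 1; j < n; j++) {
                size_t k = (size_t)i * w + (size_t)j;
                h[k] = g[k];
            }
        if (max_change < tol) {                  /* termination condition */
            iteration++;                         /* count this sweep too */
            break;
        }
    }
    return iteration;
}
```

In a parallel version the per-process maximum change would have to be combined across all processes before the test, which is one source of the global synchronization discussed later in the slides.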

  14. Partitioning
      Normally more than one point is allocated to each processor, because there are many more points than processors. Points could be partitioned into square blocks (communication on 4 edges) or strips (communication on 2 edges). In general, strip partitioning is best when the communication startup time is large, and block partitioning is best when the startup time is small; see the textbook for the analysis.
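
The trade-off can be illustrated with a simple cost model. Everything in this sketch is an assumption made for illustration: a message of m values is taken to cost `t_startup + m * t_data`, each shared edge is taken to need one send and one receive per iteration, the grid is n x n points, and the block case uses a q x q arrangement of processes. The function names are hypothetical.

```c
#include <assert.h>

/* Square blocks: 4 edges of n/q points each, one send and one receive
 * per edge, so 8 messages per iteration. */
double comm_time_block(int n, int q, double t_startup, double t_data)
{
    assert(n > 0 && q > 0);
    return 8.0 * (t_startup + ((double)n / q) * t_data);
}

/* Strips: 2 edges of n points each, one send and one receive per edge,
 * so 4 messages per iteration. */
double comm_time_strip(int n, double t_startup, double t_data)
{
    assert(n > 0);
    return 4.0 * (t_startup + (double)n * t_data);
}
```

Under this model, strips send fewer but longer messages, so they win once the startup cost dominates the per-value cost. For example, with n = 1024 and q = 8, blocks are cheaper at `t_startup = 10000`, `t_data = 50`, while strips are cheaper at `t_startup = 100000`, `t_data = 50`.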

  15. Ghost Points
      An additional row of points at each edge that holds the values from the adjacent edge of the neighboring partition. Each array of points is enlarged to accommodate the ghost rows.
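
A minimal sketch of a ghost-row exchange between two vertically adjacent strips is shown below. The layout is an assumption of this sketch: each strip is stored row-major with `rows + 2` rows, where row 0 and row `rows + 1` are ghost rows. The copies stand in for what would be a send/receive pair in a message-passing program, and `exchange_ghost_rows` is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/* Each strip holds (rows + 2) rows of cols values: row 0 and row rows+1
 * are ghost rows, rows 1..rows are the points the strip owns. */
void exchange_ghost_rows(double *upper, double *lower, int rows, int cols)
{
    assert(rows >= 1 && cols >= 1);
    size_t bytes = (size_t)cols * sizeof(double);
    /* upper's last owned row -> lower's top ghost row */
    memcpy(lower, upper + (size_t)rows * (size_t)cols, bytes);
    /* lower's first owned row -> upper's bottom ghost row */
    memcpy(upper + (size_t)(rows + 1) * (size_t)cols, lower + (size_t)cols, bytes);
}
```

After the exchange, each strip can update all of its owned points using only its own array, because the neighbor values it needs are already in its ghost rows.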

  16. Application Example
      A room has four walls and a fireplace. The temperature of the walls is 20°C, and the temperature of the fireplace is 100°C. Write a parallel program using Jacobi iteration to compute the temperature inside the room and plot (preferably in color) temperature contours at 10°C intervals.

  17. Sample student output

  18. Another class of synchronous problems suitable for the stencil pattern: Cellular Automata
      The problem space is divided into cells. Each cell can be in one of a finite number of states. Cells are affected by their neighbors according to certain rules, and all cells are affected simultaneously in a “generation.” The rules are re-applied in subsequent generations so that cells evolve, or change state, from generation to generation. The most famous cellular automaton is the “Game of Life,” devised by John Horton Conway, a Cambridge mathematician.

  19. The Game of Life
      Board game played on a theoretically infinite two-dimensional array of cells. Each cell can hold one “organism” and has eight neighboring cells, including those diagonally adjacent. Initially, some cells are occupied. The following rules apply:
      1. Every organism with two or three neighboring organisms survives for the next generation.
      2. Every organism with four or more neighbors dies from overpopulation.
      3. Every organism with one neighbor or none dies from isolation.
      4. Each empty cell adjacent to exactly three occupied neighbors will give birth to an organism.
      These rules were derived by Conway “after a long period of experimentation.”
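
The four rules can be sketched as a single-generation update in C. The finite board with empty cells beyond its edges is an assumption of this sketch (the game itself is defined on an infinite board), and `life_step` is a hypothetical name. Two arrays are used so that all cells are updated simultaneously from the previous generation.

```c
#include <assert.h>

/* One generation on a finite rows x cols board stored row-major
 * (1 = organism, 0 = empty). Cells outside the board count as empty. */
void life_step(const unsigned char *cur, unsigned char *next, int rows, int cols)
{
    assert(rows > 0 && cols > 0 && cur != next);
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++) {
            int neighbors = 0;
            for (int dr = -1; dr <= 1; dr++)
                for (int dc = -1; dc <= 1; dc++) {
                    int rr = r + dr, cc = c + dc;
                    if ((dr != 0 || dc != 0) &&
                        rr >= 0 && rr < rows && cc >= 0 && cc < cols)
                        neighbors += cur[rr * cols + cc];
                }
            if (cur[r * cols + c])
                /* rules 1-3: survive only with two or three neighbors */
                next[r * cols + c] = (neighbors == 2 || neighbors == 3);
            else
                /* rule 4: birth with exactly three neighbors */
                next[r * cols + c] = (neighbors == 3);
        }
}
```

A horizontal row of three organisms, for instance, becomes a vertical column of three in the next generation and flips back in the one after.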

  20. Simple Fun Examples of Cellular Automata: “Sharks and Fishes”
      An ocean could be modeled as a three-dimensional array of cells. Each cell can hold one fish or one shark (but not both). Fish and sharks follow “rules.”

  21. Fish
      Might move around according to these rules:
      1. If there is one empty adjacent cell, the fish moves to this cell.
      2. If there is more than one empty adjacent cell, the fish moves to one cell chosen at random.
      3. If there are no empty adjacent cells, the fish stays where it is.
      4. If the fish moves and has reached its breeding age, it gives birth to a baby fish, which is left in the vacating cell.
      5. Fish die after x generations.

  22. Sharks
      Might be governed by the following rules:
      1. If one adjacent cell is occupied by a fish, the shark moves to this cell and eats the fish.
      2. If more than one adjacent cell is occupied by a fish, the shark chooses one fish at random, moves to the cell occupied by the fish, and eats the fish.
      3. If no fish are in adjacent cells, the shark chooses an unoccupied adjacent cell to move to in a similar manner as fish move.
      4. If the shark moves and has reached its breeding age, it gives birth to a baby shark, which is left in the vacating cell.
      5. If a shark has not eaten for y generations, it dies.

  23. Sample Student Output

  24. Similar examples: “foxes and rabbits.” The rabbits move around happily, whereas the foxes eat any rabbits they come across.

  25. Serious Applications for Cellular Automata
      Examples:
      • fluid/gas dynamics
      • the movement of fluids and gases around objects
      • diffusion of gases
      • biological growth
      • airflow across an airplane wing
      • erosion/movement of sand at a beach or riverbank

  26. Algorithmic ways to improve the performance of computational stencil applications

  27. Partially Synchronous Computations
      Computations in which individual processes operate without needing to synchronize with other processes on every iteration. This is an important idea because synchronizing processes significantly slows the computation and is a major cause of reduced performance in parallel programs.

  28. Heat Distribution Problem Revisited: Making It Partially Synchronous
      The synchronous version uses the previous iteration's results, h[][], to compute the next iteration's values, g[][]:

          forall (i = 1; i < n; i++)
              forall (j = 1; j < n; j++) {
                  g[i][j] = 0.25 * (h[i-1][j] + h[i+1][j] + h[i][j-1] + h[i][j+1]);
              }

      There is a synchronization point at the end of each iteration. The waiting can be reduced by not forcing synchronization at each iteration: processes are allowed to move on to the next iteration before all data points have been computed, and they then use data not only from the last iteration but possibly from earlier iterations. The method then becomes an “asynchronous iterative method.”

  29. Asynchronous Iterative Method: Convergence Conditions
      Mathematical conditions for convergence may be stricter. A process may not be allowed to use values from arbitrarily old iterations if the method is to converge.
      Chaotic Relaxation: a form of asynchronous iterative method introduced by Chazan and Miranker (1969), in which the condition is stated as: “there must be a fixed positive integer s such that, in carrying out the evaluation of the ith iterate, a process cannot make use of any value of the components of the jth iterate if j < i - s” (Baudet, 1978).

  30. Gauss-Seidel Relaxation Uses some newly computed values to compute other values in that iteration.

  31. Gauss-Seidel Iteration Formula
      In the formula, the superscript indicates the iteration. With natural ordering of unknowns, at the kth iteration two of the four neighbor values (those before the ith element) are taken from the kth iteration and two values (those after the ith element) are taken from the (k-1)th iteration.
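
In standard notation the formula can be sketched as follows. The symbols are assumptions of this sketch: a_{i,j} and b_i are the coefficients and constants of the linear system, N is the number of unknowns, n is the number of points per row in natural ordering, and Δ is the spacing between points.

```latex
\begin{align}
x_i^{k} &= \frac{1}{a_{i,i}}\Bigl[\,b_i - \sum_{j=1}^{i-1} a_{i,j}\,x_j^{k} - \sum_{j=i+1}^{N} a_{i,j}\,x_j^{k-1}\Bigr] \\
x_i^{k} &= \frac{-1}{a_{i,i}}\Bigl[\,a_{i,i-n}\,x_{i-n}^{k} + a_{i,i-1}\,x_{i-1}^{k} + a_{i,i+1}\,x_{i+1}^{k-1} + a_{i,i+n}\,x_{i+n}^{k-1}\Bigr] \\
f^{k}(x,y) &= \frac{1}{4}\Bigl[\,f^{k}(x-\Delta,y) + f^{k}(x,y-\Delta) + f^{k-1}(x+\Delta,y) + f^{k-1}(x,y+\Delta)\Bigr]
\end{align}
```

The second line is the first specialized to the five-point equations with b_i = 0. The third is the same update written for Laplace's equation, where a_{i,i} = -4 and the four neighbor coefficients are 1.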

  32. Red-Black Ordering
      First the black points are computed, then the red points. All black points can be computed simultaneously, and likewise all red points, because the four neighbors of each point are of the other color.

  33. Red-Black Parallel Code

          forall (i = 1; i < n; i++)
              forall (j = 1; j < n; j++)
                  if ((i + j) % 2 == 0)        /* compute red points */
                      f[i][j] = 0.25 * (f[i-1][j] + f[i][j-1] + f[i+1][j] + f[i][j+1]);
          forall (i = 1; i < n; i++)
              forall (j = 1; j < n; j++)
                  if ((i + j) % 2 != 0)        /* compute black points */
                      f[i][j] = 0.25 * (f[i-1][j] + f[i][j-1] + f[i+1][j] + f[i][j+1]);
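
One way the forall pseudocode might be expressed in compilable C is sketched below, with OpenMP's `parallel for` as a stand-in for forall. That substitution, the row-major (n+1) x (n+1) storage, and the name `red_black_iteration` are assumptions of this sketch. Without OpenMP enabled, the pragma is ignored and the code runs sequentially with the same result.

```c
#include <assert.h>

/* One red-black iteration on an (n+1) x (n+1) grid f stored row-major.
 * Rows and columns 0 and n hold fixed boundary values. */
void red_black_iteration(double *f, int n)
{
    assert(n >= 2);
    const int w = n + 1;                          /* row width */
    for (int color = 0; color < 2; color++) {     /* 0: (i+j) even, 1: odd */
        /* Points of one color read only points of the other color, so
         * rows can be updated in parallel without conflicts. The loop's
         * implicit barrier finishes one color before the next starts. */
        #pragma omp parallel for
        for (int i = 1; i < n; i++)
            for (int j = 1; j < n; j++)
                if ((i + j) % 2 == color)
                    f[i * w + j] = 0.25 * (f[(i - 1) * w + j] + f[i * w + j - 1]
                                         + f[(i + 1) * w + j] + f[i * w + j + 1]);
    }
}
```

Unlike the Jacobi code, a single array is updated in place, so the second color already uses the values just computed for the first color within the same iteration.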

  34. Multigrid Method
      First, a coarse grid of points is used. With these points, the iteration process starts to converge quickly. At some stage, the number of points is increased to include the points of the coarse grid plus extra points between them. Initial values for the extra points are found by interpolation, and the computation continues with this finer grid. The grid can be made finer and finer as the computation proceeds, or the computation can alternate between fine and coarse grids. Coarser grids take distant effects into account more quickly and provide a good starting point for the next finer grid.
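
The refinement step, in which extra points between coarse-grid points get interpolated initial values, can be sketched as follows. The specifics are assumptions of this sketch: the fine grid doubles the resolution (nf = 2*nc - 1 points per side), linear interpolation is used, and `refine_grid` is a hypothetical name.

```c
#include <assert.h>

/* Refine an nc x nc coarse grid into an nf x nf fine grid, nf = 2*nc - 1,
 * both stored row-major. Coarse points are carried over unchanged; each
 * extra point gets the average of its nearest coarse neighbors. */
void refine_grid(const double *coarse, int nc, double *fine)
{
    assert(nc >= 2);
    const int nf = 2 * nc - 1;
    for (int r = 0; r < nf; r++)
        for (int c = 0; c < nf; c++) {
            int r0 = r / 2, c0 = c / 2;           /* coarse point at or before */
            int r1 = r0 + (r % 2);                /* coarse point at or after */
            int c1 = c0 + (c % 2);
            double top    = 0.5 * (coarse[r0 * nc + c0] + coarse[r0 * nc + c1]);
            double bottom = 0.5 * (coarse[r1 * nc + c0] + coarse[r1 * nc + c1]);
            fine[r * nf + c] = 0.5 * (top + bottom);
        }
}
```

When r and c are both even, the four coarse indices coincide and the fine point simply takes the coarse value. The iteration then continues on `fine`, starting from values that already reflect the coarse solution.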

  35. Multigrid processor allocation

  36. Questions