
Parallel and Distributed Processing CSE 8380

Parallel and Distributed Processing CSE 8380. February 3, 2005 Session 7. Contents. Abstract Models PRAM Model Complexity Analysis Introduction to Parallel Algorithms Sorting. What is a Model?.


Presentation Transcript


  1. Parallel and Distributed Processing, CSE 8380 • February 3, 2005 • Session 7

  2. Contents • Abstract Models • PRAM Model • Complexity Analysis • Introduction to Parallel Algorithms • Sorting

  3. What is a Model? • According to Webster’s Dictionary, a model is “a description or analogy used to help visualize something that cannot be directly observed.” • According to The Oxford English Dictionary, a model is “a simplified or idealized description or conception of a particular system, situation or process.”

  4. Why Models? • In general, the purpose of modeling is to capture the salient characteristics of phenomena with clarity and the right degree of accuracy to facilitate analysis and prediction. Maggs, Matheson and Tarjan (1995)

  5. Models in Problem Solving • Computer Scientists use models to help design problem solving tools such as: • Fast Algorithms • Effective Programming Environments • Powerful Execution Engines

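  6. An Interface • A model is an interface separating high-level properties from low-level ones. • [Figure: Applications at the top, MODEL in the middle, Architectures at the bottom. The model provides operations to applications and requires implementation by architectures.]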

  7. Models in this class • Shared Memory Model • Distributed Memory Model

  8. PRAM Model • Synchronized Read Compute Write Cycle • EREW • ERCW • CREW • CRCW • Complexity: T(n), P(n), C(n) • [Figure: a control unit and processors P1, P2, …, Pp, each with its own private memory, all connected to a global memory.]

  9. The PRAM model and its variations (cont.) • There are different modes for read and write operations in a PRAM. • Exclusive read (ER) • Exclusive write (EW) • Concurrent read (CR) • Concurrent write (CW), with write conflicts resolved by one of the following policies: Common, Arbitrary, Minimum, Priority • Based on the modes described above, the PRAM can be further divided into the following four subclasses. • EREW-PRAM model • CREW-PRAM model • ERCW-PRAM model • CRCW-PRAM model
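The slide names the four concurrent-write policies without defining them. The sketch below shows one plausible reading in Python. The function name `resolve_concurrent_write` and the exact semantics are assumptions made for illustration: here "minimum" means the smallest value wins (some texts define it by the smallest processor index instead), and "priority" treats the lowest processor id as the highest priority.

```python
def resolve_concurrent_write(writes, policy):
    """Resolve simultaneous writes to a single shared memory cell.

    writes: list of (processor_id, value) pairs, all aimed at the same cell.
    policy: "common", "arbitrary", "minimum", or "priority".
    Returns the value that ends up stored in the cell.
    """
    if not writes:
        raise ValueError("no writes to resolve")
    values = [value for _, value in writes]
    if policy == "common":
        # Every processor must be writing the same value.
        if len(set(values)) != 1:
            raise ValueError("common CW requires identical values")
        return values[0]
    if policy == "arbitrary":
        # Any one writer may succeed; this sketch simply takes the first listed.
        return values[0]
    if policy == "minimum":
        # Assumption: the smallest value wins.
        return min(values)
    if policy == "priority":
        # Assumption: the lowest processor id has the highest priority.
        return min(writes, key=lambda w: w[0])[1]
    raise ValueError(f"unknown policy: {policy}")
```

For example, with writes `[(3, 7), (1, 9), (2, 5)]` the "minimum" policy stores 5, while "priority" stores 9 because processor 1 wins.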

  10. Analysis of Algorithms • Sequential Algorithms • Time Complexity • Space Complexity • An algorithm whose time complexity is bounded by a polynomial is called a polynomial-time algorithm. An algorithm is considered to be efficient if it runs in polynomial time.

  11. Analysis of Sequential Algorithms • [Figure: The relationships among P, NP, NP-complete, and NP-hard]

  12. Analysis of parallel algorithms • The performance of a parallel algorithm is expressed in terms of how fast it is and how many resources it uses when it runs. • Run time, which is defined as the time elapsed during the execution of the algorithm • Number of processors the algorithm uses to solve a problem • The cost of the parallel algorithm, which is the product of the run time and the number of processors

  13. Analysis of parallel algorithms: the NC-class and P-completeness • [Figure: The relationships among P, NP, NP-complete, NP-hard, NC, and P-complete]

  14. Simulating multiple accesses on an EREW PRAM • Broadcasting mechanism: • P1 reads x and makes it known to P2. • P1 and P2 make x known to P3 and P4, respectively, in parallel. • P1, P2, P3 and P4 make x known to P5, P6, P7 and P8, respectively, in parallel. • These eight processors will make x known to another eight processors, and so on.

  15. Simulating multiple accesses on an EREW PRAM (cont.) • [Figure, panels (a) to (d): Simulating concurrent read on an EREW PRAM with eight processors using Algorithm Broadcast_EREW. The value x spreads through the shared array L from P1 to P2, then to P3 and P4, then to P5 through P8.]

  16. Parallel Algorithms • Constructs • Processor Pi • Forall • Where • Do in Parallel • Others

  17. Simulating multiple accesses on an EREW PRAM (cont.) Algorithm Broadcast_EREW
      Processor P1
        y (in P1’s private memory) ← x
        L[1] ← y
      for i = 0 to log p − 1 do
        forall Pj, where 2^i + 1 ≤ j ≤ 2^(i+1) do in parallel
          y (in Pj’s private memory) ← L[j − 2^i]
          L[j] ← y
        endfor
      endfor
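Algorithm Broadcast_EREW can be traced with a small sequential simulation. This is a sketch rather than a real PRAM program: the function name `broadcast_erew` is hypothetical, `p` is assumed to be a power of two, and the parallel `forall` is modelled by doing all reads of a round before any writes.

```python
import math

def broadcast_erew(x, p):
    """Simulate Broadcast_EREW for p processors (p assumed a power of two).

    Returns (copies, rounds): copies[j - 1] is the private copy held by Pj,
    rounds is the number of doubling rounds performed (log2 p).
    """
    L = [None] * (p + 1)   # shared array, 1-indexed to match the slide
    y = [None] * (p + 1)   # y[j] models Pj's private memory
    y[1] = x               # P1 reads x ...
    L[1] = y[1]            # ... and publishes it in L[1]
    rounds = int(math.log2(p))
    for i in range(rounds):
        lo, hi = 2**i + 1, 2**(i + 1)
        sources = [j - 2**i for j in range(lo, hi + 1)]
        # Exclusive read: no two processors touch the same cell in a round.
        assert len(sources) == len(set(sources))
        # All reads of the round happen before any write of the round.
        reads = {j: L[j - 2**i] for j in range(lo, hi + 1)}
        for j, value in reads.items():
            y[j] = value   # Pj stores the value privately
            L[j] = value   # and publishes it for the next round
    return y[1:], rounds
```

Calling `broadcast_erew(42, 8)` gives every one of the eight processors a copy of 42 after three rounds, which matches the doubling pattern described for P1 through P8.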

  18. Enumeration Sort • Given a list of n numbers a1, a2, …, an • We try to find the position of each element ai in the sorted list by computing the number of elements smaller than it • If ci elements are smaller than ai, then ai is the (ci+1)th element in the sorted list • If two or more elements have the same value, the element with the largest index in the unsorted list will be considered the largest in the sorted list.
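The counting rule and its tie-break can be made concrete with a few lines of sequential Python. The helper name `enumeration_ranks` is an assumption for illustration, and indices are 0-based here, so `c[i]` is the 0-based position of `a[i]` in the sorted list.

```python
def enumeration_ranks(a):
    """Return c, where c[i] counts the elements that come before a[i]
    in sorted order. Equal values are ordered by their original index,
    so the element with the larger index is treated as the larger one."""
    n = len(a)
    return [
        sum(1 for j in range(n)
            if a[j] < a[i] or (a[j] == a[i] and j < i))
        for i in range(n)
    ]
```

For the list `[5, 3, 5]` this yields `[1, 0, 2]`: the two 5s receive distinct positions, with the later one placed last.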

  19. Sort-CRCW Assumptions • To sort n elements, we use n^2 processors (n rows and n columns) • Pi,j: the processor in row i, column j • Concurrent write: the sum of all values written to a cell is stored • A[1..n]: array of elements in global memory • C[1..n]: array storing, for every element in A, the number of elements smaller than it

  20. Sort-CRCW • Two steps • Each row of processors i computes C[i], the number of elements smaller than A[i]. Each processor Pi,j compares A[i] and A[j], then updates C[i] appropriately • The first processor in each row, Pi,1, places A[i] in its proper position in the sorted list, position C[i] + 1

  21. Algorithm Details • Detail of the two-step algorithm
      /* step 1 */
      forall Pi,j, where 1 ≤ i, j ≤ n do in parallel
        if A[i] > A[j] or (A[i] = A[j] and i > j) then
          C[i] ← 1
        else
          C[i] ← 0
        endif
      endfor
      /* step 2 */
      forall Pi,1, where 1 ≤ i ≤ n do in parallel
        A[C[i] + 1] ← A[i]
      endfor
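The two steps can be simulated sequentially to check the logic. In this sketch the n × n grid of processors becomes a matrix of 0/1 writes, and the sum-combining concurrent write becomes a row sum. The name `sort_crcw` is hypothetical, indices are 0-based, and the result goes into a separate output list because a sequential loop cannot overwrite A in place the way simultaneous writes can.

```python
def sort_crcw(a):
    """Simulate Sort-CRCW; returns (sorted_list, c)."""
    n = len(a)
    # Step 1: processor P(i, j) writes 1 to C[i] if a[i] must follow a[j],
    # otherwise 0. writes[i][j] records what P(i, j) would write.
    writes = [
        [1 if (a[i] > a[j] or (a[i] == a[j] and i > j)) else 0
         for j in range(n)]
        for i in range(n)
    ]
    # The concurrent write combines all values aimed at C[i] by summing them.
    c = [sum(row) for row in writes]
    # Step 2: the first processor of row i places a[i] at position c[i].
    out = [None] * n
    for i in range(n):
        out[c[i]] = a[i]
    return out, c
```

Running `sort_crcw([9, 4, 6])` returns the sorted list `[4, 6, 9]` together with the counts `[2, 0, 1]`.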

  22. Complexity • Run time: T(n) = O(1) • Number of processors: P(n) = n^2 • Cost: C(n) = O(n^2) • Is it cost optimal? • No! (A sequential sort can be done in O(n log n).)
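A rough numerical comparison shows why the answer is no. The sketch below takes the constants hidden by the O notation to be 1, which is an assumption made only for illustration, and the function names are hypothetical.

```python
import math

def parallel_cost(run_time, processors):
    """Cost of a parallel algorithm: run time times number of processors."""
    return run_time * processors

def cost_ratio(n):
    """Ratio of the Sort-CRCW cost (T = 1 step, P = n^2 processors) to the
    n * log2(n) work of a comparison-based sequential sort. Requires n > 1."""
    return parallel_cost(1, n * n) / (n * math.log2(n))
```

Under these assumptions `cost_ratio(1024)` is about 102, and the ratio n / log2(n) keeps growing with n, so the parallel cost is not within a constant factor of the sequential work.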

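  23. Example: sort (9, 4, 6) • A = (9, 4, 6) • Row 1: P1,1 compares 9 & 9, P1,2 compares 9 & 4, P1,3 compares 9 & 6 • Row 2: P2,1 compares 4 & 9, P2,2 compares 4 & 4, P2,3 compares 4 & 6 • Row 3: P3,1 compares 6 & 9, P3,2 compares 6 & 4, P3,3 compares 6 & 6 • Concurrent write (SUM): C = (2, 0, 1) • Sorted list: (4, 6, 9) • T(n) = O(1) • P(n) = n^2 • C(n) = T(n) * P(n) = O(n^2)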
