
Parallel Algorithms


Presentation Transcript


  1. Parallel Algorithms Lecture Notes

  2. Motivation • Programs face two perennial problems: • Time: run faster in solving a problem • Example: speed up the time needed to sort 10 million records • Size: solve a “bigger” problem • Example: multiply matrices of large dimensions (a PC with 512 MB RAM can hold at most an 8192×8192 matrix of 8-byte doubles) • Possible solution: parallelism • Split a problem into several tasks and perform them in parallel • A parallel computer, broadly defined: a set of processors able to work cooperatively to solve a computational problem • Includes: parallel supercomputers, clusters of workstations, multiple-processor workstations

  3. Concepts • parallel • concurrent • multiprocessing • multiprogramming • distributed

  4. Logical vs physical parallelism • A concurrent program with 3 processes P0, P1, P2 • Executed on a system with 3 processors: physical parallelism (multi-processing), each process runs on its own processor • Executed on a system with 1 processor: logical parallelism (multi-programming), the processes are interleaved in time on the single processor • [Diagram: timelines of processes P0, P1, P2 over time T for both cases]

  5. concurrent-distributed-parallel

  6. parallel distributed

  7. Parallelizing sequential code • The enabling condition for doing two tasks in parallel: no dependences between them! (see the sketch below) • Parallelizing compilers: compile sequential programs into parallel code • A research goal since the 1970s
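
As an illustration (an example added here, not from the slides): the first loop below has no dependences between iterations, so the iterations could run in parallel; the second has a loop-carried dependence and must run in order.

    double A[100], B[100], C[100];
    int i, n = 100;

    /* No dependences: each B[i] uses only A[i], so the
       iterations are independent and can run in parallel. */
    for (i = 0; i < n; i++)
        B[i] = 2 * A[i];

    /* Loop-carried dependence: iteration i reads C[i-1],
       written by iteration i-1, so order must be preserved. */
    C[0] = A[0];
    for (i = 1; i < n; i++)
        C[i] = C[i-1] + A[i];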

  8. Example: Adding n numbers • Sequential solution: O(n)

    sum = 0;
    for (i = 0; i < n; i++) {
        sum += A[i];
    }

• The sequential algorithm cannot be straightforwardly parallelized, since every iteration depends on the previous one (each addition needs the running sum)

  9. Summing in sequence vs summing in pairs • Summing in sequence: always O(n) • Summing in pairs: with P=1 processor, O(n); with P=n/2 processors, O(log n) • Parallelizing = re-thinking the algorithm! (see the sketch below)
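
A minimal sketch of summing in pairs (an illustration added here, assuming for simplicity that n is a power of two). Written sequentially, the nested loop only shows the combining pattern; the additions within one step are mutually independent, which is what lets P = n/2 processors finish in O(log n) steps:

    /* Tree reduction: after step k, A[i] (for i a multiple of 2^k)
       holds the sum of a block of 2^k consecutive inputs. */
    int stride;
    for (stride = 1; stride < n; stride *= 2) {      /* log2(n) steps */
        for (i = 0; i + stride < n; i += 2 * stride)
            A[i] += A[i + stride];                   /* independent within a step */
    }
    /* A[0] now holds the total sum. */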

  10. It’s not likely a compiler will produce good parallel code from a sequential specification any time soon… • Fact: for most computations, a “best” sequential solution (practically, not theoretically) and a “best” parallel solution are usually fundamentally different • Different solution paradigms imply the computations are not “simply” related • Compiler transformations generally preserve the solution paradigm • Therefore the programmer must discover the parallel solution!

  11. Sequential vs parallel programming • Parallel programming has different costs and different advantages • Requires different, unfamiliar algorithms • Must use different abstractions • It is more complex to understand a program’s behavior • It is more difficult to control the interactions of the program’s components • Knowledge, tools, and understanding are more primitive

  12. Example: Count the number of 3’s • Sequential solution: O(n)

    count = 0;
    for (i = 0; i < length; i++) {
        if (array[i] == 3)
            count++;
    }

  13. Example: Trial solution 1 • Divide the array into t=4 chunks • Assign each chunk to a different concurrent task, identified by id=0…t-1 • Code of each task:

    int length_per_thread = length / t;
    int start = id * length_per_thread;
    for (i = start; i < start + length_per_thread; i++) {
        if (array[i] == 3)
            count += 1;
    }

• Problem: race condition! This is not a correct concurrent program • Accesses to the same shared memory (the variable count) must be protected
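
Why the race occurs (a sketch added here, not from the slides): count += 1 is not atomic; it typically compiles to a load, an add, and a store, so two tasks can interleave and lose an update:

    Task A: load count (5)
    Task B: load count (5)
    Task A: add 1 -> 6, store count = 6
    Task B: add 1 -> 6, store count = 6    /* one increment is lost */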

  14. Example: Trial solution 2 • Correct the previous trial solution by adding a mutex lock to prevent concurrent accesses to the shared variable count • Code of each task:

    mutex m;   /* must be a single mutex shared by all tasks */
    for (i = start; i < start + length_per_thread; i++) {
        if (array[i] == 3) {
            mutex_lock(m);
            count++;
            mutex_unlock(m);
        }
    }

• Problem: VERY slow! There is no real parallelism; the tasks spend their time waiting for one another

  15. Example: Trial solution 3 • Each processor adds into its own private counter; the partial counts are combined at the end • Code of each task:

    for (i = start; i < start + length_per_thread; i++) {
        if (array[i] == 3)
            private_count[id]++;
    }
    mutex_lock(m);
    count += private_count[id];   /* one locked update per task */
    mutex_unlock(m);

• Problem: STILL no speedup measured when using more than one processor! • Reason: false sharing

  16. Example: false sharing • The elements of private_count are small and adjacent, so several of them fall into the same cache line • Caches maintain coherence at cache-line granularity: when one processor writes its counter, the whole line is invalidated in the other processors’ caches and must be re-fetched • The counters are logically private, yet the processors keep stealing the line from one another, so the updates are effectively serialized • [Figure: private counters packed into one cache line, bouncing between processors]

  17. Example: Solution 4 • Force each private counter onto a separate cache line by “padding” it with unused locations:

    struct padded_int {
        int value;
        char padding[128];   /* keeps each counter on its own cache line */
    } private_count[MaxThreads];

• Finally a speedup is measured when using more than one processor! • Conclusion: producing correct and efficient parallel programs can be considerably more difficult than writing correct and efficient serial programs!
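
Pulling the four steps together, a minimal runnable sketch using POSIX threads (added here for illustration; the thread count, array size, sample data, and 128-byte padding are assumptions, and the slides’ names are kept where possible). Compile with, e.g., cc -pthread count3s.c:

    #include <pthread.h>
    #include <stdio.h>

    #define MaxThreads 4

    int array[1000000];
    int length = 1000000;
    int t = MaxThreads;                  /* number of tasks */
    int count = 0;                       /* shared result   */
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

    struct padded_int {
        int value;
        char padding[128];               /* separate cache lines: no false sharing */
    } private_count[MaxThreads];

    void *count3s_task(void *arg) {
        int id = *(int *)arg;
        int length_per_thread = length / t;
        int start = id * length_per_thread;
        for (int i = start; i < start + length_per_thread; i++)
            if (array[i] == 3)
                private_count[id].value++;   /* private: no lock in the hot loop */
        pthread_mutex_lock(&m);
        count += private_count[id].value;    /* one locked update per task */
        pthread_mutex_unlock(&m);
        return NULL;
    }

    int main(void) {
        pthread_t threads[MaxThreads];
        int ids[MaxThreads];
        for (int i = 0; i < length; i++)
            array[i] = i % 10;               /* sample data: every 10th element is 3 */
        for (int id = 0; id < t; id++) {
            ids[id] = id;
            pthread_create(&threads[id], NULL, count3s_task, &ids[id]);
        }
        for (int id = 0; id < t; id++)
            pthread_join(threads[id], NULL);
        printf("number of 3s: %d\n", count); /* expect length/10 = 100000 */
        return 0;
    }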

  18. Sequential vs parallel programming (recap) • Parallel programming has different costs and different advantages • Requires different, unfamiliar algorithms • Must use different abstractions • It is more complex to understand a program’s behavior • It is more difficult to control the interactions of the program’s components • Knowledge, tools, and understanding are more primitive

  19. Goals of Parallel Programming • Performance: the parallel program runs faster than its sequential counterpart (a speedup is measured) • Scalability: as the size of the problem grows, more processors can be “usefully” added to solve the problem faster • Portability: the solutions run well on different parallel platforms
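
As a quantitative note (a standard definition, not stated on the slide): speedup on p processors is commonly defined as

    S(p) = T_seq / T_par(p)

where T_seq is the running time of the best sequential program and T_par(p) the running time of the parallel program on p processors. For example (hypothetical numbers), if the sequential Count-3s takes 1.0 s and the padded 4-thread version takes 0.3 s, then S(4) = 1.0 / 0.3 ≈ 3.3.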
