
Parallel Programming



Presentation Transcript


  1. Parallel Programming

  2. Introduction • The idea has been around since the 1960s • pseudo-parallel systems on multiprogrammable computers • True parallelism • many processors connected to run in concert • Multiprocessor system • Distributed system • stand-alone systems connected • made more complex by high-speed networks

  3. Programming Languages • Used to express algorithms that solve the problems posed by parallel processing systems • Used to write OSs that implement these solutions • Used to harness the capabilities of multiple processors efficiently • Used to implement and express communication across networks

  4. Two kinds of parallelism • Parallelism existing in the underlying hardware • Parallelism as expressed in the programming language • The latter may not result in actual parallel processing • could be implemented with pseudo-parallelism • Concurrent programming expresses only the potential for parallelism

  5. Some Basics • Process • an instance of a program or program part that has been scheduled for independent execution • Heavy-weight process • a full-fledged independent entity with all the memory and other resources ordinarily allocated by the OS • Light-weight process, or thread • shares resources with the program it came from (see the sketch below)
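The distinction is easy to see in Java (an illustrative sketch, not from the slides; the echo command and the field names are assumptions): a ProcessBuilder child gets its own address space, while a Thread shares the parent's memory.

    // Sketch: heavyweight OS process vs. lightweight thread.
    public class ProcessVsThread {
        static int shared = 0;  // visible to threads, not to child processes

        public static void main(String[] args) throws Exception {
            // Heavyweight process: own address space, created via the OS.
            Process p = new ProcessBuilder("echo", "hello from a process")
                            .inheritIO().start();
            p.waitFor();

            // Lightweight thread: shares this program's memory and resources.
            Thread t = new Thread(() -> shared++);
            t.start();
            t.join();
            System.out.println("shared = " + shared);  // prints 1
        }
    }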

  6. Primary requirements for organization • There must be a way for processors to synchronize their activities • e.g., a 1st processor inputs and sorts data while a 2nd processor waits to perform computations on the sorted data • There must be a way for processors to communicate data among themselves • the 2nd processor needs the sorted data (see the sketch below)
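A minimal Java sketch of both requirements, assuming a BlockingQueue as the communication channel (all names are illustrative): the sorter synchronizes with the computer by handing the sorted data over the queue.

    import java.util.Arrays;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class SortThenCompute {
        public static void main(String[] args) throws InterruptedException {
            BlockingQueue<int[]> channel = new ArrayBlockingQueue<>(1);

            Thread sorter = new Thread(() -> {
                int[] data = {5, 3, 8, 1};
                Arrays.sort(data);          // step 1: input and sort
                try {
                    channel.put(data);      // hand the sorted data over
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            Thread computer = new Thread(() -> {
                try {
                    int[] sorted = channel.take();  // blocks until data arrives
                    System.out.println("median = " + sorted[sorted.length / 2]);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            sorter.start();
            computer.start();
            sorter.join();
            computer.join();
        }
    }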

  7. Architectures • SIMD (single-instruction, multiple-data) • one processor acts as the controller • all processors execute the same instructions on their respective registers or data sets • multiprocessing • synchronous (all processors operate at the same speed), giving an implicit solution to the synchronization problem • MIMD (multiple-instruction, multiple-data) • all processors act independently • multiprocessor or distributed-processor systems • asynchronous (synchronization becomes a critical problem)

  8. OS requirements for Parallelism • Means of creating and destroying processes • Means of managing the number of processors used by processes • Mechanism for ensuring mutual exclusion on shared-memory systems • Mechanism for creating and maintaining communication channels between processors on distributed-memory systems

  9. Language requirements • Machine independence • Adhere to language design principles • Some languages use shared-memory model and provide facilities for mutual exclusion through a library • Some assume distributed-memory model and provide communication facilities • A few include both

  10. Common mechanisms • Threads • Semaphores • Monitors • Message passing
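As one illustration of these mechanisms, a counting semaphore from java.util.concurrent used for mutual exclusion (a sketch; the class and field names are made up):

    import java.util.concurrent.Semaphore;

    public class SemaphoreDemo {
        static final Semaphore mutex = new Semaphore(1);  // one permit = a lock
        static int counter = 0;

        public static void main(String[] args) throws InterruptedException {
            Runnable work = () -> {
                for (int i = 0; i < 10000; i++) {
                    try {
                        mutex.acquire();     // wait (P): take the single permit
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    try {
                        counter++;           // critical section
                    } finally {
                        mutex.release();     // signal (V): give the permit back
                    }
                }
            };
            Thread t1 = new Thread(work), t2 = new Thread(work);
            t1.start(); t2.start();
            t1.join();  t2.join();
            System.out.println(counter);     // reliably 20000
        }
    }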

  11. 2 common sample problems • Bounded buffer problem • similar to the producer-consumer problem • Parallel matrix multiplication • the standard algorithm is O(N³) • assign a process to compute each element, each process on a separate processor → N steps (a threaded sketch follows)
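A hedged Java sketch of the parallel matrix multiplication, using one thread per block of interleaved rows rather than one process per element (mirroring the C example on a later slide); all names are illustrative.

    public class ParMatMul {
        static final int SIZE = 100, NUMPROCS = 10;
        static final double[][] a = new double[SIZE][SIZE],
                                b = new double[SIZE][SIZE],
                                c = new double[SIZE][SIZE];

        public static void main(String[] args) throws InterruptedException {
            // (code to fill a and b goes here)
            Thread[] workers = new Thread[NUMPROCS];
            for (int p = 0; p < NUMPROCS; p++) {
                final int id = p;
                workers[p] = new Thread(() -> {
                    for (int i = id; i < SIZE; i += NUMPROCS)   // my rows
                        for (int j = 0; j < SIZE; j++)
                            for (int k = 0; k < SIZE; k++)
                                c[i][j] += a[i][k] * b[k][j];
                });
                workers[p].start();
            }
            for (Thread w : workers) w.join();   // wait for all rows
            // (code to write out c goes here)
        }
    }

Rows are disjoint across threads, so no two threads ever write the same element of c and no extra locking is needed.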

  12. Without explicit language facilities • One approach is not to be explicit at all • possible in some functional, logic, and object-oriented languages • a certain amount of parallelism is inherent and implicit • language translators use optimization techniques to automatically exploit OS utilities, assigning different processors to different parts of the program • typically suboptimal

  13. Another alternative without explicit language facilities • The translator offers compiler options that let the programmer indicate explicitly where parallelism is called for • most effective in nested loops • Example: FORTRAN

  14. m_set_procs sets the number of processes; share declares variables accessed by all processes; local declares variables local to each process. The compiler directive synchronizes the processes: all processes wait for the entire loop to finish, and one process continues after the loop.

          integer a(100, 100), b(100, 100), c(100, 100)
          integer i, j, k, numprocs, err
          numprocs = 10
    C     code to read in a and b goes here
          err = m_set_procs(numprocs)
    C$doacross share(a, b, c), local(j, k)
          do 10 i = 1, 100
            do 10 j = 1, 100
              c(i, j) = 0
              do 10 k = 1, 100
                c(i, j) = c(i, j) + a(i, k) * b(k, j)
    10    continue
          call m_kill_procs
    C     code to write out c goes here
          end

  15. A 3rd way with explicit constructs • Provide a library of functions • this passes facilities provided by the OS directly to the programmer • (semantically the same as providing them in the language) • Example: C with the library parallel.h

  16. m_set_procs sets the number of processes; m_fork then creates the 10 processes, all running multiply:

    #include <parallel.h>
    #define SIZE 100
    #define NUMPROCS 10

    /* c is a global, so it starts zero-initialized */
    shared int a[SIZE][SIZE], b[SIZE][SIZE], c[SIZE][SIZE];

    void multiply(void)
    {
      int i, j, k;
      for (i = m_get_myid(); i < SIZE; i += NUMPROCS)  /* interleaved rows */
        for (j = 0; j < SIZE; j++)
          for (k = 0; k < SIZE; k++)
            c[i][j] += a[i][k] * b[k][j];
    }

    int main(void)
    {
      /* code to read in a and b goes here */
      m_set_procs(NUMPROCS);
      m_fork(multiply);
      m_kill_procs();
      /* code to write out c goes here */
      return 0;
    }

  17. A 4th and final alternative • Simply rely on the OS • Example: pipes in Unix

    ls | grep "java"

  • runs ls and grep in parallel • the output of ls is piped to grep
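For comparison, the same pipeline can be driven from Java via ProcessBuilder.startPipeline (available in Java 9 and later); a sketch, with the class name assumed:

    import java.io.IOException;
    import java.util.List;

    public class PipeDemo {
        public static void main(String[] args)
                throws IOException, InterruptedException {
            // ls and grep run in parallel; ls's stdout feeds grep's stdin
            List<Process> pipeline = ProcessBuilder.startPipeline(List.of(
                new ProcessBuilder("ls"),
                new ProcessBuilder("grep", "java")
                    .redirectOutput(ProcessBuilder.Redirect.INHERIT)));
            pipeline.get(pipeline.size() - 1).waitFor();
        }
    }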

  18. Language with explicit mechanisms • 2 basic ways to create new processes • SPMD (single program, multiple data) • split the current process into 2 or more processes that execute copies of the same program • MPMD (multiple program, multiple data) • a segment of code is associated with each new process • the typical case is the fork-join model, in which a process creates several child processes, each with its own code (a fork), and then waits for the children to complete their execution (a join) • the last example was similar, but m_kill_procs took the place of the join (see the sketch below)
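A small Java sketch of the MPMD fork-join model (illustrative names): the parent forks two children running different code, then joins both.

    public class ForkJoinSketch {
        public static void main(String[] args) throws InterruptedException {
            Thread child1 = new Thread(() ->
                System.out.println("child 1: reading input"));
            Thread child2 = new Thread(() ->
                System.out.println("child 2: writing output"));
            child1.start();   // fork
            child2.start();   // fork
            child1.join();    // join: parent waits for the children
            child2.join();
            System.out.println("parent: children done");
        }
    }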

  19. Granularity • The size of the code assignable to separate processes • fine-grained: statement-level parallelism • medium-grained: procedure-level parallelism • large-grained: program-level parallelism • Can be an issue in program efficiency • too fine-grained: process overhead dominates • too large-grained: may not exploit all opportunities for parallelism

  20. Thread • Provides fine-grained or medium-grained parallelism without the overhead of full-blown process creation

  21. Issues • Does the parent suspend execution while its child processes are executing, or does it continue to execute alongside them? • What memory, if any, does a parent share with its children, or the children share among themselves?

  22. Answers in the last example • the parent process suspended execution while the children ran • variables shared by all processes were indicated explicitly (the shared declaration)

  23. Process Termination • Simplest case • a process executes its code to completion and then ceases to exist • Complex case • a process may need to continue executing until a certain condition is met and only then terminate (see the sketch below)
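A sketch of the complex case in Java; the volatile flag standing in for the termination condition is an assumption:

    public class RunUntilCondition {
        static volatile boolean done = false;

        public static void main(String[] args) throws InterruptedException {
            Thread worker = new Thread(() -> {
                while (!done) {
                    // ... do a unit of work ...
                }
                // condition met: fall off the end of run() and cease to exist
            });
            worker.start();
            Thread.sleep(100);   // parent decides when the condition holds
            done = true;
            worker.join();
        }
    }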

  24. Statement-Level Parallelism • the classic parbegin/parend notation (pseudocode rather than actual Ada syntax; Occam's PAR construct is similar):

    parbegin
      S1;
      S2;
      ...
      Sn;
    parend;
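parbegin/parend can be approximated in Java by starting one thread per statement and joining them all (an illustrative sketch):

    public class ParBlock {
        public static void main(String[] args) throws InterruptedException {
            Runnable[] statements = {
                () -> System.out.println("S1"),
                () -> System.out.println("S2"),
                () -> System.out.println("Sn")
            };
            Thread[] ts = new Thread[statements.length];
            for (int i = 0; i < ts.length; i++) {
                ts[i] = new Thread(statements[i]);
                ts[i].start();               // parbegin: all run in parallel
            }
            for (Thread t : ts) t.join();    // parend: wait for all to finish
        }
    }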

  25. Statement-Level Parallelism (Fortran 95) • the body of a FORALL may contain only assignments (and nested FORALLs or WHEREs), so the inner product is expressed with SUM:

    FORALL (I = 1:100, J = 1:100)
      C(I,J) = SUM(A(I,:) * B(:,J))
    END FORALL

  26. Procedure-Level Parallelism (pseudocode)

    x = newprocess(p);
    ...
    killprocess(x);

  • where p is a declared procedure and x is a process designator • similar in spirit to tasks in Ada

  27. Program-Level Parallelism (Unix) • fork creates a process that is an exact copy of the calling process:

    #include <unistd.h>

    if (fork() == 0) {
      /* ... child executes this part */
    } else {
      /* ... parent executes this part */
    }

  • a return value of 0 indicates the process is the child

  28. Java threads • Built into Java • the Thread class is part of the java.lang package • the reserved word synchronized establishes mutual exclusion • to use one: create an instance of a Thread object and define the run method that will execute when the thread starts

  29. Java threads • 2 ways (I'll show you the second, more versatile way; a sketch of the first is below) • define a class that implements the Runnable interface (defining its run method) • then pass an object of this class to the Thread constructor • Note: every Java program is already executing inside a thread whose run method is main.
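For completeness, a sketch of the first way, subclassing Thread directly (the class name is assumed):

    class MyThread extends Thread {
        @Override
        public void run() {
            System.out.println("running in " + getName());
        }

        public static void main(String[] args) throws InterruptedException {
            MyThread t = new MyThread();
            t.start();   // executes run() in a new thread
            t.join();
        }
    }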

  30. Java Thread Example

    class MyRunner implements Runnable {
      public void run() { ... }
    }

    MyRunner m = new MyRunner();
    Thread t = new Thread(m);
    t.start();  // t will now execute the run method

  31. Destroying threads • let each thread run to completion • wait for other threads to finish:

    t.start();
    // do some other work
    t.join();       // wait for t to finish

  • or interrupt it:

    t.start();
    // do some other work
    t.interrupt();  // tell t we are waiting
    t.join();       // wait for t to finish

  32. Mutual exclusion

    class Queue {
      ...
      synchronized public Object dequeue() {
        if (empty()) throw ...
      }
      synchronized public Object enqueue(Object obj) {
        ...
      }
      ...
    }

  33. Mutual exclusion

    class Remover implements Runnable {
      public Remover(Queue q) { ... }
      public void run() { ... q.dequeue() ... }
    }

    class Inserter implements Runnable {
      public Inserter(Queue q) { ... }
      public void run() { ... q.enqueue(...) ... }
    }

  34. Mutual exclusion

    Queue myqueue = new Queue(...);
    ...
    Remover r = new Remover(myqueue);
    Inserter i = new Inserter(myqueue);
    Thread t1 = new Thread(r);
    Thread t2 = new Thread(i);
    t1.start();
    t2.start();

  35. Manually stalling a thread and then reawakening it

    class Queue {
      ...
      synchronized public Object dequeue() {
        try {
          while (empty()) wait();
        } catch (InterruptedException e) {
          // reset interrupt
          ...
        }
        ...
      }
      synchronized public Object enqueue(Object obj) {
        ...
        notifyAll();
      }
      ...
    }
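A self-contained sketch of the same stall-and-reawaken pattern, with a LinkedList standing in for the queue's storage (an assumption; the slide's Queue internals are elided):

    import java.util.LinkedList;

    class BlockingQueueSketch {
        private final LinkedList<Object> items = new LinkedList<>();

        public synchronized Object dequeue() throws InterruptedException {
            while (items.isEmpty())
                wait();                   // stall until an item arrives
            return items.removeFirst();
        }

        public synchronized void enqueue(Object obj) {
            items.addLast(obj);
            notifyAll();                  // reawaken any stalled consumers
        }
    }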
