This presentation examines Chapel and X10 as high-productivity languages for parallel programming and contrasts them with the traditional MPI approach. It highlights language features that improve programmer productivity, illustrated by a publish/subscribe case study and language examples. We discuss data distribution models, communication paradigms, and synchronization methods, and show how these languages can simplify complex parallel programming tasks and improve overall performance in high-performance computing environments.
High Productivity Languages for Parallel Programming Compared to MPI
Scott Spetka – SUNYIT and ITT Corp
Haris Hadzimujic – SUNY Institute of Technology
Stephen Peek – Binghamton University
Christopher Flynn – Air Force Research Laboratory, Information Directorate
HPC Users Group Conference, Seattle, WA, July 15 – 17, 2008
Outline
• Introduction to Chapel, X10, MPI
• Pub/Sub Case Study
• Language Examples
• Conclusion
DoD HPCS
• Improved programmer productivity
• Latest Chapel release: version 0.775, March 2008 – remote processing support http://chapel.cs.washington.edu/
• New X10 language report: version 1.7 – June 18, 2008
Introduction to Chapel, X10, MPI
• Data Distribution – Partitioned global address space (PGAS)
• Communication Model – One-sided/two-sided
• Synchronization – Sync variables (Chapel), Clocks (X10), Atomic Sections (both)
• Parallel Threads – Async, Futures (X10), Cobegin (Chapel)
• Performance – Prototypes demonstrate features – 2010
PGAS vs. Fragmented

Global view (PGAS):

    var n : int = 1000;
    var A, B: [1..n] float;
    forall i in 2..n-1
      B(i) = (A(i-1) + A(i+1)) / 2;

Fragmented (per-task) view:

    var n : int = 1000;
    var locN : int = n / numTasks;
    var A, B: [0..locN+1] float;
    var myItLo : int = 1;
    var myItHi : int = locN;
    if (iHaveLeftNeighbor) then
      send(left, A(1));
    else
      myItLo = 2;
    if (iHaveRightNeighbor) {
      send(right, A(locN));
      recv(right, A(locN+1));
    } else
      myItHi = locN-1;
    if (iHaveLeftNeighbor) then
      recv(left, A(0));
    forall i in myItLo..myItHi do
      B(i) = (A(i-1) + A(i+1)) / 2;

Source: B.L. Chamberlain (Cray), D. Callahan (Microsoft), H.P. Zima (JPL; U of Vienna, Austria), International Journal of High Performance Computing Applications, August 2007
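To make the contrast concrete, here is a minimal C sketch that checks that the two listings compute the same result. It is not taken from the talk: the "tasks" are simulated one after another in a single process, the send/recv pairs are replaced by direct reads from the global array, and the names stencil_global, stencil_fragmented, and MAX_LOC are hypothetical. Arrays are 0-based here, so global index g corresponds to A(g+1) in the listings.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LOC 64  /* hypothetical cap on the per-task block size */

/* Global view: one loop over the interior of the whole array. */
void stencil_global(const double *a, double *b, size_t n) {
    for (size_t i = 1; i + 1 < n; i++)
        b[i] = (a[i - 1] + a[i + 1]) / 2.0;
}

/* Fragmented view, simulated serially: each task owns loc_n elements
 * stored at local indices 1..loc_n, with ghost cells at 0 and loc_n+1. */
void stencil_fragmented(const double *a, double *b, size_t n, size_t num_tasks) {
    assert(num_tasks > 0 && n % num_tasks == 0);
    size_t loc_n = n / num_tasks;
    assert(loc_n >= 1 && loc_n <= MAX_LOC);

    for (size_t t = 0; t < num_tasks; t++) {
        double loc_a[MAX_LOC + 2];
        size_t base = t * loc_n;            /* global index of local element 1 */
        int has_left = t > 0;
        int has_right = t + 1 < num_tasks;

        for (size_t i = 1; i <= loc_n; i++)
            loc_a[i] = a[base + i - 1];

        /* Stand-ins for recv(left, A(0)) and recv(right, A(locN+1)). */
        if (has_left)  loc_a[0] = a[base - 1];
        if (has_right) loc_a[loc_n + 1] = a[base + loc_n];

        /* Boundary tasks skip the element that has no neighbor. */
        size_t lo = has_left ? 1 : 2;
        size_t hi = has_right ? loc_n : loc_n - 1;
        for (size_t i = lo; i <= hi; i++)
            b[base + i - 1] = (loc_a[i - 1] + loc_a[i + 1]) / 2.0;
    }
}
```

Calling both functions on the same input with n divisible by num_tasks should leave identical contents in the two output arrays. The point of the sketch is how much index bookkeeping the fragmented version needs for the same arithmetic.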
Global vs Local View
Source: B.L. Chamberlain (Cray), D. Callahan (Microsoft), H.P. Zima (JPL; U of Vienna, Austria), International Journal of High Performance Computing Applications, August 2007
Pub/Sub Model
Pub/Sub Introduction
• Publisher – Publishes XML documents
• Pubcatcher – Publication input to brokers
• Subscriber – Submits XPath subscriptions
• Broker – Matches subscriptions against publications
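The broker's matching step can be sketched in C as follows. This is an illustration under loud assumptions, not the system described in the talk: publications are reduced to single integers instead of XML documents, and a subscription is a hypothetical threshold predicate standing in for an XPath expression (the thresholds echo the `A[p]>40` and `A[p]<40` tests in the X10 example). All type and function names are invented for the sketch.

```c
#include <stddef.h>

/* Hypothetical stand-in for an XPath subscription:
 * match publications strictly above or below a threshold. */
typedef enum { SUB_GREATER, SUB_LESS } sub_kind;

typedef struct {
    int subscriber_id;
    sub_kind kind;
    int threshold;
} subscription;

/* Returns nonzero if publication value pub satisfies subscription s. */
static int sub_matches(const subscription *s, int pub) {
    return s->kind == SUB_GREATER ? pub > s->threshold
                                  : pub < s->threshold;
}

/* Broker step: compare one publication against every subscription and
 * write the ids of matching subscribers into out (at most out_cap).
 * Returns the number of ids written. */
size_t broker_match(const subscription *subs, size_t n_subs, int pub,
                    int *out, size_t out_cap) {
    size_t n = 0;
    for (size_t i = 0; i < n_subs && n < out_cap; i++)
        if (sub_matches(&subs[i], pub))
            out[n++] = subs[i].subscriber_id;
    return n;
}
```

With subscriptions {1, SUB_GREATER, 40} and {2, SUB_LESS, 40}, a publication of 50 is delivered only to subscriber 1, a publication of 10 only to subscriber 2, and a publication of 40 to neither.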
Pub/Sub Model - PGAS
Pub/Sub Model - Fragmented
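Chapel

    type elemType = int(32);
    config const numPublishers = 2,
                 numBrokers = 2,
                 bufferSize = 12;

    // Message buffer shared by publishers and brokers, distributed cyclically
    const ProblemSpace: domain(1) distributed(Cyclic) = [0..bufferSize-1];
    var buff: [ProblemSpace] elemType;

    // Sync variables guarding the next free and next full buffer slots
    var nextFreeSlot$: sync int = 1;
    var nextFullSlot$: sync int = 1;

    def main() {
      // Run all publishers and all brokers concurrently
      cobegin {
        coforall i in 1..numPublishers { publisher(i); }
        coforall i in 1..numBrokers { broker(i); }
      }
    }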
Chapel

    def publisher(id: int) {
      var pub = infile.read(int);
      for slot in getNextFreeSlot() {
        writeln("Publisher:", id, " published:", pub, " in slot:", slot);
        buff(slot) = pub;
        sleep(3);
        pub = infile.read(int);
      }
    }
Chapel

    def getNextFreeSlot() {
      // Access the next free message queue slot
      while (1) {
        const locFree = nextFreeSlot$;                  // consume sync var
        const nextFree = (locFree + 1) % bufferSize;
        if (nextFree == nextFullSlot$.readXX()) {
          // we wrapped around so don't yield anything, but allow others to
          // continue by refilling the sync var with the same value
          nextFreeSlot$ = locFree;
        } else {
          nextFreeSlot$ = nextFree;   // refill sync var with advanced value
          yield locFree;              // yield the free slot that we grabbed
        }
      }
    }
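The index arithmetic inside getNextFreeSlot() can be isolated in a short C sketch. This is a serial illustration only: it drops the sync-variable semantics (the empty/full handshake that makes the Chapel version safe across concurrent publishers), and the function name reserve_free_slot and the -1 return convention are assumptions made for the sketch.

```c
/* One pass of the loop body in getNextFreeSlot(), without synchronization.
 * Returns the reserved slot and advances *next_free, or returns -1 and
 * leaves *next_free unchanged if advancing would catch up with next_full
 * (the buffer is full). */
int reserve_free_slot(int *next_free, int next_full, int buffer_size) {
    int loc_free = *next_free;
    int advanced = (loc_free + 1) % buffer_size;

    if (advanced == next_full)
        return -1;              /* wrapped around: nothing to hand out */

    *next_free = advanced;      /* publish the advanced index */
    return loc_free;            /* caller may now write buff[loc_free] */
}
```

Because a full buffer is detected by the advanced index meeting next_full, one slot always stays unused: with buffer_size 12 and both indices starting at 1, eleven slots can be reserved before the function reports a full buffer.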
X10

    // Declaration of global one dimensional array that will be distributed
    final static int [.] A = new int [[1:8]] (point[i]) { return i*10; };
    // Cyclic distribution definition using region of A for distribution scope
    final static dist d = dist.factory.cyclic(A.region);

    public static void main(String args[]) {
      System.out.println("\n\nTotal places: " + place.MAX_PLACES + "\n");
      System.out.println("ID of the distribution: " + here + "\n");
      finish ateach (final point p : d) {
        System.out.println("Execution place: " + d[p] + " and value: " + A[p]);
      }
      subscription(1);
      subscription(2);
    } // end main
X10

    static void subscription(final int i) {
      foreach (point p : d) {
        async (d.distribution[p]) {
          switch (i) {
            case 1:
              if (A[p] > 40) {
                A[p] = A[p] + 1;
                System.out.println("Location " + here + " value" + A[p]);
              }
              break;  // avoid falling through into case 2
            case 2:
              if (A[p] < 40) {
                A[p] = A[p] - 1;
                System.out.println("Location " + here + " value" + A[p]);
              }
              break;
            default:
              break;
          } // switch
        } // async
      } // foreach
    } // subscription
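The cyclic distribution used for the array A can be illustrated with a small C sketch. It assumes that a cyclic distribution deals region points to places round-robin, starting with the first point on place 0; that mapping is an assumption about `dist.factory.cyclic`, not something stated in the slides, and cyclic_place is a hypothetical helper name.

```c
#include <assert.h>

/* Round-robin owner of a 1-D region point, assuming the first point of
 * the region (region_lo) lands on place 0. */
int cyclic_place(int index, int region_lo, int num_places) {
    assert(num_places > 0 && index >= region_lo);
    return (index - region_lo) % num_places;
}
```

Under this assumption, with the region [1:8] and four places, points 1 and 5 share place 0 and point 8 lands on place 3, so each place holds two of the eight elements.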
MPI

    // Get attribute to determine if the current process is to store data
    MPI_Attr_get(next_comm, NEXT, &next_store_ptr, &flag);
    MPI_Allreduce(next_store_ptr, &next_rank, 1, MPI_INT, MPI_MAX, next_comm);
    next_rank = next_rank % size;
MPI

    if (my_rank == next_rank) {
        MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
        if ((next_rank + 1) < size) {
            *next_ptr = next_rank + 1;
        } else {
            // Last rank: after the % size step, next_rank + 2 selects
            // rank 1 when size > 1, so rank 0 is skipped
            *next_ptr = next_rank + 2;
        }
        MPI_Attr_put(next_comm, NEXT, next_ptr);
        printf("stored on process %i\n", next_rank);
        MPI_Recv(&data_recv, 1, MPI_INT, status.MPI_SOURCE, status.MPI_TAG,
                 MPI_COMM_WORLD, &status);
        data_store[count][0] = data_recv;
        data_store[count][1] = status.MPI_TAG;
        count++;
    }
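The rank-selection arithmetic in the two MPI fragments can be followed without an MPI runtime. The sketch below isolates a single step: the value the storing rank writes into the NEXT attribute, and the rank every process derives from it. It assumes the value just written is the maximum that MPI_Allreduce with MPI_MAX sees across the communicator; the function names are hypothetical and the sketch makes no MPI calls.

```c
/* Value the current storing rank writes into the NEXT attribute. */
int next_store_value(int next_rank, int size) {
    return (next_rank + 1 < size) ? next_rank + 1 : next_rank + 2;
}

/* Rank that stores the next message, given the reduced (maximum)
 * attribute value: mirrors next_rank = next_rank % size. */
int storing_rank(int max_attr, int size) {
    return max_attr % size;
}
```

With four processes, rank 1 hands over to rank 2, while rank 3 writes 5 and hands over to rank 1 rather than rank 0. Compare this with the fragmented bookkeeping that the Chapel sync variables replace.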
Conclusion
• HPCS languages reduce time to solution
• Object Oriented – user-defined distributions, reductions, scans
• Global Synchronization
• One-sided communication
• Adding new tasks
Acknowledgements
• Bradford Chamberlain, Cray
• Igor Peshansky, IBM