
X10: IBM’s bid into parallel languages


Presentation Transcript


  1. X10: IBM’s bid into parallel languages Paul B. Kohler and Kevin S. Grimaldi, University of Massachusetts Amherst

  2. Introduction • A new language based on Java • IBM’s entry in DARPA’s PERCS project (Productive, Easy-to-use, Reliable Computing Systems) • Built for NUCCs (Non-Uniform Computing Clusters), where different memory locations incur different access costs.

  3. Introduction (cont.) • Will eventually be combined with new tools for Eclipse • Goals: • Safe • Analyzable • Scalable • Flexible

  4. PGAS • Past attempts at parallel languages have used the illusion of a single shared memory • This does not reflect the situation on NUCCs • Problems occur when we try to divide memory among processors • X10 uses PGAS to expose the non-uniformity and make the language scalable.

  5. PGAS (cont.) • PGAS = Partitioned Global Address Space • Memory is partitioned into places; data is associated with a place and can only be read/changed locally • Provided in X10 through the abstractions of places and activities.

  6. Places • Contain a collection of resident mutable data objects and associated activities • Places represent locality boundaries • Very efficient access to resident data • Set of places remains fixed at runtime • Places are virtual • Mapped to physical processors by runtime • Runtime may transparently migrate places

  7. Using Places • All places are accessible via place.places • The first activity runs at place.FIRST_PLACE • Iterate over places with next() and prev() • here denotes the current place
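Putting those names together, a short sketch (the loop shape and place.MAX_PLACES are assumptions based on the early X10 spec, not taken from the slides):

```x10
// Start an activity at each place in turn, walking the ring of places.
place p = place.FIRST_PLACE;
for (int i = 0; i < place.MAX_PLACES; i++) {
    async (p) { System.out.println("running at " + here); }
    p = p.next();   // next() wraps around past the last place
}
```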

  8. Activities • Similar to Java threads • Each activity is associated with a place • Activities never migrate between places • An activity may only read/modify mutable data that is local to its place • However, immutable data (i.e., final or value) may be accessed by any activity.

  9. Activities (cont.) • Activities are GALS (Globally Asynchronous, Locally Synchronous) • Local data accesses are synchronized • Global data accesses are not, by default; synchronization can be forced explicitly.

  10. Activities: Syntax • It is very simple to spawn new activities: async (place) statement • This runs the specified statement at the specified place • Example: final int result; async (here.next()) { result = a + b; } • This adds two numbers at the adjacent place and stores the result (since result is final, it can be accessed from other places).

  11. Type System • X10 is strongly typed • Unified type system • Everything is an object; no primitive types • Library supplies boolean, byte, short, char, int, long, float, double, complex, String classes • Borrows Java’s single inheritance combined with interfaces

  12. Reference vs Value Types • Two kinds of objects • Value types are immutable and can be freely copied • Reference types can contain mutable fields but cannot be migrated • Value classes are declared with the value keyword instead of class • Value classes can still contain fields of reference types • This allows them to refer to mutable data • Copying ‘bottoms out’ at reference fields

  13. Type System (cont.) • Objects are either scalar or aggregate • Both value and reference types can be either scalar or aggregate • Types consist of two parts • Data type – the set of values it can take • Place type – the place at which it resides • No generics (yet)

  14. Variables • Variables must be initialized (they can never be observed without a value) • final variables cannot be changed after initialization • Declared with the final keyword and/or a variable name that starts with a capital letter

  15. Nullable Types • The designers view the ability to hold null as orthogonal to the value vs. reference distinction • Either reference or value types can be preceded by nullable • Adds a null value to the type • Multiple nullables collapse (i.e., nullable nullable T = nullable T) • Can cast between T and nullable T • (nullable T) v always succeeds • (T) null throws an exception if T is not nullable
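The casting rules above can be illustrated with a small sketch (illustrative only, not compiled against any X10 release):

```x10
nullable int n = null;              // nullable adds null to int's value set
int x = 3;
nullable int m = (nullable int) x;  // widening to nullable T always succeeds
int y = (int) m;                    // succeeds: m actually holds a value
int z = (int) n;                    // throws at runtime: int is not nullable
```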

  16. Rooted exceptions • What should happen when a thread/activity terminates abnormally? • In Java it is unclear, since the spawning thread may have already terminated • X10 uses a rooted exception model: all uncaught exceptions get passed to the calling activity • A new blocking construct, finish s, is introduced; it waits for all activities spawned in s to terminate before proceeding.

  17. Exceptions (cont.) • finish allows exceptions to travel back toward the root activity and possibly be caught and handled along the way • Example: try { finish async (here.next()) { throw new Exception(); } } catch (Exception e) { }

  18. Arrays • X10 features an array sub-language similar to ZPL. • Arrays have: • Regions • Distributions • Arrays are operated on by: • for • foreach • ateach • And more!

  19. Even more arrays • Arrays may be value (immutable) or reference (mutable) • The keyword unsafe allows arrays that interoperate with Java code • Arrays can run initializer code at creation.

  20. Arrays: Regions • Regions: as in ZPL, a region is a set of indexed data points • Regions and distributions are first-class constructs • Regions can be specified like this: [0:128,0:256] creates a 129×257 region (both bounds are inclusive)

  21. Regions (cont.) • Regions can be modified by operations such as union (||), intersection (&&), and set difference (-) • Predefined region types can be constructed using factories: region R2 = region.factory.upperTriangular(25); • In the future, users may be able to define their own regions.
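The region operators can be combined like this (a sketch; only upperTriangular comes from the slide, the rest assumes the early spec's inclusive-bound syntax):

```x10
region all   = [0:127, 0:255];   // a 128x256 grid (inclusive bounds)
region inner = [1:126, 1:254];   // its interior
region edge  = all - inner;      // set difference: just the border points
region upper = region.factory.upperTriangular(25);
region overlap = all && upper;   // intersection of two regions
```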

  22. Arrays: Distributions • Every array has a distribution • A distribution is a mapping of array elements to places • Distributions are over a particular region • Arrays are typed by their distribution.

  23. Distributions (cont.) • Currently you must use predefined distributions (unique, block, cyclic, etc.) • Distributions support set operations like regions do • Distributions can be used as functions: for a point p and distribution d, d[p] = the place to which p maps (i.e., where the p’th element “lives”).
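A sketch of using a distribution as a function (dist.factory.block is an assumption, by analogy with the region.factory call on the earlier slide):

```x10
region R = [0:99];
dist D = dist.factory.block(R);  // spread the 100 elements evenly over all places
point p = [42];
place owner = D[p];              // the place where the 42nd element lives
async (owner) {
    // run work co-located with that element
}
```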

  24. Subarrays • Use the boolean operations on distributions to create subdistributions • To get the portion of a block distribution that is located here: block([1:100]) && ([1:100] -> here) • a | D1 is the portion of array a corresponding to the subdistribution D1

  25-28. Array construction • Here is an example of array initialization: float [.] data = new float[dist.factory.cyclic([0:200,50:250])] (point [i,j]) { return i+j; }; • [0:200,50:250] specifies a 201×201 region (bounds are inclusive) • factory.cyclic specifies a cyclic distribution over that region • The trailing initializer sets each element to the sum of its i and j coordinates.

  29. Array iteration • Once you have an array, what can you do with it? • Array iterators: for, foreach, ateach • for: sequentially iterates over a supplied region; at each point it binds the point to a variable and executes the accompanying statement • foreach: as with for, but iterations run in parallel; that is, it spawns a new activity for each point • ateach: takes a distribution instead of a region; runs each iteration in parallel at the place specified by the distribution.

  30. Iteration example • Example: for (point p : A) { A[p] = A[p] * A[p]; }
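The parallel forms look almost identical; a sketch (the A.region and A.distribution property names are assumptions, not taken from the slides):

```x10
// foreach: one activity per point, all spawned at the current place
foreach (point p : A.region) { A[p] = A[p] * A[p]; }

// ateach: one activity per point, each run at the place that owns A[p]
ateach (point p : A.distribution) { A[p] = A[p] * A[p]; }
```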

  31. More array ops • lift: takes a binary function and two arrays with the same distribution; produces a new array formed by pointwise application of the function to the two arrays • reduce: as in MPI, applies a binary function to every element to produce a single value • scan: creates a new array whose i’th element is the result of reducing the first i elements.
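Roughly, the three operations behave like this (the method calls and the intAdd operator object are hypothetical names for illustration; only lift/reduce/scan themselves come from the slide):

```x10
int [.] a = new int[dist.factory.block([0:9])] (point [i]) { return i; };
int [.] b = new int[dist.factory.block([0:9])] (point [i]) { return 1; };

int [.] sums   = a.lift(intAdd, b);   // pointwise: sums[i] = a[i] + b[i]
int    total   = a.reduce(intAdd, 0); // 0+1+...+9 = 45
int [.] prefix = a.scan(intAdd, 0);   // prefix[i] = a[0]+...+a[i]
```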

  32. Atomic Blocks • X10 allows you to define atomic blocks • The contents of a block are guaranteed to execute as a single atomic step, but only with respect to other activities at the same place • While atomicity is guaranteed, the details are implementation-specific • Syntax: atomic S
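For example (a sketch; count, from, to, and amount are assumed to be mutable fields local to this place):

```x10
// Increment a shared counter; no other activity at this place
// can observe the read and the write as separate steps.
atomic { count = count + 1; }

// A two-field update that must never be seen half-done locally.
atomic {
    from = from - amount;
    to   = to + amount;
}
```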

  33. Conditional Atomic Blocks • X10 also provides: when (cond) S • This blocks until cond is true, then executes S atomically • This allows the construction of a number of synchronization mechanisms • Dangerous! If cond is never true, or if there is a cyclic dependency, deadlock occurs.
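A classic use is a one-slot buffer (a sketch; full and datum are assumed fields of an enclosing class):

```x10
void put(int v) {
    when (!full) { datum = v; full = true; }    // blocks while the slot is full
}

int get() {
    when (full) { full = false; return datum; } // blocks while the slot is empty
}
```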

  34. Future and Force • As discussed before, futures allow the asynchronous computation of a value that may be used later • A future returns an object of type Future&lt;T&gt; • force is a blocking call that waits for a particular future to complete.

  35. Futures (cont.) • A future’s body can only access final variables; this prevents side effects • Syntax: future (p) e • Example: Future&lt;float&gt; blah = future (here.next()) { sqrt(a*a + b*b) };

  36. Clocks • Act as barriers, but are much more flexible • Guarantee freedom from deadlock • Can be dynamically associated with different sets of activities

  37. Clock Semantics • Activities register with zero or more clocks • They can register/unregister at any time • A clock is always in some phase • It does not advance until all currently registered activities quiesce • Activities quiesce with the next operation • next indicates an activity is ready for all of its clocks to advance • It suspends until all those clocks have advanced • This discipline makes deadlock on clocks impossible.
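A two-phase computation might look like this (a sketch; the clocked clause and clock.factory.clock() follow the early spec, and phaseOne/phaseTwo are hypothetical methods):

```x10
final clock c = clock.factory.clock();
foreach (point [i] : [1:4]) clocked (c) {
    phaseOne(i);   // every activity does its phase-one work
    next;          // quiesce: wait until all four activities reach this point
    phaseTwo(i);   // no activity starts phase two early
}
```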

  38. Status • IBM has reportedly built a single-VM reference implementation • The language is still under heavy revision • A GPL’ed X10-XTC compiler is available • It doesn’t conform to the current language spec • It uses what will possibly become version 0.5 • Speculatively contains support for operator overloading and generics • Performance is currently very poor.

  39-43. Conclusion • So is X10 the answer to all our parallel programming woes? • In my opinion, probably not • Parallelism is still very explicit; there are still opportunities for deadlock, race conditions, etc. • It takes a “…and the kitchen sink” approach, which makes learning the syntax a chore • It’s not FORTRAN. Will people bother to use it?
