Maximizing Java-Based Parallel Computing Efficiency for Large Applications with Javelin 2.0

Java-Based Parallel Computing on the Internet: Javelin 2.0 & Beyond
Michael Neary & Peter Cappello
Computer Science, UCSB

Introduction: Goals
• Service parallel applications that are:
  • Large: too big for a cluster
  • Coarse-grain: to hide communication latency
• Simplicity of use
  • Design focus: decomposition [composition] of computation
• Scalable high performance
  • Despite large communication latency
• Fault tolerance
  • 1000s of hosts, each of which dynamically [dis]associates

Introduction: Some Related Work

Introduction: Some Applications
• Search for extra-terrestrial life
• Computer-generated animation
• Computer modeling of drugs for:
  • Influenza
  • Cancer
  • Reducing chemotherapy’s side effects
• Financial modeling
• Storing nuclear waste

Outline
• Architecture
• Model of Computation
• API
• Scalable Computation
• Experimental Results
• Conclusions & Future Work

Architecture: Basic Components
• Clients
• Brokers
• Hosts

Architecture: Broker Discovery
[Figure, three frames: a host (H) and a set of brokers (B) arranged around the Broker Naming System; the second frame adds the label PING (BID?).]

Architecture: Network of Broker-Managed Host Trees
• Each broker manages a tree of hosts
• Brokers form a network
• Client contacts broker
• Client gets host trees

Scalable Computation: Deterministic Work-Stealing Scheduler
[Figure: a HOST with its task container and three operations: addTask( task ), getTask( ), stealTask( ).]

Scalable Computation: Deterministic Work-Stealing Scheduler
    Task getWork( ) {
        if ( my deque has a task )
            return task;
        else if ( any child has a task )
            return child’s task;
        else
            return parent.getWork( );
    }
[Figure labels: CLIENT, HOSTS]
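The getWork( ) pseudocode above can be sketched in plain Java as follows. This is a minimal illustration under stated assumptions, not Javelin's implementation: the class name `TreeHost`, the use of `String` task labels, and the choice to let owners take from the front of the deque while thieves take from the back are all hypothetical. Children are probed in a fixed order, which is what makes the search deterministic in this sketch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical host in a host tree; names are illustrative, not Javelin's API. */
class TreeHost {
    private final Deque<String> tasks = new ArrayDeque<>();   // local task container
    private final List<TreeHost> children = new ArrayList<>();
    private TreeHost parent;                                   // null at the root

    TreeHost addChild(TreeHost child) {
        child.parent = this;
        children.add(child);
        return child;
    }

    synchronized void addTask(String task) { tasks.addFirst(task); }

    /** The owner takes work from the front of its own deque. */
    synchronized String getTask() { return tasks.pollFirst(); }

    /** A thief takes work from the back (an assumption of this sketch). */
    synchronized String stealTask() { return tasks.pollLast(); }

    /** Own deque first, then the children in fixed order, then the parent. */
    String getWork() { return getWork(null); }

    private String getWork(TreeHost requester) {
        // A request forwarded by a child is served by stealing, not by getTask().
        String task = (requester == null) ? getTask() : stealTask();
        if (task != null) return task;
        for (TreeHost child : children) {
            if (child == requester) continue;   // the requester is known to be empty
            task = child.stealTask();
            if (task != null) return task;
        }
        return (parent == null) ? null : parent.getWork(this);
    }
}
```

With a root and two children `a` and `b`, a call to `a.getWork()` on an empty `a` falls through to the root, which in turn steals from `b`.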

Models of Computation
• Master-slave
  • As far as we know, this covers all proposed commercial applications
• Branch-&-bound optimization
  • A generalization of master-slave

Models of Computation: Branch & Bound
[Figure, seven frames: a search tree with node costs 0 7 2 3 8 6 10 4 3 8 7 12 10 9 10. Each frame shows the current bounds and the nodes labelled so far.]
  Frame 1: UPPER = ∞, LOWER = 0; nodes 0
  Frame 2: UPPER = ∞, LOWER = 2; nodes 0 2
  Frame 3: UPPER = ∞, LOWER = 3; nodes 0 2 3
  Frame 4: UPPER = 4, LOWER = 4; nodes 0 2 3 4
  Frame 5: UPPER = 3, LOWER = 3; nodes 0 2 3 4 3
  Frame 6: UPPER = 3, LOWER = 6; nodes 0 2 3 6 4 3
  Frame 7: UPPER = 3, LOWER = 7; nodes 0 7 2 3 6 4 3

Models of Computation: Branch & Bound
[Figure: the explored part of the search tree, nodes 0 7 2 3 6 4 3.]
• Tasks created dynamically
• Upper bound is shared
• To detect termination, the scheduler detects tasks that have been:
  • Completed
  • Killed (“bounded”)
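One simple way to picture termination detection with dynamically created tasks is a counter of outstanding tasks. The sketch below is an assumption made for illustration, not the scheme Javelin uses: the class `TerminationDetector` and its methods are hypothetical, and it assumes every task, including a killed one, is explicitly reported, and that a parent registers its children before it reports its own completion.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical bookkeeping: the computation is over when no task is outstanding. */
class TerminationDetector {
    private final AtomicLong outstanding = new AtomicLong(); // created, not yet settled
    private final AtomicLong settled = new AtomicLong();     // completed or killed

    /** Call when a task is created (the root task, or a child from branching). */
    void taskCreated() { outstanding.incrementAndGet(); }

    /** Call when a task has been fully processed. */
    void taskCompleted() { settle(); }

    /** Call when a task is discarded because its lower bound is not below the upper bound. */
    void taskKilled() { settle(); }

    private void settle() {
        settled.incrementAndGet();
        outstanding.decrementAndGet();
    }

    /** True once at least one task has been settled and none remains outstanding. */
    boolean isTerminated() {
        return outstanding.get() == 0 && settled.get() > 0;
    }
}
```

Registering children before settling the parent matters: otherwise the outstanding count could briefly reach zero while work still exists.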

API
    public class Host implements Runnable {
        . . .
        public void run() {
            while ( (node = jDM.getWork()) != null ) {
                if ( isAtomic() )
                    compute(); // search space; return result
                else {
                    child = node.branch(); // put children in child array
                    for (int i = 0; i < node.numChildren; i++)
                        if ( child[i].setLowerBound() < UpperBound )
                            jDM.addWork( child[i] );
                        //else child is killed implicitly
                }
            }
        }

API
        private void compute() {
            . . .
            boolean newBest = false;
            while ( (node = stack.pop()) != null ) {
                if ( node.isComplete() ) {
                    // a complete solution: keep it only if it beats the bound
                    if ( node.getCost() < UpperBound ) {
                        newBest = true;
                        UpperBound = node.getCost();
                        jDM.propagateValue( UpperBound );
                        best = new Node( node );
                    }
                } else {
                    // a partial solution: branch, keeping children that can still win
                    child = node.branch();
                    for (int i = 0; i < node.numChildren; i++)
                        if ( child[i].setLowerBound() < UpperBound )
                            stack.push( child[i] );
                        //else child is killed implicitly
                }
            }
            if ( newBest )
                jDM.returnResult( best );
        }
    }

Scalable Computation: Weak Shared Memory Model
• Slow propagation of the bound affects performance, not correctness.
[Figure label: Propagate bound]
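A small sketch may help show why a stale bound costs only performance. It assumes a minimization problem in which each host keeps its own copy of the upper bound and merges propagated values with a minimum; the class `LocalBound` and its methods are hypothetical names, not part of the Javelin API. A stale copy is never smaller than the true best cost, so a host may explore nodes it could have skipped, but it never prunes a node that could still improve the solution.

```java
/** Hypothetical per-host copy of the shared upper bound (minimization). */
class LocalBound {
    private int upper = Integer.MAX_VALUE;   // "infinity" until a solution is known

    /** Merge a propagated value; late or out-of-order messages are harmless. */
    synchronized void receive(int propagated) {
        upper = Math.min(upper, propagated);
    }

    /** A node is killed only if its lower bound cannot beat the local copy. */
    synchronized boolean shouldKill(int lowerBound) {
        return lowerBound >= upper;
    }

    synchronized int value() { return upper; }
}
```

For example, a host that has not yet heard about a solution of cost 3 still expands a node with lower bound 5; once the value arrives, the same node is killed.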

Scalable Computation: Fault Tolerance via Eager Scheduling
When:
• All tasks have been assigned
• Some results have not been reported
• A host wants a new task
Re-assign a task!
• Eager scheduling tolerates faults & balances the load.
• Computation completes if at least 1 host communicates with the client.
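The re-assignment rule above can be sketched with a fixed set of integer task ids. This is an illustrative assumption, not Javelin's scheduler: the class `EagerScheduler`, its method names, and the round-robin choice of which unreported task to hand out again are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical eager scheduler over task ids 0..taskCount-1. */
class EagerScheduler {
    private final Deque<Integer> unassigned = new ArrayDeque<>();
    private final Set<Integer> pending = new LinkedHashSet<>(); // assigned, no result yet

    EagerScheduler(int taskCount) {
        for (int i = 0; i < taskCount; i++) unassigned.add(i);
    }

    /** Returns a task id, or null once every result has been reported. */
    synchronized Integer requestTask() {
        Integer task = unassigned.poll();
        if (task != null) {
            pending.add(task);
            return task;
        }
        if (pending.isEmpty()) return null;
        // All tasks are assigned but some results are missing: re-assign one.
        Integer again = pending.iterator().next();
        pending.remove(again);
        pending.add(again);          // move to the back so re-assignments rotate
        return again;
    }

    /** Returns true for the first result of a task; duplicates are ignored. */
    synchronized boolean reportResult(int task) {
        return pending.remove(task);
    }

    synchronized boolean isDone() {
        return unassigned.isEmpty() && pending.isEmpty();
    }
}
```

If a host takes a task and then disappears, any host that asks for work later receives that task again, so a single live host is enough to finish the computation in this sketch.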

Scalable Computation: Fault Tolerance via Eager Scheduling
[Figure: the explored part of the search tree, nodes 0 7 2 3 6 4 3.]
• Scheduler must know which:
  • Tasks have completed
  • Nodes have been killed
• Performance requires balancing:
  • Centralized schedule info
  • Decentralized computation

Experimental Results

Experimental Results: Example of a “bad” graph
[Figure: search tree with node costs 0 7 2 3 8 6 10 4 3 8 7 12 10 9 10.]

Conclusions
• Javelin 2 relieves the designer/programmer of managing a set of [Inter-]networked processors that is:
  • Dynamic
  • Faulty
• A wide set of applications is covered by:
  • Master-slave model
  • Branch & bound model
• Weak shared memory performs well.
• Use multicast (?) for:
  • Code distribution
  • Propagating values

Future Work
• Improve support for long-lived computation:
  • Do not require that the client run continuously.
• A dag model of computation
  • With limited weak shared memory

Future Work: Jini/JavaSpaces Technology
• “Continuously” disperse tasks among brokers via a physics model
[Figure: a TaskManager (aka Broker) surrounded by hosts (H).]

Future Work: Jini/JavaSpaces Technology
• TaskManager uses persistent JavaSpace
  • Host management: trivial
  • Eager scheduling: simple
• No single point of failure
  • Fat tree topology

Future Work: Advanced Issues
• Privacy of data & algorithm
• Algorithms
  • New computation-communication complexity model
  • N-body problem, …
• Accounting: associate specific work with specific host
  • Correctness
  • Compensation (how to quantify?)
• Create open source organization
  • System infrastructure
  • Application codes
