Survey of Programming Models for Data Oriented Grid Computing

Douglas Thain (dthain@nd.edu), University of Notre Dame, 1 November 2007

Data Oriented Programming Models
• Overview and Challenges
• Examples of Languages
    • DAG Oriented
    • Database Oriented
    • Abstraction/Pattern Oriented
• Current Work at Notre Dame
    • Assembly: Chirp
    • Abstraction: All Pairs
    • Language: DataLab
• Ruminations on Language Requirements

Overview
• Survey of models for expressing large data-intensive workloads, typically constructed by assembling sequential components together.
• Some challenges are similar to CPU parallelism:
    • How does the user express parallelism?
    • Can the system discover and exploit parallelism?
    • What is the optimal decomposition of a problem?
• Some challenges are particular to data:
    • System state persists across executions.
    • Component behavior is not well specified.
    • Bad decisions can result in a 1000x slowdown.
• Thus, the goal is usually to avoid awful cases.

Commonalities
• Most data-oriented languages are declarative rather than imperative.
• Why is this necessary?
    • Enormous number of failure modes.
    • User doesn’t want to know the whole ugly story.
• Primitive operations are persistent.
    • Batch job has lifetime independent of submitter.
    • Probability of coordinator failing is high.
    • Need transactions, leases, and logging to recover.
• System bears the responsibility of cleanup.
    • Cannot simply Ctrl-C the coordinator cleanly.

From Previous Talks
• Languages
    • The system provides a set of primitive operations that the user may combine together in many ways.
    • The system may optimize certain cases, but cannot predict all uses, so the programmer must be careful.
• Abstractions or Patterns
    • The system provides a very restricted interface and the user can only solve problems that fit.
    • The system can provide a very good implementation of the restricted case, so the user can be naive.
• Obviously, there is a continuum between the two extremes. Most grid languages tend closer to the abstraction side of the graph.

DAG Oriented Languages

DAGMan

JOB A a.submit
JOB B b.submit
JOB C c.submit
JOB D d.submit
PARENT A CHILD B
PARENT B CHILD C
PARENT B CHILD D

[Figure: the DAG of jobs A, B, C, and D. Labels: submit dag, DAGMan, submit jobs, Condor, execute jobs, CPU, LOG, job status.]

Douglas Thain and Miron Livny, “Condor and the Grid”, in Berman, Hey, and Fox, “Grid Computing: Making the Global Infrastructure a Reality”, John Wiley, 2003.
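The ordering implied by the PARENT/CHILD lines can be sketched in a few lines of Python. This is only an illustration of dependency-ordered dispatch, not how DAGMan itself is implemented; the names `SLIDE_DAG`, `parse_dag`, and `run_order` are invented for the example, and the parser assumes exactly one parent and one child per PARENT line, as on the slide.

```python
from collections import defaultdict, deque

# The DAG description from the slide, copied verbatim.
SLIDE_DAG = """
JOB A a.submit
JOB B b.submit
JOB C c.submit
JOB D d.submit
PARENT A CHILD B
PARENT B CHILD C
PARENT B CHILD D
"""

def parse_dag(text):
    """Parse JOB and PARENT..CHILD lines (one parent, one child per line)."""
    jobs, children, n_parents = {}, defaultdict(list), defaultdict(int)
    for line in text.strip().splitlines():
        words = line.split()
        if not words:
            continue
        if words[0] == "JOB":
            jobs[words[1]] = words[2]
        elif words[0] == "PARENT":
            parent, child = words[1], words[3]
            children[parent].append(child)
            n_parents[child] += 1
    return jobs, children, n_parents

def run_order(text):
    """Return one order in which jobs could be released for execution."""
    jobs, children, n_parents = parse_dag(text)
    ready = deque(j for j in jobs if n_parents[j] == 0)
    order = []
    while ready:
        job = ready.popleft()
        order.append(job)  # a real coordinator would submit jobs[job] here
        for child in children[job]:
            n_parents[child] -= 1
            if n_parents[child] == 0:
                ready.append(child)
    if len(order) != len(jobs):
        raise ValueError("dependency cycle detected")
    return order
```

For the slide's DAG, `run_order(SLIDE_DAG)` releases A first, then B, then C and D, which may run in parallel once B has finished.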

Data Dependencies

Control dependencies are almost always expressible as data dependencies. If the system is aware of the data interactions, it can protect limited resources, make better scheduling decisions, and be more robust to failures.

Example: Don’t stage out intermediate files; leave them in place for the next execution. If they are lost, re-execute the creator.

[Figure: jobs A, B, C, and D connected through data files.]
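The "leave in place, re-execute the creator if lost" policy can be sketched as a make-like recursive lookup. Everything here is hypothetical: `store` stands in for storage near the CPUs, `creators` maps each file to the function and inputs that produce it, and `log` only records which creators actually ran.

```python
def ensure(name, creators, store, log):
    """Return the data for `name`, re-running its creator only if it is missing.

    creators: dict mapping file name -> (function, list of input file names)
    store:    dict standing in for files left in place on scratch storage
    log:      list that records which creators were executed
    """
    if name in store:
        return store[name]           # still in place: no work needed
    func, inputs = creators[name]
    args = [ensure(i, creators, store, log) for i in inputs]
    log.append(name)                 # the creator is (re-)executed
    store[name] = func(*args)
    return store[name]
```

If an intermediate file disappears, only that file and whatever is rebuilt from it get recomputed; upstream data that survived is reused as-is.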

BAD-FS

JOB A a.submit
JOB B b.submit
JOB C c.submit
PARENT A CHILD B
PARENT B CHILD C
VOLUME INPUT source-url
VOLUME S1 scratch
VOLUME S2 scratch
MOUNT INPUT A /data
MOUNT INPUT D /data
MOUNT S1 A /tmp
MOUNT S2 D /tmp
EXTRACT S1 out.dat target-url
EXTRACT S2 out.dat target-url

[Figure: two pipelines, A, B, C and D, E, F, both reading the INPUT volume. The first uses scratch volume S1 and the second uses S2; each produces out.dat.]

John Bent, Douglas Thain, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, and Miron Livny, "Explicit Control in a Batch Aware Distributed File System", NSDI 2004.

Pegasus

[Figure: an abstract DAG (in1, A, temp, B, temp, C, D, out1, out2) is mapped to a concrete DAG by a single-pass translation to DAGMan. The concrete DAG adds transfers to gridftp://server/data, gridftp://other/data, gridftp://home/out1, and gridftp://home/out2, with temporary files /tmp/data1 and /tmp/data2.]

Ewa Deelman et al., “Pegasus: Mapping Scientific Workflows onto the Grid”, Scientific Programming Journal, Volume 13, Number 3, 2005.

Dryad

Iterative construction in C++:

GraphBuilder X = grep^A;
GraphBuilder Y = grep^B;
GraphBuilder Z = X <= Y;
GraphBuilder S = sort^C;
GraphBuilder F = fold(Z,S);

[Figure labels: output, fold, cmp, grep, grep, sort, A, B, C.]

M. Isard, M. Budiu, Y. Yu, A. Birrell, D. Fetterly, “Dryad: distributed data-parallel programs from sequential building blocks”, EuroSys 2007.

Database Oriented Languages
• In a given field, many researchers exploit a common toolchain with many different inputs in order to explore a parameter space.
• Idea: Represent programs as standard transformations from one data space to another. Store all results in a database.
• Virtual Data: User simply performs a query in the target space, and doesn’t care whether results are computed or stored.

Chimera

Note: This is not the real Chimera syntax; it has been simplified for clarity.

Transformation Database:

TR simulate( input p, output a ) {
    exec "sim.exe -temp $p >$a";
}
TR analyze( input b, output c ) {
    exec "analyze.exe $b >$c";
}
TR runexpt( input p, output d ) {
    file temp;
    simulate(p,temp);
    analyze(temp,d);
}

Derivation Database:

DV runexpt( 30, file )
DV runexpt( 20, file )
DV runexpt( 10, file )

DV runexpt( 10, file ) would return existing data.
DV runexpt( 15, file ) would execute code and then return the data.

Ian Foster, Jens Vöckler, Michael Wilde, Yong Zhao, “Chimera: A Virtual Data System For Representing, Querying, and Automating Data Derivation”, 2002.
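The virtual-data behavior of the derivation database can be approximated as memoization keyed on the transformation name and its arguments. This is a toy sketch and not Chimera's design: the `VirtualData` class and the arithmetic stand-ins for sim.exe and analyze.exe are invented for illustration.

```python
class VirtualData:
    """Toy transformation and derivation databases held in memory."""

    def __init__(self):
        self.transformations = {}   # name -> callable
        self.derivations = {}       # (name, args) -> stored result
        self.executions = 0         # how many times code actually ran

    def tr(self, name, func):
        """Register a transformation."""
        self.transformations[name] = func

    def dv(self, name, *args):
        """Request a derivation: return stored data or compute and store it."""
        key = (name, args)
        if key not in self.derivations:
            self.executions += 1
            self.derivations[key] = self.transformations[name](*args)
        return self.derivations[key]

def simulate(p):
    return p * 2        # stand-in for running sim.exe

def analyze(b):
    return b + 1        # stand-in for running analyze.exe

def runexpt(p):
    return analyze(simulate(p))

vd = VirtualData()
vd.tr("runexpt", runexpt)
for p in (30, 20, 10):
    vd.dv("runexpt", p)
```

After the three initial derivations, asking for `vd.dv("runexpt", 10)` again returns the stored value without running anything, while `vd.dv("runexpt", 15)` triggers a new execution.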

GridDB
• Same NSF project, except:
    • SQL is the interaction language.
    • Separate pushing of inputs from output query.

[Figure: INSERT adds rows to the input table, sim.exe maps the input table to the output table, and SELECT queries the output table.]

David Liu, Michael Franklin, “GridDB: A Data Centric Overlay for Scientific Grids”, VLDB 2004.

Swift

Derived from Chimera, with three key differences:
- Much improved syntax (IMHO).
- Complex data types.
- Full program text and state stored in file system.

(Run or) reorientRun (Run ir, string dir, string ov ) {
    foreach Volume iv, i in ir.v {
        or.v[i] = reorient(iv,dir,ov);
    }
}

Y. Zhao, et al., “Swift: Fast, Reliable, Loosely Coupled Parallel Computation”, IEEE International Workshop on Scientific Workflows, 2007.

Abstraction/Pattern Languages
• A single system structure is suitable for solving a wide variety of problems.
• Often, the code for the system structure is far more complicated than the application.
• Solution: Let the user provide a few snippets of code to embed in a larger class or pattern.

Master-Worker (MW)

[Figure: a master adds work units to a work queue, sends work assignments to many workers, and receives complete results.]

Used to attack brute-force optimization problems. 100,000s of CPUs in BOINC, Folding@Home, etc.

Master-Worker

void master() {
    queueWorkUnit( base_case );
    while( r = getNextResult() ) {
        if( appl condition ) {
            queueWorkUnit( more );
        } else {
            printResult(r);
        }
    }
}

void worker() {
    while(1) {
        u = getNextWorkUnit();
        r = application work;
        transmitResult(r);
    }
}

Implemented on Condor/Condor-G using PVM/files/sockets for communication.

Goux et al., “An enabling framework for Master-Worker applications on the Computational Grid”, HPDC 2000.
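A runnable rendering of the master and worker loops, using Python threads and in-process queues instead of Condor and PVM. This is a sketch of the pattern only, not the MW framework's API: squaring a number stands in for "application work", and "result below `limit`" stands in for the application condition that queues more work.

```python
import queue
import threading

def worker(work_q, result_q):
    """Pull work units until a None sentinel arrives."""
    while True:
        unit = work_q.get()
        if unit is None:
            break
        result_q.put((unit, unit * unit))   # stand-in for application work

def master(base_cases, limit, n_workers=4):
    """Queue base cases, then queue more work or keep each result."""
    work_q, result_q = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(work_q, result_q))
               for _ in range(n_workers)]
    for t in threads:
        t.start()

    outstanding = 0
    for unit in base_cases:
        work_q.put(unit)
        outstanding += 1

    final = {}
    while outstanding:
        unit, r = result_q.get()
        outstanding -= 1
        if r < limit:                # hypothetical application condition
            work_q.put(r)            # queue more work derived from the result
            outstanding += 1
        else:
            final[unit] = r          # corresponds to printResult(r)

    for _ in threads:                # shut the workers down
        work_q.put(None)
    for t in threads:
        t.join()
    return final
```

Base cases of 0 or 1 would never reach the limit under this toy condition, so the example assumes inputs of 2 or more. The master tracks the number of outstanding units explicitly, which is the piece the slide's pseudocode leaves implicit in `getNextResult()`.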

Map-Reduce

Sample Application: Identify all unique nouns and verbs in 1M documents.

inputs: (file,word)
intermediates: (word,count)
output: (word,count)

[Figure: each doc passes through a map that emits nouns and verbs; one reduce gathers the nouns into unique nouns and another gathers the verbs into unique verbs.]

Map-Reduce

class MyMR : MapReduce {
    void map( Enum tuples ) {
        foreach (k,v) in tuples {
            kind = NounOrVerb(v);
            EmitIntermediate(kind,v);
        }
    }
    void reduce( Key key, Enum vals ) {
        int total = 0;
        foreach v in vals {
            total++;
        }
        Emit( key, total );
    }
};

Implemented on the Google infrastructure, hiding problems such as data co-location, failure, stragglers.

Jeffrey Dean and Sanjay Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters”, OSDI 2004.
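The same map and reduce functions can be run sequentially in Python to show the data flow. This is a single-process sketch, not Google's implementation: the tiny `NOUNS` and `VERBS` sets are a made-up lexicon, and, unlike the slide's `NounOrVerb`, words outside the lexicon are assumed to be dropped.

```python
from collections import defaultdict

NOUNS = {"grid", "job", "file"}    # hypothetical lexicon for the example
VERBS = {"run", "read", "write"}

def noun_or_verb(word):
    """Classify a word, or return None if it is neither (an assumption)."""
    if word in NOUNS:
        return "noun"
    if word in VERBS:
        return "verb"
    return None

def map_fn(tuples):
    """Emit (kind, word) for every (file, word) input tuple."""
    for _file, word in tuples:
        kind = noun_or_verb(word)
        if kind is not None:
            yield kind, word

def reduce_fn(key, vals):
    """Count the values grouped under one key."""
    total = 0
    for _ in vals:
        total += 1
    return key, total

def map_reduce(inputs):
    """Group intermediate pairs by key, then reduce each group."""
    groups = defaultdict(list)
    for key, value in map_fn(inputs):
        groups[key].append(value)
    return dict(reduce_fn(k, vs) for k, vs in sorted(groups.items()))
```

The grouping step in `map_reduce` is the part a real system distributes; the user only writes `map_fn` and `reduce_fn`.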

All-Pairs Image Comparison

Current Workload: 4000 images, 256 KB each, 10s per F.
Future Workload: 60000 images, 1MB each, 1s per F.

[Figure: F applied to pairs of images.]
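A back-of-the-envelope calculation shows the scale of these workloads. It assumes that every ordered pair is compared (n × n invocations of F), that work divides perfectly across CPUs with no overhead, and that 500 CPUs are available, the figure quoted elsewhere in this talk; the function names are invented for the example.

```python
def all_pairs_cost(n_images, seconds_per_f, n_cpus):
    """Return (invocations, total CPU-seconds, ideal wall-clock days)."""
    invocations = n_images * n_images          # assumes the full n x n matrix
    cpu_seconds = invocations * seconds_per_f
    wall_days = cpu_seconds / n_cpus / 86400   # perfect speedup, no overhead
    return invocations, cpu_seconds, wall_days

def dataset_gb(n_images, kb_per_image):
    """Total input size in GB, taking 1 GB = 1024 * 1024 KB."""
    return n_images * kb_per_image / (1024 * 1024)

current = all_pairs_cost(4000, 10, 500)
future = all_pairs_cost(60000, 1, 500)
```

Under these assumptions the current workload is 16 million invocations, about 3.7 days on 500 CPUs, over roughly 1 GB of input. The future workload is 3.6 billion invocations, about 83 days, over roughly 59 GB.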

Non-Expert User Using 500 CPUs

Try 1: Each F is a batch job. Failure: Dispatch latency >> F runtime.
Try 2: Each row is a batch job. Failure: Too many small ops on FS.
Try 3: Bundle all files into one package. Failure: Everyone loads 1GB at once.
Try 4: User gives up and attempts to solve an easier or smaller problem.

[Figures: one diagram per try. Labels: F, CPU, HN.]

All Pairs Abstraction

binary function F
set S of files
invocation: M = AllPairs(F,S)

Chris Moretti, Jared Bulosan, Douglas Thain, and Patrick Flynn, “All-Pairs: An Abstraction for Data Intensive Computing”, under review, 2007.
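Semantically, the abstraction is tiny; a sequential reference version fits in a few lines. The indexing convention M[i][j] = F(S[i], S[j]) is an assumption for this sketch, and a production engine would of course partition the work rather than loop.

```python
def all_pairs(f, s):
    """Return matrix m with m[i][j] = f(s[i], s[j]) for every pair in s."""
    items = list(s)
    return [[f(a, b) for b in items] for a in items]

def length_gap(a, b):
    """Example binary function: difference in length of two byte strings."""
    return abs(len(a) - len(b))

files = [b"x", b"xyz", b"xy"]     # stand-ins for file contents
m = all_pairs(length_gap, files)
```

Because the user supplies only F and S, the system is free to choose how the pairs are batched and where the data is placed.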

All Pairs Production System

300 active storage units, 500 CPUs, 40TB disk.

1 - Upload F and S into web portal.
2 - AllPairs(F,S)
3 - O(log n) distribution by spanning tree.
4 - Choose optimal partitioning and submit batch jobs.
5 - Collect and assemble results.
6 - Return result matrix to user.

[Figure labels: Web Portal, All-Pairs Engine, functions F, G, H, sets S, T.]

Initial Results on Real Workload

Current Work on Programming Active Storage

Layers of Language Design
• A programming environment consists of several layers of concepts:
    • Assembly language: A fundamental set of operations that define and constrain the possible programs. (load, store, add...)
    • Abstractions: Groupings of operations that express the most common idioms employed by end users. (stacks, functions, arrays)
    • Language: A concrete syntax that compactly represents the abstractions of the language. (a[x]*f(x);)

Assembly Language
• Array of active storage servers that combine basic data storage with remote execution.
• Data Operations:
    • open, read, write, close, getdir, unlink, stat, ...
    • getacl, setacl, getfile, putfile
• CPU Operations:
    • job_begin - create a new job, return the txn #
    • job_commit - enable the job to execute
    • job_wait - wait for the job to reach a final state
    • job_kill - force the job into a final state
    • job_remove - remove state associated with the job
• Using our own implementation gives us very precise control over the system semantics.
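The five CPU operations suggest a small job state machine. The sketch below is an in-memory toy and not the Chirp API: the class name, the state names, and the choice to run the job inside `job_wait` are all assumptions made to keep the example self-contained.

```python
import itertools

class ToyJobServer:
    """In-memory stand-in for one active storage server."""

    FINAL = {"COMPLETE", "KILLED"}

    def __init__(self):
        self._next_txn = itertools.count(1)
        self.jobs = {}

    def job_begin(self, command):
        """Create a new job and return its transaction number."""
        txn = next(self._next_txn)
        self.jobs[txn] = {"command": command, "state": "BEGUN", "result": None}
        return txn

    def job_commit(self, txn):
        """Enable the job to execute; repeating the call is harmless."""
        job = self.jobs[txn]
        if job["state"] == "BEGUN":
            job["state"] = "COMMITTED"

    def job_wait(self, txn):
        """Block until the job reaches a final state (here: run it inline)."""
        job = self.jobs[txn]
        if job["state"] == "BEGUN":
            raise RuntimeError("job was never committed")
        if job["state"] == "COMMITTED":
            job["result"] = job["command"]()
            job["state"] = "COMPLETE"
        return job["state"], job["result"]

    def job_kill(self, txn):
        """Force the job into a final state."""
        job = self.jobs[txn]
        if job["state"] not in self.FINAL:
            job["state"] = "KILLED"

    def job_remove(self, txn):
        """Discard all state associated with a finished job."""
        if self.jobs[txn]["state"] not in self.FINAL:
            raise RuntimeError("job is not in a final state")
        del self.jobs[txn]
```

One plausible reading of the begin/commit split is that the caller can record the transaction number somewhere durable before the job is allowed to run, so a crash between the two calls never leaves an unknown job executing.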

http://www.cse.nd.edu/~ccl/viz

Abstractions

[Figure: three abstractions layered on chirp servers and their unix filesys. Labels by group: file system (tcsh, emacs, perl, parrot); distributed data structures (set S, file F, members A, B, C); function evaluation (Y = F(X), job_start, job_commit, job_wait, job_remove).]

Language Syntax: DataLab

apply F on S into T

[Figure: F runs on the chirp servers holding members A, B, C of set S and produces members A, B, C of set T.]

Ruminations
• What is unique about a programming language for large scale data intensive computing?
    • Manipulates remote persistent state.
    • Likely to compete with others for resources.
    • Encounters an insane set of failure modes.
• Two very distinct purposes:
    • Constructing new kinds of systems with novel concurrency and data access patterns? (Imperative)
    • Harnessing existing systems within certain well known patterns of interactions? (Declarative)
• Either way, need to choose the assembly language very carefully!

Properties of Assembly Language
• Need a transactional interface for manipulating remote persistent state.
    • Recover from network failures.
    • Recover from coordinator failure.
    • Precise cancellation of long-running ops.
• Persistent storage for program state.
    • Need a place to store transaction #s.
    • Allows for fast failure recovery without scanning all participants (i.e., avoid fsck).
    • Simplifies debugging, monitoring, auditing.
• Precise semantics under failure conditions.
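The point about storing transaction numbers can be illustrated with an append-only log that a restarted coordinator replays. This is a minimal sketch under assumed names (`TxnLog`, the status strings, a JSON-lines file); nothing here is taken from an existing system.

```python
import json
import os
import tempfile

class TxnLog:
    """Append-only record of (server, txn #) pairs and their last known status."""

    def __init__(self, path):
        self.path = path

    def record(self, txn, server, status):
        """Durably append one status change before acting on it."""
        with open(self.path, "a") as f:
            f.write(json.dumps({"txn": txn, "server": server, "status": status}) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def outstanding(self):
        """Replay the log and list the jobs whose state has not been removed."""
        latest = {}
        if os.path.exists(self.path):
            with open(self.path) as f:
                for line in f:
                    entry = json.loads(line)
                    latest[(entry["server"], entry["txn"])] = entry["status"]
        return sorted(key for key, status in latest.items() if status != "removed")

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        log = TxnLog(os.path.join(d, "txn.log"))
        log.record(1, "s1", "begun")
        print(log.outstanding())
```

After a crash, a new coordinator only needs to contact the servers named by `outstanding()`, rather than scanning every participant for leftover jobs.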

Discussion Topics?
• Assertion: Getting the semantics of the assembly language right is more important than getting the syntax of the language correct. ???
• Creating robust algorithms is too much to ask of the end user. Therefore: declarative for end users, imperative for system builders. ???
• Creating robust algorithms is too hard to solve in the general case. Therefore: expose sophisticated controls that allow the end user to make the right decisions. ???
