Dryad and dataflow systems

Dryad and dataflow systems • Michael Isard • misard@microsoft.com • Microsoft Research • 4th June, 2014

Talk outline • Why is dataflow so useful? • What is Dryad? • An engineering sweet spot • Beyond Dryad • Conclusions

Computation on large datasets • Performance mostly efficient resource use • Locality • Data placed correctly in memory hierarchy • Scheduling • Get enough work done before being interrupted • Decompose into independent batches • Parallel computation • Control communication and synchronization • Distributed computation • Writes must be explicitly shared

Computational model • Vertices are independent • State and scheduling • Dataflow very powerful • Explicit batching and communication • [figure labels: Inputs, Channels, Processing vertices, Outputs]

Why dataflow now? • Collection-oriented programming model • Operations on collections of objects • Turn spurious (unordered) for into foreach • Not every for is foreach • Aggregation (sum, count, max, etc.) • Grouping • Join, Zip • Iteration • LINQ since ca 2008, now Spark via Scala, Java

Well-chosen syntactic sugar • Given some lines of text, find the most commonly occurring words: • Read the lines from a file • Split each line into its constituent words • Count how many times each word appears • Find the words with the highest counts

var lines = FS.ReadAsLines(inputFileName);
var words = lines.SelectMany(x => x.Split(' '));
var counts = words.CountInGroups();
var highest = counts.OrderByDescending(x => x.count).Take(10);

• Type inference: var in place of Collection<KeyValuePair<string,int>>
• Lambda expressions: x => x.count in place of int SortKey(KeyValuePair<string,int> x) { return x.count; } or int SortKey(void* x) { return ((KeyValuePair<string,int>*)x)->count; }
• Generics and extension methods: Collection<T> Take(this Collection<T> c, int count) { … } in place of FooCollection FooTake(FooCollection c, int count) { … }
• Example: the words blue (×4), yellow (×3), red (×2) yield the counts blue,4 yellow,3 red,2
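The four steps of the word-count example can be sketched outside LINQ as well. The following is a minimal Python sketch, not the talk's code: `most_common_words` is a hypothetical name, and `str.split()` splits on any whitespace rather than on a single space as `Split(' ')` does.

```python
from collections import Counter
from itertools import chain


def most_common_words(lines, k=10):
    """Return the k most frequent words as (word, count) pairs."""
    # SelectMany: flatten every line into one stream of words.
    words = chain.from_iterable(line.split() for line in lines)
    # CountInGroups: group identical words and count each group.
    counts = Counter(words)
    # OrderByDescending(count) followed by Take(k).
    return counts.most_common(k)


# Same word frequencies as the slide's example: blue x4, yellow x3, red x2.
lines = ["blue red blue yellow", "yellow blue", "red yellow blue"]
highest = most_common_words(lines, k=2)
```

Each step consumes a whole collection and produces a whole collection, which is the property that lets a system such as DryadLINQ turn every operator into a data-parallel stage.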

Collections compile to dataflow • Each operator specifies a single data-parallel step • Communication between steps explicit • Collections reference collections, not individual objects! • Communication under control of the system • Partition, pipeline, exchange automatically • LINQ innovation: embedded user-defined functions • var words = lines.SelectMany(x => x.Split(' ')); • Very expressive • Programmer ‘naturally’ writes pure functions

Distributed sorting • var sorted = set.OrderBy(x => x.key) • Stages: set → sample → compute histogram → range partition by key → sort locally → sorted
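The sort stages named on this slide (sample, histogram, range partition, local sort) can be simulated in one process. This is a sketch under assumptions, not Dryad's implementation: partitions are plain Python lists of keys, the "histogram" is reduced to evenly spaced split points taken from the sorted sample, and `range_partition_sort` is a hypothetical name.

```python
import bisect
import random


def range_partition_sort(partitions, num_out, sample_size=100, seed=0):
    """Sort keys spread over input partitions into num_out ordered buckets."""
    rng = random.Random(seed)
    # 1. Sample: draw a few keys from every input partition.
    sample = []
    for part in partitions:
        sample.extend(rng.sample(part, min(sample_size, len(part))))
    sample.sort()
    # 2. Histogram: choose num_out - 1 split points that divide the
    #    sample into roughly equal ranges.
    splits = []
    if sample:
        splits = [sample[(i * len(sample)) // num_out] for i in range(1, num_out)]
    # 3. Range partition: send each key to the bucket that owns its range.
    buckets = [[] for _ in range(num_out)]
    for part in partitions:
        for key in part:
            buckets[bisect.bisect_right(splits, key)].append(key)
    # 4. Sort locally: each bucket is sorted independently.
    return [sorted(bucket) for bucket in buckets]


data = [[5, 3, 9, 1], [8, 2, 7], [6, 4, 0]]
out = range_partition_sort(data, num_out=3)
flat = [key for bucket in out for key in bucket]
```

Because every key in bucket i is smaller than every key in bucket i + 1, concatenating the locally sorted buckets gives the globally sorted collection with no further merge step.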

Quiet revolution in parallelism • Programming model is more attractive • Simpler, more concise, readable, maintainable • Program is easier to optimize • Programmer separates computation and communication • System can re-order, distribute, batch, etc. etc.

What is Dryad? • General-purpose DAG execution engine ca 2005 • Cited as inspiration for e.g. Hyracks, Tez • Engine behind Microsoft Cosmos/SCOPE • Initially MSN Search/Bing, now used throughout MSFT • Core of research batch cluster environment ca 2009 • DryadLINQ • Quincy scheduler • TidyFS

What Dryad does • Abstracts cluster resources • Set of computers, network topology, etc. • Recovers from transient failures • Rerun computations on machine or network fault • Speculate duplicates for slow computations • Schedules a local DAG of work at each vertex

Scheduling and fault tolerance • DAG makes things easy • Schedule from source to sink in any order • Re-execute subgraph on failure • Execute “duplicates” for slow vertices
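A toy scheduler shows why a DAG makes this easy: any vertex whose inputs are complete may run, and a failed vertex can simply be run again from its stored inputs. The sketch below is a simplification under assumptions: `run_dag`, `compute`, and `fail_once` are hypothetical names, failures are simulated, and only the failed vertex is re-executed because upstream outputs are assumed to survive (if they were lost, the upstream subgraph would have to rerun as well).

```python
from graphlib import TopologicalSorter


def run_dag(deps, compute, fail_once=()):
    """deps maps vertex -> set of upstream vertices.

    compute(vertex, inputs) returns the vertex output. Vertices listed in
    fail_once fail on their first attempt and are then re-executed.
    """
    outputs = {}    # stands in for output files persisted per vertex
    attempts = {}
    pending_failures = set(fail_once)
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    while sorter.is_active():
        # Every ready vertex has all inputs available; order does not matter.
        for vertex in sorter.get_ready():
            while True:
                attempts[vertex] = attempts.get(vertex, 0) + 1
                if vertex in pending_failures:
                    pending_failures.discard(vertex)  # simulated transient fault
                    continue                          # re-execute the vertex
                inputs = [outputs[u] for u in sorted(deps.get(vertex, ()))]
                outputs[vertex] = compute(vertex, inputs)
                break
            sorter.done(vertex)
    return outputs, attempts


deps = {"a": set(), "b": set(), "c": {"a", "b"}, "d": {"c"}}
outputs, attempts = run_dag(
    deps,
    compute=lambda v, ins: v + "(" + ",".join(ins) + ")",
    fail_once={"c"},
)
```

Re-execution is safe here only because `compute` is a pure function of its inputs, which is the stateless-vertex property the talk returns to later.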

Resources are virtualized • Each graph vertex is a process • Writes outputs to disk (usually) • Reads inputs from upstream nodes’ output files • Graph generally larger than cluster RAM • 1TB partitioned input, 250MB part size, 4000 parts • Cluster is shared • Don’t size program for exact cluster • Use whatever share of resources are available
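The partition count on this slide follows from simple division, and the same arithmetic shows why a job need not be sized for the exact cluster. A small sketch, assuming decimal units (1 TB = 10^12 bytes, 250 MB = 250 × 10^6 bytes); the slot count of 300 is a made-up figure for illustration.

```python
import math

input_bytes = 10**12          # 1 TB of partitioned input
part_bytes = 250 * 10**6      # 250 MB per part
parts = input_bytes // part_bytes


def waves(num_parts, slots):
    """Number of scheduling rounds needed when only `slots` vertices run at once."""
    return math.ceil(num_parts / slots)


rounds_on_small_share = waves(parts, 300)   # hypothetical share of 300 slots
```

With fewer slots the same 4000-vertex stage simply takes more rounds, so the graph is independent of how much of the shared cluster happens to be available.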

Integrated system • Collection-oriented programming model (LINQ) • Partitioned file system (TidyFS) • Manages replication and distribution of large data • Cluster scheduler (Quincy) • Jointly schedule multiple jobs at a time • Fine-grain multiplexing between jobs • Balance locality and fairness • Monitoring and debugging (Artemis) • Within job and across jobs

Dryad Cluster Scheduling [figure; labels: Scheduler, R]

Quincy without preemption

Quincy with preemption

Dryad features • Well-tested at scales up to 15k cluster computers • In heavy production use for 8 years • Dataflow graph is mutable at runtime • Repartition to avoid skew • Specialize matrices dense/sparse • Harden fault-tolerance

Stateless DAG dataflow • MapReduce, Dryad, Spark, … • Stateless vertex constraint hampers performance • Iteration and streaming overheads • Why does this design keep repeating?

Software engineering • Fault tolerance well understood • E.g., Chandy-Lamport, rollback recovery, etc. • Basic mechanism: checkpoint plus log • Stateless DAG: no checkpoint! • Programming model “tricked” user • All communication on typed channels • Only channel data needs to be persisted • Fault tolerance comes without programmer effort • Even with UDFs

What about stateful dataflow? • Naiad • Add state to vertices • Support streaming and iteration • Opportunities • Much lower latency • Can model mutable state with dataflow • Challenges • Scheduling • Coordination • Fault tolerance

Batch processing Stream processing Graph processing Timely dataflow

Batching vs. streaming • Batching (synchronous) • Requires coordination • Supports aggregation • Streaming (asynchronous) • No coordination needed • Aggregation is difficult

Batch DAG execution • Central coordinator

Streaming DAG execution • Inline coordination

Batch iteration • Central coordinator

Streaming iteration

Messages • B.SendBy(edge, message, time) • C.OnRecv(edge, message, time) • Messages are delivered asynchronously • [figure: vertices B, C, D]

Notifications • C.SendBy(_, _, time) • D.NotifyAt(time) • D.OnRecv(_, _, time) • D.OnNotify(time): no more messages at time or earlier • Notifications support batching • [figure: vertices B, C, D]
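The interplay of OnRecv, NotifyAt, and OnNotify can be illustrated with a vertex that counts messages per logical time and emits a count only once that time is complete. This is a Python sketch of the callback pattern, not Naiad's API: `CountPerTime`, `deliver`, and `closed_up_to` are hypothetical, and the "system" is a simple loop that decides which times are finished.

```python
from collections import defaultdict


class CountPerTime:
    """Hypothetical vertex D: counts messages per time, emits on notification."""

    def __init__(self):
        self.counts = defaultdict(int)
        self.requested = set()   # times for which a notification was requested
        self.emitted = []

    def on_recv(self, edge, message, time):
        self.counts[time] += 1
        self.requested.add(time)          # plays the role of NotifyAt(time)

    def on_notify(self, time):
        # Safe to emit: no more messages at `time` or earlier will arrive.
        self.emitted.append((time, self.counts.pop(time)))
        self.requested.discard(time)


def deliver(vertex, messages, closed_up_to):
    """Deliver messages in arbitrary order, then notify every completed time."""
    for edge, message, time in messages:
        vertex.on_recv(edge, message, time)
    for time in sorted(t for t in vertex.requested if t <= closed_up_to):
        vertex.on_notify(time)


d = CountPerTime()
deliver(d, [("e", "a", 1), ("e", "b", 0), ("e", "c", 1), ("e", "d", 2)], closed_up_to=1)
```

Messages for time 1 arrive before and after the message for time 0, yet the vertex still produces one correct aggregate per time, which is how notifications recover batch-style aggregation inside an asynchronous system.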

Coordination in timely dataflow • Local scheduling with global progress tracking • Coordination with a shared counter, not a scheduler • Efficient, scalable implementation
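The idea of coordinating through a shared counter rather than a scheduler can be sketched as a table of in-flight messages per logical time. This is a deliberately reduced model with assumptions stated up front: `ProgressTracker` is a hypothetical class, times are plain integers, all input for the times being queried has already entered the system, and the graph has no loops (a real implementation must also account for where in the graph each message sits).

```python
from collections import Counter


class ProgressTracker:
    """Shared count of undelivered messages, keyed by logical time."""

    def __init__(self):
        self.outstanding = Counter()

    def sent(self, time):
        self.outstanding[time] += 1

    def delivered(self, time):
        self.outstanding[time] -= 1
        if self.outstanding[time] == 0:
            del self.outstanding[time]

    def can_notify(self, time):
        # A notification for `time` is safe once nothing at that time
        # or earlier is still in flight.
        return all(t > time for t in self.outstanding)


tracker = ProgressTracker()
tracker.sent(0)
tracker.sent(1)
before = tracker.can_notify(0)
tracker.delivered(0)
after = tracker.can_notify(0)
```

Vertices only increment and decrement counts as they send and receive, so each can schedule itself locally while the counts alone determine when a notification may fire.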

Interactive graph analysis • Inputs: 32K tweets/s (#x, @y) and 10 queries/s (z?) • [figure: dataflow graph of ⋈ (join) and max operators]

32 × 8-core 2.1 GHz AMD Opteron • 16 GB RAM per server • Gigabit Ethernet • Query latency • Max: 140 ms • 99th percentile: 70 ms • Median: 5.2 ms

Mutable state • In batch DAG systems collections are immutable • Functional definition in terms of preceding subgraph • Adding streaming or iteration introduces mutability • Collection varies as function of epoch, loop iteration

Key-value store as dataflow var lookup = data.join(query, d => d.key, q => q.key) • Modeled random access with dataflow… • Add/remove key is streaming update to data • Look up key is streaming update to query • High throughput requires batching • But that was true anyway, in general
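The join-as-key-value-store idea can be made concrete with a stateful join operator that receives a batch of data updates and a batch of queries per epoch. This is a Python sketch under assumptions rather than the talk's code: `JoinKV` and `step` are hypothetical names, a value of `None` marks a removal, and updates in an epoch are applied before that epoch's queries are answered.

```python
class JoinKV:
    """Stateful join of a `data` stream with a `query` stream on key."""

    def __init__(self):
        self.data = {}   # current contents of the `data` collection

    def step(self, data_updates, queries):
        """Process one epoch: apply updates, then join queries against data."""
        # Add/remove key: a streaming update to `data`.
        for key, value in data_updates:
            if value is None:
                self.data.pop(key, None)
            else:
                self.data[key] = value
        # Look up key: a streaming update to `query`; the join emits matches.
        return [(key, self.data[key]) for key in queries if key in self.data]


kv = JoinKV()
first = kv.step([("a", 1), ("b", 2)], ["a", "c"])
second = kv.step([("a", None)], ["a", "b"])
```

Each call handles a whole batch of updates and lookups, which mirrors the slide's point that high throughput comes from batching.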

What can’t dataflow do? • Programming model for mutable state? • Not as intuitive as functional collection manipulation • Policies for placement still primitive • Hash everything and hope • Great research opportunities • Intersection of OS, network, runtime, language

Conclusions • Dataflow is a great structuring principle • We know good programming models • We know how to write high-performance systems • Dataflow is the status quo for batch processing • Mutable state is the current research frontier • Apache 2.0 licensed source on GitHub • http://research.microsoft.com/en-us/um/siliconvalley/projects/BigDataDev/
