Piccolo – Paper Discussion Big Data Reading Group

Presentation Transcript


  1. Piccolo – Paper Discussion, Big Data Reading Group, 9/20/2010

  2. Motivation / Goals
  • Rising demand for distributing computation
    • PageRank, K-Means, N-Body simulation
  • Data-centric frameworks simplify programming
  • Existing models (e.g. MapReduce) are insufficient
    • Designed for large-scale data analysis as opposed to in-memory computation
  • Make in-memory computations fast
  • Enable asynchronous computation

  3. Overview
  • Global in-memory key-value tables for sharing state
  • Concurrently running instances of kernel applications modifying global state
  • Locality optimized (user-specified policies)
  • Reduced synchronization (accumulation, global barriers)
  • Checkpoint-based recovery

  4. System Design

  5. Table interface
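
The slide itself only shows the API figure. As a rough, hedged sketch of the kind of table operations the paper describes (get, put, update with a user-defined accumulator, per-partition iteration), here is a single-process stand-in; the class and method signatures are illustrative assumptions, not Piccolo's actual API.

```python
# Illustrative sketch only: a single-process stand-in for Piccolo's
# partitioned in-memory key-value table. Method names (get, put, update,
# get_iterator) follow the operations described in the paper, but the
# signatures and semantics here are simplified assumptions.

class InMemoryTable:
    def __init__(self, num_partitions, accumulator=None, partitioner=hash):
        self.num_partitions = num_partitions
        self.accumulator = accumulator      # e.g. lambda old, new: old + new
        self.partitioner = partitioner      # maps a key to an integer
        self.partitions = [dict() for _ in range(num_partitions)]

    def _partition_of(self, key):
        return self.partitioner(key) % self.num_partitions

    def get(self, key):
        return self.partitions[self._partition_of(key)][key]

    def put(self, key, value):
        # Blind write: last writer wins.
        self.partitions[self._partition_of(key)][key] = value

    def update(self, key, value):
        # Merge with the existing value via the user-defined accumulator;
        # this is how concurrent writes to the same key are resolved.
        part = self.partitions[self._partition_of(key)]
        part[key] = value if key not in part else self.accumulator(part[key], value)

    def get_iterator(self, partition):
        # Iterate over a single partition, e.g. from the kernel instance
        # co-located with it.
        return iter(self.partitions[partition].items())


# Usage: a table of per-page rank values, summed on concurrent updates.
ranks = InMemoryTable(num_partitions=4, accumulator=lambda a, b: a + b)
ranks.update("page-1", 0.25)
ranks.update("page-1", 0.5)    # accumulated, not overwritten
print(ranks.get("page-1"))     # 0.75
```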

  6. Optimization
  • Ensure locality
    • Group kernel with partition
    • Group partitions
    • Guarantee: a partition resides completely on a single machine
  • Reduce synchronization
    • Accumulation to avoid write/write conflicts
    • No pairwise kernel synchronization
    • Global barriers are sufficient
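
A toy sketch of the locality rules above; the helper names and the round-robin placement are assumptions, not Piccolo's placement code.

```python
# Toy sketch of the locality rules on this slide: co-partitioned tables keep
# partition p on the same worker, and kernel instance p launches on that
# worker, so most of its table accesses are local.

def assign_partitions(num_partitions, workers):
    # Each partition lives entirely on one machine (never split); here the
    # placement is round-robin, though user-specified policies could differ.
    return {p: workers[p % len(workers)] for p in range(num_partitions)}

def launch_kernels(assignment, table_names):
    # "Group kernel with partition": instance p runs where partition p of
    # every co-partitioned table resides, so its reads/writes stay local.
    for p, worker in sorted(assignment.items()):
        print(f"kernel instance {p} on {worker}: local partition {p} of {table_names}")

assignment = assign_partitions(num_partitions=4, workers=["worker-0", "worker-1"])
launch_kernels(assignment, ["graph", "ranks"])
```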

  7. Load balancing
  • Assigning partitions
    • Round robin
    • Optimized for data location
  • Work stealing
    • Biggest task first (the master estimates task size from the number of keys in a partition)
    • The master decides
  • Restrictions
    • A running task cannot be killed (it modifies shared state, and restoring it would be very expensive)
    • Partitions need to be moved along with stolen tasks
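
A minimal sketch of the master-coordinated stealing described above, assuming a greedy "largest unstarted partition" rule estimated from key counts; this is not Piccolo's actual scheduler.

```python
# Sketch of master-coordinated work stealing (simplified assumption): when a
# worker goes idle, the master moves the largest *unstarted* task from the
# busiest worker, using the partition's key count as a size estimate.

def steal_largest_pending(pending, idle_worker):
    """pending maps worker -> list of (key_count, partition_id) not yet started."""
    busiest = max(pending, key=lambda w: sum(k for k, _ in pending[w]))
    if busiest == idle_worker or not pending[busiest]:
        return None
    # Running tasks are never killed (they already modified shared state and
    # restoring would be expensive); only unstarted partitions are reassigned,
    # and the partition's table data has to be migrated along with the task.
    task = max(pending[busiest], key=lambda t: t[0])
    pending[busiest].remove(task)
    pending.setdefault(idle_worker, []).append(task)
    return task

pending = {"w0": [(900, "p0"), (100, "p1")], "w1": []}
print(steal_largest_pending(pending, "w1"))   # (900, 'p0') is moved to w1
```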

  8. Table migration
  • Migrate a table from worker wa to worker wb
  • Phase 1: the master sends message M1 to all workers
    • All workers flush pending updates to wa
    • All workers send new requests to wb
    • wb buffers these requests
    • wa sends its paused state to wb
  • Phase 2: once all workers acknowledge phase 1, the master sends M2 to wa and wb
    • wa flushes the remainder to wb and leaves the "paused" state
    • wb first works through the buffered requests, then resumes normal operation
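
A single-process toy of the two-phase flow above, with assumed class and field names; the real protocol is asynchronous and distributed.

```python
# Single-process toy of the two-phase migration above. The Worker class,
# message handling, and field names are assumptions for illustration.

class Worker:
    def __init__(self, name):
        self.name = name
        self.owned = {}        # partition_id -> {key: value}
        self.buffered = []     # (partition_id, key, value) buffered by wb
        self.routing = {}      # partition_id -> worker currently addressed

    def apply(self, partition_id, key, value):
        self.owned.setdefault(partition_id, {})[key] = value


def migrate(workers, wa, wb, partition_id):
    # Phase 1 (message M1): every worker flushes pending updates to wa and
    # routes new requests for the partition to wb, which only buffers them;
    # wa pauses the partition and ships its state to wb.
    for w in workers:
        w.routing[partition_id] = wb
    state = wa.owned.pop(partition_id, {})
    # Phase 2 (message M2), after all workers acknowledge phase 1: wa flushes
    # anything left, wb installs the state, replays its buffered requests,
    # and then resumes normal operation as the new owner.
    wb.owned[partition_id] = state
    for pid, key, value in list(wb.buffered):
        if pid == partition_id:
            wb.apply(pid, key, value)
    wb.buffered = [r for r in wb.buffered if r[0] != partition_id]


wa, wb = Worker("wa"), Worker("wb")
wa.apply(0, "page-1", 0.75)
wb.buffered.append((0, "page-2", 0.5))     # request that arrived mid-migration
migrate([wa, wb], wa, wb, partition_id=0)
print(wb.owned[0])                         # {'page-1': 0.75, 'page-2': 0.5}
```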

  9. Fault tolerance
  • User-assisted checkpoint / restore (Chandy-Lamport snapshots)
    • Asynchronous checkpoints -> periodic
    • Synchronous checkpoints -> at global barriers
  • Problem: when to start the barrier checkpoint
    • The replay log might get very long
    • The checkpoint might not use enough free CPU time before the barrier
  • Solution: start when the first worker has finished all of its jobs
  • No checkpoint during table migration, and vice versa
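
A minimal sketch of the trigger heuristic above, with assumed function and parameter names.

```python
# Sketch of the checkpoint-trigger heuristic on this slide: the master starts
# the barrier checkpoint as soon as the first worker runs out of work, so
# snapshotting overlaps with the stragglers' remaining computation instead of
# waiting for the barrier itself.

def should_start_checkpoint(outstanding_tasks, migration_in_progress):
    """outstanding_tasks maps worker -> number of unfinished kernel tasks."""
    if migration_in_progress:
        return False   # no checkpoint during table migration, and vice versa
    return any(count == 0 for count in outstanding_tasks.values())

print(should_start_checkpoint({"w0": 3, "w1": 0}, migration_in_progress=False))  # True
print(should_start_checkpoint({"w0": 3, "w1": 1}, migration_in_progress=False))  # False
```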

  10. Applications
  • PageRank, k-means, n-body, matrix multiplication
    • Parallel, iterative computations
    • Local reads + local/remote writes, or local/remote reads + local writes
    • Can be implemented as multiple MapReduce jobs
  • Distributed web crawler
    • Idempotent operations
    • Cannot be realized in MapReduce
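
For concreteness, a plain-Python sketch of the PageRank pattern mentioned above; dicts stand in for Piccolo tables and the toy graph is made up.

```python
# Sketch of the iterative PageRank pattern this slide refers to: each
# iteration does local reads of the graph partition plus accumulating
# writes into a shared rank table, separated by a global barrier.

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}    # out-links per page
ranks = {page: 1.0 / len(graph) for page in graph}   # initial rank values

def pagerank_iteration(graph, ranks, damping=0.85):
    next_ranks = {page: (1 - damping) / len(graph) for page in graph}
    for page, out_links in graph.items():
        share = damping * ranks[page] / len(out_links)
        for target in out_links:
            # In Piccolo this would be an update() with a sum accumulator, so
            # contributions from different kernel instances add up lock-free.
            next_ranks[target] += share
    return next_ranks    # global barrier, then the tables are swapped

for _ in range(30):
    ranks = pagerank_iteration(graph, ranks)
print({page: round(rank, 3) for page, rank in ranks.items()})
```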

  11. Scaling
  • Fixed input size (figure)
  • Scaled input size (figure)

  12. Comparison with Hadoop / MPI
  • PageRank, k-means (vs. Hadoop)
    • Piccolo is 4x and 11x faster, respectively
    • For PageRank, Hadoop spends:
      • 50% of its time in sort (joining data streams)
      • 15% in (de)serialization (reading/writing HDFS)
  • Matrix multiplication (vs. MPI)
    • Piccolo is 10% faster
    • MPI repeatedly waits for the slowest node

  13. Work stealing / slow worker / checkpoints
  • Work stealing / slow worker experiments
    • PageRank has skewed partitions
    • One slow worker (running at 50% CPU)
  • Checkpoints
    • Naïve: start after all workers have finished
    • Optimized: start after the first worker has finished

  14. Checkpoint limits / scalability
  • Hypothetical data center
    • Typical machine uptime of 1 year
    • Worst-case scenario
  • Optimistic?
  • Looked different on some older slides

  15. Distributed Crawler
  • 32 machines saturate a 100 Mbps link
  • Single servers can achieve the same throughput
  • Piccolo would scale higher

  16. Summary
  • Piccolo provides an easy-to-use distributed shared memory model
  • It applies many restrictions
    • Simple interface
    • Reduced synchronization
    • Relaxed consistency
    • Accumulation
    • Locality
  • But it performs well
    • Iterative computations
    • Avoids going to disk, unlike MapReduce
  • A specialized tool for data-intensive in-memory computing
