MapReduce Simplified Data Processing on Large Clusters
Outline • Motivation • MapReduce execution overview • Lifecycle of a MapReduce operation • Optimization techniques • Pros and cons • Conclusion
Motivation • Provide a programming model • For processing large data sets • Exploits large clusters of commodity machines and high-speed Ethernet interconnect • Executes processing in a distributed manner • Handles errors and reliability • Above all: simple, and perhaps suitable for otherwise impractical tasks!
MapReduce Operation • Map: • Accepts an input key/value pair • Emits intermediate key/value pairs • Writes them to local disk (diagram: very big data → Map → Partitioning Function → Reduce → result)
MapReduce Operation • Partitioning Function: • Partitions the intermediate data • Routes each emitted intermediate key/value pair to one of R partitions • Default partition function: hash(key) mod R
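The default partitioning rule above can be sketched in a few lines. This is an illustrative Python version, not the paper's implementation; real systems use a deterministic hash so that re-executions partition identically.

```python
# A minimal sketch of the default partitioning function: hash(key) mod R.
# R (the number of reduce tasks) is chosen by the user at setup time.
def partition(key: str, R: int) -> int:
    """Map an intermediate key to one of R reduce partitions."""
    # Note: Python's built-in hash() for strings is salted per process;
    # a production system would use a stable hash instead.
    return hash(key) % R

# Every occurrence of the same key lands in the same partition,
# so a single reduce task sees all values for that key.
```

Because the mapping depends only on the key, all (key, value) pairs for one key end up in the same intermediate file, regardless of which map task emitted them.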
MapReduce Operation • Reduce: • Retrieves intermediate key/value pairs via RPC • Sorts and groups values with the same key • Emits output key/value pairs • No reduce can begin until all map tasks are complete
Example: map phase. The input is split into M = 4 map tasks; intermediate files are written to R = 2 partitions. Each map task emits a (word, 1) pair per word, e.g. "It was the best of times and the worst of times …" yields (it,1) (was,1) (the,1) (best,1) (of,1) (times,1) … Note: the partition function places small words in one partition and large words in the other.
Example: reduce phase, showing one of the R = 2 partitions. The run-time function sorts the partition's intermediate pairs and groups values by key: (and,(1)) (in,(1)) (it,(1,1)) (of,(1,1,1)) (the,(1,1,1,1,1,1)) (was,(1)). The user's reduce function then sums each group, emitting (and,1) (in,1) (it,2) (of,3) (the,6) (was,1). Note: only one of the two reduce tasks is shown.
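The map, partition, sort/group, and reduce steps of the word-count example can be sketched end to end in memory. This is a minimal illustration under simplifying assumptions (no distribution, no disk, no failures); the function names `map_fn`, `reduce_fn`, and `map_reduce` are illustrative, not the paper's API.

```python
from collections import defaultdict

def map_fn(document: str):
    # User's map function: emit (word, 1) for every word in the input.
    for word in document.split():
        yield (word, 1)

def reduce_fn(key, values):
    # User's reduce function: sum the counts for one key.
    return (key, sum(values))

def map_reduce(documents, R=2):
    # Map phase: route each intermediate pair to one of R partitions.
    partitions = [defaultdict(list) for _ in range(R)]
    for doc in documents:
        for key, value in map_fn(doc):
            partitions[hash(key) % R][key].append(value)
    # Reduce phase: sort each partition by key, group values, apply reduce.
    results = {}
    for part in partitions:
        for key in sorted(part):
            k, v = reduce_fn(key, part[key])
            results[k] = v
    return results

counts = map_reduce(["it was the best of times",
                     "it was the worst of times"])
```

Running this on the two lines above yields (it,2) (was,2) (the,2) (of,2) (times,2) (best,1) (worst,1), matching the slide's example.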
MapReduce Operation • Setup Phase • What must the user do?
One-time Setup • The user's to-do list: • Specify: • Input/output files • M: number of map tasks • R: number of reduce tasks • W: number of machines • Write user-defined map and reduce functions • Submit the job • This requires no knowledge of parallel or distributed systems
Master • Propagates intermediate file information (location + size) • from map tasks to reduce tasks • For each map task and reduce task, the master stores its state (one of 3 states: idle, in-progress, completed) • O(MR) state in memory • Termination condition • All tasks are in the "completed" state.
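The master's bookkeeping can be sketched as a small class. This is a hypothetical illustration of the data structures described above (per-task state plus the M × R intermediate file locations), not the paper's actual code.

```python
IDLE, IN_PROGRESS, COMPLETED = "idle", "in-progress", "completed"

class Master:
    """Toy sketch of the master's state for M map and R reduce tasks."""
    def __init__(self, M: int, R: int):
        self.map_state = {m: IDLE for m in range(M)}
        self.reduce_state = {r: IDLE for r in range(R)}
        # intermediate[(m, r)] = (location, size) reported by map task m
        # for partition r: this is the O(MR) state kept in memory.
        self.intermediate = {}

    def map_done(self, m, files):
        # A map task reports the location and size of each of its
        # R intermediate files; the master forwards these to reducers.
        self.map_state[m] = COMPLETED
        for r, (location, size) in files.items():
            self.intermediate[(m, r)] = (location, size)

    def finished(self):
        # Termination condition: every task is in the "completed" state.
        return (all(s == COMPLETED for s in self.map_state.values()) and
                all(s == COMPLETED for s in self.reduce_state.values()))
```

Keeping this state centrally is what lets the master re-schedule tasks on failure and tell reduce tasks where to fetch their input from.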
Failures • Failure detection mechanism • The master pings workers periodically. • Map worker failure • Map tasks completed or in-progress on the failed worker are reset to idle • Reduce workers are notified when a task is rescheduled on another worker • Reduce worker failure • Only in-progress tasks are reset to idle • Master failure • The MapReduce operation is aborted and the client is notified • The master checkpoints its data structures
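The asymmetry between the map and reduce rules above can be made concrete: completed map output lives on the failed worker's local disk and is lost, while completed reduce output is already on the global file system. A sketch, using illustrative dictionary records rather than the paper's data structures:

```python
def tasks_to_reset(tasks, failed_worker):
    """Return ids of tasks that must go back to 'idle' when a worker dies.

    Completed map tasks are re-run because their output sits on the
    failed worker's local disk; completed reduce tasks are not, because
    their output is already stored on the global file system.
    """
    reset = []
    for t in tasks:
        if t["worker"] != failed_worker:
            continue
        if t["state"] == "in-progress":
            reset.append(t["id"])          # any in-progress task is lost
        elif t["kind"] == "map" and t["state"] == "completed":
            reset.append(t["id"])          # completed map output is lost too
    return reset

tasks = [
    {"id": "m1", "kind": "map",    "state": "completed",   "worker": "w1"},
    {"id": "m2", "kind": "map",    "state": "in-progress", "worker": "w1"},
    {"id": "r1", "kind": "reduce", "state": "completed",   "worker": "w1"},
    {"id": "r2", "kind": "reduce", "state": "in-progress", "worker": "w2"},
]
```

Here a failure of `w1` resets both of its map tasks but leaves its completed reduce task alone, and does not touch tasks on other workers.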
Fault Tolerance • Input file blocks are stored on a distributed file system • On errors, workers send a "last gasp" UDP packet to the master • The master notices that particular input key/values cause crashes in map(), and skips those values on re-execution • This works around bugs in third-party libraries
Load Balancing • The numbers of map and reduce tasks are much larger than the number of worker machines. • When a worker fails, the many tasks assigned to it can be spread out across all the other workers. • Backup Tasks • A slow-running task (a "straggler") prolongs overall execution
Backup Task • Stragglers are often caused by circumstances local to the worker running the straggler task: • Overload on the worker machine due to the scheduler • Frequent recoverable disk errors • The master schedules backup (replacement) tasks on idle workers • The first copy to complete "wins" • This can significantly reduce the completion time of large MapReduce operations
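Why the "first to complete wins" rule helps can be seen with a little arithmetic: the job finishes when its slowest task finishes, and a backup replaces each straggler's time with the faster of its two copies. A toy sketch, with made-up task times for illustration:

```python
def completion_time(task_times, backup_times=None):
    """Job completion time is the max over tasks; with backups scheduled,
    each task finishes as soon as either of its two copies does."""
    if backup_times is None:
        return max(task_times)
    return max(min(t, b) for t, b in zip(task_times, backup_times))

# Nine healthy tasks plus one straggler (times in seconds, illustrative):
times = [10, 11, 10, 12, 11, 10, 13, 11, 10, 95]

# Without backups the straggler dominates; with a backup of the straggler
# on an idle worker (here finishing in 14s), the job ends much sooner.
backups = [100] * 9 + [14]
```

With these numbers, completion drops from 95s to 14s: the backup copy of the straggler wins, and every other primary copy wins over its (never-needed) backup.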
Implementations • Google • Not available outside Google • Hadoop • An open-source implementation in Java • Uses HDFS for stable storage • Aster Data • Cluster-optimized SQL Database that also implements MapReduce
MapReduce Benefits • Ease of use, "out of the box" experience • Portable regardless of computer architecture • Works best on a homogeneous network • Scalability • 4,000 nodes in practice • Fault tolerance • Only failed tasks are re-executed. • Load balancing • Master I/O scheduling
Negative Points • Does not perform well enough on structured data sets • A DBMS is better in this case • Handles only input data that can be split into independent pieces • The "central" master is a single point of failure • Locality issues could be better addressed • Why not move the computation to the data?