
MapReduce: Simplified Data Processing on Large Clusters


Presentation Transcript


  1. MapReduce: Simplified Data Processing on Large Clusters Authors: Jeffrey Dean and Sanjay Ghemawat Presenter: Guangdong Liu Jan 28th, 2011

  2. Presentation Outline • Motivation • Goal • Programming Model • Implementation • Refinement

  3. Motivation • Large-scale data processing • Many data-intensive applications process huge amounts of raw data and produce large amounts of derived data • Such applications share certain common themes • Hundreds or thousands of machines are used • Two basic operations on the input data: 1) Map(): process a key/value pair to generate a set of intermediate key/value pairs 2) Reduce(): merge all intermediate values associated with the same key

  4. Goal • MapReduce: an abstraction that lets users perform simple computations across large data sets distributed over large clusters of commodity PCs, while hiding the details of parallelization, data distribution, load balancing and fault tolerance • User-defined functions • Automatic parallelization and distribution • Fault tolerance • I/O scheduling • Status monitoring

  5. Programming Model • Inspired by the Lisp primitives map and reduce • Map(key, val) • Written by the user • Processes a key/value pair to generate intermediate key/value pairs • The MapReduce library groups all intermediate values associated with the same key together and passes them to the reduce function • Reduce(key, vals) • Also written by the user • Merges all intermediate values associated with the same key
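A minimal, single-process sketch of this model (illustrative only, not the paper's C++ library; the function name run_mapreduce is made up here):

```python
from collections import defaultdict

def run_mapreduce(inputs, map_fn, reduce_fn):
    """Toy illustration of the programming model on one machine."""
    # Map phase: each input (key, value) pair yields intermediate (key, value) pairs.
    intermediate = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in map_fn(key, value):
            intermediate[ikey].append(ivalue)
    # The library groups all values with the same intermediate key,
    # then hands each group to the user's reduce function.
    return {ikey: reduce_fn(ikey, ivalues) for ikey, ivalues in intermediate.items()}
```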

  6. Programming Model

  7. Programming Model • Example: count word occurrences in a collection of documents • Input consists of (doc_url, doc_contents) pairs • Map(key=doc_url, val=doc_contents): for each word w in doc_contents, emit (w, “1”) • Reduce(key=word, vals=counts_list): sum all the “1”s in the value list and emit (word, sum)
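The word-count example from the slide, sketched in Python (it emits the integer 1 instead of the string "1" so that sum() applies directly, and strips punctuation for a cleaner count):

```python
import string

def word_count_map(doc_url, doc_contents):
    # Emit (word, 1) once per word occurrence.
    for token in doc_contents.split():
        word = token.strip(string.punctuation)
        if word:
            yield word, 1

def word_count_reduce(word, counts):
    # Sum the partial counts for one word.
    return sum(counts)

# Reusing run_mapreduce from the sketch above:
docs = [("doc1", "Hello World, Bye World!"),
        ("doc2", "Welcome to UNL, Goodbye to UNL.")]
print(run_mapreduce(docs, word_count_map, word_count_reduce))
# {'Hello': 1, 'World': 2, 'Bye': 1, 'Welcome': 1, 'to': 2, 'UNL': 2, 'Goodbye': 1}
```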

  8. Programming Model • Word-count example (figure): three input records are read from the DFS, each is processed by a map task (M1–M3), the intermediate (word, 1) pairs are partitioned and merged by two reduce tasks (R1, R2), and the final counts are written back to the DFS • Inputs: “Hello World, Bye World!”, “Welcome to UNL, Goodbye to UNL.”, “Hello MapReduce, Goodbye to MapReduce.” • Final output: (Hello, 2) (Bye, 1) (Welcome, 1) (to, 3) from R1; (World, 2) (UNL, 2) (Goodbye, 2) (MapReduce, 2) from R2

  9. Implementation • User to-do list • Specify input and output files • M: number of map tasks • R: number of reduce tasks • W: number of machines • Write the map and reduce functions • Submit the job • This requires no knowledge of parallel/distributed systems!
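A hypothetical job specification that captures this to-do list; the field names and the gfs:// paths are illustrative, not the real API, while the M/R/machine counts are the example sizes quoted in the paper (M = 200,000, R = 5,000, 2,000 workers). The map and reduce functions are the word-count ones from the earlier sketch.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class MapReduceSpec:
    input_files: List[str]   # where to read input from
    output_dir: str          # where the final output files go
    M: int                   # number of map tasks
    R: int                   # number of reduce tasks
    machines: int            # number of worker machines (W)
    map_fn: Callable         # user-defined map(key, value) -> iterable of (k, v)
    reduce_fn: Callable      # user-defined reduce(key, values) -> result

spec = MapReduceSpec(["gfs://docs/part-*"], "gfs://out/", M=200000, R=5000,
                     machines=2000, map_fn=word_count_map, reduce_fn=word_count_reduce)
```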

  10. Implementation • Execution overview (figure): input blocks (B1…Bn) are read from the DFS and assigned by the master to map tasks (M1…Mn); each mapper writes its intermediate output to local disk, partitioned into R regions (P1…Pr); reduce tasks (R1…Rr) remotely read the partitions assigned to them and write the final output files (Output 1…Output r) back to the DFS

  11. Implementation 1. Input files are split into M pieces • Each piece is typically 16–64 MB 2. Start up many copies of the user program on a cluster of machines • Master & workers • One special copy becomes the master • Workers are assigned tasks by the master • There are M map tasks and R reduce tasks to assign • The master finds idle workers and assigns a map or reduce task to each
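A small sketch of step 1, choosing M by cutting the input into fixed-size splits (64 MB is just an example size within the 16–64 MB range mentioned above; the function and argument names are made up):

```python
import math

SPLIT_BYTES = 64 * 1024 * 1024   # example split size

def make_splits(file_sizes):
    """Return (file, offset, length) splits of at most SPLIT_BYTES each."""
    splits = []
    for name, size in file_sizes.items():
        for i in range(math.ceil(size / SPLIT_BYTES)):
            offset = i * SPLIT_BYTES
            splits.append((name, offset, min(SPLIT_BYTES, size - offset)))
    return splits   # M = len(splits) map tasks
```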

  12. Implementation 3. Map tasks • A map worker reads the contents of its input split • Performs the user-defined map computation to create intermediate <key, value> pairs • The intermediate pairs produced by the map function are buffered in memory 4. Intermediate data written to local disk (R regions) • Buffered pairs are written to local disk periodically • Partitioned into R regions by a partitioning function • The locations of these buffered pairs on the local disk are passed back to the master
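The paper's default partitioning function is hash(key) mod R. A sketch of how a map worker might bucket its buffered output into R regions (crc32 stands in for a deterministic hash so that every mapper sends a given key to the same region):

```python
import zlib

def partition(key, R):
    # Deterministic hash(key) mod R.
    return zlib.crc32(key.encode("utf-8")) % R

def spill_buffer(buffered_pairs, R):
    """Bucket in-memory intermediate pairs into R regions. In the real system each
    region is written to a local file whose location is reported to the master."""
    regions = [[] for _ in range(R)]
    for key, value in buffered_pairs:
        regions[partition(key, R)].append((key, value))
    return regions
```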

  13. Implementation 5. Read & sort • Reduce workers use remote procedure calls to read the buffered data from the local disks of the map workers • Sort the intermediate data by intermediate key 6. Reduce tasks • The reduce worker iterates over the sorted intermediate data • For each unique key encountered, the key and the corresponding set of values are passed to the user's reduce function • The output of the reduce function is appended to an output file on the global file system • When all tasks have completed, the master wakes up the user program
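A sketch of one reduce task under these steps; the fetch callables stand in for the remote procedure calls that read one region from each map worker, and the local file write stands in for the global file system output:

```python
from itertools import groupby
from operator import itemgetter

def run_reduce_task(region_fetchers, reduce_fn, output_path):
    """Fetch this task's region from every mapper, sort by key, and apply the
    user's reduce function once per unique key."""
    pairs = []
    for fetch in region_fetchers:        # stand-in for RPC reads from map workers
        pairs.extend(fetch())
    pairs.sort(key=itemgetter(0))        # brings equal keys together
    with open(output_path, "w") as out:
        for key, group in groupby(pairs, key=itemgetter(0)):
            result = reduce_fn(key, (value for _, value in group))
            out.write(f"{key}\t{result}\n")
```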

  14. Implementation • Fault tolerance: in a word, redo • The master pings every worker periodically • No response = failed worker • Failed tasks are rescheduled on other workers • Note: map tasks completed by the failed worker must also be re-executed, because their output is stored on that machine's local disk (completed reduce output is already in the global file system)
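A sketch of the "redo" policy; the task_table and queue objects are hypothetical, the point is which tasks go back to the pending state:

```python
def handle_worker_failure(worker, task_table, pending_queue):
    """Re-queue the failed worker's tasks. Completed reduce tasks are not redone
    (their output is in the global file system), but completed map tasks are,
    because their output lived on the failed machine's local disk."""
    for task in task_table.tasks_on(worker):           # hypothetical lookup
        if task.kind == "map" or task.state == "in_progress":
            task.state = "idle"
            pending_queue.append(task)
```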

  15. Implementation • Locality • Input data is managed by GFS and stored in several replicas • The master schedules a map task on a machine that holds a local replica of its input, or near one, to conserve network bandwidth • Task granularity • M map tasks and R reduce tasks • Make M and R much larger than the number of worker machines, which improves dynamic load balancing and speeds up recovery when a worker fails

  16. Implementation • Backup tasks • Straggler: a machine that takes an unusually long time to complete one of the last few map or reduce tasks in the computation • Causes: a bad disk, competition for CPU … • Resolution: schedule backup executions of the remaining in-progress tasks when a MapReduce operation is close to completion; a task is marked complete whenever either the primary or the backup execution finishes
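A rough sketch of that resolution; the task and worker objects and the 5% threshold are illustrative assumptions, not values from the paper:

```python
def maybe_schedule_backups(tasks, idle_workers, threshold=0.05):
    """When only a small fraction of tasks remain, schedule duplicates of the
    in-progress tasks on idle workers; whichever copy finishes first wins."""
    remaining = [t for t in tasks if t.state == "in_progress"]
    if tasks and len(remaining) / len(tasks) <= threshold:
        for task, worker in zip(remaining, idle_workers):
            worker.assign(task)   # duplicate execution of a possible straggler
```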

  17. Source • The example is quoted from: • Wei Wei, Juan Du, Ting Yu, and Xiaohui Gu, "SecureMR: A Service Integrity Assurance Framework for MapReduce," Annual Computer Security Applications Conference (ACSAC '09), pp. 73–82, Dec. 7–11, 2009

  18. Making Cluster Applications Energy-Aware Authors: Nedeljko Vasic, Martin Barisits and Vincent Salzgeber Jan 28th, 2011

  19. Outline • Introduction • Case Study • Approach

  20. Introduction • Power consumption • A critical issue in large-scale clusters • Data centers consume as much energy as a city • 7.4 billion dollars per year • Current techniques for efficiency • Consolidate the workload onto fewer machines • Minimize energy consumption while keeping the same overall performance level • Problems • Cannot operate at multiple power levels • Cannot deal with energy consumption limits

  21. Case Study • Google’s Server Utilization and Energy Consumption

  22. Case Study • Hadoop Distributed File System (HDFS)

  23. Case Study • Hadoop Distributed File System (HDFS)

  24. Case Study • MapReduce

  25. Case Study • Conclusion • Aggregating load onto fewer machines is a sound way to save energy • Distributed applications must actively participate in power management in order to avoid poor performance

  26. Approach

  27. On the Energy (In)efficiency of Hadoop Clusters Authors: Jacob Leverich, Christos Kozyrakis Jan 28th, 2011

  28. Introduction • Improving the energy efficiency of a cluster • Place some nodes into low-power standby modes • Avoid wasting energy on oversized components in each node • Problems

  29. Approach • Hadoop data layout overview • Replicas are distributed across different nodes to improve performance and reliability • The user specifies a block replication factor n so that n identical copies of each data block are stored across the cluster (typically n = 3) • With this layout, the largest number of nodes that can be disabled without impacting data availability is n-1
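A small sketch of why n-1 is the safe bound: each block's n replicas live on n distinct nodes, so removing any n-1 nodes still leaves at least one replica reachable, while removing n arbitrary nodes may not (the node and block names below are made up):

```python
# Each block maps to the set of nodes holding its replicas (n = 3 distinct nodes).
replicas = {
    "block-0": {"node1", "node2", "node3"},
    "block-1": {"node2", "node4", "node5"},
}

def all_blocks_available(replicas, disabled_nodes):
    # A block is available if at least one replica is on a node that is still up.
    return all(nodes - disabled_nodes for nodes in replicas.values())

assert all_blocks_available(replicas, {"node1", "node2"})            # any 2 nodes: safe
assert not all_blocks_available(replicas, {"node1", "node2", "node3"})  # 3 nodes: block-0 lost
```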

  30. Approach • Covering subset • At least one replica of every data block must be stored in a designated subset of nodes called the covering subset • This ensures that a large number of nodes (those outside the covering subset) can be gracefully removed from the cluster without affecting data availability or interrupting normal operation
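A sketch of covering-subset-aware placement, assuming other_nodes is disjoint from the covering subset; the node names and the function are illustrative, not Hadoop's actual block-placement policy:

```python
import random

def place_replicas(block_id, covering_subset, other_nodes, n=3):
    """Pin one replica inside the covering subset; place the remaining n-1
    replicas on nodes outside it, which can later be powered down safely."""
    placement = {random.choice(sorted(covering_subset))}
    placement |= set(random.sample(sorted(other_nodes), n - 1))
    return placement

print(place_replicas("block-7", {"nodeA", "nodeB"}, {"node1", "node2", "node3", "node4"}))
```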
