
Google MapReduce


Presentation Transcript


  1. Google MapReduce: A Distributed Computing Solution

  2. Outline Introduction MapReduce Model Implementation Refinements Q&A

  3. Outline Introduction MapReduce Model Implementation Refinements Q&A

  4. Introduction • MapReduce is designed to process large amounts of distributed raw data. • Machine learning, clustering data, querying, graph computation… • The most significant use to date is indexing. • What are the issues to be concerned with?

  5. Introduction • Abstraction • Reliability • Fault-tolerance • Load-balancing • Efficiency • Parallelization • Data distribution

  6. Brief Ideas The user represents the input data as key/value pairs. The user defines a Mapper function that processes key/value pairs into intermediate key/value pairs. The user defines a Reducer function that processes intermediate key/value pairs and produces the results.

  7. Brief Ideas Why does it work? An interface that abstracts the large-scale computation into two main functions. Automatic parallelization Robustness

  8. Outline Introduction MapReduce Model Implementation Refinements Q&A

  9. Model

  10. Key/Value Pair The input data type. It consists of a key and a value, both of String type.

  11. Map function • A user-specified function. • Processes a split of input key/value pairs and produces intermediate key/value pairs. • The output is sorted (by key) and flushed to disk. • Sorting makes sure pairs with the same key are grouped together. map(key, value) → list(intermediate key, intermediate value)

  12. Intermediate Key/Value Pairs • The output of the mappers. • Intermediate key/value pairs are further shuffled (grouped) for the reducers. • The grouping is implemented by a user-specified partitioning function. • Ex. hash(key) mod R • Now the pairs become "key/value[]" pairs

  13. Reduce function Another user-specified function. It integrates a key and its list of values into the result output. Usually the result contains only one value, or even none. reduce(key, list(value)) → list(value)

  14. Example: Word Counter

      map(String key, String value) {
        for each word w in value {
          EmitIntermediate(w, "1");
        }
      }

  …then all intermediate pairs having the same w will be shuffled into (w, ("1", "1", …, "1")).

      reduce(String key, Iterator values) {
        int result = 0;
        for each v in values {
          result += parseInt(v);
        }
        Emit(toString(result));
      }
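
  Below is a minimal runnable Python sketch of the same word counter, with the shuffle step simulated in memory; the names (map_fn, reduce_fn, run_mapreduce) are illustrative, not the real library's API.

      from collections import defaultdict

      def map_fn(key, value):
          # key: document name, value: document contents
          for w in value.split():
              yield (w, 1)

      def reduce_fn(key, values):
          # key: a word, values: all counts emitted for that word
          yield sum(values)

      def run_mapreduce(inputs, map_fn, reduce_fn):
          # Simulate the shuffle: group intermediate values by key.
          groups = defaultdict(list)
          for key, value in inputs:
              for ikey, ivalue in map_fn(key, value):
                  groups[ikey].append(ivalue)
          return {k: list(reduce_fn(k, vs)) for k, vs in sorted(groups.items())}

      print(run_mapreduce([("doc1", "to be or not to be")], map_fn, reduce_fn))
      # {'be': [2], 'not': [1], 'or': [1], 'to': [2]}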

  15. Other examples • Counting URL access frequency • Map function processes URL logs and outputs <URL, 1> • Reduce function adds together all values for the same URL and emits the count for each URL. • Reverse Web-Link Graph • Map function outputs <target, source> • Reduce function outputs <target, list(source)> • Inverted Index • Map function processes each document into <word, documentID> • Reduce function sorts the documentIDs and emits <word, list(documentID)>
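
  As an illustration of the inverted-index example, here is a hedged Python sketch reusing the run_mapreduce driver from the word-counter sketch (the names are hypothetical):

      def index_map(doc_id, text):
          # Emit a <word, documentID> pair for every distinct word.
          for word in set(text.split()):
              yield (word, doc_id)

      def index_reduce(word, doc_ids):
          # Sort the document IDs and emit <word, list(documentID)>.
          yield sorted(doc_ids)

      docs = [("d1", "map reduce"), ("d2", "reduce phase")]
      print(run_mapreduce(docs, index_map, index_reduce))
      # {'map': [['d1']], 'phase': [['d2']], 'reduce': [['d1', 'd2']]}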

  16. Using MapReduce Model User specifies the parallelization level. User specifies the map and reduce functions. User specifies the partitioning function.
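
  A sketch of what such a job specification might hold, as a Python dataclass. The field names are made up for illustration; the real library is C++ and uses a MapReduceSpecification object.

      from dataclasses import dataclass
      from typing import Callable

      @dataclass
      class JobSpec:
          map_fn: Callable        # user-specified map function
          reduce_fn: Callable     # user-specified reduce function
          partition_fn: Callable  # maps an intermediate key to a bucket in [0, R)
          num_map_tasks: int      # M: parallelization of the map phase
          num_reduce_tasks: int   # R: parallelization of the reduce phase

      # Reusing map_fn and reduce_fn from the word-counter sketch:
      spec = JobSpec(map_fn, reduce_fn, lambda key, R: hash(key) % R, 200, 5)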

  17. Outline Introduction MapReduce Model Implementation Refinements Q&A

  18. Implementation • A widely used implementation at Google. • Large clusters of commodity PCs, connected by switched Ethernet. • Machines: dual-processor x86, running Linux • Memory: 2~4 GB per machine • Networking: 100 Mbit~1 Gbit/second per machine • May be slower in aggregate because of limited bisection bandwidth. • Storage: inexpensive IDE disks directly attached to the machines • File system: GFS (high availability and reliability)

  19. Execution The user program calls the MapReduce function. The execution steps are briefly illustrated below.

  20. Execution Flow

  21. Execution • The library in the user program splits the input data file into M splits. Then it starts the program on a cluster of machines. • M is chosen so that each split will be 16~64 MB per piece. The 16 and 64 bounds can be configured by the user. • Locality optimization • One of the machines is elected as the master, and the others as workers.

  22. Execution • The master assumes there are M map tasks and R reduce tasks to be assigned. It then assigns each idle worker a map task or a reduce task. • R can also be manually configured, but it is usually constrained because each reduce worker generates a separate output file. • Usually M and R are chosen to be a multiple of the number of worker machines. • Better load balancing • Speeds up recovery from worker failures.

  23. Execution The worker assigned the ith map task reads the ith input split. It then parses the data, feeds the key/value pairs into the map function, and buffers the output in memory.

  24. Execution • Periodically the buffered pairs are flushed to local disk, partitioned into R groups. • By the partitioning function (ex. hash(key) mod R). • After storing them, the worker notifies the master of the locations where the data is stored. • The data will be forwarded to reduce workers later.
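
  A sketch of this map-side partitioning step, writing to in-memory buckets instead of real spill files (the names are hypothetical):

      def spill(buffered_pairs, R, partition_fn=None):
          # Split buffered intermediate pairs into R partitions,
          # one per reduce task, using the partitioning function.
          partition_fn = partition_fn or (lambda key: hash(key) % R)
          partitions = [[] for _ in range(R)]
          for key, value in buffered_pairs:
              partitions[partition_fn(key)].append((key, value))
          # The real worker sorts each partition, writes it to local
          # disk, and reports the file locations to the master.
          return [sorted(p) for p in partitions]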

  25. Execution • The master notifies a reduce worker of the locations of its input. The ith reduce worker then reads the ith group of intermediate key/value pairs from each map worker, and sorts the pairs it has read. • Sorting ensures that pairs with the same key are put together. • Sorting is needed because there may be many different keys to process in a given reduce task. • If the amount of data exceeds memory, an external sort is used.
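
  The sort-then-group step on the reduce side can be sketched with Python's itertools.groupby, assuming the merged input fits in memory (otherwise an external sort would be substituted):

      from itertools import groupby
      from operator import itemgetter

      def group_by_key(pairs):
          # Sort by key so equal keys become adjacent, then group them.
          for key, group in groupby(sorted(pairs), key=itemgetter(0)):
              yield key, [value for _, value in group]

      print(list(group_by_key([("b", 1), ("a", 2), ("b", 3)])))
      # [('a', [2]), ('b', [1, 3])]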

  26. Execution • After sorting, for each unique key encountered, the worker collects all the values that follow the key and feeds the key/values pair to the reduce function. The output is appended to the task's final output file.

  27. Execution • When all map tasks and reduce tasks complete, the master wakes up the user program and the MapReduce function returns. • There will be exactly R output files. • Usually the output file names are supplied by the user program, so no return value is needed.

  28. Illustration [Figure: execution flow between the mappers and reducers, in seven steps: 1. initializing, 2. assign tasks, 3. read inputs, 4. store and complete, 5. read intermediate, 6. output, 7. end operation]

  29. Master • The scheduler of the whole MapReduce process. • It pings each worker periodically. • It keeps the state of each task (either map or reduce) in memory. • The state is one of three possible values: { idle, in-progress, completed } • Workers that complete their task let the master know, so the master knows where to get the intermediate key/value pairs. • It keeps all intermediate pairs' locations and propagates them to the reduce workers.
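
  A sketch of the master's bookkeeping, on the assumption that it is essentially an in-memory table of task states plus the locations of intermediate files (the class and field names are hypothetical):

      IDLE, IN_PROGRESS, COMPLETED = "idle", "in-progress", "completed"

      class Master:
          def __init__(self, num_map, num_reduce):
              # One state entry per task, all initially idle.
              self.map_state = {i: IDLE for i in range(num_map)}
              self.reduce_state = {i: IDLE for i in range(num_reduce)}
              self.intermediate = {}  # map task id -> its R file locations

          def assign(self, task_id, kind):
              states = self.map_state if kind == "map" else self.reduce_state
              states[task_id] = IN_PROGRESS

          def map_done(self, task_id, file_locations):
              # Record where the finished map task stored its R partitions,
              # so the locations can be propagated to reduce workers.
              self.map_state[task_id] = COMPLETED
              self.intermediate[task_id] = file_locations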

  30. Fault Tolerance • What can go wrong? • Preemption: 78.3%, exceeded resources: 10.8%, crashed: 9.2%, machine failure: 1.6% • Worker fails • Offline, straggler • Master fails • Input and output file consistency is ensured by the underlying file system (such as GFS).

  31. Worker Failure (Offline) • The master pings each worker periodically to detect failed workers. • Any in-progress task assigned to the failed worker is reassigned. • For a map worker, any completed map tasks it performed (their state is completed) are also set back to the idle state and can be assigned to other workers again. • This is because their output, stored on the failed worker's local disk, is lost. • All reduce workers are notified of the re-execution.

  32. Worker Failure (Straggler) • A worker that takes an unusually long time to complete its task. • Causes: a bad disk, resource competition, a poor caching mechanism… • The master assigns a backup task for each remaining in-progress task. • Either the primary or the backup completes the task first, and the master simply ignores the later one. • With backup tasks disabled, a MapReduce operation takes about 44% longer to complete.

  33. Data Consistency • Every map worker generates R temporary files, and every reduce worker generates 1 output file. • Each reduce worker writes to a temporary file, which is (atomically) renamed to the final output. • The atomicity is guaranteed by the underlying file system.

  34. Master Failure The master periodically checkpoints its data structures. If the master dies, a new copy can be started from the last checkpoint. This happens relatively rarely (there is always exactly one master), so the real implementation simply aborts the MapReduce operation, which can be retried later.

  35. Outline Introduction MapReduce Model Implementation Refinements Q&A

  36. Locality • GFS generally keeps 3 replicas of each file chunk (on different machines). • The master tries to assign a map task to a machine that contains a replica of its input. • This locality helps save network bandwidth. • If that is impossible, then a nearby machine can be selected. • Empirically, a large fraction of the input is read locally.

  37. Partitioning Function • Can use a simple hash-modulo method. • Sometimes it is better to partition the data so that the output files are well-organized. • Ex: hash(Hostname(keyURL)) mod R
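
  A sketch of the URL-by-host partitioner, using only Python's standard library (zlib.crc32 stands in for the hash, since Python's built-in hash() is salted per process):

      from urllib.parse import urlparse
      from zlib import crc32

      def host_partition(url_key, R):
          # URLs on the same host land in the same partition, so each
          # output file ends up holding one host's URLs together.
          host = urlparse(url_key).hostname or ""
          return crc32(host.encode()) % R

      print(host_partition("http://example.com/a", 4) ==
            host_partition("http://example.com/b", 4))  # True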

  38. Combiner • In some cases there is much repetition among the intermediate pairs. • A combiner is an intermediate function applied between the output of the map function and the intermediate key/value pairs sent over the network. • Ex. thousands of <word, 1> pairs can be merged into a single <word, k> • This saves a lot of network bandwidth between the map workers and reduce workers.
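
  A sketch of a combiner for the word counter, run on a map worker's buffered output before it crosses the network (assuming integer counts as in the earlier example):

      from collections import Counter

      def combine(buffered_pairs):
          # Merge repeated <word, 1> pairs into a single <word, k> pair
          # locally, shrinking the data shipped to reduce workers.
          counts = Counter()
          for word, count in buffered_pairs:
              counts[word] += count
          return list(counts.items())

      print(combine([("the", 1), ("the", 1), ("a", 1)]))
      # [('the', 2), ('a', 1)]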

  39. Skipping Bad Records • Sometimes there are bugs in user code that make a worker fail on certain records. • If processing a record fails, the record's ID is sent to the master. • If the same ID is seen more than once, the operation simply skips that record on re-execution. • This service can be manually turned off.
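
  A sketch of the skip-tracking logic, on the assumption that the master just counts failures per record ID and tells workers to skip any record seen to fail more than once (the real library catches failures in the worker and reports the record's ID to the master):

      from collections import Counter

      class SkipTracker:
          def __init__(self, threshold=2):
              self.failures = Counter()  # record id -> observed failures
              self.threshold = threshold

          def report_failure(self, record_id):
              self.failures[record_id] += 1

          def should_skip(self, record_id):
              # Skip records that have failed at least `threshold` times.
              return self.failures[record_id] >= self.threshold

      tracker = SkipTracker()
      tracker.report_failure(42)
      tracker.report_failure(42)
      print(tracker.should_skip(42))  # True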

  40. Counter • Sometimes the user needs to count the occurrences of particular events. • The number of words processed, the number of Chinese documents processed… • The user can create a Counter object and increment it in the map or reduce function. • The counter values are propagated to the master on the periodic ping. All values are later aggregated. • Only values from successful task executions are aggregated.
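
  A sketch of the counter facility as a plain object that a map or reduce function increments; in the real system the value piggybacks on the periodic ping response, and the master aggregates only successful executions (the names are illustrative):

      class EventCounter:
          def __init__(self, name):
              self.name, self.value = name, 0

          def increment(self, n=1):
              self.value += n

      # E.g. inside a map function, counting capitalized words:
      capitalized = EventCounter("capitalized-words")
      for word in "Map Reduce model".split():
          if word[0].isupper():
              capitalized.increment()
      print(capitalized.name, capitalized.value)  # capitalized-words 2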

  41. Outline Introduction MapReduce Model Implementation Refinements Q&A

  42. Q&A Any questions?
