
MapReduce Programming Model




  1. MapReduce Programming Model • Based on Lin and Dyer’s text: Chapter 3

  2. Job Tracker and Task Tracker • Figure 2.6

  3. Tom White’s Wordcount
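For readers without the referenced code at hand, word count can be sketched in plain Python. This is not Tom White's Hadoop code; it is a minimal single-process simulation, and the names `map_fn`, `reduce_fn`, and `run_wordcount` are made up for illustration. The grouping step stands in for the shuffle that the framework performs between map and reduce.

```python
from collections import defaultdict

def map_fn(doc_id, text):
    """Mapper: emit (term, 1) for every term in the document."""
    for term in text.lower().split():
        yield term, 1

def reduce_fn(term, counts):
    """Reducer: sum all partial counts for one term."""
    return term, sum(counts)

def run_wordcount(docs):
    """Simulate the framework: map, shuffle (group by key), then reduce."""
    groups = defaultdict(list)
    for doc_id, text in docs.items():
        for term, count in map_fn(doc_id, text):
            groups[term].append(count)  # shuffle: group values by key
    # Each reducer call receives one key together with all of its values.
    return dict(reduce_fn(term, groups[term]) for term in sorted(groups))

docs = {"d1": "a rose is a rose", "d2": "a daisy"}
result = run_wordcount(docs)
print(result)
```

Only `map_fn` and `reduce_fn` correspond to code a programmer would write; everything inside `run_wordcount` is the part the framework owns.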

  4. MapReduce Model
     • A programmer has no control over:
        • Where a mapper or reducer runs (i.e., on which node in the cluster).
        • When a mapper or reducer begins or finishes.
        • Which input key-value pairs are processed by a specific mapper.
        • Which intermediate key-value pairs are processed by a specific reducer.

  5. Techniques for controlling execution and managing data flow
     • Ability to:
        • Construct complex data types as keys and values for storage, processing, and communication
        • Specify and execute initialization code before a map and/or reduce, and termination code after it
        • Preserve state across multiple keys in the map and/or the reduce
        • Control the sort order of intermediate keys
        • Control the partitioning of the key space, and thus the set of keys a particular reducer will process
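Three of these techniques (a composite key, a custom partitioner, and a controlled sort order) can be illustrated together in a small Python sketch. It simulates the shuffle in one process and does not use the Hadoop API; `TermDoc`, `partition`, and `shuffle` are hypothetical names, and CRC32 is used only as a convenient repeatable hash.

```python
import zlib
from typing import NamedTuple

class TermDoc(NamedTuple):
    """A composite (complex) key: a term plus the document it came from."""
    term: str
    doc_id: str

def partition(key, num_reducers):
    """Partition on the term only, so every pair for a term reaches one reducer."""
    return zlib.crc32(key.term.encode("utf-8")) % num_reducers

def shuffle(pairs, num_reducers):
    """Route pairs to reducers, then sort each reducer's input by key."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    for bucket in buckets:
        # NamedTuple keys compare field by field: term first, then doc_id.
        bucket.sort(key=lambda kv: kv[0])
    return buckets

pairs = [
    (TermDoc("rose", "d2"), 1),
    (TermDoc("a", "d1"), 2),
    (TermDoc("rose", "d1"), 2),
    (TermDoc("a", "d2"), 1),
]
buckets = shuffle(pairs, num_reducers=2)
for i, bucket in enumerate(buckets):
    print(i, bucket)
```

Partitioning on `term` alone while sorting on the full key means a reducer sees all pairs for a term together, ordered by `doc_id`.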

  6. Objective
     • Address these issues without creating a bottleneck for scalability
     • The gold standard that MapReduce aims for is linear scalability
     • Storing and manipulating state can hinder scalability
     • How can performance be improved?
        • Make the map and reduce functions efficient
        • Make the transfer of intermediate data efficient
        • Aggregate intermediate data: an important operation for efficiency
        • Shrink the intermediate key space
     • What else can we do?
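The effect of aggregating intermediate data can be seen by counting the pairs that would cross the network with and without local aggregation. The sketch below is a plain-Python illustration under that assumption; `map_plain` and `combine` are hypothetical names, not Hadoop classes.

```python
from collections import Counter

def map_plain(text):
    """Mapper without aggregation: one (term, 1) pair per token."""
    return [(term, 1) for term in text.split()]

def combine(pairs):
    """Combiner: sum counts per term locally, before the shuffle."""
    totals = Counter()
    for term, count in pairs:
        totals[term] += count
    return list(totals.items())

text = "to be or not to be"
plain = map_plain(text)
combined = combine(plain)
print(len(plain), "pairs before combining")
print(len(combined), "pairs after combining")
```

The saving grows with the number of repeated terms a single mapper sees, which is why the intermediate key space matters.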

  7. Mapper
     • http://hadoop.apache.org/common/docs/stable/api/org/apache/hadoop/mapreduce/Mapper.html
     • http://hadoop.apache.org/common/docs/stable/api/org/apache/hadoop/mapred/package-summary.html
     • http://www.slideshare.net/sh1mmer/upgrading-to-the-new-map-reduce-api

  8. Mapper with built-in combiner-v1
     class Mapper
       method Map(docid a, doc d)
         H ← new AssociativeArray
         for all term t ∈ doc d do
           H{t} ← H{t} + 1    // Tally counts for entire document
         for all term t ∈ H do
           Emit(term t, count H{t})
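A runnable rendering of this version, as a sketch in plain Python rather than the Hadoop API: the function name `map_v1` is made up, and a generator's `yield` stands in for `Emit`.

```python
from collections import defaultdict

def map_v1(doc_id, doc):
    """Tally term counts within one document, then emit one pair per term."""
    counts = defaultdict(int)       # H <- new AssociativeArray
    for term in doc.split():        # for all term t in doc d
        counts[term] += 1           # H{t} <- H{t} + 1
    for term, count in counts.items():
        yield term, count           # Emit(term t, count H{t})

emitted = list(map_v1("d1", "a rose is a rose"))
print(emitted)
```

The associative array lives only for the duration of one `map_v1` call, so no state is carried from one document to the next.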

  9. Mapper with built-in combiner-v2
     class Mapper
       method Initialize
         H ← new AssociativeArray
       method Map(docid a, doc d)
         for all term t ∈ doc d do
           H{t} ← H{t} + 1    // Tally counts across documents
       method Close
         for all term t ∈ H do
           Emit(term t, count H{t})
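The second version can be sketched the same way in plain Python. The class name `InMapperCombiner` and the driver loop are assumptions for illustration; in a real job the framework, not user code, would call the three methods.

```python
from collections import defaultdict

class InMapperCombiner:
    """Keeps one associative array alive across all map calls of a task."""

    def initialize(self):
        self.counts = defaultdict(int)   # H <- new AssociativeArray

    def map(self, doc_id, doc):
        for term in doc.split():         # for all term t in doc d
            self.counts[term] += 1       # tally counts across documents

    def close(self):
        for term, count in self.counts.items():
            yield term, count            # Emit(term t, count H{t})

# Stand-in for the framework driving a single map task over two documents.
mapper = InMapperCombiner()
mapper.initialize()
for doc_id, doc in [("d1", "a rose"), ("d2", "a daisy")]:
    mapper.map(doc_id, doc)
emitted = dict(mapper.close())
print(emitted)
```

Nothing is emitted until `close`, so the whole table has to fit in the mapper's memory; that is the price of preserving state across input pairs.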
