
MapReduce: Simplified Data Processing on Large Clusters



Presentation Transcript


  1. MapReduce: Simplified Data Processing on Large Clusters
     Appendix A: Word Frequency
     Alex Newton, Billy Coss

  2. Contents
     • Abstract
     • Introduction
     • MapReduce
     • Word Frequency Analysis Sample Code

  3. Abstract
     • MapReduce is a programming model used to analyze large amounts of data
     • Map emits a key-value pair for every item it processes, so duplicate keys are expected
     • Reduce takes the key-value pairs emitted by Map and merges all values that share a key into a single result for that key
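The Map and Reduce roles described on this slide can be sketched in a few lines of single-process Python. This is only an illustration of the model, not the distributed system from the paper: the names `map_reduce`, `emit_letters`, and `total` are hypothetical, and the grouping step in the middle stands in for the work a real runtime does between the two phases.

```python
from collections import defaultdict


def map_reduce(inputs, map_fn, reduce_fn):
    """Apply map_fn to every input, group the emitted pairs by key,
    then call reduce_fn once per key (hypothetical single-process driver)."""
    groups = defaultdict(list)
    for item in inputs:
        for key, value in map_fn(item):
            # Duplicate keys are kept here; merging them is Reduce's job.
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def emit_letters(text):
    """Map: emit (letter, 1) for every character, duplicates included."""
    for ch in text:
        yield ch, 1


def total(key, values):
    """Reduce: merge all values that share a key into one result."""
    return sum(values)


result = map_reduce(["abca", "bc"], emit_letters, total)
print(result)
```

Map never checks whether a key has been seen before; all merging happens in Reduce, which is what lets the two phases be distributed independently.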

  4. Introduction
     • Data analysts at Google frequently work on extremely large sets of raw data
     • Parallel computing is required to process these datasets in a useful length of time
     • MapReduce was created as an abstraction that hides the details of parallelization, fault tolerance, data distribution, and load balancing

  5. MapReduce
     [Image taken from the OSDI '04 presentation by Jeff Dean and Sanjay Ghemawat]

  6. Word Frequency Analysis Example Code
     • Code is divided into three functions
       • main
       • WordCounter
       • Adder
     • WordCounter is used for the Map function
       • Skips any leading whitespace and then parses words out of text
       • The word itself is the key, the value is 1
     • Adder is used for the Reduce function
       • Iterates through keys, and adds the values of the same key together
       • Since the value is 1, this has the effect of incrementing a counter for the number of times a word is used
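The three pieces described on this slide might look roughly like the following in Python. This is a hedged sketch that mirrors the slide's description, not the sample code the slide refers to: the function names `word_counter`, `adder`, and `main` are adapted from the slide, and the in-memory grouping inside `main` is an assumption standing in for the MapReduce runtime.

```python
from collections import defaultdict


def word_counter(text):
    """Map: skip leading whitespace, parse out each word, emit (word, 1)."""
    pairs = []
    i, n = 0, len(text)
    while i < n:
        # Skip any whitespace before the next word.
        while i < n and text[i].isspace():
            i += 1
        start = i
        # Consume characters up to the next whitespace.
        while i < n and not text[i].isspace():
            i += 1
        if start < i:
            pairs.append((text[start:i], 1))  # key = word, value = 1
    return pairs


def adder(values):
    """Reduce: add together all values emitted for one key."""
    count = 0
    for value in values:
        count += value
    return count


def main(documents):
    """Driver: run Map on each document, group by word, then Reduce."""
    grouped = defaultdict(list)
    for doc in documents:
        for word, one in word_counter(doc):
            grouped[word].append(one)
    return {word: adder(values) for word, values in grouped.items()}


counts = main(["  the quick fox", "the lazy dog the end"])
print(counts)
```

Because every emitted value is 1, summing the values for a key is the same as counting how many times that word appeared across all inputs.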

  7. Sources
     • J. Dean & S. Ghemawat (2004). MapReduce: Simplified Data Processing on Large Clusters. OSDI '04: 6th Symposium on Operating Systems Design and Implementation, pp. 137–150. http://research.google.com/archive/mapreduce.html
