MapReduce


kobe

Presentation Transcript


  1. MapReduce Powering Hadoop

  2. Overview • Overview • What is MapReduce • How Does It Divide Work • Example • Conclusion • References

  3. What Is MapReduce • Originally created by Google • Used to query large data sets • Extracts relations from unstructured data • Can draw from many disparate data sources

  4. How It Divides Work http://docs.basho.com/riak/1.3.0/tutorials/querying/MapReduce/

  5. 4 Refinements • The general algorithms fit most needs • User-defined tweaks to the Map and Reduce functions handle special problems

  6. 4.1 Partitioning Function • Users specify the number of reduce tasks to run (R) • Intermediate data is split across those tasks by a partitioning function applied to the intermediate key • The default function is hash(key) mod R • Sometimes we want related output grouped together, such as web data grouped by domain • Redefining the partitioning function as hash(Hostname(urlkey)) mod R sends all URLs from the same host to the same reduce task
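
The two partitioning functions on this slide can be sketched in a few lines of Python. This is only an illustration: the names `stable_hash`, `default_partition`, and `host_partition` are hypothetical, and CRC32 is an assumed stand-in for whatever hash a real framework uses.

```python
import zlib
from urllib.parse import urlparse


def stable_hash(text: str) -> int:
    # Assumption: CRC32 stands in for the framework's hash function.
    # Python's built-in hash() is salted per process, so it is avoided here.
    return zlib.crc32(text.encode("utf-8"))


def default_partition(key: str, num_reducers: int) -> int:
    # hash(key) mod R
    return stable_hash(key) % num_reducers


def host_partition(url_key: str, num_reducers: int) -> int:
    # hash(Hostname(urlkey)) mod R: only the host name decides the partition,
    # so every URL from one host lands in the same reduce task.
    hostname = urlparse(url_key).hostname or ""
    return stable_hash(hostname) % num_reducers
```

With `host_partition`, `http://example.com/a` and `http://example.com/b/c` always map to the same partition, whereas `default_partition` may scatter them.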

  7. 4.2 Ordering Guarantees • Within each partition, intermediate key/value pairs are always processed in increasing key order • This makes it easy to produce a sorted output file per partition, which supports efficient random-access lookups by key
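
To see why sorted output helps with lookups, here is a minimal sketch that keeps one partition's output in key order and finds a key by binary search. The function names are hypothetical, and an in-memory list is assumed in place of a real output file.

```python
from bisect import bisect_left


def build_sorted_output(pairs):
    # Order one partition's key/value pairs by key, as the framework guarantees.
    ordered = sorted(pairs, key=lambda kv: kv[0])
    keys = [k for k, _ in ordered]
    values = [v for _, v in ordered]
    return keys, values


def lookup(keys, values, key):
    # Binary search works only because the keys are sorted.
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return values[i]
    return None
```

Each lookup takes a logarithmic number of comparisons instead of a full scan of the partition.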

  8. 4.3 Combiner Function • Each map task can emit many intermediate pairs with the same key • These are normally merged in the Reduce function, but sometimes we want to partially merge them on the map side before they are sent over the network • An optional combiner function does this, and in some situations it yields significant performance gains
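
Word counting is a convenient way to illustrate the combiner. The sketch below is an assumption-laden toy, not a framework API: `map_words`, `combine`, and `reduce_counts` are hypothetical names, and everything runs in one process. It relies on addition being commutative and associative, which is what makes partial merging safe.

```python
from collections import Counter, defaultdict


def map_words(document):
    # Map: emit (word, 1) for every word, so frequent words repeat many times.
    return [(word, 1) for word in document.split()]


def combine(pairs):
    # Combiner: merge repeated keys locally, on the map side,
    # before anything would be sent over the network.
    partial = Counter()
    for word, count in pairs:
        partial[word] += count
    return list(partial.items())


def reduce_counts(all_pairs):
    # Reduce: sum the (possibly pre-combined) counts for each word.
    totals = defaultdict(int)
    for word, count in all_pairs:
        totals[word] += count
    return dict(totals)
```

For the documents `"the cat the dog"` and `"the end"`, the map step emits 6 pairs in total; combining each map task's output first shrinks that to 5, and the final counts are unchanged.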

  9. 4.4 Input and Output Types • MapReduce can read data in a number of formats • The way the input data is organized greatly affects the output • Adding support for a new data type only requires users to implement the reader interface
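
One way to picture the reader interface is as a small abstract class that turns raw input into key/value records. The sketch below is hypothetical: the class names and the `records` method are assumptions for illustration, not the interface of any particular framework.

```python
import csv
from abc import ABC, abstractmethod
from typing import Any, Iterator, Tuple


class Reader(ABC):
    """Hypothetical reader interface: yields (key, value) records for Map."""

    @abstractmethod
    def records(self) -> Iterator[Tuple[Any, Any]]:
        ...


class TextLineReader(Reader):
    # Key is the character offset of the line, value is the line's text.
    def __init__(self, stream):
        self.stream = stream

    def records(self):
        offset = 0
        for line in self.stream:
            yield offset, line.rstrip("\n")
            offset += len(line)


class CsvRowReader(Reader):
    # A new input type: key is the first column, value is the remaining columns.
    def __init__(self, stream):
        self.stream = stream

    def records(self):
        for row in csv.reader(self.stream):
            if row:
                yield row[0], row[1:]
```

Supporting CSV input here meant writing only `CsvRowReader`; code that consumes `records()` does not change.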

  10. 4.5 Side-effects • Sometimes we want to output additional files from the Map or Reduce functions • Users are responsible for making these side-effects atomic and idempotent • Tasks that write several files that must stay consistent with each other should be deterministic
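
A common way to make an extra output file safe is to write it to a temporary file and then rename it into place in one step. The sketch below assumes a local filesystem, and `write_side_file` is a hypothetical helper name.

```python
import os
import tempfile


def write_side_file(path, content):
    # Write to a temporary file in the same directory, then rename it into place.
    # Readers never see a half-written file, and re-running the task with the
    # same content leaves the same result (idempotent).
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(content)
        os.replace(tmp_path, path)  # atomic rename on the same filesystem
    except BaseException:
        # Clean up the temporary file if the write or rename failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If a task is re-executed after a failure, calling `write_side_file` again with the same content simply replaces the file with an identical copy.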
