MapReduce
MapReduce • Presenter: 黃威凱 (first-year master's student, Computer Science and Information Engineering)
Outline
• Purpose
• Example
• Method
• Advanced
Purpose
Purpose
• Data mining
• Data processing
Example
Example
• Find the maximum temperature for each year
• Data source: National Climatic Data Center (NCDC)
• The data is stored in a line-oriented ASCII format, in which each line is a record
• There is a directory for each year from 1901 to 2001, each containing a gzipped file for each weather station with its readings for that year
Example (Data format)
Example (Gzipped files, example for 1990)
% ls raw/1990 | head
010010-99999-1990.gz
010014-99999-1990.gz
010015-99999-1990.gz
010016-99999-1990.gz
010017-99999-1990.gz
010030-99999-1990.gz
010040-99999-1990.gz
010080-99999-1990.gz
010100-99999-1990.gz
010150-99999-1990.gz
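Since each station file is a gzipped, line-oriented ASCII file, reading its records could look roughly like the sketch below. The class name `GzipRecordReader` and the placeholder lines written in `main` are illustrative assumptions, not part of the original slides or the NCDC data.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: reads one record per line from a gzipped ASCII file.
final class GzipRecordReader {

    /** Returns every line of the gzipped file as one record. */
    static List<String> readRecords(Path gzFile) throws IOException {
        List<String> records = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(gzFile)),
                StandardCharsets.US_ASCII))) {
            String line;
            while ((line = reader.readLine()) != null) {
                records.add(line);
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny stand-in file; these lines are placeholders, not real NCDC records.
        Path sample = Files.createTempFile("station-sample", ".gz");
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(sample))) {
            out.write("record-one\nrecord-two\n".getBytes(StandardCharsets.US_ASCII));
        }
        System.out.println(readRecords(sample));
        Files.delete(sample);
    }
}
```

For the real data set, `readRecords` would be pointed at a file such as `raw/1990/010010-99999-1990.gz` from the listing above.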
Method
Method
• Analyzing the data with Unix tools
• Analyzing the data with Hadoop
Method (Unix tools)
Method (Unix tools)
• Here is the beginning of a run:
% ./max_temperature.sh
1901 317
1902 244
1903 289
1904 256
1905 283
...
• The complete run for the century took 42 minutes on a single EC2 High-CPU Extra Large Instance.
Method (Hadoop)
• Use MapReduce
• Map
• Shuffle
• Reduce
Method (Hadoop)
• Map function
• Pull out the year and the air temperature from each record
• Emit them as (year, temperature) key-value pairs
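The map step above can be sketched in plain Java, without the Hadoop API. The data format slide did not survive extraction, so the column offsets (year at 15 to 19, signed temperature at 87 to 92, quality code at 92), the 9999 missing-value marker, and the accepted quality codes are assumptions about the NCDC fixed-width layout. The names `MaxTemperatureMapSketch` and `syntheticRecord` are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a Hadoop Mapper: one input line in, at most one pair out.
final class MaxTemperatureMapSketch {
    private static final int MISSING = 9999; // assumed marker for a missing reading

    /** Extracts (year, temperature) from one record, or empty if the reading is unusable. */
    static Optional<Map.Entry<String, Integer>> map(String line) {
        if (line.length() < 93) {
            return Optional.empty(); // too short to contain the assumed fields
        }
        String year = line.substring(15, 19);                          // assumed offset
        int airTemperature = Integer.parseInt(line.substring(87, 92)); // accepts a leading + or -
        String quality = line.substring(92, 93);                       // assumed offset
        if (airTemperature != MISSING && quality.matches("[01459]")) {
            return Optional.of(new SimpleEntry<>(year, airTemperature));
        }
        return Optional.empty();
    }

    /** Builds a zero-padded fake record with the fields at the assumed offsets. */
    static String syntheticRecord(String year, String signedTemp, char quality) {
        StringBuilder sb = new StringBuilder();
        while (sb.length() < 15) sb.append('0');
        sb.append(year);
        while (sb.length() < 87) sb.append('0');
        return sb.append(signedTemp).append(quality).toString();
    }

    public static void main(String[] args) {
        System.out.println(map(syntheticRecord("1950", "+0022", '1')));
        System.out.println(map(syntheticRecord("1950", "+9999", '1')));
    }
}
```

Filtering missing or suspect readings inside the map keeps bad values from ever reaching the reduce step.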
Method (Hadoop)
• Map function
• The shuffle
• Each reduce task is fed by many map tasks.
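As a rough illustration of the shuffle, the sketch below merges the outputs of several map tasks and groups the values by key, sorted by key. It assumes a single reduce task and in-memory lists, which real Hadoop does not; `ShuffleSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory shuffle: many map outputs feed one grouped, key-sorted reduce input.
final class ShuffleSketch {

    /** Groups the (year, temperature) pairs from all map tasks by year. */
    static SortedMap<String, List<Integer>> shuffle(
            List<List<Map.Entry<String, Integer>>> mapOutputs) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (List<Map.Entry<String, Integer>> oneMapTask : mapOutputs) {
            for (Map.Entry<String, Integer> pair : oneMapTask) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Two map tasks, each emitting pairs for more than one year.
        List<List<Map.Entry<String, Integer>>> mapOutputs = List.of(
                List.of(Map.entry("1950", 0), Map.entry("1949", 111), Map.entry("1950", 22)),
                List.of(Map.entry("1950", -11), Map.entry("1949", 78)));
        System.out.println(shuffle(mapOutputs));
    }
}
```

With more than one reduce task, the keys would additionally be partitioned so that every value for a given year reaches the same reducer.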
Method (Hadoop)
• Reduce function
• Iterate through the list and pick up the maximum reading
• Input: (1949, [111, 78]), (1950, [0, 22, -11])
• Output: (1949, 111), (1950, 22)
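The reduce step on this slide can be sketched as a plain Java method that scans the value list for one key. It is a stand-in for a Hadoop Reducer, not the deck's own code; `MaxTemperatureReduceSketch` is a hypothetical name.

```java
import java.util.List;

// Hypothetical stand-in for a Hadoop Reducer: one key and its values in, one maximum out.
final class MaxTemperatureReduceSketch {

    /** Returns the largest reading for the given year; the list is assumed non-empty. */
    static int reduce(String year, Iterable<Integer> temperatures) {
        int max = Integer.MIN_VALUE;
        for (int t : temperatures) {
            max = Math.max(max, t);
        }
        return max;
    }

    public static void main(String[] args) {
        // The inputs from the slide.
        System.out.println("1949 " + reduce("1949", List.of(111, 78)));
        System.out.println("1950 " + reduce("1950", List.of(0, 22, -11)));
    }
}
```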
Method (Hadoop)
• Data flow
Method (Hadoop)
• Java MapReduce: Mapper example
Method (Hadoop)
• Java MapReduce: Reducer example
Method (Hadoop)
• Java MapReduce: Job example
• Supports multiple input paths
Advanced
Advanced
• Case 1
Advanced
• Case 2
Advanced
• Case 3
Advanced
• Combiner functions run on the map output
• Example
• Map 1 output: (1950, 0), (1950, 20), (1950, 10)
• Map 2 output: (1950, 25), (1950, 15)
• After grouping by key:
• Map 1: (1950, [0, 20, 10])
• Map 2: (1950, [25, 15])
• Reduce input without a combiner: (1950, [0, 20, 10, 25, 15])
• Reduce input with a combiner: (1950, [20, 25])
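The combiner example above can be reproduced with a small in-memory sketch: the combiner applies the same max function to each map task's values before they are sent to the reducer. The class and method names are hypothetical, and the sketch handles a single key only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of how a combiner shrinks the reduce input for one key.
final class CombinerSketch {

    /** Builds the reduce input for one key from the per-map value lists. */
    static List<Integer> reduceInput(List<List<Integer>> valuesPerMapTask, boolean useCombiner) {
        List<Integer> input = new ArrayList<>();
        for (List<Integer> oneMapTask : valuesPerMapTask) {
            if (useCombiner) {
                input.add(Collections.max(oneMapTask)); // local maximum only
            } else {
                input.addAll(oneMapTask);               // every value crosses the network
            }
        }
        return input;
    }

    public static void main(String[] args) {
        // Values for key 1950 from the slide's two map tasks.
        List<List<Integer>> mapValues = List.of(List.of(0, 20, 10), List.of(25, 15));
        List<Integer> without = reduceInput(mapValues, false);
        List<Integer> with = reduceInput(mapValues, true);
        System.out.println(without + " max=" + Collections.max(without));
        System.out.println(with + " max=" + Collections.max(with));
    }
}
```

Both paths give the same final maximum because max is commutative and associative. A function such as the mean does not have this property, so it could not be used as its own combiner in this way.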