Lecture 2 – MapReduce: Theory and Implementation CSE 490h – Introduction to Distributed Computing, Winter 2008 Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License.
Last Class • How do I process lots of data? • Distribute the work • Can I distribute the work? • Maybe… if it’s not dependent on other tasks • Example of dependent work: Fibonacci (each term depends on the two before it)
Last Class • What problems can occur? • Large tasks • Unpredictable bugs • Machine failure • How do we solve / avoid these? • Break up into small chunks? • Restart tasks? • Use known working solutions
MapReduce • Concept from functional programming • Implemented by Google • Applied to a large number of problems
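Functional Programming Review Java:
    int fooA(String[] list) {
        return bar1(list) + bar2(list);
    }
    int fooB(String[] list) {
        return bar2(list) + bar1(list);
    }
Do they give the same result? Not necessarily: Java evaluates the operands of + left to right, so if bar1 or bar2 has side effects (for example, modifying list or shared state), the two call orders can return different values.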
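To make the Java question concrete, here is a minimal sketch in which the hypothetical methods bar1 and bar2 both update a shared static counter (an assumption; the slide does not say what they do). Because Java evaluates the operands of + left to right, swapping the calls changes the result.

```java
// Hypothetical bar1/bar2: both read and update shared mutable state,
// so the order in which they are called changes the result.
final class SideEffects {
    static int counter = 0; // shared mutable state (assumed for illustration)

    static int bar1(String[] list) {
        counter += list.length; // side effect: mutates counter
        return counter;
    }

    static int bar2(String[] list) {
        counter *= 2; // side effect: mutates counter
        return counter;
    }

    static int fooA(String[] list) { return bar1(list) + bar2(list); }

    static int fooB(String[] list) { return bar2(list) + bar1(list); }

    public static void main(String[] args) {
        String[] list = {"a", "b", "c"};
        counter = 0;
        System.out.println(fooA(list)); // bar1 -> 3, bar2 -> 6, total 9
        counter = 0;
        System.out.println(fooB(list)); // bar2 -> 0, bar1 -> 3, total 3
    }
}
```

If bar1 and bar2 only read their argument and touched no shared state, fooA and fooB would always agree.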
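Functional Programming Review Functional Programming:
    fun fooA (l: int list) = bar1 (l) + bar2 (l)
    fun fooB (l: int list) = bar2 (l) + bar1 (l)
Do they give the same result? Yes, assuming bar1 and bar2 are pure: without side effects, neither call can affect the other, and + is commutative.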
Functional Programming Review • Operations do not modify data structures: They always create new ones • Original data still exists in unmodified form
Functional Updates Do Not Modify Structures
    fun foo (x, lst) =
      let val lst' = rev lst
      in rev (x :: lst') end
    foo: 'a * 'a list -> 'a list
The foo() function above reverses a list, adds a new element to the front, and reverses the result, which appends x to the end of the list. But it never modifies lst!
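A rough Java analogue of foo, assuming Java 10+ for List.copyOf. The helpers reversed and cons are hypothetical names; each one copies its input into a new list instead of changing it, so the caller's lst is left intact, as in the SML version.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Mirrors the SML foo: reverse, put x on the front, reverse again.
// Every step builds a new list; the argument list is never modified.
final class FunctionalAppend {
    // Returns a new, unmodifiable reversed copy; the input is untouched.
    static <T> List<T> reversed(List<T> lst) {
        List<T> copy = new ArrayList<>(lst);
        Collections.reverse(copy);
        return List.copyOf(copy);
    }

    // Returns a new list with x in front of lst (SML's x :: lst).
    static <T> List<T> cons(T x, List<T> lst) {
        List<T> out = new ArrayList<>();
        out.add(x);
        out.addAll(lst);
        return List.copyOf(out);
    }

    static <T> List<T> foo(T x, List<T> lst) {
        List<T> lstRev = reversed(lst);   // let val lst' = rev lst
        return reversed(cons(x, lstRev)); // in rev (x :: lst')
    }

    public static void main(String[] args) {
        List<Integer> lst = List.of(1, 2, 3);
        System.out.println(foo(4, lst)); // [1, 2, 3, 4]
        System.out.println(lst);         // [1, 2, 3], unchanged
    }
}
```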
Functions Can Be Used As Arguments
    fun DoDouble (f, x) = f (f x)
It does not matter what f does to its argument; DoDouble() will do it twice. What is the type of this function?
    x: 'a
    f: 'a -> 'a
    DoDouble: ('a -> 'a) * 'a -> 'a
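The same higher-order idea in Java, sketched with the standard java.util.function.UnaryOperator interface standing in for the SML type 'a -> 'a. The class and method names are illustrative.

```java
import java.util.function.UnaryOperator;

// Java analogue of DoDouble: apply f to x, then apply f to that result.
// Taking the function as a parameter mirrors ('a -> 'a) * 'a -> 'a.
final class DoDouble {
    static <T> T doDouble(UnaryOperator<T> f, T x) {
        return f.apply(f.apply(x));
    }

    public static void main(String[] args) {
        System.out.println(doDouble(n -> n + 3, 10));     // 16
        System.out.println(doDouble(s -> s + "!", "hi")); // hi!!
    }
}
```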
map (Functional Programming) Creates a new list by applying f to each element of the input list; returns output in order.
    map f lst: ('a -> 'b) -> ('a list) -> ('b list)
map Implementation
    fun map f [] = []
      | map f (x::xs) = (f x) :: (map f xs)
• This implementation moves left-to-right across the list, mapping elements one at a time • … But does it need to?
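A direct Java transcription of the recursive map, as a sketch: the class name MapDemo is hypothetical, and java.util.List stands in for SML's cons list (get(0) plays the role of x, subList(1, size) the role of xs). Like the SML code, it applies f to the head before recursing, so it works strictly left to right.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Mirrors: map f [] = [] | map f (x::xs) = (f x) :: (map f xs).
// Returns a new list; the input list is never modified.
final class MapDemo {
    static <A, B> List<B> map(Function<A, B> f, List<A> lst) {
        if (lst.isEmpty()) {
            return new ArrayList<>();                          // map f [] = []
        }
        B head = f.apply(lst.get(0));                          // f x
        List<B> rest = map(f, lst.subList(1, lst.size()));     // map f xs
        rest.add(0, head);                                     // (f x) :: ...
        return rest; // only the freshly built result list is mutated
    }

    public static void main(String[] args) {
        List<Integer> squares = map(x -> x * x, List.of(1, 2, 3, 4));
        System.out.println(squares); // [1, 4, 9, 16]
    }
}
```

Recursion depth grows with the list length and add(0, ...) is linear, so this is for illustration, not for large inputs.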
Implicit Parallelism In map • In a purely functional setting, elements of a list being computed by map cannot see the effects of the computations on other elements • If the order in which f is applied to the elements does not affect the result (f has no side effects), we can reorder or parallelize execution • This is the “secret” that MapReduce exploits
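On a single machine, Java's parallel streams rely on the same property. In this sketch (f is a hypothetical pure function), the runtime may evaluate f on different elements on different threads, yet collecting an ordered stream still yields results in input order, identical to a sequential run. This is an analogy using the standard Streams API, not MapReduce itself.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Because f has no side effects, the runtime may apply it to elements in any
// order or on several threads; an ordered stream still returns results in input order.
final class ParallelMap {
    static int f(int x) { // pure: depends only on its argument
        return x * x + 1;
    }

    static List<Integer> parallelMap(List<Integer> input) {
        return input.parallelStream()             // elements may be processed concurrently
                    .map(ParallelMap::f)
                    .collect(Collectors.toList()); // encounter order is preserved
    }

    public static void main(String[] args) {
        List<Integer> input = IntStream.rangeClosed(1, 10).boxed().collect(Collectors.toList());
        List<Integer> sequential = input.stream().map(ParallelMap::f).collect(Collectors.toList());
        System.out.println(sequential.equals(parallelMap(input))); // true
    }
}
```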
Fold Moves across a list, applying f to each element plus an accumulator. f returns the next accumulator value, which is combined with the next element of the list.
    fold f x0 lst: ('a * 'b -> 'b) -> 'b -> ('a list) -> 'b
fold left vs. fold right • Order of list elements can be significant • Fold left moves left-to-right across the list • Fold right moves right-to-left
SML Implementation:
    fun foldl f a [] = a
      | foldl f a (x::xs) = foldl f (f (x, a)) xs
    fun foldr f a [] = a
      | foldr f a (x::xs) = f (x, foldr f a xs)
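A Java sketch of both folds, following the SML argument order f(element, accumulator). The loops are equivalent to the recursive definitions: foldl consumes elements from the left, while foldr combines the rightmost element first. Prepending each string element to a string accumulator shows that the direction matters.

```java
import java.util.List;
import java.util.function.BiFunction;

// Java transcriptions of the SML foldl and foldr; f takes (element, accumulator).
final class Folds {
    // foldl f a (x::xs) = foldl f (f(x, a)) xs: consume elements left to right.
    static <A, B> B foldl(BiFunction<A, B, B> f, B acc, List<A> lst) {
        for (A x : lst) {
            acc = f.apply(x, acc);
        }
        return acc;
    }

    // foldr f a (x::xs) = f(x, foldr f a xs): the rightmost element is combined first.
    static <A, B> B foldr(BiFunction<A, B, B> f, B acc, List<A> lst) {
        for (int i = lst.size() - 1; i >= 0; i--) {
            acc = f.apply(lst.get(i), acc);
        }
        return acc;
    }

    public static void main(String[] args) {
        List<String> letters = List.of("a", "b", "c");
        // Put the element in front of the accumulator.
        BiFunction<String, String, String> prepend = (x, a) -> x + a;
        System.out.println(foldl(prepend, "", letters)); // cba
        System.out.println(foldr(prepend, "", letters)); // abc
    }
}
```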
Example fun foo(l: int list) = sum(l) + mul(l) + length(l) How can we implement this?
Example (Solved)
    (* helpers must be defined before foo uses them *)
    fun sum (lst) = foldl (fn (x, a) => x + a) 0 lst
    fun mul (lst) = foldl (fn (x, a) => x * a) 1 lst
    fun length (lst) = foldl (fn (_, a) => 1 + a) 0 lst
    fun foo (l: int list) = sum (l) + mul (l) + length (l)
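The solved example in Java, as a sketch. The block carries its own copy of a loop-based foldl (same element-then-accumulator argument order as the SML code) so it compiles on its own. Note the starting values: 0 for sum and length, 1 for mul.

```java
import java.util.List;
import java.util.function.BiFunction;

// sum, mul and length each expressed as a left fold, then combined as in foo.
final class FoldExamples {
    static <A, B> B foldl(BiFunction<A, B, B> f, B acc, List<A> lst) {
        for (A x : lst) {
            acc = f.apply(x, acc);
        }
        return acc;
    }

    static int sum(List<Integer> lst)    { return foldl((Integer x, Integer a) -> x + a, 0, lst); }
    static int mul(List<Integer> lst)    { return foldl((Integer x, Integer a) -> x * a, 1, lst); }
    static int length(List<Integer> lst) { return foldl((Integer x, Integer a) -> 1 + a, 0, lst); }

    static int foo(List<Integer> l) {
        return sum(l) + mul(l) + length(l);
    }

    public static void main(String[] args) {
        System.out.println(foo(List.of(1, 2, 3, 4))); // 10 + 24 + 4 = 38
    }
}
```

For an empty list, foo returns 0 + 1 + 0 = 1, since each fold simply returns its starting accumulator.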
Google MapReduce • Input Handling • Map function • Partition Function • Compare Function • Reduce Function • Output Writer
Input Handling • Divides up data into bite-size chunks • Starts up tasks • Assigns tasks to idle workers
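A single-process sketch of input handling, under assumptions of our own: the input is an in-memory list of records, chunks have a fixed size of 2, and a fixed thread pool of two threads stands in for the worker machines. An ExecutorService hands each queued task to whichever pool thread is free, which mirrors assigning tasks to idle workers. The per-chunk task (counting records) is a placeholder for a real map task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Split the input into fixed-size chunks and hand each chunk to a worker pool.
final class InputSplitter {
    static <T> List<List<T>> split(List<T> input, int chunkSize) {
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < input.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, input.size());
            chunks.add(List.copyOf(input.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) throws Exception {
        List<String> records = List.of("r1", "r2", "r3", "r4", "r5");
        List<List<String>> chunks = split(records, 2); // [[r1, r2], [r3, r4], [r5]]

        ExecutorService workers = Executors.newFixedThreadPool(2);
        try {
            List<Future<Integer>> tasks = new ArrayList<>();
            for (List<String> chunk : chunks) {
                tasks.add(workers.submit(() -> chunk.size())); // placeholder map task
            }
            for (Future<Integer> t : tasks) {
                System.out.println(t.get()); // 2, 2, 1 (one per line)
            }
        } finally {
            workers.shutdown();
        }
    }
}
```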
Map • Input: Key, Value pair • Output: Key, Value pairs • Example: Annual Rainfall Per City
Map (Example) • Example: Annual Rainfall Per City
    map(String key, String value):
      // key: date
      // value: weather info
      foreach (City c in value)
        EmitIntermediate(c, c.rainfall)
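The rainfall mapper in Java, as a sketch. The Reading class and its fields are hypothetical: the slide only says the value holds weather info, so here it is assumed to be a list of per-city readings for one date. Intermediate pairs are collected into a list instead of being passed to a framework's EmitIntermediate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Rainfall map function: one intermediate (city, rainfall) pair per reading.
final class RainfallMapper {
    static final class Reading { // hypothetical per-city weather reading
        final String city;
        final double rainfall;

        Reading(String city, double rainfall) {
            this.city = city;
            this.rainfall = rainfall;
        }
    }

    // key: date (unused in this sketch); value: that day's readings.
    static List<Map.Entry<String, Double>> map(String date, List<Reading> readings) {
        List<Map.Entry<String, Double>> emitted = new ArrayList<>();
        for (Reading r : readings) {
            emitted.add(Map.entry(r.city, r.rainfall)); // EmitIntermediate(city, rainfall)
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<Reading> day = List.of(new Reading("Seattle", 4.0), new Reading("Tucson", 0.0));
        System.out.println(map("2008-01-07", day)); // [Seattle=4.0, Tucson=0.0]
    }
}
```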
Partition Function • Allocates map output to particular reduces • Input: key, number of reduces • Output: Index of desired reduce • Typical: hash(key) % numberOfReduces
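A hash partitioner sketch in Java. One detail the formula hides: String.hashCode() can be negative, and Java's % then returns a negative remainder, so Math.floorMod is used to keep the index in the range 0 to numberOfReduces - 1. The class name is illustrative.

```java
// hash(key) mod R, kept non-negative with Math.floorMod.
final class HashPartitioner {
    static int partition(String key, int numberOfReduces) {
        return Math.floorMod(key.hashCode(), numberOfReduces);
    }

    public static void main(String[] args) {
        for (String city : new String[] {"Seattle", "Portland", "Boise"}) {
            System.out.println(city + " -> reduce " + partition(city, 4));
        }
    }
}
```

The partitioner must be deterministic: every map task has to send a given key to the same reduce. String.hashCode() is fully specified by the Java API, so it gives the same value on every JVM.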
Comparison • Sorts input for each reduce • Example: Annual rainfall per city • Sorts rainfall data for each city • Seattle: {0, 0, 0, 1, 4, 7, 10, …}
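A sketch of the sort step for a single reduce task. A TreeMap keeps the city keys in sorted order, and each city's list of values is then sorted ascending to match the Seattle example on the slide. The class name and the in-memory input are assumptions; a real system may need an external sort when the intermediate data does not fit in memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Group intermediate (city, rainfall) pairs by key, keys sorted, values sorted.
final class ShuffleSort {
    static TreeMap<String, List<Double>> groupAndSort(List<Map.Entry<String, Double>> pairs) {
        TreeMap<String, List<Double>> grouped = new TreeMap<>(); // keys kept in sorted order
        for (Map.Entry<String, Double> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        for (List<Double> values : grouped.values()) {
            values.sort(null); // null comparator = natural (ascending) order
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Double>> pairs = List.of(
            Map.entry("Seattle", 7.0), Map.entry("Boise", 1.0),
            Map.entry("Seattle", 0.0), Map.entry("Seattle", 4.0));
        System.out.println(groupAndSort(pairs)); // {Boise=[1.0], Seattle=[0.0, 4.0, 7.0]}
    }
}
```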
Reduce • Input: Key, Sorted list of values • Output: Single value • Example: Annual rainfall per city
Reduce (Example) • Example: Annual rainfall per city
    reduce(String key, Iterator values):
      // key: city
      // values: daily rainfall amounts
      sum = 0
      for each (v in values)
        sum += v
      Emit(key, sum)
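The reduce step in Java, as a sketch. It assumes each value is one day's rainfall for the city given by key, so the annual figure is the sum; dividing by a count of the values would instead give the mean rainfall per reading. Returning the total stands in for Emit(key, sum), and the class name is illustrative.

```java
import java.util.Iterator;
import java.util.List;

// Rainfall reduce: total the values for one city.
final class RainfallReducer {
    static double reduce(String key, Iterator<Double> values) {
        double sum = 0.0;
        while (values.hasNext()) {
            sum += values.next();
        }
        return sum; // a framework reducer would Emit(key, sum)
    }

    public static void main(String[] args) {
        List<Double> seattle = List.of(0.0, 0.0, 1.0, 4.0, 7.0, 10.0);
        System.out.println(reduce("Seattle", seattle.iterator())); // 22.0
    }
}
```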
Output • Writes the output to storage (GFS, etc.)
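A minimal output writer sketch in Java. A local directory stands in for GFS, and the part-NNNNN file name and tab-separated layout are hypothetical choices, not something the slide specifies. Each reduce task writes its own file, so reduce tasks in this sketch never write to the same output.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// One reduce task writes its (key, value) results as tab-separated lines.
final class OutputWriter {
    static Path write(Path dir, int reduceIndex, Map<String, Double> results) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Double> e : results.entrySet()) {
            lines.add(e.getKey() + "\t" + e.getValue());
        }
        Path out = dir.resolve(String.format("part-%05d", reduceIndex)); // hypothetical naming
        return Files.write(out, lines); // UTF-8, one line per key
    }

    public static void main(String[] args) throws IOException {
        Map<String, Double> results = new TreeMap<>(Map.of("Seattle", 5.0, "Boise", 1.0));
        Path dir = Files.createTempDirectory("mapreduce-out");
        Path file = write(dir, 0, results);
        System.out.println(file.getFileName() + ": " + Files.readAllLines(file));
    }
}
```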
MapReduce for Google Local • Intersections • Rendering Tiles • Finding nearest gas stations