Problem-solving on large-scale clusters: theory and applications


Presentation Transcript


  1. Problem-solving on large-scale clusters: theory and applications • Lecture 1: Introduction and Theoretical Background

  2. Today’s Outline • Introductions • Quiz • Course Objective & Administrative Info • fold and map: Theory

  3. Introductions • Name + trivia

  4. Quiz Time! • Not graded; helps us calibrate how difficult to make this seminar • Okay (and encouraged!) to leave questions blank

  5. Course Outline • Introduction to parallel programming and distributed system design • successfully decompose problems into map and reduce stages • decide whether a problem can be solved with a parallel algorithm, and evaluate its strengths and weaknesses • understand the basic tradeoffs and major issues in distributed system design • know the common pitfalls of distributed system design • This seminar is light on “facts” and “recipes”, heavy on “tradeoffs”

  6. Course Information (1 of 2) • Lecturers: • Albert J. Wong • Hannah Tang • Lab consultant: • Alden King • Liaisons: • John Zahorjan • Christophe Bisciglia

  7. Course Information (2 of 2) • Textbook • None; see online course readings • Webpage: http://www.cs.washington.edu/cse490h • Mailing lists: • Course discussion: cse490h@...

  8. Warning: Theory Ahead! • Before we can talk about MapReduce, we need to talk about the concepts on which it is founded: • Programming languages: fold and map • Distributed systems: data dependencies

  9. Digression: Function Objects (1 of 3) • A function object is a function that can be manipulated as an object • Sometimes referred to as a “functor” • In Java, this is usually implemented with a class that has an execute() (or similarly named) method

  10. Digression: Function Objects (2 of 3) • Example: Implementing the Comparator interface to use Collections.sort() • The underlying idea is to pass the “greater than” operation to sort()
      import java.util.ArrayList;
      import java.util.Collections;
      import java.util.Comparator;
      import java.util.List;
      // Function object that orders strings in reverse alphabetical order
      class ReverseAlphaOrder implements Comparator<String> {
        public int compare(String s1, String s2) {
          return s2.compareTo(s1);  // swapping the operands reverses the order
        }
      }
      List<String> myStrings = new ArrayList<>();  // populated elsewhere
      Collections.sort(myStrings, new ReverseAlphaOrder());

  11. Digression: Function Objects (3 of 3) • In Java, methods that take function objects are “higher-order functions” • Collections.sort() is a higher-order function • Mathematically, a “higher-order function” is a function which does at least one of the following: • Take one or more functions as input • Output a function • Examples: • The derivative (from calculus): d/dx (x^3 + 2x) = 3x^2 + 2
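Both kinds of higher-order function can be sketched in a few lines of Haskell. This is an illustration only: the names reverseAlpha, derivative, and cubic are made up for this example, and the step size h is an arbitrary choice rather than anything from the lecture.

```haskell
import Data.List (sortBy)

-- Takes a function as input: sortBy receives the comparison to use.
-- Swapping the arguments of compare gives reverse alphabetical order.
reverseAlpha :: [String] -> [String]
reverseAlpha = sortBy (\a b -> compare b a)

-- Outputs a function: a numerical approximation of the derivative
-- (central difference). The step size h is an illustrative assumption.
derivative :: (Double -> Double) -> (Double -> Double)
derivative f = \x -> (f (x + h) - f (x - h)) / (2 * h)
  where h = 1e-6

-- The function from the slide: x^3 + 2x
cubic :: Double -> Double
cubic x = x ^ 3 + 2 * x
```

Here derivative cubic is itself a function; evaluating it at 1 should give a value close to 3(1)^2 + 2 = 5, up to floating-point error.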

  12. fold - Introduction • fold is a family of higher-order functions that process a data structure and return a single value • Commonly, fold takes a function f and a list l, and recursively applies f to “combine” the elements of l • The return value may be “complex”, e.g. a list • Example: • fold (+) [1,2,4,8] -> ??? • fold (/) [64,8,4,2] -> ???

  13. fold - Directionality • Remember how we said fold was “a family of functions”? • foldr (/) [64,8,4,2] -> 64 / (8 / (4/2)) -> 16 • foldl (/) [64,8,4,2] -> ((64/8) / 4) / 2 -> 1 • “fold right” • recursively applies f over the right side of the list • “fold left” • recursively applies f over the left side of the list • [Figure: expression trees of ÷ over 64, 8, 4, 2 for the right fold and the left fold]
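The two directions can be checked in Haskell. As a sketch, this uses the Prelude's foldr1 and foldl1, which take no separate initial value and so match the slide's two-argument notation; the names rightFold and leftFold are made up for this example.

```haskell
-- foldr1 nests to the right: 64 / (8 / (4 / 2))
rightFold :: Double
rightFold = foldr1 (/) [64, 8, 4, 2]   -- 16.0

-- foldl1 nests to the left: ((64 / 8) / 4) / 2
leftFold :: Double
leftFold = foldl1 (/) [64, 8, 4, 2]    -- 1.0
```

Division is not associative, which is why the two folds disagree; with an associative operation such as (+) they would return the same value.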

  14. fold - Questions • Discussion questions: • What should the base case return? • foldr (+) [] -> ??? • foldr (/) [] -> ??? • Can a right fold be implemented as a loop (using tail recursion)? What about left fold? • Enrichment questions: • What happens to a right fold when given an infinite list? What about left fold?

  15. fold - Formal Definition • fold takes a function and a list as its inputs, but it can also take more values • In particular, fold maintains context / state across each invocation of f • What is the formal definition of foldl?
      -- If the list is empty, return the initial value ‘z’
      foldr f z [] = z
      -- If the list is not empty, calculate the result of folding the
      -- rest, and apply f to the first element and to that result.
      -- The context from previous invocations of f is implicitly
      -- passed to the current invocation of f via foldr
      foldr f z (x:xs) = f x (foldr f z xs)
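The right-fold definition can be transcribed into a compilable form by adding a type signature. The names myFoldr and myFoldl are chosen here to avoid clashing with the Prelude, and myFoldl is only one common formulation of a left fold, offered as a candidate answer to the question above rather than as the lecture's own.

```haskell
-- Right fold, as defined on the slide
myFoldr :: (a -> b -> b) -> b -> [a] -> b
myFoldr _ z []     = z
myFoldr f z (x:xs) = f x (myFoldr f z xs)

-- A possible left fold: the state z is updated with f before recursing,
-- so the recursive call is the last thing the function does
myFoldl :: (b -> a -> b) -> b -> [a] -> b
myFoldl _ z []     = z
myFoldl f z (x:xs) = myFoldl f (f z x) xs
```

For example, myFoldr (-) 0 [1,2,3] computes 1 - (2 - (3 - 0)), while myFoldl (-) 0 [1,2,3] computes ((0 - 1) - 2) - 3.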

  16. fold – An Intuition • fold “iterates” over a data structure, and maintains one unit of state • At each iteration, f is invoked with the current element and the current state • fold’s return value is the result of f’s final invocation
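The “one unit of state” can be any value, including a tuple. As a hedged sketch, the fold below carries a (sum, count) pair in order to compute an average; the names step and mean are invented for this example.

```haskell
import Data.List (foldl')

-- f from the slide: combines the current state with the current element
step :: (Double, Int) -> Double -> (Double, Int)
step (total, n) x = (total + x, n + 1)

-- The fold's result is the state after the final invocation of step.
-- Note: an empty list yields 0/0 (NaN); a real implementation would guard this.
mean :: [Double] -> Double
mean xs = total / fromIntegral n
  where (total, n) = foldl' step (0, 0) xs
```

For instance, mean [1, 2, 3, 4] threads the state through (1,1), (3,2), (6,3), (10,4) and then divides 10 by 4.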

  17. map - Introduction • map is a higher-order function that “transforms” each element in a sequence of elements • Commonly, map takes a function f and a sequence s, and applies f to each element of s • Example: • map square_root [1,4,9,16] -> ???
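In Haskell the example can be written with the Prelude's map and sqrt. The second definition is an extra illustration, not from the slides, in which the elements of the result have a different type than the inputs; the names roots and lengths are made up.

```haskell
-- Apply sqrt to each element independently
roots :: [Double]
roots = map sqrt [1, 4, 9, 16]

-- Strings in, integers out: the element type changes
lengths :: [Int]
lengths = map length ["fold", "map", "reduce"]
```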

  18. map’s Return Value • map returns a sequence • The new sequence s’ is not necessarily the same size as s • The elements of s’ do not necessarily have the same type as the elements of s

  19. map’s Return Value – Example • Recall that the sum of N vectors is equal to the sum of their components • Let components() decompose a vector into its X and Y components • [Figure: vectors a, b, and a+b; map components applied to a list of three vectors, yielding a list of three (X, Y) pairs, with the values left as ???]
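One way to make this concrete is sketched below. It assumes a vector is stored as a magnitude and an angle in radians; that representation, along with the names Vector, addPair, and resultant, is an assumption made for the example and not something the slides specify.

```haskell
-- Hypothetical representation: magnitude and angle (radians)
data Vector = Vector { magnitude :: Double, angle :: Double }

-- Decompose a vector into its X and Y components
components :: Vector -> (Double, Double)
components (Vector m theta) = (m * cos theta, m * sin theta)

-- Add two component pairs
addPair :: (Double, Double) -> (Double, Double) -> (Double, Double)
addPair (x1, y1) (x2, y2) = (x1 + x2, y1 + y2)

-- map decomposes every vector; fold sums the resulting components
resultant :: [Vector] -> (Double, Double)
resultant = foldr addPair (0, 0) . map components
```

The map stage treats each vector independently, while the fold stage combines the per-vector results into a single value.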

  20. map - Questions • Enrichment questions: • For what values of f and z will fold f z l = l? How can you modify f such that fold f z l = map f l? • Bonus question: can you implement map in terms of fold? • Visit foldl.com and foldr.com :)

  21. map – Formal definition • map takes a function and a data structure as its inputs • What is the complexity of map? What is its runtime?
      -- If the list is empty, there’s nothing to do
      map f [] = []
      -- If the list is not empty, apply f to the first element and
      -- add the result to the mapping of f on all other elements
      map f (x:xs) = f x : map f xs
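With a type signature added, the definition compiles as written. The name myMap is used here only to avoid clashing with the Prelude's own map.

```haskell
-- map as defined on the slide, restricted to lists
myMap :: (a -> b) -> [a] -> [b]
myMap _ []     = []
myMap f (x:xs) = f x : myMap f xs
```

The signature makes the earlier observation about types explicit: the input elements have type a and the output elements have type b, and the two need not be the same.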

  22. Exercise (1 of 2) • Individually: • Determine how these operations can be solved with a fold, a map, or some combination of fold and map: • Given a list of vectors, add them to determine the resultant vector. • Ray tracing a single ray. • Ray tracing takes a list of rays that intersect the camera, and traces their paths back to their respective light sources, even across reflections off several surfaces. • Assuming you had access to a company’s monthly paystubs for all employees for an entire year, calculate how much annual income tax is owed per person. • Run-length encoding. • Run-length encoding takes a possibly repetitive string and rewrites it as a sequence of (value, frequency) pairs, e.g. “aaa b ccccc dd” -> “a3 b c5 d2”. • Find the smallest element in an array. • Come up with some challenging problems yourself!

  23. Exercise (2 of 2) • In small groups, compare your answers to the above, and stump your team with the problems you came up with!
