
Advanced Topics


Presentation Transcript


  1. Advanced Topics NP-complete reports. Continue on NP, parallelism

  2. Reprise: Non-determinism • Informal: add to any algorithm • taking a guess at one or more places • forking and pursuing one or more possibilities • If there is a non-deterministic algorithm, then there is also a regular (deterministic) algorithm • just try all the possibilities • may take a long time (exponential in the number of guesses)
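
To make "just try all the possibilities" concrete, here is a minimal Python sketch. It assumes the subset-sum problem as the example (the slide does not name one), and the function name `subset_sum_brute_force` is purely illustrative. Each include/exclude decision plays the role of a "guess", and the loop walks through every combination of guesses in turn.

```python
from itertools import product

def subset_sum_brute_force(numbers, target):
    """Deterministically simulate 'guessing' by trying every include/exclude choice.

    With n numbers there are 2**n combinations, which is why this may take a long time.
    """
    for choices in product([False, True], repeat=len(numbers)):
        picked = [n for n, keep in zip(numbers, choices) if keep]
        if sum(picked) == target:
            return picked  # one sequence of guesses that works
    return None  # no sequence of guesses leads to the target

print(subset_sum_brute_force([3, 34, 4, 12, 5, 2], 9))
```

A non-deterministic algorithm would "guess" the right choices directly; the standard algorithm pays for that by enumerating all of them.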

  3. Reprise: the class P • … is all problems for which there exists an algorithm whose running time is bounded by a polynomial in the size of the input.

  4. Reprise: the class NP • all problems for which there is an algorithm, possibly non-deterministic, that, assuming you take the right paths, runs in time bounded by a polynomial • Alternative definition: given a proposed answer, you can check that it is correct in polynomial time.
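
The "check the answer" definition can be illustrated with a short Python sketch. It assumes graph coloring (one of the report topics) as the example, and `is_valid_coloring` is a hypothetical helper name. Finding a coloring may be hard, but checking a proposed one takes a single pass over the edges.

```python
def is_valid_coloring(edges, coloring, k):
    """Check a proposed k-coloring of a graph.

    edges: list of (u, v) pairs; coloring: dict mapping vertex -> color in 0..k-1.
    Runs in time proportional to the number of edges plus vertices.
    """
    for u, v in edges:
        if u not in coloring or v not in coloring:
            return False  # every vertex on an edge must be colored
        if coloring[u] == coloring[v]:
            return False  # neighbors may not share a color
    return all(0 <= c < k for c in coloring.values())

triangle = [(0, 1), (1, 2), (0, 2)]
print(is_valid_coloring(triangle, {0: 0, 1: 1, 2: 2}, 3))
print(is_valid_coloring(triangle, {0: 0, 1: 1, 2: 0}, 3))
```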

  5. Reprise: does P = NP? • Is it possible to find actual standard polynomial-time algorithms for these NP problems? • THE great problem of computer science. • Proving that P ≠ NP would also be significant. • Theoretical problem with considerable practical value.

  6. NP-complete • A set of NP problems that can be translated into each other in polynomial time, so… • If one of the problems can be solved in polynomial time • aka tractable • … they all can.

  7. NP-hard • A problem is NP-hard if there is an NP-complete problem that can be translated into it in polynomial time. • but not necessarily the other way. • NP-hard problems are at least as hard as NP-complete problems.

  8. NP-hard example • Robot path planning in a dynamic environment

  9. Reports on NP-complete problems • Tetris • Knapsack problem • Steiner Tree problem • Graph coloring • Minesweeper • Subset problem

  10. Note • There are methods for getting answers to NP problems, but they aren't guaranteed to be optimal. • Called heuristics or approximations
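
As an illustration of a heuristic, here is a minimal Python sketch of a greedy approach to the knapsack problem (one of the report topics). The greedy rule "best value per unit weight first" and the name `greedy_knapsack` are assumptions chosen for the example, not something the slides prescribe. It is fast, but it can miss the optimal answer.

```python
def greedy_knapsack(items, capacity):
    """Greedy heuristic for 0/1 knapsack.

    items: list of (value, weight) pairs with positive weights.
    Returns (total_value, chosen_items). Not guaranteed to be optimal.
    """
    chosen = []
    total_value = 0
    remaining = capacity
    # Consider items in order of decreasing value per unit of weight.
    for value, weight in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        if weight <= remaining:
            chosen.append((value, weight))
            total_value += value
            remaining -= weight
    return total_value, chosen

items = [(60, 10), (100, 20), (120, 30)]
print(greedy_knapsack(items, 50))
```

For these three items and capacity 50, the greedy choice takes the first two items for a value of 160, while taking the second and third items would give 220. That gap is exactly the "not guaranteed to be optimal" caveat.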

  11. Distributed computing • Approach to NP problems: fork a new process for each choice • That is, use distributed computing to investigate the different choices • Some problems may be embarrassingly parallel.

  12. Sources • Many • Google: http://code.google.com/edu/parallel/mapreduce-tutorial.html • Note: there is controversy re: MapReduce • may be issue of patent • Is it the right framework • ??

  13. Concepts • key/value pair • Master / Worker • nodes on network • may be one Master and many Workers • hashing: quick way to find data (key/value data) • piece / partition / split / shard
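
A small Python sketch of how hashing can assign key/value pairs to pieces (shards). This is an illustrative assumption about how such partitioning might be done, not a description of any particular MapReduce product; `shard_for` and `partition` are hypothetical names. It uses `zlib.crc32` because Python's built-in `hash` for strings varies between runs.

```python
import zlib

def shard_for(key, num_shards):
    """Map a string key to a shard index using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_shards

def partition(pairs, num_shards):
    """Split (key, value) pairs into num_shards lists; equal keys share a shard."""
    shards = [[] for _ in range(num_shards)]
    for key, value in pairs:
        shards[shard_for(key, num_shards)].append((key, value))
    return shards

pairs = [("apple", 1), ("banana", 1), ("apple", 1), ("cherry", 1)]
for index, shard in enumerate(partition(pairs, 3)):
    print(index, shard)
```

Because the shard is computed from the key alone, any node can work out where a key lives without asking the Master.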

  14. Example from Google tutorial • Compute pi using many workers, each doing a calculation using pseudo-random function. • no data (NOT typical MapReduce problem) • Worker picks a random point in the square. If it is in the circle, worker increments a counter. • http://faculty.purchase.edu/jeanine.meyer/processing/piEstimate/applet/

  15. Formulas • Area_of_circle = pi * r^2 • Area_of_square containing circle = 4 * r^2 • So r^2 = Area_of_square / 4 • Let Ac be Area_of_circle and As be Area_of_square • Then pi = 4 * Ac / As • Estimate for pi is 4 * counter / Number_of_points_tried

  16. Informal proof • The probability that a randomly chosen point in the square falls inside the circle equals the ratio of the areas, Ac / As. • Choosing many points randomly estimates this ratio: counter / Number_of_points_tried approaches Ac / As. • We could [simply] use for-loops and do the calculation for every point.
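
The estimate can be sketched in a few lines of Python on a single machine. This is a minimal sketch, assuming a circle of radius 1 centered in the square from -1 to 1; the name `estimate_pi` and the `seed` parameter are illustrative additions.

```python
import random

def estimate_pi(num_points, seed=None):
    """Estimate pi by sampling random points in the square [-1, 1] x [-1, 1]."""
    rng = random.Random(seed)
    counter = 0
    for _ in range(num_points):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:  # the point is inside the unit circle
            counter += 1
    return 4 * counter / num_points

print(estimate_pi(100_000, seed=1))
```

More points generally give a closer estimate, which is what makes the work worth spreading over many workers.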

  17. MapReduce • Model for distributed (aka parallel) computing • There are different products that implement MapReduce. From a google search: • Google • Apache Hadoop: Open source • Teradata • Amazon • Greenplum • Platform

  18. MapReduce • The programmer sets up a program for the Master and for the Workers. Typically, the Master program sets up and partitions the input array(s). • Typically, data is key/value pairs. • Programmers write • Map functions that process data, possibly making use of functions in the MapReduce library • Reduce functions that combine the results • Workers work on Map tasks and/or Reduce tasks. The Map task is applied to the worker's piece (aka shard) of the input array.
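
To show the shape of Map and Reduce functions, here is a single-process Python sketch that counts words. It is an assumption-laden toy, not the API of Hadoop or any other product: `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical names, and the loop over shards stands in for Workers running in parallel.

```python
from collections import defaultdict

def map_fn(shard):
    """Map task: emit a (word, 1) pair for every word in this worker's shard."""
    for line in shard:
        for word in line.split():
            yield word.lower(), 1

def reduce_fn(key, values):
    """Reduce task: combine all values emitted for one key."""
    return key, sum(values)

def run_mapreduce(shards):
    """Stand-in for the Master: run map on each shard, group by key, then reduce."""
    grouped = defaultdict(list)
    for shard in shards:  # in a real system, each shard goes to a different Worker
        for key, value in map_fn(shard):
            grouped[key].append(value)
    return dict(reduce_fn(key, values) for key, values in grouped.items())

shards = [["the cat sat", "the dog"], ["the end"]]
print(run_mapreduce(shards))
```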

  19. MapReduce for pi estimate • Not typical in that there is no data • The map function does the calculation • When all done, the reduce function adds up all the individual counters and calculates the estimate for pi
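
The same split can be sketched for the pi estimate. In this hypothetical Python version, each "task" is just a seed and a number of points (there is no input data), `map_count_hits` plays the role of the map function, and `reduce_to_pi` plays the role of the reduce function. The built-in `map` runs the tasks one after another here; a real framework would hand them to separate Workers.

```python
import random

def map_count_hits(task):
    """Map: one worker tries num_points random points and counts hits in the circle."""
    seed, num_points = task
    rng = random.Random(seed)  # a separate seed per worker, so streams differ
    hits = 0
    for _ in range(num_points):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            hits += 1
    return hits, num_points

def reduce_to_pi(results):
    """Reduce: add up all the counters and compute the estimate for pi."""
    total_hits = sum(hits for hits, _ in results)
    total_points = sum(points for _, points in results)
    return 4 * total_hits / total_points

tasks = [(seed, 1000) for seed in range(100)]  # 100 workers, 1000 points each
results = list(map(map_count_hits, tasks))
pi_estimate = reduce_to_pi(results)
print(pi_estimate)
```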

  20. Speed up for pi estimate • Suppose • each trial (getting the 2 random values and determining if in circle) takes K time units • suppose 1000 workers calculating all together 1000000 values, so 1000 values per worker • suppose adding 2 numbers takes 1 time unit • Time without distributed computing: 1000000*K • Time with distributed computing: 1000*K + 1000 (each worker's 1000 trials, then adding up the 1000 counters) • Speed up is 1000*K / (K + 1), slightly less than 1000 when K is large
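
The arithmetic can be checked with a few lines of Python. This sketch follows the slide's simplified cost model (no communication cost, one time unit per counter added); the function name `speedup` and the sample values of K are assumptions for illustration.

```python
def speedup(total_points, workers, k, add_cost=1):
    """Ratio of sequential time to distributed time under the slide's cost model."""
    sequential = total_points * k                   # one machine does every trial
    per_worker = (total_points // workers) * k      # workers run their trials in parallel
    combine = workers * add_cost                    # then the counters are added up
    return sequential / (per_worker + combine)

for k in (1, 10, 100):
    print(k, speedup(1_000_000, 1000, k))
```

Under this model the speed up works out to 1000*K / (K + 1), so it gets closer to 1000 as the cost K of each trial grows relative to the cost of an addition.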

  21. Follow-up • Look up examples using MapReduce • Note: one example is Google maintaining its keyword index by scanning (crawling) the web

  22. Speaker Twitter: @kmwinterfield • IBM Smarter Cities • Social media for political campaigns • World Community Grid

  23. Homework • Prepare question for Kevin • follow on Twitter and send message OR • post on Moodle • Continue with postings • Research unique NP-complete problem and post summary and source!
