
The Discrepancy Method

An Introduction Using Minimum Spanning Trees


Presentation Transcript


  1. The Discrepancy Method An Introduction Using Minimum Spanning Trees

  2. Did you know? “Recursion comes from the verb recur. There is no verb recurse.” http://www.cse.ucsc.edu/~larrabee/ce185/reader/node191.html

  3. Did you know? If m/n = Ω(log^O(1) n), then α(m, n) = O(1). Linear-Time Pointer-Machine Algorithms for Least Common Ancestors, MST Verification and Dominators by Buchsbaum, Kaplan, Rogers and Westbrook

  4. Credit The Discrepancy Method, by Bernard Chazelle

  5. Credit Finding MST in O(m α(m,n)) Time, by Seth Pettie

  6. The Discrepancy Method Minimum Finding

  7. Minimum Finding For boys and girls: Given an unsorted array A of n unique integers, find the minimum.

  8. One Possible Algorithm Divide A into 100 parts. For each part, recursively find the minimum within that part. Find the minimum among the 100 minimum elements from each part.
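The 100-way recursion above can be sketched directly; `find_min` is a hypothetical name (a real program would just call `min`), but the sketch makes the low discrepancy subset explicit.

```python
def find_min(a):
    """Recursively find the minimum of a non-empty list by splitting it
    into (up to) 100 parts, as on the slide. The per-part minima form
    the low discrepancy subset; a final pass over them gives the answer."""
    if len(a) <= 100:
        return min(a)
    size = -(-len(a) // 100)  # ceiling division: at most 100 parts
    part_minima = [find_min(a[i:i + size]) for i in range(0, len(a), size)]
    return min(part_minima)
```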

  9. A Term To Introduce Those 100 elements obtained from recursion form a low discrepancy subset.

  10. "Definition" Roughly speaking, a low discrepancy subset is a subset that is representative, i.e. (we hope) the solution of the problem using this subset is “close to” the solution using the original set.

  11. One Possible Algorithm Divide A into 100 parts. For each part, recursively find the minimum within that part. Find the minimum among the 100 minimum elements from each part.

  12. About Size How big should the low discrepancy subset be? We could divide A into • 100 parts • log(n) parts • n/5 parts

  13. Let's Try Sampling Sample 1% of the elements u.a.r. (k = n/100 of them). Recursively find their minimum x. Reject all elements larger than x. What is the expected number of elements that remain? n/(k+1) = 100k/(k+1) ≈ 100.
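A small simulation of this sampling step (the helper name is an assumption; the 1% rate and the ≈ 100 expected survivors are from the slide):

```python
import random

def sample_and_reject(a, rate=0.01):
    """Sample rate*n elements uniformly at random, take their minimum x,
    and reject every element larger than x; return the survivors.
    With k = n/100 samples, the sample minimum has expected rank about
    n/(k+1) = 100k/(k+1), i.e. roughly 100 survivors on average."""
    k = max(1, int(len(a) * rate))
    x = min(random.sample(a, k))
    return [e for e in a if e < x]
```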

  14. "Close To" ??? These 1% of the elements also form a low discrepancy subset. Their minimum is “close to” the true minimum.

  15. The Discrepancy Method Median Finding

  16. Median Finding For adults: Given an unsorted array A of n unique integers, find the median.

  17. A Randomized Algorithm Pick a pivot element such that it splits A into a (¼, ¾) split (or even better). Recur on the correct side.

  18. A Deterministic Algorithm Divide A into groups of 5. Find the median of each group. Recursively find the median x of these medians and use x to pivot, then recur. • How far away is x from the true median?

  19. Answer: Not Far

  20. Answer: Not Far At least half of the group medians are ≤ x, and each such group contributes 3 elements ≤ x, so at least 3n/10 elements are ≤ x; symmetrically, at least 3n/10 are ≥ x. Pivoting on x thus gives at worst a (3/10, 7/10) split.

  21. A Deterministic Algorithm Divide A into groups of 5. Find the median of each group. Recursively find the median x of these medians and use x to pivot, then recur. • How far away is x from the true median? What is the low discrepancy subset here?
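The deterministic algorithm above is classic median-of-medians selection; a compact sketch, assuming distinct integers as the slide does:

```python
def select(a, k):
    """Return the k-th smallest (0-indexed) element of a list of
    distinct integers, via median-of-medians pivoting."""
    if len(a) <= 5:
        return sorted(a)[k]
    # Medians of the groups of 5 -- the low discrepancy subset.
    medians = [sorted(a[i:i + 5])[len(a[i:i + 5]) // 2]
               for i in range(0, len(a), 5)]
    # Recursively find the median of the medians; use it as the pivot.
    pivot = select(medians, len(medians) // 2)
    lo = [x for x in a if x < pivot]
    hi = [x for x in a if x > pivot]
    if k < len(lo):
        return select(lo, k)
    if k > len(lo):              # the pivot itself occupies rank len(lo)
        return select(hi, k - len(lo) - 1)
    return pivot
```

Finding the median is then `select(a, len(a) // 2)`.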

  22. The Discrepancy Method Identify a low discrepancy subset. Solve the problem using it (probably with divide-and-conquer). Patch the almost-right solution to obtain the surely-right solution.

  23. About Size How big should the low discrepancy subset be? We could divide A into • 100 parts • log(n) parts • n/5 parts

  24. Trade-off A bigger subset has lower discrepancy: more time to get a solution from it, but less work to patch that solution. The right size doesn't seem to have an easy answer. Be creative!

  25. A Lossy Data Structure Soft Heaps

  26. Soft Heaps Items with keys from a total order. Operations: • create(S) • insert(S, x) • meld(S, S′) • findmin(S) • delete(S, x) All operations run in amortized O(1) time, except insert, which takes amortized O(log 1/ε) time.

  27. The Catch Soft heaps can corrupt the keys! Consider a mixed sequence of operations that includes n inserts. For any error rate 0 < ε ≤ ½, a soft heap can contain at most εn corrupted keys at a time.

  28. Key Corruption The values of certain keys can be increased at the sole discretion of the soft heap. Once corrupted, a key will remain corrupted.

  29. Even Worse When you delete a (corrupted) key, some other keys may be corrupted during the deletion. It is possible for all keys to become corrupted over the lifetime of the soft heap.

  30. The Worst Because of deletions, the proportion of corrupted items inside a soft heap could be much greater than e. (The theorem says “in a mixed sequence of operations that includes n inserts…”.)

  31. Median Once More Note its online nature. 1. Set ε = ¼. 2. Insert n integers. 3. Do n/2 findmins, each followed by a deletion. • Among the keys deleted, how far away is the largest original key from the true median?

  32. Answer: Not Far We have n/2 elements left. At most n/4 of them are corrupted. The worst case is when those n/4 elements were small in the beginning. So the largest original key we deleted is at most rank 3n/4. Now pivot!
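A sketch of this argument, using Python's ordinary binary heap as an idealized soft heap with ε = 0 (an assumption of the sketch, with no corruption; `approx_pivot` is a hypothetical name). With a real soft heap at ε = ¼, the n/2 deletions plus at most n/4 corrupted keys push the bound from rank n/2 to rank 3n/4:

```python
import heapq

def approx_pivot(a):
    """Insert all n keys, then do n/2 findmin+delete pairs; return the
    largest original key deleted. With an exact heap this has rank
    exactly n/2; a soft heap with eps = 1/4 guarantees rank <= 3n/4."""
    h = list(a)
    heapq.heapify(h)
    deleted = [heapq.heappop(h) for _ in range(len(a) // 2)]
    return max(deleted)
```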

  33. Structure of Soft Heaps A binomial tree of rank k has 2^k nodes. A soft heap is a sequence of modified binomial trees of distinct rank, called soft queues.

  34. Soft Queues • A binomial tree with possibly a few sub-trees pruned. • The rank of a node is the number of children it had in the original tree. (Hence it's an upper bound.) • Rank invariant: the root has at least ⌊rank(root)/2⌋ children.

  35. Item Lists • A node contains an item list. • ckey is the common value of all keys in the list (an upper bound on all of them). • A soft queue is heap-ordered w.r.t. ckeys. • Let r = 2⌈log(1/ε)⌉ + 2. We require that all corrupted items be stored at nodes of rank > r.

  36. Example Soft Heap (figure: the result of melding two soft queues of rank 2; each node carries an item list, e.g. {2, 4}, whose ckey 4 is an upper bound on the keys in the list)

  37. Sift sift(S) • If S has one node, done. • v = child of root with smallest key • Move key of v to root • sift(sub-tree rooted at v)

  38. Sift Sift Sift sift(S) • If S has one node, done. • v = child of root with smallest key • Move key of v to root • sift(sub-tree rooted at v) • If height of S is now odd, goto 1.
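The sift pseudocode above can be made concrete on a plain heap-ordered tree of nodes; this is a simplified sketch that ignores item lists, ckeys and the odd-height doubling rule, and `Node` and the leaf-removal detail are assumptions of the sketch rather than the full soft-queue structure:

```python
class Node:
    def __init__(self, key, children=None):
        self.key = key
        self.children = children if children is not None else []

def sift(v):
    """Refill node v after its key has been consumed: pull up the
    smallest child's key, then recursively refill that child.
    Empty leaves are pruned, mirroring the slides' sub-tree pruning."""
    if not v.children:
        return
    w = min(v.children, key=lambda c: c.key)
    v.key = w.key
    if w.children:
        sift(w)
    else:
        v.children.remove(w)
```

For example, a delete-min reads the root's key and then calls sift(root) to restore heap order.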

  39. Final Word About Soft Heaps Optimality

  40. Minimum Spanning Tree Overview

  41. Graph Model • G = (V, E) • n vertices • m edges; multiple edges allowed, but no self-loops • edge cost of e is c(e); WLOG assume all costs distinct

  42. A Brief History of MST • 1926 Borůvka O(m log n) • 1930 Jarník • 1956 Kruskal • 1957 Prim • 1974 Tarjan (unpublished) O(m √log n) • 1975 Yao O(m log log n) • 1976 Cheriton and Tarjan O(m log log_d n), d = max(2, m/n) • 1986 Fredman and Tarjan O(m β(m,n)) • 1986 Gabow et al. O(m log β(m,n)) • 1990 Fredman and Willard O(m), RAM model • 1994 Klein and Tarjan O(m), randomized • 1997 Chazelle O(m α(m,n) log α(m,n)) • 1999 Chazelle, Pettie O(m α(m,n))

  43. Two Rules CutThe cheapest edge crossing any cut is in the MST. CycleThe costliest edge on any cycle is not in the MST.

  44. Cycle Rule CycleThe costliest edge on any cycle is not in the MST. That also means if an edge is not in the MST, there must be a cycle that witnesses this fact. (What is that cycle?)
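The witness is the cycle formed by the non-MST edge (u, v) plus the MST path from u to v: the edge is the costliest on that cycle. A sketch (Kruskal's algorithm and the helper names here are illustrative assumptions, not the slides' method):

```python
from collections import defaultdict, deque

def kruskal(n, edges):
    """edges: list of (cost, u, v) with distinct costs over vertices
    0..n-1; returns the MST as a set of (cost, u, v) triples."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    mst = set()
    for c, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            mst.add((c, u, v))
    return mst

def max_on_mst_path(mst, s, t):
    """Costliest edge cost on the MST path from s to t: the witness that
    any non-MST edge (s, t) costing more is excluded by the cycle rule."""
    adj = defaultdict(list)
    for c, u, v in mst:
        adj[u].append((v, c))
        adj[v].append((u, c))
    best = {s: 0}          # max edge cost seen on the path from s
    q = deque([s])
    while q:
        u = q.popleft()
        for v, c in adj[u]:
            if v not in best:
                best[v] = max(best[u], c)
                q.append(v)
    return best[t]
```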

  45. The Sampling Advantage Most methods use divide-and-conquer by splitting up the graph using either • the distribution of the edge costs, or • the combinatorial structure, but rarely both. Sampling allows us to do both at once.

  46. Deterministic Sampling Find a low discrepancy subgraph whose own MST bears witness to the non-MST status of many edges. (Still remember the cycle rule?)

  47. Two Notations • G*R The graph derived from G by raising the costs of all edges in R ⊆ E. • G\C The graph derived from G by contracting the subgraph C into a single vertex c.

  48. Contraction (figure: a graph G with a subgraph C contracted into a single vertex, giving G\C; edges inside C disappear, and edges leaving C now attach to the new vertex c)
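On an edge-list representation, the contraction G\C can be sketched as follows (`contract` is a hypothetical helper; since the graph model of slide 41 allows multiple edges, parallel edges created by contraction are kept, while edges inside C become self-loops and are dropped):

```python
def contract(edges, component, new_vertex='c'):
    """G\\C: contract the vertex set `component` into `new_vertex`.
    edges: list of (cost, u, v). Edges with both endpoints inside the
    component become self-loops and are removed; parallel edges between
    the new vertex and the rest of the graph are preserved."""
    C = set(component)
    result = []
    for cost, u, v in edges:
        u2 = new_vertex if u in C else u
        v2 = new_vertex if v in C else v
        if u2 != v2:                 # drop self-loops created inside C
            result.append((cost, u2, v2))
    return result
```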

  49. The Past There is a high degree of freedom in choosing the contractions. (That’s why we have so many different algorithms.) But these algorithms all confront the same dilemma…

  50. Dilemma Many MST algorithms identify the cheapest edge crossing a cut by maintaining all eligible edges in a heap. But as the graph gets contracted, the degrees of the vertices tend to grow. So, finding the cheapest edge becomes more and more difficult.
