
Sublinear Algorithms

Artur Czumaj, DIMAP and Department of Computer Science, University of Warwick.



Presentation Transcript


  1. Sublinear Algorithms • Artur Czumaj, DIMAP and Department of Computer Science, University of Warwick

  2. Sublinear Algorithms: What can we do in sublinear time? • Common knowledge: nothing … how can we say anything about the input if we don't have time to read it? • On the other hand, statistical studies tell us that sometimes we can make approximate claims • Negative result: if I have $10^{12}$ numbers and I want to verify whether one of them is even, I need to check each number (in the worst case) • Positive experience: election forecasts give fairly good predictions of the election results without counting all the votes

  3. Why do we want sublinear time? • Increasing role of computer/digital technologies in all aspects of our life ⇒ we are overwhelmed with information to be processed • Massive data a decade ago, BIG DATA now (structures with billions of nodes)

  4. Why do we want sublinear time? • Modern data sets are frequently prohibitively huge • Examples of modern big data sets • Packet transactions in routers • Credit card transactions • Internet traffic logs, clickstreams • Web data • … • Even linear-time algorithms are too slow • If data is of size $10^{15}$, how do we process it?

  5. Why do we want sublinear time? • When dealing with such big data, it is critically important not just to be able to analyze it, but to analyze it very efficiently • In many emerging applications we have to cope with inputs of enormous size ⇒ managing and analyzing such data sets forces us to re-examine the traditional notions of efficiency • What is often needed: sublinear algorithms, i.e. algorithms that use significantly fewer resources (time and/or storage) than the input size

  6. How can we achieve sublinear time? • We can't read the whole input • We can get approximate solutions only (in most non-trivial cases) • We need randomized algorithms (in almost all non-trivial cases)

  7. How can we achieve sublinear time? • We can do random sampling to obtain partial information about the input • The answer is (typically) approximate • Refinement: we can do adaptive sampling or even define some stochastic process to select part of the input

  8. How can we achieve sublinear time? Classical (early/easy) results: • approximate counting elements from {1,…,W} • how many votes went to Obama / to Romney • approximate the median • approximate the average of elements from {1,...,W} All these are easy problems: can we deal with more complex ones?
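The vote-counting example can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names `samples_needed` and `estimate_fraction` and the synthetic vote list are made up here, and the sample size comes from the standard Hoeffding bound rather than from the slides. The point it shows is that the number of probes depends on the target accuracy, not on the input size.

```python
import math
import random

def samples_needed(eps, delta):
    """Hoeffding bound: probes sufficient for additive error eps with probability >= 1 - delta."""
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))

def estimate_fraction(items, predicate, samples, rng):
    """Estimate the fraction of items satisfying predicate from uniform random probes."""
    hits = 0
    for _ in range(samples):
        # Each probe reads one uniformly random position (sampling with replacement).
        if predicate(items[rng.randrange(len(items))]):
            hits += 1
    return hits / samples

# Hypothetical input: one million votes, 52% for candidate "A".
rng = random.Random(0)
votes = ["A"] * 520_000 + ["B"] * 480_000
k = samples_needed(eps=0.02, delta=0.01)
estimate = estimate_fraction(votes, lambda v: v == "A", k, rng)
```

The estimate reads only `k` positions, a small fraction of the million votes, and `k` would stay the same if the input were a thousand times larger.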

  9. Plan of the talk • A few examples of non-trivial sublinear-time algorithms

  10. Searching • Input: key and numbers • Is key among the numbers? • Key factor: input representation • Numbers are in an unsorted array/list • Numbers are in a sorted array • Numbers are in a sorted list

  11. Searching • Input: key and numbers • Is key among the numbers? • Key factor: input representation • Numbers are in an unsorted array/list • … • … • $\Theta(n)$ time necessary

  12. Searching • Input: key and numbers • Is key among the numbers? • Key factor: input representation • … • Numbers are in a sorted array • … • $\Theta(\log n)$ time necessary

  13. Searching • Input: key and numbers • Is key among the numbers? • Key factor: input representation • … • … • Numbers are in a sorted list • Trickier

  14. Searching in a sorted list • If we don't have access to intermediate elements in the list: hopeless, $\Omega(n)$ time necessary • What if we have “random” access to intermediate elements?

  15. Searching in a sorted list • Access to intermediate elements • All elements are distinct • [Figure: example of a sorted list stored in an array] • We can do better than linear time!

  16. Searching in a sorted list • Pick $\sqrt{n}$ random elements $x_1, \dots, x_{\sqrt{n}}$; they split the list into sublists • Wlog, $x_1 < x_2 < \dots < x_{\sqrt{n}}$ • Check which sublist can contain key: check if there is $i$ such that $x_i = \text{key}$; else, find $i$ such that $x_i < \text{key} < x_{i+1}$ • Traverse the (unique) sublist that can contain key: start traversing the sorted list from $x_i$ until either key is found or we reach $x_{i+1}$ • Correctness is trivial • What's the runtime?

  17. Searching in a sorted list • Pick $\sqrt{n}$ random elements: $O(\sqrt{n})$ time • Check if a sampled element equals key, else find the sampled element closest below key: $O(\sqrt{n})$ time (since we don't need to sort) • Start traversing the sorted list from that element until either key is found or the next sampled element is reached: expected $O(\sqrt{n})$ time • Expected running time is $O(\sqrt{n})$ • Cannot be improved
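A minimal sketch of this search, assuming the list is stored as two parallel arrays: `values[i]` holds the value of the node in slot `i` and `nxt[i]` the slot of its successor (`-1` at the end). The representation and the names `build_shuffled_list` and `search_sorted_list` are assumptions made for this illustration, not taken from the slides.

```python
import math
import random

def build_shuffled_list(sorted_values, rng):
    """Store a sorted list in array slots placed in random order; return (values, nxt, head)."""
    n = len(sorted_values)
    order = list(range(n))
    rng.shuffle(order)  # order[r] is the slot holding the element of rank r
    values, nxt = [None] * n, [-1] * n
    for r, slot in enumerate(order):
        values[slot] = sorted_values[r]
        nxt[slot] = order[r + 1] if r + 1 < n else -1
    return values, nxt, order[0]

def search_sorted_list(values, nxt, head, key, rng):
    """Return True iff key is in the list; expected O(sqrt(n)) probes for distinct values."""
    n = len(values)
    if n == 0:
        return False
    start = head
    # Probe about sqrt(n) random slots and keep the largest sampled value that is <= key.
    for _ in range(math.isqrt(n) + 1):
        i = rng.randrange(n)
        if values[start] < values[i] <= key:
            start = i
    # Walk the list from the best starting point until we reach or pass key.
    node = start
    while node != -1 and values[node] < key:
        node = nxt[node]
    return node != -1 and values[node] == key

rng = random.Random(1)
values, nxt, head = build_shuffled_list(list(range(0, 2000, 2)), rng)
found = search_sorted_list(values, nxt, head, 1234, rng)
missing = search_sorted_list(values, nxt, head, 1235, rng)
```

The sampled elements are never sorted: a single pass keeps only the best starting point, which is why the sampling phase costs time proportional to the number of samples.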

  18. Searching in a sorted list • Access to intermediate elements • What if NOT all elements are distinct? • Let key = 2 • Distinguish between two inputs: 1,1,1,…,1,3,3,3,…,3 and 1,1,1,…,1,2,3,3,…,3 (with #1s ≈ #3s) • $\Omega(n)$ time is necessary

  19. Searching in a sorted list • Nontrivial application: • Input: two convex polygons given as “chains” (sequences of consecutive points) • Output: do these two polygons intersect? • Chazelle et al. used the searching algorithm to solve the problem in $O(\sqrt{n})$ time

  20. If the points of each polygon are stored in an array in arbitrary order and each polygon is represented by a list, then we can detect whether the two (convex) polygons intersect in $O(\sqrt{n})$ time

  21. Searching in a sorted list • Nontrivial application: • Input: two convex polygons given as “chains” (sequences of consecutive points) • Output: do these two polygons intersect? • Chazelle et al. used the searching algorithm to solve the problem in $O(\sqrt{n})$ time • Chazelle et al. used a similar approach to get $O(\sqrt{n})$-time algorithms for a number of other geometric problems

  22. Sublinear time graph algorithms

  23. Average degree in a graph • Given a connected graph $G$ • We have (oracle) access to the degree of each single vertex • What is the average degree of $G$? ⇔ Estimate the number of edges • Can we do it in $o(n)$ time?

  24. Related problem • Given $n$ integers from a given interval • Estimate their average • Can we get a 7-approximation in sublinear time? NO! Linear time is necessary • How to distinguish between the following two inputs: • all numbers are 1 (average is 1) • 8 numbers are very large and the remaining numbers are 1 (average is greater than 7)
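A short calculation makes the obstacle concrete. It assumes an algorithm that reads $s$ positions chosen uniformly at random with replacement, on the second input where 8 of the $n$ numbers are large (with $n \ge 8$); the symbol $s$ is introduced here for illustration.

```latex
% Probability that none of the s probes hits one of the 8 large numbers
% (the last step is Bernoulli's inequality):
\Pr[\text{all } s \text{ samples equal } 1]
  = \left(1 - \frac{8}{n}\right)^{s}
  \ge 1 - \frac{8s}{n}
```

For $s \le n/80$ this probability is at least $0.9$, so with few samples the algorithm almost always sees only ones and cannot tell the two inputs apart.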

  25. Related problem • Given $n$ integers from a given interval • Estimate their average • Can we get a 7-approximation in sublinear time? • If this problem requires $\Omega(n)$ time, can we estimate the average degree any faster? • Remember that the input graph is connected • Feige gave a $(2+\varepsilon)$-approximation algorithm running in $O(\sqrt{n}/\varepsilon)$ time

  26. Graphs versus numbers • The reason for the lower bound for numbers: large numbers can hide • But in a graph: vertices with large degrees cannot hide

  27. Feige's algorithm* • Repeat several times: sample a set $S$ of vertices independently and uniformly at random (i.u.r.) • For each sample set, compute the average degree of the sampled vertices • Return the smallest average degree
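The sampling loop can be sketched as follows. The number of rounds and the sample size are left as parameters because their exact values are not recoverable here; the values used in the example (8 rounds, $3\sqrt{n}$ vertices per round) are assumptions for illustration only, as are the function names and the choice to sample with replacement.

```python
import math
import random

def estimate_average_degree(degree, n, rounds, sample_size, rng):
    """Return the smallest average degree seen over several uniform vertex samples."""
    best = math.inf
    for _ in range(rounds):
        # One round: query the degree oracle at sample_size random vertices.
        total = sum(degree(rng.randrange(n)) for _ in range(sample_size))
        best = min(best, total / sample_size)
    return best

# Hypothetical input: a star on n vertices (vertex 0 is the centre).
n = 10_000

def star_degree(v):
    return n - 1 if v == 0 else 1

true_average = 2 * (n - 1) / n
rng = random.Random(0)
estimate = estimate_average_degree(
    star_degree, n, rounds=8, sample_size=3 * math.isqrt(n), rng=rng
)
```

The star is a useful test case: a sample that misses the centre sees only degree-1 leaves, so the estimate is about half the true average, which is the factor-2 loss inherent in this approach.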

  28. Notation • $d$ = average degree of the graph • $S$ = sampled set of vertices • $d_S$ = average degree of vertices in $S$

  29. Easy upper bound • Clearly, $\mathbf{E}[d_S] = d$ • Hence, Markov's inequality yields $\Pr[d_S \ge (1+\varepsilon)d] \le \frac{1}{1+\varepsilon}$ • Random sampling won't overestimate: we take several samples, so we expect the smallest $d_S$ to be smaller than $(1+\varepsilon)d$ • Our next goal: show that it won't underestimate

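  30. Why we won't underestimate • Goal: prove $d_S \ge (\frac{1}{2} - \varepsilon)d$ with good probability, where $d$ is the average degree of the graph and $d_S$ the average degree of the sample $S$ • $H$ = a small set of vertices with highest degree, $L = V \setminus H$ • Claim 1: the sum of degrees of vertices in $L$ is at least about half of the sum of all degrees • Intuition: a random sample won't take any vertex from $H$

  31. Sum of degrees of vertices in $L$ • There are at most $\binom{|H|}{2}$ edges between vertices in $H$ • Every other edge has ≥ 1 endpoint in $L$ ⇒ the sum of the degrees of vertices in $L$ is at least $m - \binom{|H|}{2}$, where $m$ is the number of edges

  32. Why we won't underestimate • Bound for the average degree in $L$ ⇒ bound for the expected average degree in $S$ • The Chernoff-Hoeffding bound then gives $d_S \ge (\frac{1}{2} - \varepsilon)d$ with good probability • To use Chernoff-Hoeffding we needed a good upper bound for the maximum degree of a node in $L$ (and that's why we treated $L$ and $H$ separately)

  33. Summarizing • The returned average degree $d_S$ satisfies $(\frac{1}{2} - \varepsilon)d \le d_S \le (1+\varepsilon)d$ ⇒ $(2+\varepsilon)$-approximation (after rescaling $\varepsilon$)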

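  34. Average degree in graphs • Feige: a $(2+\varepsilon)$-approximation algorithm running in $O(\sqrt{n}/\varepsilon)$ time

  35. Average degree in graphs • Feige also proved that • $\Omega(\sqrt{n})$ time is necessary • no $o(n)$-time algorithm can get a $(2-\varepsilon)$-approximation • Goldreich and Ron “improved” it!

  36. Neighbourhood model • Goldreich & Ron: we can also access a neighbour of a vertex (access to individual edges and their endpoints) • $(1+\varepsilon)$-approximation in $\tilde{O}(\sqrt{n}) \cdot \mathrm{poly}(1/\varepsilon)$ time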

  37. Ideas of improvement • Why did Feige get only a 2-approximation? • The random sample got only nodes from $L$ (the low-degree nodes) • An edge between a node in $L$ and a node in $H$ (the high-degree nodes) contributed only 1 to the degree sum, and should have contributed 2 • Goldreich and Ron count twice each edge between a node in $L$ and a node in $H$ • each edge with both nodes in $L$ is seen twice; • each edge with one node in $L$ is seen once; • each edge with both nodes in $H$ is not seen.
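The counting rule can be checked exactly on a small graph. This sketch does no sampling at all: it assumes the split into low-degree and high-degree vertices is given by a degree threshold (an assumption made for illustration), and the name `weighted_degree_sum` is hypothetical. It only demonstrates why double-counting edges that leave the low-degree set recovers the full degree sum when the high-degree set has no internal edges.

```python
def weighted_degree_sum(adj, threshold):
    """Sum over low-degree vertices, counting neighbours in H twice and neighbours in L once."""
    high = {v for v, nbrs in adj.items() if len(nbrs) > threshold}
    total = 0
    for v, nbrs in adj.items():
        if v in high:
            continue  # high-degree vertices are never "sampled" in this idealised view
        for u in nbrs:
            total += 2 if u in high else 1
    return total, high

# Hypothetical graph: a star with centre 0 and leaves 1..5, plus the edge 1-2 (6 edges).
adj = {
    0: [1, 2, 3, 4, 5],
    1: [0, 2],
    2: [0, 1],
    3: [0],
    4: [0],
    5: [0],
}
num_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
corrected, high = weighted_degree_sum(adj, threshold=2)
plain_low_sum = sum(len(nbrs) for v, nbrs in adj.items() if v not in high)
```

Here the plain degree sum over low-degree vertices is 7, well short of $2m = 12$, while the corrected sum is exactly 12 because no edge has both endpoints in the high-degree set.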

  38. Idea of the algorithm • Choose nodes at random; the maximum degree in this set gives a threshold that separates the high-degree nodes $H$ from the low-degree nodes $L$ • … but we don't know $H$ and $L$ • Suppose that we know them • Randomly sample nodes • For each sampled vertex $v$: • count edges from $v$ to nodes in $L$ • count edges from $v$ to nodes in $H$ • Isn't this too expensive? The two counts are estimated by random sampling

  39. Average degree in graphs: neighbourhood model • Goldreich and Ron ('06) gave a $(1+\varepsilon)$-approximation algorithm running in $\tilde{O}(\sqrt{n}) \cdot \mathrm{poly}(1/\varepsilon)$ time

  40. Next graph problem

  41. Minimum Spanning Tree (MST) • $G = (V, E)$: undirected connected weighted graph, $w: E \to \mathbb{R}$ • MST Problem: find a minimum spanning tree (MST) of $G$

  42. Minimum Spanning Tree (MST) • Classical problem, well-studied • $O(m)$-time randomized algorithm (Karger, Klein, Tarjan '94) • Unknown if we can solve it in deterministic $O(m)$ time • Best known deterministic runtime: $O(m\,\alpha(m,n))$ (Chazelle '97)

  43. Estimating the weight of MST • If we don't want to find the MST, only its weight, we can (sometimes) do better • Chazelle, Rubinfeld, Trevisan '01: • $G$ is represented by adjacency lists • Average degree is $d$ • All weights are known to be in the interval $[1, W]$ • Randomized $(1+\varepsilon)$-approximation of the weight of the MST in $O(dW\varepsilon^{-2}\log\frac{dW}{\varepsilon})$ time • Sublinear if $d$ and $W$ are small • even constant if $d$ and $W$ are constant • Doesn't have to read the entire input • … but might be slow if either $d$ or $W$ is large

  44. Idea behind the algorithm • Characterize MST weight in terms of number of connected components in certain auxiliary graphs • Show that the number of connected components can be approximated quickly

  45. MST weight vs. #Connected Components • $W = 2$ is the largest weight • [Figure: example graph with edge weights 1 and 2]

  46. MST weight vs. #Connected Components • $W = 2$ is the largest weight • There are $c = 4$ connected components induced by weight-1 edges

  48. MST weight vs. #Connected Components • $W = 2$ is the largest weight • There are $c = 4$ connected components induced by weight-1 edges • MST must have $c - 1 = 3$ edges of weight 2

  49. MST weight vs. #Connected Components • $c_i$ = number of connected components induced by edges of weight at most $i$ • Then we get for arbitrary $W$: MST weight $= n - W + \sum_{i=1}^{W-1} c_i$ (assuming all weights are integers between $1$ and $W$)
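One way to see why the identity holds is the following derivation sketch. The symbols $\alpha_i$ (number of MST edges of weight $i$) and $c_0 = n$ are introduced here for illustration; $c_\ell$ is the number of connected components of the subgraph containing only the edges of weight at most $\ell$.

```latex
% Kruskal's algorithm must connect the c_\ell components left by edges of
% weight <= \ell, and it does so with exactly c_\ell - 1 heavier edges:
\sum_{i > \ell} \alpha_i = c_\ell - 1 \qquad (0 \le \ell \le W-1)

% Summing over \ell counts every MST edge of weight i exactly i times:
\mathrm{MST}(G) = \sum_{i=1}^{W} i\,\alpha_i
  = \sum_{\ell=0}^{W-1} \sum_{i > \ell} \alpha_i
  = \sum_{\ell=0}^{W-1} (c_\ell - 1)
  = n - W + \sum_{\ell=1}^{W-1} c_\ell
```

For $W = 2$ this reduces to $n - 2 + c_1$: the tree uses $c_1 - 1$ edges of weight 2 and $n - c_1$ edges of weight 1.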

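  50. The Algorithm • ApproxMST: for $i = 1$ to $W-1$ do $\hat{c}_i$ = CountConnectedComponents on $G^{(i)}$, the subgraph of edges of weight at most $i$ • Output: $n - W + \sum_{i=1}^{W-1} \hat{c}_i$ • How to compute/approximate the number of connected components?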
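The outer loop can be sketched as below, with one important caveat: an exact union-find counter stands in for the approximate component counter, so this version runs in linear time and only checks the formula against Kruskal's algorithm. All function names and the example graph are hypothetical; the `counter` parameter marks the place where a sublinear estimator would be plugged in.

```python
def make_find(parent):
    """Return a find() function with path halving over the given parent array."""
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    return find

def count_components(n, edges):
    """Exact number of connected components (stand-in for an approximate counter)."""
    parent = list(range(n))
    find = make_find(parent)
    components = n
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return components

def approx_mst_weight(n, weighted_edges, W, counter=count_components):
    """Evaluate n - W + sum of c_i for i = 1..W-1, with c_i supplied by counter."""
    total = n - W
    for i in range(1, W):
        light = [(u, v) for u, v, w in weighted_edges if w <= i]
        total += counter(n, light)
    return total

def kruskal_weight(n, weighted_edges):
    """Reference MST weight via Kruskal's algorithm."""
    parent = list(range(n))
    find = make_find(parent)
    total = 0
    for u, v, w in sorted(weighted_edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            total += w
    return total

# Hypothetical connected graph on 6 vertices with integer weights in 1..3.
edges = [(0, 1, 1), (1, 2, 1), (3, 4, 1), (2, 3, 2), (1, 4, 2), (4, 5, 3), (0, 5, 3)]
formula_weight = approx_mst_weight(6, edges, W=3)
exact_weight = kruskal_weight(6, edges)
```

With the exact counter the two values agree; replacing `counter` by an estimator with small error in each $c_i$ would give an approximation of the MST weight instead.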
