
Randomization in Graph Optimization Problems


Presentation Transcript


  1. Randomization in Graph Optimization Problems David Karger MIT http://theory.lcs.mit.edu/~karger

  2. Randomized Algorithms • Flip coins to decide what to do next • Avoid hard work of making “right” choice • Often faster and simpler than deterministic algorithms • Different from average-case analysis • Input is worst case • Algorithm adds randomness

  3. Methods • Random selection • if most candidate choices “good”, then a random choice is probably good • Monte Carlo simulation • simulations estimate event likelihoods • Random sampling • generate a small random subproblem • solve, extrapolate to whole problem • Randomized Rounding for approximation

  4. Cuts in Graphs • Focus on undirected graphs • A cut is a partition of the vertices into two sides • Its value is the number (or total weight) of edges crossing between the sides
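
As a concrete aside (not from the slides), a minimal Python sketch of the cut-value definition; the edge-list representation and names are illustrative assumptions:

    # Minimal sketch (assumed representation: undirected graph as an edge list).
    def cut_value(edges, S):
        """Value of the cut (S, V-S): number of edges with exactly one endpoint in S."""
        S = set(S)
        return sum(1 for u, v in edges if (u in S) != (v in S))

    # Example: on the 4-cycle 0-1-2-3-0, the cut ({0,1}, {2,3}) has value 2.
    print(cut_value([(0, 1), (1, 2), (2, 3), (3, 0)], {0, 1}))  # -> 2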

  5. Optimization with Cuts • Cut values determine solution of many graph optimization problems: • min-cut / max-flow • multicommodity flow (sort-of) • bisection / separator • network reliability • network design Randomization helps solve these problems

  6. Presentation Assumption • For the entire presentation, we consider unweighted graphs (all edges have weight/capacity one) • All results apply unchanged to arbitrarily weighted graphs • Integer weights = parallel edges • Rational weights scale to integers • The analysis is unaffected; only some implementation details change

  7. Basic Probability • Conditional probability: Pr[A ∩ B] = Pr[A] × Pr[B | A] • Independent events multiply: Pr[A ∩ B] = Pr[A] × Pr[B] • Linearity of expectation: E[X + Y] = E[X] + E[Y] • Union bound: Pr[X ∪ Y] ≤ Pr[X] + Pr[Y]

  8. Random Selection for Minimum Cuts Random choices are good when problems are rare

  9. Minimum Cut • Smallest cut of the graph • Cheapest way to separate it into 2 parts • Various applications: • network reliability (small cuts are weakest) • subtour elimination constraints for TSP • separation oracle for network design • Not the s-t min-cut

  10. Max-flow/Min-cut • s-t flow: edge-disjoint packing of s-t paths • s-t cut: a cut separating s and t • [FF]: s-t max-flow = s-t min-cut • max-flow saturates all s-t min-cuts • most efficient way to find s-t min-cuts • [GH]: min-cut is “all-pairs” s-t min-cut • find using n flow computations

  11. Flow Algorithms • Push-relabel [GT]: • push “excess” around the graph till it’s gone • max-flow in O*(mn) (note: O* hides logs) • Recent O*(m^(3/2)) [GR] • min-cut in O*(mn²) --- “harder” than flow • Pipelining [HO]: • save push/relabel data between flows • min-cut in O*(mn) --- “as easy” as flow

  12. Contraction • Find edge that doesn’t cross min-cut • Contract (merge) endpoints to 1 vertex

  13. Contraction Algorithm • Repeat n − 2 times: • find a non-min-cut edge • contract it (keep parallel edges) • Each contraction decrements the vertex count • At the end, 2 vertices are left • they define a unique cut • which is a min-cut of the starting graph (no min-cut edge was ever contracted)

  14. Picking an Edge • Must contract only non-min-cut edges • [NI]: O(m)-time algorithm to pick such an edge • n contractions: O(mn) time for min-cut • slightly faster than flows • If only we could find the edge faster…. • Idea: min-cut edges are few

  15. Randomize • Repeat until 2 vertices remain: • pick a random edge • contract it • (keep fingers crossed)
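
A minimal Python sketch of one trial of this randomized contraction (an illustration under assumed representations, not the speaker's code); parallel edges are kept implicitly in the edge list, and self-loops are skipped by rejection:

    import random

    def contract_trial(edges, n):
        """One randomized contraction trial on a connected multigraph.
        edges: list of (u, v) pairs over vertices 0..n-1 (parallel edges allowed).
        Returns the value of the 2-way cut the trial ends with (>= min-cut)."""
        parent = list(range(n))              # super-vertex label of each vertex
        def find(v):                         # tiny union-find with path halving
            while parent[v] != v:
                parent[v] = parent[parent[v]]
                v = parent[v]
            return v
        vertices = n
        while vertices > 2:
            u, v = random.choice(edges)      # uniformly random surviving edge...
            ru, rv = find(u), find(v)
            if ru == rv:                     # ...rejecting self-loops
                continue
            parent[ru] = rv                  # contract: merge the two endpoints
            vertices -= 1
        # Edges whose endpoints ended up on different sides form the output cut.
        return sum(1 for u, v in edges if find(u) != find(v))

Rejecting self-loops keeps each choice uniform over the edges that still join distinct super-vertices, which is exactly “keep parallel edges, discard self-loops.”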

  16. Analysis I • Min-cut is small---few edges • Suppose the graph has min-cut c • Then the minimum degree is at least c • Thus there are at least nc/2 edges • A random edge is probably safe: Pr[min-cut edge] ≤ c/(nc/2) = 2/n • (easy generalization to the capacitated case)

  17. Analysis II • Algorithm succeeds if never accidentally contracts min-cut edge • Contracts #vertices from n down to 2 • When k vertices, chance of error is 2/k • thus, chance of being right is 1-2/k • Pr[always right] is product of probabilities of being right each time

  18. Analysis III • Pr[always right] = (1 − 2/n)(1 − 2/(n−1))···(1 − 2/3) = 2/(n(n−1)) ≈ 2/n² …not too good!

  19. Repetition • Repetition amplifies the success probability • basic failure probability at most 1 − 2/n² • so repeat 7n² times: overall failure probability (1 − 2/n²)^(7n²) < e⁻¹⁴

  20. How fast? • Easy to perform 1 trial in O(m) time • just use an array of edges, no data structures • But need n² trials: O(mn²) time (see the sketch below) • Simpler than flows, but slower
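
Continuing the sketch above (same illustrative names), the repetition of slide 19 is just a loop that keeps the best cut found:

    def min_cut(edges, n, trials=None):
        """Amplify by repetition: run independent contraction trials and
        return the smallest cut seen. With 7*n*n trials (slide 19) the
        failure probability drops below exp(-14)."""
        if trials is None:
            trials = 7 * n * n
        return min(contract_trial(edges, n) for _ in range(trials))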

  21. An improvement [KS] • When k vertices, error probability 2/k • big when k small • Idea: once k small, change algorithm • algorithm needs to be safer • but can afford to be slower • Amplify by repetition! • Repeat base algorithm many times

  22. Recursive Algorithm • Algorithm RCA(G, n) {G has n vertices}: • repeat twice: • randomly contract G to n/√2 vertices (50-50 chance of avoiding the min-cut) • RCA(G, n/√2)
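
A hedged Python sketch of RCA (the structure and base case are illustrative assumptions, not the speaker's code); contract_trial is the single-trial routine from the earlier sketch, and partial_contract stops contracting at a target vertex count:

    import math
    import random

    def partial_contract(edges, n, target):
        """Randomly contract a connected multigraph from n down to `target`
        super-vertices; return the relabeled edge list and new vertex count."""
        parent = list(range(n))
        def find(v):
            while parent[v] != v:
                parent[v] = parent[parent[v]]
                v = parent[v]
            return v
        vertices = n
        while vertices > target:
            u, v = random.choice(edges)
            ru, rv = find(u), find(v)
            if ru == rv:
                continue                     # self-loop: skip
            parent[ru] = rv
            vertices -= 1
        roots = sorted({find(v) for v in range(n)})
        new_id = {r: i for i, r in enumerate(roots)}
        new_edges = [(new_id[find(u)], new_id[find(v)])
                     for u, v in edges if find(u) != find(v)]
        return new_edges, len(roots)

    def rca(edges, n):
        """Recursive contraction (slide 22): contract to ~n/sqrt(2) vertices
        twice independently, recurse on each copy, keep the better answer."""
        if n <= 6:                           # small base case: plain trials
            return min(contract_trial(edges, n) for _ in range(n * n))
        target = int(math.ceil(1 + n / math.sqrt(2)))
        return min(rca(*partial_contract(edges, n, target)) for _ in range(2))

Per the next slide, O(log n) independent runs of rca then suffice for high-probability success.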

  23. Main Theorem • On any capacitated, undirected graph, Algorithm RCA • runs in O*(n²) time with simple structures • finds the min-cut with probability ≥ 1/log n • Thus, O(log n) repetitions suffice to find the minimum cut (failure probability 10⁻⁶) in O(n² log² n) time.

  24. Proof Outline • The graph has O(n²) (capacitated) edges • So O(n²) work to contract, then two subproblems of size n/√2 • T(n) = 2T(n/√2) + O(n²) = O(n² log n) • The algorithm fails only if both iterations fail • An iteration succeeds if its contractions and its recursion succeed • P(n) = 1 − [1 − ½P(n/√2)]² = Ω(1/log n)

  25. Failure Modes • Monte Carlo algorithms always run fast and probably give you the right answer • Las Vegas algorithms probably run fast and always give you the right answer • To make a Monte Carlo algorithm Las Vegas, need a way to check answer • repeat till answer is right • No fast min-cut check known (flow slow!)

  26. How do we verify a minimum cut?

  27. Enumerating Cuts The probabilistic method, backwards

  28. Cut Counting • The original CA finds any given min-cut with probability at least 2/(n(n−1)) • Each run finds only one cut • Disjoint events, so the probabilities add • So there are at most n(n−1)/2 min-cuts • else the probabilities would sum to more than one • Tight: the cycle has exactly this many min-cuts (one per pair of edges)

  29. Enumeration • RCA as stated has constant probability of finding any given min-cut • If run O(log n) times, the probability of missing a given min-cut drops to 1/n³ • But there are only n² min-cuts • So the probability of missing any is at most 1/n (union bound) • So, with probability 1 − 1/n, find all of them • O(n² log³ n) time

  30. Generalization • If G has min-cut c, a cut of value ≤ αc is an α-mincut • Lemma: the contraction algorithm finds any given α-mincut with probability Ω(n^(−2α)) • Proof: just add a factor α to the basic analysis • Corollary: there are O(n^(2α)) α-mincuts • Corollary: can find them all in O*(n^(2α)) time • Just change the contraction factor in RCA

  31. Summary • A simple fast min-cut algorithm • Random selection avoids rare problems • Generalization to near-minimum cuts • Bound on number of small cuts • Probabilistic method, backwards

  32. Network Reliability Monte Carlo estimation

  33. The Problem • Input: • Graph G with n vertices • Edge failure probabilities • For simplicity, fix a single p • Output: • FAIL(p): probability G is disconnected by edge failures

  34. Approximation Algorithms • Computing FAIL(p) is #P-complete [V] • An exact algorithm seems unlikely • Approximation scheme • Given G, p, ε, outputs an ε-approximation • May be randomized: • succeeds with high probability • Fully polynomial (FPRAS) if runtime is polynomial in n, 1/ε

  35. Monte Carlo Simulation • Flip a coin for each edge, test the graph • k failures in t trials ⇒ FAIL(p) ≈ k/t • E[k/t] = FAIL(p) • How many trials needed for confidence? • “bad luck” on trials can yield a bad estimate • clearly need at least 1/FAIL(p) • Chernoff bound: O*(1/(ε² FAIL(p))) suffice to give probable accuracy within ε • Time O*(m/(ε² FAIL(p)))
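
A minimal Python sketch of this naive simulation (illustrative names; connectivity is tested with a plain DFS):

    import random

    def disconnects(edges, n, p):
        """One trial: fail each edge independently with probability p,
        then report whether the surviving graph is disconnected."""
        surviving = [e for e in edges if random.random() >= p]
        adj = [[] for _ in range(n)]
        for u, v in surviving:
            adj[u].append(v)
            adj[v].append(u)
        seen, stack = {0}, [0]               # DFS from vertex 0
        while stack:
            for w in adj[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return len(seen) < n                 # some vertex unreachable => disconnected

    def estimate_fail(edges, n, p, trials):
        """Naive Monte Carlo: FAIL(p) ~ (number of failing trials) / trials."""
        return sum(disconnects(edges, n, p) for _ in range(trials)) / trials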

  36. Chernoff Bound • Random variables Xᵢ ∈ [0,1] • Sum X = Σ Xᵢ • Bound on deviation from the expectation: Pr[|X − E[X]| ≥ εE[X]] < exp(−ε²E[X]/4) • If E[X] ≥ 4(log n)/ε², “tight concentration” • Deviation by ε has probability < 1/n • No one variable is a big part of E[X]

  37. Application • Let Xᵢ = 1 if trial i is a failure, else 0 • Let X = X₁ + … + Xₜ • Then E[X] = t·FAIL(p) • Chernoff says X is within relative error ε of E[X] with probability 1 − exp(−ε²·t·FAIL(p)/4) • So choose t to cancel the other terms • “High probability”: t = O(log n/(ε² FAIL(p))) • Deviation by ε has probability < 1/n

  38. Review • Contraction Algorithm • O(n^(2α)) α-mincuts • Enumerate in O*(n^(2α)) time

  39. Network reliability problem • Random edge failures • Estimate FAIL(p) = Pr[graph disconnects] • Naïve Monte Carlo simulation • Chernoff bound---“tight concentration”: Pr[|X − E[X]| ≥ εE[X]] < exp(−ε²E[X]/4) • O(log n/(ε² FAIL(p))) trials give O(log n/ε²) expected network failures---good for Chernoff • So estimate within ε in O*(m/(ε² FAIL(p))) time

  40. Rare Events • When FAIL(p) is too small, it takes too long to collect sufficient statistics • Solution: skew the trials to make the interesting event more likely • But in a way that lets you recover the original probability

  41. DNF Counting • Given a DNF formula (OR of ANDs): (e₁ ∧ e₂ ∧ e₃) ∨ (e₁ ∧ e₄) ∨ (e₂ ∧ e₆) • Each variable is set true with probability p • Estimate Pr[formula true] • #P-complete • [KL, KLM]: FPRAS • Skew to make true outcomes “common” • Time linear in formula size

  42. Rewrite problem • Assume p = 1/2 • Count satisfying assignments • “Satisfaction matrix”: Sᵢⱼ = 1 if the ith assignment satisfies the jth clause • We want the number of nonzero rows • Randomly sampling rows won’t work • Might be too few nonzeros

  43. New sample space • So normalize every nonzero row to sum to one (divide it by its number of nonzeros) • Now the sum over all entries is the desired value (each nonzero row contributes exactly 1) • So it is sufficient to estimate the average nonzero entry

  44. Sampling Nonzeros • We know the number of nonzeros per column: • to satisfy a given clause, all variables in the clause must be true • all other variables are unconstrained • Estimate the average by random sampling: • pick a random column, weighted by its nonzero count • then pick a random true-for-that-column assignment (see the sketch below)
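
A hedged Python sketch of this estimator as slides 42-45 describe it (the monotone-clause representation and names are assumptions): clauses are sets of variable indices, each variable true independently with probability 1/2.

    import random

    def karp_luby_count(clauses, m, samples):
        """Estimate the number of satisfying assignments (nonzero rows) of a
        monotone DNF over variables 0..m-1, each true w.p. 1/2 (slide 42)."""
        # Column sums: clause j is satisfied by 2^(m - |C_j|) assignments.
        col = [2 ** (m - len(c)) for c in clauses]
        total = sum(col)
        acc = 0.0
        for _ in range(samples):
            # Pick a random column (clause), weighted by its nonzero count...
            j = random.choices(range(len(clauses)), weights=col)[0]
            # ...then a random assignment satisfying it: clause variables are
            # forced true, all other variables are unconstrained coin flips.
            forced = clauses[j]
            assign = {v: True for v in forced}
            for v in range(m):
                if v not in forced:
                    assign[v] = random.random() < 0.5
            # The sampled row was normalized by its number of nonzeros (slide 43):
            nonzeros = sum(1 for c in clauses if all(assign[v] for v in c))
            acc += 1.0 / nonzeros
        return total * acc / samples

Each sample value lies in [1/k, 1] for k clauses, so by the argument on the next slide O(k log n/ε²) samples concentrate the mean.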

  45. Few Samples Needed • Suppose there are k clauses • Then E[sample] > 1/k • 1 ≤ satisfied clauses ≤ k • so 1 ≥ sample value ≥ 1/k • Adding O(k log n/ε²) samples gives a “large” mean • So Chernoff says the sample mean is probably a good estimate

  46. Reliability Connection • Reliability as DNF counting: • a variable per edge, true if the edge fails • a cut fails if all its edges do (AND of its edge variables) • the graph fails if some cut does (OR over cuts) • FAIL(p) = Pr[formula true] • Problem: the DNF has 2ⁿ clauses
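
A small sketch of this reduction (illustrative names; `cut_sides` would come from the cut-enumeration step on the next slide):

    def cuts_to_dnf(edges, cut_sides):
        """One DNF clause per cut: the set of indices of edges crossing it.
        Variable i means 'edge i fails', so a clause is true exactly when
        every edge of its cut fails, i.e. that cut disconnects the graph."""
        return [{i for i, (u, v) in enumerate(edges) if (u in S) != (v in S)}
                for S in cut_sides]

Restricting cut_sides to the O(n^(2α)) α-mincuts, as the next slide argues, keeps the formula polynomial in size.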

  47. Focus on Small Cuts • Fact: FAIL(p) ≥ p^c • Theorem: if p^c = 1/n^(2+δ), then Pr[some cut of value > αc fails] < n^(−αδ) • Corollary: FAIL(p) ≈ Pr[some α-mincut fails], where α = 1 + 2/δ • Recall: O(n^(2α)) α-mincuts • Enumerate with RCA, run DNF counting

  48. Proof of Theorem • Given p^c = 1/n^(2+δ) • At most n^(2α) cuts have value αc • Each fails with probability p^(αc) = 1/n^(α(2+δ)) • Pr[any cut of value αc fails] ≤ n^(2α) · n^(−α(2+δ)) = n^(−αδ) • Sum over all α > 1: a geometric series dominated by its first term

  49. Algorithm • RCA can enumerate all α-minimum cuts with high probability in O(n^(2α)) time • Given the α-minimum cuts, can ε-estimate the probability that one fails via Monte Carlo simulation for DNF counting (formula size O(n^(2α))) • Corollary: when FAIL(p) < n^(−(2+δ)), can ε-approximate it in O(cn^(2+4/δ)) time
