This paper presents a randomized algorithm that operates in linear time for finding minimum spanning trees (MST). It includes the fundamentals of MST properties and definitions, as well as a comprehensive analysis of the proposed algorithm. The algorithm leverages graph cycles and cuts to eliminate candidates that cannot belong to the MST, optimizing the selection of edges through random sampling. We detail the algorithm's steps and discuss its performance and correctness, contributing to the advancement of efficient algorithms in graph theory.
A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees • 黃則翰 R96922141, 蘇承祖 R96922077, 張紘睿 R96922136, 許智程 D95922022, 戴于晉 R96922171 • David R. Karger, Philip N. Klein, Robert E. Tarjan
Outline • Introduction • Basic Property & Definition • Algorithm • Analysis
Introduction • Borůvka [1926]: O(m log n) • Gabow et al. [1984]: O(m log β(m,n)), where β(m,n) = min { i | log^(i) n ≤ m/n } • Verification algorithm: King [1993], O(m) • This work: a randomized algorithm that runs in O(m) time with high probability
Cycle property • For any cycle C in a graph, the heaviest edge in C does not appear in the minimum spanning forest. [Figure: example graph with edge weights 2, 3, 5, 6]
Cut Property • For any proper nonempty subset X of the vertices, the lightest edge with exactly one endpoint in X belongs to the minimum spanning tree. [Figure: vertex subset X in a graph with edge weights 2, 3, 5, 6]
Definition • Let G be a graph with weighted edges. • w(x, y): the weight of edge {x, y}. • Let F be a forest in G. • F(x, y): the path (if any) connecting x and y in F. • w_F(x, y): the maximum weight of an edge on F(x, y), with w_F(x, y) = ∞ if x and y are not connected in F.
F-heavy & F-light • An edge {x, y} is F-heavy if w(x, y) > w_F(x, y), and F-light otherwise. • The edges of F are all F-light. • Examples (from the figure: vertices A to H, edge weights 2, 3, 5, 6): • w(C,D) = 5, w_F(C,D) = 5: F-light • w(B,D) = 6, w_F(B,D) = max{2, 3, 5} = 5: F-heavy • w(F,H) = 6, w_F(F,H) = ∞: F-light
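The classification can be checked mechanically. The sketch below is only an illustration: the names `path_max` and `classify` and the small forest are assumptions, chosen so that the path weights match the three examples on the slide. It computes w_F with a plain depth-first search, which is simple but is not the linear-time verification method.

```python
import math

def path_max(forest, x, y):
    """Return w_F(x, y): the maximum edge weight on the forest path from x
    to y, or infinity when x and y lie in different trees."""
    adj = {}
    for (u, v), w in forest.items():
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    stack = [(x, None, -math.inf)]
    while stack:
        node, prev, heaviest = stack.pop()
        if node == y:
            return heaviest
        for nxt, w in adj.get(node, []):
            if nxt != prev:  # a forest has no cycles, so skipping the parent suffices
                stack.append((nxt, node, max(heaviest, w)))
    return math.inf

def classify(forest, x, y, weight):
    """An edge is F-heavy if its weight exceeds w_F(x, y), F-light otherwise."""
    return "F-heavy" if weight > path_max(forest, x, y) else "F-light"

# Hypothetical forest F (an assumption, not the exact figure from the slide).
forest = {("A", "B"): 2, ("A", "C"): 3, ("C", "D"): 5, ("E", "F"): 2, ("G", "H"): 3}

print(classify(forest, "C", "D", 5))  # F-light: an edge of F itself
print(classify(forest, "B", "D", 6))  # F-heavy: 6 > max{2, 3, 5}
print(classify(forest, "F", "H", 6))  # F-light: F and H are not connected in F
```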
Observation • No F-heavy edge can be in the minimum spanning forest of G (by the cycle property), so F-heavy edges can be discarded. • Only F-light edges remain candidates for the minimum spanning forest of G.
Borůvka Step • For each vertex, select the minimum-weight edge incident to the vertex. • Replace by a single vertex each connected component defined by the selected edges. • Delete all resulting isolated vertices, loops, and all but the lowest-weight edge among each set of multiple edges.
Algorithm Step 1 • Apply two successive Borůvka steps to the graph, thereby reducing the number of vertices by at least a factor of four.
Algorithm Step 2 • Choose a subgraph H by selecting each edge independently with probability ½. • Apply the algorithm recursively to H, producing a minimum spanning forest F of H. • Find all the F-heavy edges and delete them from the contracted graph.
Algorithm Step 3 • Apply the algorithm recursively to the remaining graph to compute a minimum spanning forest F'. • Return the edges contracted in Step 1 together with the edges of F'.
[Figure: recursion structure. Original problem G → Borůvka × 2 → G*. Left sub-problem: sample H from G* with p = 0.5 and return the minimum forest F of H. Right sub-problem: delete the F-heavy edges from G* to obtain G', whose forest is F'.]
Correctness • By the cut property, every edge contracted during Step1 is in the MSF. • By the cycle property, the edges deleted in Step2 do NOT belong to the MSF. • By the induction hypothesis, the MSF of the remaining graph is correctly determined in the recursive call of Step3.
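The three steps can be put together in a short sketch. This is not the linear-time implementation: as an assumption made for brevity, the F-heavy edges are found with a naive depth-first search instead of a linear-time verification algorithm. The names `boruvka_step`, `path_max` and `kkt_msf` are illustrative, edges are assumed to be tuples `(weight, u, v, edge_id)` with distinct weights and integer vertex labels, and the input graph is assumed to have no loops.

```python
import random

def boruvka_step(edges):
    """One Borůvka step. Returns (edges of the contracted graph, ids of selected edges)."""
    best = {}  # vertex -> lightest incident edge
    for e in edges:
        for x in (e[1], e[2]):
            if x not in best or e[0] < best[x][0]:
                best[x] = e
    parent = {x: x for x in best}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    selected = set()
    for e in best.values():
        parent[find(e[1])] = find(e[2])  # contract the selected edge
        selected.add(e[3])
    lightest = {}  # component pair -> lightest edge between the two components
    for w, u, v, eid in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # loop after contraction
        key = (min(ru, rv), max(ru, rv))
        if key not in lightest or w < lightest[key][0]:
            lightest[key] = (w, ru, rv, eid)
    return list(lightest.values()), selected

def path_max(adj, x, y):
    """Maximum edge weight on the forest path x..y, or infinity if disconnected."""
    stack = [(x, None, float("-inf"))]
    while stack:
        node, prev, heaviest = stack.pop()
        if node == y:
            return heaviest
        for nxt, w in adj.get(node, ()):
            if nxt != prev:
                stack.append((nxt, node, max(heaviest, w)))
    return float("inf")

def kkt_msf(edges, rng=random):
    """Return the ids of the minimum-spanning-forest edges."""
    msf = set()
    for _ in range(2):  # Step 1: two Borůvka steps
        edges, selected = boruvka_step(edges)
        msf |= selected
    if not edges:
        return msf
    sample = [e for e in edges if rng.random() < 0.5]  # Step 2: sample H
    f_ids = kkt_msf(sample, rng)                       # F = MSF of H
    adj = {}
    for w, u, v, eid in edges:
        if eid in f_ids:
            adj.setdefault(u, []).append((v, w))
            adj.setdefault(v, []).append((u, w))
    light = [e for e in edges if e[0] <= path_max(adj, e[1], e[2])]
    return msf | kkt_msf(light, rng)                   # Step 3: recurse on F-light edges

edges = [(1, 0, 1, 0), (2, 1, 2, 1), (3, 0, 2, 2), (4, 2, 3, 3)]
print(sorted(kkt_msf(edges)))  # [0, 1, 3]
```

Because the weights are distinct, the minimum spanning forest is unique, so the result does not depend on the coin flips; only the running time does.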
Candidate Edge of MST • Let H be a sample of G that includes each edge independently with probability p, and let F be the minimum spanning forest of H. • The expected number of F-light edges in G, that is, of candidate edges for the MST of G, is at most n/p (by a negative binomial argument).
Random-sampling • Sampling helps discard edges that cannot be in the minimum spanning tree. • Construct the sample graph H together with its forest F. • Process the edges in increasing order of weight. • To process an edge e: • 1. Test whether both endpoints of e are in the same component of F (if so, e is F-heavy; otherwise it is F-light). • 2. Include e in H with probability p. • 3. If e is in H and is F-light, add e to the forest F.
Random-sampling • Examples (from the figure of G and H on vertices A to G): • w(E,G) = 14, w_F(E,G) = max{5, 6, 9, 13} = 13: F-heavy • w(E,F) = 11, w_F(E,F) = max{5, 6, 9} = 9: F-heavy • w(D,F) = 9, w_F(D,F) = 9: F-light • w(A,B) = 7, w_F(A,B) = ∞: F-light
Random-sampling • Process the edges in increasing order. • If the edge is F-light: throw the coin; if it comes up heads, select the edge. • Else: throw the coin, but don't select the edge. [Figure: G on vertices A to G with edge weights 3 to 14; edges are randomly selected into H, then the forest F of H is found]
Observation • No F-heavy edge can be in the minimum spanning forest of G (cycle property). • Only F-light edges are candidates for the minimum spanning tree of G. • The forest F produced is the forest that Kruskal's algorithm would produce on H, and the F-light edges include every edge of the MSF of G.
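Observation • The size of F is at most n − 1. • Each F-light edge is added to F with probability p, so the number of F-light edges is bounded by the number of trials needed to obtain n − 1 successes (negative binomial). • A negative binomial count with k successes has mean k/p; with k ≤ n − 1, the expected number of F-light edges in G is at most n/p.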
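The sampling procedure and the n/p bound can be tried out empirically. The following is a minimal sketch under stated assumptions: the function name `sample_and_count`, the complete test graph and the parameter values are illustrative, and edge weights are assumed distinct. Because edges are processed in increasing order, an edge is F-light exactly when its endpoints lie in different components of F at the moment it is processed, so a union-find structure is enough.

```python
import random

def sample_and_count(n, edges, p, rng):
    """Run the sampling procedure on edges (weight, u, v).
    Returns (number of edges in F, number of F-light edges of G)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    forest_size = light = 0
    for w, u, v in sorted(edges):       # increasing order of weight
        ru, rv = find(u), find(v)
        in_h = rng.random() < p         # coin flip: include e in H?
        if ru != rv:                    # endpoints in different components: F-light
            light += 1
            if in_h:                    # e is in H and F-light: add it to F
                parent[ru] = rv
                forest_size += 1
    return forest_size, light

rng = random.Random(0)
n, p = 20, 0.5
pairs = [(u, v) for u in range(n) for v in range(u + 1, n)]
weights = list(range(len(pairs)))
rng.shuffle(weights)
edges = [(w, u, v) for w, (u, v) in zip(weights, pairs)]

trials = [sample_and_count(n, edges, p, rng)[1] for _ in range(200)]
print("average number of F-light edges:", sum(trials) / len(trials))
print("bound n/p:", n / p)
```

With p = 1 every F-light edge is selected, and the procedure reduces to Kruskal's algorithm.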
Analysis of the Algorithm • The worst-case running time. • The expected running time. • The probability that the running time is linear.
Running time Analysis • Total running time = the sum of the running times of all steps. • Step 1: two Borůvka steps. • Step 2: finding the F-heavy edges with the Dixon-Rauch-Tarjan verification algorithm. • Each takes time linear in the number of edges, so it suffices to estimate the total number of edges over all subproblems.
Observe the recursion tree • G = (V, E), |V| = n, |E| = m. • m ≥ n/2, since there are no isolated vertices. • Each problem generates at most 2 subproblems. • At depth d, there are at most 2^d nodes. • Each node at depth d has at most n/4^d vertices. • The depth d is at most log_4 n. • There are at most $\sum_{d=0}^{\infty} 2^d \cdot n/4^d = \sum_{d=0}^{\infty} n/2^d = 2n$ vertices in all subproblems.
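The worst case • Theorem 4.1: The worst-case running time of the minimum-spanning-forest algorithm is O(min{n^2, m log n}), the same as the bound for Borůvka's algorithm. • Proof: There are two different ways to estimate the total number of edges. • 1. A subproblem at depth d contains at most (n/4^d)^2 / 2 edges. • The total number of edges in all subproblems is at most $\sum_{d=0}^{\infty} 2^d \cdot (n/4^d)^2 / 2 = \sum_{d=0}^{\infty} n^2 / (2 \cdot 8^d) = O(n^2)$.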
The worst case • 2. Consider a subproblem G = (V, E). After Step 1 we have G' = (V', E') with |E'| ≤ |E| − |V|/2 and |V'| ≤ |V|/4. • Edges in the left child = |H|. • Edges in the right child ≤ |E'| − |H| + |F|, where |F| ≤ |V'| ≤ |V|/4. • So the two subproblems together contain at most |H| + (|E'| − |H| + |F|) = |E'| + |F| ≤ |E| − |V|/2 + |V|/4 ≤ |E| edges.
The worst case • [Figure: recursion tree in which each level contains at most m edges in total]
The worst case • The depth is at most log_4 n and each level has at most m edges, so there are at most O(m log n) edges in total. • Hence the worst-case running time of the minimum-spanning-forest algorithm is O(min{n^2, m log n}).
Analysis – Average Case (1/8) • Theorem: the expected running time of the minimum spanning forest algorithm is O(m). • Calculate the expected total number of edges over all left-path problems. [Figure: recursion tree with the original problem, its left and right sub-problems, and a left and right sub-sub-problem]
Analysis – Average Case (2/8) • Calculate the expected total number of edges on one left path that starts at a problem with m' edges. • Evaluate the total number of edges over all right sub-problems. [Figure: a left path starting at a problem with m' edges; expected total number of edges ≤ 2m']
Analysis – Average Case (3/8) • Calculating the expected total number of edges on one left path that starts at a problem with m' edges: • 1. E[number of edges of H] = 0.5 × number of edges of G*. • 2. G* results from two Borůvka steps on G, so number of edges of G* ≤ number of edges of G. • Hence E[number of edges of H] ≤ 0.5 × number of edges of G. [Figure: original problem G → Borůvka × 2 → G* → sample with p = 0.5 → H (left sub-problem) and G' (right sub-problem)]
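Analysis – Average Case (4/8) • Calculating the expected total number of edges on one left path that starts at a problem with m' edges: • The problem at the top of the path has m' edges. • Since E[number of edges of H] ≤ 0.5 × number of edges of G, its left sub-problem has at most 0.5 × m' edges in expectation, and the bound halves again at every further step down the path. • Expected total number of edges ≤ $\sum_{i=0}^{\infty} m'/2^i = 2m'$. [Figure: original problem G → Borůvka × 2 → G* → sample with p = 0.5 → H (left sub-problem) and G' (right sub-problem)]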
Analysis – Average Case (5/8) • Calculating the expected total number of edges on one left path L that starts at a problem with m' edges: • Expected total number of edges on L ≤ 2m' (done). • Evaluating the total number of edges of all right sub-problems: • E[total edges of all right sub-problems] ≤ n (still to be shown).
Analysis – Average Case (6/8) • Evaluating the total number of edges over all right sub-problems. • To prove: E[total edges of all right sub-problems] ≤ n. • 1. G* results from two Borůvka steps on G, so number of vertices of G* ≤ 0.25 × number of vertices of G. • 2. Based on Lemma 2.1 (with p = 0.5): E[number of edges of G'] ≤ 2 × number of vertices of G*. • Hence E[number of edges of G'] ≤ 0.5 × number of vertices of G. [Figure: original problem G → Borůvka × 2 → G* → sample with p = 0.5 → H (left sub-problem, returns the minimum forest F of H) and G' (right sub-problem, obtained by deleting the F-heavy edges from G*)]
Analysis – Average Case (7/8) • Evaluating the total number of edges over all right sub-problems. • To prove: E[total edges of all right sub-problems] ≤ n, using E[number of edges of G'] ≤ 0.5 × number of vertices of G. • Original problem: # of vertices = n. • Depth 1: # of vertices of sub-problems ≤ 2 × n/4; # of edges of right sub-problems ≤ n/2. • Depth 2: # of vertices of sub-problems ≤ 4 × n/4^2; # of edges of right sub-problems ≤ 2 × n/8. • Depth 3: # of vertices of sub-problems ≤ 8 × n/4^3; # of edges of right sub-problems ≤ 4 × n/(4^2 × 2). • Depth 4: # of vertices of sub-problems ≤ 16 × n/4^4; # of edges of right sub-problems ≤ 8 × n/(4^3 × 2). • The edge bounds halve at each depth (n/2, n/4, n/8, n/16, ...), so they sum to at most n.
Analysis – Average Case (8/8) • Expected total number of edges on one left path that starts at a problem with m' edges ≤ 2m'. • E[total edges of all right sub-problems] ≤ n. • Every left path starts at the original problem or at a right sub-problem, so E[processed edges in the original problem and all sub-problems] ≤ 2 × (m + n).
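The two bounds and their combination can be written out as one short derivation. This is a sketch that follows the slides' counting: at depth d there are at most 2^(d−1) right sub-problems, and each has a parent with at most n/4^(d−1) vertices, hence at most half that many edges in expectation.

```latex
\begin{align*}
E[\text{edges on a left path starting with } m' \text{ edges}]
  &\le \sum_{i \ge 0} \frac{m'}{2^{i}} = 2m', \\
E[\text{edges in all right sub-problems}]
  &\le \sum_{d \ge 1} 2^{d-1} \cdot \frac{1}{2} \cdot \frac{n}{4^{d-1}}
   = \sum_{d \ge 1} \frac{n}{2^{d}} = n, \\
E[\text{edges in the original problem and all sub-problems}]
  &\le 2\,(m + n).
\end{align*}
```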
The Probability of Linearity • Theorem 4.3 • The minimum spanning forest algorithm runs in Ο(m) time with probability 1 – exp(-Ω(m))
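The Probability of Linearity • Chernoff bound (one standard additive form): given independent random variables x_i ∈ {0, 1} for 1 ≤ i ≤ n, with X the sum of all x_i, for t > 0 we have $\Pr[X \le E[X] - t] \le \exp(-2t^2/n)$. • Thus, the probability of fewer than s successes (each with chance p) within k trials is at most $\exp(-2(pk - s)^2/k)$ when s < pk.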
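The Probability of Linearity • Right subproblems • The total number of vertices in all right subproblems is at most n/2 (shown in the proof of Theorem 4.2). • Each F-light edge corresponds to a nickel flip in the sampling procedure, and each head adds an edge to a forest F, so n/2 is an upper bound on the total number of heads in the nickel flips.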
Right Subproblems • The probability that the right subproblems contain more than 3m edges in total is at most the probability that fewer than n/2 heads occur in a sequence of 3m nickel tosses. • m + n ≤ 3m since n/2 ≤ m. • This probability is exp(−Ω(m)) by a Chernoff bound.
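To see how quickly this probability shrinks, the exact binomial tail can be computed for small m. The sketch below assumes fair coin tosses and the worst case n/2 = m; the function name `prob_fewer_heads` is illustrative. The comparison value exp(−m/6) is the Hoeffding-type bound exp(−2t²/k) with k = 3m tosses and t = m/2, a standard form that may differ from the exact bound on the original slide.

```python
import math

def prob_fewer_heads(tosses, heads):
    """Exact probability of strictly fewer than `heads` heads in `tosses` fair tosses."""
    return sum(math.comb(tosses, j) for j in range(heads)) / 2 ** tosses

for m in (10, 20, 40, 80):
    exact = prob_fewer_heads(3 * m, m)  # fewer than m heads in 3m tosses
    bound = math.exp(-m / 6)            # Hoeffding-type upper bound
    print(f"m={m:3d}  exact={exact:.3e}  bound={bound:.3e}")
```

Both columns fall off exponentially in m, which is what the exp(−Ω(m)) statement expresses.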
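The Probability of Linearity • Left subproblems • Each edge of the root problem or of a right subproblem gives rise to a sequence of copies in left subproblems, one copy for each coin toss that comes up heads. • Every sequence ends with a tail, that is, HH…HHT. • So the number of tails is at most the number of sequences. • Assume that there are at most m' edges in the root problem and in all right subproblems; then there are at most m' sequences.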
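Left Subproblems • The probability that the left subproblems contain more than 3m' edge copies in total is at most the probability that only m' tails occur in a sequence of more than 3m' coin tosses. • This probability is exp(−Ω(m)) by a Chernoff bound.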
The Probability of Linearity • Combining right & left subproblems • The total number of edges is O(m) with probability at least 1 − exp(−Ω(m)).