This algorithm solves the Union-Find Problem efficiently in the I/O model, with applications to terrain analysis. It provides optimal worst-case complexity and practical implementation for batched operations. The algorithm is also applied to tasks such as topological persistence and contour trees. A two-stage approach is employed for efficient computation, transforming the data structure into an interval union-find problem. Weighted paths, Euler tours, and in-order traversals are utilized for various operations. The algorithm handles redundant unions through strategies like minimum spanning trees and deterministic processing. Applications in topological persistence and contour trees leverage the batched Union-Find structure to manage triangulated meshes, minimum-saddle pairs, and node connectivity. Advanced techniques like join trees and split trees enhance the algorithm's capabilities for analyzing complex terrains.
I/O-Efficient Batched Union-Find and Its Applications to Terrain Analysis Pankaj K. Agarwal, Lars Arge, Ke Yi Duke University University of Aarhus
The Union-Find Problem • A universe of N elements: x1, x2, …, xN • Initially N singleton sets: {x1}, {x2}, …, {xN} • Each set has a representative • Maintain the partition under two operations: • Union(xi, xj): joins the sets containing xi and xj • Find(xi): returns the representative of the set containing xi
The Solution • Union(d, h): link-by-rank • Find(n): path compression [Figure: forest of sets with the representatives at the roots, shown before and after Union(d, h) and Find(n)]
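The two techniques named on this slide can be sketched in memory as follows. This is a minimal illustration of the classical structure, not the I/O-efficient algorithm of the talk; the class name `UnionFind` and the choice of the tree root as representative are assumptions made for the example.

```python
class UnionFind:
    """In-memory union-find with link-by-rank and path compression."""

    def __init__(self, elements):
        # Every element starts as the root (representative) of its own set.
        self.parent = {x: x for x in elements}
        self.rank = {x: 0 for x in elements}

    def find(self, x):
        # First pass: walk up to the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass (path compression): point every node on the path at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return rx  # already in the same set
        # Link-by-rank: hang the root of smaller rank under the other root.
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return rx


uf = UnionFind(list("abdefghlmn"))
uf.union("d", "h")
same_set = uf.find("d") == uf.find("h")
other_set = uf.find("a") == uf.find("d")
```

Every `find` follows parent pointers, which is cheap in memory but can cost one I/O per pointer once the structure no longer fits in memory.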
Complexity • O(N α(N)) for a sequence of N union and find operations [Tarjan 75] • α(•): inverse Ackermann function (grows very slowly!) • Optimal in the worst case [Tarjan 79, Fredman and Saks 89] • Batched (off-line) version • Entire sequence known in advance • Can be improved to linear on a RAM [Gabow and Tarjan 85] • Not possible on a pointer machine [Tarjan 79]
Simple and Good, as long as … The entire data structure fits in memory
The I/O Model • Main memory of size M • Disk of infinite size • One I/O transfers B items between memory and disk
Our Results • An I/O-efficient algorithm for the batched union-find problem using O(sort(N)) = O((N/B) log_{M/B}(N/B)) I/Os • Same as sorting • Optimal in the worst case • A practical algorithm using O(sort(N) log(N/M)) I/Os • Implemented • Applications to terrain analysis • Topological persistence: O(sort(N)) I/Os • Implemented • Contour trees: O(sort(N)) I/Os
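To get a feel for the sort(N) bound, the expression can be evaluated for concrete parameters. The sketch below only computes (N/B) log_{M/B}(N/B) with constant factors ignored; the function name and the sample values of N, M and B are hypothetical and are not taken from the talk.

```python
import math


def sort_io_bound(n, m, b):
    """Evaluate (N/B) * log_{M/B}(N/B), ignoring constant factors."""
    blocks = n / b  # number of disk blocks occupied by the input
    return blocks * math.log(blocks) / math.log(m / b)


# Hypothetical parameters: N items, memory of M items, blocks of B items.
small = sort_io_bound(1024, 64, 4)           # 256 blocks, log base 16 of 256 is 2
large = sort_io_bound(2**30, 2**24, 2**12)   # far fewer than 2**30 I/Os
print(small, large)
```

For realistic M and B the logarithmic factor is a small constant, so the bound is close to the N/B I/Os needed just to read the input.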
I/O-Efficient Batched Union-Find • Assumption: no redundant unions • Each union must join two different sets • This assumption is removed later • Two-stage algorithm • Convert to interval union-find • Compute an order on the elements s.t. each union joins two adjacent sets • Solve batched interval union-find
Union Graph (Tree if no redundant unions) • 1: Union(d, g) • 2: Union(a, c) • 3: Union(r, b) • 4: Union(a, e) • 5: Union(e, i) • 6: Union(r, a) • 7: Union(a, d) • 8: Union(d, h) • 9: Union(b, f) [Figure: two equivalent union trees rooted at r; each edge is labeled with the number of its union]
Transforming the Union Tree • After the transformation, weights along every root-to-leaf path decrease [Figure: the union tree rooted at r is transformed in steps; in the final tree the nodes d, a, h, b, f hang below r with weights 7, 6, 8, 3, 9, and g, c, e, i lie one level lower with weights 1, 2, 4, 5]
Formulating as a Batched Problem • For each edge, find the lowest ancestor edge with a higher weight [Figure: the original union tree and the transformed tree side by side]
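The query on this slide can be illustrated with a naive in-memory walk up the tree. The sketch makes several assumptions that are not stated on the slides: the tree is rebuilt from the nine unions of the Union Graph slide and rooted at r, each edge is identified by its child endpoint, and a node is re-attached to the child endpoint of the edge that is found (or to the root if no heavier ancestor edge exists). It is not the I/O-efficient method, which the talk obtains through an Euler tour and a geometric formulation.

```python
def lowest_heavier_ancestor(parent, weight, root):
    """For each non-root node v (standing for the edge parent[v]-v), return the
    child endpoint of the lowest ancestor edge whose weight exceeds weight[v],
    or the root if there is no such edge."""
    result = {}
    for v in parent:
        u = parent[v]
        # The edge above u is an ancestor edge of v; climb while it is lighter.
        while u != root and weight[u] < weight[v]:
            u = parent[u]
        result[v] = u
    return result


# Union tree rooted at r, rebuilt from the nine unions (assumed rooting).
parent = {"b": "r", "a": "r", "f": "b", "c": "a", "e": "a",
          "d": "a", "i": "e", "g": "d", "h": "d"}
# weight[v] is the number (time stamp) of the union that created the edge above v.
weight = {"b": 3, "a": 6, "f": 9, "c": 2, "e": 4,
          "d": 7, "i": 5, "g": 1, "h": 8}

new_parent = lowest_heavier_ancestor(parent, weight, "r")
print(new_parent)
```

Under these assumptions f, d and h move up to the root, i moves from e to a, and g stays below d, so weights decrease along every root-to-leaf path of the resulting tree.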
Cast in a Geometry Setting • Euler tour of the tree: x = weight, y = positions in the tour • Computed in O(sort(N)) I/Os [Chiang et al. 95] [Figure: the union tree and the segments, one per edge weight 1 to 9, obtained from its Euler tour]
Cast in a Geometry Setting • For each edge, find the lowest ancestor edge with a higher weight • Equivalently: for each segment, find the shortest segment above and containing it (can be solved in O(sort(N)) I/Os) [Figure: the union tree and the segments, one per edge weight 1 to 9, obtained from its Euler tour]
In-Order Traversal • Weights along every root-to-leaf path decrease • At u, with children u1, …, uk (in increasing order of weight): • Recursively visit the subtree at u1 • Output u • For i = 2, …, k: recursively visit the subtree at ui • Resulting order: b r c a e i g d h f • Claim: this traversal produces the right order [Figure: the transformed union tree]
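The traversal rule can be written as a short recursive routine. The tree below is an assumption: it is the transformed tree as reconstructed from the union numbers on the earlier slides (children of r: b, a, d, h, f; children of a: c, e, i; child of d: g), and the function name is hypothetical. The sketch is in-memory and recursive, so it only illustrates the visiting order.

```python
def union_order(children, weight, root):
    """Visit the lightest child's subtree, then the node, then the other subtrees."""
    order = []

    def visit(u):
        kids = sorted(children.get(u, []), key=lambda c: weight[c])
        if kids:
            visit(kids[0])      # subtree of the child with the smallest weight
        order.append(u)         # output u
        for c in kids[1:]:      # remaining children in increasing weight
            visit(c)

    visit(root)
    return order


# Transformed union tree (assumed reconstruction); weight[v] labels the edge above v.
children = {"r": ["d", "a", "h", "b", "f"], "a": ["c", "e", "i"], "d": ["g"]}
weight = {"b": 3, "a": 6, "f": 9, "c": 2, "e": 4,
          "d": 7, "i": 5, "g": 1, "h": 8}

order = union_order(children, weight, "r")
print(" ".join(order))  # b r c a e i g d h f
```

In this order each of the nine unions joins two sets that occupy neighboring intervals, for example Union(a, d) joins b r c a e i with g d.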
Solving Interval Union-Find • Union: x = the two operands, y = time stamp • Find: x = operand, y = time stamp • Four instances of batched ray shooting: O(sort(N)) I/Os
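Once every union joins two adjacent intervals, the batch can be answered by tracking the boundaries between neighboring intervals. The sketch below is an in-memory simulation that processes operations in time-stamp order, not the batched ray-shooting method of the talk. The function name, the choice of the leftmost element as representative, and the fractional time stamps of the find operations are all assumptions made for the example.

```python
from bisect import bisect_left


def batched_interval_union_find(order, unions, finds):
    """order: elements in interval order; unions: (time, x, y); finds: (time, x).
    Returns {(time, x): leftmost element of x's interval at that time}."""
    pos = {x: i for i, x in enumerate(order)}
    # Boundary b separates positions b and b + 1; initially all sets are singletons.
    boundaries = list(range(len(order) - 1))
    events = [(t, "union", (x, y)) for t, x, y in unions]
    events += [(t, "find", x) for t, x in finds]
    events.sort(key=lambda e: e[0])

    answers = {}
    for t, kind, arg in events:
        if kind == "union":
            lo, hi = sorted((pos[arg[0]], pos[arg[1]]))
            i = bisect_left(boundaries, lo)
            # Adjacent intervals are separated by exactly one boundary in [lo, hi).
            if i < len(boundaries) and boundaries[i] < hi:
                del boundaries[i]
        else:
            i = bisect_left(boundaries, pos[arg])
            left = boundaries[i - 1] + 1 if i > 0 else 0
            answers[(t, arg)] = order[left]
    return answers


order = list("brcaeigdhf")
unions = [(1, "d", "g"), (2, "a", "c"), (3, "r", "b"), (4, "a", "e"), (5, "e", "i"),
          (6, "r", "a"), (7, "a", "d"), (8, "d", "h"), (9, "b", "f")]
finds = [(4.5, "i"), (4.5, "e"), (6.5, "i"), (7.5, "h")]
answers = batched_interval_union_find(order, unions, finds)
print(answers)
```

A find at time t only needs the nearest surviving boundary on each side of its operand, which is the kind of question that ray shooting among time-stamped segments answers in batch.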
Handling Redundant Unions • Compute the minimum spanning tree • O(sort(N)) I/Os (randomized) [Chiang et al. 95] • O(sort(N) log log B) I/Os (deterministic) [Arge et al. 04] • Deterministic O(sort(N)) I/Os if the graph is planar • Only MST edges are non-redundant
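The link between redundant unions and the minimum spanning tree can be checked on a small input. The sketch assumes each union is an edge weighted by its position in the sequence, so scanning the unions in order is Kruskal's algorithm on the union graph. It runs in memory with a plain union-find (path halving), whereas the talk relies on I/O-efficient MST algorithms; the function name and sample sequence are hypothetical.

```python
def split_redundant(unions):
    """Scan unions in sequence order (Kruskal with weight = position in the sequence).
    Returns (kept, redundant): kept edges form the minimum spanning forest."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    kept, redundant = [], []
    for x, y in unions:
        rx, ry = find(x), find(y)
        if rx == ry:
            redundant.append((x, y))   # endpoints already connected: not an MST edge
        else:
            parent[rx] = ry
            kept.append((x, y))        # joins two different sets: an MST edge
    return kept, redundant


kept, redundant = split_redundant([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
print(kept, redundant)
```

Here Union(a, c) arrives after a and c are already connected through b, so it closes a cycle and is the only redundant union.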
Applications • Topological Persistence • Contour Trees
Formulated as Batched Union-Find • Terrain represented as a triangulated mesh • Consider minimum-saddle pairs • On reaching • A minimum or maximum: do nothing • A regular point u: issue union(u, v) for a lower neighbor v • A saddle u: let v and w be nodes from the two connected pieces of u's lower link; issue find(v), find(w), union(u, v), union(u, w) [Figure: the lower link of a vertex]
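A simplified in-memory version of this sweep shows how union-find yields minimum-saddle pairs. The sketch works on a graph with distinct vertex heights rather than on a triangulated mesh, visits vertices in increasing height, and pairs a vertex with the higher of two minima whenever it joins two components. These simplifications, the function name and the sample heights are assumptions; the talk instead issues the operations as one batch.

```python
def minimum_saddle_pairs(height, edges):
    """Sweep vertices by increasing height; return (minimum, saddle) pairs."""
    adj = {v: [] for v in height}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    parent = {}   # union-find over the vertices swept so far
    lowest = {}   # root -> lowest vertex (the minimum) of its component

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    pairs = []
    for u in sorted(height, key=height.get):
        parent[u] = u
        lowest[u] = u
        for v in adj[u]:
            if v not in parent:
                continue                   # v is higher and not swept yet
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            older, younger = sorted((lowest[ru], lowest[rv]), key=height.get)
            if younger != u:
                pairs.append((younger, u))  # u merges two components: a saddle
            parent[ru] = rv
            lowest[rv] = older
    return pairs


# A path 0-1-2-3-4 with minima at vertices 0, 2 and 4 (hypothetical heights).
height = {0: 1, 1: 5, 2: 2, 3: 4, 4: 3}
pairs = minimum_saddle_pairs(height, [(0, 1), (1, 2), (2, 3), (3, 4)])
print(pairs)  # [(4, 3), (2, 1)]
```

The global minimum (vertex 0) stays unpaired, and each pair records the minimum whose component is absorbed at that saddle.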
Previous Results • Directly maintain contours • O(N log N) time [van Kreveld et al. 97] • Needs union-split-find for circular lists • Does not extend to higher dimensions • Two sweeps maintaining components, then merge • O(N log N) time [Carr et al. 03] • Extends to arbitrary dimensions
Join Tree and Split Tree [Figure: join tree and split tree on nodes 1 to 9, shown twice, with the qualified nodes marked]
Final Contour Tree • Hard to BATCH! [Figure: join tree, split tree and the resulting contour tree on nodes 1 to 9]
Another Characterization • Let w be the highest node that is a descendant of v in the join tree and an ancestor of u in the split tree; then (u, w) is a contour tree edge • Now can BATCH! [Figure: join tree, split tree and contour tree on nodes 1 to 9, with u, v and w marked]
Summary • An I/O-efficient algorithm for the batched union-find problem using O(sort(N)) = O((N/B) log_{M/B}(N/B)) I/Os • Optimal in the worst case • A practical algorithm using O(sort(N) log(N/M)) I/Os • Applications to terrain analysis • Topological persistence: O(sort(N)) I/Os • Contour trees: O(sort(N)) I/Os • Open question: on-line case • Can we get below O(N α(N)) I/Os?