This paper discusses range searching in external memory, focusing on efficient range reporting and aggregation. It covers various data structures and optimizations for 1D and 2D range queries in the External Memory Model. The BAR-Tree is highlighted for orthogonal rectangle ranges, along with I/O analysis for query efficiency and construction updates.
Approximate Range Searching in External Memory
Micha Streppel (TU Eindhoven; NCIM-Groep, the Netherlands) and Ke Yi (AT&T Labs, USA; HKUST, Hong Kong)
Range Searching
• A set S of N points in R^d
• Build a data structure such that, given a query range Q, S ∩ Q can be returned efficiently
• Focus on range reporting, range aggregation in paper
External Memory Model
• Memory size: M
• I/O block size: B
• Disk size: infinite
Range Searching in External Memory
• 1D: B-tree
  • Size: O(N/B), Query: O(log_B(N/B) + k/B)
• 2D:
  • Half planes [Agarwal et al. 2000]
    • Size: O(N/B), Query: O(log_B(N/B) + k/B)
  • Orthogonal rectangles [Arge et al. 1999]
    • Size: O(N/B), Query: Θ((N/B)^ε + k/B)
    • Query: O(log_B(N/B) + k/B), Size: Θ((N/B) log(N/B) / log log_B N)
  • kdB-tree [Robinson 1981]
    • Size: O(N/B), Query: O((N/B)^(1/2) + k/B)
• Exact range searching is difficult!
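To get a feel for the gap between the logarithmic and the square-root query bounds above, the following sketch evaluates their leading terms with all constants dropped. The function names and the sample values of N, B and k (taken here to be the output size) are assumptions chosen purely for illustration.

```python
import math

def btree_query_ios(n, b, k):
    """Leading term of the B-tree style bound: log_B(N/B) + k/B."""
    return math.log(n / b, b) + k / b

def kdb_query_ios(n, b, k):
    """Leading term of the kdB-tree bound: (N/B)^(1/2) + k/B."""
    return math.sqrt(n / b) + k / b

# Hypothetical parameters: a billion points, 1000 points per block,
# and a query that reports 10,000 points.
N, B, k = 10**9, 10**3, 10**4
print(btree_query_ios(N, B, k))  # about 12 I/Os
print(kdb_query_ios(N, B, k))    # about 1010 I/Os
```

With these numbers the output term k/B is the same in both cases; the difference comes entirely from the search term.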
Approximate Range Searching
• The query range Q is expanded by radius = ε · diam(Q)
• Internal memory:
  • BBD-tree [Arya and Mount, 1995]
  • BAR-tree [Duncan et al. 2001]
  • Size: O(N), Query: O(log(N) + 1/ε + k_ε) for any convex Q
• External memory: this paper!
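The approximation can be made concrete with a small checker. The sketch below assumes the usual one-sided reading: every point of S inside Q must be reported, and no point outside the expanded range Q_ε (Q grown by ε · diam(Q)) may be reported. It is restricted to disk queries for simplicity, and the function names are hypothetical.

```python
import math

def dist(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def is_valid_approx_answer(points, center, radius, eps, reported):
    """Check an answer to an eps-approximate disk query.

    Q is the disk (center, radius), so diam(Q) = 2 * radius.
    Q_eps is Q grown by eps * diam(Q) (assumed one-sided expansion).
    """
    grow = eps * 2 * radius
    must = {p for p in points if dist(p, center) <= radius}          # S ∩ Q
    may = {p for p in points if dist(p, center) <= radius + grow}    # S ∩ Q_eps
    rep = set(reported)
    return must <= rep <= may

S = [(0, 0), (1, 0), (1.1, 0), (2, 0)]
# With eps = 0.1 the point (1.1, 0) lies in Q_eps but not in Q: optional.
print(is_valid_approx_answer(S, (0, 0), 1, 0.1, [(0, 0), (1, 0)]))
print(is_valid_approx_answer(S, (0, 0), 1, 0.1, [(0, 0), (1, 0), (1.1, 0)]))
print(is_valid_approx_answer(S, (0, 0), 1, 0.1, [(0, 0), (1, 0), (2, 0)]))
```

The first two answers are accepted and the third is rejected, because (2, 0) lies outside Q_ε.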
Externalization, Previously
• Query bounds of linear structures in internal/external memory
Externalizing the kd-tree
• Query bounds for orthogonal rectangle ranges:
  • Internal memory: O(N^(1/2) + k)
  • External memory: O((N/B)^(1/2) + k/B)
• Example shown with B = 3
The BAR-Tree [Duncan et al. 2001]
• A space-partitioning scheme
• Similar to the kd-tree
• But also uses diagonal cuts
• All cells are convex and fat
• Some cuts have to be unbalanced
• But no two consecutive cuts are unbalanced
• Height: O(log N)
• A query range intersects O(log(N) + 1/ε + k_ε) cells (any convex range)
Blocking the BAR-Tree
• Top-down blocking
• Rules for u:
  • Check u's two subtrees T1, T2
  • Add u if both have ≥ B/2 nodes
  • If T1 is small, check whether the entire T1 fits
    • If so, add T1
    • Otherwise, do not add u
  • It is not possible for both T1 and T2 to be small
• Example shown with B = 8
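One possible reading of these rules can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the `Node` class, the breadth-first filling order, the choice to start a new block at any node that does not fit, and the fallback that treats a node with two small subtrees as a single unit are all assumptions.

```python
from collections import deque

def size(u):
    """Number of nodes in the subtree rooted at u (0 for an empty subtree)."""
    return u.size if u is not None else 0

class Node:
    """Binary tree node that caches its subtree size (hypothetical helper)."""
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right
        self.size = 1 + size(left) + size(right)

def collect(u):
    """Return all nodes of the subtree rooted at u."""
    out, stack = [], [u]
    while stack:
        v = stack.pop()
        if v is not None:
            out.append(v)
            stack += [v.left, v.right]
    return out

def block_tree(root, B):
    """Partition the tree into blocks of at most B nodes, top-down."""
    blocks, roots = [], deque([root])
    while roots:
        r = roots.popleft()
        if r.size <= B:                    # the whole subtree fits in one block
            blocks.append(collect(r))
            continue
        block, frontier = [], deque([r])
        while frontier:
            u = frontier.popleft()
            t1, t2 = sorted((u.left, u.right), key=size)  # t1 is the smaller
            if 2 * size(t2) < B:           # assumed fallback: both are small
                unit, rest = collect(u), []
            elif 2 * size(t1) < B:         # T1 small: it travels with u
                unit, rest = [u] + collect(t1), [t2]
            else:                          # both subtrees have >= B/2 nodes
                unit, rest = [u], [t1, t2]
            if len(block) + len(unit) <= B:
                block += unit
                frontier.extend(rest)
            else:                          # does not fit: u roots a later block
                roots.append(u)
        blocks.append(block)
    return blocks

def complete(depth):
    """Complete binary tree with 2^(depth+1) - 1 nodes, used as test input."""
    return Node(complete(depth - 1), complete(depth - 1)) if depth > 0 else Node()

blocks = block_tree(complete(6), B=8)
print(len(blocks), max(len(b) for b in blocks))
```

Every node ends up in exactly one block and no block exceeds B nodes; the sketch makes no claim about matching the block counts of the original scheme.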
Blocking the BAR-Tree
• Any subtree T_u is stored in O(|T_u|/B + 1) blocks
I/O Analysis of a Query
• Nodes completely inside Q_ε
  • Total #: O(k_ε), organized in O(1/ε) subtrees
  • Total I/O: O(1/ε + k_ε/B)
• Nodes intersecting both Q and ∂Q_ε
  • Total #: O(1/ε)
  • Total I/O: O(1/ε)
I/O Analysis of a Query
• There are O(log N) such nodes, but we would like O(log_B N) I/Os
Current Blocking Not Sufficient
(Figure: size = B/2 − 1)
Regrouping Shallow Subtrees
• Identify shallow nodes top-down
• u is shallow if there is a path of length log(B) beneath u that is stored in more than c blocks
• For such a u:
  • Do a BFS for log(B) levels
  • Move these nodes from their original blocks to a new block
• This achieves the desired query I/O: O(log_B(N/B) + 1/ε + k_ε/B)
(Figure: size = B/2 − 1)
Construction and Update
• Construction: O(N/B · log_{M/B}(N/B)) I/Os
  • Same as sorting
• Insertions and deletions
  • Use partial rebuilding
  • O(log_B N + 1/B · log_{M/B}(N/B) · log(N/B)) I/Os amortized
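As a rough numerical illustration of these two bounds, the sketch below evaluates their leading terms with constants dropped. The function names and the sample values of N, M and B are hypothetical, and the unsubscripted log(N/B) is assumed to be base 2.

```python
import math

def construction_ios(n, m, b):
    """Sorting bound: (N/B) * log_{M/B}(N/B)."""
    return (n / b) * math.log(n / b, m / b)

def update_ios(n, m, b):
    """Amortized update bound: log_B N + (1/B) * log_{M/B}(N/B) * log(N/B).

    The last logarithm is taken base 2 here, which is an assumption.
    """
    return math.log(n, b) + math.log(n / b, m / b) * math.log2(n / b) / b

# Hypothetical setting: N = 2^30 points, memory M = 2^20, block size B = 2^10.
N, M, B = 2**30, 2**20, 2**10
print(construction_ios(N, M, B))  # about 2.1 million I/Os
print(update_ios(N, M, B))        # about 3.04 I/Os
```

In this setting the rebuilding term contributes only a small fraction of an I/O per update, so the search term log_B N dominates.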
Extension to Objects
• S: a collection of objects
• The density of S is the smallest number λ such that any ball b is intersected by at most λ objects o in S with radius(o) ≥ radius(b) [de Berg et al. 1997]
(Figure: one low-density example and two high-density examples)
Extension to Objects
• The object-BAR-tree (using guarding sets [de Berg et al. 2003])
  • Size: O(λN/B)
  • Query: O(log_B(N/B) + λ/B · 1/ε + λ · k_ε/B)
  • Construction: O(λN/B · log_{M/B}(N/B))
Remarks
• Extends to d dimensions
  • Query becomes O(log_B(N/B) + 1/ε^(d-1) + k_ε/B)
• Non-convex query ranges
  • Query becomes O(log_B(N/B) + 1/ε^d + k_ε/B)
• Construction and query process do not depend on ε
  • The actual cost is O(log_B(N/B) + min_ε{1/ε^(d-1) + k_ε/B})
• Open problems
  • How to update the object-BAR-tree efficiently?
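The min over ε in the remark above can be illustrated numerically. The sketch below evaluates 1/ε^(d-1) + k_ε/B over a handful of candidate values for a disk query and returns the smallest. It assumes that k_ε counts the points inside Q_ε, the disk grown by ε · diam(Q); the function names, the candidate list and the toy point set are hypothetical.

```python
import math

def k_eps(points, center, radius, eps):
    """Number of points in Q_eps, the disk Q grown by eps * diam(Q)."""
    r = radius * (1 + 2 * eps)
    return sum(math.hypot(x - center[0], y - center[1]) <= r for x, y in points)

def best_tradeoff(points, center, radius, B, candidates, d=2):
    """Return (cost, eps) minimizing 1/eps^(d-1) + k_eps/B over the candidates."""
    return min((1 / e ** (d - 1) + k_eps(points, center, radius, e) / B, e)
               for e in candidates)

# Toy input: 100 points on a line, a disk query of radius 10, block size 8.
pts = [(i, 0) for i in range(100)]
print(best_tradeoff(pts, (0, 0), 10, 8, [0.125, 0.25, 0.5, 1.0, 2.0]))
# (4.625, 0.5)
```

A small ε makes the 1/ε term dominate, while a large ε inflates k_ε; the minimum sits between the two.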
The END Thank you!