
On Spatial-Range Closest Pair Query



Presentation Transcript


  1. On Spatial-Range Closest Pair Query Jing Shan, Donghui Zhang and Betty Salzberg College of Computer and Information Science Northeastern University

  2. Outline • Problem Definition • Straightforward Approach • Existing Technique • Our Method • Performance SSTD03 --- Santorini, Greece

  3. Problem Definition • Given a spatial data set S, the Range Closest Pair query regarding a spatial range R finds a pair of objects (s1, s2) with s1 ∈ R and s2 ∈ R such that the distance between s1 and s2 is the smallest distance between any two objects inside range R. [Figure: example data set with query range R; the query result is (e, f).]

  4. Outline • Problem Definition • Straightforward Approach • Existing Technique • Our Method • Performance

  5. Straightforward Approach • Use an R-tree to select the objects in the query range. • Find the closest pair by checking the objects in the selection result. • We could use a nested loop; • or a better approach, e.g. plane sweep with the Voronoi diagram method, which is O(n log n). • Problems: we have to access all data pages of the R-tree that intersect the query range, and the data in the query range may not fit in memory.
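The two steps above can be sketched as follows. This is a minimal illustration, not the authors' code: a linear scan stands in for the R-tree range selection, the nested-loop variant is used for the second step, and the names `in_range`, `range_closest_pair_naive` and the sample points are hypothetical.

```python
from itertools import combinations
from math import dist, inf

def in_range(p, rect):
    """rect = (xmin, ymin, xmax, ymax); boundaries count as inside."""
    xmin, ymin, xmax, ymax = rect
    return xmin <= p[0] <= xmax and ymin <= p[1] <= ymax

def range_closest_pair_naive(points, rect):
    """Step 1: select the points inside rect. Step 2: check every pair."""
    selected = [p for p in points if in_range(p, rect)]
    best, best_d = None, inf
    for p, q in combinations(selected, 2):  # nested loop, O(n^2) pairs
        d = dist(p, q)
        if d < best_d:
            best, best_d = (p, q), d
    return best, best_d

# Hypothetical data: the globally closest pair lies outside the range.
points = [(0, 0), (1, 1), (4, 4), (4, 5), (9, 9), (9.5, 9)]
pair, d = range_closest_pair_naive(points, (0, 0, 5, 5))
print(pair, d)
```

Every selected object is touched, which is the cost the slide points out: all data pages intersecting the range are read regardless of where the answer is.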

  6. Note on Existing Techniques • [Hjaltason and Samet 98]: incremental join. • [Corral, Manolopoulos, Theodoridis and Vassilakopoulos 00]: an improved version, using pruning. • They addressed a slightly different problem: • No query range. • Joining two different R-trees. • Existing techniques do not perform well if there is overlap between the two R-trees. In case the two R-trees are identical, there is extensive overlap.

  7. MinDist • Given two MBRs A, B of R-tree nodes, MinDist(A, B) is the smallest distance between the boundaries of A and B. • ∀ objects o1 ∈ A and o2 ∈ B, distance(o1, o2) ≥ MinDist(A, B). [Figure: MBRs A and B with MinDist marked between them.]
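For axis-aligned rectangles, MinDist can be computed from the per-axis gaps. The sketch below assumes 2-D MBRs stored as `(xmin, ymin, xmax, ymax)` tuples and Euclidean distance; the function name `min_dist` and the sample rectangles are illustrative only.

```python
from math import hypot

def min_dist(a, b):
    """Smallest distance between two axis-aligned rectangles.

    a, b = (xmin, ymin, xmax, ymax). The gap on each axis is zero when
    the projections overlap, so overlapping rectangles have MinDist 0.
    """
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return hypot(dx, dy)

A = (0, 0, 2, 2)
B = (5, 6, 7, 8)
print(min_dist(A, B))  # 5.0 (gap of 3 in x, 4 in y)
print(min_dist(A, A))  # 0.0: a self pair always has MinDist 0
```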

  8. Existing Technique
     • T = ∞; closestpair = NULL.
     • Push the pair of root entries into priority queue Q.
     • While Q is not empty:
        • Pop the pair (e1, e2) with the smallest MinDist from Q.
        • If e1 points to an index node: for every child entry se1 in Node(e1) and child entry se2 in Node(e2), if MinDist(se1, se2) < T, push (se1, se2) into Q.
        • Else /* e1 points to a leaf node */: for every object o1 in Node(e1) and object o2 in Node(e2), if distance(o1, o2) < T, set T = distance(o1, o2) and closestpair = (o1, o2), and remove from Q all pairs whose MinDist is no smaller than T.
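The loop above might look roughly like this on a toy in-memory tree. Several things here are assumptions rather than part of the slides: the `Node` class, the balanced two-level tree, the coordinates, skipping mirrored and identical pairs when a node is joined with itself, and replacing the explicit removal from Q with a lazy check (stop as soon as the popped MinDist is no smaller than T, which is equivalent for a min-ordered queue).

```python
import heapq
from itertools import count
from math import dist, hypot, inf

class Node:
    """Toy R-tree node: either children (index node) or points (leaf)."""
    def __init__(self, children=None, points=None):
        self.children = children or []
        self.points = points or []
        pts = self.points or [c for ch in self.children for c in ch.corners()]
        xs, ys = [p[0] for p in pts], [p[1] for p in pts]
        self.mbr = (min(xs), min(ys), max(xs), max(ys))

    def corners(self):
        x0, y0, x1, y1 = self.mbr
        return [(x0, y0), (x1, y1)]

def min_dist(a, b):
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return hypot(dx, dy)

def closest_pair(root):
    T, best = inf, None
    tie = count()                        # tie-breaker: Nodes are not comparable
    Q = [(0.0, next(tie), root, root)]   # pair of root entries
    while Q:
        d, _, e1, e2 = heapq.heappop(Q)
        if d >= T:                       # lazy form of "remove pairs with MinDist >= T"
            break
        if e1.children:                  # index node: expand child pairs
            for i, s1 in enumerate(e1.children):
                for j, s2 in enumerate(e2.children):
                    if e1 is e2 and j < i:
                        continue         # (s2, s1) mirrors (s1, s2)
                    md = min_dist(s1.mbr, s2.mbr)
                    if md < T:
                        heapq.heappush(Q, (md, next(tie), s1, s2))
        else:                            # leaf node: compare objects
            for i, o1 in enumerate(e1.points):
                for j, o2 in enumerate(e2.points):
                    if e1 is e2 and j <= i:
                        continue         # skip (o, o) and mirrored pairs
                    d12 = dist(o1, o2)
                    if d12 < T:
                        T, best = d12, (o1, o2)
    return best, T

# Hypothetical coordinates for leaves A, B, C, D.
A = Node(points=[(0, 0), (1, 2)])
B = Node(points=[(5, 5), (7, 8)])
C = Node(points=[(9, 1), (5.5, 4.5), (8, 3)])
D = Node(points=[(0, 9), (2, 7)])
best, T = closest_pair(Node(children=[A, B, C, D]))
print(best, T)
```

Note that all four self pairs are popped before any cross pair, because their MinDist is 0.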

  9. Example • R-tree: root R with leaf nodes A, B, C, D holding objects {a, b}, {f, i}, {c, e, g}, {d, h}. • T = ∞; closestpair = NULL. • Pairs shown: (R,R) (A,A) (B,B) (C,C) (D,D) (A,C) (B,C) (A,B) (C,D) (A,D) (B,D)

  10. Example • R-tree: root R with leaf nodes A, B, C, D holding objects {a, b}, {f, i}, {c, e, g}, {d, h}. • T = distance(a, b); closestpair = (a, b). • Pairs shown: (R,R) (A,A) (B,B) (C,C) (D,D) (A,C) (B,C) (A,B) (C,D) (A,D) (B,D)

  11. Example • R-tree: root R with leaf nodes A, B, C, D holding objects {a, b}, {f, i}, {c, e, g}, {d, h}. • T = distance(f, e); closestpair = (f, e). • Pairs shown: (R,R) (A,A) (B,B) (C,C) (D,D) (A,C) (B,C) (A,B) (C,D) (A,D) (B,D)

  12. MinExistDist • Given two MBRs A, B of R-tree nodes, MinExistDist(A, B) is the minimum distance which guarantees that there exists a pair of objects, one in A and the other in B, whose distance is no larger than this metric. • ∃ objects o1 ∈ A and o2 ∈ B such that distance(o1, o2) ≤ MinExistDist(A, B). • Usage [CMT+00]: if MinExistDist(A, B) is smaller than T, update T. This increases the chance of eliminating pairs from Q early. [Figure: MBRs A and B with MinDist and MinExistDist marked.]
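The slides do not give a formula for MinExistDist. One way to obtain a bound with this guarantee, assuming 2-D MBRs of two different nodes where every edge of an MBR touches at least one object, is to take the minimum over all edge pairs of the maximum distance between the two edges. The function names below are hypothetical, and the sketch ignores the query range.

```python
from itertools import product
from math import dist

def edges(r):
    """Four edges of rectangle r = (xmin, ymin, xmax, ymax) as endpoint pairs."""
    x0, y0, x1, y1 = r
    return [((x0, y0), (x1, y0)), ((x0, y1), (x1, y1)),
            ((x0, y0), (x0, y1)), ((x1, y0), (x1, y1))]

def max_dist_edges(e, f):
    """Largest distance between two segments; it is attained at endpoints."""
    return max(dist(p, q) for p, q in product(e, f))

def min_exist_dist(a, b):
    """Upper bound on the closest object pair between two distinct MBRs.

    Each edge of a and of b touches some object, so for every edge pair
    there is an object pair no farther apart than max_dist_edges.
    """
    return min(max_dist_edges(e, f) for e, f in product(edges(a), edges(b)))

A = (0, 0, 2, 2)
B = (5, 0, 7, 2)
print(min_exist_dist(A, B))  # sqrt(13): right edge of A vs. left edge of B
```

Here MinDist(A, B) is 3, so the true closest distance between the two nodes lies between 3 and about 3.61.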

  13. MinExistDist Involving a Query Range • We extend the MinExistDist… [Figure labels: MinDist; MinExistDist = ∞.]

  14. Outline • Problem Definition • Straightforward Approach • Existing Technique • Our Method • Performance

  15. Motivation for Our Method • The existing technique joins all self-pairs, e.g. (A,A), (B,B), … • Reason: the MinDist of any self pair is 0. • Challenge: is it possible to make it non-zero? If MinDist(A,A) ≥ T, there is no need to process (A,A)! • We propose two ways to augment the R-tree with additional information. We call the augmented structures the Self-Range Closest-Pair Tree, or SRCP-tree for short.

  16. SRCP-tree (version 1) • Along with each index entry, store the closest pair of objects in the sub-tree. • Check the closest pair stored along with the root entry. If both objects are inside the query range R, return that pair. • For each self pair to be pushed into Q, use the distance of the local closest pair (rather than 0) as the MinDist. • If we encounter an index entry where both objects of the closest pair are inside R, compare their distance with T. This may decrease T.
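A rough sketch of the version 1 augmentation follows. It is an illustration under stated assumptions, not the authors' implementation: the `Node` class, the brute-force computation of the stored pair, keeping each sub-tree's objects in memory, and the coordinates are all hypothetical simplifications.

```python
from itertools import combinations
from math import dist, inf

class Node:
    """Toy node that stores the closest pair of its sub-tree (version 1)."""
    def __init__(self, children=None, points=None):
        self.children = children or []
        self.points = points or []
        # Kept only for brevity; a disk-based index would not store this list.
        self.objects = self.points + [o for c in self.children for o in c.objects]
        # Augmented information: closest pair among objects in the sub-tree.
        self.local_cp = min(combinations(self.objects, 2),
                            key=lambda pq: dist(*pq), default=None)
        self.local_cp_dist = dist(*self.local_cp) if self.local_cp else inf

def inside(p, rect):
    xmin, ymin, xmax, ymax = rect
    return xmin <= p[0] <= xmax and ymin <= p[1] <= ymax

def self_pair_key(node):
    """Queue key for (node, node): stored distance instead of 0."""
    return node.local_cp_dist

def root_shortcut(root, rect):
    """Return the stored root pair if both objects fall inside the range."""
    cp = root.local_cp
    if cp and inside(cp[0], rect) and inside(cp[1], rect):
        return cp
    return None

# Hypothetical coordinates for leaves A, B, C, D.
A = Node(points=[(0, 0), (1, 2)])
B = Node(points=[(5, 5), (7, 8)])
C = Node(points=[(9, 1), (5.5, 4.5), (8, 3)])
D = Node(points=[(0, 9), (2, 7)])
root = Node(children=[A, B, C, D])

print(root_shortcut(root, (4, 4, 8, 9)))  # stored pair lies inside the range
print(root_shortcut(root, (0, 0, 2, 3)))  # None: fall back to the queue search
print(self_pair_key(A))                   # non-zero key for self pair (A, A)
```

With a non-zero key, a self pair such as (A, A) can be discarded as soon as T drops to its stored distance or below.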

  17. Insertion • When a new object o is inserted, we only need to update the augmented information along the insertion path. (But we need to visit subtrees.) • At each such entry, let the original local closest pair be (a, b). It needs to be updated only if distance(o, o') < distance(a, b) for some object o' in the sub-tree. [Figure: new object o and the stored pair (a, b) with distance(a, b).]

  18. SRCP-tree (version 2) • Idea: while version 1 tries to avoid processing self pairs, version 2 of the structure tries to avoid processing sibling pairs. • E.g. if R has children A, B, C, D, version 1 cannot avoid pair (A,B) unless MinDist(A,B) ≥ T. Similarly, it has to process (A,C), (A,D), (B,C), (B,D), (C,D). • In version 2, every index entry e stores the “local-parent closest pair”: the closest pair between an object in the sub-tree pointed to by e and an object in the sub-tree pointed to by Parent(e). • E.g. along with A, we store the closest pair of objects (o1, o2), where o1 is in subtree(A) and o2 is in subtree(R). • Now, if the distance of the object pair stored at A is no smaller than T, there is no need to process any pair involving A, namely (A,A), (A,B), (A,C), (A,D).
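The pruning rule in the last bullet can be illustrated in a few lines. The function name `pairs_to_process`, the dictionary of stored distances and the numbers in it are hypothetical; the sketch only shows which child pairs survive for a given T.

```python
from itertools import combinations_with_replacement

def pairs_to_process(children, lp_dist, T):
    """Child pairs that still need processing under the version 2 rule.

    children: child entry names, in order.
    lp_dist:  name -> distance of the stored local-parent closest pair.
    A child whose stored distance is no smaller than T is dropped, which
    removes its self pair and every sibling pair involving it.
    """
    alive = [c for c in children if lp_dist[c] < T]
    return list(combinations_with_replacement(alive, 2))

children = ["A", "B", "C", "D"]
lp_dist = {"A": 2.2, "B": 0.7, "C": 0.7, "D": 2.8}  # made-up distances
print(pairs_to_process(children, lp_dist, T=1.0))
# [('B', 'B'), ('B', 'C'), ('C', 'C')]
```

With T = 1.0, entries A and D are skipped entirely, so 3 pairs remain out of the 10 that the plain queue-based search would consider.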

  19. Performance • Dell Pentium 4, 2.66GHz CPU • XXL library, Java • Both synthetic and real data: • uniform data (80,000 objects) • US National Mapping Information (26,700 Massachusetts sites), URL = http://mappings.usgs.gov/www/gnis/ • Focus on query time.

  20. Small Query Range

  21. Large Query Range

  22. Conclusions • We have addressed the spatial closest pair query with query range. • We have proposed two versions of an index structure called SRCP-tree. • Our approaches have much better query performance than the existing techniques, especially when the query range is large. • In particular, version 2 of the SRCP-tree is universally the best.
