
Processing Transitive Nearest-Neighbor Queries in Multi-Channel Access Environments

Xiao Zhang (1), Wang-Chien Lee (1), Prasenjit Mitra (1, 2), Baihua Zheng (3). (1) Department of Computer Science and Engineering, (2) College of Information Science and Technology, The Pennsylvania State University


Presentation Transcript


  1. Processing Transitive Nearest-Neighbor Queries in Multi-Channel Access Environments • Xiao Zhang (1), Wang-Chien Lee (1), Prasenjit Mitra (1, 2), Baihua Zheng (3) • (1) Department of Computer Science and Engineering, (2) College of Information Science and Technology, The Pennsylvania State University • (3) School of Information Systems, Singapore Management University • EDBT, Nantes, France, 03/28/2008

  2. Outline • Background • Problem Analysis • New TNN Algorithms • Optimization • Experiments • Conclusions & Future Work

  3. Background – TNN • What is TNN? • S is a set of banks • R is a set of restaurants • TNN distance = 5+1 = 6

  4. Background – TNN • What is TNN? • Given a query point p and two datasets S and R, TNN returns a pair of objects (s, r) ∈ S×R such that ∀(s', r') ∈ S×R, dis(p, s) + dis(s, r) ≤ dis(p, s') + dis(s', r'), where dis(a, b) is the Euclidean distance between a and b. • First proposed by Zheng, Lee and Lee [1]. • [1] B. Zheng, K. C. Lee and W.-C. Lee. Transitive nearest neighbor search in mobile environments. SUTC 2006
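
The definition above can be checked directly by enumerating S×R. The following is a minimal brute-force sketch, not the broadcast algorithm from the talk; the names `dis` and `tnn_brute_force` and the sample coordinates are assumptions made for illustration.

```python
import math
from itertools import product

def dis(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def tnn_brute_force(p, S, R):
    """Return (s, r, distance) minimising dis(p, s) + dis(s, r) over S x R."""
    s, r = min(product(S, R), key=lambda sr: dis(p, sr[0]) + dis(sr[0], sr[1]))
    return s, r, dis(p, s) + dis(s, r)

# Hypothetical data: two banks (S) and two restaurants (R).
banks = [(3.0, 4.0), (1.0, 0.0)]
restaurants = [(3.0, 5.0), (9.0, 0.0)]
print(tnn_brute_force((0.0, 0.0), banks, restaurants))
```

In this sample the answer pair uses the bank at (3, 4) with transitive distance 5 + 1 = 6, even though the bank at (1, 0) is closer to p. This is why a TNN query cannot be answered by chaining two plain NN queries.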

  5. Background - broadcast • The server holds all the data and broadcasts it as radio signals over channels. • Mobile clients (cell phones and PDAs) tune in to the broadcast channels, download the necessary data and process queries locally. • Broadcast vs. on-demand • Supports simultaneous access by an arbitrary number of mobile devices • Efficient use of limited bandwidth • Light workload on the server side

  6. Background - motivation • Assumption: • Zheng, Lee and Lee assumed a single broadcast channel. • Based on existing technology (dual-mode, dual-standby cell phone), we assume multiple channels. • A mobile client can access information in multiple channels simultaneously • Challenges: • How to utilize the parallel processing ability of mobile clients to facilitate query processing? • How to reduce access time? • How to reduce energy consumption?

  7. Our contributions: • 1. We developed two new algorithms for TNN queries in multi-channel access environments. • 2. We proposed two new distance metrics (MinTransDist and MinMaxTransDist) that let our new algorithms reduce search cost. • 3. We proposed an optimization technique to reduce energy consumption.

  8. Background – settings • 1. Two broadcast channels, one for S and one for R • 2. 2-dimensional points • 3. Air indexing: R-tree [2] • 4. Broadcast in depth-first order, to avoid back-tracking • 5. (1, m) interleaving [3] • 6. Performance metrics (in # of pages): • Access time • Tune-in time • [2] A. Guttman. R-trees: a dynamic index structure for spatial searching. In SIGMOD '84 • [3] T. Imielinski, S. Viswanathan, and B. Badrinath. Data on air: organization and access. TKDE 1997
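
As a rough illustration of setting 5, (1, m) interleaving broadcasts the complete index m times per cycle, once before each 1/m fraction of the data. The sketch below lays out one such cycle. The function name `one_m_cycle`, the page labels, and the requirement that m divide the number of data pages are assumptions made to keep the example short.

```python
def one_m_cycle(index_pages, data_pages, m):
    """Lay out one broadcast cycle: the full index precedes each 1/m of the data."""
    if m <= 0 or not data_pages or len(data_pages) % m:
        raise ValueError("this sketch assumes m > 0 divides a non-empty data file")
    chunk = len(data_pages) // m
    cycle = []
    for start in range(0, len(data_pages), chunk):
        cycle.extend(index_pages)                       # repeat the whole index
        cycle.extend(data_pages[start:start + chunk])   # then the next data segment
    return cycle

print(one_m_cycle(["i0", "i1"], ["d0", "d1", "d2", "d3"], m=2))
```

A larger m shortens the wait for the next index copy but lengthens the whole cycle, which is the trade-off between tune-in time and access time.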

  9. Problem Analysis • Choose ANY pair of objects (s', r') and use its transitive distance as the search range • The range is guaranteed to enclose the answer pair (s, r)

  10. Problem Analysis • Theorem [1]: • The transitive distance determined by any pair of objects (s, r) is an upper bound on the TNN distance. • General idea for answering TNN queries: • Estimate: find a search range around the query point p by searching the index • Filter: discard unqualified data objects in the search range determined earlier to find the pair of objects with the minimum transitive distance.
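
The estimate-then-filter idea can be sketched in memory as follows. This is only an illustration of the bound, not the index-based broadcast procedure; `estimate_then_filter`, `trans_dist` and the seed pair are assumed names and inputs. The filter relies on the triangle inequality: if (s, r) is the answer and d is the bound, then dis(p, s) ≤ d and dis(p, r) ≤ dis(p, s) + dis(s, r) ≤ d.

```python
import math
from itertools import product

def dis(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def trans_dist(p, s, r):
    """Transitive distance from p via s to r."""
    return dis(p, s) + dis(s, r)

def estimate_then_filter(p, S, R, seed_pair):
    # Estimate: the transitive distance of any pair is an upper bound.
    bound = trans_dist(p, *seed_pair)
    # Filter: objects farther than the bound from p cannot be in the answer.
    S_in = [s for s in S if dis(p, s) <= bound]
    R_in = [r for r in R if dis(p, r) <= bound]
    return min(product(S_in, R_in), key=lambda sr: trans_dist(p, *sr))

# Hypothetical data; the seed pair is deliberately not the best pair.
S = [(3.0, 4.0), (1.0, 0.0), (50.0, 50.0)]
R = [(3.0, 5.0), (9.0, 0.0), (-40.0, 0.0)]
print(estimate_then_filter((0.0, 0.0), S, R, seed_pair=(S[1], R[1])))
```

A tighter seed pair gives a smaller range and fewer candidates to examine, which is what the estimation step of each algorithm tries to achieve.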

  11. Problem Analysis • Deficiencies of existing algorithms: • Approximate-TNN-Search: • Uses an equation to estimate the search range in the first step • Search range may be too large or too small • Window-Based-TNN-Search: • Two sequential NN searches in estimation step • Search range estimation is done in sequential order • Large access time

  12. New TNN algorithms – algo1 • Algo 1: Double-NN-Search • Issue two NN queries in estimation step • p’s NN in S, and p’s NN in R • (s1, r2)

  13. New TNN Algorithms – algo2 • Hybrid-NN-Search • Increases interaction between two channels • Uses result of the finished NN to guide the unfinished NN in order to reduce search range • Uses new distance metrics to perform branch-and-bound • Treat TNN distance as a whole

  14. New TNN Algorithms – algo 2 • NN in Channel 1 finishes first • Already found s=p.NN(S) • Looking for r2, instead of r1

  15. New TNN Algorithms – algo 2 • NN in channel 2 finishes first • Already found r=p.NN(R) • Looking for s2 instead of s1 • Use new criteria when searching the index • Need new distance metrics for branch&bound

  16. New TNN Algorithms – algo 2 • MinTransDist: • Lower bound for trans. dist. from p to an MBR to r. • MinMaxTransDist: • Upper bound for trans. dist. from p to an MBR to r. • Details given in the paper.

  17. New TNN Algorithms - Hybrid • Algorithm description: • While neither NN search has finished, follow the Double-NN algorithm • If the NN search in Channel 1 (dataset S) finishes first, let s = p.NN(S), use s as the new query point and perform NN search on the remaining portion of the R-tree for dataset R. • If the NN search in Channel 2 (dataset R) finishes first, change the distance metrics: use MinTransDist and MinMaxTransDist to perform branch-and-bound, and find an s that minimizes the transitive distance.

  18. New TNN Algorithms - Hybrid • Updating and pruning strategy • Use a queue to keep potential MBRs, sorted by arrival time • Case 2 (s = p.NN(S) finishes first): • Switch the NN query point to s • Initial upper bound update: • If there is an intermediate result r', update the upper bound to dis(p, s) + dis(s, r') • Scan the queue of MBRs using the distance metrics of traditional NN queries.
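
Case 2 can be sketched as below, assuming MBRs are stored as (xmin, ymin, xmax, ymax) tuples and that only the classic MINDIST metric is used for pruning. The names `min_dist` and `case2_prune` are illustrative and not taken from the paper.

```python
import math

def dis(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def min_dist(q, mbr):
    """MINDIST: smallest distance from point q to an axis-aligned MBR."""
    xmin, ymin, xmax, ymax = mbr
    dx = max(xmin - q[0], 0.0, q[0] - xmax)
    dy = max(ymin - q[1], 0.0, q[1] - ymax)
    return math.hypot(dx, dy)

def case2_prune(p, s, r_candidate, mbr_queue):
    """Return the new upper bound and the queued MBRs that may still improve it."""
    upper = dis(p, s) + dis(s, r_candidate)
    # s is fixed, so an MBR of R helps only if it can contain an r closer to s.
    kept = [m for m in mbr_queue if dis(p, s) + min_dist(s, m) < upper]
    return upper, kept

queue = [(3.0, 4.5, 4.0, 5.5), (10.0, 10.0, 12.0, 12.0)]
print(case2_prune((0.0, 0.0), (3.0, 4.0), (3.0, 7.0), queue))
```

The list comprehension preserves queue order, so the surviving MBRs remain sorted by arrival time.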

  19. New TNN Algorithms - Hybrid • Updating and pruning strategy (cont.) • Case 3 (r = p.NN(R) finishes first): • If there is an intermediate result s', use dis(p, s') + dis(s', r) as the new upper bound • Then scan all the MBRs in the queue and use z = min_{M_i ∈ MBR_queue} MinMaxTransDist(p, M_i, r) to update the upper bound. • During traversal, use MinMaxTransDist to update the upper bound and MinTransDist for pruning

  20. New TNN Algorithms - Hybrid • Example for pruning:

  21. Optimization • Goal: reduce energy consumption • Analysis: • Previous algorithms minimize the search range in the Estimate Step by issuing “exact” search • Energy consumption in Filter Step is low • Energy consumption in Estimate Step is high • Approach: • use “approximate” search in Estimate Step to save energy in this step

  22. Optimization • Approximate Search: • Relax the pruning condition • Use ratio of overlapping area to estimate the probability • Compare the ratio with a threshold α
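
One possible reading of the relaxed pruning condition is sketched below. It assumes the "overlapping area" is the overlap between a node's MBR and the current search range, and it approximates the circular range by its bounding square so the area stays simple to compute. Both choices, and the names `overlap_ratio` and `should_visit`, are assumptions rather than the paper's exact rule.

```python
def overlap_ratio(mbr, center, radius):
    """Fraction of the MBR's area covered by the square window around center."""
    xmin, ymin, xmax, ymax = mbr
    w = min(xmax, center[0] + radius) - max(xmin, center[0] - radius)
    h = min(ymax, center[1] + radius) - max(ymin, center[1] - radius)
    if w < 0 or h < 0:
        return 0.0  # no overlap at all
    area = (xmax - xmin) * (ymax - ymin)
    if area == 0:
        return 1.0  # degenerate MBR touching the window: treat as fully covered
    return (w * h) / area

def should_visit(mbr, center, radius, alpha):
    """Visit a node only if its overlap ratio exceeds the threshold alpha."""
    return overlap_ratio(mbr, center, radius) > alpha

node = (0.0, 0.0, 10.0, 10.0)
print(overlap_ratio(node, (12.0, 5.0), 4.0))
print(should_visit(node, (12.0, 5.0), 4.0, alpha=0.0),
      should_visit(node, (12.0, 5.0), 4.0, alpha=0.2))
```

With alpha = 0 a node is skipped only when it does not overlap the window, which corresponds to exact search. Larger values of alpha skip nodes that are unlikely to contain a qualifying object, trading accuracy for a lower tune-in time.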

  23. Optimization • How to determine α? • Factors: • R-tree height and node depth • Use a small α at the root and a large α at the leaves • Difference in densities of the two datasets involved • Use a small α, or 0, on the dataset with the smaller density • (Figure: α ranges from 0, exact search, to 1, approximate search.)

  24. Performance Evaluation - settings • Dataset 1: • 39,000 × 39,000 square region • Densities: 10^-7.0, 10^-6.6, 10^-6.2, 10^-5.8, 10^-5.4, 10^-5.0, 10^-4.6, 10^-4.2 • # of points: 152, 382, 960, 2411, 6055, 15210, 38206, 95969 • Dataset 2: • 39,000 × 39,000 square region • # of points: 2,000 – 30,000 in increments of 2,000
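
The point counts of Dataset 1 follow from density × area, rounded to the nearest integer. The short check below recomputes them; the variable names are illustrative.

```python
side = 39_000                      # side length of the square region
exponents = [-7.0, -6.6, -6.2, -5.8, -5.4, -5.0, -4.6, -4.2]

# number of points = density * area, rounded to the nearest integer
counts = [round(side * side * 10 ** e) for e in exponents]
print(counts)
```

The result matches the listed counts, for example 39,000² × 10^-7.0 = 152.1, which rounds to 152.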

  25. Performance Evaluation - settings • R-tree as air index • Broadcast in depth-first order • STR packing algorithm [4] • (1, m) interleaving [3] • 1,000 query points generated for each of the experiments • [3] T. Imielinski, S. Viswanathan, and B. Badrinath. Data on air: organization and access. TKDE 1997 • [4] S. Leutenegger, M. Lopez and J. Edgington. STR: a simple and efficient algorithm for R-tree packing. ICDE 1997

  26. Performance Evaluation • Algorithms with exact search: • Access time: Double-NN and Hybrid-NN have the same access time, which is smaller than that of Window-Based • 1/40 ≤ size(S)/size(R) ≤ 1.8

  27. Performance Evaluation • Algorithms with exact search: • Tune-in time: when 0.01 ≤ size(S)/size(R) ≤ 0.4 Hybrid-NN gives the best tune-in time

  28. Performance Evaluation • ANN vs. eNN (approximate vs. exact NN search) • Improvement in tune-in time ranges from 11% to 20%

  29. Performance Evaluation • Hybrid algorithm with ANN:

  30. Conclusions • Double-NN and Hybrid-NN effectively reduce access time • Cases in which our algorithms reduce tune-in time are stated and discussed • The optimization technique effectively reduces the tune-in time of all three algorithms

  31. Future Work • Generalized TNN queries in broadcast environment: • More than 2 datasets are involved • Visiting order not specified • Complete route query • Using new distance metrics in disk based environment

  32. Thank you! • Any questions?

  33. New TNN Algorithms – distance metrics (backup slides) • Def 1: (MinTransDist) • Given two points p and r and an MBR M_S, MinTransDist(p, M_S, r) = dis(p, s) + dis(s, r), where s is a point on M_S such that for any point s' ∈ M_S with s' ≠ s, dis(p, s') + dis(s', r) ≥ MinTransDist(p, M_S, r)
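
One way to evaluate Def 1 is sketched below. It assumes the MBR is a filled axis-aligned rectangle (xmin, ymin, xmax, ymax) and uses the reflection principle on each side; the paper may compute the metric differently, and all function names here are illustrative. If p or r lies inside the MBR the minimum is simply dis(p, r). Otherwise the minimum over the rectangle is attained on its boundary, because dis(p, s) + dis(s, r) is convex in s.

```python
import math

def dis(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def _inside(q, mbr):
    xmin, ymin, xmax, ymax = mbr
    return xmin <= q[0] <= xmax and ymin <= q[1] <= ymax

def _best_on_side(p, r, axis, c, lo, hi):
    """Minimise dis(p, s) + dis(s, r) over the side where s[axis] == c."""
    other = 1 - axis
    a, b = abs(p[axis] - c), abs(r[axis] - c)
    # Reflection principle: optimum on the side's supporting line.
    t = p[other] if a + b == 0 else p[other] + (r[other] - p[other]) * a / (a + b)
    t = min(max(t, lo), hi)  # the sum is convex along the line, so clamping is optimal
    s = (t, c) if axis == 1 else (c, t)
    return dis(p, s) + dis(s, r)

def min_trans_dist(p, mbr, r):
    """Lower bound on dis(p, s) + dis(s, r) for any point s in the MBR."""
    if _inside(p, mbr) or _inside(r, mbr):
        return dis(p, r)
    xmin, ymin, xmax, ymax = mbr
    return min(
        _best_on_side(p, r, 1, ymin, xmin, xmax),  # bottom side
        _best_on_side(p, r, 1, ymax, xmin, xmax),  # top side
        _best_on_side(p, r, 0, xmin, ymin, ymax),  # left side
        _best_on_side(p, r, 0, xmax, ymin, ymax),  # right side
    )

print(min_trans_dist((0.0, 0.0), (4.0, 1.0, 6.0, 3.0), (10.0, 0.0)))
```

In the example the best detour touches the bottom side at (5, 1), giving 2·√26. When the segment from p to r crosses the MBR the function returns dis(p, r), so the bound never exceeds the true transitive distance through any object inside the MBR.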

  34. New TNN Algorithms – distance metrics (backup slides) • Def 2: (MaxDist) • Given two points p and r and a line segment ℓ, MaxDist(p, ℓ, r) = max_{i=1,2} { dis(p, v_i) + dis(v_i, r) }, where v_i (i = 1, 2) are the two end points of ℓ • MaxDist(p, ℓ, r) gives a tight upper bound on the transitive distance from p, via any point on ℓ, to r. • (Figure: points p and r and line segment ℓ.)

  35. New TNN Algorithms – distance metrics (backup slides) • Def 3: (MinMaxTransDist) • Given two points p and r and an MBR M_S, MinMaxTransDist(p, M_S, r) = min_{1≤i≤4} { MaxDist(p, ℓ_i, r) }, where ℓ_i (1 ≤ i ≤ 4) are the four sides of M_S • Lemma: • Given a starting point p, an ending point r, and an MBR M_S enclosing a point dataset S, ∃ s ∈ S such that dis(p, s) + dis(s, r) ≤ MinMaxTransDist(p, M_S, r)
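
Defs 2 and 3 translate almost directly into code. The sketch below assumes an MBR is stored as (xmin, ymin, xmax, ymax) and a segment as a pair of end points; the function names are illustrative. The lemma holds because every side of a minimum bounding rectangle touches at least one data point, and the transitive distance, being convex along a side, is largest at one of the side's end points.

```python
import math

def dis(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def max_dist(p, segment, r):
    """MaxDist: the larger transitive distance via the segment's two end points."""
    return max(dis(p, v) + dis(v, r) for v in segment)

def min_max_trans_dist(p, mbr, r):
    """MinMaxTransDist: the smallest MaxDist over the four sides of the MBR."""
    xmin, ymin, xmax, ymax = mbr
    corners = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    sides = [(corners[i], corners[(i + 1) % 4]) for i in range(4)]
    return min(max_dist(p, side, r) for side in sides)

print(min_max_trans_dist((0.0, 0.0), (4.0, 1.0, 6.0, 3.0), (10.0, 0.0)))
```

For this example the bottom side gives the smallest MaxDist, √17 + √37, so some object inside the MBR is guaranteed to yield a transitive distance no larger than that value.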
