Processing Transitive Nearest-Neighbor Queries in Multi-Channel Access Environments Xiao Zhang1, Wang-Chien Lee1, Prasenjit Mitra1, 2, Baihua Zheng3 1 Department of Computer Science and Engineering 2 College of Information Science and Technology The Pennsylvania State University 3 School of Information Systems, Singapore Management University EDBT, Nantes, France, 03/28/2008
Outline • Background • Problem Analysis • New TNN Algorithms • Optimization • Experiments • Conclusions & Future Work
Background – TNN • What is TNN? • S is a set of banks • R is a set of restaurants • TNN distance = 5+1 = 6
Background – TNN • What is TNN? • Given a query point p and two datasets S and R, TNN returns a pair of objects (s, r) ∈ S×R such that ∀(s’, r’) ∈ S×R, dis(p, s) + dis(s, r) ≤ dis(p, s’) + dis(s’, r’), where dis(a, b) is the Euclidean distance between a and b. • First proposed by Zheng, Lee and Lee [1]. [1] B. Zheng, K. C. Lee and W.-C. Lee. Transitive nearest neighbor search in mobile environments. SUTC 2006
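The definition above can be checked directly with an exhaustive search over S×R. The following is a minimal reference sketch, not one of the algorithms from the talk; the function name `tnn_brute_force` and the sample coordinates are illustrative assumptions, chosen so that the result matches the 5 + 1 = 6 example.

```python
from itertools import product
from math import dist  # Euclidean distance (Python 3.8+)


def tnn_brute_force(p, S, R):
    """Return (s, r, d) minimizing dis(p, s) + dis(s, r) over all pairs in S x R."""
    s, r = min(product(S, R), key=lambda sr: dist(p, sr[0]) + dist(sr[0], sr[1]))
    return s, r, dist(p, s) + dist(s, r)


# Hypothetical data: S could be banks, R restaurants.
p = (0.0, 0.0)
S = [(3.0, 4.0), (10.0, 0.0)]
R = [(3.0, 5.0), (0.0, 1.0)]
s, r, d = tnn_brute_force(p, S, R)
print(s, r, d)  # the pair ((3, 4), (3, 5)) with transitive distance 5 + 1 = 6
```

Note that the restaurant nearest to p, (0, 1), is not part of the answer: TNN minimizes the whole route, not each hop separately.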
Background - broadcast • The server holds all the data and broadcasts it as radio signals over channels. • Mobile clients (cell phones and PDAs) tune in to the broadcast channels, download the necessary data and process queries. • Broadcast vs. on-demand • Supports simultaneous access by an arbitrary number of mobile devices • Efficient use of limited bandwidth • Light workload on the server side
Background - motivation • Assumption: • Zheng, Lee and Lee assumed a single broadcast channel. • Based on existing technology (dual-mode, dual-standby cell phones), we assume multiple channels. • A mobile client can access information in multiple channels simultaneously. • Challenges: • How to utilize the parallel processing ability of mobile clients to facilitate query processing? • How to reduce access time? • How to reduce energy consumption?
Our contributions: • 1. We developed two new algorithms for TNN queries in multi-channel access environments. • 2. We proposed two new distance metrics (MinTransDist and MinMaxTransDist) that let the new algorithms reduce search cost. • 3. We proposed an optimization technique to reduce energy consumption.
Background – settings • 1. Two broadcast channels, one for S and one for R • 2. Two-dimensional points • 3. Air indexing: R-tree [2] • 4. Broadcast in depth-first order, in order to avoid backtracking • 5. (1, m) interleaving [3] • 6. Performance metrics (in number of pages): • Access time • Tune-in time [2] A. Guttman. R-trees: a dynamic index structure for spatial searching. In SIGMOD ’84 [3] T. Imielinski, S. Viswanathan, and B. Badrinath. Data on air: organization and access. TKDE 1997
Problem Analysis • Choose ANY pair of objects (s’, r’) and use its transitive distance as the search range around p • The range is guaranteed to enclose the answer pair (s, r)
Problem Analysis • Theorem [1]: • The transitive distance determined by any pair of objects (s, r) is an upper bound on the TNN distance. • General idea for answering TNN queries: • Estimate: find a search range around the query point p by searching the index • Filter: discard unqualified data objects in the search range determined earlier, then find the pair of objects with the minimum transitive distance.
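The estimate-then-filter idea can be sketched as follows. This is an in-memory illustration only: it ignores the R-tree index and the broadcast channels, and the name `tnn_estimate_filter` and the sample points are assumptions. The filter is safe because any pair with transitive distance at most the bound has both dis(p, s) and, by the triangle inequality, dis(p, r) at most the bound.

```python
from itertools import product
from math import dist


def tnn_estimate_filter(p, S, R, seed_pair):
    """Use any pair's transitive distance as a search radius, then filter."""
    s0, r0 = seed_pair
    bound = dist(p, s0) + dist(s0, r0)  # Estimate: upper bound on the TNN distance

    # Filter: objects outside the circle (p, bound) cannot belong to the answer.
    S_in = [s for s in S if dist(p, s) <= bound]
    R_in = [r for r in R if dist(p, r) <= bound]

    best = min(product(S_in, R_in),
               key=lambda sr: dist(p, sr[0]) + dist(sr[0], sr[1]))
    return best, S_in, R_in


# Hypothetical data and a deliberately poor seed pair.
p = (0.0, 0.0)
S = [(3.0, 4.0), (10.0, 0.0), (-25.0, 0.0)]
R = [(3.0, 5.0), (0.0, 1.0), (30.0, 30.0)]
best, S_in, R_in = tnn_estimate_filter(p, S, R, seed_pair=((10.0, 0.0), (0.0, 1.0)))
```

Even with a loose seed pair the answer is unchanged; a tighter estimate only shrinks the number of objects that survive the filter.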
Problem Analysis • Deficiencies of existing algorithms: • Approximate-TNN-Search: • Uses an equation to estimate the search range in the first step • Search range may be too large or too small • Window-Based-TNN-Search: • Two sequential NN searches in estimation step • Search range estimation is done in sequential order • Large access time
New TNN algorithms – algo1 • Algo 1: Double-NN-Search • Issue two NN queries in estimation step • p’s NN in S, and p’s NN in R • (s1, r2)
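The difference between the two estimation strategies can be illustrated in memory. This sketch assumes that the second search of Window-Based-TNN-Search looks for the nearest neighbor in R of the object found in S; the function names and data are hypothetical, and a linear scan stands in for the R-tree search.

```python
from math import dist


def nn(q, points):
    """Nearest neighbor of q by linear scan (stand-in for an R-tree NN search)."""
    return min(points, key=lambda x: dist(q, x))


def window_based_range(p, S, R):
    s = nn(p, S)  # first NN search
    r = nn(s, R)  # second search depends on s, so it must wait for the first
    return dist(p, s) + dist(s, r)


def double_nn_range(p, S, R):
    s = nn(p, S)  # the two searches share only p, so each can run
    r = nn(p, R)  # on its own channel at the same time
    return dist(p, s) + dist(s, r)


p = (0.0, 0.0)
S = [(3.0, 4.0), (10.0, 0.0)]
R = [(3.0, 5.0), (0.0, 1.0)]
w = window_based_range(p, S, R)
d = double_nn_range(p, S, R)
```

In this sketch the Double-NN range is never smaller than the window-based one for the same s, since the latter picks the object of R closest to s. The trade is a possibly larger search range in exchange for running both searches in parallel.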
New TNN Algorithms – algo2 • Hybrid-NN-Search • Increases interaction between two channels • Uses result of the finished NN to guide the unfinished NN in order to reduce search range • Uses new distance metrics to perform branch-and-bound • Treat TNN distance as a whole
New TNN Algorithms – algo 2 • NN in Channel 1 finishes first • Already found s=p.NN(S) • Looking for r2, instead of r1
New TNN Algorithms – algo 2 • NN in Channel 2 finishes first • Already found r=p.NN(R) • Looking for s2 instead of s1 • Use new criteria when searching the index • Need new distance metrics for branch-and-bound
New TNN Algorithms – algo 2 • MinTransDist: • Lower bound on the transitive distance from p, via a point in an MBR, to r. • MinMaxTransDist: • Upper bound on the transitive distance from p, via a point in an MBR, to r. • Details given in the paper.
New TNN Algorithms - Hybrid • Algorithm description: • Case 1: while neither NN search has finished, follow the Double-NN algorithm. • Case 2: if the NN search in Channel 1 (dataset S) finishes first, let s=p.NN(S), use s as the new query point and perform the NN search on the remaining portion of the R-tree for dataset R. • Case 3: if the NN search in Channel 2 (dataset R) finishes first, change the distance metrics: use MinTransDist and MinMaxTransDist to perform branch-and-bound, and find an s that minimizes the transitive distance.
New TNN Algorithms - Hybrid • Updating and pruning strategy • Use a queue to keep potential MBRs, sorted by their arrival time • Case 2 (s=p.NN(S) finishes first): • Switch the NN query point to s • Initial upper bound update • If there is an intermediate result r’, update the upper bound with dis(p, s)+dis(s, r’) • Scan the queue of MBRs and use the distance metrics of traditional NN queries.
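Case 2 can be sketched as a bound update followed by a scan of the queue. The sketch below assumes that MBRs are axis-aligned rectangles written as (xmin, ymin, xmax, ymax) and that the "traditional" metric is the usual point-to-rectangle minimum distance; `min_dist`, `case2_prune` and the sample queue are illustrative names and values.

```python
from math import dist


def min_dist(q, mbr):
    """Smallest Euclidean distance from point q to rectangle (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = mbr
    dx = max(xmin - q[0], 0.0, q[0] - xmax)
    dy = max(ymin - q[1], 0.0, q[1] - ymax)
    return (dx * dx + dy * dy) ** 0.5


def case2_prune(p, s, queue, r_candidate=None):
    """s = p.NN(S) is known; tighten the bound and drop hopeless MBRs of R."""
    upper = float("inf")
    if r_candidate is not None:
        # Intermediate result r' from the unfinished search in R.
        upper = dist(p, s) + dist(s, r_candidate)
    # Keep an MBR only if some r inside it could still beat the bound.
    kept = [m for m in queue if dist(p, s) + min_dist(s, m) <= upper]
    return upper, kept


p, s = (0.0, 0.0), (3.0, 4.0)
queue = [(2.0, 4.5, 4.0, 6.0), (20.0, 20.0, 25.0, 25.0), (0.0, 0.0, 1.0, 2.0)]  # arrival order
upper, kept = case2_prune(p, s, queue, r_candidate=(0.0, 1.0))
```

The list comprehension preserves arrival order, which matters on a broadcast channel because the client cannot revisit a page until the next cycle.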
New TNN Algorithms - Hybrid • Updating and pruning strategy (cont.) • Case 3 (r=p.NN(R) finishes first): • If there is an intermediate result s’, use dis(p, s’)+dis(s’, r) as the new upper bound • Then scan all the MBRs in the queue and use $z = \min_{M_i \in \mathrm{MBR\_queue}} \mathrm{MinMaxTransDist}(p, M_i, r)$ to update the upper bound. • During traversal, use MinMaxTransDist to update the upper bound and MinTransDist for pruning
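The Case 3 update can be sketched with the two metrics treated as already computed per MBR. The tuple layout, the function name `case3_update_and_prune` and the numbers below are illustrative assumptions, not values from the experiments.

```python
def case3_update_and_prune(upper, queue):
    """queue: list of (label, min_trans_dist, min_max_trans_dist) in arrival order."""
    # z: the smallest upper bound that any queued MBR can guarantee.
    z = min((hi for _, _, hi in queue), default=float("inf"))
    upper = min(upper, z)
    # An MBR whose lower bound exceeds the upper bound cannot hold the answer.
    kept = [label for label, lo, _ in queue if lo <= upper]
    return upper, kept


# Upper bound 12.0 from a hypothetical intermediate result s'.
queue = [("M1", 7.0, 9.5), ("M2", 10.0, 14.0), ("M3", 6.5, 11.0)]
upper, kept = case3_update_and_prune(12.0, queue)
print(upper, kept)  # 9.5 ['M1', 'M3']
```

Here M1 tightens the bound to 9.5, which is enough to discard M2 without downloading it.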
New TNN Algorithms - Hybrid • Example for pruning:
Optimization • Goal: reduce energy consumption • Analysis: • Previous algorithms minimize the search range in the Estimate Step by issuing “exact” search • Energy consumption in Filter Step is low • Energy consumption in Estimate Step is high • Approach: • use “approximate” search in Estimate Step to save energy in this step
Optimization • Approximate Search: • Relax the pruning condition • Use ratio of overlapping area to estimate the probability • Compare the ratio with a threshold α
Optimization • How to determine α? • Factors: • R-tree height and node depth • Use a small α at the root and a large α at the leaves • Difference in densities of the two datasets involved • Use a small α (or α = 0) on the dataset with the smaller density • Range of α: α = 0 is exact search; larger α, up to 1, makes the search more approximate
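A relaxed pruning test in this spirit might look like the sketch below. The slides do not say which two areas the ratio compares, so several things here are assumptions: the ratio is taken as the overlapping area divided by the area of the MBR, the circular search range is replaced by its bounding square to keep the geometry simple, and the linear depth schedule `alpha_at` is purely hypothetical.

```python
def overlap_ratio(mbr, center, radius):
    """Share of the MBR's area covered by the bounding square of the search circle.

    Returns None when the two are disjoint. Both the bounding-square
    simplification and the choice of denominator are assumptions.
    """
    xmin, ymin, xmax, ymax = mbr
    w = min(xmax, center[0] + radius) - max(xmin, center[0] - radius)
    h = min(ymax, center[1] + radius) - max(ymin, center[1] - radius)
    if w < 0 or h < 0:
        return None
    area = (xmax - xmin) * (ymax - ymin)
    return (w * h) / area if area > 0 else 1.0  # degenerate MBR inside the range


def should_visit(mbr, center, radius, alpha):
    """alpha = 0 visits every MBR that meets the range, as exact search would;
    a larger alpha also skips MBRs that overlap the range only slightly."""
    ratio = overlap_ratio(mbr, center, radius)
    return ratio is not None and ratio >= alpha


def alpha_at(depth, height, alpha_leaf):
    """Hypothetical schedule: 0 at the root (depth 0), alpha_leaf at depth == height."""
    return alpha_leaf * depth / height


mbr = (0.0, 0.0, 10.0, 10.0)
print(should_visit(mbr, (12.0, 5.0), 4.0, alpha=0.0))  # True: the range meets the MBR
print(should_visit(mbr, (12.0, 5.0), 4.0, alpha=0.3))  # False: only 16% of the MBR overlaps
```

Skipping a slightly overlapping MBR saves tune-in time but may leave the estimated range looser than the exact one; the Filter step then works with that looser range.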
Performance Evaluation - settings • Dataset 1: • 39,000 × 39,000 square region • Densities (points per unit area): $10^{-7.0}$, $10^{-6.6}$, $10^{-6.2}$, $10^{-5.8}$, $10^{-5.4}$, $10^{-5.0}$, $10^{-4.6}$, $10^{-4.2}$ • # of points: 152, 382, 960, 2411, 6055, 15210, 38206, 95969 • Dataset 2: • 39,000 × 39,000 square region • # of points: 2,000 to 30,000 in increments of 2,000
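The point counts of Dataset 1 follow from the densities and the region size: each count is the area times the density, rounded to the nearest integer. A quick check (variable names are illustrative):

```python
side = 39_000
exponents = [-7.0, -6.6, -6.2, -5.8, -5.4, -5.0, -4.6, -4.2]

# Number of points = area * density, rounded to the nearest integer.
counts = [round(side * side * 10 ** e) for e in exponents]
print(counts)  # [152, 382, 960, 2411, 6055, 15210, 38206, 95969]
```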
Performance Evaluation - settings • R-tree as air index • Broadcast in depth-first order • STR packing algorithm [4] • (1, m) interleaving [3] • 1,000 query points generated for each of the experiments [3] T. Imielinski, S. Viswanathan, and B. Badrinath. Data on air: organization and access. TKDE 1997 [4] S. Leutenegger, M. Lopez and J. Edgington. STR: a simple and efficient algorithm for R-tree packing. ICDE 1997
Performance Evaluation • Algorithms with exact search: • Access time: Double-NN and Hybrid-NN have the same access time, which is smaller than that of Window-Based • Range tested: 1/40 ≤ size(S)/size(R) ≤ 1.8
Performance Evaluation • Algorithms with exact search: • Tune-in time: when 0.01 ≤ size(S)/size(R) ≤ 0.4, Hybrid-NN gives the best tune-in time
Performance Evaluation • Approximate NN search (ANN) vs. exact NN search (eNN) • Improvement in tune-in time ranges from 11% to 20%
Performance Evaluation • Hybrid algorithm with ANN:
Conclusions • Double-NN and Hybrid-NN effectively reduce access time • Cases in which our algorithms reduce tune-in time are stated and discussed • The optimization technique effectively reduces the tune-in time of all three algorithms
Future Work • Generalized TNN queries in broadcast environments: • More than 2 datasets are involved • Visiting order not specified • Complete route query • Using the new distance metrics in disk-based environments
Thank you! • Any questions?
New TNN Algorithms – distance metrics (backup slides) • Def 1: (MinTransDist) • Given two points p and r, and an MBR $M_S$, MinTransDist(p, $M_S$, r) = dis(p, s) + dis(s, r), where s is a point on $M_S$ such that for any other point s’ ∈ $M_S$, dis(p, s’) + dis(s’, r) ≥ MinTransDist(p, $M_S$, r)
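Def 1 can be evaluated numerically without a closed form. The sketch below is not the computation from the paper; it relies on the fact that dis(p, s) + dis(s, r) is convex in s. If p or r lies inside the MBR the minimum is simply dis(p, r); otherwise the minimum is attained on the boundary and a ternary search along each of the four sides finds it. MBRs are assumed to be tuples (xmin, ymin, xmax, ymax), and all function names are illustrative.

```python
from math import dist


def inside(q, mbr):
    xmin, ymin, xmax, ymax = mbr
    return xmin <= q[0] <= xmax and ymin <= q[1] <= ymax


def min_on_segment(p, r, a, b, iters=100):
    """Ternary search for min of dis(p, s) + dis(s, r), s on segment ab (convex in t)."""
    def f(t):
        s = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        return dist(p, s) + dist(s, r)

    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) <= f(m2):
            hi = m2  # the minimum lies in [lo, m2]
        else:
            lo = m1  # the minimum lies in [m1, hi]
    return f((lo + hi) / 2)


def min_trans_dist(p, mbr, r):
    """Numeric lower bound on dis(p, s) + dis(s, r) over all points s of the MBR."""
    if inside(p, mbr) or inside(r, mbr):
        return dist(p, r)  # s = p or s = r reaches the triangle-inequality bound
    xmin, ymin, xmax, ymax = mbr
    c = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    return min(min_on_segment(p, r, c[i], c[(i + 1) % 4]) for i in range(4))


lower = min_trans_dist((0.0, 0.0), (1.0, 1.0, 3.0, 2.0), (4.0, 0.0))
```

For p = (0, 0), r = (4, 0) and the MBR [1, 3] × [1, 2], the best detour touches the bottom side at (2, 1), giving 2·√5 ≈ 4.472.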
New TNN Algorithms – distance metrics (backup slides) • Def 2: (MaxDist) • Given two points p and r, and a line segment ℓ, $\mathrm{MaxDist}(p, \ell, r) = \max_{i \in \{1, 2\}} \{ \mathrm{dis}(p, v_i) + \mathrm{dis}(v_i, r) \}$, where $v_1$ and $v_2$ are the two endpoints of ℓ • MaxDist(p, ℓ, r) gives a tight upper bound on the transitive distance from p, via any point on ℓ, to r.
New TNN Algorithms – distance metrics (backup slides) • Def 3: (MinMaxTransDist) • Given two points p and r, and an MBR $M_S$, $\mathrm{MinMaxTransDist}(p, M_S, r) = \min_{1 \le i \le 4} \{ \mathrm{MaxDist}(p, \ell_i, r) \}$, where $\ell_i$ (1 ≤ i ≤ 4) are the four sides of $M_S$ • Lemma: • Given a starting point p, an ending point r, and an MBR $M_S$ enclosing a point dataset S, ∃s ∈ S such that dis(p, s) + dis(s, r) ≤ MinMaxTransDist(p, $M_S$, r)
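Defs 2 and 3 translate almost line by line into code. The sketch below assumes MBRs are tuples (xmin, ymin, xmax, ymax); the function names and the sample dataset are illustrative. The last lines check the lemma on a small dataset whose MBR is exactly the rectangle used.

```python
from math import dist


def trans_dist(p, s, r):
    return dist(p, s) + dist(s, r)


def max_dist(p, segment, r):
    """Def 2: the larger transitive distance over the two endpoints of the segment."""
    v1, v2 = segment
    return max(trans_dist(p, v1, r), trans_dist(p, v2, r))


def min_max_trans_dist(p, mbr, r):
    """Def 3: the smallest MaxDist over the four sides of the MBR."""
    xmin, ymin, xmax, ymax = mbr
    c = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    sides = [(c[i], c[(i + 1) % 4]) for i in range(4)]
    return min(max_dist(p, side, r) for side in sides)


p, r = (0.0, 0.0), (4.0, 0.0)
S = [(1.0, 1.5), (2.0, 1.0), (3.0, 2.0), (2.5, 2.0)]  # hypothetical points, MBR = [1, 3] x [1, 2]
mbr = (1.0, 1.0, 3.0, 2.0)
bound = min_max_trans_dist(p, mbr, r)
best_in_S = min(trans_dist(p, s, r) for s in S)
```

The bound holds because every side of a minimum bounding rectangle touches at least one data point, and the transitive distance through any point of a side is at most that side's MaxDist. Here the bottom side gives the bound √2 + √10 ≈ 4.576, and the point (2, 1) on that side achieves 2·√5 ≈ 4.472.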