Reverse Spatial and Textual k Nearest Neighbor Search Athens, Greece, 2011
Outline
• Motivation & Problem Statement
• Related Work
• RSTkNN Search Strategy
• Experiments
• Conclusion
Motivation
[Figure: map of shops around a new location Q, labeled clothes, food, and sports]
• If we add a new shop at Q, which existing shops will be influenced?
• Influence factors:
  • Spatial distance → results: D, F
  • Textual similarity (services/products offered) → results: F, C
The Problem of Finding Influential Sets
• Traditional query: the Reverse k Nearest Neighbor (RkNN) query.
• Our new query: the Reverse Spatial and Textual k Nearest Neighbor (RSTkNN) query.
Problem Statement
• Spatial-textual similarity: describes the similarity between two objects based on both spatial proximity and textual similarity.
• Spatial-textual similarity function: combines the two components under a weight α (a sketch follows below).
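The slides do not reproduce the function itself, so the following is a minimal sketch, assuming the usual form: an α-weighted sum of normalized spatial proximity and extended-Jaccard textual similarity. All names (spatial_textual_sim, max_dist) are illustrative, not the paper's.

```python
import math

def spatial_textual_sim(p1, p2, v1, v2, alpha, max_dist):
    """Assumed alpha-weighted spatial-textual similarity (illustrative).

    p1, p2   -- (x, y) coordinates of the two objects
    v1, v2   -- term-weight dicts {term: weight} of their texts
    alpha    -- query parameter trading off space vs. text
    max_dist -- normalization constant, e.g. the data-space diameter
    """
    dist = math.hypot(p1[0] - p2[0], p1[1] - p2[1])
    spatial_sim = 1.0 - dist / max_dist        # proximity, in [0, 1]

    # Extended Jaccard similarity on the term vectors.
    dot = sum(w * v2.get(t, 0.0) for t, w in v1.items())
    n1 = sum(w * w for w in v1.values())
    n2 = sum(w * w for w in v2.values())
    denom = n1 + n2 - dot
    textual_sim = dot / denom if denom > 0 else 0.0

    return alpha * spatial_sim + (1.0 - alpha) * textual_sim
```

With α near 1 the query degenerates toward a pure RkNN; with α near 0 it ranks on text alone, which is one reason k and α varying per query makes precomputation hard.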
Problem Statement (cont'd)
• RSTkNN query: finds the objects that have the query object as one of their k most spatial-textually similar objects (a brute-force version is sketched below).
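For concreteness, a brute-force reading of the definition (quadratic, for illustration only); `sim` is any spatial-textual similarity function such as the sketch above.

```python
def rstknn_bruteforce(objects, q, k, sim):
    """Return every object o whose k most spatial-textually similar
    objects (o itself excluded, q included as a candidate) contain q."""
    results = []
    for o in objects:
        others = [x for x in objects if x is not o]
        ranked = sorted(others + [q], key=lambda x: sim(o, x), reverse=True)
        if q in ranked[:k]:
            results.append(o)
    return results
```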
Outline
• Motivation & Problem Statement
• Related Work
• RSTkNN Search Strategy
• Experiments
• Conclusion
Related Work
• Pre-computing the kNN of each object (Korn et al., SIGMOD 2000; Yang et al., ICDE 2001)
• (Hyper) Voronoi cell/plane pruning strategies (Tao et al., VLDB 2004; Wu et al., PVLDB 2008; Kriegel et al., ICDE 2009)
• 60-degree-pruning method (Stanoi et al., SIGMOD 2000)
• Branch and bound, based on Lp-norm metric spaces (Achtert et al., SIGMOD 2006; Achtert et al., EDBT 2009)
Why these do not carry over to RSTkNN:
• Euclidean geometric properties are lost.
• The text space is high-dimensional.
• k and α differ from query to query.
Baseline Method
• For each object o in the database, precompute its spatial NNs and its textual NNs as two ranked lists.
• Given a query q with parameters k and α, merge the two lists with the Threshold Algorithm to obtain o's spatial-textual kNNs; o is a result iff q is more similar to o than o's k-th NN o'.
• Inefficient, since it lacks a data structure tailored to the combined query (a sketch follows below).
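To illustrate the baseline, here is a Threshold-Algorithm-style merge of the two precomputed ranked lists; the list layout and all names are assumptions, not the paper's code.

```python
import heapq

def spatial_textual_knn_ta(o, spatial_list, textual_list, k, alpha, sim):
    """Merge o's spatial NN list and textual NN list into its
    spatial-textual kNNs, Threshold-Algorithm style (simplified).

    spatial_list / textual_list -- [(object, component_score), ...],
    sorted by descending score in comparable, normalized units;
    sim(o, x) -- the combined alpha-weighted score.
    """
    seen, topk, tie = set(), [], 0      # topk: min-heap of (score, tie, obj)
    for (s_obj, s_score), (t_obj, t_score) in zip(spatial_list, textual_list):
        for x in (s_obj, t_obj):
            if id(x) not in seen:
                seen.add(id(x))
                heapq.heappush(topk, (sim(o, x), tie, x))  # random access
                tie += 1
                if len(topk) > k:
                    heapq.heappop(topk)  # keep only the k best seen so far
        # No unseen object further down can score above this threshold.
        threshold = alpha * s_score + (1.0 - alpha) * t_score
        if len(topk) == k and topk[0][0] >= threshold:
            break
    return [x for _, _, x in sorted(topk, reverse=True)]
```

The RSTkNN answer then contains o iff sim(o, q) beats the k-th score returned; the expense comes from running this merge for every object in the database.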
Outline
• Motivation & Problem Statement
• Related Work
• RSTkNN Search Strategy
• Experiments
• Conclusion
Main Idea of the Search Strategy
• Prune an entry E of the IUR-tree when (query q positioned like q3 in the slide figure): q is no more similar than kNNL(E), i.e., there are at least k objects more similar to E's objects than q.
• Report E as a result when (q positioned like q1): q is more similar than kNNU(E), i.e., there are at most k-1 objects more similar than q.
• Otherwise (q positioned like q2), expand E and decide for each child entry whether it belongs to the results (decision rule sketched below).
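The three-way rule condensed into code: `max_st`/`min_st` bound SimST(q, o) over the objects o under E, and `knn_l`/`knn_u` stand for kNNL(E)/kNNU(E). Boundary (tie) handling follows the slide's wording and may differ in the paper.

```python
def classify_entry(max_st, min_st, knn_l, knn_u):
    """Three-way decision for an IUR-tree entry E (illustrative)."""
    if max_st <= knn_l:
        return "PRUNE"   # >= k objects are more similar to E's objects than q
    if min_st > knn_u:
        return "RESULT"  # <= k-1 objects are more similar than q
    return "EXPAND"      # undecided: descend into E's child entries
```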
How to Compute the Bounds?
Similarity approximations between two entries E and E':
• MinST(E, E'): a lower bound on the similarity between any object in E and any object in E'.
• TightMinST(E, E'): a tighter lower bound, guaranteed to be attained by at least one object pair.
• MaxST(E, E'): an upper bound on the similarity between any object in E and any object in E'.
(The spatial ingredient of these bounds is sketched below.)
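The slide's formulas are not reproduced here. Their spatial ingredient is the minimum and maximum Euclidean distance between two MBRs, sketched below; the full MinST/TightMinST/MaxST in the paper also fold in textual bounds, which this sketch omits.

```python
import math

def mbr_min_max_dist(a, b):
    """Min and max Euclidean distance between two MBRs, each given as
    (xmin, ymin, xmax, ymax). The min distance feeds the upper similarity
    bound (MaxST) and the max distance the lower one (MinST)."""
    dmin2 = dmax2 = 0.0
    for lo1, hi1, lo2, hi2 in ((a[0], a[2], b[0], b[2]),
                               (a[1], a[3], b[1], b[3])):
        gap = max(lo2 - hi1, lo1 - hi2, 0.0)   # 0 if the intervals overlap
        far = max(hi2 - lo1, hi1 - lo2)        # farthest corner separation
        dmin2 += gap * gap
        dmax2 += far * far
    return math.sqrt(dmin2), math.sqrt(dmax2)
```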
Example of Computing the Bounds
Entries traveled so far: N1, N2, N3. Given k=2, compute kNNL(N1) and kNNU(N1) from N1's sibling entries N2 and N3.
• For kNNL(N1), in decreasing order: TightMinST(N1, N3) = 0.564, TightMinST(N1, N2) = 0.179; MinST(N1, N3) = 0.370, MinST(N1, N2) = 0.095 → kNNL(N1) = 0.370.
• For kNNU(N1), in decreasing order: MaxST(N1, N3) = 0.432, MaxST(N1, N2) = 0.150 → kNNU(N1) = 0.432.
Overview of the Search Algorithm
• RSTkNN algorithm:
  • Traverse the IUR-tree from the root.
  • Progressively update the lower and upper bounds.
  • Apply the search strategy: prune unrelated entries into Pruned; report result entries into Ans; add candidate objects to Cnd.
• Final verification:
  • For each object in Cnd, decide whether it is a result by refining its bounds, expanding entries from Pruned as needed.
(A skeleton of the main loop follows below.)
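A skeleton of the traversal, for orientation only: the entry interface (knn_l, knn_u, children, is_leaf, objects) and sim_bounds are hypothetical stand-ins, and both the mutual-effect bound updates and the final verification phase are elided.

```python
import heapq

def rstknn_search(root, q, sim_bounds):
    """Best-first RSTkNN traversal skeleton (simplified).

    sim_bounds(e, q) -> (min_st, max_st); each entry e carries
    progressively refined bounds e.knn_l (kNNL) and e.knn_u (kNNU).
    """
    answers, candidates, pruned = [], [], []
    tie = 0
    heap = [(-sim_bounds(root, q)[1], tie, root)]  # most promising first
    while heap:
        _, _, e = heapq.heappop(heap)
        min_st, max_st = sim_bounds(e, q)
        if max_st <= e.knn_l:                  # prune rule
            pruned.append(e)
        elif min_st > e.knn_u:                 # report rule
            answers.extend(e.objects())
        elif e.is_leaf():
            candidates.extend(e.objects())     # settled in final verification
        else:
            for child in e.children:           # expand rule
                tie += 1
                heapq.heappush(heap, (-sim_bounds(child, q)[1], tie, child))
    return answers, candidates, pruned
```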
Example: Executing the RSTkNN Algorithm on the IUR-tree (k=2, α=0.6)
[Figure: IUR-tree with root N4 over nodes N1, N2, N3 and objects p1-p5]
• Step 1: Initialize N4.CLs; EnQueue(U, N4); priority = MaxST(N4, q).
  U: N4 (0, 0)
• Step 2: DeQueue(U, N4); EnQueue(U, N2); EnQueue(U, N3); Pruned.add(N1). The sibling entries N1, N2, N3 mutually affect each other's bounds.
  U: N3 (0.323, 0.619), N2 (0.21, 0.619); Pruned: N1 (0.37, 0.432)
• Step 3: DeQueue(U, N3); Answer.add(p4); Candidate.add(p5). Mutual effects update the bounds of p4, p5, and N2.
  U: N2 (0.21, 0.619); Answer: p4 (0.21, 0.619); Candidate: p5 (0.374, 0.374); Pruned: N1 (0.37, 0.432)
• Step 4: DeQueue(U, N2); Answer.add(p2, p3); Pruned.add(p5). Since U and Candidate are now empty, the algorithm terminates.
  Results: p2, p3, p4.
Cluster IUR-tree: CIUR-tree
• IUR-tree: the texts within an index node can be very different.
• CIUR-tree: an enhanced IUR-tree that incorporates textual clusters.
Optimizations
• Motivation:
  • Give tighter bounds for entries to improve search performance.
  • Purify the textual description in index nodes.
• Outlier Detection and Extraction (ODE-CIUR):
  • Extract subtrees with outlier clusters.
  • Treat the outliers separately and compute their bounds individually.
• Text-entropy based optimization (TE-CIUR):
  • Define TextEntropy to describe the distribution of text clusters in a CIUR-tree entry.
  • Visit entries with higher TextEntropy first, i.e., those more diverse in text (an entropy sketch follows below).
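For illustration, TextEntropy can be read as the Shannon entropy of the cluster distribution under an entry; a minimal sketch assuming that reading (the paper's exact definition may normalize differently).

```python
import math
from collections import Counter

def text_entropy(cluster_ids):
    """Shannon entropy of the textual-cluster distribution in an entry;
    higher values mean more diverse text, so under TE-CIUR such entries
    are visited earlier."""
    counts = Counter(cluster_ids)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# A mixed entry is "hotter" than a textually pure one:
print(text_entropy(["food", "food", "clothes", "sports"]))  # 1.5
print(text_entropy(["food", "food", "food", "food"]))       # -0.0
```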
Experimental Study
• Setup: Windows XP; 2.0 GHz CPU; 4 GB memory; page size 4 KB; implemented in C/C++.
• Compared methods: baseline, IUR-tree, ODE-CIUR, TE-CIUR, and ODE-TE.
• Datasets:
  • ShopBranches (Shop): extended from a small real dataset.
  • GeographicNames (GN): real data.
  • CaliforniaDBpedia (CD): generated by combining locations in California with documents from DBpedia.
• Metrics: total query time and number of page accesses.
Scalability
[Figure: total query time vs. dataset size; (a) log-scale version, (b) linear-scale version]
Effect of k
[Figure: (a) query time and (b) page accesses as k varies]
Conclusion
• Propose a new query type, RSTkNN.
• Present a hybrid index, the IUR-tree.
• Present an efficient search algorithm to answer RSTkNN queries.
• Introduce the enhanced variant CIUR-tree and two optimizations, ODE-CIUR and TE-CIUR, to further speed up search.
• Extensive experiments confirm the efficiency and scalability of the algorithms.
Reverse Spatial and Textual k Nearest Neighbor Search Thanks! Q & A
A Straightforward Method
• Compute the RSkNN and the RTkNN separately;
• Combine the two result sets to obtain the RSTkNN results.
Infeasible: there is no sensible way to combine the two result sets.