Nearest Neighbor Queries using R-trees. Based on notes by Yufei Tao.
Nearest Neighbor Search • Find the object nearest to a query point q. • E.g., find the gas station nearest to the red point. • k nearest neighbors (kNN): find the k objects nearest to q. • E.g., 1NN = {h}, 2NN = {h, a}, 3NN = {h, a, i}. CS4482 CityU of HK
Nearest Neighbor Processing • The R-tree can accelerate NN search, too. • Key concept: mindist(q, E), the minimum distance between a point q and a rectangle E (0 if q lies inside E).
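As a minimal sketch of mindist, assuming 2-D points stored as (x, y) tuples and rectangles stored as (xmin, ymin, xmax, ymax) tuples (both layouts are assumptions for illustration, not part of the notes):

```python
import math

def mindist(q, rect):
    """Minimum Euclidean distance from point q to an axis-aligned rectangle.

    q is (x, y); rect is (xmin, ymin, xmax, ymax). Both layouts are assumed.
    """
    qx, qy = q
    xmin, ymin, xmax, ymax = rect
    # Per-axis gap between q and the rectangle; 0 when q is within the extent.
    dx = max(xmin - qx, 0.0, qx - xmax)
    dy = max(ymin - qy, 0.0, qy - ymax)
    return math.hypot(dx, dy)

outside = mindist((0, 0), (1, 2, 4, 5))  # nearest corner is (1, 2)
inside = mindist((2, 3), (1, 2, 4, 5))   # q lies inside the rectangle
```

No point stored under E can be closer to q than mindist(q, E), which is what makes it usable as a pruning bound.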
Depth-first NN Algorithm • First load the root and compute the mindist from each entry to the query. • Visit the child node of the entry with the smallest mindist; in this case, E6.
Depth-first NN Algorithm (cont.) • Do this recursively at the next level: in the child node of E6, compute the mindist from every entry to the query. • Visit the child node of the entry having the smallest mindist. • In this case, E1 and E2 have the same mindist, so the choice is arbitrary; say, E1 first. • Among all the points in the child node of E1, find the closest point a (our current result).
Depth-first NN Algorithm (cont.) • Then backtrack to the child node of E6, where the entry with the next mindist value is E2. • Its mindist √5 is, however, the same as the distance from q to a. • So no point in E2 can possibly be closer to q than a, and we prune it. • E3 is pruned by the same reasoning.
Depth-first NN Algorithm (cont.) • We now backtrack to the root, where the entry with the next mindist is E7. • Its mindist √2 is smaller than the distance √5 from q to a. • Thus its subtree may contain a point closer to q than a, so we have to visit it. • At the child node of E7, compute the mindist of all entries to q. • E4 is descended next.
Depth-first NN Algorithm (cont.) • In the child node of E4, we find a point h that is closer to q than a. • So h becomes our new nearest neighbor. • We backtrack to the child node of E7, where the entry with the next mindist is E5. • E5’s mindist √13 is larger than the distance √2 from q to h, so we prune its subtree. • The algorithm backtracks to the root and terminates. • Visited (in this order): the root, then the child nodes of E6, E1, E7, E4.
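The depth-first walkthrough above can be sketched roughly as follows. The `Node` class, the tuple layouts, and the sample coordinates are hypothetical; they are not the tree from the slides' figure.

```python
import math

def mindist(q, rect):
    """Minimum distance from q=(x, y) to rect=(xmin, ymin, xmax, ymax)."""
    dx = max(rect[0] - q[0], 0.0, q[0] - rect[2])
    dy = max(rect[1] - q[1], 0.0, q[1] - rect[3])
    return math.hypot(dx, dy)

class Node:
    """Hypothetical R-tree node: a leaf holds points, an inner node holds children."""
    def __init__(self, rect, children=None, points=None):
        self.rect = rect
        self.children = children or []
        self.points = points or []

def df_nn(node, q, best=(math.inf, None)):
    """Depth-first NN search; returns (distance, point) of the nearest point."""
    if not node.children:  # leaf: compare every point with the current result
        for p in node.points:
            d = math.dist(q, p)
            if d < best[0]:
                best = (d, p)
        return best
    # Visit children in ascending mindist order, as in the walkthrough.
    for child in sorted(node.children, key=lambda c: mindist(q, c.rect)):
        if mindist(q, child.rect) >= best[0]:
            break  # this child and all later ones cannot hold a closer point
        best = df_nn(child, q, best)
    return best

# Made-up data: the first leaf visited does not contain the true NN.
leaf1 = Node((1, 0, 4, 4), points=[(1, 4), (4, 0)])
leaf2 = Node((5, 5, 6, 7), points=[(5, 5), (6, 7)])
leaf3 = Node((-3, -1, -2, 1), points=[(-3, -1), (-2, 1)])
root = Node((-3, -1, 6, 7), children=[leaf1, leaf2, leaf3])
dist, nn = df_nn(root, (0, 0))
```

Pruning with `>=` mirrors the E2 step, where a mindist equal to the current NN distance is already enough to skip the subtree.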
Another Depth-first Example: 2NN • Difference: entries must be pruned based on the distance from q to our current 2nd NN. • Root => child node of E6 => child node of E1 => find {a, b} here. • Backtrack to child node of E6 => child node of E2 (its mindist < dist(q, b)) => update our result to {a, f}. • Backtrack to child node of E6 => child node of E3 => backtrack to the root => child node of E7 => child node of E4 => update our result to {a, h}. • Backtrack to child node of E7 => prune E5 => backtrack to the root => end.
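A rough sketch of the k-NN variant, under the same hypothetical `Node` layout and made-up coordinates as a plain depth-first search would use. The only change is that pruning compares against the distance to the current k-th NN, kept here in a max-heap of negated distances (an implementation choice, not something the notes prescribe).

```python
import heapq
import math

def mindist(q, rect):
    """Minimum distance from q=(x, y) to rect=(xmin, ymin, xmax, ymax)."""
    dx = max(rect[0] - q[0], 0.0, q[0] - rect[2])
    dy = max(rect[1] - q[1], 0.0, q[1] - rect[3])
    return math.hypot(dx, dy)

class Node:
    """Hypothetical R-tree node: a leaf holds points, an inner node holds children."""
    def __init__(self, rect, children=None, points=None):
        self.rect = rect
        self.children = children or []
        self.points = points or []

def df_knn(node, q, k, best=None):
    """Depth-first kNN; returns a heap of (-distance, point) with up to k items."""
    if best is None:
        best = []  # max-heap on distance, so best[0] is the current k-th NN
    if not node.children:
        for p in node.points:
            d = math.dist(q, p)
            if len(best) < k:
                heapq.heappush(best, (-d, p))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, p))  # evict the current k-th NN
        return best
    for child in sorted(node.children, key=lambda c: mindist(q, c.rect)):
        kth = -best[0][0] if len(best) == k else math.inf
        if mindist(q, child.rect) >= kth:
            break  # prune against the k-th NN distance, not the 1st
        df_knn(child, q, k, best)
    return best

leaf1 = Node((1, 0, 4, 4), points=[(1, 4), (4, 0)])
leaf2 = Node((5, 5, 6, 7), points=[(5, 5), (6, 7)])
leaf3 = Node((-3, -1, -2, 1), points=[(-3, -1), (-2, 1)])
root = Node((-3, -1, 6, 7), children=[leaf1, leaf2, leaf3])
two_nn = sorted(p for _, p in df_knn(root, (0, 0), 2))
```

Until k candidates have been collected, the bound is infinite and nothing is pruned, which corresponds to the first leaf simply supplying the initial result set.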
Optimal Performance of kNN Search • What’s the best performance that can ever be achieved for a kNN query? • Vicinity circle: centered at the query q, with radius equal to the distance from q to its k-th NN. • All nodes that intersect the vicinity circle must be visited. • E.g., the child node of E6 must be accessed by any algorithm: although there’s no result in its subtree, this cannot be verified unless we visit it!
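To make the vicinity-circle bound concrete, here is a small sketch that lists which node rectangles any algorithm would have to visit. The rectangle names, coordinates, and the brute-force computation of the radius are assumptions for illustration only; treating a rectangle that merely touches the circle as not intersecting is also an assumption.

```python
import math

def mindist(q, rect):
    """Minimum distance from q=(x, y) to rect=(xmin, ymin, xmax, ymax)."""
    dx = max(rect[0] - q[0], 0.0, q[0] - rect[2])
    dy = max(rect[1] - q[1], 0.0, q[1] - rect[3])
    return math.hypot(dx, dy)

def must_visit(rects, points, q, k):
    """Names of the rectangles that intersect the vicinity circle of q."""
    # Radius of the vicinity circle: distance from q to its k-th NN,
    # found here by brute force purely for illustration.
    radius = sorted(math.dist(q, p) for p in points)[k - 1]
    return [name for name, rect in rects.items() if mindist(q, rect) < radius]

# Made-up node rectangles and data points.
rects = {"A": (1, 0, 4, 4), "B": (-3, -1, -2, 1), "C": (5, 5, 6, 7)}
points = [(1, 4), (4, 0), (-3, -1), (-2, 1), (5, 5), (6, 7)]
needed = must_visit(rects, points, (0, 0), 1)
```

In this made-up data, rectangle A holds no result for the 1NN query, yet it intersects the circle and so cannot be skipped, which is the same situation as the child node of E6.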
Best-first Algorithm (optimal algorithm) • Best-first (BF) keeps all the (leaf and non-leaf) entries seen so far in memory, sorted in ascending order of their mindist. • Each step processes the entry in memory with the smallest mindist.
Best-first Algorithm (cont.) • Insert all the entries in the child node of E6 into the sorted list. • E7 is the next one to be processed.
Best-first Algorithm (cont.) • Insert all the entries in the child node of E7 into the sorted list. • The next entry to be processed is E4.
Best-first Algorithm (cont.) • Insert all the entries in the child node of E4 into the sorted list. • The next entry to be processed is h, which is a leaf entry. • This is the first NN of q.
Best-first Algorithm: 2NN • Assume we want 2 NNs; then the algorithm continues. • Report h as the 1st NN, and remove it from the sorted list. • The next entry to be processed is E1.
Best-first Algorithm: 2NN (cont.) • Visit the child node of E1; insert all its entries into the sorted list. • The next entry is a, which is a leaf entry. • This is the 2nd NN, and the algorithm terminates. • Whenever we process a leaf entry in memory, it is the next NN for sure.
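The best-first steps above can be sketched with a priority queue standing in for the sorted list. The `Node` class, the generator interface, and the sample coordinates are hypothetical; `heapq` is used only as one convenient way to always pop the entry with the smallest mindist.

```python
import heapq
import itertools
import math

def mindist(q, rect):
    """Minimum distance from q=(x, y) to rect=(xmin, ymin, xmax, ymax)."""
    dx = max(rect[0] - q[0], 0.0, q[0] - rect[2])
    dy = max(rect[1] - q[1], 0.0, q[1] - rect[3])
    return math.hypot(dx, dy)

class Node:
    """Hypothetical R-tree node: a leaf holds points, an inner node holds children."""
    def __init__(self, rect, children=None, points=None):
        self.rect = rect
        self.children = children or []
        self.points = points or []

def best_first(root, q):
    """Yield (distance, point) pairs in ascending distance from q."""
    tie = itertools.count()  # tie-breaker so the heap never compares Nodes
    heap = [(0.0, next(tie), root)]
    while heap:
        d, _, item = heapq.heappop(heap)
        if isinstance(item, Node):
            # Non-leaf entry: load the node and insert all of its entries.
            for child in item.children:
                heapq.heappush(heap, (mindist(q, child.rect), next(tie), child))
            for p in item.points:
                heapq.heappush(heap, (math.dist(q, p), next(tie), p))
        else:
            yield d, item  # a popped point is the next NN for sure

leaf1 = Node((1, 0, 4, 4), points=[(1, 4), (4, 0)])
leaf2 = Node((5, 5, 6, 7), points=[(5, 5), (6, 7)])
leaf3 = Node((-3, -1, -2, 1), points=[(-3, -1), (-2, 1)])
root = Node((-3, -1, 6, 7), children=[leaf1, leaf2, leaf3])
first_two = [p for _, p in itertools.islice(best_first(root, (0, 0)), 2)]
```

Because the search is written as a generator, asking for one more neighbor simply resumes where the previous one stopped, much as the 2NN example continues from the state left by the 1NN search.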
Best-first = Best Performance • To find the 1st NN, we visited the root and the child nodes of E6, E7, E4. • To find the 2nd, in addition to the above nodes, we also visited the child node of E1. • Both cases are optimal. • It can be proved that BF visits the nodes in the tree in ascending order of their mindist to the query point.
Retrospect: The Rationale Behind • What is the main reasoning behind the depth-first and best-first algorithms? • Use mindist to bound the quality of the best point in a subtree. • If a node’s mindist is already greater than the distance from q to our current result, prune it.