
Location-Based Services & Continuous kNN Query Processing

Location-Based Services & Continuous kNN Query Processing. Tai Do Data Systems Group, UCF Fall 2005. Outline. Introduction to Data Management in Mobile Computing. Discussion on Location-Based Services and its enabling technologies.


Presentation Transcript


  1. Location-Based Services & Continuous kNN Query Processing • Tai Do, Data Systems Group, UCF, Fall 2005

  2. Outline • Introduction to Data Management in Mobile Computing. • Discussion of Location-Based Services and their enabling technologies. • In-depth discussion of continuous kNN queries (two recent papers: [MHP05] and [XMA05]).

  3. Data Management in Mobile Computing • Our interest: application-driven research that involves data management in mobile computing. • Services/applications that inspire data management research: • Location-based services, Transactional services, Data mining applications. • Research problems to support these novel services efficiently: • Spatiotemporal Query Processing. • Data dissemination over limited bandwidth channels. • Data consistency guarantees. • Advanced interfaces for mobile computers.

  4. Location-Based Services (LBS) • Location-Based Services can be defined as services that integrate a mobile device’s location or position with other information so as to provide added value to a user. • Examples: • Military and government industries • Emergency services (E911 in the US and 112 in Europe) • Commercial sector: Advanced Traveler Information Systems (DoT), location-aware games, advertising services • Commercial potential of LBS ([S03]): • Optimistic prediction: $4B by 2002, $81.9B by 2005 (Europe only) • Pessimistic prediction: $11M by 2002, $167M by 2005 (USA only). • Enabling technologies: • Mobile Positioning Methods • Location Update Techniques • Location-based Query Processing

  5. Mobile Positioning • GPS (Global Positioning System): accuracy of about 3 meters or coarser. • Cell-ID (Europe): accuracy of 100 m to 3 km. • [Figure: overview of LBS applications and the level of accuracy each requires ([SV04])]

  6. Location Update Techniques • Dead-Reckoning Location Update Policies ([GS05]) • Periodic Updates

  7. Concept of Uncertainty • Uncertainty is an inherent feature of databases that store location information. • Sources of uncertainty: • Mobile Positioning Methods • Location Update Techniques • Capturing uncertainty in the model and the query language is ongoing research.

  8. Location-Based Queries • Two kinds of location-based queries: • Snapshot queries: “Tell me the 3 nearest cars around me now” • Continuous queries: “Monitor the 3 nearest restaurants around me in the next 10 minutes” • We focus on continuous kNN (CkNN) query processing. • Main-memory solution: Conceptual Partitioning Model, CPM [MHP05] • Disk-based solution: Shared Execution Algorithm, SEA-CNN [XMA05]

  9. Common Assumptions (parameter: possible values) • Underlying network: unconstrained (Euclidean) or transportation network (shortest path) • Movement pattern: unpredictable or trajectory • Location update: query-aware (safe region) or query-blind (periodic or dead reckoning) • Mutability: moving queries over static objects, static queries over moving objects, or moving queries over moving objects • Processing type: distributed or centralized • Storage: disk-resident or main memory

  10. SEA-CNN: Overview • Overview: • Objects are stored on disk; everything else is in memory. • Centralized processing. • Supports all kinds of mutability between objects and queries. • No movement pattern, in open space. • Goal: • Minimize I/O cost and CPU time. • Two important features: • Incremental evaluation of queries • Shared execution

  11. SEA-CNN: Data Structures

  12. SEA-CNN: Incremental Search • Key points: • For each query q, define a search region based on its past answer and the recent movements of q and the objects. • Only objects inside the search region are checked against q. • Let q.AR_t0 be the answer radius of q at time t0 (q.AR = distance from q to its k-th NN object). At time t1, the search radius of q (q.SR_t1) is computed as follows: • Step 1: check whether any object moves in q.AR_t0 during [t0, t1]. If yes, q.SR_t1 = q.AR_t0. If no, q.SR_t1 = 0. • Step 2: check whether any object that was in q.AR_t0 moves out of q.AR_t0 during [t0, t1]. If yes, q.SR_t1 equals the distance from q to the furthest such object. • Step 3: check whether q moves during [t0, t1]. If yes: • If q.SR_t1 = 0, then q.SR_t1 = q.AR_t0 + |q.Loc_t1 - q.Loc_t0| • If q.SR_t1 != 0, then q.SR_t1 = q.SR_t1 + |q.Loc_t1 - q.Loc_t0|
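
The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SEA-CNN implementation: the function name `search_radius`, the `(old_loc, new_loc)` representation of object moves, and the choice to measure steps 1 and 2 from the query's old location are all assumptions of the sketch.

```python
import math


def dist(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def search_radius(q_old, q_new, answer_radius, moved_objects):
    """Search radius of a query at t1, following the three steps of the slide.

    moved_objects: list of (old_loc, new_loc) pairs, one per object that
    reported a move during [t0, t1]. Steps 1 and 2 measure distances from
    the query's old location (an assumption of this sketch).
    """
    sr = 0.0
    # Step 1: some object ends its move inside the old answer region.
    if any(dist(q_old, new) <= answer_radius for _, new in moved_objects):
        sr = answer_radius
    # Step 2: some object that was inside the answer region moved out of it.
    moved_out = [
        dist(q_old, new)
        for old, new in moved_objects
        if dist(q_old, old) <= answer_radius < dist(q_old, new)
    ]
    if moved_out:
        sr = max(moved_out)
    # Step 3: the query itself moved, so enlarge the radius by its displacement.
    shift = dist(q_old, q_new)
    if shift > 0:
        sr = (answer_radius if sr == 0 else sr) + shift
    return sr
```

For instance, a query that moves by 5 units while no object moves gets a search radius of its old answer radius plus 5, which matches the Q1 case on the next slide.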

  13. SEA-CNN: Incremental Search (An Example) • Q1: O5 and Q1 move during [T0, T1], so Q1.SR_T1 = Q1.AR_T0 + |Q1.Loc_T1 - Q1.Loc_T0| • Q2: O8 moves out of Q2.AR_T0 during [T0, T1], so Q2.SR_T1 = |Q2.Loc_T1 - O8.Loc_T0|

  14. SEA-CNN: Shared Execution • Key points: • Utilize shared execution to reduce repeated I/O operations. • Group similar queries together. Evaluating this set of queries is reduced to a spatial join between the objects and the queries.
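
To illustrate how evaluating a set of queries reduces to a spatial join, here is a minimal grid-based sketch. It is not the SEA-CNN algorithm itself: the cell size `CELL`, the dictionary layouts, and the function names are hypothetical, and the in-memory dictionaries stand in for disk pages of objects. The point is that each cell of objects is scanned once and matched against every query whose search region touches that cell.

```python
import math
from collections import defaultdict

CELL = 1.0  # side length of a grid cell (hypothetical value)


def cell_of(p):
    """Grid cell (i, j) that contains point p."""
    return (int(p[0] // CELL), int(p[1] // CELL))


def cells_overlapping(center, radius):
    """Cells touched by the bounding box of a circular search region."""
    x0, y0 = cell_of((center[0] - radius, center[1] - radius))
    x1, y1 = cell_of((center[0] + radius, center[1] + radius))
    return [(i, j) for i in range(x0, x1 + 1) for j in range(y0, y1 + 1)]


def shared_join(objects, queries):
    """objects: {oid: (x, y)}; queries: {qid: (center, radius)}.

    Returns {qid: sorted list of (distance, oid)} for the objects that fall
    inside each query's search region.
    """
    objects_by_cell = defaultdict(list)
    for oid, loc in objects.items():
        objects_by_cell[cell_of(loc)].append((oid, loc))
    queries_by_cell = defaultdict(list)
    for qid, (center, radius) in queries.items():
        for c in cells_overlapping(center, radius):
            queries_by_cell[c].append(qid)

    result = {qid: [] for qid in queries}
    # Each cell is visited once; its objects are checked against all
    # queries registered on that cell (the shared part of the execution).
    for c, qids in queries_by_cell.items():
        for oid, loc in objects_by_cell.get(c, []):
            for qid in qids:
                center, radius = queries[qid]
                d = math.hypot(loc[0] - center[0], loc[1] - center[1])
                if d <= radius:
                    result[qid].append((d, oid))
    for hits in result.values():
        hits.sort()
    return result
```

Taking the first k entries of each result list would give the kNN answer, provided the search region is large enough to contain k objects.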

  15. SEA-CNN: Algorithm

  16. CPM: Overview • Overview: • Objects and queries are stored in memory. • Centralized processing. • Support all kinds of mutability between objects and queries. • No movement pattern, in open space. • Goal: • Minimize CPU time. • Important features: • Conceptual Partitioning • Simulate traditional kNN search (using branch-and-bound search with breadth-first (or best-first) traversal) • Roadmap: • Initial NN Computation (conceptual partitioning + branch and bound search + breadth-first traversal) • Handling Updates

  17. CPM: Data Structures

  18. CPM: NN Computation(Conceptual Partitioning) • Conceptual Partitioning: • What is CP? Partitioning of cells into rectangles based on proximity to the query cell. Each rectangle has direction and level. • Why CP? A natural processing order of the cells. Facilitate NN search (search minimal set of cells).

  19. CPM: NN Computation (Algorithm by Example) • Search heap content (always sorted): H = {<c4,4, 0>, <U0, 0.1>, <L0, 0.2>, <R0, 0.8>, <D0, 0.9>} • De-heap c4,4: do nothing. • De-heap U0: • Insert the cells of U0 • Insert U1 • Continue until de-heaping <c3,3, 1> and finding the first candidate p1: • best_dist = dist(p1, q) = 1.7 • Continue until de-heaping c2,4 and finding p2: • best_dist = dist(p2, q) = 1.3 • Terminate because the next entry in the heap has min_dist >= best_dist
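
The heap-driven search in this example can be sketched as follows. This is a deliberate simplification, not CPM itself: it puts every non-empty cell on the heap up front instead of inserting conceptual rectangles (U0, L0, ...) lazily, so it only illustrates the best-first order and the termination test `min_dist >= best_dist`. The grid layout `{(i, j): [points]}` and the function names are assumptions.

```python
import heapq
import math


def mindist_to_cell(q, cell, size):
    """Smallest distance from point q to the square cell (i, j) of side size."""
    i, j = cell
    dx = max(i * size - q[0], 0.0, q[0] - (i + 1) * size)
    dy = max(j * size - q[1], 0.0, q[1] - (j + 1) * size)
    return math.hypot(dx, dy)


def grid_nn(q, grid, size):
    """Best-first NN search over grid = {(i, j): [(x, y), ...]}.

    Returns (best_dist, best_point).
    """
    heap = [(mindist_to_cell(q, c, size), c) for c in grid]
    heapq.heapify(heap)
    best_dist, best_point = math.inf, None
    while heap:
        min_dist, c = heapq.heappop(heap)
        if min_dist >= best_dist:
            break  # no remaining cell can contain a closer point
        for p in grid[c]:
            d = math.hypot(p[0] - q[0], p[1] - q[1])
            if d < best_dist:
                best_dist, best_point = d, p
    return best_dist, best_point
```

Cells whose minimum distance is not smaller than the current best are never scanned, which is the same reason the example above stops once the next heap entry has min_dist >= best_dist.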

  20. CPM: Handling Updates • Key points: • Focus on moving objects and static queries. Moving queries are treated as new queries. • Re-examine only queries whose influence regions overlap with updated cells. • Re-compute affected queries incrementally, based on bookkeeping information, to save computation time.

  21. CPM: Handling Updates (Algorithm by Example) • NN Re-computation Algorithm: Input: grid G, affected query q. Output: new NN for q. /* Similar to NN Computation. Utilizes the bookkeeping information in visit_list and the search heap. */ • Example: p2 moves from c2,4 to c0,6 • c2,4 has q in its influence list and dist(q, p2’) > best_dist = dist(q, p2) → mark q as an affected query. • c0,6 has an empty influence list → ignore. • Re-compute the NN for q with the NN Re-computation algorithm.
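
The check performed when an object moves can be sketched as below for the single-NN case. The data layout (an `influence` dictionary from cell to query ids, and per-query records with `loc`, `best_nn`, `best_dist`) and the function name are hypothetical. The rule for the old cell follows the example above; the rule for the new cell (re-examine a query when the incoming object is closer than its best_dist) is an assumption of this sketch, since the slide only covers the case of an empty influence list.

```python
import math


def queries_to_recompute(oid, new_loc, old_cell, new_cell, influence, queries):
    """Return the ids of queries that must be re-examined after object oid
    moves from old_cell to new_cell and now sits at new_loc.

    influence: {cell: set of query ids whose influence region covers the cell}
    queries:   {qid: {"loc": (x, y), "best_nn": oid, "best_dist": float}}
    """
    affected = set()
    # Old cell: the moving object may be the current NN of a query there.
    for qid in influence.get(old_cell, ()):
        q = queries[qid]
        if q["best_nn"] == oid and math.dist(q["loc"], new_loc) > q["best_dist"]:
            affected.add(qid)  # the NN moved farther away
    # New cell (assumption of this sketch): an incoming object that is closer
    # than best_dist also triggers re-examination. A cell with an empty
    # influence list is skipped, as in the example.
    for qid in influence.get(new_cell, ()):
        q = queries[qid]
        if math.dist(q["loc"], new_loc) < q["best_dist"]:
            affected.add(qid)
    return affected
```

Only the returned queries would then go through the NN re-computation step; all other queries keep their current answers.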

  22. SEA-CNN & CPM: A Comparison • Common features between the two: • Performance metrics: • Use query processing time (or CPU time) at the centralized server as the primary metric. • Ignore communication cost. • Employ Grid-based Indexing (simple, fast maintenance). • Keep a search region for each query to handle updates. • Are the differences significant? • CPM saves some computations over SEA-CNN (as shown in the CPM paper) because CPM uses an optimal search algorithm. • However, is saving in CPU time still very important?

  23. Summary • Monitoring queries to support LBS have been an active research area in the past few years: • The short-term research trend seems to be proposals of new, more advanced query types (our next presentation will discuss Reverse NN and Group NN). • Long-term research could be Moving Objects Databases. Recommended: the “Moving Objects Databases” textbook to gain perspective: • Location-management perspective vs. spatio-temporal data perspective. • Many LBS-based commercial products: Verilocation, uLocate, meetro, EarthComber, CellSpotting. • Standards and development software: Natural Area Coding System, Mobile Location Services Reference Architecture by Sun. • For updated LBS info: try LBSZone.

  24. References • [B99] D. Barbara. Mobile Computing and Databases - A Survey. IEEE Transactions on Knowledge and Data Engineering, 11(1), 108-117, 1999. • [S03] http://www.wirelessdevnet.com/features/nacjan03/ • [GS05] R. H. Guting, M. Schneider. Moving Objects Databases. Book. • [SV04] J. Schiller, A. Voisard. Location-based Services. Book. • [MHP05] Kyriakos Mouratidis, Marios Hadjieleftheriou, Dimitris Papadias. Conceptual Partitioning: An Efficient Method for Continuous Nearest Neighbor Monitoring. SIGMOD, 2005. • [YPK05] Yu, X., Pu, K., Koudas, N. Monitoring K-Nearest Neighbor Queries Over Moving Objects. ICDE, 2005. • [XMA05] Xiong, X., Mokbel, M., Aref, W. SEA-CNN: Scalable Processing of Continuous K-Nearest Neighbor Queries in Spatio-temporal Databases. ICDE, 2005. • [CDT+00] Jianjun Chen, David J. DeWitt, Feng Tian, Yuan Wang. NiagaraCQ: A Scalable Continuous Query System for Internet Databases. SIGMOD, 2000. • [CF02] Sirish Chandrasekaran, Michael J. Franklin. Streaming Queries over Streaming Data. VLDB, 2002 (PSoup system).

  25. Note • Due date of your presentation slides is November 14 2005.

  26. Aggregate NN Queries in Spatial Databases and Location-based Services • Tai Do, Data Systems Group, UCF, November 11, 2005

  27. Outline • Aggregate Nearest Neighbor (ANN) queries: • Introduction to ANN. • Solutions for Group Nearest Neighbor (GNN) Queries, a specific type of ANN. • Solutions for Continuous Group Nearest Neighbor Queries (CGNN).

  28. Aggregate NN: Examples and Applications • Applications: • Business decision making (construction of new facilities) • Military rescue (earliest pick-up time) • Severe weather monitoring (most dangerous area)

  29. Aggregate NN: Definition • What is ANN? • A generalized form of NN search (multiple query points vs. a single query point) • Formally: • Given P = {p1, …, pN} (set of data points) and Q = {q1, …, qn} (set of query points) • Aggregate distance function: adist(p, Q) = f(|pq1|, …, |pqn|) • An ANN query returns the data point p with the minimum aggregate distance • Note: AkNN is similar (find k >= 1 data points); we only focus on ANN. • When f = sum, the ANN query is called a Group Nearest Neighbor (GNN) query.
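
As a minimal sketch of this definition (a brute-force scan, not one of the methods discussed later), the aggregate distance and the ANN answer can be written as follows. The function names and the default `f=sum` are choices of this sketch; `math.dist` requires Python 3.8 or later.

```python
import math


def adist(p, Q, f=sum):
    """Aggregate distance of data point p to the query set Q under f."""
    return f(math.dist(p, q) for q in Q)


def ann(P, Q, f=sum):
    """Brute-force ANN: the data point with minimum aggregate distance."""
    return min(P, key=lambda p: adist(p, Q, f))


# With f = sum this is a Group NN query; f = max or f = min give other
# aggregate variants.
P = [(0, 0), (2, 0), (5, 5)]
Q = [(1, 0), (3, 0)]
print(ann(P, Q), adist(ann(P, Q), Q))
```

The scan touches every data point, which is exactly the cost that the three GNN methods below try to avoid.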

  30. Group NN Queries • Assumptions: • Queries are in memory. • Data points are in disk and indexed by R-tree. • Goal: • Minimize the extent and cost of the search (I/O and CPU time) • Roadmap: 3 solutions • Multiple query method • Single point method • Minimum bound method

  31. Multiple Query Method (MQM) • Apply multiple conventional NN queries, then combine the results. • MQM is a straightforward application of the threshold algorithm ([FLN03]): • Each query point visits incrementally its NN data points (1st NN, then 2nd NN, …) • Compute the aggregate distance of the current NN data point • Do the two above steps until we have seen the best data point. • Main idea: • Question: how do we know that the aggregate distance of the seen data point is smaller than the aggregate distance of unseen data points? • Answer: Predict minimum aggregate distance of unseen data points (or in other words, use a threshold)

  32. MQM: An Example (1) • Q = {q1, q2} • P = {p1,…, p12}

  33. MQM: An Example (2) • Initial state: t1 = 0, t2 = 0, T = 0, best_dist = ∞, best_NN = null • Visited points (ID, dist(q1), dist(q2), sum/adist): none yet

  34. MQM: An Example (3) • Step 1: • Find the next (1st) NN of q1: (p10, 2) • Update t1 and T: t1 = 2, T = t1 + t2 = 2 + 0 = 2

  35. MQM: An Example (4) • Step 2: • If the current aggregate distance < best_dist, update best_dist and best_NN • If the current best aggregate distance <= T, stop • Else go to the next NN of the next query point and repeat Step 1 • Visited points (ID, dist(q1), dist(q2), sum/adist): p10 (2, 5, 7) • T = 2; best_dist changes from ∞ to 7; best_NN = p10

  36. MQM: An Example (5) • Step 1: • Find the next (1st) NN of q2: (p11, 3) • Update t2 and T: t1 = 2, t2 = 3, T = t1 + t2 = 2 + 3 = 5 • Visited points (ID, dist(q1), dist(q2), sum/adist): p10 (2, 5, 7) • best_dist = 7, best_NN = p10

  37. MQM: An Example (6) • Step 2: • If the current aggregate distance < best_dist, update best_dist and best_NN • If the current best aggregate distance <= T, stop • Else go to the next NN of the next query point and repeat Step 1 • Visited points (ID, dist(q1), dist(q2), sum/adist): p10 (2, 5, 7); p11 (3, 3, 6) • T = 5; best_dist changes from 7 to 6; best_NN = p11

  38. MQM: An Example (7) • Step 1: • Find the next (2nd) NN of q1: (p11, 3) • Update t1 and T: t1 = 3, t2 = 3, T = t1 + t2 = 3 + 3 = 6 • Visited points (ID, dist(q1), dist(q2), sum/adist): p10 (2, 5, 7); p11 (3, 3, 6) • best_dist = 6, best_NN = p11

  39. MQM: An Example (8) • Step 2: • If the current aggregate distance < best_dist, update best_dist and best_NN • If the current best aggregate distance <= T, stop • Else go to the next NN of the next query point and repeat Step 1 • Visited points (ID, dist(q1), dist(q2), sum/adist): p10 (2, 5, 7); p11 (3, 3, 6); p11 retrieved again (3, 3, 6): no update • best_dist = 6 <= T = 6: STOP • Result: best_dist = 6, best_NN = p11
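
The round-robin procedure walked through in this example can be sketched as follows for f = sum. Two things are assumptions of the sketch rather than part of the slides: the function name `mqm`, and the use of a distance-sorted list per query point as a stand-in for incremental NN retrieval on an R-tree.

```python
import math


def mqm(P, Q):
    """Group NN of the query points Q among the data points P (f = sum).

    Returns (best_nn, best_dist).
    """
    # Stand-in for incremental NN search: one distance-sorted stream per
    # query point (a real implementation would query an R-tree instead).
    streams = [iter(sorted(P, key=lambda p, q=q: math.dist(p, q))) for q in Q]
    thresholds = [0.0] * len(Q)  # t_i: distance of the last NN seen by q_i
    best_dist, best_nn = math.inf, None
    i = 0
    # Stop as soon as best_dist <= T, where T is the sum of the thresholds.
    while best_dist > sum(thresholds):
        p = next(streams[i], None)
        if p is None:
            break  # every data point has been seen
        # Step 1: update the threshold of the current query point.
        thresholds[i] = math.dist(p, Q[i])
        # Step 2: compute the aggregate distance of p and keep the best.
        d = sum(math.dist(p, q) for q in Q)
        if d < best_dist:
            best_dist, best_nn = d, p
        i = (i + 1) % len(Q)  # move on to the next query point
    return best_nn, best_dist
```

The stop test is safe because a point not yet seen by any stream is at distance at least t_i from each q_i, so its aggregate distance is at least T.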

  40. Single Point Method (SPM) • Problem with MQM: • The same node may be accessed multiple times, and the same data point (e.g., p11) retrieved through different queries. • SPM processes the query with a single traversal. • Strategy: • Compute the centroid q of Q, which is a point with small adist(q, Q) • The GNN is a point of P “near” q. • Challenges: • The computation of q. • The range around q in which we should look for points of P before we can conclude that no better GNN can be found.

  41. SPM: Illustration

  42. SPM: The Computation of q

  43. SPM: Finding the range • To define the range around q: find heuristics that can safely prune nodes in the R-tree • Lemma 1: adist(p, Q) >= n*|pq| - adist(q, Q) • Proof: by the triangle inequality, for each query point qi we have |pqi| + |qiq| >= |pq| • Summing up the n inequalities: Σ_i |pqi| + Σ_i |qiq| >= n*|pq|, i.e. adist(p, Q) + adist(q, Q) >= n*|pq|, hence adist(p, Q) >= n*|pq| - adist(q, Q) (1) • Lemma 1 can be used for pruning intermediate nodes: • Node N can be pruned if mindist(N, q) >= (1/n) * [best_dist + adist(q, Q)] (2) • Because: transforming this pruning rule gives n * mindist(N, q) - adist(q, Q) >= best_dist (3) • For any p in node N: dist(p, q) >= mindist(N, q), so n * dist(p, q) - adist(q, Q) >= best_dist (4) • Using Lemma 1 we have adist(p, Q) >= best_dist, hence node N can be safely pruned.

  44. SPM: Pruning Illustration • Both N1 and N2 can be pruned: • best_dist = adist(best_NN, Q) = 9 • adist(q, Q) = 3 • (1/n)(best_dist + adist(q,Q)) = ½ (9 + 3) = 6 • mindist(N1,q) = 10 and mindist(N2,q) = 6
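
The arithmetic of this illustration can be checked with a one-line test of pruning rule (2). The function name and argument names are hypothetical; the numbers are the ones given on the slide (n = 2 query points).

```python
def spm_can_prune(mindist_node_q, best_dist, adist_q, n):
    """Pruning rule (2): prune N if mindist(N, q) >= (best_dist + adist(q, Q)) / n."""
    return mindist_node_q >= (best_dist + adist_q) / n


# Values from the slide: best_dist = 9, adist(q, Q) = 3, n = 2, so the bound is 6.
for name, mindist in (("N1", 10), ("N2", 6)):
    print(name, "pruned" if spm_can_prune(mindist, 9, 3, 2) else "kept")
```

N2 sits exactly on the bound and is still pruned because the rule uses >=: a point there could at best tie with best_dist, not beat it.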

  45. Minimum Bound Method (MBM) • Like SPM, MBM performs a single query, but uses the minimum bounding rectangle M of Q (instead of a centroid q) to prune the search space. • Is MBM obviously better than SPM? No clear reason. Must evaluate through experiments. • Strategy: • Use good heuristics to identify the qualifying nodes

  46. Minimum Bound Method: Heuristics • Heuristic 1: A node N cannot contain qualifying points if mindist(N, M) >= (1/n) * best_dist, because for any data point p in N: adist(p, Q) >= n * mindist(N, M) >= best_dist • Heuristic 1 prunes N1 but not N2. • Heuristic 2: A node N can be safely pruned if Σ_{qi in Q} mindist(N, qi) >= best_dist • Heuristic 2 prunes both N1 and N2.
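
Both heuristics reduce to minimum-distance computations on rectangles, which can be sketched as follows. The rectangle encoding `(xmin, ymin, xmax, ymax)`, the function names, and the sample coordinates are assumptions of this sketch; they are not the N1 and N2 of the slide's figure, only a case that shows the same behavior.

```python
import math


def mbr(points):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a point set."""
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def mindist_point_rect(p, r):
    """Smallest distance from point p to rectangle r."""
    dx = max(r[0] - p[0], 0.0, p[0] - r[2])
    dy = max(r[1] - p[1], 0.0, p[1] - r[3])
    return math.hypot(dx, dy)


def mindist_rect_rect(a, b):
    """Smallest distance between rectangles a and b (0 if they intersect)."""
    dx = max(a[0] - b[2], 0.0, b[0] - a[2])
    dy = max(a[1] - b[3], 0.0, b[1] - a[3])
    return math.hypot(dx, dy)


def heuristic1(node, Q, best_dist):
    """Prune if mindist(N, M) >= best_dist / n, with M the MBR of Q."""
    return mindist_rect_rect(node, mbr(Q)) >= best_dist / len(Q)


def heuristic2(node, Q, best_dist):
    """Prune if the sum over qi of mindist(N, qi) >= best_dist."""
    return sum(mindist_point_rect(q, node) for q in Q) >= best_dist


Q = [(0, 0), (4, 0)]
far_node, near_node = (10, 0, 12, 2), (6, 0, 8, 2)
print(heuristic1(far_node, Q, 5), heuristic1(near_node, Q, 5))
print(heuristic2(far_node, Q, 5), heuristic2(near_node, Q, 5))
```

Heuristic 2 is the tighter test but needs one distance computation per query point, whereas heuristic 1 needs a single rectangle-to-rectangle distance; a reasonable design is to try heuristic 1 first and fall back to heuristic 2 only for nodes it fails to prune.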

  47. Performance Study

  48. Continuous Group NN • Assumptions: • Both query points and data points are in memory. • Method: • Use a grid index. • Utilize conceptual partitioning of the space around query Q. • Apply Minimum Bound Method.

  49. Continuous GNN: Details • amindist(c, Q) = Σ_{qi in Q} mindist(c, qi). • amindist(c, Q) is a lower bound of adist(p, Q) for any data point p in cell c. • The GNN computation is similar to the NN computation presented in the previous class.

  50. Summary • Threshold Algorithm: • Simple, useful, and reusable. • Aggregate Nearest Neighbor Queries in Spatial Database: • Practical applications. • Good heuristics are important. • Optimal ANN search remains unsolved???
