
Location-aware Query Processing and Optimization: A Tutorial

Location-aware Query Processing and Optimization: A Tutorial. Mohamed F. Mokbel, Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA (mokbel@cs.umn.edu). Walid G. Aref, Department of Computer Science, Purdue University, West Lafayette, Indiana, USA.


Presentation Transcript


  1. Location-aware Query Processing and Optimization: A Tutorial • Mohamed F. Mokbel, Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA, mokbel@cs.umn.edu • Walid G. Aref, Department of Computer Science, Purdue University, West Lafayette, Indiana, USA, aref@cs.purdue.edu

  2. Motivation

  3. Applications: Traffic Monitoring • How many cars are in the downtown area? • Send an alert if a non-friendly vehicle enters a restricted region • Report any congestion in the road network • Once an accident is discovered, immediately send an alarm to the nearest police and ambulance cars • Make sure that no two aircraft have nearby paths

  4. Applications (Cont.): Location-based Store Finder / Advertisement • Where is my nearest gas station? • What are the fast food restaurants within 3 miles of my location? • Let me know if I am near a restaurant while any of my friends are there • Send E-coupons to all customers within 3 miles of my stores • Get me the list of all customers for whom I am the nearest restaurant

  5. Location-based Database Servers • Layered Approach • Built-in Approach [Figure: architecture diagrams of the two approaches. Surviving labels: GIS Interface, Spatio-temporal GIS, DBMS, ST Query Processing, ST-Index]

  6. Variety of Location-aware Queries • Continuously report the number of cars on the freeway (Type: Range query; Time: Present; Duration: Continuous; Query: Stationary; Object: Moving) • What are my nearest McDonald's for the next hour? (Type: Nearest-Neighbor query; Time: Future; Duration: Continuous; Query: Moving; Object: Stationary) • Send E-coupons to all cars for which I am the nearest gas station (Type: Reverse NN query; Time: Present; Duration: Snapshot; Query: Stationary; Object: Moving) • What was the closest distance between Taxi A and me yesterday? (Type: Closest-point query; Time: Past; Duration: Snapshot; Query: Moving; Object: Moving)

  7. Tutorial Outline • Location-aware Environments • Location-aware Snapshot Query Processing • Snapshot Past Queries • Snapshot Present Queries • Snapshot Future Queries • Spatio-temporal Access Methods • Location-aware Continuous Query Processing • Scalable Execution of Continuous Queries • Location-aware Query Optimization • Uncertainty in Location-aware Query Processing • Case Study • Open Research Issues

  8. Location-aware Snapshot Query Processing: Querying the Past • Examples: • Querying Along the Temporal Dimension: What was the location of a certain object from 7:00 AM to 10:00 AM yesterday? • Querying Along the Spatial Dimension: Find all objects that were in a certain area at 7:00 AM yesterday • Querying Along the Spatio-temporal Dimension: Find all objects that were close to each other from 7:00 AM to 8:00 AM yesterday • Features: • Large number of historical trajectories • Persistent read-only data • The ability to query the spatial and/or temporal dimensions

  9. Location-aware Snapshot Query Processing: Indexing the Time Dimension • Historical trajectories are represented by their three-dimensional Minimum Bounding Rectangle (MBR), with time as the third dimension • A 3D-R-tree is used to index the MBRs • The technique is simple and easy to implement • Does not scale well • Does not provide efficient query support
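
A minimal Python sketch of the bounding step described on this slide, assuming a trajectory is stored as a list of (x, y, t) samples. The names `Sample`, `MBR3D`, `trajectory_mbr`, and `intersects` are hypothetical and not taken from the tutorial; a real 3D-R-tree would store such boxes in its leaf entries.

```python
from typing import NamedTuple, Sequence


class Sample(NamedTuple):
    """One recorded position of a moving object (hypothetical layout)."""
    x: float
    y: float
    t: float


class MBR3D(NamedTuple):
    """Axis-aligned box over two spatial dimensions plus time."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float
    t_min: float
    t_max: float


def trajectory_mbr(samples: Sequence[Sample]) -> MBR3D:
    """Return the smallest 3D box that encloses every sample."""
    if not samples:
        raise ValueError("a trajectory needs at least one sample")
    xs = [s.x for s in samples]
    ys = [s.y for s in samples]
    ts = [s.t for s in samples]
    return MBR3D(min(xs), max(xs), min(ys), max(ys), min(ts), max(ts))


def intersects(a: MBR3D, b: MBR3D) -> bool:
    """True if the two boxes overlap in x, y, and t simultaneously."""
    return (a.x_min <= b.x_max and b.x_min <= a.x_max
            and a.y_min <= b.y_max and b.y_min <= a.y_max
            and a.t_min <= b.t_max and b.t_min <= a.t_max)
```

A box around a long trajectory also covers space the object never visited, so a query box can intersect the MBR without touching the trajectory itself. This may be one reason behind the efficiency concern on the slide.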

  10. Location-aware Snapshot Query Processing: Multi-version Index Structures • Maintain an R-tree for each time instance • R-tree nodes that are not changed across consecutive time instances are linked together • A multi-version R-tree can be combined with a 3D-R-tree to support interval queries [Figure labels: Timestamp 0, Timestamp 1, 3D-R-tree]

  11. Location-aware Snapshot Query Processing: Querying the Present • Time is always NOW • Example Queries: • Find the number of objects in a certain area • What is the current location of a certain object? • Features: • Continuously changing data • Real-time query support is required • Index structures should be update-tolerant • Present data is always accessed through continuous queries

  12. Location-aware Snapshot Query Processing: Updating Index Structures • Traditional R-tree updates are top-down • Updates are translated into delete and insert transactions • To support frequent updates: • Updates can be managed in space without the need for deletions or insertions • Bottom-up approaches use auxiliary index structures to locate the object identifier [Figure label: Hash based on OID]

  13. Location-aware Snapshot Query Processing: Update Memos • Keep a memo with the R-tree • The memo contains the recent updates to the existing R-tree • The query answer returned from the R-tree should be passed through the memo • The update memo is reflected to the R-tree once the relevant disk page is retrieved [Figure labels: Spatio-temporal Queries, Update Memo, Raw answer set, Final answer set]
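
The step "pass the raw answer through the memo" can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the tutorial's design: each index entry is assumed to carry an update stamp, and the memo is assumed to map an object ID to the stamp of its latest update. `Entry` and `filter_through_memo` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Entry:
    """An index entry for one reported location (hypothetical layout)."""
    oid: int
    x: float
    y: float
    stamp: int  # assumed: increases with every update of the object


def filter_through_memo(raw_answer: List[Entry],
                        memo: Dict[int, int]) -> List[Entry]:
    """Drop entries that the memo marks as superseded.

    memo[oid] is assumed to hold the stamp of the latest update for oid.
    An object without a memo record has no pending update, so its entry
    is treated as current.
    """
    return [e for e in raw_answer if memo.get(e.oid, e.stamp) == e.stamp]
```

For example, if object 1 has an old entry (stamp 1) and a new entry (stamp 2) still sitting in the index, a memo record `{1: 2}` removes the old entry from the final answer set without touching the index.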

  14. Location-aware Snapshot Query Processing: Querying the Future • Examples: • What will my nearest restaurant be after 30 minutes? • Does my path conflict with any other cars for the next hour? • Features: • Predict the movement through a velocity vector • Prediction could be valid for only a limited time horizon in the future

  15. Location-aware Snapshot Query Processing: Duality Transformation • A line (trajectory) in the two-dimensional space can be transformed into a point in another dual two-dimensional space • Trajectory: x(t) = vt + a → Point: (v, a) • All queries will need to be transformed into the dual space • Rectangular queries will be represented as polygons
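
A small Python sketch of the transformation above for one-dimensional motion. Each trajectory x(t) = vt + a is stored as the dual point (v, a), and the query "which objects are in [x_lo, x_hi] at time t?" becomes the condition x_lo <= v*t + a <= x_hi on dual points, which describes a region bounded by two parallel lines rather than a rectangle. The function names are hypothetical, and the linear scan stands in for a real index over the dual space.

```python
from typing import List, Tuple

DualPoint = Tuple[float, float]  # (v, a): velocity and intercept


def to_dual(v: float, a: float) -> DualPoint:
    """Map the trajectory x(t) = v*t + a to its dual point (v, a)."""
    return (v, a)


def dual_range_query(points: List[DualPoint],
                     x_lo: float, x_hi: float, t: float) -> List[DualPoint]:
    """Return dual points whose trajectory lies in [x_lo, x_hi] at time t.

    In the dual (v, a) plane the condition x_lo <= v*t + a <= x_hi is the
    strip between the lines a = x_lo - t*v and a = x_hi - t*v.
    """
    return [(v, a) for (v, a) in points if x_lo <= v * t + a <= x_hi]
```

For instance, with dual points (1, 0), (-1, 10), and (0, 20), the query [4, 6] at t = 5 selects the first two, since both objects are at x = 5 at that time.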

  16. Location-aware Snapshot Query Processing: Time-Parameterized Data Structures • A bounding rectangle with MBR & VBR is guaranteed to contain all its moving objects as long as they maintain their velocity vector • High degree of overlap when the velocity vector is not updated • The Time-parameterized R-tree (TPR-tree) consists of: • Minimum bounding rectangles (MBR) • Velocity bounding rectangles (VBR)
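
The containment guarantee on this slide can be illustrated with a one-dimensional Python sketch. The assumptions are mine: positions and velocities are scalars, the bound is built at a reference time, and `dt` is the time elapsed since then. `MovingPoint`, `TPBR`, and `bound` are hypothetical names, not the TPR-tree's actual structures.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class MovingPoint:
    x: float  # position at the reference time
    v: float  # constant velocity

    def at(self, dt: float) -> float:
        """Position dt time units after the reference time."""
        return self.x + self.v * dt


@dataclass(frozen=True)
class TPBR:
    """1D time-parameterized bound: position extent plus velocity extent."""
    x_lo: float
    x_hi: float
    v_lo: float
    v_hi: float

    def interval_at(self, dt: float) -> Tuple[float, float]:
        """The lower edge moves with the minimum velocity, the upper edge
        with the maximum velocity (valid for dt >= 0)."""
        return (self.x_lo + self.v_lo * dt, self.x_hi + self.v_hi * dt)


def bound(objs: Sequence[MovingPoint]) -> TPBR:
    """Build the MBR (positions) and VBR (velocities) of a group."""
    return TPBR(min(o.x for o in objs), max(o.x for o in objs),
                min(o.v for o in objs), max(o.v for o in objs))
```

With two objects starting at 0 and 10 and moving toward each other at speed 1, the bound is [0, 10] at dt = 0 and [-5, 15] at dt = 5, even though both objects are then at x = 5. The bound never shrinks between updates, which is consistent with the overlap problem noted on the slide.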

  17. Location-aware Snapshot Query Processing: Indexing Past, Present, and Future • A unified index structure for past, present, and future data • Makes use of the partial-persistent R-tree (PPR-tree) for past data and the TPR-tree for current and future data • Double Time-Parameterized Bounding Rectangles (TPBRs) are used to bound moving objects. A double TPBR has two components: • Tail MBR that starts at the time of the last update and extends to infinity. The tail is a regular TPBR of the TPR-tree • Head MBR to bound the finite historical trajectories. The head is an optimized TPBR • Querying is similar to a regular PPR-tree search, except that the intersection function is redefined to accommodate the double TPBR

  18. Spatio-temporal Access Methods [Figure; surviving label: RPPF-tree. Color legend: Red = Future, Blue = Past, Green = Present, Brown = All]

  19. Tutorial Outline • Location-aware Environments • Location-aware Snapshot Query Processing • Location-aware Continuous Query Processing • Continuous Queries Vs. Snapshot Queries • Approaches for Continuous Query Evaluation • Scalable Execution of Continuous Queries • Location-aware Query Optimizer • Uncertainty in Location-aware Query Processing • Case Study • Open Research Issues

  20. Snapshot vs. Continuous Query Processing • Traditional (Snapshot) Queries • Continuous Queries [Figure labels: Query, Data, Answer]

  21. Location-aware Continuous Query Processing: Approaches • Straightforward Approach • Abstract the continuous queries to a series of snapshot queries evaluated periodically • Result Validation • Result Caching • Result Prediction • Incremental Evaluation

  22. Location-aware Continuous Query Processing: Result Validation • Associate a validation condition with each query answer • Valid time (t): • The query answer is valid for the next t time units • Valid region (R): • The query answer is valid as long as you are within a region R • It is challenging to maintain the computation of valid time/region for querying moving objects • Once the associated validation condition expires, the query will be reevaluated

  23. Location-aware Continuous Query Processing: Caching the Result • Observation: Consecutive evaluations of a continuous query yield very similar results • Idea: Upon evaluation of a continuous query, retrieve more data that can be used later • K-NN query • Initially, retrieve more than k • Range query • Evaluate the query with a larger range • How much do we need to pre-compute? • How do we do re-caching?
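
The range-query variant of this idea can be sketched in Python as below. The assumptions are mine: objects are stationary points, the query is a circle of fixed radius whose center moves, and the cache is refilled whenever the center has drifted more than `margin` from where the cache was built. `CachedRangeQuery` is a hypothetical name. While the drift is at most `margin`, the triangle inequality guarantees that every object within `radius` of the new center is already in the cache.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


class CachedRangeQuery:
    """Circular range query over stationary points with an enlarged cache."""

    def __init__(self, objects: Sequence[Point],
                 radius: float, margin: float) -> None:
        self.objects = list(objects)
        self.radius = radius
        self.margin = margin          # extra range retrieved up front
        self.cache_center: Optional[Point] = None
        self.cache: List[Point] = []
        self.refills = 0              # number of full evaluations so far

    def _refill(self, center: Point) -> None:
        """Full evaluation with the larger range radius + margin."""
        reach = self.radius + self.margin
        self.cache = [p for p in self.objects if math.dist(p, center) <= reach]
        self.cache_center = center
        self.refills += 1

    def answer(self, center: Point) -> List[Point]:
        """Answer from the cache when possible, otherwise re-cache first."""
        if (self.cache_center is None
                or math.dist(center, self.cache_center) > self.margin):
            self._refill(center)
        return [p for p in self.cache if math.dist(p, center) <= self.radius]
```

The choice of `margin` is exactly the "how much to pre-compute" question on the slide: a larger value means fewer re-evaluations but a larger cache.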

  24. Location-aware Continuous Query Processing: Predicting the Result • Given a future trajectory movement, the query answer can be pre-computed in advance • The trajectory movement is divided into N intervals, each with its own query answer A_i • The query is evaluated once (as a snapshot query). Yet, the answer is valid for longer time periods • Once the trajectory changes, the query will be reevaluated [Figure label: Nearest-Neighbor Query]
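
A one-dimensional Python sketch of dividing a known trajectory into intervals with pre-computed answers. The assumptions are mine: the query moves along a line from `start` to `end` (with `start < end`), the data objects are distinct stationary points on that line, and the query is a 1-nearest-neighbor query. `predict_nn_intervals` is a hypothetical name. The nearest neighbor can only change at a midpoint between two adjacent objects, so those midpoints split the path.

```python
from typing import List, Sequence, Tuple


def predict_nn_intervals(points: Sequence[float],
                         start: float,
                         end: float) -> List[Tuple[float, float, float]]:
    """Split the path [start, end] into intervals with a fixed nearest neighbor.

    Returns (interval_start, interval_end, nearest_point) triples.
    Assumes start < end and distinct stationary points.
    """
    pts = sorted(points)
    # The nearest neighbor changes only at midpoints of adjacent points.
    cuts = [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    bounds = [start] + [c for c in cuts if start < c < end] + [end]
    result = []
    for lo, hi in zip(bounds, bounds[1:]):
        probe = (lo + hi) / 2  # any interior position has the same answer
        nearest = min(pts, key=lambda p: abs(p - probe))
        result.append((lo, hi, nearest))
    return result
```

For objects at 0, 4, and 10 and a path from 1 to 9, the answer is object 0 on [1, 2], object 4 on [2, 7], and object 10 on [7, 9]. If the query deviates from the assumed path, the intervals must be recomputed, as the slide notes.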

  25. Location-aware Continuous Query Processing: Incremental Evaluation • The query is evaluated only once. Then, only the updates of the query answer are evaluated • There are two types of updates: positive (+) and negative (-) updates • Only the objects that cross the query boundary are taken into account • Need to continuously listen for notifications that an object has crossed the query boundary [Figure label: Query Result]
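
A minimal Python sketch of positive and negative updates for a stationary rectangular range query. The assumptions are mine: the server keeps the current answer as a set of object IDs and receives one location report at a time. `inside` and `apply_update` are hypothetical names.

```python
from typing import Optional, Set, Tuple

Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Point = Tuple[float, float]


def inside(pos: Point, region: Rect) -> bool:
    """True if pos lies within the closed rectangle region."""
    x, y = pos
    x_min, y_min, x_max, y_max = region
    return x_min <= x <= x_max and y_min <= y <= y_max


def apply_update(answer: Set[int], oid: int, pos: Point,
                 region: Rect) -> Optional[Tuple[str, int]]:
    """Process one location report and return the update to emit, if any.

    Returns ("+", oid) when the object enters the query region,
    ("-", oid) when it leaves, and None when the answer is unchanged.
    The answer set is modified in place.
    """
    now_in = inside(pos, region)
    was_in = oid in answer
    if now_in and not was_in:
        answer.add(oid)
        return ("+", oid)
    if was_in and not now_in:
        answer.remove(oid)
        return ("-", oid)
    return None
```

An object that moves around inside the region, or around outside it, produces no output; only boundary crossings do.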

  26. Tutorial Outline • Location-aware Environments • Location-aware Snapshot Query Processing • Location-aware Continuous Query Processing • Scalable Execution of Continuous Queries • Location-aware Centralized Database Systems • Location-aware Distributed Database Systems • Location-aware Data Stream Management Systems • Location-aware Query Optimizer • Uncertainty in Location-aware Query Processing • Case Study • Open Research Issues

  27. Scalability of Location-aware Continuous Queries: Motivation [Figure: a Location-aware Database Server receiving concurrent continuous queries] • Keep me updated with the nearest 3 hospitals (Continuous K-NN Query) • How many cars are in the highlighted area? (Continuous Range Query) • Make sure that the nearest 3 airplanes are FRIENDLY (Continuous K-NN Query) • Alert me if there are fewer than 3 police cars within 5 miles (Continuous Range Query) • Monitor the traffic in the red areas (Continuous Range Query)

  28. Scalability of Location-aware Continuous Queries: Main Concepts • Continuous queries last for a long time at the server side • While a query is active in the server, other queries will be submitted • Shared execution among multiple queries • Should we index data OR queries? • Data and queries may be stationary or moving • Data and queries are of large size • Data and queries arrive at the system at very high rates • Treat data and queries similarly • Are queries coming to data OR are data coming to queries? • Both data and queries are subjected to each other • Join data with queries

  29. Scalability of Location-aware Continuous Queries: Main Concepts (Cont.) • Evaluating a large number of concurrent continuous spatio-temporal queries is abstracted as a spatio-temporal join between moving objects and moving queries [Figure: two designs compared. "Each query is a single thread": ST Query 1, ST Query 2, ..., ST Query N, each over its own D-Index of Data Objects. "One thread for all continuous queries": a Shared ST Join over a D-Index of Data Objects and a Q-Index of ST Queries, followed by a Split to Q1, Q2, ..., QN]

  30. Scalability of Location-aware Continuous Queries: Location-aware Centralized Database Systems • Centralized index structures • Index the queries instead of data • Valid only for stationary queries [Figure labels: Moving Objects; (Stationary) ST Queries in an R-tree index structure]

  31. Scalability of Location-aware Continuous Queries: Location-aware Centralized Database Systems (Cont.) • To accommodate the continuous movement of both data and queries: • Concurrent continuous queries share a grid structure • Moving objects are hashed to the same grid structure as queries • The spatio-temporal join is done by overlaying the two grid structures
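
The grid overlay can be sketched in Python as follows. The assumptions are mine: queries are axis-aligned rectangles, objects are points, and both grids use the same uniform cell size. `cell_of` and `grid_join` are hypothetical names. Each query is registered in every cell it overlaps, each object is hashed to its single cell, and only cells that are occupied in both grids are joined.

```python
from collections import defaultdict
from typing import Dict, Hashable, Set, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Cell = Tuple[int, int]


def cell_of(x: float, y: float, size: float) -> Cell:
    """Grid cell containing the point (x, y)."""
    return (int(x // size), int(y // size))


def grid_join(objects: Dict[Hashable, Point],
              queries: Dict[Hashable, Rect],
              size: float) -> Set[Tuple[Hashable, Hashable]]:
    """Return all (query id, object id) pairs where the object is in the query."""
    # Query grid: each query is listed in every cell its rectangle overlaps.
    query_grid = defaultdict(list)
    for qid, (x1, y1, x2, y2) in queries.items():
        cx1, cy1 = cell_of(x1, y1, size)
        cx2, cy2 = cell_of(x2, y2, size)
        for cx in range(cx1, cx2 + 1):
            for cy in range(cy1, cy2 + 1):
                query_grid[(cx, cy)].append(qid)

    # Object grid: each object is hashed to exactly one cell.
    object_grid = defaultdict(list)
    for oid, (x, y) in objects.items():
        object_grid[cell_of(x, y, size)].append(oid)

    # Overlay: join only the cells that appear in both grids.
    result = set()
    for cell in object_grid.keys() & query_grid.keys():
        for oid in object_grid[cell]:
            x, y = objects[oid]
            for qid in query_grid[cell]:
                x1, y1, x2, y2 = queries[qid]
                if x1 <= x <= x2 and y1 <= y <= y2:
                    result.add((qid, oid))
    return result
```

Because a moving query or object only changes its cell registrations, both sides can be updated with the same cheap operation, which fits the slide's point about treating data and queries alike.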

  32. Scalability of Location-aware Continuous Queries: Location-aware Distributed Database Systems • Motivation: Centralized location-aware servers will have a bottleneck at the server side • Assumption: Moving objects have devices with the capability of doing some computations • Idea: • Server will ship some of its processing to the moving objects • Server will act as a mediator among moving objects • Implementation: Moving objects should welcome cooperation in such environments

  33. Scalability of Location-aware Continuous Queries: Location-aware Distributed Database Systems (Cont.) • Each moving object O maintains a list of the queries whose answers O may be part of • It is the responsibility of the moving object O to report when O becomes part of the answer of a certain query • Once a query updates its location, it sends the new location to the server, which will propagate the new location to the interested users • The server is responsible for determining which objects will be interested in which queries

  34. Scalability of Location-aware Continuous Queries: Location-aware Data Stream Management Systems • Motivation: Very high arrival rates that are beyond the system capability to store • Idea: Only store those objects that are likely to produce query results, i.e., only significant objects are stored, all other data are simply dropped • Significant objects: A moving object O is significant if there is at least one query that is interested in O's location • Challenge: Discovering that an object becomes insignificant

  35. Scalability of Location-aware Continuous Queries: Location-aware Data Stream Management Systems (Cont.) • Only significant objects are stored in-memory • An object is considered significant if it is either in the query area or the cache area • Due to the query and object movements, a stored object may become insignificant at any time • A larger cache area implies more storage overhead and a more accurate answer [Figure label: Cache Area]

  36. Scalability of Location-aware Continuous Queries: Location-aware Data Stream Management Systems (Cont.) • The first k objects are considered an initial answer • The K-NN query is reduced to a circular range query • However, the query area may shrink or grow [Figure label: K = 3]
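
A Python sketch of this reduction for a stationary query point over a stream of location reports. It is a simplification under my own assumptions: only the current k answer objects are stored, there is no cache area, and `k` is at least 1, so the answer can become approximate when a stored object moves away. `StreamingKNN` is a hypothetical name.

```python
import math
from typing import Dict, Hashable, Tuple

Point = Tuple[float, float]


class StreamingKNN:
    """k-NN over a location stream, maintained as a circular range query."""

    def __init__(self, center: Point, k: int) -> None:
        self.center = center
        self.k = k
        self.answer: Dict[Hashable, Point] = {}

    def _dist(self, pos: Point) -> float:
        return math.dist(self.center, pos)

    @property
    def radius(self) -> float:
        """Radius of the circular query area: distance to the farthest answer.

        Infinite until k objects have been seen.
        """
        if len(self.answer) < self.k:
            return math.inf
        return max(self._dist(p) for p in self.answer.values())

    def update(self, oid: Hashable, pos: Point) -> None:
        """Process one location report from the stream."""
        if oid in self.answer or len(self.answer) < self.k:
            # A stored object moved (the area may grow or shrink), or the
            # initial answer is still being filled with the first k objects.
            self.answer[oid] = pos
            return
        if self._dist(pos) < self.radius:
            # A new object entered the circle: evict the farthest, area shrinks.
            farthest = max(self.answer, key=lambda o: self._dist(self.answer[o]))
            del self.answer[farthest]
            self.answer[oid] = pos
        # Otherwise the object is outside the query area and is dropped.
```

With k = 3 and the query at the origin, reports at distances 1, 2, and 5 give a radius of 5; a fourth object at distance 3 evicts the farthest and shrinks the radius to 3; if the object at distance 2 then moves to distance 4, the radius grows to 4.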

  37. Scalability of Location-aware Continuous Queries: Location-aware Data Stream Management Systems (Cont.) [Figure: two designs compared. "Each query is a single thread": separate operators (Stationary Range, Moving kNN, Moving Range) over a Stream of Moving Objects, each emitting +/- updates to Q1, Q2, ..., QN. "One thread for all continuous queries": a Shared Spatio-temporal Join (Shared Operator) over a Stream of Moving Objects and a Stream of Moving Queries, with a Shared Memory Buffer among all continuous queries, followed by a Split emitting +/- updates to Q1, Q2, ..., QN]

  38. Scalability of Location-aware Continuous Queries: Location-aware Data Stream Management Systems (Cont.) • Query Load Shedding • Reduce the cache area • Possibly reduce the query area • Immediately drop insignificant tuples • Intuitive and simple to implement • Object Load Shedding • Objects that satisfy fewer than k queries are insignificant • Lazily drop insignificant tuples • Challenge I: How to choose k? • Challenge II: How to provide a lower bound for the query accuracy? [Figure: objects numbered 1 to 7; K = 2]

  39. Tutorial Outline • Location-aware Environments • Location-aware Snapshot Query Processing • Location-aware Continuous Query Processing • Scalable Execution of Continuous Queries • Location-aware Query Optimization • Uncertainty in Location-aware Query Processing • Case Study • Open Research Issues

  40. Location-aware Query Optimization • Spatio-temporal pipelinable query operators • Range queries • Nearest-neighbor queries • Selectivity estimation for spatio-temporal queries/operators • Spatio-temporal histograms • Sampling • Adaptive query optimization for continuous queries

  41. Spatio-temporal Query Operators • Existing approaches are built on top of the DBMS (at the application level) • Example: "Continuously report the trucks in this area": SELECT O.ID FROM Objects O WHERE O.type = truck INSIDE Area A • Scalar functions (stored procedure): the performance of scalar functions is limited • Spatio-temporal operators: only produce objects in the areas of interest [Figure label: Database Engine, shown once per approach]

  42. Spatio-temporal Query Operators • "Continuously report the Avis cars in a certain area": SELECT M.ObjectID FROM MovingObjects M, AvisCars A WHERE M.ID = A.ID INSIDE Region R [Figure: two query plans over AvisCars and Moving Objects, one labeled Scalar Function and one labeled Spatio-temporal Operators. Surviving labels: JOIN, INSIDE, +/-]

  43. Spatio-temporal Selectivity Estimation • Estimating the selectivity of spatio-temporal operators is crucial in determining the best plan for spatio-temporal queries • Example: SELECT ObjectID FROM MovingObjects M WHERE Type = Truck INSIDE Region R [Figure labels: SELECT, INSIDE, each shown twice]

  44. Spatio-temporal Histograms • Moving objects in D-dimensional space are mapped to 2D-dimensional histogram buckets [Figure axes: x, t]

  45. Spatio-temporal Histograms with Query Feedback • Estimating the selectivity of spatio-temporal operators is crucial in determining the best plan for spatio-temporal queries [Figure: feedback loop. Surviving labels: Query, Query Optimizer, Query plan, Query Executer, Feedback, Spatio-temporal Histogram. The histogram is shown as a grid of bucket values (6.25%, 6.01%, 6.98%) with an example query Q1 labeled 10%]

  46. Adaptive Query Optimization • Continuous queries last for a long time (hours, days, weeks) • Environment variables are likely to change • The initial decision for building a query plan may not be valid after a while • Need continuous optimization and the ability to change the query plan: • Training period: Spatio-temporal histogram, periodicity mining • Online detection of changes • Example: SELECT ObjectID FROM MovingObjects M WHERE Type = Truck INSIDE Region R [Figure labels: SELECT, INSIDE, Moving Objects, each shown twice]

  47. Tutorial Outline • Location-aware Environments • Location-aware Snapshot Query Processing • Location-aware Continuous Query Processing • Scalable Execution of Continuous Queries • Location-aware Query Optimizer • Uncertainty in Location-aware Query Processing • Case Study • Open Research Issues

  48. Uncertainty in Moving Objects • Location information from moving objects is inherently inaccurate • Sources of uncertainty: • Sampling. A moving object sends its location information once every t time units. Between any two consecutive locations, we have no information about the object's exact location • Reading accuracy. Location-aware devices do not provide the exact location • Object movement and network delay. By the time a certain reading is received by the server, the moving object has already changed its location

  49. Uncertainty in Moving Objects • Historical data (Trajectories) • Current data [Figure labels: T0, T0+ε0, T0+ε1, T0+ε2, T1]

  50. Uncertainty in Moving Objects: Error in Query Answer • Range Queries • Nearest Neighbor Queries
