Location-aware Query Processing and Optimization: A Tutorial Mohamed F. Mokbel Walid G. Aref Department of Computer Science and Engineering, University of Minnesota Minneapolis, MN, USA mokbel@cs.umn.edu Department of Computer Science, Purdue University West Lafayette, Indiana, USA. aref@cs.purdue.edu
Applications: Traffic Monitoring • How many cars are in the downtown area? • Send an alert if a non-friendly vehicle enters a restricted region • Report any congestion in the road network • Once an accident is discovered, immediately send an alarm to the nearest police cars and ambulances • Make sure that no two aircraft have nearby paths
Applications (Cont.): Location-based Store Finder / Advertisement • Where is my nearest gas station? • What are the fast food restaurants within 3 miles of my location? • Let me know if I am near a restaurant while any of my friends are there • Send E-coupons to all customers within 3 miles of my stores • Get me the list of all customers for whom I am the nearest restaurant
Location-based Database Servers [Figure: Layered Approach vs. Built-in Approach; diagram labels: GIS Interface, Spatio-temporal GIS, DBMS, ST Query Processing, ST-Index]
Variety of Location-aware Queries • Continuously report the number of cars on the freeway (Type: Range query; Time: Present; Duration: Continuous; Query: Stationary; Object: Moving) • What are my nearest McDonalds for the next hour? (Type: Nearest-Neighbor query; Time: Future; Duration: Continuous; Query: Moving; Object: Stationary) • Send E-coupons to all cars for which I am the nearest gas station (Type: Reverse NN query; Time: Present; Duration: Snapshot; Query: Stationary; Object: Moving) • What was the closest distance between Taxi A and me yesterday? (Type: Closest-point query; Time: Past; Duration: Snapshot; Query: Moving; Object: Moving)
Tutorial Outline • Location-aware Environments • Location-aware Snapshot Query Processing • Snapshot Past Queries • Snapshot Present Queries • Snapshot Future Queries • Spatio-temporal Access Methods • Location-aware Continuous Query Processing • Scalable Execution of Continuous Queries • Location-aware Query Optimization • Uncertainty in Location-aware Query Processing • Case Study • Open Research Issues
Location-aware Snapshot Query Processing: Querying the Past • Examples: • Querying Along the Temporal Dimension: What was the location of a certain object from 7:00 AM to 10:00 AM yesterday? • Querying Along the Spatial Dimension: Find all objects that were in a certain area at 7:00 AM yesterday • Querying Along the Spatio-temporal Dimension: Find all objects that were close to each other from 7:00 AM to 8:00 AM yesterday • Features: • Large number of historical trajectories • Persistent read-only data • The ability to query the spatial and/or temporal dimensions
Location-aware Snapshot Query Processing: Indexing the Time Dimension • Historical trajectories are represented by their three-dimensional Minimum Bounding Rectangles (MBRs), with time as the third dimension • A 3D-R-tree is used to index the MBRs • The technique is simple and easy to implement • Does not scale well • Does not provide efficient query support
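The 3D MBR idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Box3D` and `trajectory_mbr` and the sample data are hypothetical, and a real 3D-R-tree would store such boxes in tree nodes rather than test them one by one.

```python
from dataclasses import dataclass
from typing import List, Tuple

# One trajectory sample is (x, y, t); all names here are illustrative.
Sample = Tuple[float, float, float]


@dataclass(frozen=True)
class Box3D:
    lo: Tuple[float, float, float]  # (xmin, ymin, tmin)
    hi: Tuple[float, float, float]  # (xmax, ymax, tmax)

    def intersects(self, other: "Box3D") -> bool:
        # Two boxes intersect if their extents overlap in every dimension.
        return all(
            self.lo[i] <= other.hi[i] and other.lo[i] <= self.hi[i]
            for i in range(3)
        )


def trajectory_mbr(samples: List[Sample]) -> Box3D:
    """Smallest axis-aligned box in (x, y, t) that encloses all samples."""
    xs, ys, ts = zip(*samples)
    return Box3D((min(xs), min(ys), min(ts)), (max(xs), max(ys), max(ts)))


taxi = [(0.0, 0.0, 7.0), (4.0, 1.0, 8.0), (9.0, 6.0, 10.0)]
mbr = trajectory_mbr(taxi)

# "Which objects were in x in [5, 9], y in [0, 2] between 7:00 and 7:30?"
query = Box3D((5.0, 0.0, 7.0), (9.0, 2.0, 7.5))
candidate = mbr.intersects(query)
```

In this example the box test reports the taxi as a candidate even though, between 7:00 and 7:30, it was still near x = 0 to 2. The large empty volume inside one box per trajectory is one way to read the "does not provide efficient query support" bullet: candidates still need a refinement step against the actual trajectory.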
Location-aware Snapshot Query Processing: Multi-version Index Structures • Maintain an R-tree for each time instance • R-tree nodes that are not changed across consecutive time instances are linked together • A multi-version R-tree can be combined with a 3D-R-tree to support interval queries [Figure: R-trees at Timestamp 0 and Timestamp 1, alongside a 3D-R-tree]
Location-aware Snapshot Query Processing: Querying the Present • Time is always NOW • Example Queries: • Find the number of objects in a certain area • What is the current location of a certain object? • Features: • Continuously changing data • Real-time query support is required • Index structures should be update-tolerant • Present data is always accessed through continuous queries
Location-aware Snapshot Query Processing: Updating Index Structures • Traditional R-tree updates are top-down • Updates are translated into delete and insert transactions • To support frequent updates: • Updates can be managed in space without the need for deletions or insertions • Bottom-up approaches use auxiliary index structures (e.g., a hash based on the object identifier, OID) to locate the object
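A minimal sketch of the bottom-up idea, assuming fixed rectangular leaves as a stand-in for real R-tree leaf nodes. The class names `Leaf` and `BottomUpIndex` and the `top_down_ops` counter are hypothetical and only serve to show when the hash on OID avoids a delete-and-insert pair.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Tuple

Point = Tuple[float, float]


@dataclass
class Leaf:
    # Stand-in for an R-tree leaf: a fixed bounding rectangle plus entries.
    lo: Point
    hi: Point
    entries: Dict[int, Point] = field(default_factory=dict)

    def covers(self, p: Point) -> bool:
        return self.lo[0] <= p[0] <= self.hi[0] and self.lo[1] <= p[1] <= self.hi[1]


class BottomUpIndex:
    def __init__(self, leaves: Iterable[Leaf]):
        self.leaves = list(leaves)
        self.oid_to_leaf: Dict[int, Leaf] = {}  # auxiliary hash based on OID
        self.top_down_ops = 0                   # counts delete+insert style updates

    def _find_leaf(self, p: Point) -> Leaf:
        # Top-down search, simplified here to a scan over the leaves.
        for leaf in self.leaves:
            if leaf.covers(p):
                return leaf
        raise ValueError("point outside the indexed space")

    def update(self, oid: int, p: Point) -> None:
        leaf = self.oid_to_leaf.get(oid)
        if leaf is not None and leaf.covers(p):
            leaf.entries[oid] = p       # bottom-up: overwrite inside the same leaf
            return
        if leaf is not None:
            del leaf.entries[oid]       # object left its leaf: delete ...
        target = self._find_leaf(p)     # ... and re-insert from the top
        target.entries[oid] = p
        self.oid_to_leaf[oid] = target
        self.top_down_ops += 1


index = BottomUpIndex([Leaf((0, 0), (10, 10)), Leaf((10, 0), (20, 10))])
index.update(7, (2.0, 3.0))    # first insert goes top-down
index.update(7, (4.0, 3.5))    # same leaf: handled through the hash
index.update(7, (12.0, 3.5))   # crosses into the other leaf
```

Only two of the three updates touch the top-down path; the middle one is resolved directly through the OID hash.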
Location-aware Snapshot Query Processing: Update Memos • Keep a memo with the R-tree • The memo contains the recent updates to the existing R-tree • The query answer returned from the R-tree should be passed through the memo • The update memo is reflected to the R-tree once the relevant disk page is retrieved [Figure: spatio-temporal queries, raw answer set, update memo, final answer set]
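The memo filtering step can be illustrated as follows. This is a simplified sketch, not the tutorial's data structure: a plain dictionary stands in for the R-tree contents, the memo keeps only the latest position per object, and `flush` applies the whole memo at once instead of page by page. The names `MemoIndex`, `range_query`, and `flush` are hypothetical.

```python
from typing import Dict, Set, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def inside(p: Point, r: Rect) -> bool:
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


class MemoIndex:
    def __init__(self, indexed: Dict[int, Point]):
        self.indexed = dict(indexed)      # stand-in for the (possibly stale) R-tree
        self.memo: Dict[int, Point] = {}  # recent updates not yet applied

    def update(self, oid: int, p: Point) -> None:
        self.memo[oid] = p                # cheap: no index maintenance

    def range_query(self, r: Rect) -> Set[int]:
        # Raw answer set, computed on the stale index.
        raw = {oid for oid, p in self.indexed.items() if inside(p, r)}
        # Pass the raw answer through the memo: drop superseded entries ...
        final = {oid for oid in raw if oid not in self.memo}
        # ... and add objects whose latest position qualifies.
        final |= {oid for oid, p in self.memo.items() if inside(p, r)}
        return final

    def flush(self) -> None:
        # Reflect the memo in the index (all at once in this sketch).
        self.indexed.update(self.memo)
        self.memo.clear()


idx = MemoIndex({1: (1.0, 1.0), 2: (5.0, 5.0)})
idx.update(1, (8.0, 8.0))   # object 1 moved away
idx.update(3, (2.0, 2.0))   # object 3 is new
before_flush = idx.range_query((0.0, 0.0, 3.0, 3.0))
idx.flush()
after_flush = idx.range_query((0.0, 0.0, 3.0, 3.0))
```

The answer is the same before and after the flush, which is the point of the memo: updates stay cheap while queries remain correct.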
Location-aware Snapshot Query Processing: Querying the Future • Examples: • What will my nearest restaurant be in 30 minutes? • Does my path conflict with that of any other car in the next hour? • Features: • Predict the movement through a velocity vector • A prediction may be valid only for a limited time horizon in the future
Location-aware Snapshot Query Processing: Duality Transformation • A line (trajectory) in the two-dimensional space can be transformed into a point in another, dual two-dimensional space • Trajectory: x(t) = vt + a; Point: (v, a) • All queries need to be transformed into the dual space • Rectangular queries are represented as polygons
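As a small worked example of the transformation, the sketch below maps one-dimensional trajectories x(t) = vt + a to dual points (v, a) and evaluates a query of the form "which objects are in [x1, x2] at time tq?" directly in the dual space. The function names and the sample objects are illustrative assumptions; the condition x1 <= v*tq + a <= x2 follows from the trajectory equation and describes a strip between two parallel lines in the (v, a) plane.

```python
from typing import Dict, Tuple

DualPoint = Tuple[float, float]  # (v, a) for the trajectory x(t) = v*t + a


def to_dual(x0: float, t0: float, v: float) -> DualPoint:
    """Map an object seen at position x0 at time t0 with velocity v to (v, a)."""
    return (v, x0 - v * t0)


def in_dual_query(p: DualPoint, x1: float, x2: float, tq: float) -> bool:
    """True if x1 <= v*tq + a <= x2, i.e. the dual point lies between
    the lines a = x1 - v*tq and a = x2 - v*tq."""
    v, a = p
    return x1 - v * tq <= a <= x2 - v * tq


objects: Dict[str, DualPoint] = {
    "car": to_dual(0.0, 0.0, 2.0),    # x(t) = 2t
    "bus": to_dual(10.0, 0.0, -1.0),  # x(t) = -t + 10
    "van": to_dual(50.0, 0.0, 0.0),   # parked at x = 50
}

# Who will be in [4, 8] at time 3?
hits = sorted(name for name, p in objects.items() if in_dual_query(p, 4.0, 8.0, 3.0))
```

A query over a time interval rather than a single instant is the union of such strips, which is why rectangular queries turn into polygons in the dual space.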
Location-aware Snapshot Query Processing: Time-Parameterized Data Structures • A bounding rectangle with an MBR and a VBR is guaranteed to contain all its moving objects as long as they maintain their velocity vectors • High degree of overlap when the velocity vectors are not updated • The Time-parameterized R-tree (TPR-tree) consists of: • Minimum bounding rectangles (MBR) • Velocity bounding rectangles (VBR)
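To make the MBR plus VBR combination concrete, here is a minimal sketch of a time-parameterized bounding rectangle in two dimensions. It assumes objects move linearly with constant velocity after a reference time; the names `MovingPoint`, `TPBR`, and `build_tpbr` are hypothetical, and the tightening heuristics of an actual TPR-tree are not modeled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MovingPoint:
    x: float
    y: float
    vx: float
    vy: float

    def at(self, dt: float) -> Tuple[float, float]:
        return (self.x + self.vx * dt, self.y + self.vy * dt)


@dataclass(frozen=True)
class TPBR:
    t_ref: float
    lo: Tuple[float, float]   # MBR lower corner at t_ref
    hi: Tuple[float, float]   # MBR upper corner at t_ref
    vlo: Tuple[float, float]  # VBR: minimum velocity per dimension
    vhi: Tuple[float, float]  # VBR: maximum velocity per dimension

    def at(self, t: float) -> Tuple[Tuple[float, ...], Tuple[float, ...]]:
        # Lower edges move with the minimum velocity, upper edges with the maximum.
        dt = t - self.t_ref
        lo = tuple(l + v * dt for l, v in zip(self.lo, self.vlo))
        hi = tuple(h + v * dt for h, v in zip(self.hi, self.vhi))
        return lo, hi


def build_tpbr(objs: List[MovingPoint], t_ref: float) -> TPBR:
    return TPBR(
        t_ref,
        (min(o.x for o in objs), min(o.y for o in objs)),
        (max(o.x for o in objs), max(o.y for o in objs)),
        (min(o.vx for o in objs), min(o.vy for o in objs)),
        (max(o.vx for o in objs), max(o.vy for o in objs)),
    )


objs = [MovingPoint(0.0, 0.0, 1.0, 0.0), MovingPoint(2.0, 1.0, -1.0, 0.5)]
box = build_tpbr(objs, t_ref=0.0)
lo4, hi4 = box.at(4.0)
contained = all(
    lo4[0] <= px <= hi4[0] and lo4[1] <= py <= hi4[1]
    for px, py in (o.at(4.0) for o in objs)
)
```

In this example the rectangle grows from 2 x 1 at time 0 to 10 x 3 at time 4 while still enclosing both objects, which illustrates why overlap increases when velocities are not refreshed.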
Location-aware Snapshot Query Processing: Indexing Past, Present, and Future • A unified index structure for past, present, and future data • Makes use of the partial-persistent R-tree (PPR-tree) for past data and the TPR-tree for current and future data • Double time-parameterized bounding rectangles (double TPBRs) are used to bound moving objects. A double TPBR has two components: • Tail MBR that starts at the time of the last update and extends to infinity. The tail is a regular TPBR of the TPR-tree • Head MBR that bounds the finite historical trajectories. The head is an optimized TPBR • Querying is similar to a regular PPR-tree search, except that the intersection function is redefined to accommodate the double TPBR
Spatio-temporal Access Methods [Figure: overview of spatio-temporal access methods; labels: RPPF-tree; color legend: Red = Future, Blue = Past, Green = Present, Brown = All]
Tutorial Outline • Location-aware Environments • Location-aware Snapshot Query Processing • Location-aware Continuous Query Processing • Continuous Queries Vs. Snapshot Queries • Approaches for Continuous Query Evaluation • Scalable Execution of Continuous Queries • Location-aware Query Optimizer • Uncertainty in Location-aware Query Processing • Case Study • Open Research Issues
Snapshot vs. Continuous Query Processing • Traditional (Snapshot) Queries • Continuous Queries [Figure: diagrams with Query, Data, and Answer labels for each case]
Location-aware Continuous Query Processing: Approaches • Straightforward Approach • Abstract a continuous query as a series of snapshot queries evaluated periodically • Result Validation • Result Caching • Result Prediction • Incremental Evaluation
Location-aware Continuous Query Processing: Result Validation • Associate a validation condition with each query answer • Valid time (t): • The query answer is valid for the next t time units • Valid region (R): • The query answer is valid as long as you are within a region R • It is challenging to maintain the computation of valid time/region for querying moving objects • Once the associated validation condition expires, the query is reevaluated
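One simple way to obtain a valid region, shown here as a sketch under stated assumptions: for a moving nearest-neighbor query over stationary objects, if the nearest object is at distance d1 and the second nearest at d2, the answer cannot change while the query stays within (d2 - d1) / 2 of the point where it was evaluated (by the triangle inequality). This circle is a conservative valid region, not the exact one, and the function names are hypothetical.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def nn_with_valid_radius(q: Point, objects: Dict[str, Point]) -> Tuple[str, float]:
    """Return the nearest object and a radius around q within which it stays nearest.

    Assumes stationary objects and at least two of them.
    """
    ranked = sorted(objects, key=lambda name: math.dist(q, objects[name]))
    d1 = math.dist(q, objects[ranked[0]])
    d2 = math.dist(q, objects[ranked[1]])
    return ranked[0], (d2 - d1) / 2.0


def still_valid(q0: Point, q_now: Point, radius: float) -> bool:
    """The cached answer holds while the query stays inside the valid region."""
    return math.dist(q0, q_now) <= radius


stations = {"A": (1.0, 0.0), "B": (5.0, 0.0), "C": (0.0, 9.0)}
q0 = (0.0, 0.0)
nearest, radius = nn_with_valid_radius(q0, stations)

keep_answer = still_valid(q0, (1.5, 0.0), radius)   # moved 1.5 units
must_requery = not still_valid(q0, (3.5, 0.0), radius)  # moved 3.5 units
```

When the condition expires, as in the second check, the query is simply reevaluated at the new position and a new radius is computed.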
Location-aware Continuous Query Processing: Caching the Result • Observation: Consecutive evaluations of a continuous query yield very similar results • Idea: Upon evaluation of a continuous query, retrieve more data that can be used later • K-NN query • Initially, retrieve more than k • Range query • Evaluate the query with a larger range • How much do we need to pre-compute? • How do we do re-caching?
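The "retrieve more than k" idea can be sketched as follows, assuming stationary objects and a moving query. The cache holds the m > k nearest objects around an anchor point; at a new query position at distance delta from the anchor, the k nearest cached objects are a correct answer whenever (distance to the k-th of them) + delta does not exceed the distance from the anchor to the m-th cached object. The class name `CachedKnn` and the `server_calls` counter are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


class CachedKnn:
    """k-NN over stationary objects with an over-fetched cache of m > k neighbors."""

    def __init__(self, objects: Dict[str, Point], k: int, m: int):
        assert m > k
        self.objects, self.k, self.m = objects, k, m
        self.anchor: Optional[Point] = None  # where the cache was computed
        self.cache: List[str] = []
        self.cache_radius = 0.0              # anchor-to-m-th-neighbor distance
        self.server_calls = 0

    def _dist(self, q: Point, name: str) -> float:
        return math.dist(q, self.objects[name])

    def _fetch(self, q: Point) -> None:
        # Full evaluation: retrieve the m nearest objects around q.
        self.server_calls += 1
        self.cache = sorted(self.objects, key=lambda n: self._dist(q, n))[: self.m]
        self.anchor = q
        self.cache_radius = self._dist(q, self.cache[-1])

    def query(self, q: Point) -> List[str]:
        if self.anchor is not None:
            delta = math.dist(q, self.anchor)
            local = sorted(self.cache, key=lambda n: self._dist(q, n))[: self.k]
            # Every uncached object is at least cache_radius - delta away from q.
            if self._dist(q, local[-1]) + delta <= self.cache_radius:
                return local
        self._fetch(q)
        return self.cache[: self.k]


objects = {"a": (1, 0), "b": (2, 0), "c": (3, 0), "d": (4, 0), "e": (10, 0)}
knn = CachedKnn(objects, k=2, m=4)
r1 = knn.query((0, 0))   # first evaluation fills the cache
r2 = knn.query((1, 0))   # small move: answered from the cache
calls_after_r2 = knn.server_calls
r3 = knn.query((6, 0))   # large move: cache no longer sufficient
```

Choosing m is the trade-off raised by the slide's open questions: a larger m means fewer re-evaluations but more data retrieved each time.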
Location-aware Continuous Query Processing: Predicting the Result • Given a future trajectory movement, the query answer can be pre-computed in advance • The trajectory movement is divided into N intervals, each with its own query answer A_i • The query is evaluated once (as a snapshot query), yet the answer is valid for longer time periods • Once the trajectory changes, the query is reevaluated [Figure: Nearest-Neighbor Query along a trajectory]
Location-aware Continuous Query Processing: Incremental Evaluation • The query is evaluated only once. Then, only the updates of the query answer are evaluated • There are two types of updates: positive (+) and negative (-) updates to the query result • Only the objects that cross the query boundary are taken into account • Need to continuously listen for notifications that an object has crossed the query boundary
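A minimal sketch of positive and negative updates for a stationary range query over moving objects. The class name `IncrementalRangeQuery` and the tuple encoding of updates are illustrative assumptions; the point is that an object produces output only when it crosses the query boundary.

```python
from typing import List, Optional, Set, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def inside(p: Point, r: Rect) -> bool:
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


class IncrementalRangeQuery:
    def __init__(self, rect: Rect):
        self.rect = rect
        self.answer: Set[int] = set()

    def on_update(self, oid: int, p: Point) -> Optional[Tuple[str, int]]:
        """Return ('+', oid) or ('-', oid) on a boundary crossing, else None."""
        now_in = inside(p, self.rect)
        was_in = oid in self.answer
        if now_in and not was_in:
            self.answer.add(oid)
            return ("+", oid)
        if was_in and not now_in:
            self.answer.discard(oid)
            return ("-", oid)
        return None  # no change to the answer, nothing is sent


query = IncrementalRangeQuery((4.0, 4.0, 10.0, 10.0))
stream = [
    (1, (0.0, 0.0)),    # outside, no output
    (1, (5.0, 5.0)),    # enters: positive update
    (2, (6.0, 6.0)),    # enters: positive update
    (1, (7.0, 7.0)),    # moves inside, no output
    (1, (20.0, 20.0)),  # leaves: negative update
]
events: List[Tuple[str, int]] = []
for oid, p in stream:
    event = query.on_update(oid, p)
    if event is not None:
        events.append(event)
```

Five location reports yield only three answer updates here; the client can maintain the full answer by applying them in order.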
Tutorial Outline • Location-aware Environments • Location-aware Snapshot Query Processing • Location-aware Continuous Query Processing • Scalable Execution of Continuous Queries • Location-aware Centralized Database Systems • Location-aware Distributed Database Systems • Location-aware Data Stream Management Systems • Location-aware Query Optimizer • Uncertainty in Location-aware Query Processing • Case Study • Open Research Issues
Scalability of Location-aware Continuous Queries: Motivation [Figure: a location-aware database server receiving the following continuous queries] • Continuous K-NN Query: Keep me updated on my nearest 3 hospitals • Continuous Range Query: How many cars are in the highlighted area? • Continuous K-NN Query: Make sure that the nearest 3 airplanes are FRIENDLY • Continuous Range Query: Alert me if there are fewer than 3 police cars within 5 miles • Continuous Range Query: Monitor the traffic in the red areas
Scalability of Location-aware Continuous Queries: Main Concepts • Continuous queries remain active at the server for long periods • While a query is active at the server, other queries will be submitted • Shared execution among multiple queries • Should we index data OR queries? • Data and queries may be stationary or moving • Data and queries are of large size • Data and queries arrive at the system at very high rates • Treat data and queries similarly • Are queries coming to data OR data coming to queries? • Both data and queries are subjected to each other • Join data with queries
Scalability of Location-aware Continuous Queries: Main Concepts (Cont.) • Evaluating a large number of concurrent continuous spatio-temporal queries is abstracted as a spatio-temporal join between moving objects and moving queries [Figure: left, "Each query is a single thread": ST Query 1 ... ST Query N, each over its own data index (D-Index) on the data objects; right, "One thread for all continuous queries": a shared ST join between a data index (D-Index) and a query index (Q-Index), followed by a split to Q1 ... QN]
Scalability of Location-aware Continuous Queries: Location-aware Centralized Database Systems • Centralized index structures • Index the queries instead of data • Valid only for stationary queries [Figure: moving objects and (stationary) ST queries in an R-tree index structure]
Scalability of Location-aware Continuous Queries: Location-aware Centralized Database Systems (Cont.) • To accommodate the continuous movement of both data and queries: • Concurrent continuous queries share a grid structure • Moving objects are hashed to the same grid structure as queries • The spatio-temporal join is done by overlaying the two grid structures
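The grid overlay can be sketched as a cell-by-cell join. This is an illustration only: the cell size `CELL`, the dictionary-based grids, and the function names are assumptions, range queries are axis-aligned rectangles, and the grids are rebuilt from scratch rather than maintained incrementally as objects and queries move.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Cell = Tuple[int, int]

CELL = 10.0  # assumed grid cell width and height


def cell_of(p: Point) -> Cell:
    return (int(p[0] // CELL), int(p[1] // CELL))


def cells_of(r: Rect) -> List[Cell]:
    # All grid cells overlapped by a rectangular query.
    (cx0, cy0), (cx1, cy1) = cell_of((r[0], r[1])), cell_of((r[2], r[3]))
    return [(cx, cy) for cx in range(cx0, cx1 + 1) for cy in range(cy0, cy1 + 1)]


def inside(p: Point, r: Rect) -> bool:
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


def grid_join(objects: Dict[int, Point], queries: Dict[str, Rect]) -> Dict[str, Set[int]]:
    query_grid: Dict[Cell, List[str]] = defaultdict(list)
    for qid, rect in queries.items():
        for c in cells_of(rect):
            query_grid[c].append(qid)

    object_grid: Dict[Cell, List[int]] = defaultdict(list)
    for oid, p in objects.items():
        object_grid[cell_of(p)].append(oid)

    # Overlay: only objects and queries that share a cell are compared.
    answers: Dict[str, Set[int]] = {qid: set() for qid in queries}
    for c, oids in object_grid.items():
        for qid in query_grid.get(c, ()):
            for oid in oids:
                if inside(objects[oid], queries[qid]):
                    answers[qid].add(oid)
    return answers


queries = {"Q1": (5.0, 5.0, 15.0, 15.0), "Q2": (20.0, 0.0, 29.0, 9.0)}
objects = {1: (6.0, 6.0), 2: (12.0, 14.0), 3: (25.0, 5.0), 4: (40.0, 40.0), 5: (16.0, 16.0)}
answers = grid_join(objects, queries)
```

Object 4 is never compared with any query because no query is registered in its cell, which is where the shared grid saves work.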
Scalability of Location-aware Continuous Queries: Location-aware Distributed Database Systems • Motivation: Centralized location-aware servers will have a bottleneck at the server side • Assumption: Moving objects have devices capable of doing some computation • Idea: • Server will ship some of its processing to the moving objects • Server will act as a mediator among moving objects • Implementation: Moving objects should welcome cooperation in such environments
Scalability of Location-aware Continuous Queries: Location-aware Distributed Database Systems (Cont.) • Each moving object O maintains a list of the queries whose answers O may be part of • It is the responsibility of the moving object O to report when O becomes part of the answer of a certain query • Once a query updates its location, it sends the new location to the server, which propagates the new location to the interested objects • The server is responsible for determining which objects will be interested in which queries
Scalability of Location-aware Continuous Queries: Location-aware Data Stream Management Systems • Motivation: Very high arrival rates that are beyond the system's capacity to store • Idea: Only store those objects that are likely to produce query results, i.e., only significant objects are stored, and all other data are simply dropped • Significant objects: A moving object O is significant if there is at least one query that is interested in O’s location • Challenge: Discovering that an object has become insignificant
Scalability of Location-aware Continuous Queries: Location-aware Data Stream Management Systems (Cont.) • Only significant objects are stored in memory • An object is considered significant if it is either in the query area or in the cache area • Due to query and object movements, a stored object may become insignificant at any time • A larger cache area means more storage overhead and a more accurate answer [Figure: query area surrounded by a cache area]
Scalability of Location-aware Continuous Queries: Location-aware Data Stream Management Systems (Cont.) • The first k objects are considered an initial answer • A K-NN query is reduced to a circular range query; however, the query area may shrink or grow [Figure: example with K = 3]
Scalability of Location-aware Continuous Queries: Location-aware Data Stream Management Systems (Cont.) [Figure: left, "Each query is a single thread": Q1, Q2, ..., QN, each over its own stream of moving objects and producing +/- updates; right, "One thread for all continuous queries": a stream of moving objects and a stream of moving queries feed a shared spatio-temporal join with a shared memory buffer among all continuous queries, followed by a shared split operator that sends +/- updates to stationary range, moving kNN, and moving range queries]
Scalability of Location-aware Continuous Queries: Location-aware Data Stream Management Systems (Cont.) • Query Load Shedding • Reduce the cache area • Possibly reduce the query area • Immediately drop insignificant tuples • Intuitive and simple to implement • Object Load Shedding • Objects that satisfy fewer than k queries are insignificant • Lazily drop insignificant tuples • Challenge I: How to choose k? • Challenge II: How to provide a lower bound for the query accuracy? [Figure: example with K = 2 and objects numbered 1 to 7]
Tutorial Outline • Location-aware Environments • Location-aware Snapshot Query Processing • Location-aware Continuous Query Processing • Scalable Execution of Continuous Queries • Location-aware Query Optimization • Uncertainty in Location-aware Query Processing • Case Study • Open Research Issues
Location-aware Query Optimization • Spatio-temporal pipelinable query operators • Range queries • Nearest-neighbor queries • Selectivity estimation for spatio-temporal queries/operators • Spatio-temporal histograms • Sampling • Adaptive query optimization for continuous queries
Spatio-temporal Query Operators • Existing approaches are built on top of the DBMS (at the application level) • Example: "Continuously report the trucks in this area": SELECT O.ID FROM Objects O WHERE O.type = truck INSIDE Area A • Scalar functions (stored procedure): the performance of scalar functions is limited • Spatio-temporal operators: only produce objects in the areas of interest [Figure: database engine with scalar functions vs. database engine with spatio-temporal operators]
Spatio-temporal Query Operators • “Continuously report the Avis cars in a certain area”: SELECT M.ObjectID FROM MovingObjects M, AvisCars A WHERE M.ID = A.ID INSIDE Region R [Figure: two query plans that JOIN AvisCars with Moving Objects, one labeled Scalar Function and one labeled Spatio-temporal Operators; the INSIDE operator in the latter emits +/- updates]
Spatio-temporal Selectivity Estimation • Estimating the selectivity of spatio-temporal operators is crucial in determining the best plan for spatio-temporal queries • Example: SELECT ObjectID FROM MovingObjects M WHERE Type = Truck INSIDE Region R [Figure: two candidate plans built from the SELECT and INSIDE operators]
Spatio-temporal Histograms • Moving objects in D-dimensional space are mapped to 2D-dimensional histogram buckets [Figure: axes x and t]
Spatio-temporal Histograms with Query Feedback • Estimating the selectivity of spatio-temporal operators is crucial in determining the best plan for spatio-temporal queries [Figure: feedback loop among Query, Query Optimizer, Query plan, Query Executor, Feedback, and Spatio-temporal Histogram; histogram buckets show values of 6.25%, 6.98%, and 6.01%; query Q1 is labeled 10%]
Adaptive Query Optimization • Continuous queries last for a long time (hours, days, weeks) • Environment variables are likely to change • The initial decision for building a query plan may not be valid after a while • Need continuous optimization and the ability to change the query plan: • Training period: Spatio-temporal histogram, periodicity mining • Online detection of changes • Example: SELECT ObjectID FROM MovingObjects M WHERE Type = Truck INSIDE Region R [Figure: two alternative plans over Moving Objects built from the SELECT and INSIDE operators]
Tutorial Outline • Location-aware Environments • Location-aware Snapshot Query Processing • Location-aware Continuous Query Processing • Scalable Execution of Continuous Queries • Location-aware Query Optimizer • Uncertainty in Location-aware Query Processing • Case Study • Open Research Issues
Uncertainty in Moving Objects • Location information from moving objects is inherently inaccurate • Sources of uncertainty: • Sampling. A moving object sends its location information once every t time units. Between any two consecutive readings, the object’s exact location is unknown • Reading accuracy. Location-aware devices do not provide the exact location • Object movement and network delay. By the time a certain reading is received by the server, the moving object has already changed its location
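The sampling uncertainty can be made concrete with one extra assumption that is not on the slide: a known maximum speed `vmax`. Under that assumption, an object sampled at p1 (time t1) and p2 (time t2) can have visited a point p in between only if the detour through p fits in the travel budget vmax * (t2 - t1); geometrically this is an ellipse with foci p1 and p2. The function name below is hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def possibly_visited(p: Point, p1: Point, t1: float, p2: Point, t2: float, vmax: float) -> bool:
    """True if an object bounded by speed vmax could have passed through p
    between the samples (p1, t1) and (p2, t2)."""
    return math.dist(p1, p) + math.dist(p, p2) <= vmax * (t2 - t1)


# Samples 10 time units apart, 6 distance units apart, maximum speed 1.
p1, t1 = (0.0, 0.0), 0.0
p2, t2 = (6.0, 0.0), 10.0

on_the_way = possibly_visited((3.0, 0.0), p1, t1, p2, t2, vmax=1.0)
small_detour = possibly_visited((3.0, 3.0), p1, t1, p2, t2, vmax=1.0)
too_far = possibly_visited((3.0, 5.0), p1, t1, p2, t2, vmax=1.0)
```

A range query that overlaps this region but misses the straight line between the two samples can therefore only be answered with "possibly", which is one way errors enter query answers.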
Uncertainty in Moving Objects • Historical data (trajectories) • Current data [Figure: time labels T0, T0+ε0, T0+ε1, T0+ε2, T1]
Uncertainty in Moving Objects: Error in Query Answer • Range Queries • Nearest Neighbor Queries