Probabilistic Data Management
Probabilistic Data Management Chapter 5: Probabilistic Query Answering (3)
Objectives • In this chapter, you will: • Learn the definition and query processing techniques of a probabilistic query type • Probabilistic Reverse Nearest Neighbor Query
Recall: Probabilistic Query Types • Uncertain/probabilistic database • Probabilistic Spatial Query • Probabilistic range query • Probabilistic k-nearest neighbor query • Probabilistic group nearest neighbor (PGNN) query • Probabilistic reverse k-nearest neighbor query • Probabilistic spatial join / similarity join • Probabilistic Preference Query • Probabilistic top-k query (or ranked query) • Probabilistic skyline query • Probabilistic reverse skyline query
Probabilistic Reverse Nearest Neighbor Queries in Uncertain Databases Very Large Data Bases Journal (VLDBJ), 2009
Outline • Introduction • Related Work • Problem Definition • PRNN Query Processing • Experimental Evaluation • Summary
Reverse Nearest Neighbor Query (RNN) • Rescue tasks in oceans • In the case of emergency, a ship will ask its nearest ship for help • A rescue ship needs to monitor those ships that have itself as their nearest neighbors • In other words, the rescue ship needs to obtain its reverse nearest neighbors (RNNs)
Introduction • Reverse Nearest Neighbor Query (RNN) • Given a database D and a query object q, an RNN query retrieves those data objects o ∈ D that have q as their nearest neighbor • [Figure: query object q and data objects o1 to o5]
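The RNN definition above can be illustrated with a brute-force check on certain (precise) points. This is a minimal sketch, not the TPL approach discussed on the following slides; the function name `rnn_brute_force` is hypothetical, and it assumes that ties in distance still count q as a nearest neighbor.

```python
import math

def rnn_brute_force(points, q):
    """Return the points that have q as their nearest neighbor.

    Assumption: on a distance tie, q still counts as a nearest neighbor.
    """
    result = []
    for i, o in enumerate(points):
        d_q = math.dist(o, q)
        # o is an RNN of q if no other data point is strictly closer to o than q is.
        if all(math.dist(o, p) >= d_q for j, p in enumerate(points) if j != i):
            result.append(o)
    return result

# Example: only (1, 0) has q as its nearest neighbor; the other two points
# are closer to each other than to q.
print(rnn_brute_force([(1, 0), (5, 0), (5.5, 0)], q=(0, 0)))
```

The check costs O(N) per point, so O(N^2) overall, which is what motivates the pruning-based methods.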
RNN Processing on Certain Data Points • TPL Approach [VLDB'04] • [Figure: query point q, data points o1 to o5, RNN candidates, and the pruning region]
Probabilistic Reverse Nearest Neighbor Query (PRNN) • Due to the limited accuracy of positioning devices (e.g., GPS) or the movement of ships, the reported positions of ships are imprecise • Therefore, it is important to answer RNN queries over uncertain data effectively and efficiently
Another Application of PRNN • Mixed-reality game • Each player tends to shoot his/her nearest neighbor • A query player needs to monitor those players (RNNs) who have himself/herself as their nearest neighbor • Due to the movement of players, their positions can be imprecise and uncertain, so the RNN query is conducted on uncertain objects
PRNN Definition • Probabilistic Reverse Nearest Neighbor (PRNN) Queries
A Straightforward Method • For every uncertain object o in the database • Sequentially scan all the objects in the database • Calculate the PRNN probability, P_PRNN(q, o), that o is an RNN of q • If P_PRNN(q, o) is greater than or equal to the probabilistic threshold α, then o is an answer; otherwise, o is discarded • Analysis • Complexity: O(N^2), where N is the database size • The computation of the probability P_PRNN(q, o) is very costly
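The linear scan above can be sketched as follows. The transcript does not preserve the formal PRNN definition, so this sketch makes several assumptions: each uncertain object is a (center, radius) pair in 2D, positions are uniformly distributed in the disk, objects are independent, and P_PRNN(q, o) is estimated by Monte Carlo sampling rather than computed exactly. All function names are hypothetical.

```python
import math
import random

def sample_in_disk(center, radius, rng):
    # Uniform sample in a 2D disk; sqrt on the radial draw keeps the density uniform.
    r = radius * math.sqrt(rng.random())
    theta = 2 * math.pi * rng.random()
    return (center[0] + r * math.cos(theta), center[1] + r * math.sin(theta))

def estimate_prnn_probability(q, o, others, trials=2000, seed=0):
    """Monte Carlo estimate of the probability that q is the nearest neighbor of o."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        pos_o = sample_in_disk(o[0], o[1], rng)
        d_q = math.dist(pos_o, q)
        # In this sampled world, o is an RNN of q if no other object is strictly closer.
        if all(math.dist(pos_o, sample_in_disk(c, r, rng)) >= d_q for c, r in others):
            hits += 1
    return hits / trials

def prnn_linear_scan(q, objects, alpha, trials=2000):
    """Return indices of objects whose estimated PRNN probability is >= alpha."""
    answers = []
    for i, o in enumerate(objects):
        others = objects[:i] + objects[i + 1:]  # scan all other objects: O(N) per object
        if estimate_prnn_probability(q, o, others, trials) >= alpha:
            answers.append(i)
    return answers
```

Each probability estimate touches every other object in every trial, which makes the quadratic cost and the expense of the probability computation visible.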
Pruning Techniques • Geometric Pruning (GP) • GP0 method • The object distribution in the uncertainty region can be either known or unknown • Prune those data objects that definitely cannot be RNNs of q • GPβ method (β ∈ (0, 1]) • The object distribution in the uncertainty region is known and pre-computation is allowed • Prune those objects with PRNN probability smaller than β
Heuristics of GP0 Method • Data objects always reside within their uncertainty regions • [Figure: conservative pruning region (CPR)]
Heuristics of GP0 Method (cont.) • No false dismissals are introduced by the hypersphere approximation • [Figure: candidate o]
Conditions of GP0 Method • Pruning Conditions • dist(P, q) - dist(P, C_o) > r_o • mindist(P, D) ≥ r_p • In other words, if object p is fully contained in the pruning region CPR'(q, o), then p can be safely pruned
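The first pruning condition can be sketched as a point test. The sketch assumes the uncertainty region of candidate o is a hypersphere with center `c_o` and radius `r_o`, and that P is a single point (for example, the center of p); the function name is hypothetical. By the triangle inequality, any position x of o satisfies dist(P, x) ≤ dist(P, C_o) + r_o, so when the condition holds, P is closer to every possible position of o than to q, and q cannot be the nearest neighbor of P.

```python
import math

def in_pruning_region(P, q, c_o, r_o):
    """Check the first GP0 condition: dist(P, q) - dist(P, C_o) > r_o.

    Assumption: o's uncertainty region is the hypersphere (c_o, r_o).
    """
    return math.dist(P, q) - math.dist(P, c_o) > r_o

q, c_o, r_o = (0.0, 0.0), (10.0, 0.0), 1.0
print(in_pruning_region((12.0, 0.0), q, c_o, r_o))  # 12 - 2 = 10 > 1
print(in_pruning_region((5.0, 0.0), q, c_o, r_o))   # 5 - 5 = 0, not > 1
```

This test covers one point only; pruning a whole uncertain object p additionally requires that its entire uncertainty region lies inside the pruning region, which is what the second condition addresses.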
Heuristics of GPβ Method (β ∈ (0, 1]) • GPβ prunes those objects with PRNN probability smaller than β (< α) • [Figure: candidate o; p can be pruned by GPβ]
Refinement Phase • After applying geometric pruning methods, we can obtain a candidate set • For each candidate o, we retrieve those uncertain objects p' intersecting with PR and compute the probability that o is an RNN of q
PRNN Query Processing • Maintain a multidimensional index structure over the uncertain database // indexing phase • For each PRNN query • Apply geometric pruning methods during the index traversal // pruning phase • Refine candidates and return the answer set // refinement phase
PRNN Query Processing • Index uncertain data with an R-tree
PRNN Query Procedure • Traverse the R-tree index by maintaining a minimum heap (keyed by the minimum distance from the query point to the node) • For each node/object N_i we encounter • Check whether or not N_i can be pruned by the GP methods • If the answer is no, then we either further check the children of node N_i, or add it to a PRNN candidate set S_cand in case N_i is an object • After the index traversal, we refine the candidates in S_cand by calculating their actual PRNN probabilities
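The traversal above can be sketched with Python's `heapq`. This is a simplified illustration, not the authors' implementation: `Node` is a hypothetical stand-in for an R-tree node with an axis-aligned bounding box, and `can_prune` is a caller-supplied placeholder for the GP tests.

```python
import heapq
import itertools
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    lo: tuple                                     # lower corner of the bounding box
    hi: tuple                                     # upper corner of the bounding box
    children: list = field(default_factory=list)  # empty list means the node is an object
    label: str = ""

def mindist_point_box(q, lo, hi):
    # Minimum Euclidean distance from point q to the box [lo, hi].
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, lo, hi)))

def traverse(root, q, can_prune):
    """Best-first traversal; returns unpruned objects as the candidate set."""
    counter = itertools.count()  # tie-breaker so heapq never compares Node objects
    heap = [(mindist_point_box(q, root.lo, root.hi), next(counter), root)]
    candidates = []
    while heap:
        _, _, node = heapq.heappop(heap)
        if can_prune(node, candidates):           # placeholder for the GP pruning tests
            continue
        if node.children:                         # intermediate node: expand its children
            for child in node.children:
                key = mindist_point_box(q, child.lo, child.hi)
                heapq.heappush(heap, (key, next(counter), child))
        else:                                     # object: keep it for refinement
            candidates.append(node)
    return candidates
```

Passing `can_prune=lambda node, cands: False` returns every object in increasing order of minimum distance to q. The refinement phase, which computes actual PRNN probabilities for the candidates, is left out of this sketch.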
Experimental Evaluation • Experimental Settings • Real data sets: LB, MG, TCB, and CAR • Synthetic data sets: • Generate the center location C_o of uncertain object o in a data space [0, 1,000]^d • Produce the radius r_o ∈ [r_min, r_max] for the uncertainty region UR(o) • Four types of data sets: lUrU, lUrG, lSrU, and lSrG • Competitors: • Linear scan (worse than ours by 5-9 orders of magnitude) • Naïve pruning (pruning condition: given a PRNN candidate o, a node/object e can be pruned if maxdist(o, e) < mindist(q, e))
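The naïve pruning condition used as a competitor can be sketched as follows, assuming that both o and e are hyperspheres given as (center, radius) pairs and that q is a precise point. The helper names are hypothetical. For hyperspheres, the maximum distance between o and e is the center distance plus both radii, and the minimum distance from q to e is the center distance minus the radius of e, floored at zero.

```python
import math

def maxdist_spheres(o, e):
    # Largest possible distance between a point in o and a point in e.
    (c_o, r_o), (c_e, r_e) = o, e
    return math.dist(c_o, c_e) + r_o + r_e

def mindist_point_sphere(q, e):
    # Smallest possible distance from point q to a point in e (0 if q is inside e).
    c_e, r_e = e
    return max(0.0, math.dist(q, c_e) - r_e)

def naive_prune(q, o, e):
    """e can be pruned if every position of e is closer to o than to q."""
    return maxdist_spheres(o, e) < mindist_point_sphere(q, e)

q = (0.0, 0.0)
o = ((10.0, 0.0), 1.0)
print(naive_prune(q, o, ((12.0, 0.0), 1.0)))  # maxdist 4 < mindist 11
print(naive_prune(q, o, ((5.0, 0.0), 1.0)))   # maxdist 7 is not < mindist 4
```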
Performance vs. β • Settings: data size N = 100K, dimensionality d = 3, radius range [r_min, r_max] = [0, 5], and probabilistic threshold α = 1
Summary • We formulate the problem of probabilistic queries over uncertain databases • We propose effective pruning methods to reduce the search space of probabilistic queries • We integrate pruning methods into an efficient query procedure • We verify the efficiency of our proposed approaches through extensive experiments