Explore efficient processing techniques for reverse k-nearest neighbor queries on uncertain data, using a probabilistic framework. Learn about approximation, spatial and probabilistic filters, evaluation, and more.
Efficient Probabilistic Reverse Nearest Neighbor Query Processing on Uncertain Data Thomas Bernecker, Tobias Emrich, Hans-Peter Kriegel, Matthias Renz, Stefan Zankl and Andreas Zuefle
Outline
• Background
  • Uncertain Data Model
  • Reverse k-nearest neighbour queries
  • Reverse k-nearest neighbour queries on uncertain objects
• Framework for Probabilistic RkNN Processing
  • Approximation
  • Spatial Filter
  • Probabilistic Filter
  • Verification
• Evaluation + Summary
Uncertain Data Model
[Figure: user ratings for "Life of Brian", shown as the PDF of an object X over two uncertain attributes, a (Action) and b (Humor)]
• Objects are described by a multi-dimensional probability distribution
• Object independence assumption
• Queries are answered according to possible-worlds semantics
• Object PDFs can be spatially bounded
• Continuous or discrete representation
Reverse k-nearest neighbour queries
RkNN(q) = {o ∈ DB | q ∈ kNN(o)}
What is it good for?
• Market segmentation
• Outlier detection
• Incremental algorithms
• …
[Figure: query point q and objects o1 to o7]
Example: R1NN(q) = {o7}, R2NN(q) = {o7, o5, o4}
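The set definition above can be evaluated directly by brute force on certain (non-uncertain) points. The following is a minimal sketch, not the method of the talk: the function names `knn` and `rknn` and the coordinates are made up for illustration and do not correspond to the slide figure.

```python
from math import dist

def knn(points, name, k):
    """Names of the k nearest neighbours of `name` among all other points."""
    origin = points[name]
    others = [n for n in points if n != name]
    others.sort(key=lambda n: dist(points[n], origin))
    return set(others[:k])

def rknn(points, q, k):
    """All objects o (other than q) that have q among their k nearest neighbours."""
    return {o for o in points if o != q and q in knn(points, o, k)}

# Hypothetical coordinates, chosen so that no distance ties occur.
points = {"q": (0.0, 0.0), "a": (1.0, 0.0), "b": (-2.0, 0.0), "c": (5.0, 0.0)}

print(rknn(points, "q", 1))  # a and b have q as their nearest neighbour
print(rknn(points, "q", 2))  # c additionally has q among its 2 nearest neighbours
```

This computes one kNN query per object, which is exactly the cost that index-based pruning tries to avoid.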
Reverse k-nearest neighbour queries on uncertain objects
"Is O′ an R1NN of Q?"
• In some possible worlds it is.
• In other possible worlds it is not.
Note: the query object may be uncertain as well!
[Figure: uncertain objects O1, O2, O′ and query Q]
Definition of Probabilistic RkNN
PRkNN(Q, τ) = {O ∈ DB | P(O ∈ RkNN(Q)) ≥ τ} = {O ∈ DB | P(Q ∈ kNN(O)) ≥ τ}
Example: P(Q ∈ 1NN(O′)) = 21/24, so e.g. O′ ∈ PR1NN(Q, 0.5)
[Figure: uncertain objects O1, O2, O′ and query Q]
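For discrete uncertain objects, the probability P(Q ∈ kNN(O′)) in this definition can be computed by enumerating all possible worlds. The sketch below assumes each object is a list of (point, probability) instances and that objects are independent; the function name and the example data are hypothetical, and the enumeration is exponential in the number of objects.

```python
from itertools import product
from math import dist

def prob_q_in_knn(target, query, others, k):
    """P(query is among the k nearest neighbours of target), summed over all worlds.

    Each uncertain object is a list of (point, probability) instances whose
    probabilities sum to 1. Objects are assumed to be mutually independent.
    """
    total = 0.0
    for world in product(target, query, *others):
        (t, pt), (q, pq), rest = world[0], world[1], world[2:]
        weight = pt * pq
        closer = 0
        for o, po in rest:
            weight *= po
            if dist(o, t) < dist(q, t):
                closer += 1  # this object is closer to the target than the query
        if closer < k:
            total += weight
    return total

# Hypothetical example: O' and O1 have two instances each, Q is certain.
o_prime = [((0.0, 0.0), 0.5), ((4.0, 0.0), 0.5)]
q = [((2.0, 0.0), 1.0)]
o1 = [((1.0, 0.0), 0.5), ((10.0, 0.0), 0.5)]

p = prob_q_in_knn(o_prime, q, [o1], k=1)
print(p)          # probability that Q is the nearest neighbour of O'
print(p >= 0.5)   # membership test for PR1NN(Q, 0.5)
```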
Framework for PRkNN query processing
• Approximation (Indexing)
  • Simplification of spatial-probabilistic keys
• Spatial Filter
  • Filter objects according to simple spatial keys
• Probabilistic Filter
  • Derive lower/upper bounds of the qualification probability (by means of simple spatial-probabilistic keys)
  • Filter objects according to the lower/upper probability bounds
• Verification
  • Computation of the exact probability (very expensive)
  • Monte-Carlo sampling (many samples required)
Approximation
R*-Tree for indexing objects (global index)
[Figure: R*-tree over the objects, with query Q]
AR*-Tree for indexing instances (local index)
[Figure: AR*-tree whose entries are annotated with probability values]
Spatial Filter
Pruning based on rectangular approximations only [1].
Task: find k objects O ∈ DB \ {O′} that are closer to O′ than Q is.
[Figure: rectangles O and Q, with the surrounding space split into three regions for a candidate O′]
• For any O′ in this region, O is closer than Q.
• For any O′ intersecting this region, Q may possibly be closer than O.
• For any O′ in this region, O is not closer than Q.
[1] Tobias Emrich, Hans-Peter Kriegel, Peer Kröger, Matthias Renz, Andreas Züfle: Boosting Spatial Pruning: On Optimal Pruning of MBRs. SIGMOD Conference 2010: 39-50
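The three-way decision on rectangles can be illustrated with the classic min/max-distance test. Note that this is a simpler and more conservative criterion than the optimal one of [1], so it may answer "undecided" where [1] can still decide. Rectangles are assumed to be given as (lower corner, upper corner) pairs, and all names are hypothetical.

```python
from math import sqrt

# A rectangle is ((lo_1, ..., lo_d), (hi_1, ..., hi_d)).

def min_dist(a, b):
    """Smallest possible distance between a point in a and a point in b."""
    s = 0.0
    for alo, ahi, blo, bhi in zip(a[0], a[1], b[0], b[1]):
        gap = max(blo - ahi, alo - bhi, 0.0)
        s += gap * gap
    return sqrt(s)

def max_dist(a, b):
    """Largest possible distance between a point in a and a point in b."""
    s = 0.0
    for alo, ahi, blo, bhi in zip(a[0], a[1], b[0], b[1]):
        span = max(bhi - alo, ahi - blo)
        s += span * span
    return sqrt(s)

def classify(o, q, cand):
    """Is O closer to the candidate O' than Q, judged on rectangles only?"""
    if max_dist(o, cand) < min_dist(q, cand):
        return "closer"        # holds in every possible world
    if min_dist(o, cand) >= max_dist(q, cand):
        return "not closer"    # holds in no possible world
    return "undecided"         # needs the probabilistic filter

cand = ((2.0, 0.0), (3.0, 1.0))
near = ((0.0, 0.0), (1.0, 1.0))
far = ((10.0, 0.0), (11.0, 1.0))
print(classify(near, far, cand))
print(classify(far, near, cand))
print(classify(((4.0, 0.0), (5.0, 1.0)), near, cand))
```

A candidate O′ can be discarded by the spatial filter as soon as k objects are classified as "closer".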
Probabilistic Filter
What is the probability that O is closer to O′ than Q?
• Lower bound: "O is closer to O′ than Q with at least x% probability"
• Upper bound: "O is closer to O′ than Q with at most x% probability"
[Figure: objects O, Q and O′]
Probabilistic Filter
Exemplary statements:
• O1 is closer to O′ than Q with at least 20% and at most 50% probability
• O2 is closer to O′ than Q with at least 60% and at most 80% probability
Correctly deriving these bounds is not trivial (see paper).
How many objects O ∈ DB are closer to O′ than Q? Consider the following uncertain generating function:
• x-term: probability that the object is closer to O′ than Q
• z-term: probability that the object is farther from O′ than Q
• y-term: uncertainty
=> (0.2x + 0.3y + 0.5z) * (0.6x + 0.2y + 0.2z)
Expansion yields 0.12x² + 0.34xz + 0.1z² + 0.22xy + 0.16yz + 0.06y²
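The expansion can be checked mechanically. The sketch below represents a polynomial as a dictionary from exponent triples (x, y, z) to coefficients; the helper names `factor` and `multiply` are made up for illustration.

```python
from collections import defaultdict

def factor(lower, upper):
    """One object's factor: lower*x + (upper - lower)*y + (1 - upper)*z."""
    return {(1, 0, 0): lower, (0, 1, 0): upper - lower, (0, 0, 1): 1.0 - upper}

def multiply(p, r):
    """Product of two polynomials given as {(ex, ey, ez): coefficient}."""
    out = defaultdict(float)
    for (ax, ay, az), ca in p.items():
        for (bx, by, bz), cb in r.items():
            out[(ax + bx, ay + by, az + bz)] += ca * cb
    return dict(out)

# O1: at least 20%, at most 50%. O2: at least 60%, at most 80%.
ugf = multiply(factor(0.2, 0.5), factor(0.6, 0.8))

for (ex, ey, ez), c in sorted(ugf.items(), reverse=True):
    print(f"{c:.2f} * x^{ex} y^{ey} z^{ez}")
```

The coefficients sum to 1, since every factor is itself a probability distribution over the three cases.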
Probabilistic Filter
0.12x² + 0.34xz + 0.1z² + 0.22xy + 0.16yz + 0.06y²
[Figure: bar chart of probability (0% to 80%) against the number of objects O ∈ DB that are closer to O′ than Q (0, 1, 2)]
Probabilistic Filter
Example PRkNN queries:
• PR1NN(Q, 50%): O′ is not part of the result
• PR2NN(Q, 40%): O′ is part of the result
• PR2NN(Q, 80%): O′ has to be further investigated
[Figure: two bar charts of probability against the exact and the maximum number of objects O ∈ DB that are closer to O′ than Q (0, 1, 2)]
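The three decisions above can be reproduced from the per-object bounds (20% to 50% for O1, 60% to 80% for O2). The sketch below rests on one reading of the generating function, which is an assumption here: a term with i factors x and j factors y stands for worlds in which between i and i + j objects are closer to O′ than Q. The function names are hypothetical.

```python
def qualification_bounds(bounds, k):
    """Lower and upper bound of P(fewer than k objects are closer to O' than Q).

    `bounds` holds one (lower, upper) probability pair per object.
    """
    # terms[(i, j)]: probability that i objects are surely closer and j are undecided
    terms = {(0, 0): 1.0}
    for lo, up in bounds:
        nxt = {}
        for (i, j), p in terms.items():
            for key, w in (((i + 1, j), lo), ((i, j + 1), up - lo), ((i, j), 1.0 - up)):
                nxt[key] = nxt.get(key, 0.0) + p * w
        terms = nxt
    lower = sum(p for (i, j), p in terms.items() if i + j < k)  # undecided count as closer
    upper = sum(p for (i, j), p in terms.items() if i < k)      # undecided count as farther
    return lower, upper

def decide(bounds, k, tau):
    lower, upper = qualification_bounds(bounds, k)
    if lower >= tau:
        return "result"
    if upper < tau:
        return "pruned"
    return "undecided"

bounds = [(0.2, 0.5), (0.6, 0.8)]
print(decide(bounds, k=1, tau=0.5))
print(decide(bounds, k=2, tau=0.4))
print(decide(bounds, k=2, tau=0.8))
```

Only candidates that come back as "undecided" need to be passed on to verification.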
Verification
Options for verification:
• Consideration of all possible worlds (exponential)
• Adapting probabilistic nearest neighbour ranking [2] on the instance level of objects (polynomial)
• Monte-Carlo based (linear in the number of samples)
[2] Jian Li, Barna Saha, Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)
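The Monte-Carlo option can be sketched as follows: draw one instance per object, check whether Q is among the k nearest neighbours of O′ in that sampled world, and average over many worlds. The data layout (lists of (point, probability) instances), the function names and the example objects are assumptions for illustration, not taken from the talk.

```python
import random
from math import dist

def sample(obj, rng):
    """Draw one instance of an uncertain object given as [(point, probability), ...]."""
    points = [p for p, _ in obj]
    weights = [w for _, w in obj]
    return rng.choices(points, weights=weights, k=1)[0]

def mc_prob_q_in_knn(target, query, others, k, n_samples=20000, seed=0):
    """Estimate P(query is among the k nearest neighbours of target) by sampling worlds."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        t = sample(target, rng)
        q = sample(query, rng)
        closer = sum(1 for o in others if dist(sample(o, rng), t) < dist(q, t))
        if closer < k:
            hits += 1
    return hits / n_samples

# Hypothetical example; the exact probability for k = 1 is 0.75.
o_prime = [((0.0, 0.0), 0.5), ((4.0, 0.0), 0.5)]
q = [((2.0, 0.0), 1.0)]
o1 = [((1.0, 0.0), 0.5), ((10.0, 0.0), 0.5)]

estimate = mc_prob_q_in_knn(o_prime, q, [o1], k=1)
print(estimate)
```

The run time is linear in the number of samples, but the standard error shrinks only with the square root of that number, which is why many samples are needed when the probability lies close to the threshold τ.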
Evaluation: Spatial Filter
Evaluation: Probabilistic Filter
Evaluation: Comparison to other algorithms
Summary
• Framework for PRkNN query processing
• Deriving probabilistic pruning bounds for single objects
• Accumulating these bounds using uncertain generating functions
• Cost model for choosing the optimal value for tree depth
• Comparison to existing algorithms for PRNN processing
Thanks! Questions?
Problem of dependency
[Figure: objects O′, O1, O2 and query Q]