Efficient Probabilistic Queries on Uncertain Data
Explore advanced methods for querying uncertain data in mobile environments, including probabilistic spatial queries and skyline computation. Learn about strategies and algorithms for evaluating imprecise location-dependent queries efficiently.
Uncertain Data Mobile Group
Presenter: 郝兴
Paper List
• Querying Imprecise Data in Moving Object Environments. [TKDE 2004] Reynold Cheng, Dmitri V. Kalashnikov, and Sunil Prabhakar.
• Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions. [VLDB 2005] Yufei Tao, Reynold Cheng, Xiaokui Xiao, Wang Kay Ngai, Ben Kao, and Sunil Prabhakar.
• Efficient Evaluation of Imprecise Location-Dependent Queries. [ICDE 2007] Jinchuan Chen and Reynold Cheng.
• Preserving User Location Privacy in Mobile Data Management Infrastructures. [PET 2006] Reynold Cheng, Yu Zhang, Elisa Bertino, and Sunil Prabhakar.
• Probabilistic Spatial Queries on Existentially Uncertain Data. [SSTD 2005] Xiangyuan Dai, Man Lung Yiu, Nikos Mamoulis, Yufei Tao, and Michail Vaitis.
• Probabilistic Skylines on Uncertain Data. [VLDB 2007] Jian Pei, Bin Jiang, Xuemin Lin, and Yidong Yuan.
Efficient Evaluation of Imprecise Location-Dependent Queries Jinchuan Chen Reynold Cheng Department of Computing The Hong Kong Polytechnic University
Outline
• A Classification of ILDQ
• 3 methods:
  - The Minkowski Sum
  - Query-Data Duality
  - Exploiting Probability Threshold
IPQ and IUQ
• IPQ: Imprecise Location-Dependent Queries over Point Objects
• IUQ: Imprecise Location-Dependent Queries over Uncertain Objects
[Figure labels: A, q, R, point object, uncertain object, uncertainty of query issuer]
Method 1: The Minkowski Sum
[Figure labels: A, B, U, R]
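As a rough sketch of how the Minkowski sum can serve as a filter, suppose (an assumption, not stated on the slide) that the issuer's uncertainty region U is an axis-aligned rectangle and that the range R is a rectangle centred on the issuer, given by its half-width and half-height. The Minkowski sum of R and U is then U grown by those half-extents on every side, and a point object outside it cannot satisfy the query for any possible issuer location. The names `Rect`, `minkowski_expand`, and `may_qualify` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle [x_min, x_max] x [y_min, y_max]."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def minkowski_expand(u: Rect, half_w: float, half_h: float) -> Rect:
    """Minkowski sum of U with a (2*half_w) x (2*half_h) range centred on the issuer.

    Assumption: both shapes are axis-aligned rectangles, so the sum is again a rectangle.
    """
    return Rect(u.x_min - half_w, u.y_min - half_h, u.x_max + half_w, u.y_max + half_h)


def may_qualify(u: Rect, half_w: float, half_h: float, px: float, py: float) -> bool:
    """Filter step: a point outside the expanded region has zero qualification probability."""
    return minkowski_expand(u, half_w, half_h).contains(px, py)


# Hypothetical numbers for illustration.
u = Rect(0.0, 0.0, 4.0, 2.0)
expanded = minkowski_expand(u, 1.0, 0.5)
print(expanded)
print(may_qualify(u, 1.0, 0.5, 4.8, 2.3))  # inside the expanded region
print(may_qualify(u, 1.0, 0.5, 6.0, 1.0))  # outside, can be discarded
```

Objects that pass this filter still need their actual qualification probability computed; the expansion only rules out objects that can never qualify.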
Method 2: Query-Data Duality
[Figure, two panels, each showing a range R of width w and height h, a query point, and a point object]
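One way to read the duality, assuming R is an axis-aligned w × h rectangle centred on whichever point it is drawn around: a point object lies inside the range centred at the query point exactly when the query point lies inside the same range centred at the point object. The helper `in_range` and the coordinates below are hypothetical.

```python
def in_range(center, point, w, h):
    """True if `point` lies in the w x h axis-aligned rectangle centred at `center`."""
    cx, cy = center
    px, py = point
    return abs(px - cx) <= w / 2 and abs(py - cy) <= h / 2


query_point = (3.0, 4.0)
objects = [(3.5, 4.2), (6.0, 4.0), (2.0, 5.5)]
w, h = 2.0, 1.0

# The test is symmetric: swapping the roles of query point and object gives the same answer.
for obj in objects:
    assert in_range(query_point, obj, w, h) == in_range(obj, query_point, w, h)

# Answer the query from the objects' side: which objects "see" the query point?
inside = [obj for obj in objects if in_range(obj, query_point, w, h)]
print(inside)
```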
Query-Data Duality and IPQ
[Figure labels: A, U, uncertainty of query issuer]
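Applied to an IPQ, the duality suggests that the probability of a point object qualifying equals the probability that the issuer lies inside the range drawn around that object. The sketch below works this out under two assumptions that are not on the slide: the issuer's location is uniformly distributed over an axis-aligned rectangle U, and the range is a w × h rectangle. `ipq_probability` is an illustrative name.

```python
def overlap_1d(a_lo, a_hi, b_lo, b_hi):
    """Length of the overlap of intervals [a_lo, a_hi] and [b_lo, b_hi] (0 if disjoint)."""
    return max(0.0, min(a_hi, b_hi) - max(a_lo, b_lo))


def ipq_probability(u, obj, w, h):
    """Probability that point object `obj` falls in the issuer's w x h range.

    u = (x_min, y_min, x_max, y_max) is the issuer's uncertainty region.
    Assumption: the issuer's pdf is uniform over u, so the probability is
    area(u intersected with the range centred at obj) / area(u).
    """
    ux0, uy0, ux1, uy1 = u
    ox, oy = obj
    ix = overlap_1d(ux0, ux1, ox - w / 2, ox + w / 2)
    iy = overlap_1d(uy0, uy1, oy - h / 2, oy + h / 2)
    return (ix * iy) / ((ux1 - ux0) * (uy1 - uy0))


u = (0.0, 0.0, 4.0, 2.0)  # hypothetical uncertainty region
print(ipq_probability(u, (2.0, 1.0), 2.0, 2.0))   # object in the middle of U
print(ipq_probability(u, (4.0, 1.0), 2.0, 2.0))   # object on the right edge of U
print(ipq_probability(u, (10.0, 1.0), 2.0, 2.0))  # object far away
```

For a non-uniform pdf the area ratio would be replaced by an integral of the pdf over the same intersection.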
Method 3: Probability Threshold
[Figure labels: A, U, R, R ⊕ U, p-expanded-query]
The p-bound [VLDB04]
[Figure labels: uncertainty region, p, 0, 0.5]
Deriving p-expanded-query with p-bound
[Figure labels: d, e, U, R ⊕ U, p-bound (top), p-bound (left), p-expanded-query]
Pruning Uncertain Objects for C-IUQ (1)
• Strategy 1: Use p-bound
[Figure labels: A, uncertain object, U, p-bound, R ⊕ U]
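A minimal sketch of Strategy 1 under assumptions the slide does not spell out: the uncertain object A has a uniform pdf over an axis-aligned rectangle, and its p-bound is taken to be four lines (left, bottom, right, top), each with probability mass exactly p on its outer side. If the filter region (for example the Minkowski sum of R and U) lies entirely beyond one of those lines, A is inside the region with probability at most p, so A cannot satisfy a threshold above p. The function names are illustrative.

```python
def p_bound_uniform(a, p):
    """p-bound lines of a uniform pdf over a = (x_min, y_min, x_max, y_max), 0 <= p <= 0.5.

    Returns (left, bottom, right, top); mass p of the object lies outside each line.
    Assumption: uniform pdf, so each line sits a fraction p of the extent from the edge.
    """
    x0, y0, x1, y1 = a
    dx, dy = (x1 - x0) * p, (y1 - y0) * p
    return (x0 + dx, y0 + dy, x1 - dx, y1 - dy)


def can_prune(a, p, region):
    """True if `region` lies entirely beyond one p-bound line of the object `a`.

    In that case the object is inside `region` with probability at most p.
    """
    left, bottom, right, top = p_bound_uniform(a, p)
    rx0, ry0, rx1, ry1 = region
    return rx1 <= left or rx0 >= right or ry1 <= bottom or ry0 >= top


a = (0.0, 0.0, 10.0, 10.0)  # hypothetical uncertainty region of object A
print(p_bound_uniform(a, 0.2))
print(can_prune(a, 0.2, (8.5, 3.0, 12.0, 6.0)))  # region lies right of the right 0.2-bound
print(can_prune(a, 0.2, (7.0, 3.0, 12.0, 6.0)))  # region crosses the bound, keep A
```

The appeal of the test is that it needs only the stored bound lines, not the pdf itself.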
Pruning Uncertain Objects for C-IUQ (2)
• Strategy 2: Use p-expanded query
[Figure labels: A, U, R ⊕ U, p-expanded-query]
Pruning Uncertain Objects for C-IUQ (3)
• Strategy 3: Use both p-bound and p-expanded query
• If x · y < p, then A can be pruned.
[Figure labels: A, U, R ⊕ U, x-bound, y-expanded-query, Qp-bound, Qp-expanded-query]
Probabilistic Spatial Queries on Existentially Uncertain Data
Xiangyuan Dai (HKU), Man Lung Yiu (HKU), Nikos Mamoulis (HKU), Yufei Tao (CityU, HK), Michail Vaitis (U Aegean, GR)
Outline
• Introduction
• Definitions
• Evaluation of Probabilistic Queries
  - range queries
  - nearest neighbor queries
Definitions
• We refer to Ex as the existential probability or confidence of x.
• We identify two types of probabilistic spatial queries on existentially uncertain objects:
  - Thresholding query
  - Ranking query
Evaluation of Probabilistic Queries
• range queries
  - A depth-first search algorithm is applied on the R-tree to retrieve the qualifying objects
  - Let Px = Ex
  - Thresholding query: t is used to filter out objects with Px < t
  - Ranking query: a priority queue maintains the m results with the highest Px
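The two range-query variants can be sketched as follows. This is a simplification, not the algorithm on the slide: a linear scan stands in for the depth-first R-tree traversal, and the object list, window, and function names are hypothetical. For an object inside the window, Px is simply its existential probability Ex.

```python
import heapq


def in_window(point, window):
    """True if point (x, y) lies in window = (x_min, y_min, x_max, y_max)."""
    x, y = point
    x0, y0, x1, y1 = window
    return x0 <= x <= x1 and y0 <= y <= y1


def threshold_range_query(objects, window, t):
    """Objects in the window whose Px = Ex is at least t. objects: (name, (x, y), Ex)."""
    return [(name, ex) for name, loc, ex in objects if in_window(loc, window) and ex >= t]


def ranking_range_query(objects, window, m):
    """The m objects in the window with the highest Px, kept in a size-m min-heap."""
    heap = []  # entries are (Px, name); heap[0] is the weakest current result
    for name, loc, ex in objects:
        if not in_window(loc, window):
            continue
        if len(heap) < m:
            heapq.heappush(heap, (ex, name))
        elif ex > heap[0][0]:
            heapq.heapreplace(heap, (ex, name))
    return [(name, px) for px, name in sorted(heap, reverse=True)]


# Hypothetical data for illustration.
objects = [("a", (1, 1), 0.9), ("b", (2, 2), 0.4), ("c", (3, 1), 0.7), ("d", (9, 9), 0.95)]
window = (0, 0, 4, 4)
print(threshold_range_query(objects, window, 0.5))
print(ranking_range_query(objects, window, 2))
```

With an R-tree, the same filters would be applied while descending only into nodes whose bounding rectangles intersect the window.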
Evaluation of Probabilistic Queries
• nearest neighbor queries (thresholding example, t = 0.3)
  - Initially Pm = 0 and Pfirst = 1
  - p7: Px = 0.1 [not a result]; Pfirst = 1 - 0.1 = 0.9
  - p8: Px = 0.9 × 0.2 = 0.18 [not a result]; Pfirst = 0.9 × (1 - 0.2) = 0.72
  - p6: Px = 0.72 × 0.1 = 0.072 [not a result]; Pfirst = 0.72 × (1 - 0.1) = 0.648
  - p4: Px = 0.648 × 0.5 = 0.324 [result]; Pfirst = 0.648 × (1 - 0.5) = 0.324
  - p5: Px = 0.324 × 0.9 = 0.2916 [not a result]; Pfirst = 0.324 × (1 - 0.9) = 0.0324
  - Since Pfirst = 0.0324 < t = 0.3, the algorithm terminates.
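The walk-through above can be sketched as a loop over the objects in increasing distance from the query point. The sketch assumes the objects are already sorted by distance (the incremental nearest-neighbor search that would produce this order is omitted) and that existence events are independent. Pfirst is read here as the probability that none of the objects visited so far exists. `threshold_nn_query` is an illustrative name.

```python
def threshold_nn_query(sorted_by_distance, t):
    """Objects whose probability of being the nearest neighbor is at least t.

    sorted_by_distance: list of (name, Ex) in increasing distance from the query point.
    """
    results = []
    p_first = 1.0  # probability that no object visited so far exists
    for name, ex in sorted_by_distance:
        px = p_first * ex  # x exists and every closer object is absent
        if px >= t:
            results.append((name, px))
        p_first *= 1.0 - ex
        if p_first < t:
            break  # every remaining object has Px <= p_first < t
    return results


# Existential probabilities and visiting order taken from the example; t = 0.3.
visited = [("p7", 0.1), ("p8", 0.2), ("p6", 0.1), ("p4", 0.5), ("p5", 0.9)]
for name, px in threshold_nn_query(visited, 0.3):
    print(name, round(px, 4))
```

The early exit is safe because any object farther away has Px no larger than the current Pfirst.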
Probabilistic Skylines on Uncertain Data Jian Pei Simon Fraser University, Canada Bin Jiang, Xuemin Lin, Yidong Yuan The University of New South Wales & NICTA, Australia
Outline
• Introduction
• Probabilistic Skyline Computation
• Bounding-Pruning-Refining
• Bottom-Up Method
• Top-Down Method
Introduction: Conventional Skylines
• n-dimensional numeric space D = (D1, …, Dn)
• Large values are preferable
• Two points: u dominates v (u ≻ v) if
  - ∀ Di (1 ≤ i ≤ n), u.Di ≥ v.Di
  - ∃ Dj (1 ≤ j ≤ n), u.Dj > v.Dj
• Given a set of points S, skyline = {u | u ∈ S and u is not dominated by any other point}
• Example
  - C ≻ B, C ≻ D
  - skyline = {A, C, E}
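The dominance test and the skyline definition translate directly into a quadratic-time sketch. The coordinates below are hypothetical (the slide's figure is not available) and were chosen only so that C ≻ B and C ≻ D hold and the skyline comes out as {A, C, E}.

```python
def dominates(u, v):
    """u dominates v: u >= v on every dimension and u > v on at least one (larger is better)."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def skyline(points):
    """Names of the points not dominated by any other point. points: dict name -> tuple."""
    return sorted(
        name
        for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    )


# Hypothetical 2-D coordinates for illustration.
points = {"A": (1, 5), "B": (2, 3), "C": (3, 4), "D": (3, 2), "E": (5, 1)}
print(dominates(points["C"], points["B"]), dominates(points["C"], points["D"]))
print(skyline(points))
```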
Introduction: Skylines on Uncertain Data
• Example
  - A set of objects S = {A, B, C}
  - Each instance appears with equal probability (0.5)
• Probabilistic Dominance
  - Pr(A ≻ C) = 3/4
  - Pr(B ≻ C) = 1/2
  - Pr((A ≻ C) ∨ (B ≻ C)) = 1
  - Pr(C is in the skyline) ≠ (1 - Pr(A ≻ C)) × (1 - Pr(B ≻ C))
• Probabilistic dominance ⇏ probabilistic skyline
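The gap between pairwise dominance probabilities and the skyline probability can be checked by enumerating instances. The sketch assumes that objects are independent, that exactly one instance of each object appears, and that instances are equally likely. The instance coordinates are hypothetical, chosen only to reproduce Pr(A ≻ C) = 3/4 and Pr(B ≻ C) = 1/2. The skyline probability of an object is computed as the average, over its instances, of the product over the other objects of the probability that no instance of that object dominates it.

```python
from fractions import Fraction


def dominates(u, v):
    """u dominates v: u >= v on every dimension and u > v on at least one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pr_dominates(U, V):
    """Pr(U dominates V) for two uncertain objects with equally likely instances."""
    pairs = sum(dominates(u, v) for u in U for v in V)
    return Fraction(pairs, len(U) * len(V))


def skyline_probability(name, objects):
    """Probability that object `name` is in the skyline (assumptions in the lead-in)."""
    U = objects[name]
    total = Fraction(0)
    for u in U:
        p_u = Fraction(1)  # probability that instance u is not dominated
        for other, V in objects.items():
            if other != name:
                dominated_by = sum(dominates(v, u) for v in V)
                p_u *= 1 - Fraction(dominated_by, len(V))
        total += p_u
    return total / len(U)


# Hypothetical instances, two per object, each with probability 0.5.
objects = {
    "A": [(6, 6), (2, 6)],
    "B": [(6, 2), (7, 3)],
    "C": [(1, 5), (5, 1)],
}
pr_ac = pr_dominates(objects["A"], objects["C"])
pr_bc = pr_dominates(objects["B"], objects["C"])
print(pr_ac, pr_bc)
print(skyline_probability("C", objects))  # exact skyline probability of C
print((1 - pr_ac) * (1 - pr_bc))          # naive product of pairwise terms
```

The naive product treats the two dominance events as independent, which they are not, because both depend on which instance of C appears.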
Probabilistic Skyline Computation
• Bottom-Up Method
  - ∀ u at layer-k, ∃ u′ at layer-(k-1), s.t. u′ ≻ u and Pr(u′) ≥ Pr(u)
  - max{Pr(u) | u is at layer-(k-1)} ≥ max{Pr(u) | u is at layer-k}
Probabilistic Skyline Computation
• Top-Down Method
  - Partition Tree
  - Bounding with Partition Trees
Thank you