220 likes | 392 Vues
Stochastic Skyline Operator. Xuemin Lin School of Computer Science University of New South Wales Australia. Joint Work with: Ying Zhang (UNSW), Wenjie Zhang (UNSW), Muhammad Aamir Cheema (UNSW). Introduction: Skyline. a user preference ≺ is given on each dimension of R d .
E N D
Stochastic Skyline Operator Xuemin Lin School of Computer Science University of New South Wales Australia Joint Work with: Ying Zhang (UNSW), Wenjie Zhang (UNSW), Muhammad Aamir Cheema (UNSW)
Introduction: Skyline • a user preference ≺ is given on each dimension of Rd. • two points in Rd, udominates v (u≺ v) • i (1 ≤ i ≤ d), u.i ≺= v.i; j (1 ≤ j ≤ d), u.j ≺ v.j • Skyline: • Points not dominated by another point. • Multiple criteria optimal decision making: minimum set of candidates of best options regarding any monotonic functions.
Skyline of Uncertain Objects Probabilistic Skyline: (VLDB07, PODS09, etc) • Skyline probabilities by possible worlds. • Providing the probabilities not worse than any other objects. Provide minimal candidate set of optimal solutions? • How to define optimal options? • How to characterize the minimum candidate set?
Expected Utility & Stochastic Order Expected Utility Principle: • Given a set U of uncertain objects and a decreasing utility function f, select U in U to maxmize E[f (U)]. Stochastic Order: • Given a family ℱ of utility functions, U ≺ℱ V if for each f in ℱ E[f(U)] ≥ E [f(V)] Decreasing Multiplicative Functions: • ℱ= where fi is nonnegative decreasing. Low orthant order: the stochastic order is defined over the family of decreasing multiplicative functions.
Example • Utility function: • : nonnegative decreasing • : nonnegative decreasing e.g. ; ; 1. B never preferred by the expected utility principle! 2. Psky(A) = 1, Psky (B) = 0.5, Psky (C) = 0.01
Contributions • Introduce a novel skyline operator: stochastic skyline. • Guarantee the minimal candidate set to the optimal solutions regarding decreasing multiplicative functions. • NP-Completeness of computing stochastic skyline regarding dimensionality d. • Novel statistic base pruning techniques. • Efficient partition base verification algorithms: polynomial if d is fixed.
Problem Statement Stochastic Order (lower orthant order): Given U & V, U stochastically dominates V (U ≺sd V) if for any x, U.cdf (x) ≥ V.cdf (x) and exists y such that U.cdf (y) > V.cdf (y). U.cdf (x): probability mass of U in the rectangular region R ((0,0,…0), x); see the shaded region. Stochastic Skyline: the objects in U not stochastically dominated by any others, called stochastic skyline. Problem Statement: efficiently compute stochastic skyline regarding discrete cases.
Minimality of stochastic skyline Stochastic skyline removes all objects not preferred by any non-negative decreasing functions!
Framework • Phase 1: filtering. Remove non-promising objects. • Phase 2: verification. Test stochastic dominance between two objects. BBS combing with a heap: • the “near” progressiveness • only need to test either U ≺sd V or V ≺sd U in most cases (but not both).
Testing if U ≺sd V • Violation point: a point x in Rd+ is a violation point regarding U ≺sd V if U.cdf (x) < V.cdf (x). • Testing algorithm: if no violation points, then U ≺sd V. • Not enough to test instances.
Reduce to Grid Points • Test if U.cdf ≥ V.cdf against grid points only (see (a)). • Testing the switching grid points only (see solid lines (b)).
Algorithm • Given a rectangular region R (x, y), if U.cdf (x) ≥ V.cdf (y), then no violation point in R (x, y). • Partition base testing algorithm: • Get switching points • Initial check • Iteratively partition the grid to throw away non-promising sub-grids
Complexity • The algorithm runs O (dm log m + md (T (Uartree) + T (Vartree))) where m is the number of instances in V. • NP-Complete regarding d. • Covert (the decision version of) the minimal set cover problem to a special case of the testing problem.
Filtering Techniques Pruning Rule 1: throw away fully dominated entries.
Filtering Techniques Pruning Rules 2: applying Cantelli’s Inequality to get upper-bonds.
Size Estimation: Expected size: size of stochastic skyline in Rd is bounded by that of conventional skyline in Rd+1; i.e., lnd (n)/(d+1)!
Empirical Study • C++ with STL compiled with GNU GCC on 2.4GHz Debian • Real data set: NBA player’s game-by-game statistics • Synthetic dataset: anti-correlated, correlated, independent
Summary • a novel skyline operator: stochastic skyline • guarantee minimality . • NP-complete to test stochastic order (lower orthant order) . • novel efficient algorithms to compute stochastic order. Future work: • F is a set of all decreasing functions?