Top-k Queries on Uncertain Data

Advisor: Prof. 陳良弼; Presenter: 鄧雅文 (97753034)

Outline • Introduction • Related Work • Problem Formulation • Future Work

Introduction • Top-k query on certain data • Rank results according to a user-defined score • Important for exploring large databases • E.g., top-2 = {T1, T2}
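On certain data, a top-k query simply selects the k highest-scoring tuples. A minimal Python sketch, assuming each tuple is an (id, score) pair; the scores are hypothetical because the slide's table is not reproduced here:

```python
import heapq

def top_k(tuples, k, score):
    """Return the k tuples with the highest user-defined score."""
    # heapq.nlargest(k, it, key) matches sorted(it, key=key, reverse=True)[:k]
    return heapq.nlargest(k, tuples, key=score)

# Hypothetical (id, score) pairs, not the slide's data.
players = [("T1", 95), ("T2", 90), ("T3", 70), ("T4", 85)]
print([name for name, _ in top_k(players, 2, score=lambda t: t[1])])
# ['T1', 'T2']
```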

Introduction (cont.) • Uncertain database • How to define top-k on uncertain data? • Mutually exclusive rules • E.g., T1 ⊕ T4 (T1 and T4 cannot both exist)

Related Work • C. C. Aggarwal and P. S. Yu. A Survey of Uncertain Data Algorithms and Applications. In TKDE, 2009. • Causes: • Sensor networks, privacy, trajectory prediction… • Main research areas on uncertain data: • Modeling of uncertain data • Uncertain data management • Top-k queries, range queries, NN queries… • Uncertain data mining • Clustering, classification, frequent patterns, outliers…

Related Work (cont.) • M. Soliman, I. Ilyas, and K. Chang. Top-k Query Processing in Uncertain Databases. In ICDE, 2007. • Possible Worlds
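The possible-worlds semantics can be made concrete by enumeration. The sketch below assumes a simple model: each tuple has a membership probability, tuples in the same mutual-exclusion rule (such as T1 ⊕ T4) never co-occur, and different rules are independent. The rules and probabilities are hypothetical, and enumeration is exponential in the number of rules, so this only illustrates the semantics:

```python
from itertools import product

def possible_worlds(rules):
    """Enumerate the possible worlds of an uncertain table.

    `rules` is a list of mutual-exclusion rules. Each rule is a list of
    (tuple_id, membership_probability) pairs whose probabilities sum to at
    most 1; at most one tuple of a rule appears in any world, and different
    rules are independent. A tuple with no exclusion partner is a rule of
    length one. Yields (frozenset of tuple ids, probability of that world).
    """
    options = []
    for rule in rules:
        absent = 1.0 - sum(p for _, p in rule)
        choices = list(rule)
        if absent > 1e-12:
            choices.append((None, absent))  # no tuple of this rule appears
        options.append(choices)
    for combo in product(*options):
        prob = 1.0
        for _, p in combo:
            prob *= p
        yield frozenset(t for t, _ in combo if t is not None), prob

# Hypothetical table: T1 and T4 are mutually exclusive (T1 ⊕ T4).
RULES = [[("T1", 0.3), ("T4", 0.4)], [("T2", 0.7)], [("T3", 0.9)]]
worlds = list(possible_worlds(RULES))
print(len(worlds))                          # 12
print(round(sum(p for _, p in worlds), 9))  # 1.0
```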

Related Work (cont.) • U-Topk query • Return the k-tuple vector with the highest probability of being the top-k answer across all possible worlds • E.g., {T1, T2} as U-Top2 • U-kRanks query • For each rank i = 1 to k, return the tuple with the highest probability of appearing at rank i across all possible worlds • E.g., {T2, T6} as U-2Ranks
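Both definitions can be checked by brute force over possible worlds. This is a sketch for small examples only, not the algorithms proposed by Soliman et al.; the scores and exclusion rules are hypothetical, and worlds with fewer than k tuples are skipped when computing U-Topk:

```python
from collections import defaultdict
from itertools import product

# Hypothetical data (not the slide's table): scores and exclusion rules.
SCORES = {"T1": 100, "T2": 90, "T3": 80, "T4": 70}
RULES = [[("T1", 0.3), ("T4", 0.4)], [("T2", 0.7)], [("T3", 0.9)]]

def possible_worlds(rules):
    """Yield (tuple ids in the world, world probability); rules are independent."""
    options = []
    for rule in rules:
        absent = 1.0 - sum(p for _, p in rule)
        options.append(list(rule) + ([(None, absent)] if absent > 1e-12 else []))
    for combo in product(*options):
        prob = 1.0
        for _, p in combo:
            prob *= p
        yield [t for t, _ in combo if t is not None], prob

def ranked(world, scores):
    """Tuples of a world in descending score order."""
    return sorted(world, key=scores.get, reverse=True)

def u_topk(rules, scores, k):
    """U-Topk: the top-k vector with the highest probability over all worlds."""
    prob = defaultdict(float)
    for world, p in possible_worlds(rules):
        if len(world) >= k:  # worlds with fewer than k tuples have no top-k vector
            prob[tuple(ranked(world, scores)[:k])] += p
    return max(prob.items(), key=lambda item: item[1])

def u_kranks(rules, scores, k):
    """U-kRanks: for each rank i = 1..k, the tuple most likely to occupy rank i."""
    at_rank = [defaultdict(float) for _ in range(k)]
    for world, p in possible_worlds(rules):
        for i, t in enumerate(ranked(world, scores)[:k]):
            at_rank[i][t] += p
    return [max(d.items(), key=lambda item: item[1]) for d in at_rank if d]

print(u_topk(RULES, SCORES, 2))    # (('T2', 'T3'), p ≈ 0.441)
print(u_kranks(RULES, SCORES, 2))  # [('T2', p ≈ 0.49), ('T3', p ≈ 0.522)]
```

With this toy data, U-Top2 is (T2, T3) with probability about 0.441, and U-2Ranks also picks T2 for rank 1 and T3 for rank 2.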

Related Work (cont.) • M. Hua, J. Pei, W. Zhang, X. Lin. Ranking Queries on Uncertain Data: A Probabilistic Threshold Approach. In SIGMOD, 2008. • PT-k query • Return a set of all tuples whose top-k probability values are at least p • E.g., {T1, T2, T5} as PT-2 (with p=0.4)
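The top-k probability of a tuple is the total probability of the worlds in which it ranks within the top k. A brute-force sketch of PT-k on a hypothetical table (this is not the pruning-based method of Hua et al.):

```python
from itertools import product

# Hypothetical data (not the slide's table).
SCORES = {"T1": 100, "T2": 90, "T3": 80, "T4": 70}
RULES = [[("T1", 0.3), ("T4", 0.4)], [("T2", 0.7)], [("T3", 0.9)]]

def possible_worlds(rules):
    """Yield (tuple ids in the world, world probability); rules are independent."""
    options = []
    for rule in rules:
        absent = 1.0 - sum(p for _, p in rule)
        options.append(list(rule) + ([(None, absent)] if absent > 1e-12 else []))
    for combo in product(*options):
        prob = 1.0
        for _, p in combo:
            prob *= p
        yield [t for t, _ in combo if t is not None], prob

def topk_probabilities(rules, scores, k):
    """Pr(tuple is among the top-k of a world), summed over all possible worlds."""
    prob = {t: 0.0 for t in scores}
    for world, p in possible_worlds(rules):
        for t in sorted(world, key=scores.get, reverse=True)[:k]:
            prob[t] += p
    return prob

def pt_k(rules, scores, k, threshold):
    """PT-k: every tuple whose top-k probability is at least `threshold`."""
    probs = topk_probabilities(rules, scores, k)
    return sorted(t for t, p in probs.items() if p >= threshold)

print(topk_probabilities(RULES, SCORES, 2))  # T1 ≈ 0.3, T2 ≈ 0.7, T3 ≈ 0.711, T4 ≈ 0.148
print(pt_k(RULES, SCORES, 2, 0.4))           # ['T2', 'T3']
```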

Related Work (cont.) • T. Ge, S. Zdonik, and S. Madden. Top-k Queries on Uncertain Data: On Score Distribution and Typical Answers. In SIGMOD, 2009. • Addresses the tradeoff between reporting high-scoring tuples and tuples with a high probability of being in the top-k • Return a number of typical vectors that efficiently sample the distribution of all potential top-k tuple vectors
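One way to read the typical-answers idea: compute the distribution of the total score of the top-k vector over possible worlds, then pick c vectors whose scores minimize the expected distance to the nearest chosen score. The sketch below is our assumption of that formulation, with brute-force search over score combinations in place of an optimized algorithm, hypothetical data, and worlds with fewer than k tuples ignored:

```python
from itertools import combinations, product

# Hypothetical data (not from the paper or the slides).
SCORES = {"T1": 100, "T2": 90, "T3": 80, "T4": 70}
RULES = [[("T1", 0.3), ("T4", 0.4)], [("T2", 0.7)], [("T3", 0.9)]]

def possible_worlds(rules):
    """Yield (tuple ids in the world, world probability); rules are independent."""
    options = []
    for rule in rules:
        absent = 1.0 - sum(p for _, p in rule)
        options.append(list(rule) + ([(None, absent)] if absent > 1e-12 else []))
    for combo in product(*options):
        prob = 1.0
        for _, p in combo:
            prob *= p
        yield [t for t, _ in combo if t is not None], prob

def topk_score_distribution(rules, scores, k):
    """Map each top-k vector to (its total score, probability of being the top-k)."""
    dist = {}
    for world, p in possible_worlds(rules):
        if len(world) < k:
            continue  # assumption of this sketch: such worlds are ignored
        vec = tuple(sorted(world, key=scores.get, reverse=True)[:k])
        total, q = dist.get(vec, (sum(scores[t] for t in vec), 0.0))
        dist[vec] = (total, q + p)
    return dist

def typical_answers(rules, scores, k, c):
    """Pick c vectors whose scores minimize E[distance to the nearest chosen score]."""
    dist = topk_score_distribution(rules, scores, k)
    # Probability mass per distinct total score, and the likeliest vector per score.
    mass, best = {}, {}
    for vec, (s, q) in dist.items():
        mass[s] = mass.get(s, 0.0) + q
        if s not in best or q > dist[best[s]][1]:
            best[s] = vec

    def expected_distance(chosen):
        return sum(q * min(abs(s - t) for t in chosen) for s, q in mass.items())

    chosen = min(combinations(sorted(mass), c), key=expected_distance)
    return [(s, best[s]) for s in chosen]

print(typical_answers(RULES, SCORES, 2, 2))
# [(170, ('T2', 'T3')), (190, ('T1', 'T2'))]
```

Here both the likely vector (T2, T3) and the higher-scoring but less likely vector (T1, T2) are reported, which is the score-versus-probability tradeoff the slide mentions.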

Problem Formulation • Example: • In an International Tenpin Bowling Championship, the events include singles, doubles, and trios. Due to budget constraints, the coach can send only 3 players. We therefore want these 3 players to have a relatively high probability of performing well across all three event types.

Problem Formulation (cont.) • U-Top3 = {T2, T5, T6} • But U-Top2 = {T1, T2} and U-Top1 = {T1}, so U-Top3 drops T1 even though T1 appears in both U-Top1 and U-Top2 • Should {T1, T2, T5} also be considered as a top-3 answer?

Problem Formulation (cont.) • We choose the answers to a top-k query based not only on their probability (P) but also on their confidence (C). • Confidence: reflects the top-(k-1) probabilities of the sets formed by k-1 tuples of a candidate top-k answer • E.g., for k=3, {T1, T2, T3} is a possible top-3 answer with P=0.0356; its C is composed, in a way still to be defined, of Pr({T1, T2} is the top-2)=0.2542 and its confidence, Pr({T1, T3} is the top-2)=0.0218 and its confidence, and Pr({T2, T3} is the top-2)=0.0512 and its confidence
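The probability P of a candidate and the top-(k-1) probabilities of its (k-1)-subsets are well defined under possible-worlds semantics; only the function that combines them (with the subsets' own confidences) into C is still open. A hedged brute-force sketch that computes just these ingredients, on hypothetical data, so the numbers differ from the slide's example:

```python
from itertools import combinations, product

# Hypothetical data (not the slide's table).
SCORES = {"T1": 100, "T2": 90, "T3": 80, "T4": 70}
RULES = [[("T1", 0.3), ("T4", 0.4)], [("T2", 0.7)], [("T3", 0.9)]]

def possible_worlds(rules):
    """Yield (tuple ids in the world, world probability); rules are independent."""
    options = []
    for rule in rules:
        absent = 1.0 - sum(p for _, p in rule)
        options.append(list(rule) + ([(None, absent)] if absent > 1e-12 else []))
    for combo in product(*options):
        prob = 1.0
        for _, p in combo:
            prob *= p
        yield [t for t, _ in combo if t is not None], prob

def top_set_probabilities(rules, scores, k):
    """Map each k-set to Pr(it is exactly the top-k of a possible world)."""
    prob = {}
    for world, p in possible_worlds(rules):
        if len(world) >= k:
            top = frozenset(sorted(world, key=scores.get, reverse=True)[:k])
            prob[top] = prob.get(top, 0.0) + p
    return prob

def confidence_inputs(candidate, rules, scores):
    """Return P(candidate) and the top-(k-1) probability of each (k-1)-subset.

    How these values are combined into C is not defined yet (see Future
    Work), so this sketch stops at the ingredients.
    """
    k = len(candidate)
    p_candidate = top_set_probabilities(rules, scores, k).get(frozenset(candidate), 0.0)
    sub_probs = top_set_probabilities(rules, scores, k - 1)
    parts = {sub: sub_probs.get(frozenset(sub), 0.0)
             for sub in combinations(sorted(candidate), k - 1)}
    return p_candidate, parts

p, parts = confidence_inputs({"T1", "T2", "T3"}, RULES, SCORES)
print(round(p, 3), {s: round(q, 3) for s, q in parts.items()})
# 0.189 {('T1', 'T2'): 0.21, ('T1', 'T3'): 0.081, ('T2', 'T3'): 0.441}
```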

Problem Formulation (cont.) • Since every possible top-k answer has two features, probability (P) and confidence (C), we return only the non-dominated answers as the result set. • E.g., {T1, T3, T5}: P=0.8, C=0.4; {T1, T4, T7}: P=0.5, C=0.7; {T2, T6, T7}: P=0.3, C=0.2, which is dominated by both other answers (lower P and lower C), so it will not be returned
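Selecting the non-dominated answers is a two-dimensional skyline. A minimal sketch, assuming standard Pareto dominance where larger P and larger C are both better, applied to the three answers from the example:

```python
def dominates(a, b):
    """a = (P, C) dominates b if it is no worse in both and strictly better in one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])

def non_dominated(candidates):
    """Keep the (answer, (P, C)) pairs that no other candidate dominates."""
    return [
        (answer, pc) for answer, pc in candidates
        if not any(dominates(other, pc) for _, other in candidates)
    ]

candidates = [
    (("T1", "T3", "T5"), (0.8, 0.4)),
    (("T1", "T4", "T7"), (0.5, 0.7)),
    (("T2", "T6", "T7"), (0.3, 0.2)),
]
for answer, (p, c) in non_dominated(candidates):
    print(answer, p, c)
# ('T1', 'T3', 'T5') 0.8 0.4
# ('T1', 'T4', 'T7') 0.5 0.7
```

This quadratic scan is enough for a handful of candidates; a larger candidate set would call for a proper skyline algorithm.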

Future Work • Formulate the confidence function • Find an algorithm to generate the result set • Try to calculate the confidence in an efficient way • Carry out an empirical study on datasets

Thank you!
