This presentation reviews aggregate query processing under probabilistic schema mappings, covering COUNT, MIN, MAX, SUM, and AVG. It describes two evaluation algorithms, By-Table and By-Tuple, together with their time complexity and an empirical evaluation on real and synthetic datasets. Strengths noted include the analysis of how probabilistic schemas affect aggregate queries and the PTIME algorithms; weaknesses include By-Table results that were biased by database optimizations. Suggested future work includes improving the algorithms and extending them to sub-queries.
Aggregate Query Answering under Uncertain Schema Mappings
Avigdor Gal, Maria Vanina Martinez, Gerardo I. Simari, V. S. Subrahmanian
Presented by Stephen Lynn
Overview
• Aggregate Queries
• Probabilistic Schema Mapping
• Goals/Objectives
• Aggregate Processing (3 proposals)
• By-Table Algorithm
• By-Tuple Algorithm
• Evaluation
• Analysis
Aggregate Queries
• COUNT, MIN, MAX, SUM, AVG
• Each is computable by a simple PTIME algorithm
By-Table vs By-Tuple
• By-Tuple: consider all possible mappings for each tuple
• By-Table: a single mapping applies to the entire table
• P(date→postedDate) = 0.7
• P(date→reducedDate) = 0.3
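To make the difference between the two semantics concrete, the sketch below enumerates the possible worlds for a small table using the two mapping probabilities from the slide. The function names are hypothetical, and the code assumes that under By-Tuple each tuple chooses its mapping independently, so a world's probability is the product of the chosen mappings' probabilities.

```python
from itertools import product

# Mapping probabilities from the slide: date -> postedDate / reducedDate.
MAPPINGS = {"postedDate": 0.7, "reducedDate": 0.3}


def by_table_worlds(n_tuples):
    """One mapping is chosen for the whole table: m possible worlds."""
    return [((name,) * n_tuples, prob) for name, prob in MAPPINGS.items()]


def by_tuple_worlds(n_tuples):
    """Each tuple picks its own mapping: m ** n possible worlds.

    Assumption: choices are independent across tuples, so the world
    probability is the product of the per-tuple mapping probabilities.
    """
    worlds = []
    for choice in product(MAPPINGS, repeat=n_tuples):
        prob = 1.0
        for name in choice:
            prob *= MAPPINGS[name]
        worlds.append((choice, prob))
    return worlds


if __name__ == "__main__":
    print(len(by_table_worlds(3)))  # number of By-Table worlds
    print(len(by_tuple_worlds(3)))  # number of By-Tuple worlds
```

For a table of 3 tuples and 2 mappings, By-Table yields 2 worlds while By-Tuple yields 2 ** 3 = 8, which is why naive enumeration under By-Tuple does not scale and dedicated algorithms are needed.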
Goals/Objectives
• Impact Analysis of Probabilistic Schemas on Aggregate Queries
• Aggregate Query Algorithms
• Time Complexity Analysis
• Evaluation
Aggregation Methods
• Range
• Distribution
• Expected Value
Method Relationships
• Distribution
  • Most time consuming
  • Most information
• Range
  • Computed directly from distribution
• Expected Value
  • Computed directly from distribution
• Range and expected value also have more efficient ways to compute than going through the distribution
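The relationship between the three answer forms can be sketched as follows: once a distribution over aggregate values is known, the range and the expected value fall out of it directly. The dictionary representation (value to probability) and the function names are assumptions made for illustration.

```python
def range_of(distribution):
    """Smallest and largest aggregate values with non-zero probability."""
    support = [value for value, prob in distribution.items() if prob > 0]
    return min(support), max(support)


def expected_value(distribution):
    """Probability-weighted mean of the aggregate values."""
    return sum(value * prob for value, prob in distribution.items())


if __name__ == "__main__":
    # Hypothetical distribution of a COUNT answer.
    dist = {3: 0.7, 5: 0.3}
    print(range_of(dist))
    print(expected_value(dist))
```

The reverse direction does not hold: neither the range nor the expected value determines the distribution, which is why the distribution carries the most information.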
By-Table Algorithm
• All aggregates are PTIME computable
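A minimal sketch of the By-Table idea, not the authors' implementation: evaluate the ordinary aggregate once per candidate mapping and attach that mapping's probability to the result. The row layout, the attribute names, and the `by_table_distribution` helper are assumptions for illustration.

```python
from collections import defaultdict


def by_table_distribution(rows, mappings, aggregate):
    """Distribution of an aggregate when one mapping applies to the whole table.

    rows      -- list of dicts keyed by source attribute name
    mappings  -- dict: source attribute name -> mapping probability
    aggregate -- function from a list of values to a single value
    """
    dist = defaultdict(float)
    for source_attr, prob in mappings.items():
        values = [row[source_attr] for row in rows]
        # Mappings that produce the same answer pool their probability.
        dist[aggregate(values)] += prob
    return dict(dist)


if __name__ == "__main__":
    # Hypothetical data: dates encoded as day numbers.
    rows = [
        {"postedDate": 3, "reducedDate": 7},
        {"postedDate": 5, "reducedDate": 6},
    ]
    mappings = {"postedDate": 0.7, "reducedDate": 0.3}
    print(by_table_distribution(rows, mappings, max))
```

With m mappings this runs the deterministic aggregate m times, so a polynomial-time aggregate stays polynomial, consistent with the slide's PTIME claim.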
By-Tuple Algorithm (COUNT)
• O(n * m)
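The slide gives only the bound, so the following is one plausible way an O(n * m) pass could arise for COUNT, assuming n is the number of tuples, m is the number of mappings, and tuples choose mappings independently. It visits every tuple and mapping pair once and derives the range and expected value without enumerating the m ** n worlds. The function name and data layout are hypothetical.

```python
def by_tuple_count(rows, mappings, predicate):
    """Range and expected value of COUNT under by-tuple semantics.

    Returns (low, high, expected):
      low      -- tuples that satisfy the predicate under every mapping
      high     -- tuples that satisfy it under at least one mapping
      expected -- sum of per-tuple satisfaction probabilities
    """
    low = high = 0
    expected = 0.0
    for row in rows:  # n tuples
        p_sat = 0.0
        sat_all, sat_any = True, False
        for attr, prob in mappings.items():  # m mappings
            if predicate(row[attr]):
                p_sat += prob
                sat_any = True
            else:
                sat_all = False
        low += sat_all
        high += sat_any
        expected += p_sat  # linearity of expectation
    return low, high, expected


if __name__ == "__main__":
    # Hypothetical data: dates encoded as day numbers.
    rows = [
        {"postedDate": 3, "reducedDate": 7},
        {"postedDate": 5, "reducedDate": 6},
        {"postedDate": 1, "reducedDate": 2},
    ]
    mappings = {"postedDate": 0.7, "reducedDate": 0.3}
    print(by_tuple_count(rows, mappings, lambda day: day > 4))
```

The expected value relies on linearity of expectation, so it needs no independence assumption; the full distribution of COUNT would require more work than this single pass.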
Evaluation
• Empirical Evaluation
  • Real-world dataset (eBay)
  • Synthetic dataset
• Evaluate Time Complexity
  • Vary tuple numbers
  • Vary attribute mappings
Analysis
• Strengths
  • Effect of probabilistic schemas on aggregates
  • Nice PTIME algorithms
• Weaknesses
  • Evaluation was obvious
  • By-Table results biased by database optimizations
• Future Work
  • Improve algorithms
  • Extend to sub-queries
  • Heuristics