Incremental Aggregation on Multiple Continuous Queries. Chun Jin, Carnegie Mellon University. 09/28/2006, ISMIS, Bari, Italy.
Stream Processing • Intelligence monitoring • Fraud detection • Epidemic onset patterns • Network intrusion detection • Geospatial changes • Transactions • Sensor network readings • Network traffic data
Problem • Aggregate queries • Continuous evaluation • Multiple concurrent queries
Solutions • Incremental aggregation • Incremental multiple aggregate query optimization (incremental sharing)
Roadmap • System overview • Query examples • Incremental Aggregation • Incremental sharing • Evaluation
System Architecture • New Query Insertion: index query network; identify common computation; select optimal sharing path; expand query network • Query Network Execution: code assembly; incremental aggregation; periodic execution • Diagram labels: Common Computation Identifier (CCI), Sharing Optimizer (SO), Projection Manager (PM), Network Operation Manager (NOM), Code Assembler, Coordinator, System Catalog, Query Network, Engine, Oracle, Query, Generator
Query Examples
(a) Query A:
SELECT dis_cat, hospital, vdate, COUNT(*), AVERAGE(fee)
FROM Med
GROUP BY CAT(disease) AS dis_cat, hospital, DAY(visit_time) AS vdate
(b) Query B:
SELECT hospital, vdate, AVERAGE(fee)
FROM Med
GROUP BY hospital, DAY(visit_time) AS vdate
Diagram nodes: S, SH, SN, A, AH, AN, B
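Aggregate Function Types • Distributive: can be computed from partial aggregates of the same kind, e.g. SUM and COUNT (partial counts are combined by summing). • Algebraic: can be computed from a finite set of distributive aggregates, e.g. AVERAGE from SUM and COUNT. • Holistic: no such finite set exists, e.g. quantiles.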
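As a rough illustration of the three categories, the Python sketch below combines partial results over an "old" and a "new" batch of fees. The helper names and the sample values are made up for this example and do not come from the slides.

```python
from statistics import median

# Hypothetical sample data: fees already seen and fees that just arrived.
old_fees = [100.0, 200.0, 300.0]
new_fees = [400.0, 500.0]

def merge_distributive(partial_old, partial_new):
    """SUM and COUNT: partial results combine by simple addition."""
    return partial_old + partial_new

def merge_average_state(state_old, state_new):
    """AVERAGE is algebraic: keep a fixed-size state (sum, count) and merge it."""
    return (state_old[0] + state_new[0], state_old[1] + state_new[1])

def finalize_average(state):
    total, count = state
    return total / count

sum_all = merge_distributive(sum(old_fees), sum(new_fees))
count_all = merge_distributive(len(old_fees), len(new_fees))
avg_all = finalize_average(
    merge_average_state((sum(old_fees), len(old_fees)),
                        (sum(new_fees), len(new_fees)))
)

# A quantile such as the median is holistic: no fixed-size state suffices,
# so this sketch simply recomputes it over all raw values.
median_all = median(old_fees + new_fees)
```

Only the median needs the raw history here; the other three results are obtained from small per-batch summaries.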
Holistic Aggregation • Recomputes the aggregate over the entire history. • Usage: • For holistic aggregates. • For aggregates that follow non-incrementally-evaluated aggregates. • As the baseline for incremental aggregation.
Algorithm • 0: PreUpdate State • 1: Aggregate (SN into AN) • 2: Merge Groups (t1 in AH, t2 in AN): t2.COUNTA = t1.COUNTA + t2.COUNTA, t2.SUMA = t1.SUMA + t2.SUMA • 3: Compute Algebraic Aggregate • 4: Drop Duplicates • 5: Insert New Results
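Steps 1 to 5 of the algorithm can be sketched in Python with dictionaries standing in for the tables. This is only an illustration under several assumptions: SN is read as the batch of new tuples, AH as the stored aggregate groups, and AN as the groups touched by the new batch; step 4 is interpreted as removing from AH the groups that AN supersedes; step 0 (PreUpdate State) is omitted because the slides do not describe it. The function name and the sample rows are hypothetical.

```python
def incremental_aggregate(ah, sn):
    """Update the stored groups `ah` in place from the new tuples `sn`.

    ah: dict mapping group key -> {"COUNTA", "SUMA", "AVGA"}
    sn: list of (group key, fee) pairs for the new batch
    Returns AN, the groups affected by this batch.
    """
    # Step 1: aggregate only the new tuples SN into AN.
    an = {}
    for key, fee in sn:
        group = an.setdefault(key, {"COUNTA": 0, "SUMA": 0.0})
        group["COUNTA"] += 1
        group["SUMA"] += fee

    # Step 2: merge matching historical groups (t1 in AH) into AN (t2).
    for key, t2 in an.items():
        t1 = ah.get(key)
        if t1 is not None:
            t2["COUNTA"] = t1["COUNTA"] + t2["COUNTA"]
            t2["SUMA"] = t1["SUMA"] + t2["SUMA"]

    # Step 3: compute the algebraic aggregate from its basic parts.
    for t2 in an.values():
        t2["AVGA"] = t2["SUMA"] / t2["COUNTA"]

    # Step 4 (assumed meaning): drop the now-outdated duplicates from AH.
    for key in an:
        ah.pop(key, None)

    # Step 5: insert the new results into AH.
    ah.update(an)
    return an

# Hypothetical sample state and batch.
flu_key = ("flu", "H1", "2006-09-27")
cold_key = ("cold", "H2", "2006-09-28")
ah = {flu_key: {"COUNTA": 2, "SUMA": 300.0, "AVGA": 150.0}}
sn = [(flu_key, 150.0), (cold_key, 80.0)]
an = incremental_aggregate(ah, sn)
```

The dictionary lookups make this closest in spirit to a hash-based variant; the slides do not show how their own implementations perform the matching.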
Complexity • 1: Aggregate SN. T1 = O(|SN|) • 2: Merge groups in AH to AN. Tcurr2 = O(|AH| + |AN|), Thash2 = O(|AH| + |AN|), Tprefetch2 = O(|AN|) • 3: Compute algebraic aggregates in AN. T3 = O(|AN|) • 4: Drop duplicates. Tcurr4 = O(|AN| * |ANH|) = O(|AN|^2), Thash4 = O(|AH| + |AN|), Tprefetch4 = O(|AN|) • 5: Insert new results. T5 = O(|AN|)
Implementation • System catalog: AggreRules, AggreBasics • Incremental aggregation instantiation
System Catalog • AggreRules • AggreBasics
Instantiation • New Query A: AVERAGE(fee) • AggreBasics entries for AVERAGE: SUM(X): SUMX, COUNT(W): COUNTW • Name mapping: SUMX → SUM(fee) → SUMA; COUNTW → COUNT(*) → COUNTA; AVERAGE(fee) → AVGA • Operations shown on AggreRules: retrieve rules, parse, substitute, insert columns • Resulting GroupColumns: SUM(fee): SUMA, COUNT(*): COUNTA, AVERAGE(fee): AVGA
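The name substitution on this slide can be imitated with a small Python sketch. The layout of `AGGRE_BASICS`, the `SHORT_NAMES` table, the binding of W to `*`, and the function `instantiate` are all assumptions made for illustration; the slides only show the template names (SUMX, COUNTW) and the resulting GroupColumns entries, not how the catalog is actually stored or processed.

```python
# Hypothetical in-memory stand-in for the AggreBasics catalog table:
# each algebraic function lists the basic aggregates it is built from.
AGGRE_BASICS = {
    "AVERAGE": [
        {"template": "SUM(X)", "column": "SUMX", "param": "X"},
        {"template": "COUNT(W)", "column": "COUNTW", "param": "W"},
    ],
}

# Hypothetical abbreviations used to name the final aggregate column.
SHORT_NAMES = {"AVERAGE": "AVG"}

def instantiate(func, arg, query_id):
    """Map concrete aggregate expressions to column names for one query."""
    # Assumption: X binds to the aggregated attribute and W to '*'.
    bindings = {"X": arg, "W": "*"}
    group_columns = {}
    for basic in AGGRE_BASICS[func]:
        param = basic["param"]
        # Substitute the template parameter in the expression, e.g. SUM(X) -> SUM(fee).
        expr = basic["template"].replace(f"({param})", f"({bindings[param]})")
        # Substitute the parameter letter in the column name, e.g. SUMX -> SUMA.
        column = basic["column"][: -len(param)] + query_id
        group_columns[expr] = column
    # The algebraic aggregate itself also gets a column, e.g. AVGA.
    group_columns[f"{func}({arg})"] = SHORT_NAMES[func] + query_id
    return group_columns

group_columns = instantiate("AVERAGE", "fee", "A")
```

For Query A this yields the same three entries that the slide lists under GroupColumns.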
Incremental Multiple Query Optimization (Incremental Sharing) • Index existing query plan information R. • Given a new query Q, identify the sharable computations from R. • Select the optimal sharing path. • Expand R to compute Q.
Expanding Query Network • Limited sharing on holistic aggregates • Sharing on distributive/algebraic aggregates through vertical expansion
Vertical Expansion • 1: Further Aggregate: COUNTB = SUM(COUNTA), SUMB = SUM(SUMA), GROUP BY BID • Diagram nodes: A, AH, B, BH
Vertical Expansion • 1: Further Aggregate: COUNTB = SUM(COUNTA), SUMB = SUM(SUMA), GROUP BY BID • 2: Merge Groups: t2.COUNTA = t1.COUNTA + t2.COUNTA, t2.SUMA = t1.SUMA + t2.SUMA • 3: Compute Algebraic Aggregate • 4: Drop Duplicates • 5: Insert New Results • Diagram nodes: A, AH, AN, B, BH, BN
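The Further Aggregate step can be sketched in Python as a second-level grouping over A's groups instead of over the raw stream. The function `further_aggregate`, the `to_bid` key projection, the `AVGB` column name, and the sample rows are hypothetical; the sketch assumes B's grouping key (BID) is obtained by dropping `dis_cat` from A's key, as in Queries A and B.

```python
def further_aggregate(a_groups, to_bid):
    """Derive B's groups from A's groups.

    a_groups: dict mapping A's group key -> {"COUNTA", "SUMA"}
    to_bid:   function projecting an A key onto B's coarser key (BID)
    """
    b_groups = {}
    for a_key, t in a_groups.items():
        group = b_groups.setdefault(to_bid(a_key), {"COUNTB": 0, "SUMB": 0.0})
        group["COUNTB"] += t["COUNTA"]   # COUNTB = SUM(COUNTA)
        group["SUMB"] += t["SUMA"]       # SUMB   = SUM(SUMA)
    # The algebraic aggregate is then computed from the basic ones.
    for group in b_groups.values():
        group["AVGB"] = group["SUMB"] / group["COUNTB"]
    return b_groups

# Hypothetical groups of A, keyed by (dis_cat, hospital, vdate).
a_groups = {
    ("flu", "H1", "2006-09-27"): {"COUNTA": 3, "SUMA": 450.0},
    ("cold", "H1", "2006-09-27"): {"COUNTA": 1, "SUMA": 80.0},
    ("flu", "H2", "2006-09-27"): {"COUNTA": 2, "SUMA": 100.0},
}

# B groups by (hospital, vdate), so its key drops the disease category.
b_groups = further_aggregate(a_groups, to_bid=lambda key: key[1:])
```

Because A usually holds far fewer rows than the stream it summarizes, computing B this way touches less data than aggregating the stream again.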
Vertical Expansion Complexity • TVcurr = O(|AN|^2 + |BH|) • TVhash = O(|AN| + |BH|) • TVprefetch = O(|AN|)
System Catalog • GroupColumns • GroupTopology • GroupExprSet • GroupExprIndex
Select Optimal Sharing Path • Select least-size node for sharing
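One way to read the least-size rule is sketched below in Python. The sharability test used here (a node can serve a new query if the query's grouping columns are a subset of the node's) is an assumption for distributive and algebraic aggregates, not something the slides state; the node descriptions, sizes, and the function name are likewise hypothetical.

```python
def select_sharing_node(nodes, query_group_by):
    """Pick the smallest node from which the new query can be computed."""
    wanted = set(query_group_by)
    # Assumed sharability test: the node keeps every column the query groups by.
    sharable = [node for node in nodes if wanted <= set(node["group_by"])]
    # Least-size rule: fewer rows to read means cheaper further aggregation.
    return min(sharable, key=lambda node: node["size"])

# Hypothetical query network: the source stream S and the existing node A.
# S is modeled as if it exposed the derived grouping columns directly.
nodes = [
    {"name": "S", "group_by": ["dis_cat", "hospital", "vdate"], "size": 600000},
    {"name": "A", "group_by": ["dis_cat", "hospital", "vdate"], "size": 5000},
]

# New Query B groups by hospital and vdate only.
chosen = select_sharing_node(nodes, ["hospital", "vdate"])
```

With these made-up sizes, B would be attached to A instead of S.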
Rerouting • Animated diagram: evolution of the query network over nodes S, A, and B
Evaluation • Databases: synthesized FedWire money transfers (Fed); anonymized medical patient admission records (Med) • Queries: seed queries; sharable queries generated from the seeds; a wide range of queries (aggregates in this paper) • Simulation: historical data (300,000 on Fed and 600,000 on Med); chunks of new data (4,000 per chunk)
Incremental Aggregation • Figure: total execution time in seconds
Figure (a) Fed: execution time (s) vs. number of FED queries
Figure (a) Med: execution time (s) vs. number of MED queries
Conclusion • Multiple aggregates over streams • Solutions: incremental aggregation; incremental MQO (incremental sharing) • Built atop DBMSs for direct practical utility • Big performance improvement • Future work: a broad range of queries; building atop DSMSs
Acknowledgement • Work with Professor Jaime Carbonell. • Part of ARGUS by CMU and Dynamix. • Team: Phil Hayes, Santosh Ananthraman, Bob Frederking, Eugene Fink, Dwight Dietrich, Ganesh Mani, Johny Mathew. • Thanks to Professor Chris Olston for helpful discussion.
Figure: FED Query Pair 1, (a) Pair 1 • Series: Non-VE IBT, VE IBT, Non-VE ITT, VE ITT • ITT: average individual-tuple execution time (s) • IBT: incremental-batch execution time (s) • X-axis: incremental size |SN|