
State-Slice: New Paradigm of Multi-query Optimization of Window-based Stream Queries



Presentation Transcript


  1. State-Slice: New Paradigm of Multi-query Optimization of Window-based Stream Queries
  Samrat Ganguly, Sudeept Bhatnagar (NEC Laboratories America Inc., Princeton, NJ, USA)
  Song Wang, Elke Rundensteiner (Database Systems Research Group, Worcester Polytechnic Institute, Worcester, MA, USA)
  32nd VLDB Conference, Seoul, Korea, 2006

  2. Computation Sharing for Stream Processing
  [Diagram: users register continuous queries that form an SPJA query network of selections (σ), projections (Π), aggregates (Agg), and window operators (w1, w2, w3); streaming data enters the network and streaming results flow out.]
  • New challenges:
  • In-memory processing of stateful operators
  • Stateful operators with various window constraints

  3. Window Constraints for Stateful Operators
  [Diagram: a sliding-window join over streams A and B, each stream with an input buffer and a window state A[w], B[w].]
  • Time-based sliding window constraints:
  • Each tuple carries a timestamp
  • Only tuples whose timestamps lie within the window W of each other can form an output
  • Observations (a back-of-envelope sketch follows after this slide):
  • States in the operator dominate memory usage
  • State size is proportional to the input rate and the window length
  • Join CPU cost is proportional to the state size
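The proportionalities above can be made concrete with a back-of-envelope sketch. The stream rate and window lengths below are purely illustrative, not taken from the paper; the point is only that state size, and hence per-tuple probe cost, scales with rate times window length.

    # Back-of-envelope sketch with made-up numbers (not from the paper):
    # a time-based window state holds roughly input_rate * window_length
    # tuples, and each arriving tuple probes the opposite state.

    def state_size(rate_tuples_per_sec, window_sec):
        """Approximate number of tuples held in a time-based window state."""
        return rate_tuples_per_sec * window_sec

    rate_a = 1_000          # tuples/sec on stream A (assumed)
    w1, w2 = 60, 600        # window lengths in seconds (assumed)

    print(state_size(rate_a, w1))   # 60000 tuples kept for a w1 window state
    print(state_size(rate_a, w2))   # 600000 tuples kept for a w2 window state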

  4. Motivation Example
  [Diagram: two separate query plans; Q1 joins A[w1] with B[w1]; Q2 applies σA to stream A and joins A[w2] with B[w2].]
  Q1: SELECT A.* FROM Temperature A, Humidity B WHERE A.LocationId = B.LocationId WINDOW w1 min
  Q2: SELECT A.* FROM Temperature A, Humidity B WHERE A.LocationId = B.LocationId AND A.Value > Threshold WINDOW w2 min
  Let w1 < w2.
  • Observations:
  • State A[w1] overlaps with state A[w2]
  • State B[w1] overlaps with state B[w2]
  • The joined results of Q1 and Q2 overlap

  5. Sharing with Selection Pull-up [CDF02, HFA+03]
  [Diagram: both queries share a single join over A[w2] and B[w2]; a router splits the joined output by |Ta - Tb| < w1 (sent to Q1) versus all tuples (sent to Q2 through σA).]
  • Selection pull-up: σA is evaluated on the joined output rather than on stream A
  • The shared join uses the larger window (w2)
  • [CDF02]: J. Chen, D. J. DeWitt, and J. F. Naughton. Design and evaluation of alternative selection placement strategies in optimizing continuous queries. In ICDE '02.
  • [HFA+03]: M. A. Hammad, M. J. Franklin, W. G. Aref, and A. K. Elmagarmid. Scheduling for shared window joins over data streams. In VLDB '03.

  6. Sharing with Selection Pull-up [CDF02, HFA+03]
  • Pros:
  • Single Join Operator
  • Cons:
  • Wasted Computation without Early Filtering
  • Wasted State Memory without Early Filtering
  • Per Output-Tuple Routing Cost

  7. Stream Partition with Selection Pushdown [KFH04]
  [Diagram: a split operator partitions stream A by the predicate on A.Value (≤ Threshold vs. > Threshold); each partition feeds its own sliding-window join with stream B (states A[w1], B[w1] and A[w2], B[w2]); a router directs joined tuples by |Ta - Tb| < w1 versus all, and union operators assemble the results for Q1 and Q2.]
  • Split stream A by A.Value
  • Route and union the shared join results
  • [KFH04]: S. Krishnamurthy, M. J. Franklin, J. M. Hellerstein, and G. Jacobson. The case for precision sharing. In VLDB '04.

  8. Stream Partition with Selection Pushdown [KFH04]
  • Pros:
  • Selection pushdown: no wasted Join Computation
  • Cons:
  • Multiple Join Operators
  • Duplicated State Memory in the Multiple Join Operators
  • Per Output-Tuple Routing Cost

  9. State-Slice: New Sharing Paradigm
  • Key Ideas:
  • State-Slice Concept for the Sliding Window Join
  • Pipelined Chain of Join Slices
  • Prospective Benefits:
  • Fine-grained Selection Push-down
  • Pipelined Join Operators
  • Avoiding Per-tuple Routing Cost

  10. One-way State-Sliced Window Join
  [Diagram: a one-way sliced join keeps the state of stream A restricted to [w1, w2]; arriving B tuples probe this state and are then propagated downstream; A tuples that age past w2 are purged.]
  • Sliced sliding window with lower bound w1 and upper bound w2: [w1, w2]
  • A B tuple only probes A tuples that are at least w1, but at most w2, older than itself (see the sketch after this slide)
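The following is a minimal sketch of a single one-way sliced join over the slice [w_start, w_end]. It is not the paper's CAPE implementation: the tuple layout ((timestamp, payload) pairs), the join predicate, and the class and method names are assumptions made for illustration. A tuples are stored in the slice's state, B tuples probe it, and A tuples that age past the slice's upper bound are purged so a downstream slice can take them over.

    from collections import deque

    class OneWaySlicedJoin:
        """Sketch of one slice of a one-way state-sliced window join.

        Stores stream-A tuples whose age relative to the probing B tuple
        lies in [w_start, w_end]; stream-B tuples only probe. Tuples are
        (timestamp, payload) pairs arriving in timestamp order."""

        def __init__(self, w_start, w_end, join_pred):
            self.w_start, self.w_end = w_start, w_end
            self.join_pred = join_pred     # e.g. lambda a, b: a["loc"] == b["loc"]
            self.state_a = deque()         # A tuples currently held by this slice

        def insert_a(self, a_tuple):
            """Accept an A tuple (newly arrived, or purged from the previous slice)."""
            self.state_a.append(a_tuple)

        def probe_b(self, b_tuple):
            """Probe with a B tuple; return (matches, purged A tuples).

            Purged A tuples have aged past w_end and would be handed to the
            next slice in the chain; b_tuple itself is also propagated so it
            can probe that next slice."""
            ts_b, payload_b = b_tuple
            purged = []
            while self.state_a and ts_b - self.state_a[0][0] > self.w_end:
                purged.append(self.state_a.popleft())
            matches = [(a, b_tuple) for a in self.state_a
                       if self.w_start <= ts_b - a[0] <= self.w_end
                       and self.join_pred(a[1], payload_b)]
            return matches, purged

In a chain, the purged list of slice i is fed to insert_a of slice i+1 and the probing B tuple is forwarded along the queue; adjacent slices would treat one slice boundary as exclusive so the same pair is not produced twice.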

  11. The Chain of One-way State-Sliced Joins
  [Diagram: two sliced joins J1 (state of stream A: [0, w1]) and J2 (state of stream A: [w1, w2]) connected by queues; B tuples probe J1 and then J2; the union of the two sliced outputs equals the full-window joined result.]
  • Split the state memory into a chain of joins
  • No overlap of state memory across the chain of joins (a brute-force equivalence check follows after this slide)
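As a sanity check on the claim that the sliced outputs union back to the original result, here is a brute-force sketch over random data. The key-equality predicate, the data, and the window values are made up; the check shows that the one-way joins over the slices (0, w1] and (w1, w2] together produce exactly the pairs of the single full-window join over (0, w2].

    import random

    # Brute-force equivalence check (illustrative only, not the chain algorithm).
    random.seed(0)
    w1, w2 = 5, 20                                              # assumed slice bounds
    A = [(t, random.randint(0, 3)) for t in range(0, 100, 2)]   # (timestamp, key)
    B = [(t, random.randint(0, 3)) for t in range(1, 100, 2)]

    def one_way_join(lo, hi):
        """B tuples probe A tuples that are more than lo but at most hi older."""
        return [(a, b) for b in B for a in A
                if lo < b[0] - a[0] <= hi and a[1] == b[1]]

    full   = one_way_join(0, w2)
    sliced = one_way_join(0, w1) + one_way_join(w1, w2)         # no overlap, no gap
    assert sorted(full) == sorted(sliced)
    print(len(full), "joined pairs from either plan")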

  12. From One-way to Two-way Binary Join
  [Diagram: a chain of two-way sliced joins J1 and J2, each holding sliced states of both streams (A: [0, w1] and [w1, w2]; B: [0, w1] and [w1, w2]), connected by queues; a union operator merges the sliced outputs into the joined result.]
  • Intuitively a combination of two one-way joins
  • Each A or B tuple carries two references: a "male" and a "female" copy
  • Male tuples are used to probe the states
  • Female tuples are inserted into, and cross-purged from, their respective states

  13. State-Sliced Join Chain: The Example
  [Diagram: the shared chain for Q1 and Q2; sliced join s1 holds A[0, w1] and B[0, w1], and its output feeds Q1 directly and Q2 after σA; σA is also applied to the A tuples migrating from s1 to s2; sliced join s2 holds A[w1, w2] and B[w1, w2], and its output is unioned into Q2's result.]
  • The states of the sliced joins in a chain are disjoint from each other → minimizes state memory usage
  • Selection can be pushed down into the middle of the join chain → avoids unnecessary resource waste (see the wiring sketch after this slide)
  • No routing step is needed → avoids the per output-tuple routing cost completely
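To make the fine-grained selection push-down concrete, here is a small wiring sketch for the Q1/Q2 example. All names, the threshold value, and the helper structure are hypothetical; the only part carried over from the slide is the data flow: slice 1 ([0, w1]) serves both queries, σA filters both the results that slice 1 contributes to Q2 and the A tuples migrating into slice 2 ([w1, w2]), and slice 2's output goes to Q2 only. Tuples are assumed to be (timestamp, payload) pairs as in the earlier sketch.

    THRESHOLD = 100                          # Q2's selection constant (assumed)

    def sigma_a(a_payload):
        """Q2's selection on stream A (predicate shape is assumed)."""
        return a_payload["value"] > THRESHOLD

    def route_slice1_output(joined_pairs):
        """Slice 1 output feeds Q1 directly; Q2 keeps only the sigma_A pairs."""
        q1_part = list(joined_pairs)
        q2_part = [(a, b) for a, b in joined_pairs if sigma_a(a[1])]
        return q1_part, q2_part

    def migrate_a_to_slice2(purged_from_slice1):
        """Only sigma_A-passing A tuples enter slice 2's state, so the extra
        [w1, w2] state stays small and no router is ever needed."""
        return [a for a in purged_from_slice1 if sigma_a(a[1])]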

  14. Summary: State-Sliced Join Chain
  • Pros:
  • Minimized Memory Usage
  • Reduced Routing Cost
  • No Need for Operator Synchronization in the Chain
  • Cons:
  • Stream traffic between the pipelined joins
  • Purge cost

  15. Sharing via Chains: Memory-Optimal Chain
  • No selection: [Diagram: one sliced join si per distinct window, covering [0, w1], [w1, w2], [w2, w3], ..., [wN-1, wN]; each query Qi takes the union of the outputs of slices s1 through si.]
  • With selection: [Diagram: the same chain of slices; each query's selection σi is applied to its unioned output, and pushed-down selections σ'i are applied to the tuples migrating between slices.]
  (The slice boundaries are simply the sorted distinct query windows; a small sketch follows after this slide.)
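A tiny sketch of how the memory-optimal chain's slice boundaries follow from the distinct query windows: sort them and cut one slice between each pair of adjacent values. The window values are illustrative.

    def mem_optimal_slices(query_windows):
        """[w1, ..., wN] -> [(0, w1), (w1, w2), ..., (w_{N-1}, w_N)]."""
        edges = [0] + sorted(set(query_windows))
        return list(zip(edges, edges[1:]))

    print(mem_optimal_slices([600, 60, 300]))   # [(0, 60), (60, 300), (300, 600)]
    # Query Qi with window wi reads the union of the outputs of all slices
    # up to and including the one ending at wi.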

  16. Mem-Optimal Chain vs. CPU-Optimal Chain?
  [Diagram: the memory-optimal chain for five queries uses five sliced joins s1 ... s5 with slices [0, w1] through [w4, w5], each followed by a union feeding its query.]
  • Overheads of the memory-optimal chain:
  • Too many operators may increase the system's context-switch cost
  • Too many sliced states increase the purging cost

  17. Merging Sliced Joins
  [Diagram: two adjacent sliced joins si [wi-1, wi] and sj [wj-1, wj] are merged into a single slice [wi-1, wj]; a router then separates the merged output by |Ta - Tb| (< wi vs. ≥ wj-1) to serve queries Qi and Qj.]
  • Tradeoff:
  • Gain from merging:
  • Reduces the number of join operators
  • Reduces the extra purging cost
  • Loss from merging:
  • Introduces a routing cost
  • Increases memory usage due to selection pull-up
  • A cost model for CPU usage quantifies this tradeoff

  18. CPU-Opt. Chain: Search Space & Solution
  [Diagram: an example CPU-optimal chain with three merged slices s1 [0, w2], s2 [w2, w3], s3 [w3, w5]; routers on the merged slices split their outputs by |Ta - Tb| < w1 and |Ta - Tb| < w4 so that all five queries Q1 ... Q5 are served.]
  • Legend:
  • vi: a window start/end time (vertices v0 ... v5 correspond to w0 ... w5)
  • vi to vj: one slice window [wi, wj]
  • Finding the cheapest chain is a shortest-path problem over these vertices (a dynamic-programming sketch follows after this slide)
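A sketch of the shortest-path formulation: treat the sorted window boundaries v0 ... vN as vertices, let an edge (vi, vj) stand for one merged slice covering [wi, wj], and weight it with that slice's CPU cost. Because edges only go forward, a simple dynamic program finds the cheapest chain. The cost function below is a stand-in; the paper's actual cost model accounts for probing, purging, routing, and selection placement.

    def cheapest_chain(boundaries, slice_cost):
        """boundaries = [0, w1, ..., wN] sorted; returns (total cost, slices)."""
        n = len(boundaries)
        best = [float("inf")] * n   # best[j] = min cost of a chain covering [0, wj]
        prev = [None] * n
        best[0] = 0.0
        for j in range(1, n):       # DAG shortest path: edges only go left-to-right
            for i in range(j):
                c = best[i] + slice_cost(boundaries[i], boundaries[j])
                if c < best[j]:
                    best[j], prev[j] = c, i
        slices, j = [], n - 1       # recover the chosen slice boundaries
        while j > 0:
            slices.append((boundaries[prev[j]], boundaries[j]))
            j = prev[j]
        return best[-1], list(reversed(slices))

    # Stand-in cost: a fixed per-operator overhead plus a width/routing term.
    toy_cost = lambda lo, hi: 1.0 + 0.01 * (hi - lo)
    print(cheapest_chain([0, 60, 120, 300, 600], toy_cost))

With this particular toy cost the per-operator overhead dominates, so the search merges everything into one slice; a cost model that also charges routing work and pulled-up-selection memory would keep more slices.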

  19. Summary: Mem-Opt. vs. CPU-Opt. Join Chain
  [Diagram: a spectrum of sharing strategies; starting from selection pull-up sharing, state slicing leads to the Mem-Opt. chain, and state merging leads from it to the CPU-Opt. chain.]
  • Mem-Optimal:
  • Minimized Memory Usage
  • Higher System Overhead
  • Higher Purging Cost
  • CPU-Optimal:
  • Minimized CPU Usage
  • More Memory Usage if Selection is Pulled Up to Merge Slices

  20. Experimental Platform: the WPI Stream Engine CAPE (Software Demonstration, VLDB '04)

  21. Experimental Study 1: Memory Consumption

  22. Experimental Study 2: Total Service Rate

  23. Experimental Study 3: Mem-Opt. vs. CPU-Opt.
  [Charts: the window distributions used for 12 queries, and results for the Small-Large distribution with 12 queries and with 24 queries.]

  24. Conclusion
  • Pipelined state-sliced join chain
  • Mem-Optimal chain construction
  • CPU-Optimal chain construction
  • Implemented in CAPE
  • Performance evaluation

  25. Thank You! Visit the CAPE homepage: http://davis.wpi.edu/dsrg/CAPE/index.html Supported by: CRI grant CNS 05-51584
