
State-Slice: New Paradigm of Multi-query Optimization of Window-based Stream Queries



Presentation Transcript


  1. State-Slice: New Paradigm of Multi-query Optimization of Window-based Stream Queries
  Samrat Ganguly, Sudeept Bhatnagar (NEC Laboratories America Inc., Princeton, NJ, USA)
  Song Wang, Elke Rundensteiner (Database Systems Research Group, Worcester Polytechnic Institute, Worcester, MA, USA)
  32nd VLDB Conference, Seoul, Korea, 2006

  2. Computation Sharing for Stream Processing
  [Diagram: users register continuous queries that form an SPJA query network of selections (σ), projections (Π), aggregates (Agg), and window operators (w1, w2, w3); streaming data enters the network and streaming results flow out.]
  • New challenges:
  • In-memory processing of stateful operators
  • Stateful operators with various window constraints

  3. Window Constraints for Stateful Operators
  [Diagram: a sliding-window join over streams A and B, each stream with an input buffer and a window state A[w], B[w].]
  • Time-based sliding window constraints:
  • Each tuple carries a timestamp
  • Only tuples whose timestamps lie within the window W of each other can form an output
  • Observations (a back-of-envelope sketch follows after this slide):
  • States in the operator dominate memory usage
  • State size is proportional to the input rate and the window length
  • Join CPU cost is proportional to the state size
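The proportionalities above can be made concrete with a back-of-envelope sketch. The stream rate and window lengths below are purely illustrative, not taken from the paper; the point is only that state size, and hence per-tuple probe cost, scales with rate times window length.

    # Back-of-envelope sketch with made-up numbers (not from the paper):
    # a time-based window state holds roughly input_rate * window_length
    # tuples, and each arriving tuple probes the opposite state.

    def state_size(rate_tuples_per_sec, window_sec):
        """Approximate number of tuples held in a time-based window state."""
        return rate_tuples_per_sec * window_sec

    rate_a = 1_000          # tuples/sec on stream A (assumed)
    w1, w2 = 60, 600        # window lengths in seconds (assumed)

    print(state_size(rate_a, w1))   # 60000 tuples kept for a w1 window state
    print(state_size(rate_a, w2))   # 600000 tuples kept for a w2 window state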

  4. Motivation Example
  [Diagram: two separate query plans; Q1 joins A[w1] with B[w1]; Q2 applies σA to stream A and joins A[w2] with B[w2].]
  Q1: SELECT A.* FROM Temperature A, Humidity B WHERE A.LocationId = B.LocationId WINDOW w1 min
  Q2: SELECT A.* FROM Temperature A, Humidity B WHERE A.LocationId = B.LocationId AND A.Value > Threshold WINDOW w2 min
  Let w1 < w2.
  • Observations:
  • State A[w1] overlaps with state A[w2]
  • State B[w1] overlaps with state B[w2]
  • The joined results of Q1 and Q2 overlap

  5. Sharing with Selection Pull-up [CDF02, HFA+03]
  [Diagram: both queries share a single join over A[w2] and B[w2]; a router splits the joined output by |Ta - Tb| < w1 (sent to Q1) versus all tuples (sent to Q2 through σA).]
  • Selection pull-up: σA is evaluated on the joined output rather than on stream A
  • The shared join uses the larger window (w2)
  • [CDF02]: J. Chen, D. J. DeWitt, and J. F. Naughton. Design and evaluation of alternative selection placement strategies in optimizing continuous queries. In ICDE '02.
  • [HFA+03]: M. A. Hammad, M. J. Franklin, W. G. Aref, and A. K. Elmagarmid. Scheduling for shared window joins over data streams. In VLDB '03.

  6. Sharing with Selection Pull-up [CDF02, HFA+03]
  • Pros:
  • Single Join Operator
  • Cons:
  • Wasted Computation without Early Filtering
  • Wasted State Memory without Early Filtering
  • Per Output-Tuple Routing Cost

  7. Stream Partition with Selection Pushdown [KFH04]
  [Diagram: a split operator partitions stream A by the predicate on A.Value (≤ Threshold vs. > Threshold); each partition feeds its own sliding-window join with stream B (states A[w1], B[w1] and A[w2], B[w2]); a router directs joined tuples by |Ta - Tb| < w1 versus all, and union operators assemble the results for Q1 and Q2.]
  • Split stream A by A.Value
  • Route and union the shared join results
  • [KFH04]: S. Krishnamurthy, M. J. Franklin, J. M. Hellerstein, and G. Jacobson. The case for precision sharing. In VLDB '04.

  8. Stream Partition with Selection Pushdown [KFH04]
  • Pros:
  • Selection pushdown: no wasted Join Computation
  • Cons:
  • Multiple Join Operators
  • Duplicated State Memory in the Multiple Join Operators
  • Per Output-Tuple Routing Cost

  9. State-Slice: New Sharing Paradigm
  • Key Ideas:
  • State-Slice Concept for the Sliding Window Join
  • Pipelined Chain of Join Slices
  • Prospective Benefits:
  • Fine-grained Selection Push-down
  • Pipelined Join Operators
  • Avoiding Per-tuple Routing Cost

  10. One-way State-Sliced Window Join
  [Diagram: a one-way sliced join keeps the state of stream A restricted to [w1, w2]; arriving B tuples probe this state and are then propagated downstream; A tuples that age past w2 are purged.]
  • Sliced sliding window with lower bound w1 and upper bound w2: [w1, w2]
  • A B tuple only probes A tuples that are at least w1, but at most w2, older than itself (see the sketch after this slide)
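The following is a minimal sketch of a single one-way sliced join over the slice [w_start, w_end]. It is not the paper's CAPE implementation: the tuple layout ((timestamp, payload) pairs), the join predicate, and the class and method names are assumptions made for illustration. A tuples are stored in the slice's state, B tuples probe it, and A tuples that age past the slice's upper bound are purged so a downstream slice can take them over.

    from collections import deque

    class OneWaySlicedJoin:
        """Sketch of one slice of a one-way state-sliced window join.

        Stores stream-A tuples whose age relative to the probing B tuple
        lies in [w_start, w_end]; stream-B tuples only probe. Tuples are
        (timestamp, payload) pairs arriving in timestamp order."""

        def __init__(self, w_start, w_end, join_pred):
            self.w_start, self.w_end = w_start, w_end
            self.join_pred = join_pred     # e.g. lambda a, b: a["loc"] == b["loc"]
            self.state_a = deque()         # A tuples currently held by this slice

        def insert_a(self, a_tuple):
            """Accept an A tuple (newly arrived, or purged from the previous slice)."""
            self.state_a.append(a_tuple)

        def probe_b(self, b_tuple):
            """Probe with a B tuple; return (matches, purged A tuples).

            Purged A tuples have aged past w_end and would be handed to the
            next slice in the chain; b_tuple itself is also propagated so it
            can probe that next slice."""
            ts_b, payload_b = b_tuple
            purged = []
            while self.state_a and ts_b - self.state_a[0][0] > self.w_end:
                purged.append(self.state_a.popleft())
            matches = [(a, b_tuple) for a in self.state_a
                       if self.w_start <= ts_b - a[0] <= self.w_end
                       and self.join_pred(a[1], payload_b)]
            return matches, purged

In a chain, the purged list of slice i is fed to insert_a of slice i+1 and the probing B tuple is forwarded along the queue; adjacent slices would treat one slice boundary as exclusive so the same pair is not produced twice.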

  11. The Chain of One-way State-Sliced Joins
  [Diagram: two sliced joins J1 (state of stream A: [0, w1]) and J2 (state of stream A: [w1, w2]) connected by queues; B tuples probe J1 and then J2; the union of the two sliced outputs equals the full-window joined result.]
  • Split the state memory into a chain of joins
  • No overlap of state memory across the chain of joins (a brute-force equivalence check follows after this slide)
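As a sanity check on the claim that the sliced outputs union back to the original result, here is a brute-force sketch over random data. The key-equality predicate, the data, and the window values are made up; the check shows that the one-way joins over the slices (0, w1] and (w1, w2] together produce exactly the pairs of the single full-window join over (0, w2].

    import random

    # Brute-force equivalence check (illustrative only, not the chain algorithm).
    random.seed(0)
    w1, w2 = 5, 20                                              # assumed slice bounds
    A = [(t, random.randint(0, 3)) for t in range(0, 100, 2)]   # (timestamp, key)
    B = [(t, random.randint(0, 3)) for t in range(1, 100, 2)]

    def one_way_join(lo, hi):
        """B tuples probe A tuples that are more than lo but at most hi older."""
        return [(a, b) for b in B for a in A
                if lo < b[0] - a[0] <= hi and a[1] == b[1]]

    full   = one_way_join(0, w2)
    sliced = one_way_join(0, w1) + one_way_join(w1, w2)         # no overlap, no gap
    assert sorted(full) == sorted(sliced)
    print(len(full), "joined pairs from either plan")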

  12. From One-way to Two-way Binary Join
  [Diagram: a chain of two-way sliced joins J1 and J2, each holding sliced states of both streams (A: [0, w1] and [w1, w2]; B: [0, w1] and [w1, w2]), connected by queues; a union operator merges the sliced outputs into the joined result.]
  • Intuitively a combination of two one-way joins
  • Each A or B tuple carries two references: a "male" and a "female" copy
  • Male tuples are used to probe the states
  • Female tuples are inserted into, and cross-purged from, their respective states

  13. State-Sliced Join Chain: The Example
  [Diagram: the shared chain for Q1 and Q2; sliced join s1 holds A[0, w1] and B[0, w1], and its output feeds Q1 directly and Q2 after σA; σA is also applied to the A tuples migrating from s1 to s2; sliced join s2 holds A[w1, w2] and B[w1, w2], and its output is unioned into Q2's result.]
  • The states of the sliced joins in a chain are disjoint from each other → minimizes state memory usage
  • Selection can be pushed down into the middle of the join chain → avoids unnecessary resource waste (see the wiring sketch after this slide)
  • No routing step is needed → avoids the per output-tuple routing cost completely
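To make the fine-grained selection push-down concrete, here is a small wiring sketch for the Q1/Q2 example. All names, the threshold value, and the helper structure are hypothetical; the only part carried over from the slide is the data flow: slice 1 ([0, w1]) serves both queries, σA filters both the results that slice 1 contributes to Q2 and the A tuples migrating into slice 2 ([w1, w2]), and slice 2's output goes to Q2 only. Tuples are assumed to be (timestamp, payload) pairs as in the earlier sketch.

    THRESHOLD = 100                          # Q2's selection constant (assumed)

    def sigma_a(a_payload):
        """Q2's selection on stream A (predicate shape is assumed)."""
        return a_payload["value"] > THRESHOLD

    def route_slice1_output(joined_pairs):
        """Slice 1 output feeds Q1 directly; Q2 keeps only the sigma_A pairs."""
        q1_part = list(joined_pairs)
        q2_part = [(a, b) for a, b in joined_pairs if sigma_a(a[1])]
        return q1_part, q2_part

    def migrate_a_to_slice2(purged_from_slice1):
        """Only sigma_A-passing A tuples enter slice 2's state, so the extra
        [w1, w2] state stays small and no router is ever needed."""
        return [a for a in purged_from_slice1 if sigma_a(a[1])]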

  14. Summary: State-Sliced Join Chain
  • Pros:
  • Minimized Memory Usage
  • Reduced Routing Cost
  • No Need for Operator Synchronization in the Chain
  • Cons:
  • Stream traffic between the pipelined joins
  • Purge cost

  15. Sharing via Chains: Memory-Optimal Chain
  • No selection: [Diagram: one sliced join si per distinct window, covering [0, w1], [w1, w2], [w2, w3], ..., [wN-1, wN]; each query Qi takes the union of the outputs of slices s1 through si.]
  • With selection: [Diagram: the same chain of slices; each query's selection σi is applied to its unioned output, and pushed-down selections σ'i are applied to the tuples migrating between slices.]
  (The slice boundaries are simply the sorted distinct query windows; a small sketch follows after this slide.)
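A tiny sketch of how the memory-optimal chain's slice boundaries follow from the distinct query windows: sort them and cut one slice between each pair of adjacent values. The window values are illustrative.

    def mem_optimal_slices(query_windows):
        """[w1, ..., wN] -> [(0, w1), (w1, w2), ..., (w_{N-1}, w_N)]."""
        edges = [0] + sorted(set(query_windows))
        return list(zip(edges, edges[1:]))

    print(mem_optimal_slices([600, 60, 300]))   # [(0, 60), (60, 300), (300, 600)]
    # Query Qi with window wi reads the union of the outputs of all slices
    # up to and including the one ending at wi.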

  16. Mem-Optimal Chain vs. CPU-Optimal Chain?
  [Diagram: the memory-optimal chain for five queries uses five sliced joins s1 ... s5 with slices [0, w1] through [w4, w5], each followed by a union feeding its query.]
  • Overheads of the memory-optimal chain:
  • Too many operators may increase the system's context-switch cost
  • Too many sliced states increase the purging cost

  17. Merging Sliced Joins
  [Diagram: two adjacent sliced joins si [wi-1, wi] and sj [wj-1, wj] are merged into a single slice [wi-1, wj]; a router then separates the merged output by |Ta - Tb| (< wi vs. ≥ wj-1) to serve queries Qi and Qj.]
  • Tradeoff:
  • Gain from merging:
  • Reduces the number of join operators
  • Reduces the extra purging cost
  • Loss from merging:
  • Introduces a routing cost
  • Increases memory usage due to selection pull-up
  • A cost model for CPU usage quantifies this tradeoff

  18. CPU-Opt. Chain: Search Space & Solution
  [Diagram: an example CPU-optimal chain with three merged slices s1 [0, w2], s2 [w2, w3], s3 [w3, w5]; routers on the merged slices split their outputs by |Ta - Tb| < w1 and |Ta - Tb| < w4 so that all five queries Q1 ... Q5 are served.]
  • Legend:
  • vi: a window start/end time (vertices v0 ... v5 correspond to w0 ... w5)
  • vi to vj: one slice window [wi, wj]
  • Finding the cheapest chain is a shortest-path problem over these vertices (a dynamic-programming sketch follows after this slide)
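A sketch of the shortest-path formulation: treat the sorted window boundaries v0 ... vN as vertices, let an edge (vi, vj) stand for one merged slice covering [wi, wj], and weight it with that slice's CPU cost. Because edges only go forward, a simple dynamic program finds the cheapest chain. The cost function below is a stand-in; the paper's actual cost model accounts for probing, purging, routing, and selection placement.

    def cheapest_chain(boundaries, slice_cost):
        """boundaries = [0, w1, ..., wN] sorted; returns (total cost, slices)."""
        n = len(boundaries)
        best = [float("inf")] * n   # best[j] = min cost of a chain covering [0, wj]
        prev = [None] * n
        best[0] = 0.0
        for j in range(1, n):       # DAG shortest path: edges only go left-to-right
            for i in range(j):
                c = best[i] + slice_cost(boundaries[i], boundaries[j])
                if c < best[j]:
                    best[j], prev[j] = c, i
        slices, j = [], n - 1       # recover the chosen slice boundaries
        while j > 0:
            slices.append((boundaries[prev[j]], boundaries[j]))
            j = prev[j]
        return best[-1], list(reversed(slices))

    # Stand-in cost: a fixed per-operator overhead plus a width/routing term.
    toy_cost = lambda lo, hi: 1.0 + 0.01 * (hi - lo)
    print(cheapest_chain([0, 60, 120, 300, 600], toy_cost))

With this particular toy cost the per-operator overhead dominates, so the search merges everything into one slice; a cost model that also charges routing work and pulled-up-selection memory would keep more slices.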

  19. Summary: Mem-Opt. vs. CPU-Opt. Join Chain
  [Diagram: a spectrum of sharing strategies; starting from selection pull-up sharing, state slicing leads to the Mem-Opt. chain, and state merging leads from it to the CPU-Opt. chain.]
  • Mem-Optimal:
  • Minimized Memory Usage
  • Higher System Overhead
  • Higher Purging Cost
  • CPU-Optimal:
  • Minimized CPU Usage
  • More Memory Usage if Selection is Pulled Up to Merge Slices

  20. Experimental Platform: the WPI Stream Engine CAPE (Software Demonstration, VLDB '04)

  21. Experimental Study 1: Memory Consumption

  22. Experimental Study 2: Total Service Rate

  23. Experimental Study 3: Mem-Opt. vs. CPU-Opt.
  [Charts: the window distributions used for 12 queries, and results for the Small-Large distribution with 12 queries and with 24 queries.]

  24. Conclusion
  • Pipelined state-sliced join chain
  • Mem-Optimal chain construction
  • CPU-Optimal chain construction
  • Implemented in CAPE
  • Performance evaluation

  25. Thank You! Visit the CAPE homepage: http://davis.wpi.edu/dsrg/CAPE/index.html Supported by: CRI grant CNS 05-51584
