
Adaptive Ordering of Pipelined Stream Filters


Presentation Transcript


  1. Adaptive Ordering of Pipelined Stream Filters • Babu, Motwani, Munagala, Nishizawa, and Widom • SIGMOD 2004, June 13-18, 2004 • Presented by Joshua Lee and Mingzhu Wei • Some slides are adapted from the authors' SIGMOD 2004 presentation slides

  2. Presentation Overview • Motivation • A-GREEDY Algorithm • INDEPENDENT Algorithm • SWEEP Algorithm • LOCALSWAPS Algorithm • Adaptive Ordering of Multiway Joins Over Streams • Conclusion

  3. Motivation • Many queries on stream data involve processing sets of commutative filters • Each filter is a stateless predicate that indicates whether a tuple should be processed further • A tuple is output only if it satisfies all filters • Applying the filters in different orders can affect the processing time of the query • The changing nature of stream data requires adaptive algorithms for filter ordering

  4. Motivation: Example • Consider the following example usage of a stream processing system: • A network traffic monitoring application built on a stream engine is used to monitor the amount of common traffic flowing through four routers, R1, R2, R3, and R4, over the last ten minutes.

  5. Motivation: Filter Ordering • If the filters are dependent, then the order of probing affects processing time • For instance, it could be known that most traffic through R1 comes from R3 or R4, but rarely from R2 • For an incoming tuple from R1, probing W2 (the window over R2's traffic) first would drop the tuple quickest, saving the time of probing W3 and W4

  6. Motivation: Filter Ordering Challenges • Filter selectivities may be correlated • The number of candidate filter orderings for N filters is N!, which grows quickly • Attributes of stream data and arrival characteristics can change over time

  7. Motivation: Adaptive Filter Ordering Challenges • Run-time overhead is incurred by the continuous monitoring of statistics • Convergence properties are needed to prove that an adaptive algorithm converges to (an approximation of) the correct ordering • Speed of adaptivity matters in the face of changing data characteristics • There is a three-way tradeoff: algorithms that adapt quickly and have good convergence properties incur high run-time overhead

  8. A-GREEDY Algorithm • The expected cost of processing one tuple under a filter ordering F1, F2, …, Fn is cost = Σi=1..n ti · Πj=1..i−1 (1 − d(j|j−1)) • ti is the cost of processing one tuple in filter i • d(j|j−1) is the conditional probability that filter j will drop a tuple, given that no filter from 1 to j−1 dropped it
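
As a quick illustration of this cost model, the sketch below computes the expected per-tuple cost of an ordering from the per-filter costs and conditional drop probabilities. It is a minimal sketch; the function and variable names are ours, not from the paper.

```python
def expected_cost(costs, cond_drop):
    """Expected per-tuple cost of a filter ordering.

    costs[i]     : t_i, time to apply the filter at position i+1
    cond_drop[i] : d(i+1|i), probability that the filter at position i+1 drops a
                   tuple, given that the filters at positions 1..i did not drop it
    """
    total = 0.0
    survive = 1.0                 # probability the tuple reaches the current filter
    for t_i, d_i in zip(costs, cond_drop):
        total += survive * t_i    # every surviving tuple pays t_i
        survive *= (1.0 - d_i)    # fraction that also passes this filter
    return total

# Example: three unit-cost filters; dropping tuples earlier lowers the expected cost.
print(expected_cost([1, 1, 1], [0.1, 0.5, 0.9]))  # 1.0 + 0.9 + 0.45 = 2.35
print(expected_cost([1, 1, 1], [0.9, 0.5, 0.1]))  # 1.0 + 0.1 + 0.05 = 1.15
```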

  9. A-GREEDY Algorithm: Static Version and Greedy Invariant • Static greedy algorithm • Choose the filter with the highest unconditional drop probability d(i|0) as the first filter • Choose the filter with the highest conditional drop probability d(j|1) as the second filter • Choose the filter with the highest conditional drop probability d(k|2) as the third filter • And so on • The Greedy Invariant (GI) holds when a filter ordering satisfies the following: for every position i and every later position j > i, d(i|i−1) ≥ d(j|i−1)
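
A minimal sketch of this static greedy ordering, assuming the conditional drop probabilities are estimated by running every candidate filter over a fixed sample of tuples (the sample-based estimation and all names are illustrative assumptions; the adaptive algorithm instead estimates them from the profile described on the next slides):

```python
def greedy_order(filters, sample):
    """Static greedy ordering: repeatedly pick the filter with the highest
    conditional drop probability among the tuples not yet dropped.

    filters : list of predicates; a filter "drops" a tuple when it returns False
    sample  : list of tuples used to estimate the drop probabilities
    """
    remaining = list(range(len(filters)))
    surviving = list(sample)        # tuples not dropped by the filters chosen so far
    order = []
    while remaining:
        def drop_rate(i):           # estimate of d(i | filters chosen so far)
            if not surviving:
                return 0.0
            return sum(1 for e in surviving if not filters[i](e)) / len(surviving)
        best = max(remaining, key=drop_rate)
        order.append(best)
        remaining.remove(best)
        surviving = [e for e in surviving if filters[best](e)]
    return order

# Example: correlated filters over the integers 0..99.
sample = list(range(100))
filters = [lambda x: x % 2 == 0, lambda x: x % 4 == 0, lambda x: x < 90]
print(greedy_order(filters, sample))  # [1, 2, 0]: most selective filter first
```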

  10. A-GREEDY Algorithm: Profiler • Two logical components of A-Greedy: • Profiler: continuously collects and maintains statistics about filter selectivities and processing costs • Reoptimizer: detects and corrects violations of the GI in the current filter ordering • Challenge faced by the profiler: there are n·2^(n−1) conditional selectivities for n filters • Solution: maintain a profile of recently dropped tuples

  11. A-GREEDY Algorithm: Profiler • Profile: a sliding window of profile tuples • A profile tuple contains n boolean attributes b1, …, bn corresponding to the n filters • Dropped tuples are sampled with some probability p • If a tuple e is chosen for profiling, it is also tested by all remaining filters • The resulting profile tuple has bi = 1 if Fi drops e, and bi = 0 otherwise
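
A minimal sketch of profile-tuple creation under these rules, assuming a deque-based sliding window and a per-tuple random sampling decision (the helper name, window size, and sampling details are illustrative assumptions):

```python
import random
from collections import deque

def maybe_profile(e, filters, drop_pos, window, p=0.1, max_window=1000):
    """When tuple e is dropped by the filter at position drop_pos, sample it with
    probability p and build a profile tuple b1..bn recording which filters drop e."""
    if random.random() >= p:
        return None
    flags = [0] * len(filters)          # filters before drop_pos already passed e
    flags[drop_pos] = 1                 # the filter that dropped e
    for i in range(drop_pos + 1, len(filters)):
        flags[i] = 0 if filters[i](e) else 1   # the extra evaluations for profiling
    window.append(flags)
    if len(window) > max_window:
        window.popleft()                # keep only a sliding window of profile tuples
    return flags

# Toy usage: tuple 51 is dropped by the first filter (51 < 50 is false).
window = deque()
filters = [lambda x: x < 50, lambda x: x % 2 == 0]
maybe_profile(51, filters, drop_pos=0, window=window, p=1.0)
print(list(window))   # [[1, 1]]: both filters drop 51
```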

  12. A-GREEDY Algorithm: Reoptimizer • The reoptimizer maintains an ordering O such that O satisfies the GI • How does the reoptimizer use the profile to derive estimates of conditional selectivities? • It incrementally maintains a view over the profile window

  13. A-GREEDY Algorithm: Reoptimizer • The view over the profile window is an n×n matrix V • V[i, j]: the number of tuples in the profile window that were dropped by Ff(j) but not dropped by Ff(1), Ff(2), …, Ff(i−1) • V[i, j] is proportional to d(j|i−1)
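
A minimal sketch of this view and of GI-violation detection on top of it, using 0-based positions and a full rebuild of V for clarity (A-Greedy maintains the view incrementally; the helper names are ours):

```python
def build_view(window, n):
    """V[i][j]: number of profile tuples dropped by the filter at position j
    but not by the filters at positions 0..i-1 (0-based positions)."""
    V = [[0] * n for _ in range(n)]
    for flags in window:
        for i in range(n):
            if any(flags[k] for k in range(i)):
                break                         # dropped within the first i filters
            for j in range(i, n):
                if flags[j]:
                    V[i][j] += 1              # proportional to d(j+1 | i)
    return V

def first_gi_violation(V, n):
    """Return (i, j) if the filter at position j appears more selective than the
    filter at position i for tuples reaching position i, else None."""
    for i in range(n):
        for j in range(i + 1, n):
            if V[i][j] > V[i][i]:
                return (i, j)
    return None

# Example: the filter at position 1 drops more of the tuples reaching position 0.
window = [[0, 1], [0, 1], [1, 1]]
V = build_view(window, 2)
print(V, first_gi_violation(V, 2))   # [[1, 3], [0, 2]] (0, 1)
```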

  14. A-GREEDY Algorithm: Run-time Overhead and Adaptivity • Profile-tuple creation: needs n−i additional filter evaluations for a tuple dropped by Ff(i) • Profile-window maintenance: insertion and deletion of profile tuples, and averages of filter processing times • Matrix-view update: every update can touch up to n²/4 entries • Detection and correction of GI violations • Good convergence properties • Fast adaptivity

  15. Variants of A-Greedy • Can we sacrifice some of A-Greedy’s convergence properties or adaptivity speed to reduce its run-time overhead? • Three variants of A-Greedy are proposed

  16. SWEEP Algorithm • Proceeds in stages • During one stage, it only checks for GI violations involving the filter at one specific position j • Does not need to maintain the entire matrix view: only bf(1), …, bf(j) • For each profiled tuple, it needs to additionally evaluate Ff(j) only • By rotating j over 2, …, n, it eventually detects and corrects all GI violations • Same convergence properties as A-Greedy • Reduced view size and fewer additional evaluations mean less overhead • Slower adaptivity: only one filter position is checked in each stage
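
As an illustration, a hypothetical sketch of one SWEEP stage (not the paper's implementation): checking position j only needs the profile bits for positions up to j.

```python
def sweep_stage(window, j):
    """One SWEEP stage: check whether the filter at position j (0-based) violates
    the GI with respect to some earlier position i, using only bits 0..j of each
    profile tuple."""
    for i in range(j):
        # profile tuples that survive the first i filters of the current ordering
        reach_i = [b for b in window if not any(b[k] for k in range(i))]
        drops_i = sum(b[i] for b in reach_i)   # proportional to d(i+1 | i)
        drops_j = sum(b[j] for b in reach_i)   # proportional to d(j+1 | i)
        if drops_j > drops_i:
            return i        # violation: the filter at position j belongs at position i
    return None

print(sweep_stage([[0, 1], [0, 1], [1, 1]], 1))   # 0: position 1 violates the GI
```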

  17. INDEPENDENT Algorithm • Assumes filters are independent, so only estimates of unconditional selectivities need to be maintained • Lower view-maintenance overhead • Fast adaptivity • Convergence: if the independence assumption holds, the ordering is optimal; otherwise, it can be O(n) times worse than GI orderings
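
Under the independence assumption the problem reduces to a sort; a minimal sketch, using the per-filter costs ti from slide 8 and illustrative names:

```python
def independent_order(drop_prob, cost):
    """Order filters assuming independent selectivities: sort by unconditional
    drop probability per unit processing time, highest ratio first."""
    n = len(drop_prob)
    return sorted(range(n), key=lambda i: drop_prob[i] / cost[i], reverse=True)

# With unit costs this is simply "most selective filter first".
print(independent_order([0.2, 0.8, 0.5], [1.0, 1.0, 1.0]))   # [1, 2, 0]
```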

  18. LOCALSWAPS Algorithm • Monitors “local” violations only, i.e., violations involving adjacent filters • LocalSwaps detects situations where swapping two adjacent filters would reduce cost • Only needs to maintain two diagonals of the view • For each profiled tuple dropped by Ff(i), it only needs to additionally evaluate Ff(i+1)
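
A minimal sketch of the adjacent-pair check, reusing the profile-window representation from the earlier sketches (0-based positions; names are ours):

```python
def local_swap_check(window, n):
    """Find adjacent positions (i, i+1) where swapping would help: the filter at
    position i+1 drops a larger share of the tuples reaching position i than the
    filter at position i does (a local GI violation)."""
    swaps = []
    for i in range(n - 1):
        reach_i = [b for b in window if not any(b[k] for k in range(i))]
        drops_i  = sum(b[i] for b in reach_i)       # ~ V[i][i]
        drops_i1 = sum(b[i + 1] for b in reach_i)   # ~ V[i][i+1], the other diagonal
        if drops_i1 > drops_i:
            swaps.append(i)
    return swaps

print(local_swap_check([[0, 1], [0, 1], [1, 1]], 2))   # [0]: swap positions 0 and 1
```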

  19. Adaptive Ordering of Multiway Joins Over Streams • MJoins maintain an ordering of the windows of {S0, S1, …, Sn−1} − {Si} for each stream Si • A new tuple arriving on Si is joined with the other stream windows in that order • Two-phase join algorithm • Drop-probing phase: the new tuple is used to probe the other windows in the specified order; if any window drops it, no further processing is needed for it • Output-generation phase: if no window drops the tuple, processing proceeds as in a conventional MJoin • Drop probing resembles pipelined filters • A-Greedy and its variants can be used to determine the orderings
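
A minimal sketch of the two phases for one newly arrived tuple, assuming each other stream's window exposes a probe(tuple) method that returns matching tuples (the interfaces and names are illustrative, not the paper's API):

```python
def process_new_tuple(e, ordered_windows):
    """Two-phase MJoin processing of a tuple e that just arrived on one stream.
    ordered_windows: the other streams' windows, in the drop-probing order
    (this order is what A-Greedy and its variants adapt)."""
    # Phase 1: drop probing. Stop as soon as some window has no match for e.
    for w in ordered_windows:
        if not w.probe(e):
            return []              # e is dropped; no output-generation work is needed
    # Phase 2: output generation, as in a conventional MJoin: combine e with the
    # matching tuples from every window.
    results = [(e,)]
    for w in ordered_windows:
        results = [r + (m,) for r in results for m in w.probe(e)]
    return results

class ListWindow:
    """Toy window: stores (key, value) pairs and matches on key equality."""
    def __init__(self, rows):
        self.rows = rows
    def probe(self, t):
        return [r for r in self.rows if r[0] == t[0]]

w1 = ListWindow([("a", 1), ("b", 2)])
w2 = ListWindow([("a", 10)])
print(process_new_tuple(("a", 99), [w2, w1]))  # [(('a', 99), ('a', 10), ('a', 1))]
```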

  20. Conclusion • A-Greedy has good convergence properties and fast adaptivity, but incurs significant run-time overhead • Three variants of A-Greedy are proposed, each lying at a different point along the tradeoff spectrum among convergence properties, run-time overhead, and speed of adaptivity
