Approximate Counts and Quantiles over Sliding Windows. Arvind Arasu, Gurmeet Singh Manku, Stanford University. PODS
Sliding Window Model • Stream (time →): 2 15 21 6 35 11 20 4 25 1 16 13 8 52 27 19 1 • Window sum: SUM = 66 • After the window slides: SUM = 59
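The two sums on these slides are consistent with a five-element window advancing by one position. A minimal sketch, assuming a window size of 5 (the slides do not state it here) and a hypothetical helper name `sliding_sums`, keeps a running sum by adding the arriving element and subtracting the expiring one:

```python
from collections import deque

def sliding_sums(stream, size):
    """Yield the sum of the last `size` elements once the window is full."""
    window = deque()
    total = 0
    for x in stream:
        window.append(x)
        total += x
        if len(window) > size:
            total -= window.popleft()  # oldest element expires
        if len(window) == size:
            yield total

stream = [2, 15, 21, 6, 35, 11, 20, 4, 25, 1, 16, 13, 8, 52, 27, 19, 1]
sums = list(sliding_sums(stream, 5))  # window size 5 is an assumption
```

The window 20 4 25 1 16 sums to 66 and its successor 4 25 1 16 13 sums to 59. The sketch stores the entire window, which is the cost the rest of the talk tries to avoid.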
Statistics over Sliding Windows • Easy if we store entire window • Storing entire window expensive • Space: a “last 1 hour” window @ 1000 elements/sec holds 3.6 million elements • Focus of much previous work: Compute approximate statistics using limited space
Contributions • Algorithms for computing approximate quantiles and approximate frequency counts over sliding windows • Space requirement: (1/є) · poly-log(1/є, N) • є = error parameter • N = size of the window • Logarithmic in window size (N) • (Almost) linear in 1/є
Contributions over Previous Work • Frequency counts: First known algorithm for sliding window model • Quantiles: Improves over [LLXY ’04] • [LLXY ’04] space: (1/є)² · poly-log(1/є, N) • Quadratic in 1/є
Rest of the Talk • Formal problem specification: sliding windows; (approximate) frequency counts • Our algorithms: fixed-size sliding windows; variable-size sliding windows • Note: frequency counts only; for quantiles, see the paper
Sliding Windows • Two abstract window models • Fixed-size sliding windows: row-based windows • Variable-size sliding windows: time-based windows, shared windows
Fixed-Size Sliding Windows • Stream (time →): 2 15 21 6 35 11 20 4 25 1 16 13 8 52 27 19 1 • Window size (N) = 5 in every frame
Variable-Size Sliding Windows • Stream (time →): 2 15 21 6 35 11 20 4 25 1 16 13 8 52 27 19 1 • Window size (N) over successive frames: 5, 6, 7, 6, 5, 4, 3
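The frame sequence N = 5, 6, 7, 6, 5, 4, 3 matches a window that grows by one when an element arrives and shrinks by one when the oldest element expires. A minimal sketch under that reading; the class name `VariableWindow`, its methods, and the initial contents are hypothetical:

```python
from collections import deque

class VariableWindow:
    """Window whose size changes with each insert or expiry."""

    def __init__(self, items=()):
        self.items = deque(items)

    def insert(self, x):
        self.items.append(x)          # newest element arrives: size + 1

    def expire(self):
        return self.items.popleft()   # oldest element leaves: size - 1

    def __len__(self):
        return len(self.items)

w = VariableWindow([11, 20, 4, 25, 1])   # arbitrary starting contents
sizes = [len(w)]
for op in ["insert", "insert", "expire", "expire", "expire", "expire"]:
    if op == "insert":
        w.insert(0)                   # placeholder value; only the size matters
    else:
        w.expire()
    sizes.append(len(w))
```

Two arrivals followed by four expiries reproduce the sizes shown on the slides.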
Frequency Counts
Multiset: 1 5 3 1 2 1 3 7 1 4 3 1 5 3 3 1 3 9 7 1
Query: SELECT Element, COUNT(*) FROM Multiset GROUP BY Element
Element | Count
1 | 7
3 | 6
5 | 2
7 | 2
9 | 1
2 | 1
4 | 1
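The GROUP BY query on this slide computes exact frequency counts. As a small illustration in Python (not part of the talk), the standard-library `Counter` does the same thing for the slide's multiset:

```python
from collections import Counter

multiset = [1, 5, 3, 1, 2, 1, 3, 7, 1, 4, 3, 1, 5, 3, 3, 1, 3, 9, 7, 1]
counts = Counter(multiset)   # element -> exact count

for element, count in counts.most_common():
    print(element, count)
```

Exact counting needs one counter per distinct element, which is what the approximate version relaxes.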
Approximate Frequency Counts • Output: elements and their approximate counts • Approximate count: True Count − єM < Approximate Count ≤ True Count • Error parameter: є • Size of input: M • Only elements with Approximate Count > 0 are output • References: [MG ’82, DLM ’02, MM ’02, KSP ’03]
Approximate Frequency Counts
Input: 1 5 3 1 2 1 3 7 1 4 3 1 5 3 3 1 3 9 7 1
Input size: M = 20 • Error parameter: є = 0.25 • Absolute error: єM = 5
First frame:
Element | Approx. Count | True Count | Error
1 | 3 | 7 | 4
3 | 4 | 6 | 2
7 | 1 | 2 | 1
9 | 1 | 1 | 0
5 | 0 | 2 | 2
2 | 0 | 1 | 1
4 | 0 | 1 | 1
Second frame (same input, a different valid output):
Element | Approx. Count | True Count | Error
1 | 3 | 7 | 4
3 | 4 | 6 | 2
7 | 0 | 2 | 2
9 | 0 | 1 | 1
5 | 0 | 2 | 2
2 | 0 | 1 | 1
4 | 0 | 1 | 1
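One way to produce such counts in a single pass is the counter-based scheme usually credited to Misra and Gries, which the talk cites as [MG ’82]. A minimal sketch, assuming ⌈1/є⌉ counters; the function name `approx_counts` is hypothetical, and the output need not match either table in the slide's example:

```python
import math
from collections import Counter

def approx_counts(stream, eps):
    """One-pass undercounts satisfying true - eps*M < approx <= true."""
    k = math.ceil(1 / eps)            # number of counters kept (assumption)
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k:
            counters[x] = 1
        else:
            # No free counter: decrement every counter, drop those at zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

stream = [1, 5, 3, 1, 2, 1, 3, 7, 1, 4, 3, 1, 5, 3, 3, 1, 3, 9, 7, 1]
eps = 0.25
approx = approx_counts(stream, eps)
true = Counter(stream)
within_bound = all(
    true[e] - eps * len(stream) < approx.get(e, 0) <= true[e] for e in true
)
```

Each decrement step discards k + 1 occurrences, so there are fewer than єM such steps, and no count is underestimated by єM or more.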
Approximate Frequency Counts • All elements with frequency ≥ єM appear in the output. • There exists an output with ≤ 1/є elements. • Theorem: An approximate frequency count of size O(1/є) can be produced in one pass over the input using O(1/є) space. • References: [MG ’82, DLM ’02, KSP ’03]
Fixed-Size Sliding Windows • Window size: N • Error parameter: є • Absolute error: єN
Overview [figure, nine animation frames: window of size N; no other text recoverable]
Details [figure: window of size N divided into blocks at log(1/є) levels; recoverable labels: є_0, є_1, є_2, and єN/4 = O(єN)]
Error Invariant • Absolute error of all blocks identical: є_i N_i = єN / log(1/є) • є_i = error parameter for block i • N_i = number of elements in block i
Merge Operation [figure: window of size N]
Merge Operation • Add approximate counts of elements. • Absolute error adds up.
Block | True count | Approx. count | Guarantee
Block 1 | f_1 | f̃_1 | f_1 − є_1 N_1 < f̃_1 ≤ f_1
Block 2 | f_2 | f̃_2 | f_2 − є_2 N_2 < f̃_2 ≤ f_2
Block 1 + Block 2 | f_1 + f_2 | f̃_1 + f̃_2 | (f_1 + f_2) − (є_1 N_1 + є_2 N_2) < f̃_1 + f̃_2 ≤ f_1 + f_2
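The merge step can be illustrated with two small blocks. This is a sketch only: the function name `merge`, the block contents, and the hand-picked summaries are assumptions chosen so that each block satisfies its own guarantee with є_1 = є_2 = 0.4.

```python
from collections import Counter

def merge(approx1, approx2):
    """Merge two block summaries by adding per-element approximate counts."""
    return dict(Counter(approx1) + Counter(approx2))

# Hypothetical blocks and summaries (each undercounts by less than eps_i * N_i = 2).
block1, approx1, eps1 = [1, 1, 2, 1, 3], {1: 2}, 0.4
block2, approx2, eps2 = [1, 3, 3, 2, 3], {3: 2}, 0.4

merged = merge(approx1, approx2)
true_total = Counter(block1 + block2)
error_bound = eps1 * len(block1) + eps2 * len(block2)   # absolute errors add up

merged_ok = all(
    true_total[e] - error_bound < merged.get(e, 0) <= true_total[e]
    for e in true_total
)
```

The merged summary still never overcounts, and its undercount is bounded by the sum of the two blocks' absolute errors.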
Error Analysis • Total absolute error over a window of size N: O(єN) + log(1/є) · O(єN / log(1/є)) + O(єN), which is O(єN)
Space Requirement [figure: window of size N divided into blocks at log(1/є) levels; recoverable labels: є_0, є_1, є_2, and єN/4]
Recall: Approximate Frequency Counts • Theorem: An approximate frequency count of size O(1/є) can be produced in one pass over the input using O(1/є) space. • References: [MG ’82, DLM ’02, KSP ’03]
Space Requirement • Space required for level-ℓ blocks: (size of approx. count) × (number of “active” blocks) = (1/є_ℓ) × (N/N_ℓ) = N / (єN / log(1/є)) = (1/є) log(1/є) • Total space: (1/є) log(1/є) × log(1/є) = (1/є) log²(1/є)
Fixed-Size Sliding Windows: Summary • Theorem: є-approximate frequency counts can be maintained over a fixed-size sliding window of size N using O((1/є) log²(1/є)) space.
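To get a feel for the scale of this bound, the sketch below compares the window size from the earlier “last 1 hour” @ 1000 elements/sec example with (1/є) log²(1/є). It assumes base-2 logarithms and ignores the constant hidden by the asymptotic bound, so the numbers indicate order of magnitude only; `space_bound` is a hypothetical helper.

```python
import math

def space_bound(eps):
    """(1/eps) * log2(1/eps)**2, with constants ignored (assumption)."""
    return (1 / eps) * math.log2(1 / eps) ** 2

window_size = 3600 * 1000   # one hour at 1000 elements/sec

for eps in (0.1, 0.01, 0.001):
    print(eps, round(space_bound(eps)), window_size)
```

The bound does not depend on N at all, which is why it stays far below the window size for these values of є.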
Variable-Size Windows • Error parameter: є • Variable window size: n • Variable absolute error: єn
Fixed-Size Window Algorithm? [figure: window of size N divided into blocks at log(1/є) levels; recoverable labels: є_0, є_1, є_2, and єN/4]
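Fixed-Size Window Algorithm? • Run the fixed-size algorithm F(є, N) while the actual window size is n ≤ N • Absolute error is єN, so the effective error parameter = єN / n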
Limited Variability • F(є/2, N) computes є-approximate frequency counts for window sizes N/2 ≤ n ≤ N.
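Why halving the error parameter suffices can be checked in one line. This worked step assumes that F(є′, N) guarantees absolute error at most є′N for any window size n ≤ N, which is how the slides appear to use it:

```latex
\[
\text{absolute error of } F(\epsilon/2,\, N) \;\le\; \frac{\epsilon}{2}\, N
\;\le\; \frac{\epsilon}{2}\,(2n) \;=\; \epsilon n
\qquad \text{whenever } \frac{N}{2} \le n \le N .
\]
```

For n below N/2 the same argument no longer gives єn, which is the sense in which the variability is limited.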