
Approximate Counts and Quantiles over Sliding Windows

Arvind Arasu, Gurmeet Singh Manku, Stanford University.


Presentation Transcript


  1. Approximate Counts and Quantiles over Sliding Windows. Arvind Arasu, Gurmeet Singh Manku, Stanford University. PODS

  2-3. Sliding Window Model
     Stream (time →): 2 15 21 6 35 11 20 4 25 1 16 13 8 52 27 19 1

  4. Sliding Window Model
     Stream (time →): 2 15 21 6 35 11 20 4 25 1 16 13 8 52 27 19 1
     Window: 20 4 25 1 16, SUM = 66

  5. Sliding Window Model
     Stream (time →): 2 15 21 6 35 11 20 4 25 1 16 13 8 52 27 19 1
     Window: 4 25 1 16 13, SUM = 59
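
The two sums on slides 4 and 5 are consistent with a window of five elements sliding one position along the stream. A minimal sketch of that exact computation, assuming a window size of 5 (the slides do not state it here) and a hypothetical helper name `sliding_sums`:

```python
from collections import deque

def sliding_sums(stream, n):
    """Yield the sum of the last n elements after each arrival, once the window is full."""
    window = deque()
    total = 0
    for x in stream:
        window.append(x)
        total += x
        if len(window) > n:
            # The oldest element leaves the window.
            total -= window.popleft()
        if len(window) == n:
            yield total

stream = [2, 15, 21, 6, 35, 11, 20, 4, 25, 1, 16, 13, 8, 52, 27, 19, 1]
sums = list(sliding_sums(stream, 5))
print(sums[6], sums[7])  # windows ending at 16 and at 13: prints "66 59"
```

This keeps the whole window in memory, which is the cost the rest of the talk tries to avoid.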

  6. Statistics over Sliding Windows
     • Easy if we store the entire window
     • Storing the entire window is expensive
       • Space: a "last 1 hour" window at 1000 elements/sec holds 3.6 million elements
     • Focus of much previous work: compute approximate statistics using limited space

  7. Contributions
     • Algorithms for computing approximate quantiles and approximate frequency counts over sliding windows
     • Space requirement: (1/є) poly-log(1/є, N)
       • є = error parameter
       • N = size of the window
     • Logarithmic in window size (N)
     • (Almost) linear in 1/є

  8. Contributions over Previous Work
     • Frequency counts: first known algorithm for the sliding window model
     • Quantiles: improves over [LLXY '04]
       • [LLXY '04] space: (1/є)² poly-log(1/є, N)
       • Quadratic in 1/є

  9. Rest of the Talk
     • Formal problem specification
       • Sliding windows
       • (Approximate) frequency counts
     • Our algorithms
       • Fixed-size sliding windows
       • Variable-size sliding windows
     Note: frequency counts only; for quantiles, see the paper.

  10. Sliding Windows
     • Two abstract window models
       • Fixed-size sliding windows
         • Row-based windows
       • Variable-size sliding windows
         • Time-based windows, shared windows

  11-14. Fixed-Size Sliding Windows
     Stream (time →): 2 15 21 6 35 11 20 4 25 1 16 13 8 52 27 19 1
     Window size (N) = 5 on all four slides

  15-21. Variable-Size Sliding Windows
     Stream (time →): 2 15 21 6 35 11 20 4 25 1 16 13 8 52 27 19 1
     Window size (N) on successive slides: 5, 6, 7, 6, 5, 4, 3
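
The window sizes on slides 15 to 21 (5, 6, 7, 6, 5, 4, 3) suggest a window that grows when an element arrives and shrinks when the oldest element is dropped. A minimal sketch under that assumption; the class `VariableWindow`, its method names, and the choice of starting elements are hypothetical and not taken from the slides:

```python
from collections import deque

class VariableWindow:
    """A window whose size changes: arrivals grow it, expirations shrink it."""

    def __init__(self, initial=()):
        self._items = deque(initial)

    def insert(self, x):
        # A new element arrives at the recent end of the window.
        self._items.append(x)

    def delete_oldest(self):
        # The oldest element leaves the window.
        return self._items.popleft()

    def __len__(self):
        return len(self._items)

stream = [2, 15, 21, 6, 35, 11, 20, 4, 25, 1, 16, 13, 8, 52, 27, 19, 1]
window = VariableWindow(stream[:5])   # assumed starting window of 5 elements
sizes = [len(window)]
for op in ["insert", "insert", "delete", "delete", "delete", "delete"]:
    if op == "insert":
        window.insert(stream[len(sizes) + 4])  # next unseen stream element
    else:
        window.delete_oldest()
    sizes.append(len(window))
print(sizes)  # [5, 6, 7, 6, 5, 4, 3]
```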

  22. Frequency Counts
     Input multiset: 1 5 3 1 2 1 3 7 1 4 3 1 5 3 3 1 3 9 7 1
     Query: Select Element, Count(*) From Multiset Group by Element
     Element  Count
     1        7
     3        6
     5        2
     7        2
     9        1
     2        1
     4        1
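
The exact frequency counts for this input can be reproduced in a few lines. A small sketch using Python's standard `collections.Counter`, which plays the role of the group-by query on the slide:

```python
from collections import Counter

stream = [1, 5, 3, 1, 2, 1, 3, 7, 1, 4, 3, 1, 5, 3, 3, 1, 3, 9, 7, 1]

# One counter per distinct element: space grows with the number of distinct values.
counts = Counter(stream)

for element, count in counts.most_common():
    print(element, count)
```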

  23. Approximate Frequency Counts
     • Elements and their approximate counts
     • Approximate count: True Count − єM < Approximate Count ≤ True Count
       • Error parameter: є
       • Size of input: M
     • Only elements with Approximate Count > 0 appear in the output
     References: [MG '82, DLM '02, MM '02, KSP '03]

  24. Approximate Frequency Counts
     Input: 1 5 3 1 2 1 3 7 1 4 3 1 5 3 3 1 3 9 7 1
     Input size: M = 20
     Error parameter: є = 0.25
     Absolute error: єM = 5
     Element  Approx. Count  True Count  Error
     1        3              7           4
     3        4              6           2
     7        1              2           1
     9        1              1           0
     5        0              2           2
     2        0              1           1
     4        0              1           1

  25. Approximate Frequency Counts
     Input: 1 5 3 1 2 1 3 7 1 4 3 1 5 3 3 1 3 9 7 1
     Input size: M = 20
     Error parameter: є = 0.25
     Absolute error: єM = 5
     Element  Approx. Count  True Count  Error
     1        3              7           4
     3        4              6           2
     7        0              2           2
     9        0              1           1
     5        0              2           2
     2        0              1           1
     4        0              1           1
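
Both example outputs above satisfy the definition True Count − єM < Approximate Count ≤ True Count, which shows that a valid approximate answer is not unique. A minimal checker for that condition; the function name `is_valid_approximation` is hypothetical, and elements missing from a dictionary are treated as having count 0:

```python
def is_valid_approximation(approx, true, eps):
    """Check true - eps*M < approx <= true for every element (missing means 0)."""
    m = sum(true.values())  # input size M
    for element in set(true) | set(approx):
        t = true.get(element, 0)
        a = approx.get(element, 0)
        if not (t - eps * m < a <= t):
            return False
    return True

true_counts = {1: 7, 3: 6, 5: 2, 7: 2, 9: 1, 2: 1, 4: 1}
slide_24 = {1: 3, 3: 4, 7: 1, 9: 1}
slide_25 = {1: 3, 3: 4}

print(is_valid_approximation(slide_24, true_counts, 0.25))      # True
print(is_valid_approximation(slide_25, true_counts, 0.25))      # True
print(is_valid_approximation({1: 2, 3: 4}, true_counts, 0.25))  # False: 7 - 5 < 2 fails
```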

  26. Approximate Frequency Counts
     • All elements with frequency ≥ єM appear in the output.
     • There exists an output with ≤ 1/є elements.
     • Theorem: An approximate frequency count of size O(1/є) can be produced in one pass over the input using O(1/є) space.
     References: [MG '82, DLM '02, KSP '03]
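
One well-known one-pass method with these properties is the Misra-Gries summary, which is what the [MG '82] citation is usually taken to refer to. The sketch below is an illustration under that assumption, not necessarily the variant the talk has in mind; the function name `misra_gries` is hypothetical. It keeps at most ⌊1/є⌋ counters, and each "decrement all" step discards more than 1/є items, so no count is underestimated by єM or more.

```python
import math
from collections import Counter

def misra_gries(stream, eps):
    """One-pass approximate counts using at most floor(1/eps) counters."""
    k = math.floor(1 / eps)
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k:
            counters[x] = 1
        else:
            # All k counters are in use: decrement each one and drop the
            # new item, removing counters that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

stream = [1, 5, 3, 1, 2, 1, 3, 7, 1, 4, 3, 1, 5, 3, 3, 1, 3, 9, 7, 1]
approx = misra_gries(stream, 0.25)
print(approx)  # {1: 5, 3: 4, 7: 1}
```

On the example input this yields a third valid answer, different from both tables above, with every error below єM = 5.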

  27. Rest of the Talk
     • Formal problem specification
       • Sliding windows
       • (Approximate) frequency counts
     • Our algorithms
       • Fixed-size sliding windows
       • Variable-size sliding windows
     Note: frequency counts only; for quantiles, see the paper.

  28. Fixed-Size Sliding Windows
     • Window size: N
     • Error parameter: є
     • Absolute error: єN

  29-37. Overview
     [Figure, nine animation frames: a window of size N; the diagram itself is not recoverable from the transcript.]

  38. Details
     [Figure: block levels 0, 1, 2, … up to log(1/є) over a window of size N; surviving labels: єN, "= O(єN)".]

  39. Error Invariant
     Absolute error of all blocks identical: є_i N_i = єN / log(1/є)
     є_i = error parameter for block i
     N_i = number of elements in block i

  40. Merge Operation
     [Figure: a window of size N; the diagram itself is not recoverable from the transcript.]

  41. Merge Operation
     Add approximate counts of elements. Absolute error adds up.
     Block 1:           f1 − є1 N1 < f̃1 ≤ f1
     Block 2:           f2 − є2 N2 < f̃2 ≤ f2
     Block 1 + Block 2: (f1 + f2) − (є1 N1 + є2 N2) < f̃1 + f̃2 ≤ f1 + f2
     Here f is the true count and f̃ the approximate count of an element in a block.
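
The merge step can be illustrated on the earlier 20-element example by splitting it into two blocks of 10. This is a sketch only: the function name `merge_summaries`, the split into two blocks, and the hand-picked per-block approximate counts (each valid for є = 0.25 on its block) are assumptions made for illustration.

```python
from collections import Counter

def merge_summaries(approx1, approx2):
    """Add the approximate counts of two blocks element by element."""
    return dict(Counter(approx1) + Counter(approx2))

block1 = [1, 5, 3, 1, 2, 1, 3, 7, 1, 4]
block2 = [3, 1, 5, 3, 3, 1, 3, 9, 7, 1]

# Hand-picked approximate counts, each within 0.25 * 10 = 2.5 of the truth.
approx1 = {1: 2, 3: 1}
approx2 = {3: 3, 1: 2}

merged = merge_summaries(approx1, approx2)
true_total = Counter(block1) + Counter(block2)
bound = 0.25 * len(block1) + 0.25 * len(block2)  # absolute errors add up: 5.0

print(merged)  # {1: 4, 3: 4}
print(all(true_total[e] - bound < merged.get(e, 0) <= true_total[e]
          for e in true_total))  # True
```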

  42. Error Analysis
     Absolute error ≤ log(1/є) × O(єN / log(1/є)) + O(єN) = O(єN)

  43. Space Requirement
     [Figure: same block-level diagram as slide 38.]

  44. Approximate Frequency Counts
     • All elements with frequency ≥ єM appear in the output.
     • There exists an output with ≤ 1/є elements.
     • Theorem: An approximate frequency count of size O(1/є) can be produced in one pass over the input using O(1/є) space.
     References: [MG '82, DLM '02, KSP '03]

  45. Space Requirement
     Space required for level-ℓ blocks:
       (size of approx. count) × (number of "active" blocks) = (1/є_ℓ) × (N / N_ℓ) = N / (єN / log(1/є)) = (1/є) log(1/є)
     Total space: (1/є) log(1/є) × log(1/є) = (1/є) log²(1/є)

  46. Fixed-Size Sliding Windows: Summary
     Theorem: є-approximate frequency counts can be maintained over a fixed-size sliding window of size N using O((1/є) log²(1/є)) space.

  47. Variable-Size Windows
     • Error parameter: є
     • Variable window size: n
     • Variable absolute error: єn

  48. Fixed-Size Window Algorithm?
     [Figure: same block-level diagram as slide 38.]

  49. Fixed-Size Window Algorithm?
     Running F(є, N) on a window of n elements gives absolute error єN, so the effective error parameter = єN / n

  50. Limited Variability
     • F(є/2, N) computes є-approximate frequency counts for window sizes (N/2 ≤ n ≤ N).
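
The claim follows from simple arithmetic: an algorithm with absolute error (є/2)N, used on a window of n elements, has error parameter (є/2)N / n relative to n, and this is at most є exactly when n ≥ N/2. A small check with exact rational arithmetic; the helper name `effective_error` and the sample values of є and N are assumptions for illustration:

```python
from fractions import Fraction

def effective_error(eps, big_n, n):
    """Error parameter relative to window size n when the absolute error is eps * big_n."""
    return Fraction(eps) * big_n / n

eps = Fraction(1, 10)
N = 1000

# F(eps/2, N) stays within eps for every window size from N/2 up to N ...
ok = all(effective_error(eps / 2, N, n) <= eps for n in range(N // 2, N + 1))
# ... but not for a window just below N/2.
too_small = effective_error(eps / 2, N, N // 2 - 1) > eps

print(ok, too_small)  # True True
```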
