
A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window

Costas Busch (Rensselaer Polytechnic Institute), Srikanta Tirthapura (Iowa State University)


Presentation Transcript


  1. A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window. Costas Busch, Rensselaer Polytechnic Institute; Srikanta Tirthapura, Iowa State University

  2. Outline of Talk Introduction Algorithm Analysis

  3. Data stream (shown along a time axis): for simplicity, assume unit-valued elements

  4. Data stream with the most recent time window of a given duration, ending at the current time. Goal: compute the sum of the elements whose timestamps fall in the time window
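To make the goal concrete, here is a minimal brute-force sketch of the quantity being approximated. It stores the whole stream, which is exactly what the summarizing algorithm is meant to avoid. The names `window_sum`, `current_time`, and `duration` are illustrative and not from the slides, and treating both window endpoints as inclusive is an assumption.

```python
def window_sum(stream, current_time, duration):
    """Exact sum of values whose timestamps lie in the most recent window.

    stream: iterable of (value, timestamp) pairs, in arrival order.
    Assumption: the window is [current_time - duration, current_time],
    inclusive at both ends.
    """
    window_start = current_time - duration
    return sum(v for (v, t) in stream if window_start <= t <= current_time)


# Unit-valued elements, arriving out of timestamp order.
stream = [(1, 3), (1, 9), (1, 5), (1, 12), (1, 10)]
total = window_sum(stream, current_time=12, duration=4)
print(total)
```

Only the elements with timestamps 9, 10, and 12 fall in the window here, regardless of the order in which they arrived.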

  5. Example I: for all packets on a network link, maintain the number of distinct IP sources in the last hour • Example II: for a large database, continuously maintain averages and frequency moments

  6. Synchronous stream: the timestamps t_i arrive in ascending order • Asynchronous stream: no order is guaranteed for the timestamps t_i
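The distinction can be stated as a one-line check on the arrival order of timestamps. This is only an illustrative sketch: `is_synchronous` is a hypothetical helper, and reading "ascending" as non-strict (ties allowed) is an assumption.

```python
def is_synchronous(timestamps):
    """True if timestamps arrive in ascending order.

    Assumption: equal consecutive timestamps are allowed.
    """
    return all(a <= b for a, b in zip(timestamps, timestamps[1:]))


print(is_synchronous([1, 2, 2, 5]))  # arrival order matches timestamp order
print(is_synchronous([1, 3, 2]))     # the element stamped 2 arrives late
```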

  7. Why Asynchronous Data Streams? Network delay and multi-path routing can turn a synchronous stream into an asynchronous one. Merging synchronous streams without control also produces an asynchronous stream
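A small sketch of the merging case: two streams that are each in timestamp order can lose that order when interleaved without coordination. The round-robin interleaving below is just one example of an uncontrolled merge, chosen for illustration; `heapq.merge` is shown as a contrasting timestamp-aware merge.

```python
import heapq
from itertools import chain

# Two synchronous streams (timestamps only), each in ascending order.
stream_a = [1, 4, 7]
stream_b = [2, 3, 10]

# Uncontrolled merge: take elements alternately, ignoring timestamps.
uncontrolled = list(chain.from_iterable(zip(stream_a, stream_b)))

# Controlled merge: always emit the smallest pending timestamp.
controlled = list(heapq.merge(stream_a, stream_b))

print(uncontrolled)  # timestamp 3 arrives after timestamp 4
print(controlled)
```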

  8. Processing Requirements: • One-pass processing • Small workspace: polylogarithmic in the size of the data • Fast processing time per element • Approximate answers are acceptable

  9. Our results: A deterministic data aggregation algorithm Time: Space: Relative Error:

  10. Previous Work: • [Datar, Gionis, Indyk, Motwani. SIAM Journal on Computing, 2002]: deterministic, synchronous, merging buckets • [Tirthapura, Xu, Busch. PODC, 2006]: randomized, asynchronous, random sampling

  11. Outline of Talk Introduction Algorithm Analysis

  12. Data stream (shown along a time axis up to the current time): for simplicity, assume unit-valued elements

  13. Data stream with the most recent time window of a given duration, ending at the current time. Goal: compute the sum of the elements whose timestamps fall in the time window

  14. Divide time into periods of fixed duration

  15. The sliding window may span at most two time periods

  16. The sum over the sliding window can be written as two sub-sums, one in each of the two time periods
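The decomposition can be checked on a small example. This sketch assumes that the periods have the same duration as the window and are aligned at multiples of that duration starting from time 0, with integer timestamps; none of this is stated explicitly on the slides, and the function name is hypothetical.

```python
def split_window_sum(stream, current_time, duration):
    """Split the window sum into (left_sum, right_sum) at a period boundary.

    Assumptions: integer timestamps; periods are [k*duration, (k+1)*duration);
    the window is [current_time - duration, current_time].
    """
    window_start = current_time - duration
    # Start of the period that contains current_time. It always lies
    # strictly after window_start, so the window touches at most two periods.
    boundary = (current_time // duration) * duration
    left = sum(v for v, t in stream if window_start <= t < boundary)
    right = sum(v for v, t in stream if boundary <= t <= current_time)
    return left, right


stream = [(1, 3), (1, 9), (1, 5), (1, 12), (1, 10), (1, 7)]
left, right = split_window_sum(stream, current_time=10, duration=4)
print(left, right, left + right)
```

With these inputs the window is [6, 10] and the boundary is at 8, so the element stamped 7 goes to the left sub-sum and those stamped 9 and 10 to the right one.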

  17. A data structure maintains an estimate of the sub-sum in the left time period

  18. Without loss of generality, consider the data structure in one time period

  19. The data structure consists of various levels. An upper bound on the sum in a period is assumed

  20. Consider one level. The bucket at that level spans the time period and counts up to a threshold number of elements

  21. Stream: Increase counter value

  22. Stream: Increase counter value

  23. Stream: Increase counter value

  24. Stream: Increase counter value

  25. Stream: Split bucket. The counter threshold has been reached

  26. Stream: The new buckets have the same threshold

  27. Stream: Increase appropriate bucket

  28. Stream: Increase appropriate bucket

  29. Stream: Increase appropriate bucket

  30. Stream: Split bucket

  31. Stream:

  32. Stream: Increase appropriate bucket

  33. Stream: Split bucket

  34. Stream:
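The counting and splitting steps in the preceding slides can be sketched for a single level as follows. This is a simplified illustration, not the authors' implementation: the `Bucket` class, halving at the midpoint, integer timestamps, the parent keeping its own count after a split, and the children starting from zero are all assumptions made for the sketch.

```python
class Bucket:
    """One bucket at a single level, covering timestamps lo <= t < hi."""

    def __init__(self, lo, hi, threshold):
        self.lo, self.hi = lo, hi
        self.threshold = threshold
        self.count = 0
        self.children = None  # None while this bucket is a leaf

    def add(self, t):
        """Count one unit-valued element with integer timestamp t."""
        if not (self.lo <= t < self.hi):
            raise ValueError("timestamp outside this bucket")
        if self.children is not None:
            # Already split: increase the appropriate child bucket.
            mid = self.children[1].lo
            self.children[0 if t < mid else 1].add(t)
            return
        self.count += 1
        # Split when the threshold is reached, unless the duration is 1.
        if self.count >= self.threshold and self.hi - self.lo > 1:
            mid = (self.lo + self.hi) // 2
            # Assumption: new buckets start empty and share the threshold.
            self.children = (Bucket(self.lo, mid, self.threshold),
                             Bucket(mid, self.hi, self.threshold))

    def leaves(self):
        if self.children is None:
            return [self]
        return self.children[0].leaves() + self.children[1].leaves()

    def total(self):
        below = sum(c.total() for c in self.children) if self.children else 0
        return self.count + below


root = Bucket(0, 8, threshold=2)
for t in [1, 6, 5, 7, 2]:  # arrival order differs from timestamp order
    root.add(t)
leaf_intervals = [(b.lo, b.hi) for b in root.leaves()]
print(leaf_intervals, root.total())
```

In this run the root splits after two elements, and its right child splits again after two more, so splits concentrate where elements are dense.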

  35. Splitting Tree

  36. The splitting tree has a maximum depth: leaf buckets of duration 1 are not split any further

  37. Leaf buckets: the initial bucket may be split into many buckets

  38. Leaf buckets: due to space limitations, we only keep the last buckets

  39. Suppose we want to find the sum of elements in time period

  40. Consider various levels of splitting threshold

  41. First level with a leaf bucket that intersects timeline

  42. Estimate of S: consider the buckets to the right of the timeline

  43. Or: the first level with a leaf bucket to the right of the timeline
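One possible reading of the estimation step, as a rough sketch only: at the chosen level, add up the counters of the buckets that lie to the right of the timeline. The slides do not say how a bucket crossing the timeline contributes, so it is excluded here as an assumption. The function name and the `(lo, hi, count)` tuple layout are hypothetical.

```python
def estimate_right_of(buckets, timeline):
    """Sum the counters of buckets lying entirely to the right of timeline.

    buckets: list of (lo, hi, count), each covering lo <= t < hi.
    Assumption: a bucket that crosses the timeline is left out entirely.
    """
    return sum(count for (lo, hi, count) in buckets if lo >= timeline)


buckets = [(0, 4, 1), (4, 6, 2), (6, 8, 2)]
print(estimate_right_of(buckets, timeline=5))  # [4, 6) crosses the timeline
print(estimate_right_of(buckets, timeline=4))  # [4, 6) starts at the timeline
```

The gap between the two results shows where the approximation error comes from: the elements in the crossing bucket cannot be attributed to either side of the timeline.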

  44. Outline of Talk Introduction Algorithm Analysis

  45. Suppose that we use a particular level to compute the estimate

  46. Stream: Consider one splitting threshold level. A data element is counted in the appropriate bucket

  47. Stream: We can assume that the element is placed in the respective bucket

  48. Stream: We can assume that, when a bucket splits, the element is placed in an arbitrary child bucket

  49. Stream: If: GOOD! Element counted in correct bucket

  50. Stream: If: BAD! Element counted in wrong bucket
