
Maintaining Time-Decaying Stream Aggregates

Maintaining Time-Decaying Stream Aggregates. Edith Cohen Martin Strauss AT&T Labs-research. The Problem. A data stream is a sequence of data items observed over time. Presence of multiple massive data streams.


Presentation Transcript


  1. Maintaining Time-Decaying Stream Aggregates • Edith Cohen, Martin Strauss • AT&T Labs-Research • PODS 2003

  2. The Problem • A data stream is a sequence of data items observed over time. • Multiple massive data streams are present. • Storage constraints allow maintaining only a compact summary of the “essence” of the information in each stream. • Relevance of information decays with time. • Thus, when aggregating across time, older information should be discounted.

  3. Applications • IP routing, RED protocol: a time-decayed average of previous queue lengths is used to estimate impending congestion at a router. • Internet gateway selection: track the quality (e.g., packet loss rate) of alternative paths to select the more reliable one. • Usage statistics of phone customers: AT&T has about 100M customers. • More…

  4. Decay Functions • A decay function g(x)>=0 is non-increasing and defined for x>=1. • f(t)>=0 is the value of the data item observed at time t. • The weight at time T of an item obtained at time t is g(T-t). • The decayed value of the item is f(t)g(T-t).

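  5. Time-Decaying Sum • The decayed sum at time T is the sum of the decayed values of all items observed so far: Σ_{t<T} f(t)g(T-t). • When the f(t) are 0/1 we refer to the problem as time-decaying count. • Maintaining the decaying sum exactly can generally consume a linear number of bits. • We consider approximately maintaining it to within a factor of (1+ε).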
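As a baseline for the definitions on slides 4 and 5, the decayed sum can be computed exactly by keeping every item, which is the linear-storage cost the slide refers to. The following is a minimal Python sketch; the names `decayed_sum` and `sliding_window` and the dict-based stream layout are illustrative assumptions, not taken from the slides.

```python
def decayed_sum(items, g, T):
    """Exact decayed sum at time T: sum of f(t) * g(T - t) over items with t < T.

    items: dict mapping observation time t -> value f(t) (hypothetical layout).
    g:     decay function, defined for ages x >= 1.
    """
    return sum(value * g(T - t) for t, value in items.items() if t < T)


def sliding_window(W):
    """Sliding-window decay as written on slide 7: g(x)=1 for x<W, 0 otherwise."""
    return lambda x: 1 if x < W else 0


stream = {1: 1, 2: 0, 3: 1, 4: 1}   # 0/1 values, so this is a time-decaying count

# Only ages 1 and 2 (times 4 and 3) fall inside a window with W=3.
count_recent = decayed_sum(stream, sliding_window(3), T=5)

# Polynomial decay g(x) = 1/x weights the same items by 1, 1/2 and 1/4.
harmonic = decayed_sum(stream, lambda x: 1 / x, T=5)
```

This brute-force version stores the whole stream; the histogram techniques on the later slides exist to avoid exactly that.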

  6. Time-Decaying Average • Time-decaying weighted average of observed values. • f(t) is the value of the item observed at time t. • Maintaining the time-decaying average reduces to maintaining two time-decaying sums.
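The reduction to two sums can be illustrated as follows. The slide's formula is not preserved in this transcript, so this sketch assumes the average is the decayed sum of the values divided by the decayed sum of the item weights (one unit per observed item). The name `decayed_average` and the sample data are hypothetical.

```python
def decayed_average(items, g, T):
    """Ratio of two decayed sums (assumed definition, see lead-in).

    items: dict mapping observation time t -> observed value f(t).
    """
    # First decayed sum: values weighted by their decay.
    numerator = sum(value * g(T - t) for t, value in items.items() if t < T)
    # Second decayed sum: the decay weights themselves (a decayed count).
    denominator = sum(g(T - t) for t in items if t < T)
    return numerator / denominator if denominator > 0 else None


# Two observations: 10.0 at time 1 (age 4) and 20.0 at time 4 (age 1).
avg = decayed_average({1: 10.0, 4: 20.0}, lambda x: 1 / x, T=5)
```

With g(x)=1/x the recent observation carries four times the weight of the old one, so the average lands much closer to 20 than to 10.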

  7. Interesting Families of Decay Functions • Exponential decay [Jacobson 88] • Sliding Windows [DGIM02]: g(x)=1 for x<W, g(x)=0 otherwise • Polynomial decay • General decay functions…

  8. Exponential Decay • Used in networking applications (RED) • Very simple maintenance: […] • Lemma: exact tracking requires […] storage bits; approximate tracking uses […] bits.
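The maintenance rule itself is missing from the transcript. A common way to maintain an exponentially decayed sum with a single stored number is sketched below, assuming g(x) = alpha^x for a constant 0 < alpha < 1; the class name `ExpDecayedSum` and the parameter `alpha` are illustrative, and this is not necessarily the exact rule shown on the slide.

```python
class ExpDecayedSum:
    """Decayed sum under g(x) = alpha**x, kept in one running value."""

    def __init__(self, alpha):
        self.alpha = alpha
        self.value = 0.0

    def observe(self, f_t):
        """Record f_t at the current time step, then advance time by one.

        Ageing every stored item by one step multiplies its weight by alpha,
        so the whole history can be updated with a single multiplication.
        """
        self.value = self.alpha * (self.value + f_t)
        return self.value


values = [3, 0, 5, 1]            # observed at times 1..4
acc = ExpDecayedSum(alpha=0.5)
for v in values:
    acc.observe(v)

# Brute-force check at T = 5: sum of f(t) * alpha**(T - t).
brute = sum(v * 0.5 ** (5 - t) for t, v in enumerate(values, start=1))
```

The single-multiplication update works only because alpha^(x+1) = alpha * alpha^x; sliding-window and polynomial decay have no such identity, which is why they need histograms.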

  9. Sliding Window Decay • “Sharp threshold” • Lemma [DGIM02]: Sliding window decay can be approximately tracked using O(log^2 W) bits (for 0/1 or polynomial-size values). • Upper bound using the Exponential Histogram (EH) technique.

  10. Polynomial Decay • Lemma: Lower bound: […] Upper bound: O(log N log log N) (N is the elapsed time) • Often more appropriate to applications than Exponential or Sliding Window decay • More efficient than Sliding Window decay (nearly quadratic gap), almost as efficient as Exponential decay.

  11. General Decay Functions • Lemma: Can be (approximately) maintained using O(log^2 N) bits (N is the minimum of the elapsed time and the min x for which g(x)=0). • Algorithm based on an adaptation of the Exponential Histograms technique. • Sliding windows (with […]) [DGIM02] are as “hard” to maintain as general decay.

  12. Why Polynomial Decay? • Link performance over time. [Figure: performance (good/bad) of Link A and Link B over time, up to time t0] • Which link should we select past time t0? Initially A or B, eventually B.

  13. Link Selection Example (cont.) • Polynomial decay (by tuning parameter): Initially A or B, eventually B. • Exponential decay: Constant relative value of A and B: either A forever or B forever. • Sliding Window decay: First B, then A, then the same… • Polynomial decay can model our expectation (as can other smooth subexponential functions…).
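The "constant relative value" remark can be checked numerically: once observations stop, exponential decay scales both links' decayed sums by the same factor, so their ratio never changes, whereas under polynomial decay the ratio keeps drifting. The sketch below uses made-up 0/1 quality scores that do not reproduce the slide's figure; all names and the data are hypothetical.

```python
def decayed_sum(items, g, T):
    """Exact decayed sum at time T over items {time: value} with time < T."""
    return sum(v * g(T - t) for t, v in items.items() if t < T)


# Hypothetical histories up to t0 = 8 (1 = good, missing = no credit).
link_a = {t: 1 for t in range(1, 5)}   # good early on
link_b = {t: 1 for t in range(5, 9)}   # good recently


def ratio(g, T):
    """Relative value of link A to link B at time T under decay g."""
    return decayed_sum(link_a, g, T) / decayed_sum(link_b, g, T)


exp_g = lambda x: 0.5 ** x    # exponential decay
poly_g = lambda x: 1 / x      # polynomial decay

exp_early, exp_late = ratio(exp_g, 9), ratio(exp_g, 30)
poly_early, poly_late = ratio(poly_g, 9), ratio(poly_g, 30)
```

Here the exponential ratio is identical at T=9 and T=30, while the polynomial ratio grows as the difference in recency matters less and less relative to the total age.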

  14. Summary of Bounds • Approximate to within a factor of (1+ε). • N is the minimum of the elapsed time and the min x for which g(x)=0.

  15. Bucketing the Stream • [Figure: the stream 1 0 0 1 1 0 1 0 0 1 along a time axis, split into buckets of (time width 4, count 2), (time width 3, count 2) and (time width 3, count 1); merging two buckets gives (time width 7, count 4)] • Histogram determined by time boundaries and bucket counts • Time boundaries can be fixed (counts maintained per stream) • Counts can be fixed (time boundaries maintained per stream)
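The bucket numbers in the figure can be reproduced with a few lines of Python. The helper names `bucketize` and `merge` are illustrative, and which two buckets the figure merges is inferred from the arithmetic (4+3=7, 2+2=4) rather than stated in the transcript.

```python
def bucketize(bits, widths):
    """Split a 0/1 stream into consecutive buckets of the given time widths.

    Returns a list of (time_width, count) pairs.
    """
    buckets, pos = [], 0
    for w in widths:
        buckets.append((w, sum(bits[pos:pos + w])))
        pos += w
    return buckets


def merge(a, b):
    """Merging two adjacent buckets adds both their widths and their counts."""
    return (a[0] + b[0], a[1] + b[1])


stream = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]
buckets = bucketize(stream, [4, 3, 3])
merged = merge(buckets[0], buckets[1])
```

After a merge only the combined width and count survive, so the positions of the individual 1s inside the bucket are lost; that loss is the source of the approximation error.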

  16. Exponential Histograms [DGIM02] • Introduced for Sliding Windows • Each new item is placed in a new bucket. • Two buckets are merged when their combined count is at most a fraction of the combined count of all earlier buckets. • Buckets whose (elapsed) start time exceeds W are discarded. • Bucket counts are independent of the stream. • Sum of bucket counts is a constant-factor approximation for […]

  17. Exponential Histograms (cont.) • Example for factor 2 approximation (bucket counts): • 1 • 1, 1 • 1, 1, 1 • 1, 1, 2 (merge) • 1, 1, 1, 2 • 1, 1, 2, 2 (merge) • Values with time “in question” (before or after W) are aggregated in the least recent bucket.
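The sequence of bucket counts in this example can be reproduced with a simple merge rule. The sketch below assumes a rule of "at most three buckets of any one count; when a fourth appears, merge the two oldest", chosen because it yields exactly the listed sequence. It is not claimed to be the precise rule of [DGIM02], and it omits bucket timestamps and the discarding of expired buckets. The name `eh_insert` and the parameter `max_per_size` are hypothetical.

```python
def eh_insert(buckets, max_per_size=3):
    """Insert one item with value 1 into a list of bucket counts.

    buckets is ordered newest first and is modified in place.
    """
    buckets.insert(0, 1)          # each new item starts its own bucket
    size = 1
    while True:
        same = [i for i, c in enumerate(buckets) if c == size]
        if len(same) <= max_per_size:
            break
        # Too many buckets of this count: merge the two oldest of them.
        older, oldest = same[-2], same[-1]
        buckets[older] = 2 * size
        del buckets[oldest]
        size *= 2                 # the merge may cascade to the next size
    return buckets


history = []
state = []
for _ in range(6):
    eh_insert(state)
    history.append(list(state))
```

Because counts double with age, the number of buckets grows only logarithmically in the number of items, which is the property the next slide relies on.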

  18. EH Properties • The number of buckets is O(log W); for each bucket we need to record the exact start time, which takes O(log W) storage per bucket (total is O(log^2 W)). • An EH for Sliding Window W can be used to approximate Sliding Window j for all j<W. • Lemma: EH can be used to approximate general decay functions (with W = minimum of the elapsed time and the min x for which g(x)=0).

  19. Reducing any Decay Function to Sliding Windows • Decay function g(x) • From (approximate) sliding-window sums for all W<=N we can compute the (approximate) decayed sum according to g(). • With an EH with W=N we can compute (approximately) decayed sums according to all decay functions g() up to elapsed time N (or forever if g(N)=0).
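One way to see why sliding-window sums suffice is summation by parts: a non-increasing g can be written as a stack of sliding windows, where the window of width W gets weight g(W)-g(W+1). The sketch below uses exact window sums to show the identity; in the setting of the slides the sums would come from an EH and be approximate. It assumes a window of width W covers ages 1..W inclusive (slide 7 writes the window as x<W, so the indexing there is shifted by one). Function names are illustrative.

```python
from fractions import Fraction


def window_sums(values_by_age, N):
    """S[W] = total value of items with age 1..W (exact, for illustration)."""
    S = [0] * (N + 1)
    for W in range(1, N + 1):
        S[W] = S[W - 1] + values_by_age.get(W, 0)
    return S


def decayed_from_windows(S, g, N):
    """Decayed sum as a weighted combination of sliding-window sums."""
    total = g(N) * S[N]
    for W in range(1, N):
        # g is non-increasing, so every weight g(W) - g(W+1) is >= 0.
        total += (g(W) - g(W + 1)) * S[W]
    return total


g = lambda x: Fraction(1, x)            # polynomial decay, exact arithmetic
values_by_age = {1: 1, 3: 1, 5: 1}      # hypothetical items, keyed by age
N = 5

via_windows = decayed_from_windows(window_sums(values_by_age, N), g, N)
direct = sum(v * g(age) for age, v in values_by_age.items())
```

Since all the weights are non-negative, a relative error of (1+ε) in every window sum carries over to the combined decayed sum without amplification.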

  20. Weight-Based Merging • Bucket start times depend only on elapsed time. • WBM histograms apply to decay functions where g(x)/g(x+1) is non-increasing. • Number of buckets is O(log(g(1)/g(N))). • O(log log N) storage per bucket (for approximate bucket counts). • O(log N log log N) storage for polynomial decay. • More efficient than EH on decay that is slightly super-polynomial or slower.

  21. WBM Histograms: How? • Region boundaries b1,b2,b3,…: […] • The current most-recent bucket is sealed and a new bucket is started at T s.t. T mod b1=0. • Two consecutive buckets that are in the same region (according to elapsed start and end times) are merged. • At most 2 buckets per region.

  22. WBMH Example • g(x)=1/x, (1+ε)=2 • Regions (by weight): {1, 1/2}, {1/3, 1/4, 1/5, 1/6}, {1/7, 1/8, …, 1/14} • [Figure: bucket layout at T=1, T=2, T=3, T=4, T=5, T=6]
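The regions listed in this example can be recovered with a greedy rule: start a region at the youngest uncovered age and extend it while the weight stays within a factor of (1+ε) of the weight at the region's start. The exact definition of the region boundaries is missing from the transcript, so this rule is an assumption that happens to match the listed regions. The name `regions` and its parameters are hypothetical.

```python
from fractions import Fraction


def regions(g, factor, max_age):
    """Greedily partition ages 1..max_age into regions of bounded weight ratio.

    Within a region (start, end), g(start) / g(end) <= factor.
    """
    out, start = [], 1
    while start <= max_age:
        end = start
        while end + 1 <= max_age and g(start) <= factor * g(end + 1):
            end += 1
        out.append((start, end))
        start = end + 1
    return out


# g(x) = 1/x with (1+eps) = 2, using exact fractions to avoid rounding at
# the region edges (e.g. 1/3 versus 2 * 1/6).
age_regions = regions(lambda x: Fraction(1, x), 2, 14)
```

For g(x)=1/x each region is twice as wide as the ages before it, so ages up to N need only about log N regions, in line with the O(log(g(1)/g(N))) bucket bound on slide 20.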

  23. Conclusion • Summary: • Efficient computation of time-decayed sums/averages for general decay functions. • Very efficient computation for polynomial decay. • Open question: • O(log n) storage for polynomial decay • Subsequent related work: • Spatial decay (sensor nets/p2p nets)
