
Continuously Maintaining Quantile Summaries of the Most Recent N Elements over a Data Stream

Continuously Maintaining Quantile Summaries of the Most Recent N Elements over a Data Stream. Xuemin Lin, Hongjun Lu, Jian Xu, Jeffrey Xu Yu ICDE2004. Outline. motivation Problem definition Quantile Sketch Sliding window model n of N model Conclusion. Motivation.


Presentation Transcript


  1. Continuously Maintaining Quantile Summaries of the Most Recent N Elements over a Data Stream • Xuemin Lin, Hongjun Lu, Jian Xu, Jeffrey Xu Yu • ICDE 2004

  2. Outline • Motivation • Problem definition • Quantile Sketch • Sliding window model • n-of-N model • Conclusion

  3. Motivation • Data elements seen early may be outdated, so quantile summaries of the most recently seen data elements are more important. • Example: • Because users' interests change over time, the top-ranked Web pages among the N most recently accessed pages should be more accurate than the top-ranked pages among all pages accessed so far.

  4. Problem Definitions • ψ-Quantile: A ψ-quantile (ψ ∈ (0,1]) of an ordered sequence of N data elements is the element with rank ⌈ψN⌉. • Quantile Query: Given ψ, find the data element with rank ⌈ψN⌉ among all elements in the stream. • Variation: answer the query over the N most recent elements only (sliding window model).

  5. Example • N = 16, sorted sequence: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 11, 11, 11, 12 • The 0.5-quantile is the element with rank 8 (0.5 × 16), which is 8 • The 0.75-quantile is the element with rank 12 (0.75 × 16), which is 10
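The rank-based definition can be checked directly on the 16-element example. The following is a minimal sketch of an exact (non-streaming) quantile; the function name `exact_quantile` is chosen here for illustration and does not come from the slides.

```python
import math

def exact_quantile(values, psi):
    """Return the element with rank ceil(psi * N) (1-based) in sorted order."""
    if not 0 < psi <= 1:
        raise ValueError("psi must be in (0, 1]")
    ordered = sorted(values)
    rank = math.ceil(psi * len(ordered))
    return ordered[rank - 1]

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 11, 11, 11, 12]
print(exact_quantile(data, 0.5))   # element of rank 8
print(exact_quantile(data, 0.75))  # element of rank 12
```

This keeps every element and sorts on each call, which is exactly what a streaming summary is meant to avoid; it serves only as a reference answer.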

  6. Three Different Models • Data stream model • Computing the ψ-quantile over all data items seen so far • Example: the 0.5-quantile returns 10 at time t11 and 8 at time t15

  7. Three Different Models (contd.) • Sliding window model • Computing the ψ-quantile over the N most recent elements of the data stream seen so far • Example (window size = 12): the 0.5-quantile returns 10 at time t11 and 6 at time t15

  8. Three Different Models (contd.) • n-of-N model • For any n ≦ N, computing the ψ-quantile among the n most recent elements of the data stream seen so far • Example (N = 12): the 0.5-quantile returns 8 at time t11 for n = 8, and 3 at time t15 for n = 4
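The three models differ only in which suffix of the stream the query ranges over. The sketch below makes that explicit with exact computation. The transcript does not preserve the stream used on the slides, so the arrival order in `STREAM` is an assumption: it was chosen so that its sorted form equals the 16-element sequence of slide 5 and so that it reproduces the values quoted on slides 6 to 8.

```python
import math

def quantile(values, psi):
    """Exact psi-quantile: element of rank ceil(psi * len(values))."""
    ordered = sorted(values)
    return ordered[math.ceil(psi * len(ordered)) - 1]

# Hypothetical arrival order for times t0..t15 (assumption, not from the slides).
STREAM = [12, 10, 11, 10, 1, 10, 11, 9, 6, 7, 8, 11, 4, 5, 2, 3]

def data_stream_model(stream, t, psi):
    """Quantile over everything seen up to and including time t."""
    return quantile(stream[: t + 1], psi)

def sliding_window_model(stream, t, psi, N):
    """Quantile over the N most recent elements at time t."""
    return quantile(stream[max(0, t + 1 - N): t + 1], psi)

def n_of_N_model(stream, t, psi, n, N):
    """Quantile over the n most recent elements at time t, for any n <= N."""
    if n > N:
        raise ValueError("n must not exceed N")
    return quantile(stream[max(0, t + 1 - n): t + 1], psi)

print(data_stream_model(STREAM, 15, 0.5))
print(sliding_window_model(STREAM, 15, 0.5, N=12))
print(n_of_N_model(STREAM, 15, 0.5, n=4, N=12))
```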

  9. ε-approximate • A quantile summary for a data sequence of N elements is ε-approximate if, for any given rank r, it returns a value whose rank r' is guaranteed to be within the interval [r − εN, r + εN] • Example: for the 16-element sequence of slide 5, a 0.25-approximate 0.5-quantile returns one of the elements in {4, 5, 6, 7, 8, 9, 10} • Example: for a data stream with 100 elements, a 0.5-quantile query with ε = 0.1 returns a value v whose true rank is within [40, 60]
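To see which answers an ε-approximate summary is allowed to return, one can list the elements whose rank falls in [r − εN, r + εN]. This is a small illustrative helper (the name `acceptable_answers` is made up here), applied to the 16-element sequence of slide 5.

```python
import math

def acceptable_answers(values, psi, eps):
    """Distinct values whose rank lies within eps*N of the target rank."""
    ordered = sorted(values)
    n = len(ordered)
    r = math.ceil(psi * n)
    lo = max(1, math.ceil(r - eps * n))   # smallest admissible rank
    hi = min(n, math.floor(r + eps * n))  # largest admissible rank
    return sorted(set(ordered[lo - 1: hi]))

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 11, 11, 11, 12]
print(acceptable_answers(data, 0.5, 0.25))
```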

  10. Quantile Sketch • Data structure: $\{(v_i, r_i^-, r_i^+) : 1 \le i \le m\}$ • Each value $v_i$ is one of the elements seen so far • $r_i^-$ is a lower bound on the rank of $v_i$ • $r_i^+$ is an upper bound on the rank of $v_i$ • $v_i \le v_{i+1}$ for $1 \le i \le m-1$ • $r_i^- \le r_{i+1}^-$ for $1 \le i \le m-1$ • $r_i^- \le r_i \le r_i^+$, where $r_i$ is the rank of $v_i$

  11. The Summary Data Structure • Define $g_i = r_i^- - r_{i-1}^-$ (with $r_0^- = 0$) and $\Delta_i = r_i^+ - r_i^-$ • Then $r_i^- = \sum_{j \le i} g_j$ • and $r_i^+ = \sum_{j \le i} g_j + \Delta_i$ • $v_1$ and $v_m$ always correspond to the minimum and the maximum elements seen so far.

  12. Example • Quantile sketch consisting of 6 tuples: {(1,1,1), (2,2,9), (3,3,10), (5,4,10), (10,10,10), (12,16,16)}
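The two representations on slides 10 and 11, rank bounds $(v_i, r_i^-, r_i^+)$ and increments $(v_i, g_i, \Delta_i)$, carry the same information. A minimal sketch of the conversion in both directions, applied to the 6-tuple example; the function names are illustrative and $r_0^- = 0$ is assumed.

```python
def to_g_delta(sketch):
    """(v, r_minus, r_plus) tuples -> (v, g, delta) tuples."""
    compact = []
    prev_lower = 0  # r_0^- is taken to be 0
    for v, lower, upper in sketch:
        compact.append((v, lower - prev_lower, upper - lower))
        prev_lower = lower
    return compact

def from_g_delta(compact):
    """(v, g, delta) tuples -> (v, r_minus, r_plus) tuples via prefix sums."""
    sketch = []
    running = 0
    for v, g, delta in compact:
        running += g
        sketch.append((v, running, running + delta))
    return sketch

SKETCH = [(1, 1, 1), (2, 2, 9), (3, 3, 10), (5, 4, 10), (10, 10, 10), (12, 16, 16)]
print(to_g_delta(SKETCH))
```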

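  13. ε-approximate sketch • Theorem: if a sketch S satisfies • 1. $r_1^+ \le \varepsilon N + 1$, • 2. $r_m^- \ge (1-\varepsilon)N$, • 3. for $2 \le i \le m$, $r_i^+ \le r_{i-1}^- + 2\varepsilon N$, • then S is ε-approximate. That is, for each ψ ∈ (0,1], there is a $(v_i, r_i^-, r_i^+)$ in S such that $\lceil \psi N \rceil - \varepsilon N \le r_i^- \le r_i^+ \le \lceil \psi N \rceil + \varepsilon N$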
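The conditions of the theorem can be tested mechanically on a given sketch. In the sketch below, conditions 1 and 2 are taken from the slide; condition 3 is not legible in the transcript, so the code assumes the usual Greenwald-Khanna style bound $r_i^+ \le r_{i-1}^- + 2\varepsilon N$ (equivalently $g_i + \Delta_i \le 2\varepsilon N$). The function name is illustrative.

```python
def satisfies_theorem(sketch, n, eps):
    """Check the three sufficient conditions for an eps-approximate sketch.

    sketch: list of (v, r_minus, r_plus), sorted by v; n: number of elements.
    Condition 3 is an assumption (see the lead-in), not read off the slide.
    """
    lowers = [t[1] for t in sketch]
    uppers = [t[2] for t in sketch]
    cond1 = uppers[0] <= eps * n + 1
    cond2 = lowers[-1] >= (1 - eps) * n
    cond3 = all(uppers[i] <= lowers[i - 1] + 2 * eps * n
                for i in range(1, len(sketch)))
    return cond1 and cond2 and cond3

SKETCH = [(1, 1, 1), (2, 2, 9), (3, 3, 10), (5, 4, 10), (10, 10, 10), (12, 16, 16)]
print(satisfies_theorem(SKETCH, n=16, eps=0.25))
```

With a tighter ε such as 0.1 the same sketch fails the check, because the gap between (1,1,1) and (2,2,9) is too wide.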

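  14. Query • Quantile sketch consisting of 6 tuples, N = 16, ε = 0.25: {(1,1,1), (2,2,9), (3,3,10), (5,4,10), (10,10,10), (12,16,16)} • A 0.5-quantile query asks for the $v_i$ of rank r = 8, with εN = 4 • Find the first tuple that satisfies $r - \varepsilon N \le r_i^- \le r_i^+ \le r + \varepsilon N$, here $4 \le r_i^-$ and $r_i^+ \le 12$, and return its $v_i$ • The first such tuple is (5,4,10), so the query returns 5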
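The query procedure of this slide is a single linear scan over the sketch. A minimal sketch, assuming the rank condition $r - \varepsilon N \le r_i^- \le r_i^+ \le r + \varepsilon N$ that the worked example uses; the function name `query` is illustrative.

```python
import math

def query(sketch, psi, n, eps):
    """Return v_i of the first tuple whose rank bounds lie within eps*n of
    the target rank ceil(psi * n), or None if no tuple qualifies."""
    r = math.ceil(psi * n)
    for v, lower, upper in sketch:
        if r - eps * n <= lower and upper <= r + eps * n:
            return v
    return None

SKETCH = [(1, 1, 1), (2, 2, 9), (3, 3, 10), (5, 4, 10), (10, 10, 10), (12, 16, 16)]
print(query(SKETCH, psi=0.5, n=16, eps=0.25))
```

The scan skips (1,1,1), (2,2,9) and (3,3,10) because their lower bounds are below 4, and stops at (5,4,10).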

  15. Dilemma • Memory is bounded • GK-algorithm: space requirement $O(\frac{1}{\varepsilon}\log(\varepsilon N))$

  16. One-Pass summary for sliding windows • Continuously divide the stream into buckets based on the arrival order of the data elements • The capacity of each bucket is $\lfloor \varepsilon N/2 \rfloor$ • For each bucket, an $\frac{\varepsilon}{4}$-approximate sketch is maintained continuously by the GK-algorithm • Once a bucket is full, its $\frac{\varepsilon}{4}$-approximate sketch is compressed into an $\frac{\varepsilon}{2}$-approximate sketch • The oldest bucket expires when the total number of elements reaches N + 1

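  17. (Figure) The most recent N elements are covered by a row of buckets: the expired bucket at the old end, full buckets that each hold a compressed approximate sketch, and the current bucket, whose sketch is maintained by GK.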

  18. Example • N = 8, ε = 1, bucket capacity εN/2 = 4, stream 1 2 3 4 5 6 7 8 9 • (Figure) The current bucket holds an approximate sketch; when it becomes full, the sketch is compressed and a new current bucket is started; when element 9 arrives, the oldest bucket expires.
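The bucket bookkeeping of this example can be simulated without any sketching at all. The class below is a simplified illustration: it stores the raw elements of each bucket instead of a GK sketch, only notes where compression would happen, and assumes a bucket capacity of ⌊εN/2⌋, which matches the value 4 in the example. The class and attribute names are invented for this illustration.

```python
import math
from collections import deque

class BucketedWindow:
    """Arrival-order buckets for a sliding window of the N most recent elements.

    Simplification: buckets keep raw elements, not GK sketches.
    """

    def __init__(self, n, eps):
        self.n = n
        self.capacity = max(1, math.floor(eps * n / 2))
        self.buckets = deque([[]])  # oldest bucket first, current bucket last
        self.count = 0              # elements held across all buckets

    def insert(self, x):
        if len(self.buckets[-1]) == self.capacity:
            # The current bucket is full: a real implementation would
            # compress its sketch here before opening a new bucket.
            self.buckets.append([])
        self.buckets[-1].append(x)
        self.count += 1
        if self.count > self.n:
            # N + 1 elements are held: the oldest bucket expires as a whole.
            expired = self.buckets.popleft()
            self.count -= len(expired)

w = BucketedWindow(n=8, eps=1)
for x in range(1, 10):
    w.insert(x)
print(list(w.buckets))
```

After element 9 the bucket holding 1 to 4 is gone, although 2, 3 and 4 still belong to the window of the 8 most recent elements. That mismatch is what the lift operation on slide 22 compensates for.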

  19. Compress • Compress an $\frac{\varepsilon}{2}$-approximate sketch into an ε-approximate sketch • The compressed sketch needs only $O(1/\varepsilon)$ memory space
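One way to picture compression is to keep only one tuple near each of about 1/ε evenly spaced target ranks, plus the minimum and maximum. The sketch below follows that idea; it is a simplified illustration and not necessarily the Compress procedure of the paper. It assumes the input is an (ε/2)-approximate sketch of n elements, so that a tuple within εn/2 of every target rank exists.

```python
import math

def compress(sketch, n, eps):
    """Keep at most ceil(1/eps) + 2 tuples of an (eps/2)-approximate sketch.

    For each target rank j*eps*n (capped at n), keep the first tuple whose
    rank bounds lie within eps*n/2 of the target. Simplified illustration.
    """
    half = eps * n / 2
    kept = [sketch[0]]                      # always keep the minimum
    for j in range(1, math.ceil(1 / eps) + 1):
        target = min(n, j * eps * n)
        for t in sketch:
            _, lower, upper = t
            if target - half <= lower and upper <= target + half:
                if t not in kept:
                    kept.append(t)
                break
    if sketch[-1] not in kept:
        kept.append(sketch[-1])             # always keep the maximum
    return kept

# Exact sketch of the values 1..16 (every rank known precisely).
exact = [(v, v, v) for v in range(1, 17)]
small = compress(exact, n=16, eps=0.25)
print([v for v, _, _ in small])
```

Any rank r is within εn/2 of some target, and the kept tuple for that target is within εn/2 of the target, so the kept tuple is within εn of r.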

  20. Merge • There are h data streams $D_i$ ($1 \le i \le h$), and each $D_i$ has $N_i$ data elements • $S_{merge}$ is a sketch of $\bigcup_{i=1}^{h} D_i$ • $|S_{merge}| = \sum_{i=1}^{h} |S_i|$ • Suppose each $S_i$ is an ε-approximate sketch of $D_i$. Then $S_{merge}$ is also an ε-approximate sketch
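The slide states the properties of the merge but not its mechanics. A common way to merge two rank-bound sketches is to add, for each tuple, a lower and an upper bound on how many elements of the other sequence precede it. The sketch below does this for two sketches and assumes that all values are distinct and that each sketch contains the minimum and maximum of its sequence; it is an illustration of the idea, not necessarily the authors' procedure.

```python
import bisect

def merge_two(s1, n1, s2, n2):
    """Merge sketches s1 (of n1 elements) and s2 (of n2 elements).

    Each sketch is a list of (v, r_minus, r_plus) sorted by v.
    Assumes distinct values across both sequences.
    """
    def preceding_bounds(x, other, n_other):
        values = [t[0] for t in other]
        k = bisect.bisect_left(values, x)   # tuples of `other` with value < x
        # At least r_minus of the largest smaller tuple precede x ...
        lower = other[k - 1][1] if k > 0 else 0
        # ... and fewer than r_plus of the smallest larger tuple do.
        upper = other[k][2] - 1 if k < len(other) else n_other
        return lower, upper

    merged = []
    for own, other, n_other in ((s1, s2, n2), (s2, s1, n1)):
        for v, lower, upper in own:
            add_lo, add_hi = preceding_bounds(v, other, n_other)
            merged.append((v, lower + add_lo, upper + add_hi))
    merged.sort()
    return merged

# Buckets {5,6,7,8} and {9} from the sliding-window example, as exact sketches.
s_full = [(5, 1, 1), (6, 2, 2), (7, 3, 3), (8, 4, 4)]
s_cur = [(9, 1, 1)]
print(merge_two(s_full, 4, s_cur, 1))
```

The merged sketch has $|S_1| + |S_2|$ tuples, in line with the size formula on the slide; more than two sketches can be merged by folding this function over the list.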

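  21. Another Problem • Example: stream 1, 2, 3, 4, 5, 6, 7, 8, 9 with ε = 1 and N = 8; the bucket holding 1 to 4 has expired and the current bucket holds 9 • The first tuple in $S_{merge}$ is (5, 1, 1), but the rank of 5 among the most recent N elements is 4 • The rank bounds in $S_{merge}$ ignore the elements of the expired bucket that are still inside the window, so $S_{merge}$ is not the approximate sketch we need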

  22. Lift • To solve the previous problem, we use a “lift” operation that raises the value of $r_i^+$ by $\lfloor \varepsilon N/2 \rfloor$ for each tuple i • If S is an $\frac{\varepsilon}{2}$-approximate sketch, then $S_{lift}$ is an ε-approximate sketch
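The lift operation is a one-line transformation of the upper bounds. The sketch below assumes the lift amount is ⌊εN/2⌋, the bucket capacity, since fewer than that many window elements can be hidden in the expired bucket. It is applied to the merged sketch of the running example (ε = 1, N = 8), which is written out by hand here.

```python
import math

def lift(sketch, n, eps):
    """Raise every upper rank bound by floor(eps * n / 2).

    The lift amount is an assumption based on the bucket capacity.
    """
    shift = math.floor(eps * n / 2)
    return [(v, lower, upper + shift) for v, lower, upper in sketch]

# Merged sketch over the non-expired buckets {5,6,7,8} and {9}.
s_merge = [(5, 1, 1), (6, 2, 2), (7, 3, 3), (8, 4, 4), (9, 5, 5)]
s_lift = lift(s_merge, n=8, eps=1)
print(s_lift[0])
```

After the lift, the first tuple bounds the rank of 5 by [1, 5], which now contains its true rank 4 in the window 2 to 9.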

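  23. Query • Step 1: merge the local sketches of the buckets, including the current bucket, into $S_{merge}$ • Step 2: lift $S_{merge}$ to obtain $S_{lift}$ • Step 3: for a given rank $r = \lceil \psi N \rceil$, find the first tuple in $S_{lift}$ such that $r - \varepsilon N \le r_i^- \le r_i^+ \le r + \varepsilon N$, and return $v_i$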

  24. One-Pass Summary under n-of-N • EH partitioning technique • EH maintains at most $\lceil 1/\lambda \rceil + 1$ “i-buckets” for each i • For N elements, the number of buckets in EH is always $O(\frac{\log(\lambda N)}{\lambda})$ • (Figure) Elements e1 to e10 arrive one by one; when the number of 1-buckets reaches 4, 1-buckets are merged into a 2-bucket; when the number of 2-buckets reaches 4, 2-buckets are merged into a 4-bucket
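The merging pattern in the figure can be reproduced by tracking bucket sizes only. The sketch below is a generic exponential-histogram style simulation: it allows at most `max_per_size` buckets of each size and merges the two oldest buckets of a size once that limit is exceeded. The parameter name and the value 3 (so that a fourth bucket triggers a merge, as in the figure) are assumptions made for this illustration.

```python
def eh_insert(sizes, max_per_size=3):
    """Add one element to an exponential histogram given as a list of
    bucket sizes, oldest bucket first. Mutates and returns the list."""
    sizes.append(1)                      # the new element is its own 1-bucket
    size = 1
    while sizes.count(size) > max_per_size:
        i = sizes.index(size)            # oldest bucket of this size
        # Sizes never increase from old to new, so the two oldest buckets
        # of this size are adjacent; replace them by one merged bucket.
        sizes[i:i + 2] = [2 * size]
        size *= 2                        # the merge may cascade upward
    return sizes

sizes = []
for _ in range(10):                      # elements e1 .. e10
    eh_insert(sizes)
print(sizes)
```

Merges happen on the arrival of e4, e6, e8 and e10, and the last one cascades from 1-buckets to 2-buckets, which is consistent with the sequence of pictures on the slide.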

  25. Sketch Construction • Use the EH technique to partition a data stream • Maintain a sketch $S_b$ for each bucket b • Choose $\lambda = \frac{\varepsilon}{\varepsilon + 2}$ • Maintain an $\frac{\varepsilon}{2}$-approximate sketch for each $S_b$

  26. Example • Construct a sketch $S_b$ for each bucket b to summarize the data elements from the earliest element in b up to now • (Figure, λ = 1/2) Buckets f, e, d, c, b, a, grouped into 4-buckets, 2-buckets and 1-buckets, carry the sketches $S_f$, $S_e$, $S_d$, $S_c$, $S_b$, $S_a$; when a new bucket g arrives, its sketch $S_g$ is added

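  27. n-of-N Query • Step 1: (Figure) buckets f, e, d, c, b, a (4-bucket, 2-bucket, 1-bucket) with their sketches $S_f$ to $S_a$; the range of the n most recent elements is marked against the bucket boundaries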

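  28. n-of-N Query • Step 2: lift the selected sketch ($S_e$ in the figure) to obtain $S_{lift}$ • Step 3: for a given rank r, find the first tuple in $S_{lift}$ such that $r - \varepsilon n \le r_i^- \le r_i^+ \le r + \varepsilon n$, and return $v_i$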

  29. Conclusions • The work presented is among the attempts to develop space-efficient, one-pass, deterministic quantile summary algorithms with performance guarantees under the sliding window model of data streams
