Sampling Time-Based Sliding Windows in Bounded Space
Rainer Gemulla and Wolfgang Lehner, SIGMOD 2008.
Sampling Time-Based Sliding Windows in Bounded Space Rainer Gemulla and Wolfgang Lehner SIGMOD 2008 Chen Yi-Chun
Outline
• Motivation
• Priority sampling
• Bounded priority sampling
• Correctness and analysis
• Sampling multiple items
• Experimental results
• Conclusion
Motivation
• Random sampling is an appealing approach to building synopses of large data streams.
• The paper is concerned with sampling schemes that maintain a uniform sample of a time-based sliding window in bounded space.
• The main challenge is to guarantee an upper bound on the space consumption of the sample.
Notation
• R(t): the set of items from R with a timestamp smaller than or equal to t
• W(t): a sliding window of length Δ
• N(t): the size of the window at time t
• Window length: the timespan covered by the window (Δ, fixed)
• Window size: the number of items in the window (N(t), varying)
• S(t): uniform random sample
Priority sampling
• The replacement set is the reason for the unbounded space consumption of the sampling scheme.
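To make the role of the replacement set concrete, here is a minimal sketch of priority sampling for a sample of size 1. It assumes the usual formulation (each item gets a random priority, the sample is the highest-priority item in the window); the class name `PrioritySampler`, its methods, and the convention that an item with timestamp s is in the window at time t when t − window_length < s <= t are illustrative assumptions, not taken from the slides.

```python
import random
from collections import deque


class PrioritySampler:
    """Size-1 priority sampling over a time-based window (illustrative sketch)."""

    def __init__(self, window_length, rng=None):
        self.window_length = window_length
        self.rng = rng or random.Random()
        # Entries are (timestamp, priority, value), ordered by arrival time
        # with strictly decreasing priority. The front entry is the current
        # sample; everything behind it is the replacement set.
        self.items = deque()

    def _expire(self, now):
        # Drop items that have left the window (now - window_length, now].
        while self.items and self.items[0][0] <= now - self.window_length:
            self.items.popleft()

    def arrive(self, timestamp, value):
        self._expire(timestamp)
        priority = self.rng.random()
        # A stored item with a lower priority than the newcomer expires
        # first and can never become the sample, so it is dropped.
        while self.items and self.items[-1][1] < priority:
            self.items.pop()
        self.items.append((timestamp, priority, value))

    def sample(self, now):
        self._expire(now)
        return self.items[0][2] if self.items else None

    def stored_items(self):
        # Sample plus replacement set; nothing in the scheme caps this number.
        return len(self.items)
```

When priorities happen to arrive in decreasing order, every item stays in `items` until it expires, which is the unbounded behaviour the slide refers to.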
Bounded priority sampling
• a) Arrival of item e: e becomes the new candidate item if
  • there is currently no candidate item, or
  • the priority of e is larger than the priority of the candidate item
• b) Expiration of the candidate item: it becomes the test item
• c) Double expiration of the test item: discard it
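The three rules above can be sketched as a small state machine that never stores more than two items. The class and method names are hypothetical, the window convention (an item with timestamp s is in the window at time t when t − window_length < s <= t) is an assumption, and so is the reporting rule in `sample`: the slide does not spell it out, and the sketch assumes the candidate is reported only when no test item has a higher priority.

```python
import random


class BoundedPrioritySampler:
    """Bounded priority sampling with one candidate and one test item (sketch)."""

    def __init__(self, window_length, rng=None):
        self.window_length = window_length
        self.rng = rng or random.Random()
        self.candidate = None  # (timestamp, priority, value), current window
        self.test = None       # expired candidate, previous window

    def _advance(self, now):
        # (b) Expiration of the candidate item: it becomes the test item.
        if (self.candidate is not None
                and self.candidate[0] <= now - self.window_length):
            self.test = self.candidate
            self.candidate = None
        # (c) Double expiration of the test item: discard it.
        if (self.test is not None
                and self.test[0] <= now - 2 * self.window_length):
            self.test = None

    def arrive(self, timestamp, value):
        self._advance(timestamp)
        priority = self.rng.random()
        # (a) Arrival: the new item becomes the candidate if there is no
        # candidate or if its priority is larger than the candidate's.
        if self.candidate is None or priority > self.candidate[1]:
            self.candidate = (timestamp, priority, value)

    def sample(self, now):
        self._advance(now)
        if self.candidate is None:
            return None
        # Assumed reporting rule: a higher-priority test item blocks the sample.
        if self.test is not None and self.test[1] > self.candidate[1]:
            return None
        return self.candidate[2]

    def stored_items(self):
        return sum(x is not None for x in (self.candidate, self.test))
```

The price of the fixed space bound is that `sample` may return nothing even though the window is not empty.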
Correctness and analysis
[Figure: items e’ and emax with their priorities p’ and pmax]
Correctness and analysis (cont.)
• In the worst case, e’ equals the highest-priority item in W(t − Δ).
[Figure: items e’ and emax with their priorities p’ and pmax]
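The worst case can be turned into a rough bound on how often a sample is available. This is a sketch under stated assumptions, none of which appear on the slide: Δ denotes the window length, N(t) the number of items in W(t) with N(t) ≥ 1, priorities are drawn independently from one continuous distribution, and a sample is reported exactly when the priority pmax of the highest-priority item in W(t) exceeds the priority p’ of the test item e’.

```latex
\Pr[\text{sample reported}]
  = \Pr[\,p_{\max} > p'\,]
  \;\ge\; \Pr\Bigl[\,\max_{e \in W(t)} p(e) \;>\; \max_{e \in W(t-\Delta)} p(e)\Bigr]
  = \frac{N(t)}{N(t) + N(t-\Delta)}
```

The last step uses that W(t) and W(t − Δ) are disjoint, so the largest of the N(t) + N(t − Δ) priorities is equally likely to belong to any of these items. With two equally sized windows, for example, the bound is 1/2.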
Sampling multiple items
• BPSWOR (BPS without replacement): modify BPS so as to store k candidates and k test items simultaneously.
[Figure: items e1 and e2 with priorities p1 and p2; |Scand| < k]
Experimental results
• Each item of the data stream consists of an 8-byte timestamp and 32 bytes of dummy data.
• The space budget is 32 kbytes.
• At most 819 items can be stored in 32 kbytes of space.
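The figure of 819 items follows from the item size and the budget. A quick check, assuming 1 kbyte = 1024 bytes and no per-item overhead (the constant names are illustrative):

```python
TIMESTAMP_BYTES = 8
PAYLOAD_BYTES = 32
BUDGET_BYTES = 32 * 1024  # assumption: 1 kbyte = 1024 bytes

item_bytes = TIMESTAMP_BYTES + PAYLOAD_BYTES   # 40 bytes per item
max_items = BUDGET_BYTES // item_bytes         # 32768 // 40 = 819

print(item_bytes, max_items)  # prints: 40 819
```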
Conclusion
• The paper studies bounded-space techniques for maintaining uniform samples over a time-based sliding window of a data stream.