
Computer Science and Engineering

Presentation Transcript


  1. Computer Science and Engineering
     Efficiently Monitoring Top-k Pairs over Sliding Windows
     Presented by: Zhitao Shen (1)
     Joint work with Muhammad Aamir Cheema (1), Xuemin Lin (2, 1), Wenjie Zhang (1), Haixun Wang (3)
     (1) The University of New South Wales, Australia
     (2) East China Normal University
     (3) Microsoft Research Asia

  2. Introduction
     • Top-k pairs query: given a scoring function score() that computes the score of a pair of objects, return the k pairs of objects with the smallest scores.
     • Examples: k-closest pairs queries, k-furthest pairs queries.
     • Top-k pairs over sliding windows: given a data stream, return the top-k pairs among the most recent N objects.
     • Applications: wireless sensor networks, stock markets, traffic monitoring and transaction monitoring.
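
The query definition above can be pinned down with a brute-force reference. This is a minimal sketch, not the authors' method: the class name BruteForceTopKPairs and its insert/top_k methods are hypothetical, objects are plain numbers, and score() is any Python callable. It rescores all N(N-1)/2 pairs on every call, which is exactly the cost the techniques in the later slides are designed to avoid.

```python
import heapq
from collections import deque
from itertools import combinations
from typing import Callable, List, Tuple


class BruteForceTopKPairs:
    """Hypothetical reference top-k pairs query over the N most recent objects.

    Every top_k() call scores all N(N-1)/2 pairs in the window.
    """

    def __init__(self, N: int, score: Callable[[object, object], float]):
        self.window = deque(maxlen=N)  # appending to a full deque drops the oldest object
        self.score = score

    def insert(self, obj: object) -> None:
        self.window.append(obj)

    def top_k(self, k: int) -> List[Tuple[float, object, object]]:
        # Score every pair in the window and keep the k smallest scores.
        scored = ((self.score(a, b), a, b) for a, b in combinations(self.window, 2))
        return heapq.nsmallest(k, scored, key=lambda t: t[0])


# k-closest pairs on a stream of numbers: score = absolute difference.
q = BruteForceTopKPairs(N=4, score=lambda a, b: abs(a - b))
for x in [10, 3, 7, 8, 20]:
    q.insert(x)
print(q.top_k(2))  # [(1, 7, 8), (4, 3, 7)]
```

A k-furthest pairs query fits the same interface by negating the distance, so "smallest score" still means "best pair".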

  3. Motivation
     • No existing work addresses general top-k pairs queries over sliding windows.
     • Arbitrary scoring functions must be supported. Example: fraud detection over transaction streams.
       - Query the pairs of transactions on the same account that are close in time but far apart in location:
         Select a.id, b.id from trans a, trans b
         where a.id <> b.id and a.account = b.account
         order by |a.time - b.time| - dist(a.loc, b.loc)
         limit k window [24 hours]
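
The fraud-detection query can be read as an ordinary scoring function plus a filter. The sketch below assumes dist() is Euclidean distance on planar coordinates and that time is measured in hours; the Transaction type and the helper names fraud_score and suspicious_pairs are hypothetical. The query's window is time-based ([24 hours]), while the rest of the talk uses count-based windows of N objects; the sketch follows the query as written and evaluates it by brute force.

```python
import heapq
import math
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple


@dataclass(frozen=True)
class Transaction:
    id: int
    account: str
    time: float               # assumed unit: hours
    loc: Tuple[float, float]  # assumed planar coordinates, e.g. km


def fraud_score(a: Transaction, b: Transaction) -> float:
    # |a.time - b.time| - dist(a.loc, b.loc), with dist assumed to be Euclidean:
    # close in time but far apart in space gives a small (very negative) score.
    return abs(a.time - b.time) - math.dist(a.loc, b.loc)


def suspicious_pairs(trans: List[Transaction], now: float, k: int,
                     window_hours: float = 24.0) -> List[Tuple[float, int, int]]:
    # Time-based window from the query: only transactions from the last 24 hours.
    recent = [t for t in trans if now - t.time <= window_hours]
    candidates = (
        (fraud_score(a, b), a.id, b.id)
        for a, b in combinations(recent, 2)  # distinct transactions, each unordered pair once
        if a.account == b.account
    )
    return heapq.nsmallest(k, candidates, key=lambda t: t[0])


txs = [
    Transaction(1, "acc1", 0.0, (0.0, 0.0)),
    Transaction(2, "acc1", 1.0, (300.0, 400.0)),  # 500 km away one hour later
    Transaction(3, "acc1", 5.0, (0.0, 1.0)),
    Transaction(4, "acc2", 1.5, (0.0, 0.0)),
]
print(suspicious_pairs(txs, now=6.0, k=1))  # [(-499.0, 1, 2)]
```

Because the score is symmetric, each unordered pair is reported once, whereas the SQL form with a.id <> b.id would return both (a, b) and (b, a).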

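  4. Problem Definitions (Preliminaries)
     Sliding windows
     • A sliding window contains the most recent N objects of the data stream.
     • The number of pairs in the window is N(N – 1)/2.
     • The age of a pair is determined by its older object, since the pair expires when that object leaves the window.
     [Figure: a stream of objects ordered from older to newer, with a sliding window of size 5; object ages run 5 4 3 2 1 0 from older to newer.]
     • Lower-bound runtime cost: O(N) for each new object, since each new object forms N – 1 new pairs that must be scored.
     • Lower-bound storage cost: O(N), for the N objects in the window.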
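
The counting argument behind the lower bounds is easy to check in code. This is a minimal sketch with a hypothetical CountWindow class. It assumes that objects are distinct hashable labels, that ages are counted from the newest object (age 0), and that a pair's age is the larger of its two objects' ages. Under these assumptions, each arrival yields N - 1 new pairs once the window is full.

```python
from collections import deque
from itertools import combinations


class CountWindow:
    """Hypothetical count-based window over the N most recent objects.

    Ages follow the slide: the newest object has age 0; objects are assumed
    to be distinct hashable labels.
    """

    def __init__(self, N: int):
        self.N = N
        self.objects = deque(maxlen=N)  # left end = oldest, right end = newest

    def insert(self, obj):
        """Append obj (evicting the oldest object if full) and return its new pairs."""
        self.objects.append(obj)
        return [(other, obj) for other in list(self.objects)[:-1]]

    def age(self, obj) -> int:
        return len(self.objects) - 1 - self.objects.index(obj)

    def pair_age(self, a, b) -> int:
        # A pair is as old as its older object and expires together with it.
        return max(self.age(a), self.age(b))

    def all_pairs(self):
        return list(combinations(self.objects, 2))


w = CountWindow(N=5)
for label in ["o0", "o1", "o2", "o3", "o4", "o5"]:
    new_pairs = w.insert(label)
print(len(new_pairs), len(w.all_pairs()))  # 4 10  (N - 1 and N(N-1)/2)
print(w.age("o5"), w.pair_age("o2", "o4"))  # 0 3
```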

  5. Contributions
     • Unified framework.
     • First to study top-k pairs queries over sliding windows.
     • Supports arbitrarily complex scoring functions.
     • Supports efficient queries for any window size n ≤ N and any k ≤ K.

  6. Preliminaries
     • Map all pairs to an age–score space: pair p1(o0, o1) becomes the point (p1.age, p1.score) = (1, 3).
     [Figure: pairs p1–p10 formed from objects o0–o4, plotted with Age (1 to 4) on the horizontal axis and Score on the vertical axis, labelled "Top-2 pairs".]
     • The K-skyband [Papadias et al., TODS’05] keeps the minimal set of candidate results: a pair belongs to it if fewer than K pairs dominate it. For example, p2 dominates p5 because p2.score < p5.score and p2 expires no earlier than p5.
     • Expected size of the skyband: O(K log(N/K)).
     • Task 1: how to efficiently maintain the K-skyband.
       - Naive: O(N |SKB|) per new object, checking each of its N - 1 new pairs against the whole skyband.
       - Ours: O(N log|SKB|).
     • Task 2: how to use the K-skyband to efficiently obtain the top-k pairs for any sliding window of size n ≤ N.
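
Both tasks can be stated precisely with a brute-force K-skyband. This is a minimal sketch, assuming each pair is reduced to an (age, score, label) tuple and that dominance means a strictly smaller score together with an age no larger than the other pair's. The quadratic k_skyband helper is only a reference and is the kind of check Task 1 is meant to speed up. The top_k_in_window function answers Task 2 with a linear scan of the skyband. All names are hypothetical.

```python
import heapq
from typing import List, Tuple

Pair = Tuple[int, float, str]  # (age, score, label); age = age of the pair's older object


def dominates(p: Pair, q: Pair) -> bool:
    # p has a strictly smaller score and will not expire before q.
    return p[1] < q[1] and p[0] <= q[0]


def k_skyband(pairs: List[Pair], K: int) -> List[Pair]:
    # Reference O(M^2) version: keep pairs dominated by fewer than K others.
    return [q for q in pairs if sum(dominates(p, q) for p in pairs) < K]


def top_k_in_window(skyband: List[Pair], k: int, n: int) -> List[Pair]:
    # A pair is inside a window of size n iff its older object is, i.e. age < n.
    alive = (p for p in skyband if p[0] < n)
    return heapq.nsmallest(k, alive, key=lambda p: p[1])


pairs = [(1, 3.0, "a"), (2, 1.0, "b"), (3, 2.0, "c"), (4, 4.0, "d")]
sky = k_skyband(pairs, K=1)
print([p[2] for p in sky])                   # ['a', 'b']
print(top_k_in_window(sky, k=1, n=2)[0][2])  # a
print(top_k_in_window(sky, k=1, n=5)[0][2])  # b
```

The scan is correct for any k ≤ K and n ≤ N. A pair missing from the skyband has at least K dominators, none of them older than it, so all of them are inside any window that contains it. It therefore can never be among the top-k.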

  7. Efficient Skyband Maintenance
     • Can we find a boundary between the skyband points and the non-skyband points? The K-staircase.
     • How can we efficiently compute the K-staircase and the K-skyband?
       - Check whether each new pair is dominated by the K-skyband in O(log |SKB|) time by binary search.
       - Update the K-staircase and the K-skyband in O(|SKB| log K) time.
     [Figure: a 2-skyband of pairs p1–p7 in the age–score space, with the 2-staircase (steps s1, s2) separating skyband from non-skyband points.]
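
The slide names the boundary but not its exact construction. This is a minimal sketch that assumes the K-staircase is the step function mapping an age a to the K-th smallest score among skyband pairs of age ≤ a. Under that assumption, a new pair (age, score) is dominated by at least K skyband pairs exactly when its score lies above the step at its age, which takes one binary search. Checking only skyband pairs is enough: a dominator outside the skyband is itself dominated by at least K pairs, and those pairs also dominate the new pair. The names build_staircase and is_dominated are hypothetical. The sketch rebuilds the staircase from scratch rather than performing the O(|SKB| log K) incremental update on the slide.

```python
import bisect
import heapq
import math
from typing import List, Tuple

Pair = Tuple[int, float]  # (age, score)


def build_staircase(skyband: List[Pair], K: int) -> Tuple[List[int], List[float]]:
    """For each distinct age a, record the K-th smallest score among pairs with
    age <= a (math.inf while fewer than K exist). Thresholds never increase."""
    ages: List[int] = []
    thresholds: List[float] = []
    k_smallest: List[float] = []  # negated scores: a max-heap of the K smallest so far
    for age, score in sorted(skyband):
        if len(k_smallest) < K:
            heapq.heappush(k_smallest, -score)
        elif score < -k_smallest[0]:
            heapq.heapreplace(k_smallest, -score)
        t = -k_smallest[0] if len(k_smallest) == K else math.inf
        if ages and ages[-1] == age:
            thresholds[-1] = t  # same age: keep the value after all its pairs
        else:
            ages.append(age)
            thresholds.append(t)
    return ages, thresholds


def is_dominated(ages: List[int], thresholds: List[float], age: int, score: float) -> bool:
    """True iff at least K pairs have age <= `age` and a score below `score`."""
    i = bisect.bisect_right(ages, age) - 1  # last step at or before this age
    return i >= 0 and thresholds[i] < score


ages, th = build_staircase([(1, 3.0), (2, 1.0), (4, 2.0)], K=2)
print(ages, th)                             # [1, 2, 4] [inf, 3.0, 2.0]
print(is_dominated(ages, th, 3, 3.5))       # True
print(is_dominated(ages, th, 1, 9.0))       # False
```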

  8. Efficient Query Answering
     • Can we do better for any sliding window size n < N?
     • Use a priority search tree to index the skyband points.
       - Self-balancing tree.
       - Efficient 3-sided range queries.
     [Figure: 2-skyband points p1–p8 in the age–score space, queried for window size N and for an arbitrary window size n < N, next to the priority search tree that indexes them.]
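
A priority search tree combines a min-heap on one coordinate with a search tree on the other. The sketch below is a static simplification under stated assumptions, not the self-balancing structure used in the paper. It is built once from a list of (age, score, label) points. Each node keeps the smallest-score point of its subtree, the remaining points are split in half by age, and every node records the age range of its subtree. A 3-sided query (a1 ≤ age ≤ a2, score ≤ s) then prunes any subtree whose best score exceeds s or whose age range misses [a1, a2]. The names are hypothetical. Supporting insertions and expirations would require a balanced dynamic variant.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[int, float, str]  # (age, score, label)


@dataclass
class Node:
    point: Point             # smallest-score point in this subtree (heap order on score)
    age_lo: int              # smallest age stored in this subtree
    age_hi: int              # largest age stored in this subtree
    left: Optional["Node"]
    right: Optional["Node"]


def build_pst(points: List[Point]) -> Optional[Node]:
    return _build(sorted(points))  # sort by age once; slices stay sorted


def _build(pts: List[Point]) -> Optional[Node]:
    if not pts:
        return None
    i = min(range(len(pts)), key=lambda j: pts[j][1])  # min-score point becomes the root
    rest = pts[:i] + pts[i + 1:]
    mid = len(rest) // 2                                # balanced split on age
    return Node(pts[i], pts[0][0], pts[-1][0], _build(rest[:mid]), _build(rest[mid:]))


def three_sided(node: Optional[Node], a1: int, a2: int, s: float,
                out: List[Point]) -> List[Point]:
    """Report every point with a1 <= age <= a2 and score <= s."""
    if node is None or node.point[1] > s or node.age_hi < a1 or node.age_lo > a2:
        return out  # heap order or disjoint age range: skip the whole subtree
    if a1 <= node.point[0] <= a2:
        out.append(node.point)
    three_sided(node.left, a1, a2, s, out)
    three_sided(node.right, a1, a2, s, out)
    return out


root = build_pst([(1, 3.0, "a"), (2, 1.0, "b"), (3, 2.5, "c"), (5, 0.5, "d")])
print(three_sided(root, 1, 3, 2.6, []))  # [(2, 1.0, 'b'), (3, 2.5, 'c')]
```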

  9. Efficient Query Answering
     • Our contribution: retrieve the top-k pairs in a 1-sided range (the pairs whose age fits within a window of size n).
     • An algorithm similar to a post-order traversal costs O(log|SKB| + k).
     [Figure: the 2-skyband points p1–p8 and their priority search tree, queried for an arbitrary window size n < N.]
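
The slide's O(log|SKB| + k) algorithm is not spelled out here. As a stand-in, the sketch below uses a different, well-known technique: a best-first search over a static priority search tree (heap on score, balanced split on age), keyed by each node's smallest score. It expands only subtrees that contain some age < n, so it reports pairs alive in a window of size n in non-decreasing score order. The binary heap adds a logarithmic factor compared with the bound on the slide. The tuple-based tree layout and the names build_pst and top_k_one_sided are hypothetical.

```python
import heapq
from itertools import count
from typing import List, Optional, Tuple

Point = Tuple[int, float, str]  # (age, score, label)
# Node = (point, age_lo, age_hi, left, right); point has the smallest score in its subtree.
Node = Tuple[Point, int, int, Optional[tuple], Optional[tuple]]


def build_pst(points: List[Point]) -> Optional[Node]:
    def build(pts: List[Point]) -> Optional[Node]:
        if not pts:
            return None
        i = min(range(len(pts)), key=lambda j: pts[j][1])
        rest = pts[:i] + pts[i + 1:]  # still sorted by age
        mid = len(rest) // 2
        return (pts[i], pts[0][0], pts[-1][0], build(rest[:mid]), build(rest[mid:]))
    return build(sorted(points))


def top_k_one_sided(root: Optional[Node], n: int, k: int) -> List[Point]:
    """k smallest-score points with age < n, best-first over the heap-ordered tree."""
    result: List[Point] = []
    tie = count()  # tie-breaker so heapq never compares node tuples
    heap = [] if root is None or root[1] >= n else [(root[0][1], next(tie), root)]
    while heap and len(result) < k:
        _, _, (point, _, _, left, right) = heapq.heappop(heap)
        if point[0] < n:
            result.append(point)  # no unvisited alive point can have a smaller score
        for child in (left, right):
            if child is not None and child[1] < n:  # subtree holds some age < n
                heapq.heappush(heap, (child[0][1], next(tie), child))
    return result


root = build_pst([(1, 3.0, "a"), (2, 1.0, "b"), (3, 2.5, "c"), (5, 0.5, "d")])
print(top_k_one_sided(root, n=4, k=2))  # [(2, 1.0, 'b'), (3, 2.5, 'c')]
```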

  10. What else is in the paper?
     • Efficient continuous queries on the skyband.
       - Continuously monitor the top-k results for any fixed k (k ≤ K) and n (n ≤ N).
       - Amortized O((k/n)(log|SKB| + k)) time per update.
     • Optimization for monotonic scoring functions.
       - Handles k-closest pairs and k-furthest pairs queries.
       - Applies the Threshold Algorithm on sorted lists.
       - Reduces the number of pairs considered for each new object from N to $(d+1)\,N^{d/(d+1)}K^{1/(d+1)}$ (d = number of attributes).
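
Plugging illustrative numbers into the bound shows how much the optimization can save. Assume d is the number of attributes (as on the last experiments slide), N = 10^6 and K = 100. The bound then gives about 2×10^4 pairs for d = 1, about 1.4×10^5 for d = 2 and about 4×10^5 for d = 3, against N = 10^6 for the unoptimized method. These values are chosen for illustration and are not taken from the paper's experiments. The helper name ta_candidate_bound is hypothetical.

```python
def ta_candidate_bound(N: int, K: int, d: int) -> float:
    """(d+1) * N^(d/(d+1)) * K^(1/(d+1)): pairs considered per new object
    with the optimization for monotonic scoring functions (d assumed to be
    the number of attributes)."""
    return (d + 1) * N ** (d / (d + 1)) * K ** (1 / (d + 1))


N, K = 1_000_000, 100  # illustrative values only
for d in (1, 2, 3):
    bound = ta_candidate_bound(N, K, d)
    print(f"d={d}: about {bound:,.0f} pairs vs N = {N:,}")
```

Since the ratio of the bound to N is (d+1)(K/N)^{1/(d+1)}, the saving shrinks as d grows and disappears once that ratio exceeds 1.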

  11. Experimental Settings
     • Real dataset:
       - Sensor data from the Intel Research lab.
       - 2.3 million records.
     • Synthetic data:
       - Uniform, correlated and anti-correlated distributions.
       - 2 million objects.
       - Closest and furthest pairs in Manhattan distance.

  12. Experiments (Overall Cost on Real Data)
     • SCase: our algorithm using the K-staircase to maintain the skyband.
     • Naïve: maintains kN pairs and sorts them by score.
     • LB: the lower-bound cost.
     [Figures: cost when varying N (in thousands) and when varying K.]

  13. Experiments (Query Answering)
     • Linear: scans the skyband points to find the top-k pairs.
     • Snapshot: our snapshot query algorithm.
     • Continuous: our continuous query algorithm.
     • LB: an algorithm that obtains the top-k results in O(k) time.
     [Figures: cost when varying |Q| (in thousands) and when varying K.]

  14. Conclusion
     • We are the first to study a broad class of top-k pairs queries over sliding windows.
     • We present efficient algorithms and show that their performance is reasonably close to the lower-bound cost.
     • We provide extensive experimental results on both real and synthetic datasets to show the efficiency and scalability of the proposed algorithms.

  15. Questions and Answers
     Thank you! Any questions?

  16. Related Work
     • Top-k query processing: Fagin’s Algorithm (FA), Threshold Algorithm (TA), No Random Access (NRA).
     • Top-k pairs query processing: k-closest pairs queries, k-furthest pairs queries, top-k pairs queries [Cheema et al., ICDE’11].
     • Data stream processing: top-k query processing over data streams [Mouratidis et al., SIGMOD’06], k-nearest neighbour queries [Böhm et al., ICDE’07].

  17. Experiments (Skyband Maintenance Algorithm)
     • Basic: the maintenance algorithm without the K-staircase.
     • SCase: our algorithm using the K-staircase to maintain the skyband.
     • TA: the optimized algorithm for monotonic scoring functions.
     • LB: the lower-bound cost.
     [Figures: cost when varying K and when varying the number of attributes.]
