QED : An Efficient Framework for Temporal Region Query Processing

Yi-Hong Chu (朱怡虹), Network Database Laboratory, Dept. of Electrical Engineering, National Taiwan University


Presentation Transcript


  1. QED: An Efficient Framework for Temporal Region Query Processing. Yi-Hong Chu (朱怡虹), Network Database Laboratory, Dept. of Electrical Engineering, National Taiwan University

  2. Introduction
     • Dense Region Query
       • Data records are viewed as points in the d-dimensional data space spanned by the d attributes.
       • The goal is to locate regions whose density is higher than that of their surroundings.
     [Figure: scatter plot of Salary (*1000) against Age, with a dense region marked]

  3. Grid-based Approach
     • The data space is divided into non-overlapping rectangular cells (a grid).
     • Density of a cell: the fraction of all data points that fall in the cell.
     [Figure: grid over Salary (*1000) against Age (0 to 100), labeling a dense cell and a dense region formed by maximal connected dense cells]
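
The grid-based definitions above can be sketched in a few lines of Python. This is an illustration only, not the authors' implementation: the function names, the 2-D `(age, salary)` layout, the toy records, and the choice of edge adjacency for "connected" cells are all assumptions.

```python
from collections import Counter, deque

def cell_densities(points, cell_size):
    """Map each non-empty grid cell to the fraction of all points it contains."""
    counts = Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)
    total = len(points)
    return {cell: n / total for cell, n in counts.items()}

def dense_regions(densities, threshold):
    """Group dense cells into maximal sets of edge-connected cells (assumed adjacency)."""
    dense = {c for c, d in densities.items() if d >= threshold}
    regions, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        region, queue = [], deque([start])
        while queue:  # breadth-first search over neighbouring dense cells
            cx, cy = queue.popleft()
            region.append((cx, cy))
            for nb in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        regions.append(sorted(region))
    return regions

# Toy (age, salary) records; the values are made up for illustration.
points = [(31, 42), (33, 45), (38, 41), (42, 47), (44, 43), (75, 90), (12, 15), (35, 48)]
densities = cell_densities(points, cell_size=10)
regions = dense_regions(densities, threshold=0.25)
```

With these toy records, the two neighbouring cells around ages 30 to 49 hold 6 of the 8 points and form a single dense region, while the two isolated points fall below the threshold.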

  4. Motivation
     • Previous research tends to ignore the time dimension of the data and executes queries over the entire database.
     • However, different dense regions may be discovered when different time periods are considered, because the density of a cell depends on which records are counted.
     • Discovering dense regions over different time intervals is crucial for users to find interesting patterns hidden in the data.

  5. Example
     • Some dense regions exist only in certain time intervals and will not be discovered if all data records are taken into account.
     [Figure: middle-aged people example. <A>: the number of customers in different time slots; <B>: the number of middle-aged people in different time slots]

  6. Temporal Dense Region Query
     • Dense region discovery in constrained time intervals, e.g., each Sunday in May.
     • Time slots:
       • Derived by segmenting the data points with a time granularity, e.g., hour, week, or month.
       • Let users specify a variety of time periods of interest.
     • Problem definition:
       • Given a set of time slots and a density threshold ρ, find the dense regions in the queried time slots.
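
A minimal sketch of the problem definition, assuming a day-level time granularity and plain per-slot cell counts (not yet the RF-tree summary). The names `build_slot_counts` and `dense_cells` and the toy records are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date

def build_slot_counts(records, cell_size):
    """Segment (day, x, y) records into one cell-count table per time slot."""
    slots = defaultdict(Counter)
    for day, x, y in records:
        cell = (int(x // cell_size), int(y // cell_size))
        slots[day][cell] += 1
    return slots

def dense_cells(slots, queried, rho):
    """Cells whose density, computed over the queried slots only, is at least rho."""
    combined = Counter()
    for slot in queried:
        combined.update(slots.get(slot, Counter()))
    total = sum(combined.values())
    if total == 0:
        return set()
    return {cell for cell, n in combined.items() if n / total >= rho}

# Made-up records: one cell dominates on the Sunday, another on the Monday.
sunday, monday = date(2005, 5, 1), date(2005, 5, 2)
records = ([(sunday, 45, 55)] * 3 + [(sunday, 25, 25)]
           + [(monday, 45, 55)] + [(monday, 25, 25)] * 5)
slots = build_slot_counts(records, cell_size=10)

sunday_only = dense_cells(slots, [sunday], rho=0.5)
whole_database = dense_cells(slots, [sunday, monday], rho=0.5)
```

Here cell `(4, 5)` is dense when only the Sunday slot is queried but not over both days, which is the effect described in the motivation.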

  7. QED Framework
     • Challenge
       • The queried time intervals are unknown in advance.
     • QED (Querying tEmporal Dense region)
       • Offline maintaining phase
         • Construct a summarized data structure, the RF-tree, for each time slot.
       • Online query processing phase
         • Answer various user queries based on the RF-trees.

  8. [Figure: QED framework. A temporal dense region query over time slots W1, W2, W3; labels: offline maintaining phase, online query processing phase, combine, query result]

  9. Offline Maintaining Phase: Construct the RF-trees
     • Basic idea:
       • A number of cells having nearly the same density value can be summarized by their average density value.
     • Uniform region
       • A region in which all cells have nearly the same density value.

  10. Uniform Region
      • Entropy-based approach
        • Entropy of a region
        • Maximum entropy of a region
        • Uniform region
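
The formulas on this slide did not survive in the transcript. The sketch below shows one common way an entropy-based uniformity test can be set up, and it is an assumption rather than the authors' definition: the entropy is the Shannon entropy of the point distribution over the region's cells, the maximum entropy is the logarithm of the number of cells, and a region counts as uniform when their ratio reaches a hypothetical threshold `delta`.

```python
import math

def region_entropy(cell_counts):
    """Shannon entropy (in bits) of how the region's points spread over its cells."""
    total = sum(cell_counts)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in cell_counts if n > 0)

def max_entropy(num_cells):
    """Entropy reached when every cell holds the same number of points."""
    return math.log2(num_cells) if num_cells > 1 else 0.0

def is_uniform(cell_counts, delta=0.95):
    """Assumed test: normalized entropy >= delta. Empty or single-cell regions pass."""
    if sum(cell_counts) == 0 or len(cell_counts) == 1:
        return True
    return region_entropy(cell_counts) / max_entropy(len(cell_counts)) >= delta

even = [10, 10, 10, 10]    # all cells equally populated
skewed = [37, 1, 1, 1]     # one cell dominates
```

Under these assumptions `even` reaches the maximum entropy of 2 bits and is uniform, while `skewed` has a normalized entropy of roughly 0.25 and is not.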

  11. Example (Uniform Region) [Figure: two cases, Case 1 and Case 2, each showing Region A]

  12. Construct the RF-tree
      • Recursively partition the data space to find the uniform regions.
      • Each leaf node is one of two cases:
        • A cell
        • A uniform region
      • RF (Region Feature):
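
The recursive construction can be sketched as follows. Several details are assumptions, since the slide does not preserve them: the space is a square grid whose side is a power of two, each split produces four quadrants, the uniformity test is a normalized-entropy threshold `delta`, and the stored region feature is reduced to a point count plus the block's position and size.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    x0: int          # left cell index of the square block
    y0: int          # top cell index of the square block
    size: int        # side length of the block, in cells
    count: int       # number of points in the block (stand-in for the RF)
    children: list = field(default_factory=list)  # empty for leaf nodes

def is_uniform(counts, delta):
    """Assumed test: normalized Shannon entropy of the cell counts >= delta."""
    total = sum(counts)
    if total == 0 or len(counts) == 1:
        return True
    h = -sum((n / total) * math.log2(n / total) for n in counts if n > 0)
    return h / math.log2(len(counts)) >= delta

def build(grid, x0=0, y0=0, size=None, delta=0.95):
    """Split a block into quadrants until it is a single cell or a uniform region."""
    size = len(grid) if size is None else size
    counts = [grid[y][x] for y in range(y0, y0 + size) for x in range(x0, x0 + size)]
    node = Node(x0, y0, size, sum(counts))
    if size > 1 and not is_uniform(counts, delta):
        half = size // 2
        node.children = [build(grid, x0 + dx, y0 + dy, half, delta)
                         for dy in (0, half) for dx in (0, half)]
    return node

def leaves(node):
    """Yield leaf nodes in depth-first order."""
    if not node.children:
        yield node
    else:
        for child in node.children:
            yield from leaves(child)

# Made-up 4x4 grid of per-cell point counts for one time slot.
grid = [
    [5, 5, 0, 0],
    [5, 5, 0, 0],
    [1, 9, 0, 0],
    [9, 1, 0, 0],
]
tree = build(grid)
leaf_summary = [(n.size, n.count) for n in leaves(tree)]
```

In this toy grid the top-left quadrant and the two empty quadrants become uniform-region leaves, and only the skewed bottom-left quadrant is split down to individual cells, so 16 cells are summarized by 7 leaves.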

  13. Online Query Processing Phase
      • Step 1: Combine the RF-trees of the queried time slots.
      • Step 2: Execute the query on the combined RF-tree.

  14. Step 1: Combine the RF-trees
      • Three cases for combining the corresponding regions in two RF-trees:
        • Case 1: Both are uniform regions
        • Case 2: Both are non-uniform regions
        • Case 3: Only one is a uniform region
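
The slide names the three cases but the transcript does not preserve how each is handled, so the handling below is a guess made for illustration. It assumes quadrant-based trees over the same grid, adds counts when both regions are leaves, recurses pairwise when both are split, and spreads a uniform region's count evenly over four quadrants when only the other tree is split. `Node`, `split_evenly`, and `combine` are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    size: int       # side length of the block, in cells
    count: float    # number of points (fractional after an even split)
    children: list = field(default_factory=list)  # empty for a cell or uniform region

def split_evenly(leaf):
    """Assumed handling of a uniform region: each quadrant gets a quarter of the count."""
    return [Node(leaf.size // 2, leaf.count / 4) for _ in range(4)]

def combine(a, b):
    """Merge the corresponding regions of two trees built over the same grid."""
    assert a.size == b.size, "regions must cover the same block"
    if not a.children and not b.children:
        # Case 1: both are leaves, so their counts simply add up.
        return Node(a.size, a.count + b.count)
    # Case 3: exactly one side is a leaf; expand it to match the other side.
    kids_a = a.children or split_evenly(a)
    kids_b = b.children or split_evenly(b)
    # Case 2 (and case 3 after expansion): recurse quadrant by quadrant.
    merged = [combine(x, y) for x, y in zip(kids_a, kids_b)]
    return Node(a.size, a.count + b.count, merged)

# Made-up example: slot W1 is uniform over a 2x2 block, slot W2 is not.
w1 = Node(2, 8)
w2 = Node(2, 12, [Node(1, 9), Node(1, 1), Node(1, 1), Node(1, 1)])
merged = combine(w1, w2)
```

Treating the sum of two uniform regions as uniform is at least plausible under an entropy-based test, because the combined distribution is a mixture of the two and Shannon entropy is concave. In case 3 the sketch conservatively keeps the finer partition.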

  15. Step 2: Execute the query
      • All leaf nodes in the combined RF-tree are examined to discover the dense cells in the data space.
      • Each leaf node is one of two cases:
        • A cell
        • A uniform region: compare the average density with the density threshold ρ.
      • The leaf nodes containing dense cells are put into a queue for further dense region discovery.
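
The leaf scan described above might look like the following sketch. It assumes that each leaf stores its block position, size, and point count, and that the average density of a uniform region is its share of all points divided by its number of cells. The `Node` layout and the example tree are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Node:
    x0: int
    y0: int
    size: int       # side length of the block, in cells
    count: float    # number of points in the block
    children: list = field(default_factory=list)

def dense_cells(root, rho):
    """Scan all leaves and return the cells whose (average) density is at least rho."""
    total = root.count
    queue = deque()          # leaves that contain dense cells
    stack = [root]
    while stack:
        node = stack.pop()
        if node.children:
            stack.extend(node.children)
        elif total > 0 and node.count / (node.size ** 2 * total) >= rho:
            # For a single cell size == 1, so this is just its own density.
            queue.append(node)
    # Expand the queued leaves into cell coordinates for later region discovery.
    return {(x, y)
            for leaf in queue
            for x in range(leaf.x0, leaf.x0 + leaf.size)
            for y in range(leaf.y0, leaf.y0 + leaf.size)}

# Made-up combined tree over a 4x4 grid holding 40 points.
root = Node(0, 0, 4, 40, [
    Node(0, 0, 2, 20),                        # uniform region: 5 points per cell
    Node(2, 0, 2, 0),                         # empty uniform region
    Node(0, 2, 2, 20, [Node(0, 2, 1, 1), Node(1, 2, 1, 9),
                       Node(0, 3, 1, 9), Node(1, 3, 1, 1)]),
    Node(2, 2, 2, 0),                         # empty uniform region
])
dense = dense_cells(root, rho=0.1)
```

The uniform top-left region is accepted with a single comparison (average density 0.125) instead of four, which is where the summary saves work. Grouping the resulting cells into connected dense regions would follow as a separate step.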

  16. Conclusion
      • The problem of temporal dense region query is explored to discover dense regions in the queried time slots.
      • We propose the QED framework to execute temporal dense region queries.
      • QED efficiently supports queries with different density thresholds and time slots by using the concept of time slots and the proposed RF-tree.

  17. References
      • Yi-Hong Chu, Kun-Ta Chuang, and Ming-Syan Chen. QED: An Efficient Framework for Temporal Dense Region Processing. In Proc. of PAKDD, 2005.
      • W. Wang, J. Yang, and R. Muntz. STING: A Statistical Information Grid Approach to Spatial Data Mining. In Proc. of VLDB, 1997.
      • D.-S. Cho, B.-H. Hong, and J. Max. Efficient Region Query Processing by Optimal Page Ordering. In Proc. of ADBIS-DASFAA, 2000.

  18. Thank You~ Q & A
