This paper presents the P+-tree, an advanced indexing structure improving the Pyramid technique for multidimensional point data queries. Addressing limitations of existing methods, the P+-tree supports both window and k-nearest neighbor (kNN) queries effectively across various workloads. The proposed structure enhances performance by dividing data into subspaces based on clustering, applying transformations to optimize data retrieval. Experimental results demonstrate its efficacy in managing different query types, making it suitable for applications in low-dimensional (GIS, medical imaging) and high-dimensional (image and video databases) contexts.
Making the Pyramid Technique Robust to Query Types and Workloads Rui Zhang, Beng Chin Ooi, Kian-Lee Tan Department of Computer Science National University of Singapore Singapore
Outline • Background • Existing work and limitations • Our proposal: The P+-tree • Experimental results • Conclusion
Problem & Motivation Problem: Indexing multidimensional point data Applications: • Low dimension: GIS, CAD, Medical image (X-rays, MRI brain scans) • High dimension: Image database, Video database, data warehouse
Typical Query Types • Point Query • Window Query: $[q_{0,\min}, q_{0,\max}], [q_{1,\min}, q_{1,\max}], \ldots, [q_{d-1,\min}, q_{d-1,\max}]$ • Range Query: center $X(x_0, x_1, \ldots, x_{d-1})$, radius $r$ • K-Nearest Neighbor Query (kNN query): query point $X(x_0, x_1, \ldots, x_{d-1})$, $k$
Existing work: Four Strategies • Data partitioning: R-tree family • Space partitioning: k-d-tree family • Dimensionality Reduction: mapping • Data Compression: VA-file, IQ-tree
Existing work: Comparison • Low-dimensional space: the R-tree family • High-dimensional space: • Window query: the Pyramid technique, iMinMax • kNN query: the IQ-tree, iDistance
Existing work: Limitations • Limited to specific query types • The Pyramid technique, iMinMax: window queries only • iDistance, the IQ-tree: kNN queries only • Limited to certain workloads • The Pyramid technique: efficient for hyper-cube-shaped window queries located around the center of the data space
Our proposal: the P+-tree • Based on the Pyramid tech. • Support both window and kNN queries • Robust under different workloads
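Review of the Pyramid Tech. • $i$: pyramid number, $0 \le i < 2d$ • $h_v$: height of point $v$ in its pyramid, measured in dimension $i$ (if $i < d$) or $i - d$ (if $i \ge d$), i.e. $h_v = |0.5 - v_{i \bmod d}|$ • Pyramid value: $pv_v = i + h_v$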
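A minimal sketch of the pyramid value computation reviewed above. The rule for choosing the pyramid number (take the dimension in which the point deviates most from 0.5, then add d if the point lies at or above the center in that dimension) follows the original Pyramid technique and is not spelled out on the slide; the function name `pyramid_value` is illustrative.

```python
def pyramid_value(v):
    """Return pv_v = i + h_v for a point v in the unit hyper-cube [0, 1]^d."""
    d = len(v)
    # Dimension in which v deviates most from the center 0.5.
    j_max = max(range(d), key=lambda j: abs(0.5 - v[j]))
    # Pyramids 0..d-1 lie below the center in dimension j_max,
    # pyramids d..2d-1 lie at or above it.
    i = j_max if v[j_max] < 0.5 else j_max + d
    # Height: distance from the center, measured in dimension i mod d.
    h = abs(0.5 - v[j_max])
    return i + h
```

Since the height is at most 0.5, every pyramid value falls in [0, 2d), and the integer part identifies the pyramid.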
The P+-tree • Divide the data space into subspaces • Based on clustering • Divide in the dimension where the two cluster centers differ most • Transform the points in each subspace • Map each subspace to the unit hyper-cube, $[s_{0,\min}, s_{0,\max}] \times \cdots \times [s_{d-1,\min}, s_{d-1,\max}] \to [0, 1]^d$, so that the Pyramid technique can be applied • Move the cluster center to the center of the transformed space, (0.5, 0.5, …, 0.5), the case in which the Pyramid technique is efficient
Transformation function • A set of d functions, $t_0, t_1, \ldots, t_{d-1}$ • Requirements: • $t_i$ is a bijection from $[s_{i,\min}, s_{i,\max}]$ to $[0, 1]$ • $t_i$ is monotonic • $t_i$ maps the cluster center coordinate $c_i$ to 0.5 • In equations: • $t_i(s_{i,\min}) = 0$ • $t_i(s_{i,\max}) = 1$ • $t_i(c_i) = 0.5$
Transformation function • $t_i(x) = (a_i x - b_i)^{e_i}$, $i = 0, 1, \ldots, d-1$ • For the subspace $[s_{0,\min}, s_{0,\max}], [s_{1,\min}, s_{1,\max}], \ldots, [s_{d-1,\min}, s_{d-1,\max}]$: $a_i = 1/(s_{i,\max} - s_{i,\min})$, $b_i = s_{i,\min}/(s_{i,\max} - s_{i,\min})$, $e_i = -1/\log_2(a_i c_i - b_i)$
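The following sketch builds one such function for a single dimension and can be used to check the three boundary conditions numerically. It uses the scale factor 1/(s_max - s_min), the orientation under which t(s_min) = 0 and t(s_max) = 1 both hold, and assumes s_min < c < s_max. The name `make_transform` and the clamping of the base are illustrative additions.

```python
import math

def make_transform(s_min, s_max, c):
    """Build t(x) = (a*x - b)**e for one dimension, assuming s_min < c < s_max."""
    a = 1.0 / (s_max - s_min)
    b = s_min / (s_max - s_min)
    # Chosen so that (a*c - b)**e == 0.5; e is positive because 0 < a*c - b < 1.
    e = -1.0 / math.log2(a * c - b)

    def t(x):
        # Clamp against floating-point drift just outside [0, 1].
        base = min(1.0, max(0.0, a * x - b))
        return base ** e

    return t
```

For example, with s_min = 2, s_max = 10 and c = 4, the parameters are a = 0.125, b = 0.25 and e = 0.5, so the center 4 maps to 0.25 ** 0.5 = 0.5.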
The space-tree • SNo, $a_i$, $b_i$, $e_i$ are stored in the leaf nodes
Space division algorithm • Cluster the data • Divide the space into two subspaces in the dimension where the two cluster centers differ most (recursively) • Build the space-tree
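A sketch of a single division step, given two cluster centers that have already been computed by some clustering method. The slide does not say where along the chosen dimension the cut is placed; the midpoint between the two centers is an assumption made here for illustration, as is the name `split_subspace`.

```python
def split_subspace(bounds, center_a, center_b):
    """Split a subspace in the dimension where the two centers differ most.

    bounds: list of (s_min, s_max) per dimension.
    Returns (dimension, left_bounds, right_bounds).
    """
    d = len(bounds)
    dim = max(range(d), key=lambda j: abs(center_a[j] - center_b[j]))
    # Assumption: cut halfway between the two cluster centers.
    cut = (center_a[dim] + center_b[dim]) / 2.0
    lo, hi = bounds[dim]
    left = list(bounds)
    left[dim] = (lo, cut)
    right = list(bounds)
    right[dim] = (cut, hi)
    return dim, left, right
```

Applying this step recursively to each resulting subspace would yield the binary structure of the space-tree.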
Build the P+-tree • The P+-tree is in effect a B+-tree that stores the data points in its leaf nodes, with the P+-value as the key • P+-value: SNo · 2d + pv(v’) • For a newly inserted point v, traverse the space-tree to determine the subspace it belongs to • Transform the point v to v’ and calculate its P+-value • Insert the point v, with its P+-value as the key
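The key computation can be sketched as follows, assuming the point has already been transformed into the unit hyper-cube of its subspace. The helper `pyramid_value` repeats the standard Pyramid technique rule; both function names are illustrative.

```python
def pyramid_value(v):
    """Pyramid value i + h_v of a point v in [0, 1]^d."""
    d = len(v)
    j_max = max(range(d), key=lambda j: abs(0.5 - v[j]))
    i = j_max if v[j_max] < 0.5 else j_max + d
    return i + abs(0.5 - v[j_max])

def p_plus_value(sno, v_prime):
    """P+-value = SNo * 2d + pv(v') for a transformed point v' in subspace SNo."""
    d = len(v_prime)
    return sno * 2 * d + pyramid_value(v_prime)
```

Because every pyramid value lies in [0, 2d), the keys of subspace SNo fall in [SNo · 2d, (SNo + 1) · 2d), so different subspaces occupy disjoint key ranges in the B+-tree.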
Window search algorithm • Traverse the space-tree to see which subspaces are intersected by the query • For each intersected subspace, transform the query according to the transformation function for the subspace • Search the subspace according to the transformed query
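The first two steps can be sketched as below. For brevity the subspaces are held in a flat list rather than a space-tree, and the final per-subspace search (a Pyramid technique window search on the transformed query) is not shown. Because each transformation is monotonically increasing, transforming the two endpoints of the clipped query in each dimension yields the transformed window. All names are illustrative.

```python
import math

def transform(x, s_min, s_max, c):
    """t(x) = (a*x - b)**e for one dimension, assuming s_min < c < s_max."""
    a = 1.0 / (s_max - s_min)
    b = s_min / (s_max - s_min)
    e = -1.0 / math.log2(a * c - b)
    return min(1.0, max(0.0, a * x - b)) ** e

def transformed_queries(subspaces, query):
    """Return (sno, transformed_window) for every subspace the query intersects.

    subspaces: list of (sno, bounds, center), bounds = [(s_min, s_max), ...].
    query: list of (q_min, q_max) per dimension.
    """
    result = []
    for sno, bounds, center in subspaces:
        # Clip the query to the subspace; an empty interval means no overlap.
        clipped = [(max(q_lo, s_lo), min(q_hi, s_hi))
                   for (q_lo, q_hi), (s_lo, s_hi) in zip(query, bounds)]
        if any(lo > hi for lo, hi in clipped):
            continue
        window = [(transform(lo, s_lo, s_hi, c), transform(hi, s_lo, s_hi, c))
                  for (lo, hi), (s_lo, s_hi), c in zip(clipped, bounds, center)]
        result.append((sno, window))
    return result
```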
kNN search algorithm • Start from a small window query centered at the query point • Gradually increase the side length of the query window until the k nearest neighbors are found
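A sketch of the expanding-window idea, using a brute-force window query over a list as a stand-in for the P+-tree window search. The slide does not state the stopping test or the growth schedule; here the search stops once the k-th closest candidate lies within the half side length r of the window (so the sphere containing the k candidates is fully covered by the window), and r doubles each round. Both choices, and the names `knn`, `r0` and `growth`, are assumptions.

```python
import math

def window_query(points, lo, hi):
    """Stand-in for the index: return all points inside the window [lo, hi]."""
    return [p for p in points
            if all(l <= x <= h for x, l, h in zip(p, lo, hi))]

def knn(points, q, k, r0=0.05, growth=2.0):
    """Find the k nearest neighbors of q by enlarging a window centered at q."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    r = r0
    while True:
        lo = [x - r for x in q]
        hi = [x + r for x in q]
        candidates = sorted(window_query(points, lo, hi),
                            key=lambda p: math.dist(p, q))
        # Safe to stop once the k-th candidate is no farther than r:
        # any point outside the window is farther than r from q.
        if len(candidates) >= k and math.dist(candidates[k - 1], q) <= r:
            return candidates[:k]
        r *= growth
```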