Enhancing P+-Tree for Diverse Query Types and Workloads in Data Indexing

Making the Pyramid Technique Robust to Query Types and Workloads
Rui Zhang, Beng Chin Ooi, Kian-Lee Tan
Department of Computer Science, National University of Singapore, Singapore

Outline • Background • Existing work and limitations • Our proposal: the P+-tree • Experimental results • Conclusion

Problem & Motivation • Problem: indexing multidimensional point data • Applications: • Low dimensionality: GIS, CAD, medical imaging (X-rays, MRI brain scans) • High dimensionality: image databases, video databases, data warehouses

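Typical Query Types • Point query • Window query: $[q_0^{\min}, q_0^{\max}] \times [q_1^{\min}, q_1^{\max}] \times \dots \times [q_{d-1}^{\min}, q_{d-1}^{\max}]$ • Range query: a query point $X = (x_0, x_1, \dots, x_{d-1})$ and a radius $r$ • K-nearest-neighbor query (kNN query): a query point $X = (x_0, x_1, \dots, x_{d-1})$ and a count $k$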
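To pin down what each query type returns, here is a brute-force reference sketch in Python that scans every point. It is not an index, only a definition to check answers against. The slides do not name a distance metric, so Euclidean distance is assumed for range and kNN queries, and window bounds are taken as inclusive; the function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def point_query(data: List[Point], q: Point) -> List[Point]:
    """All stored points equal to q."""
    return [p for p in data if p == q]


def window_query(data: List[Point], qmin: Sequence[float], qmax: Sequence[float]) -> List[Point]:
    """Points with qmin[i] <= p[i] <= qmax[i] in every dimension i."""
    return [p for p in data if all(lo <= x <= hi for x, lo, hi in zip(p, qmin, qmax))]


def range_query(data: List[Point], x: Sequence[float], r: float) -> List[Point]:
    """Points within (assumed Euclidean) distance r of the query point x."""
    return [p for p in data if math.dist(p, x) <= r]


def knn_query(data: List[Point], x: Sequence[float], k: int) -> List[Point]:
    """The k points closest to x (assumed Euclidean distance)."""
    return sorted(data, key=lambda p: math.dist(p, x))[:k]
```

For example, with `data = [(0.1, 0.1), (0.5, 0.5), (0.9, 0.2)]`, `knn_query(data, (0.0, 0.0), 2)` returns the first two points.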

Existing work: Four Strategies • Data partitioning: R-tree family • Space partitioning: k-d-tree family • Dimensionality reduction: mapping points to one-dimensional keys (e.g., the Pyramid technique, iMinMax, iDistance) • Data compression: VA-file, IQ-tree

Existing work: Comparison • Low-dimensional space: the R-tree family structures • High-dimensional space: • Window query: the Pyramid technique, iMinMax • kNN query: the IQ-tree, iDistance

Existing work: Limitations • Limited to certain query types • The Pyramid technique, iMinMax: window queries • iDistance, the IQ-tree: kNN queries • Limited to certain workloads • The Pyramid technique: efficient only for hypercube-shaped window queries located around the center of the data space

Our proposal: the P+-tree • Based on the Pyramid technique • Supports both window and kNN queries • Robust under different workloads

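Review of the Pyramid Technique • $i$: pyramid number of a point $v$ • $h_v$: height of $v$, i.e., its distance from the center value 0.5 in the $i$-th dimension (if $i < d$) or the $(i-d)$-th dimension (if $i \ge d$) • Pyramid value: $pv_v = i + h_v$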
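The definitions above can be made concrete with a short Python sketch of the standard Pyramid technique mapping. The unit data space $[0,1]^d$ is split into $2d$ pyramids whose apexes meet at the center $(0.5, \dots, 0.5)$; a point belongs to the pyramid of the dimension $j$ in which it lies farthest from the center, numbered $j$ if it lies below the center in that dimension and $j + d$ otherwise. The function names are illustrative, and breaking ties toward the lowest dimension index is an assumption of this sketch.

```python
from typing import Sequence, Tuple


def pyramid_number_and_height(v: Sequence[float]) -> Tuple[int, float]:
    """Return (i, h_v) for a point v in the unit hypercube [0, 1]^d."""
    d = len(v)
    # Dimension in which v is farthest from the center 0.5 (ties: lowest index).
    j = max(range(d), key=lambda k: abs(0.5 - v[k]))
    # Pyramids 0..d-1 lie below the center, pyramids d..2d-1 above it.
    i = j if v[j] < 0.5 else j + d
    return i, abs(0.5 - v[j])


def pyramid_value(v: Sequence[float]) -> float:
    """pv_v = i + h_v; since 0 <= h_v <= 0.5, each pyramid owns its own key interval."""
    i, h = pyramid_number_and_height(v)
    return i + h
```

For instance, in two dimensions the point (0.2, 0.6) is farthest from the center in dimension 0 and lies below it, so it falls in pyramid 0 with height 0.3, giving a pyramid value of 0.3.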

Sensitivity to the location of the query window / data distribution

Sensitivity to the shape of the query window

The P+-tree • Divide the data space into subspaces • Based on clustering • Split in the dimension where the two cluster centers differ most • Transform the points in each subspace • Map each subspace to the unit hypercube, $\prod_{i=0}^{d-1} [s_i^{\min}, s_i^{\max}] \to [0, 1]^d$, so that the Pyramid technique can be applied • Move the cluster center to the center of the transformed space, $(0.5, 0.5, \dots, 0.5)$, which is the case where the Pyramid technique is efficient

Space division and data transformation

Transformation function • A set of $d$ functions $t_0, t_1, \dots, t_{d-1}$ • Requirements: • $t_i$ is a bijection from $[s_i^{\min}, s_i^{\max}]$ to $[0, 1]$ • $t_i$ is monotonic (increasing) • $t_i$ maps the cluster center coordinate $c_i$ to 0.5 • In equations: • $t_i(s_i^{\min}) = 0$ • $t_i(s_i^{\max}) = 1$ • $t_i(c_i) = 0.5$

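Transformation function • $t_i(x) = (a_i x - b_i)^{e_i}$, $i = 0, 1, \dots, d-1$ • For the subspace $[s_0^{\min}, s_0^{\max}] \times [s_1^{\min}, s_1^{\max}] \times \dots \times [s_{d-1}^{\min}, s_{d-1}^{\max}]$: $a_i = \frac{1}{s_i^{\max} - s_i^{\min}}$, $b_i = \frac{s_i^{\min}}{s_i^{\max} - s_i^{\min}}$, $e_i = -\frac{1}{\log_2(a_i c_i - b_i)}$ • The linear part $a_i x - b_i$ maps $[s_i^{\min}, s_i^{\max}]$ onto $[0, 1]$, and $e_i$ is chosen so that $(a_i c_i - b_i)^{e_i} = 0.5$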
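As a sanity check on these formulas, the following Python sketch computes $a_i$, $b_i$, $e_i$ for one dimension and applies $t_i$. It assumes $s_i^{\min} < c_i < s_i^{\max}$ strictly (otherwise the logarithm is 0 or undefined); the function names are illustrative, and the clamp to $[0, 1]$ is an addition of this sketch that only absorbs floating-point rounding at the subspace boundaries.

```python
import math
from typing import Tuple


def transform_params(s_min: float, s_max: float, c: float) -> Tuple[float, float, float]:
    """Parameters (a_i, b_i, e_i) for one dimension; requires s_min < c < s_max."""
    a = 1.0 / (s_max - s_min)
    b = s_min / (s_max - s_min)
    e = -1.0 / math.log2(a * c - b)  # chosen so that t(c) = 0.5
    return a, b, e


def transform(x: float, a: float, b: float, e: float) -> float:
    """t_i(x) = (a_i x - b_i)^(e_i) for x in [s_min, s_max]; clamping only absorbs rounding."""
    return min(1.0, max(0.0, a * x - b)) ** e
```

For a subspace $[2, 6]$ with center 3, this gives $a = 0.25$, $b = 0.5$, $e = 0.5$, so $t(2) = 0$, $t(3) = 0.5$ and $t(6) = 1$. When the center sits at the midpoint, $e_i = 1$ and $t_i$ is linear; a center closer to $s_i^{\min}$ yields $e_i < 1$, which stretches the region near $s_i^{\min}$.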

The space-tree • Each leaf node stores the subspace number SNo and the transformation parameters $a_i$, $b_i$, $e_i$ of its subspace
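A minimal Python sketch of the space-tree lookup might look as follows. The slides only state that leaves hold SNo, $a_i$, $b_i$ and $e_i$; the internal-node layout used here (a split dimension and split position, with points below the split going left) is a hypothetical choice for illustration, as are the class and function names.

```python
from dataclasses import dataclass
from typing import List, Sequence, Union


@dataclass
class Leaf:
    """Leaf of the space-tree: one subspace and its transformation parameters."""
    sno: int          # subspace number SNo
    a: List[float]    # a_i for each dimension i
    b: List[float]    # b_i for each dimension i
    e: List[float]    # e_i for each dimension i


@dataclass
class Inner:
    """Internal node (hypothetical layout): split dimension and split position."""
    dim: int
    split: float
    left: "Union[Inner, Leaf]"   # subspace with v[dim] < split
    right: "Union[Inner, Leaf]"  # subspace with v[dim] >= split


def find_subspace(node: Union[Inner, Leaf], v: Sequence[float]) -> Leaf:
    """Descend from the root to the leaf whose subspace contains point v."""
    while isinstance(node, Inner):
        node = node.left if v[node.dim] < node.split else node.right
    return node
```

Because each step discards one side of a split, a lookup costs one comparison per tree level, and the tree is small since it holds only subspaces, not data points.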

Space division algorithm • Cluster the data • Divide the space into two subspaces in the dimension where the two cluster centers differ most (recursively) • Build the space-tree
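The slides do not say which clustering algorithm is used or exactly where a subspace is cut, so the following Python sketch fills those gaps with stated assumptions: plain 2-means clustering (Lloyd's algorithm), a split halfway between the two cluster centers, and a hypothetical stopping rule based on a maximum depth or minimum subspace size. Each leaf records its bounds, the mean of its points (a stand-in for the cluster center $c_i$ used by the transformation) and a subspace number assigned left to right.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(p: Sequence[float], q: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(p, q))


def mean(points: List[Point]) -> Point:
    return tuple(sum(col) / len(points) for col in zip(*points))


def two_means(points: List[Point], iters: int = 10) -> List[Point]:
    """Plain 2-means (assumed clustering method), seeded with two far-apart points."""
    c0 = points[0]
    c1 = max(points, key=lambda p: sq_dist(p, c0))
    centers = [c0, c1]
    for _ in range(iters):
        groups: List[List[Point]] = [[], []]
        for p in points:
            groups[0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1].append(p)
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers


def divide(points: List[Point], lo: Point, hi: Point, leaves: List[Dict],
           depth: int = 0, max_depth: int = 2, min_points: int = 4) -> Dict:
    """Recursively split the box [lo, hi]; returns the space-tree as nested dicts."""
    def make_leaf() -> Dict:
        leaf = {"leaf": True, "sno": len(leaves), "lo": lo, "hi": hi,
                "center": mean(points), "points": points}
        leaves.append(leaf)
        return leaf

    if depth >= max_depth or len(points) < min_points:
        return make_leaf()
    c0, c1 = two_means(points)
    # Split in the dimension where the two cluster centers differ most.
    dim = max(range(len(lo)), key=lambda j: abs(c0[j] - c1[j]))
    split = (c0[dim] + c1[dim]) / 2.0  # assumption: cut halfway between the centers
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    if not left or not right:
        return make_leaf()
    left_hi = tuple(split if j == dim else x for j, x in enumerate(hi))
    right_lo = tuple(split if j == dim else x for j, x in enumerate(lo))
    return {"leaf": False, "dim": dim, "split": split,
            "left": divide(left, lo, left_hi, leaves, depth + 1, max_depth, min_points),
            "right": divide(right, right_lo, hi, leaves, depth + 1, max_depth, min_points)}
```

Called as `divide(points, (0.0, 0.0), (1.0, 1.0), leaves)` on data with two well-separated clusters, the first split falls between them and each leaf ends up holding one cluster.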

Build the P+-tree • The P+-tree is in effect a B+-tree that stores the data points in its leaf nodes, with the P+-value as key • P+-value: $\text{SNo} \cdot 2d + pv(v')$; since $0 \le pv(v') < 2d$, each subspace occupies its own key interval $[\text{SNo} \cdot 2d, (\text{SNo} + 1) \cdot 2d)$ • For a newly inserted point $v$, traverse the space-tree to determine the subspace it belongs to • Transform the point $v$ to $v'$ and calculate its P+-value • Insert the point $v$ with its P+-value as key
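The key computation and insertion can be sketched in Python as below. To keep the sketch short, the subspace number and transformation parameters are passed in directly rather than looked up in the space-tree, and a sorted list maintained with `bisect.insort` is a hypothetical stand-in for the B+-tree; only the ordering of keys is meant to be illustrative.

```python
import bisect
from typing import List, Sequence, Tuple


def pyramid_value(v: Sequence[float]) -> float:
    """Pyramid value pv = i + h of a point in [0, 1]^d."""
    d = len(v)
    j = max(range(d), key=lambda k: abs(0.5 - v[k]))  # dimension farthest from center
    i = j if v[j] < 0.5 else j + d                    # pyramid number
    return i + abs(0.5 - v[j])                        # plus height


def p_plus_value(v: Sequence[float], sno: int, a: Sequence[float],
                 b: Sequence[float], e: Sequence[float]) -> float:
    """P+-value = SNo * 2d + pv(v'), where v'_k = (a_k v_k - b_k)^(e_k)."""
    d = len(v)
    # The clamp only guards against tiny rounding errors at subspace boundaries.
    v_prime = [min(1.0, max(0.0, a[k] * v[k] - b[k])) ** e[k] for k in range(d)]
    return sno * 2 * d + pyramid_value(v_prime)


# Hypothetical stand-in for the B+-tree: (key, point) pairs kept in key order.
index: List[Tuple[float, Tuple[float, ...]]] = []


def insert(v: Sequence[float], sno: int, a, b, e) -> None:
    bisect.insort(index, (p_plus_value(v, sno, a, b, e), tuple(v)))
```

With the identity transformation in two dimensions, the point (0.9, 0.4) in subspace 0 gets key 2.4, and (0.2, 0.6) in subspace 1 gets key 4 + 0.3 = 4.3, so all keys of subspace 1 sort after those of subspace 0.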

Window search algorithm • Traverse the space-tree to find the subspaces intersected by the query • For each intersected subspace, transform the query with that subspace's transformation functions • Search the subspace with the transformed query
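The per-subspace steps can be sketched in Python as follows. Because every $t_k$ is increasing, the part of the query window inside a subspace maps to another box whose corners are the transformed corners. The key ranges computed afterwards are a deliberately simplified, conservative version of the Pyramid technique's range computation: they bound a point's height using only its pyramid's own dimension, so they may fetch more candidates than the original method, and every fetched point must still be checked against the untransformed window. Function names are illustrative.

```python
from typing import List, Optional, Sequence, Tuple


def t(x: float, a: float, b: float, e: float) -> float:
    """Per-dimension transformation t_k(x) = (a_k x - b_k)^(e_k), clamped to [0, 1]."""
    return min(1.0, max(0.0, a * x - b)) ** e


def transform_window(qmin, qmax, lo, hi, a, b, e) -> Optional[Tuple[List[float], List[float]]]:
    """Clip the window to the subspace [lo, hi] and map its bounds through t_k.

    Returns None if the window does not intersect the subspace.
    """
    tmin, tmax = [], []
    for k in range(len(qmin)):
        l, h = max(qmin[k], lo[k]), min(qmax[k], hi[k])
        if l > h:
            return None
        tmin.append(t(l, a[k], b[k], e[k]))
        tmax.append(t(h, a[k], b[k], e[k]))
    return tmin, tmax


def key_ranges(sno: int, tmin: Sequence[float], tmax: Sequence[float]) -> List[Tuple[float, float]]:
    """Conservative P+-value intervals to scan in one subspace (a superset of the answer)."""
    d = len(tmin)
    base = sno * 2 * d
    ranges = []
    for j in range(d):
        if tmin[j] < 0.5:   # pyramid j: points with v_j < 0.5, height 0.5 - v_j
            ranges.append((base + j + 0.5 - min(tmax[j], 0.5), base + j + 0.5 - tmin[j]))
        if tmax[j] >= 0.5:  # pyramid j + d: points with v_j >= 0.5, height v_j - 0.5
            ranges.append((base + j + d + max(tmin[j], 0.5) - 0.5, base + j + d + tmax[j] - 0.5))
    return ranges
```

A full implementation would scan each interval in the B+-tree and keep only the points whose original coordinates fall inside the query window.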

kNN search algorithm • Start with a small window query • Gradually increase the side length of the query window until the k nearest neighbors are found
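One way to make "until the k nearest neighbors are found" precise is sketched below in Python, assuming Euclidean distance and data in the unit hypercube. A window of half side length $r$ centered at the query contains every point within distance $r$ of it, so once the k-th closest candidate in the window is within distance $r$, no point outside the window can be closer. The brute-force `window_query` is a stand-in for the P+-tree window search, and the initial half side `r0` and the doubling factor `growth` are hypothetical parameters, not values from the slides.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def window_query(points: List[Point], qmin: Sequence[float], qmax: Sequence[float]) -> List[Point]:
    """Brute-force stand-in for the P+-tree window search."""
    return [p for p in points if all(l <= x <= h for x, l, h in zip(p, qmin, qmax))]


def knn(points: List[Point], q: Sequence[float], k: int,
        r0: float = 0.05, growth: float = 2.0) -> List[Point]:
    """k nearest neighbors of q (Euclidean) via windows of growing side length 2r."""
    if k <= 0:
        return []
    r = r0
    while True:
        cand = window_query(points, [x - r for x in q], [x + r for x in q])
        cand.sort(key=lambda p: math.dist(p, q))
        # Every point within distance r of q lies inside the window, so once the
        # k-th candidate is within distance r, no point outside can be closer.
        if len(cand) >= k and math.dist(cand[k - 1], q) <= r:
            return cand[:k]
        if r >= 1.0:  # the window now covers the whole unit data space
            return cand[:k]
        r *= growth
```

Growing the window geometrically keeps the number of rounds logarithmic in $1/r_0$; a smaller growth factor re-scans less data per round but needs more rounds.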

Experiments: Window Queries

Experiments: Partial Window Queries

Experiments: kNN Queries
