240 likes | 356 Vues
Techniques and Data Structures for Efficient Multimedia Similarity Search. Reading Assignment. Guojun Lu, "Multimedia Database Management Systems", Artech House, Publishers, 1999. Chapter 9: Techniques and Data Structures for Efficient Multimedia Similarity Search. Introduction.
E N D
Techniques and Data Structures for Efficient Multimedia Similarity Search
Reading Assignment • Guojun Lu, "Multimedia Database Management Systems", Artech House, Publishers, 1999. • Chapter 9: Techniques and Data Structures for Efficient Multimedia Similarity Search
Introduction • Feature space is multidimensional. • Objective: divide the multidimensional space into subspaces, so that only a few subspaces need to be searched for each query.
Query Types • Point queries • Range queries • k nearest-neighbor queries
Operations on Data Structures • Search • Most important as it is done “on-line” • Must be efficient • Insertion and Deletion • Less important as they can be performed “off-line” • Need not be efficient
Efficient Search Techniques • Filtering • B and B+-trees • Clustering • MB+-trees • k-d trees • Grid files • R trees • TV trees
Filtering • Reduce the search space by selecting specific items satisfying certain criteria. • Search on the multidimensional feature vector is carried out on the selected set of items only.
Filtering with Structured Attributes/Classification • Structured attributes associated with multimedia data. • e.g. date • Subject classification • e.g. image-content type=“Natural Scene”
Filtering using the triangle inequality • d is a distance metric and i,q, and k are feature vectors for three objects. • Most feature distance measures are “metrics” • d(i,k)=0 iff i=k • d(i,k)= d(k,i) • Triangle inequality above
Application of Triangle Inequality to Multimedia Retrieval • MDB Representation • Select m feature vectors as a comparison base F, where m << |MDB|, the size of the database • iMDB fjF, calculate d(i,fj) and store in MDB. • MDB retrieval for query q • Calculate d(q,fj) fjF. • Compute l(i) = max 1jm |d(i,fj)- d(q,fj)| • Find d(i,q) iMDB l(i) T, where T is a specified threshold.
Example • Find database items whose distance to query q < 3 where the distances between q and the 2 vectors f1 and f2 forming the base are 3 and 4, respectively
Filtering in the Color Histogram-Based Retrieval • Space reduction can take place by • Reducing the number of bins • Selecting only a subset of the database images for calculating the distances from the query image. • Space reduction is achieved through the following process: • Select potential image candidates. • Use full histogram comparison to calculate the distance between the query and the candidates.
Selecting Potential Image Candidates • Use very few bins in comparing color histograms. • Use the average color of images.
B-Trees (1) • B-Trees are always balanced. • B-Trees keep similar-valued records together on a disk page, which takes advantage of locality of reference. • B-Trees guarantee that every node in the tree will be full at least to a certain minimum percentage. This improves space efficiency while reducing the typical number of disk fetches necessary during a search or update operation.
B-Tree Definition A B-Tree of order m has these properties: • The root is either a leaf or has at least two children. • Each node, except for the root and the leaves, has between m/2 and m children. • All leaves are at the same level in the tree, so the tree is always height balanced. A B-Tree node is usually selected to match the size of a disk block. • A B-Tree node could have hundreds of children.
50 B-Tree Search Search in a B-Tree is a generalization of search in a 2-3 Tree. • Do binary search on keys in current node. If search key is found, then return record. If current node is a leaf node and key is not found, then report an unsuccessful search. • Otherwise, follow the proper branch and repeat the process. 20 30 70 10 12 13 21 23 35 37 40 55 59 60 73 80
B+-Trees The most commonly implemented form of the B-Tree is the B+-Tree. Internal nodes of the B+-Tree neither store records nor pointers to records. They only store key values to guide the search. Leaf nodes store records or pointers to records. A leaf node may store more or less records than an internal node stores keys. Q: What is the advantage of using a B+-tree over a B-tree implementation?
B+-Tree Example Search for 21?
B+-Tree Insertion Insert the following keys into an initially empty B+ tree of order 4 where each leaf node can hold a maximum of 5 records: 12 , 23 , 10 , 48 , 33 , 50 , 15 , 18 , 20 , 45 , 47 , 31 , 52 , 21 , 30 What is the cost of direct access to a B+ tree of order n with N database items?
Clustering • Similar information items are grouped together to form a cluster based on a certain similarity measurement. • Each cluster is represented by the centroid of the feature vectors of that cluster. • How is search carried out?
2D Example x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x