Techniques for Indexing and Browsing Image/Video Databases
Kien A. Hua, School of EECS, University of Central Florida
Applications of Image/Video Retrieval Systems
• Digital libraries
• Distance learning
• Electronic commerce
• Movies on demand
• Public information systems, etc.
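Image Retrieval - QBE
[Figure: query-by-example (QBE) pipeline. Database images pass through feature extraction, and their feature vectors are stored in the metadatabase. The query image passes through the same feature extraction, its feature vector is compared with the stored ones, and the selected images from the image database form the query result.]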
Noise-Free Queries (NFQs)
• An NFQ is more precise than a rectangular query.
• The user can specify semantic constraints:
  • Spatial constraints (relative distances)
  • Scaling constraints (relative sizes)
[Figure: a rectangular query and a noise-free query, with results labeled "Similar" and "Less relevant"]
Challenges
• How do we extract features if we do not know the matching areas beforehand?
• How do we index the images?
[Figure: noise-free query]
One Solution – Local Color Histogram (LCH)
• Each subimage has a color histogram.
• Any combination of the histograms can be selected for comparison.
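A minimal sketch of the LCH idea, assuming the image is already a 2-D grid of quantized color indices and that histograms are compared with an L1 distance. The function names, the grid layout, and the distance choice are illustrative assumptions; the slides do not specify them.

```python
from collections import Counter

def local_histograms(image, rows, cols):
    """Split a 2-D grid of quantized color indices into rows x cols subimages
    and return one color histogram (a Counter) per subimage."""
    height, width = len(image), len(image[0])
    hists = {}
    for r in range(rows):
        for c in range(cols):
            y0, y1 = r * height // rows, (r + 1) * height // rows
            x0, x1 = c * width // cols, (c + 1) * width // cols
            hists[(r, c)] = Counter(
                image[y][x] for y in range(y0, y1) for x in range(x0, x1)
            )
    return hists

def l1_distance(h1, h2):
    """L1 distance between two histograms (assumed metric, not from the slides)."""
    return sum(abs(h1[k] - h2[k]) for k in set(h1) | set(h2))

def lch_distance(query_hists, db_hists, selected_cells):
    """Compare only the subimages the user selected."""
    return sum(l1_distance(query_hists[cell], db_hists[cell]) for cell in selected_cells)

# Toy images: the left halves match, the right halves differ.
query_img = [[1, 1, 2, 2] for _ in range(4)]
db_img = [[1, 1, 3, 3] for _ in range(4)]
q_hists = local_histograms(query_img, 2, 2)
d_hists = local_histograms(db_img, 2, 2)

left_only = lch_distance(q_hists, d_hists, [(0, 0), (1, 0)])
all_cells = lch_distance(q_hists, d_hists, list(q_hists))
```

Selecting only the left-hand subimages ignores the mismatching right half, which is the kind of noise exclusion an NFQ needs.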
Limitations of LCH
• Dilemma:
  • Large partitions are imprecise (due to noise).
  • Small partitions are too expensive.
• Limitation:
  • Scaling is difficult to handle.
Sampling-Based Approach
Idea:
• Sample 113 16x16 blocks, each represented by its quantized mean color
• Compare only the relevant blocks
Advantages:
• Low storage overhead
• Supports NFQs
• Robust to translation and scaling
• Supports spatial and scaling constraints
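A minimal sketch of how one sampled 16x16 block could be reduced to a single quantized mean color. The 3-3-2 bit RGB packing is an assumption chosen to fit one byte per block; the slides do not say which quantization is used, and the function names are hypothetical.

```python
def mean_color(block):
    """Average the (r, g, b) pixels of one block, given as a list of pixel rows."""
    pixels = [px for row in block for px in row]
    n = len(pixels)
    return tuple(sum(px[ch] for px in pixels) // n for ch in range(3))

def quantize_332(rgb):
    """Pack an 8-bit-per-channel color into one byte: 3 bits R, 3 bits G, 2 bits B.
    This packing is an assumed scheme, not one taken from the slides."""
    r, g, b = rgb
    return ((r >> 5) << 5) | ((g >> 5) << 2) | (b >> 6)

# Two toy 16x16 blocks: one uniform, one split into two colors.
white_block = [[(255, 255, 255)] * 16 for _ in range(16)]
mixed_block = [[(200, 100, 50)] * 8 + [(100, 200, 150)] * 8 for _ in range(16)]

white_code = quantize_332(mean_color(white_block))
mixed_mean = mean_color(mixed_block)
mixed_code = quantize_332(mixed_mean)
```

With one byte per block, an image sampled at 113 blocks needs 113 bytes of color data.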
Handling Scaling
Sampling the query at 3 different rates:
• A fixed sampling rate for all database images (a)
• A higher rate to find larger matching objects (b)
• The same rate to find matching objects of the same size (c)
• A lower rate to find smaller matching objects (d)
[Figure: a database image and Query 1, Query 2, and Query 3]
Handling Scaling and Translation
• We slide square windows of sizes 25, 41, 61, 85, and 113 sampled blocks over each database image.
• 85 indexing subimages are captured at the various sliding positions.
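The slides do not state how the 113 samples are laid out. One layout that reproduces every number above, offered here purely as an assumption, is a staggered lattice: an 8 x 8 grid of samples plus the 7 x 7 grid of points between them (64 + 49 = 113). Square windows anchored at a sample point then enclose n² + (n-1)² samples for n = 4..8, and there are 85 window positions in total. The sketch below checks that arithmetic; the function names are hypothetical.

```python
def staggered_lattice(n=8):
    """Sample positions in half-step units: an n x n grid plus the
    (n-1) x (n-1) grid of points between them (assumed layout)."""
    primary = {(2 * a, 2 * b) for a in range(n) for b in range(n)}
    centers = {(2 * a + 1, 2 * b + 1) for a in range(n - 1) for b in range(n - 1)}
    return primary | centers

def sliding_windows(points, sides=(4, 5, 6, 7, 8)):
    """Enumerate square windows anchored at a sample point; each window is
    returned as the list of sample points it encloses."""
    limit = max(x for x, _ in points)
    windows = []
    for side in sides:
        span = 2 * (side - 1)
        for x0, y0 in sorted(points):
            if x0 + span > limit or y0 + span > limit:
                continue  # window would extend past the sampled area
            windows.append([
                (x, y) for x, y in points
                if x0 <= x <= x0 + span and y0 <= y <= y0 + span
            ])
    return windows

points = staggered_lattice()
windows = sliding_windows(points)
window_sizes = sorted({len(w) for w in windows})
print(len(points), len(windows), window_sizes)
```

Under this assumed layout the counts agree with the slide: 113 samples, window sizes 25, 41, 61, 85, and 113, and 85 indexing subimages.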
Signature Computation
• For each indexing subimage, we compute its signature as seven average-variance pairs:
  • One from all the enclosed sampled blocks
  • Four from the sampled blocks in the four quarters
  • Two from the sampled blocks along the two diagonals
• The first component of the signature is called the short signature.
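A minimal sketch of the seven average-variance pairs, assuming for simplicity that the sampled block colors of a subimage sit on a square grid of scalar values and that quarters are split at the midpoint. Both are assumptions; the slides do not specify the layout, the color representation, or the variance definition (population variance is used here).

```python
from statistics import fmean, pvariance

def avg_var(values):
    """One (average, variance) pair."""
    return (fmean(values), pvariance(values))

def signature(grid):
    """Return 14 numbers: (avg, var) for the whole grid, the four quarters
    (top-left, top-right, bottom-left, bottom-right), and the two diagonals."""
    n = len(grid)
    h = n // 2
    everything = [v for row in grid for v in row]
    quarters = [
        [grid[y][x] for y in ys for x in xs]
        for ys in (range(0, h), range(h, n))
        for xs in (range(0, h), range(h, n))
    ]
    diagonals = [
        [grid[i][i] for i in range(n)],
        [grid[i][n - 1 - i] for i in range(n)],
    ]
    sig = []
    for region in [everything, *quarters, *diagonals]:
        sig.extend(avg_var(region))
    return sig

grid = [
    [1, 1, 3, 3],
    [1, 1, 3, 3],
    [5, 5, 7, 7],
    [5, 5, 7, 7],
]
sig = signature(grid)
short_signature = sig[:2]  # the pair computed from all enclosed blocks
```

Seven pairs give 14 values, which is why each subimage maps to a point in a 14-dimensional signature space.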
Indexing
• For each image, we map its 85 subimages into signature points in a 14-dimensional signature space.
• We cluster these 85 signature points into five MBRs.
• We insert these MBRs into an R*-tree.
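A minimal sketch of turning clustered signature points into MBRs (minimum bounding rectangles). The clustering method is not given in the slides, so the example takes cluster labels as input; it uses 2-D points for brevity where the real signature space has 14 dimensions, and the R*-tree insertion step is omitted.

```python
def mbr(points):
    """Minimum bounding rectangle of a set of points: (lows, highs)."""
    dims = range(len(points[0]))
    lows = tuple(min(p[d] for p in points) for d in dims)
    highs = tuple(max(p[d] for p in points) for d in dims)
    return lows, highs

def mbrs_from_clusters(points, labels):
    """Group points by their (externally supplied) cluster label and
    return one MBR per cluster."""
    groups = {}
    for p, label in zip(points, labels):
        groups.setdefault(label, []).append(p)
    return {label: mbr(members) for label, members in groups.items()}

# Toy data: four 2-D signature points in two clusters.
sig_points = [(0.0, 1.0), (2.0, 3.0), (10.0, 10.0), (12.0, 9.0)]
cluster_labels = [0, 0, 1, 1]
boxes = mbrs_from_clusters(sig_points, cluster_labels)
```

Storing five boxes per image instead of 85 points keeps the index small at the cost of some false positives, which a later comparison step can remove.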
2-Phase Search Strategy
• Each database image yields 85 indexing subimages, hence 85 feature vectors (85 signature points), which are clustered into 5 MBRs.
• Phase 1 (quick and dirty filtering) uses the 5 MBRs.
• Phase 2 (block-to-block comparisons) uses the 113 color averages.
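A minimal filter-and-refine sketch of the two phases. Phase 1 keeps an image if the query signature lies within a radius of any of its MBRs; Phase 2 ranks the survivors by summed absolute differences between aligned block colors. The radius parameter, the distance functions, the dictionary layout, and the linear scan (in place of an R*-tree lookup) are all assumptions made for illustration.

```python
def mindist(point, box):
    """Euclidean distance from a point to the nearest face of an MBR
    (zero if the point lies inside the box)."""
    lows, highs = box
    return sum(
        max(lo - x, 0, x - hi) ** 2 for x, lo, hi in zip(point, lows, highs)
    ) ** 0.5

def two_phase_search(query_sig, query_blocks, database, radius):
    # Phase 1: quick and dirty filtering on the MBRs.
    candidates = [
        name for name, entry in database.items()
        if any(mindist(query_sig, box) <= radius for box in entry["mbrs"])
    ]
    # Phase 2: block-to-block comparison on the stored color averages.
    scored = [
        (sum(abs(q - b) for q, b in zip(query_blocks, database[name]["blocks"])), name)
        for name in candidates
    ]
    return [name for _, name in sorted(scored)]

# Toy database: 2-D signatures and 3 block colors per image.
database = {
    "a": {"mbrs": [((0, 0), (2, 2))], "blocks": [10, 20, 30]},
    "b": {"mbrs": [((8, 8), (9, 9))], "blocks": [10, 20, 30]},
    "c": {"mbrs": [((1, 1), (3, 3))], "blocks": [12, 25, 30]},
}
result = two_phase_search((2.5, 2.5), [11, 24, 30], database, radius=1.0)
```

Image "b" is discarded in Phase 1 without touching its block colors, so the more expensive comparison runs only on "a" and "c".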
Query Preparation
• Sample the query image at different rates.
• For each sampling rate:
  • Determine the core area that contains the maximum number of relevant sampled blocks and the least noise.
  • Compute the signature of the core area.
Performance Comparison
• LCH: NFQ-capable
• Correlogram: one of the best whole-matching techniques
Experimental Studies
• Database: 15,808 images of various categories
• Workload: 100 queries
  • Type 1: Query and database images have the same size, and the NFQ covers less than half of the query image (30 queries)
  • Type 2: Query and database images have the same size, and the NFQ covers more than half of the query image (20 queries)
  • Type 3: Query and database images have different sizes (50 queries)
Type-3 Queries
• Only SamMatch, the sampling-based approach, can handle Type-3 queries.
• In the following example, there is no easy way to match the two identical apples using LCH.
Performance Results (Type 1)
[Figure: results for SamMatch (SM), Correlogram (Corr.), and LCH]
Performance Results (Type 3)
The sizes of the queries differ from those of the database images.
[Figure: results for Query 3 and Query 4]
Performance Metric
• Ai denotes a relevant image returned by the system.
• S is the scope of the query (i.e., the maximum number of images returned).
• q is the total number of relevant images in the database.
Rationale: Low-ranked images do not make it to the user.
Effectiveness
• R/S average, reported for Type 1 and Type 2 queries
• Higher R/S is better.
Time & Space
• SamMatch requires much less storage overhead:
  • LCH uses 21 color histograms: 21 × 256 × 2 bytes
  • Correlogram uses 4 color histograms: 4 × 256 × 2 bytes
  • SamMatch uses one byte per sampled block: 113 bytes
• Under exhaustive search, SamMatch is:
  • 100% faster than Correlogram
  • 200% faster than LCH
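The storage figures can be worked out directly. The sketch below reads each histogram as 256 bins of 2 bytes, which is an interpretation of the byte counts on the slide; the variable names are illustrative.

```python
BINS = 256           # bins per color histogram (as read from the slide)
BYTES_PER_BIN = 2

lch_bytes = 21 * BINS * BYTES_PER_BIN         # 21 local histograms
correlogram_bytes = 4 * BINS * BYTES_PER_BIN  # 4 histograms
sammatch_bytes = 113 * 1                      # one byte per sampled block

lch_ratio = lch_bytes / sammatch_bytes
correlogram_ratio = correlogram_bytes / sammatch_bytes

print(f"LCH:         {lch_bytes} bytes ({lch_ratio:.1f}x SamMatch)")
print(f"Correlogram: {correlogram_bytes} bytes ({correlogram_ratio:.1f}x SamMatch)")
print(f"SamMatch:    {sammatch_bytes} bytes")
```

Under this reading, LCH needs 10,752 bytes and Correlogram 2,048 bytes per image, so 113 bytes is less than 1/16 of either.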
Annotation for Keyword Search
[Figure: concepts such as "Car" and "Flower" are paired with representative images; dictionary keywords (e.g., "Flower") index the image database]
Concluding Remarks
• Reducing noise interference is essential to more reliable image retrieval.
• SamMatch supports NFQs effectively and efficiently:
  • 200% faster than LCH and 100% faster than Correlogram under exhaustive search
• Other benefits of SamMatch include:
  • Matching objects at different scales
  • Uncovering translations of the matching areas
  • Handling spatial and scaling constraints
• SamMatch uses less than 1/16 of the storage space required by LCH and Correlogram.
Relevance Feedback (RF)
• Problem: There is a semantic gap between low-level visual features and the high-level concepts conveyed by the query images.
• A solution: user relevance feedback
  • The user identifies relevant images within the returned set.
  • The system uses the feedback to modify the query and retrieve better results in the next round.
  • This process repeats until the user is satisfied with the results.
Relevance Feedback Techniques
• Query Point Movement
• Multipoint Query
• Qcluster
• Multiple Viewpoints
Query Point Movement
• The centroid of the relevant images is the new query point in the next iteration.
• The distance function is weighted so that the query contour is shaped to optimize the query result.
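A minimal sketch of one Query Point Movement step. The new query point is the centroid of the relevant images; the per-dimension weights are set to the inverse of the variance of the relevant points, which is one common choice and an assumption here, since the slide does not say how the weights are derived.

```python
def centroid(points):
    """Mean of the relevant feature vectors: the new query point."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def inverse_variance_weights(points, eps=1e-9):
    """Weight each dimension by 1 / variance of the relevant points, so
    dimensions on which they agree count more (assumed weighting rule)."""
    c = centroid(points)
    n = len(points)
    return [
        1.0 / (sum((p[d] - c[d]) ** 2 for p in points) / n + eps)
        for d in range(len(c))
    ]

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)) ** 0.5

# Relevant images agree on feature 0 and vary widely on feature 1.
relevant = [(1.0, 0.0), (1.0, 10.0), (1.0, 20.0)]
new_query = centroid(relevant)
weights = inverse_variance_weights(relevant)

off_on_feature_0 = weighted_distance(new_query, (2.0, 10.0), weights)
off_on_feature_1 = weighted_distance(new_query, (1.0, 15.0), weights)
```

A candidate that deviates on the feature the user's examples agree on is pushed far away, while deviation on the loosely constrained feature costs little; this is the stretching of the query contour described above.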
Multipoint Query
• Relevant points, according to user feedback, are grouped into clusters, each represented by its centroid.
• The distance of a data point is computed as a weighted combination of its individual distances from the centroids.
• Each weight is proportional to the cluster size.
Qcluster
• A Multipoint Query is likely to include irrelevant images located between the clusters of desired images.
• Solution: use disjunctive queries so that only images near the cluster centroids are considered relevant.
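A minimal sketch contrasting the two distance functions on a toy case. The Multipoint distance is a size-weighted sum of distances to all centroids; the Qcluster-style distance is the distance to the nearest centroid. Plain Euclidean distance and the function names are assumptions made for illustration.

```python
import math

def multipoint_distance(point, centroids, sizes):
    """Weighted combination of distances to every centroid,
    with weights proportional to cluster size."""
    total = sum(sizes)
    return sum(
        size / total * math.dist(point, c) for c, size in zip(centroids, sizes)
    )

def qcluster_distance(point, centroids):
    """Disjunctive form: distance to the nearest centroid only."""
    return min(math.dist(point, c) for c in centroids)

centroids = [(0.0, 0.0), (10.0, 0.0)]
sizes = [3, 3]

near_a_cluster = (0.5, 0.0)    # close to the first centroid
between_clusters = (5.0, 0.0)  # halfway between the two

mp_near = multipoint_distance(near_a_cluster, centroids, sizes)
mp_between = multipoint_distance(between_clusters, centroids, sizes)
qc_near = qcluster_distance(near_a_cluster, centroids)
qc_between = qcluster_distance(between_clusters, centroids)
```

In this toy case the weighted combination scores the in-between point the same as the point next to a centroid, whereas the nearest-centroid distance ranks the in-between point well behind it.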
Multiple Viewpoints
• Searches for relevant images using variants of the query image (i.e., color, negative, grayscale).
• Recognizes red cars and white cars as cars, but not all cars from various viewpoints, nor cars with different appearances.
RF Techniques - Summary
• Adjust the distance function to shape the “query footprint” so that it contains the maximum number of relevant images and the fewest irrelevant ones:
  • Query Point Movement: The distance function is weighted.
  • Multipoint Query: The distance is computed as a weighted combination of the individual distances from the cluster centroids.
  • Qcluster: The distance is computed as the distance to the nearest cluster centroid.
• Retrieve images using variants of the query image (Multiple Viewpoints).
Common Limitation
• Assumption: Similar images are located “close” to each other in the feature space, according to some distance measure.
• Reality: Similar images can look very different and are therefore far apart in the feature space or its subspaces.
Query Decomposition (QD)
• Decompose an initial query into localized subqueries based on user relevance feedback.
• Subqueries are processed independently.
• Their local results are merged into a single ranked list to form the final result.
RF Support (RFS) Structure
• Database images are hierarchically clustered into a tree structure, such as an R*-tree, to form an RFS structure.
• Representative images are selected for each cluster in the RFS structure in a bottom-up fashion.
Query Decomposition: A New RF Technique
• For each relevant cluster, representative images are randomly selected to support localized RF.
• Relevant images from localized RF are used to identify relevant clusters at the next level of the RFS structure.
• A k-NN computation is performed independently for each relevant cluster at the leaf level.
• Local results are merged.
QD Illustration
[Figure: a three-level RFS tree (Level 1 to Level 3) with Nodes 1 through 8]
Localized k-NN Computation
• A localized k-NN computation is done independently over the corresponding relevant cluster at a leaf node.
• If a query point is sufficiently distant from the center of its cluster, the k-NN query is computed over the parent node in the RFS structure.
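A minimal sketch of the localized k-NN rule with its fallback to the parent node. The slide does not quantify "sufficiently distant", so the `max_offset` threshold is a hypothetical parameter, as are the dictionary layout of the nodes and the use of Euclidean distance.

```python
import math

def knn(query, images, k):
    """Brute-force k nearest neighbors by Euclidean distance."""
    return sorted(images, key=lambda img: math.dist(query, img))[:k]

def localized_knn(query, leaf, k, max_offset):
    """Search the leaf cluster, unless the query lies farther than
    max_offset (hypothetical threshold) from the cluster center;
    in that case search the parent node instead."""
    too_far = math.dist(query, leaf["center"]) > max_offset
    if too_far and leaf.get("parent") is not None:
        return knn(query, leaf["parent"]["images"], k)
    return knn(query, leaf["images"], k)

# Toy RFS fragment: one parent node covering two leaf clusters.
leaf_a_images = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
leaf_b_images = [(4.0, 0.0), (5.0, 0.0)]
parent = {"images": leaf_a_images + leaf_b_images}
leaf_a = {"center": (0.0, 0.0), "images": leaf_a_images, "parent": parent}

central = localized_knn((0.2, 0.0), leaf_a, k=2, max_offset=2.0)
boundary = localized_knn((2.9, 0.0), leaf_a, k=2, max_offset=2.0)
```

The boundary query picks up a neighbor from the sibling cluster that a purely leaf-level search would have missed.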
Hiding Complexity
• The user is not aware of the internal query decomposition process.
• The user is presented with a single list of ranked images for each round of RF.
• Subqueries and their results are maintained internally without the user's knowledge.
• QD is not a hierarchical browsing technique.
Prototype
• 37 features:
  • 9 color moments
  • 10 wavelet-based texture features
  • 17 edge-based structural features
• RFS structure:
  • Three levels deep
  • Each node contains 70 to 100 representative images
• User interface:
  • Results are presented in groups
Presentation of Final Results
• Intra-cluster ranking: Relevant images are ranked according to their own localized k-NN computation.
• Inter-cluster ranking: The matching score of each result group, produced by a localized k-NN computation, is the sum of the scores of its top k matches.
• Result images are presented in groups.
• The result groups are presented in their ranking order.
• Relevant images are presented in their ranking order within each group.
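A minimal sketch of the two ranking levels. It assumes each match carries a similarity score where higher is better; the slides do not define the score, and the group names and values below are made up for illustration.

```python
def rank_groups(groups, k):
    """groups maps a subquery name to a list of (image_id, score) matches.
    Returns groups ordered by the sum of their top-k scores, with the
    images inside each group ordered by their own score."""
    ranked = []
    for name, matches in groups.items():
        # Intra-cluster ranking: best matches first.
        ordered = sorted(matches, key=lambda m: m[1], reverse=True)
        # Inter-cluster score: sum of the top-k match scores.
        group_score = sum(score for _, score in ordered[:k])
        ranked.append((group_score, name, [img for img, _ in ordered]))
    ranked.sort(key=lambda g: g[0], reverse=True)
    return [(name, images) for _, name, images in ranked]

groups = {
    "eagle": [("e1", 0.9), ("e2", 0.4), ("e3", 0.8)],
    "horse": [("h1", 0.95), ("h2", 0.9), ("h3", 0.1)],
}
presentation = rank_groups(groups, k=2)
```

With k = 2 the "horse" group scores 1.85 against 1.7 for "eagle", so it is presented first even though its third match is weak.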
Screenshot of UCF QD System
[Screenshot: control panel, example image panel, and result panel]
• The localized subqueries are “dinosaur,” “eagle,” “owl,” “cat,” and “horse.”
• Query results are presented in the left panel.
Example - Level 1
[Screenshot: control panel, example image panel, and result panel]
Example - Level 2
[Screenshot: control panel, example image panel, and result panel]
Example - Level 3
[Screenshot: control panel, example image panel, and result panel]
Final Result
Four groups of results corresponding to the examples
[Screenshot: control panel, example image panel, and result panel]
Experimental Study
• The dataset has 20,000 images from COREL and the Internet.
• QD is compared to Multiple Viewpoints.
• Performance metrics