
A System for Large-scale, Content-based Web Image Retrieval - and the Semantics within


Presentation Transcript


  1. A System for Large-scale, Content-based Web Image Retrieval - and the Semantics within Till Quack

  2. Task • Create a content-based image retrieval system for the WWW • Large-scale, one order of magnitude larger than existing systems, i.e. O(10^6) items • Relevance Feedback • Explore and exploit the semantics within • Take large-scale, content-based image retrieval one step closer to commercial applications

  3. Outline • Content-based Image Retrieval on the WWW • PART I: A System for Image Retrieval on the WWW • Features • Retrieval • Relevance Feedback • Software Design • PART II: The Semantics within • Identifying a Method to find Semantics • Data Mining for Semantic Clues • Frequent Itemset Mining and Association Rules • The Visual Link • Discussion & Demonstration • Conclusions & Outlook

  4. Content-based Image Retrieval on the WWW • Characteristics of the data repository • Size: 4.2 billion documents in Google’s index • Diversity: Documents in any context, language • Control: Anybody can publish anything • Dynamics: Ever changing • System Requirements • FAST • SCALABLE • Make use of all the information available • Motivation for a new system • Existing systems • Either pure text (Google) • Or pure content-based • Large-Scale

  5. PART I: A System for Large-scale, Content-based Image Retrieval on the WWW Ullrich Moenich, Till Quack, Lars Thiele

  6. System Overview

  7. Visual Features describe the Images • Global Features from MPEG-7 Standard • Currently no Segmentation • Reasons: Scalability and the diversity of the data • Texture Features • Edge Histogram Descriptor (EHD) • Histogram of quantized edge directions. 80 dimensions • Homogeneous Texture Descriptor (HTD) • Output of a Gabor filter-bank. 62 dimensions • Color Features • Scalable Color Descriptor (SCD) • Color histogram. 256, 128, 64 or 32 dimensions • Dominant Color Descriptor (DCD) • Up to 8 dominant colors (3d color-space) and their percentages • 32 “dimensions” • “Bins” defined for each image

  8. Collateral Text as an additional Feature • ALT Tag and Collateral Text around images • VERY uncontrolled annotation • Stemming: Porter Stemmer • Example: training -> train • More matching terms for boolean queries • But also some new ambiguities • train: to train [verb] / the train [noun]

  9. Retrieval in 2 Steps 1. Text Retrieval 2. Visual Nearest Neighbor Search

  10. Retrieval: Text • Options • Boolean query on inverted index • Vector Space Model • LSI etc. • Choice • Ranked boolean queries on inverted index • Ranking: tf*idf • Reasons • Speed • Sparsity of data: • 600 000 Keywords in total • 1 document: 10-50 words
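
A minimal sketch of how such a ranked boolean query with tf*idf scoring might look; the posting-list layout and strict AND semantics are illustrative assumptions, not the system's actual code:

```cpp
#include <algorithm>
#include <cmath>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Inverted index: term -> postings (document id, term frequency).
struct Posting { int doc_id; int tf; };
using InvertedIndex = std::unordered_map<std::string, std::vector<Posting>>;

// Ranked boolean AND query: documents must contain every query term and are
// ordered by their accumulated tf*idf score.
std::vector<std::pair<int, double>> rankedBooleanQuery(
    const InvertedIndex& index, const std::vector<std::string>& terms,
    int num_docs) {
  std::unordered_map<int, double> score;  // doc id -> summed tf*idf
  std::unordered_map<int, int> hits;      // doc id -> number of query terms matched
  for (const auto& term : terms) {
    auto it = index.find(term);
    if (it == index.end()) return {};     // AND semantics: missing term, empty result
    double idf = std::log(double(num_docs) / double(it->second.size()));
    for (const Posting& p : it->second) {
      score[p.doc_id] += p.tf * idf;
      ++hits[p.doc_id];
    }
  }
  std::vector<std::pair<int, double>> ranked;
  for (const auto& [doc, s] : score)
    if (hits[doc] == int(terms.size())) ranked.emplace_back(doc, s);
  std::sort(ranked.begin(), ranked.end(),
            [](const auto& a, const auto& b) { return a.second > b.second; });
  return ranked;
}
```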

  11. Retrieval – Visual Features (MPEG-7) • K-Nearest Neighbor search (K-NN) • Find the K closest candidates ci to the query image q in a vector space • Distance: Minkowski metrics for the distance d(ci,q), namely the L1 and L2 norms • Most MPEG-7 descriptors are high-dimensional vectors • The “dimensionality curse” applies • High-dimensional spaces behave “weirdly” • In particular, the distances are not too meaningful
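
A brute-force K-NN sketch with Minkowski distances (p = 1 gives L1, p = 2 gives L2); the real system avoids exactly this full linear scan, as the following slides explain:

```cpp
#include <algorithm>
#include <cmath>
#include <utility>
#include <vector>

// Minkowski distance between two feature vectors.
double minkowski(const std::vector<double>& a, const std::vector<double>& b,
                 double p) {
  double sum = 0.0;
  for (size_t i = 0; i < a.size(); ++i)
    sum += std::pow(std::fabs(a[i] - b[i]), p);
  return std::pow(sum, 1.0 / p);
}

// Brute-force K-NN: indices of the k database vectors closest to the query q.
std::vector<int> knn(const std::vector<std::vector<double>>& db,
                     const std::vector<double>& q, int k, double p) {
  std::vector<std::pair<double, int>> dist;  // (distance, index)
  for (size_t i = 0; i < db.size(); ++i)
    dist.emplace_back(minkowski(db[i], q, p), int(i));
  std::partial_sort(dist.begin(),
                    dist.begin() + std::min<size_t>(k, dist.size()), dist.end());
  std::vector<int> out;
  for (int i = 0; i < k && i < int(dist.size()); ++i) out.push_back(dist[i].second);
  return out;
}
```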

  12. Retrieval – Challenges for Visual Features • We have several (visual) feature types. How can we combine them? • Our database is very large. How can we search it fast enough? • i.e. how can we avoid comparing the query vector with each database entry?

  13. A Combined Distance for the MPEG-7 Features • We use a combined distance over all the visual feature types • The individual distances occupy different ranges and follow different distributions • The distributions are transformed to a normal distribution in the range [0,1] • The distances are then combined linearly
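
The slides do not spell out the exact transform, so the sketch below assumes the common 3-sigma Gaussian normalization of each raw distance before the weighted linear combination:

```cpp
#include <algorithm>
#include <vector>

// Gaussian-normalize a raw distance into [0,1] using the precomputed mean and
// standard deviation of that feature's distance distribution (assumption:
// 3-sigma normalization; the original system's transform is not specified).
double normalizeDistance(double d, double mean, double stddev) {
  double z = (d - mean) / (3.0 * stddev);  // most of the mass falls in [-1, 1]
  double n = (z + 1.0) / 2.0;              // shift to [0, 1]
  return std::min(1.0, std::max(0.0, n));
}

// Linear combination of the per-feature normalized distances with weights w.
double combinedDistance(const std::vector<double>& dists,
                        const std::vector<double>& means,
                        const std::vector<double>& stddevs,
                        const std::vector<double>& w) {
  double total = 0.0;
  for (size_t f = 0; f < dists.size(); ++f)
    total += w[f] * normalizeDistance(dists[f], means[f], stddevs[f]);
  return total;
}
```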

  14. Clustering speeds up the search • Problem • Millions of items in the DB • Linear search over the whole dataset is too slow • We are only looking for the K nearest neighbors anyway • (One) Solution • Partition the data into clusters, each identified by a representative, the centroid • Only search the cluster whose centroid is closest to the query q • K-Means clustering algorithm • Not the best, in particular in high-dimensional spaces • But fast! • Problem with Clustering: • A query at the border of a cell does not find all the nearest neighbors • Simple Solution: Overlapping Clusters • Problem: Scalability • Original data: 7 GB • Overlapping data: 50 GB
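
A sketch of the cluster-restricted search, reusing the minkowski helper from the K-NN sketch above; only the members of the winning cluster are then ranked:

```cpp
#include <vector>

// Pick the cluster whose centroid is closest to the query; the subsequent
// K-NN search is then run only over that cluster's member vectors.
int nearestCluster(const std::vector<std::vector<double>>& centroids,
                   const std::vector<double>& q) {
  int best = 0;
  double bestDist = minkowski(centroids[0], q, 2.0);
  for (size_t c = 1; c < centroids.size(); ++c) {
    double d = minkowski(centroids[c], q, 2.0);
    if (d < bestDist) { bestDist = d; best = int(c); }
  }
  return best;
}
```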

  15. Relevance Feedback Improves the Results • Relevance feedback: User input to improve search results - iteration by iteration • i.e. the user selects „good matches“ • We obtain the following information: • A new query vector which is a combination of the relevant images = Query Vector Movement • The ratios for the combination of the feature types

  16. Relevance Feedback: Query Vector Movement • Construct the query vector qn from the i = 1...M images selected as relevant in iteration n, per vector component k of feature type f (EHD, SCD, HTD): qn(f,k) = (1/M) * Σi vi(f,k), where vi is the feature vector of relevant image i • The final, new query vector is q = 0.75*qn + 0.25*qn-1, i.e. move from the old query vector towards the new vector
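
The update in code, with the 0.75/0.25 blend from the slide; averaging the relevant images' feature vectors is the assumed construction of qn:

```cpp
#include <vector>

// Query vector movement: average the feature vectors of the M images marked
// relevant in this iteration, then blend with the previous query vector.
std::vector<double> moveQueryVector(
    const std::vector<std::vector<double>>& relevant,  // the M selected images
    const std::vector<double>& prevQuery) {
  size_t dim = prevQuery.size();
  std::vector<double> qn(dim, 0.0);
  for (const auto& v : relevant)
    for (size_t k = 0; k < dim; ++k) qn[k] += v[k] / relevant.size();
  std::vector<double> q(dim);
  for (size_t k = 0; k < dim; ++k)
    q[k] = 0.75 * qn[k] + 0.25 * prevQuery[k];  // weights from the slide
  return q;
}
```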

  17. Relevance Feedback: Weight Adaptation • Which feature is most important for the given query? • The one for which all the relevant images are closest • Determine the ratios for the combination based on the average distance, e.g. for the EHD compute the mean distance dEHD of the relevant images to the query • and set the weight inversely proportional to it, wEHD ~ 1/dEHD, normalized so that the weights sum to 1
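
A sketch of the inverse-average-distance weighting implied by the slide; the exact normalization is an assumption:

```cpp
#include <vector>

// Adapt the feature-type weights from relevance feedback: feature types under
// which the relevant images lie closest to the query get more weight.
// avgDist[f] is the mean distance of the relevant images under feature f.
std::vector<double> adaptWeights(const std::vector<double>& avgDist) {
  std::vector<double> w(avgDist.size());
  double sum = 0.0;
  for (size_t f = 0; f < w.size(); ++f) {
    w[f] = 1.0 / avgDist[f];
    sum += w[f];
  }
  for (double& x : w) x /= sum;  // normalize so the weights sum to 1
  return w;
}
```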

  18. Implementation – Software and Hardware • Languages: C++ and Perl • Inline::CPP to connect the layers • WWW: Apache and CGI • Relational DB: MySQL • Operating System: OS X • Hardware • Dual 2 GHz Apple G5, 2 GB RAM • Teran Terabyte Disk Array

  19. Part II: The Semantics Within

  20. Semantics: Combining Text and Visual Features • Our dataset is multi-modal • Keywords and several visual features • Not only valid for WWW data • Video: image + speech • Bio-imagery: image + microscope settings, cell-coloring fluid • Goal: Try to jointly use the different modes • Do semantic relations between the modes exist? • Learn something about these semantic relations • Improve the retrieval precision based on them • Challenges in our project: • Large-scale • Noisy and uncontrolled data • Only global visual features

  21. Identifying a Method to find the Semantics • Related work • Latent Semantic Indexing (LSI) [Westerveld 2000] • Problem: O(N^2 m^3), N = documents + terms, m = size of the concept space • Statistical models [Barnard, Forsyth 2001-2004] • Problem: “several hours for several thousand images” • Problem: It is a (rather strict, hierarchical) model • Others • Neural networks (SOM etc.) • Hidden Markov Models • Often: Classification • We don’t know our classes, or: there are just too many • We can’t train them either (data too diverse and noisy) • Most of the methods above were only tested on relatively small, supervised datasets • There is one more option …

  22. Method: Data Mining for Semantic Clues • Mine the data for patterns • Find them only where they exist • Deduce Rules from them • Scalable methods available • Frequent Itemset Mining and Association Rules • Classic Application: Market baskets, Census data … • Some works on Multimedia data • [Zaïane 98]: Datacubes with appended keywords • [Tešić et al. 03]: Perceptual associations (texture) within images

  23. Frequent Itemsets and Association Rules • Itemset I: a set of items • Transaction T: the itemset observed in one record (e.g. one market basket) • Database D: the set of all transactions • Support of an itemset A: supp(A) = |{T in D : A ⊆ T}| / |D|, the fraction of transactions containing A • A is called frequent if supp(A) ≥ minsupp • Rule: A -> B for disjoint itemsets A, B • Support of a rule: supp(A -> B) = supp(A ∪ B) • Statistical significance • Confidence of a rule: conf(A -> B) = supp(A ∪ B) / supp(A) • Strength of implication • Maximum likelihood estimate that B is true given that A is true

  24. Example & Advantages • Example: Market Baskets • Rule {Diaper, Milk} -> {Beer} • Advantages • Human readable • Can be edited • Fast algorithms available • Note: Associations are not correlations • The same concept, just simpler • Associations and correlations: [Brin, Motwani, Silverstein 98]
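
A toy computation of support and confidence for this rule, over a small hypothetical transaction database:

```cpp
#include <algorithm>
#include <iostream>
#include <set>
#include <string>
#include <vector>

using Itemset = std::set<std::string>;

// Fraction of transactions that contain all items of A.
double support(const std::vector<Itemset>& db, const Itemset& a) {
  int hits = 0;
  for (const auto& t : db)
    if (std::includes(t.begin(), t.end(), a.begin(), a.end())) ++hits;
  return double(hits) / db.size();
}

int main() {
  // Hypothetical market-basket database with four transactions.
  std::vector<Itemset> db = {
      {"Diaper", "Milk", "Beer"}, {"Diaper", "Milk"},
      {"Milk", "Bread"}, {"Diaper", "Milk", "Beer", "Bread"}};
  Itemset a = {"Diaper", "Milk"}, ab = {"Diaper", "Milk", "Beer"};
  // conf(A -> B) = supp(A u B) / supp(A): here 0.5 / 0.75 = 0.67
  double conf = support(db, ab) / support(db, a);
  std::cout << "supp = " << support(db, ab) << ", conf = " << conf << "\n";
}
```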

  25. Using FIMI to find the itemsets • Frequent Itemset Mining (FIMI) • Find frequent itemsets with support > minsupp • Minimal support minsupp given by “an expert” • First algorithm: Apriori [Agrawal et al. 93] • Basic idea: If an itemset is frequent, all its subsets must be frequent (Monotonicity) • k passes over the dataset for itemsets of length k • ~O(knp) for n transactions, p items, itemsets of length k • Today’s algorithms • Rely on the same basic principle • But are much faster (main reason: data structures) • Usually only 2 database passes • ~linear runtime • State-of-the-art algorithm overview: FIMI’03 • We used: fpmax* [Grahne, Zhu: Nov 03]
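
One level of Apriori candidate generation as a sketch, purely to illustrate the monotonicity principle; the fpmax* algorithm actually used is FP-tree based and far more sophisticated:

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

using Item = std::string;
using Itemset = std::set<Item>;

// One Apriori level: join frequent (k-1)-itemsets into k-candidates, then keep
// those meeting minsupp. Monotonicity guarantees every frequent k-itemset is
// the union of two frequent (k-1)-itemsets, so nothing frequent is missed.
std::vector<Itemset> aprioriStep(const std::vector<Itemset>& db,
                                 const std::vector<Itemset>& prevFrequent,
                                 double minsupp) {
  std::set<Itemset> candidates;
  for (size_t i = 0; i < prevFrequent.size(); ++i)
    for (size_t j = i + 1; j < prevFrequent.size(); ++j) {
      Itemset u = prevFrequent[i];
      u.insert(prevFrequent[j].begin(), prevFrequent[j].end());
      if (u.size() == prevFrequent[i].size() + 1) candidates.insert(u);
    }
  std::vector<Itemset> frequent;
  for (const auto& c : candidates) {
    int hits = 0;
    for (const auto& t : db)  // one database pass per level
      if (std::includes(t.begin(), t.end(), c.begin(), c.end())) ++hits;
    if (double(hits) / db.size() >= minsupp) frequent.push_back(c);
  }
  return frequent;
}
```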

  26. Diapers and Beer !!? • Application to the domain of Multimedia data: • Formulate images as transactions • Low-level clusters serve as a dimensionality reduction for the visual features • We find associations of visual features (clusters) and keywords • From these associations we deduce semantic rules • Advantages • Comparably low computational complexity • Other data sources can be integrated in the same manner (e.g. long-term relevance feedback) • Challenges • Noisy, uncontrolled data • Associations within keywords are much stronger than associations between keywords and visual features • Uneven distribution of cluster sizes (a K-Means problem)
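
A sketch of how an image might be encoded as a transaction; the ehd/scd/htd item naming mirrors the rule examples on later slides, but the exact encoding is an assumption:

```cpp
#include <set>
#include <string>
#include <vector>

using Itemset = std::set<std::string>;

// Encode one image as a transaction: its (stemmed) keywords plus one item per
// low-level feature cluster it falls into, e.g. "ehd249" or "scd12".
Itemset imageToTransaction(const std::vector<std::string>& keywords,
                           int ehdCluster, int scdCluster, int htdCluster) {
  Itemset t(keywords.begin(), keywords.end());
  t.insert("ehd" + std::to_string(ehdCluster));
  t.insert("scd" + std::to_string(scdCluster));
  t.insert("htd" + std::to_string(htdCluster));
  return t;
}
```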

  27. Characteristics of the Itemsets and Rules • There are associations • Within text: {shoe} -> {walk} • Within visual clusters: {EHD 14} -> {SCD 12} • Between text and visual clusters: {shoe} -> {EHD 14} • Measure for interestingness or choice of rules from FI • Confidence? • Statistical criteria? • Background knowledge? (Example: {pregnant} -> {woman} holds with 100% confidence, but tells us nothing new) • Our „Background Knowledge“: Rules that connect keywords and low-level features are more interesting • Since this is known, the mining can be adapted and made even faster

  28. Exploiting the Itemsets and Rules

  29. Selecting Interesting Low-Level Clusters based on Rules • Clusters were introduced to partition the visual feature vector data and search only on certain clusters • Problem: We miss certain nearest neighbors if images for a concept are spread over several clusters • Unsatisfactory Solution: Overlapping Clusters • But association rules might find and solve this situation • Clusters are re-united • If number of images for concept in both clusters is >minsupp • Example: {shirt} -> {ehd249,ehd310} reunites these clusters for the initial keyword-query “shirt”! • This is scalable - unlike overlapping clusters • Another benefit is that more images labeled with the original keyword are “injected” into the results of K-NN search • Currently: One Keyword as high level semantic concept • Future: Find high level semantic concepts by mining associations within text first

  30. The Visual Link • Another contribution, NOT related to Frequent Itemset Mining and Association Rules… • Since the search concept suggests visual nearest-neighbor search with relevance feedback after the initial keyword search: • It would be nice to have a diverse selection of images for a given keyword on the first page of results • Images sorted not only by keyword ranking, but also based on visual feature information • Basic idea: For a given keyword query, build groups of images that are visually close. • Larger groups are more important • Show only one representative per group

  31. The Visual Link: A Graph-Based Approach • Let I(Q) be the set of images matching a keyword query Q • Define a graph G(V,E) with V = I(Q) and E = {(vi,vj) : d(vi,vj) < t} for a distance threshold t • i.e. images are visually linked if the distance between them is lower than a given threshold • Do a connected-component analysis to find the connected components C • For each component C find the „best“ representative rC • Re-rank the results based on the representatives rC
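
A union-find sketch of the connected-component analysis over the visual-link graph, again reusing the minkowski helper from the K-NN sketch; the threshold is the visual-link threshold t from the slide:

```cpp
#include <numeric>
#include <vector>

// Union-find (disjoint sets) with path compression.
struct UnionFind {
  std::vector<int> parent;
  explicit UnionFind(int n) : parent(n) {
    std::iota(parent.begin(), parent.end(), 0);
  }
  int find(int x) { return parent[x] == x ? x : parent[x] = find(parent[x]); }
  void unite(int a, int b) { parent[find(a)] = find(b); }
};

// Link images whose pairwise distance is below the threshold; each root of
// the union-find structure then identifies one connected component.
std::vector<int> visualComponents(
    const std::vector<std::vector<double>>& features, double threshold) {
  int n = int(features.size());
  UnionFind uf(n);
  for (int i = 0; i < n; ++i)
    for (int j = i + 1; j < n; ++j)
      if (minkowski(features[i], features[j], 2.0) < threshold) uf.unite(i, j);
  std::vector<int> comp(n);
  for (int i = 0; i < n; ++i) comp[i] = uf.find(i);
  return comp;
}
```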

  32. The Visual Link: An Example

  33. The Visual Link: An Approximation • Problem: The distance calculations for the graph take too long • Clusters cannot be used • Loading individual vectors takes a lot of time • Solution: • Approximate distance • Idea: If images are in the same cluster and in the same distance range to the centroid -> the probability that they are „close“ is high • New definition for visually linked: • In the same cluster and in the same range of relative distance to its centroid • Can be encoded in the relational DB! And comes at nearly no extra cost in creation
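
The approximate link test as a sketch; the number of distance rings and their quantization are assumptions, since the slides only require "same cluster and same range of relative distance":

```cpp
#include <cmath>

// Approximate visual link: two images count as linked if they fall into the
// same cluster AND into the same ring of relative distance to its centroid.
// relDist is the distance to the centroid divided by the cluster radius, so
// the ring index can be precomputed once and stored in the relational DB.
bool approxLinked(int clusterA, double relDistA,
                  int clusterB, double relDistB, int numRings) {
  if (clusterA != clusterB) return false;
  return int(std::floor(relDistA * numRings)) ==
         int(std::floor(relDistB * numRings));
}
```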

  34. Discussion & Demo

  35. Discussion: Precision • Measuring the quality of such a large-scale system is difficult • Precision/Recall measures are not possible: the ground truth is not known • C: correct results, D: desired results, A: actual results • precision = |C| / |A|, recall = |C| / |D| • Since the set of desired results D is unknown for WWW data, recall cannot be computed • We measure the precision based on user questioning

  36. Before we continue … some numbers • Number of images: 3 006 660 • Size of image data: 111 GB • Feature extraction: 15 days (dual 2 GHz CPU, 2 GB RAM) • Number of distinct keywords: 680 256 • Size of inverted keyword index table: 50 260 345 rows • MySQL database size: 23 GB

  37. And now … the moment you’ve all been waiting for … • The Demo of Cortina

  38. Conclusions • A system with over 3 million items was implemented • Probably the largest CBIR system to date? • A retrieval concept was introduced • A keyword query followed by relevance feedback and visual nearest-neighbor search • Superior to existing retrieval concepts (query by keyword or query by example alone) • Data mining to explore and exploit semantics in large-scale systems was introduced

  39. Questions

  40. Outlook • Many extensions and improvements are possible • Segmentation • Or maybe rather some simple tiling • Indexing • K-Means should be replaced • Suggestion: VA-File based approach [Manjunath, Tešić 03] • Association Rule Mining • Multilevel approach • First keywords for high-level semantic concepts • Then visual features

  41. Thanks • Ullrich Moenich and Lars Thiele

  42. Which Rules are of Interest? • There are associations • Within text: {shoe} -> {walk} • Within visual clusters: {EHD 14} -> {SCD 12} • Between text and visual clusters: {shoe} -> {EHD 14, SCD 12} • There are long and short rules • Short rules have higher support by the nature of the problem • Long rules contain more (precise) information about the semantics • Measure for interestingness or choice of rules from FI • Confidence? • Statistical criteria? • Background knowledge? (Example: {pregnant} -> {woman})

  43. Characteristics and Challenges • Chosen criteria • Mainly interested in rules {keywords} -> {visual feature clusters} (our “Background Knowledge”) • Support, confidence • Mine long and short rules • Restriction of the problem: Mine for frequent itemsets per keyword • i.e. all images = transactions for a given keyword • This means • We avoid being distracted by associations within keywords • The method is made even more scalable • The keyword as a placeholder for a semantic concept • A keyword does not always stand for a single semantic concept • Proposal for future versions: Multi-level approach: • First {keywords} -> {keywords} rules to identify “real” semantic concepts • Then itemset mining per identified concept

  44. Characteristics of the Itemsets and Rules - Overall

  45. Why keyword filtering of the results does not work

  46. Proposal: Semantic Clusters • Ultimate goal: Search some kind of „Semantic Clusters“ instead of visual feature clusters • Proposal based on approach from Ester et al. 2002, 2003 • Clustering based on frequent itemsets, originally for text • Clustering criterion: minimize overlap
