Exploiting Ontologies for Automatic Image Annotation
M. Srikanth, J. Varner, M. Bowden, D. Moldovan
Language Computer Corporation, Richardson, Texas
www.languagecomputer.com
Contents
• Motivation
• Automatic Image Annotation Problem
• Ontologies for
  • Defining Visual Vocabularies
  • Hierarchical Models for image annotation
• Related Work
• Experiments & Results
• Conclusion and Future Work
Motivation: Multimedia Question Answering
• The majority of Q/A efforts focus on textual corpora and text processing
• Large amounts of information are held in multimedia sources: images, audio, video
• Extend the power of Q/A into the realm of multimedia
• Exploit the commonality and union of text and multimedia information
Multimedia Question Answering
• Some ways in which multimedia can be used in Q/A
  • Multimedia (video clip/image) as the answer
  • Multimedia and lexical information combined to provide enhanced understanding for answering questions
Caption: Ronaldo seals Brazil's place in the last eight with a shot through Geert de Vlieger's legs late on to eliminate Belgium
Question: What color jersey did Brazil wear in the World Cup?
Approach
• Feature extraction
  • High- and low-level features
• Object recognition
  • Auto annotation of images
• Object semantics extraction
  • Locative, temporal, etc.
• Build a knowledge representation from image/video
• Merge with the audio/text knowledge representation
  • Lexical information from ASR and VOCR
• Provide multimedia Q/A based on multimedia ontologies
Automatic Image Annotation
Task of automatically assigning words to an image that describe the contents of the image
• Most models exploit the correlation between images and words
• Exploit the correlation between the annotation words themselves to
  • Define visual vocabularies
  • Develop hierarchical models for automatic image annotation
Use ontological information about annotation words to improve image annotation
Prior Work: Translation Models
• Models for translating the visual representation of a concept to its textual representation (Duygulu et al., 2002)
• Based on the Brown model for machine translation (Brown et al., 1993)
• Image features translate to annotation words
• K-means used to cluster image features to generate blobs
• Dependencies between blobs and words are not explicitly captured
Use ontology to drive the definition of blobs
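The translation idea above can be illustrated with a minimal sketch in the style of IBM Model 1, estimating P(word | blob) by EM from images given as (blobs, words) pairs. This is an illustration under stated assumptions, not the exact model of Duygulu et al.: the function name `train_translation`, the uniform initialisation, and the toy corpus format are all hypothetical, and every image is assumed to have at least one blob and one word.

```python
from collections import defaultdict

def train_translation(corpus, n_iter=10):
    """corpus: list of (blobs, words) pairs, one per image.
    Returns t with t[blob][word] approximating P(word | blob)."""
    words = {w for _, ws in corpus for w in ws}
    blobs = {b for bs, _ in corpus for b in bs}
    # Assumption: start from a uniform translation table.
    t = {b: {w: 1.0 / len(words) for w in words} for b in blobs}
    for _ in range(n_iter):
        counts = {b: defaultdict(float) for b in blobs}
        for bs, ws in corpus:
            for w in ws:
                # E-step: share each word among the blobs of the same image.
                z = sum(t[b][w] for b in bs)
                for b in bs:
                    counts[b][w] += t[b][w] / z
        for b in blobs:
            # M-step: renormalise the expected counts per blob.
            total = sum(counts[b].values())
            t[b] = {w: counts[b][w] / total for w in words}
    return t
```

To annotate a new image with such a table, one could sum t[b][w] over the image's blobs and rank the words by the result.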
Prior Work: HACM Model
• Hierarchical Aspect Cluster Model (T. Hofmann, 1998)
• Induces a hierarchical structure from the co-occurrence of image features
• Topology is externally defined
• Depth of the induced hierarchy is user selected
• Levels define the generality of the concept expressed in regions and words
The hierarchies defined in ontologies have well-defined semantics
Image feature hierarchy induced from a text ontology
Prior Work: Classification Approaches
Estimate P(w|I) to classify an image I (represented by image features) into one of the classes (annotation word w)
• Generative Models
• Flat classification: learn one classifier per annotation word
  • SVM classifier (Cusano et al., 2004)
• Discriminative Models
  • Jeon and Manmatha (2004) showed improvements over translation using Maximum Entropy models
  • Unigram (blob, word) and bigram (horizontal blob pair, word) features
Explore hierarchical classification using ontology
Image Representation using Visual Vocabulary
• Image Segmentation
  • Image regions corresponding to objects in the image
  • Grid-based image segmentation
• Feature Extraction
  • Extract image features from image regions
  • Color, Shape, Texture
• Image Representation
  • Real-valued feature vectors
  • Visual vocabulary derived by clustering feature vectors
  • Cluster centers (blobs) define the vocabulary
[Figure: pipeline diagram with stages Image, Image Segmentation, Feature Extraction, Image Representation]
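As a rough sketch of how a visual vocabulary can be derived, the code below clusters region feature vectors with plain K-means and then maps each region to the index of its nearest cluster center (its blob). The function names `kmeans` and `quantize`, the random initialisation, and the fixed iteration count are assumptions made for illustration; the slides do not specify these details.

```python
import numpy as np

def quantize(features, centers):
    """Map each region feature vector to the index of its nearest center (blob)."""
    dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def kmeans(features, k, n_iter=20, seed=0):
    """Plain Lloyd's K-means over region features; returns (centers, assignments)."""
    rng = np.random.default_rng(seed)
    # Assumption: initialise centers from k distinct, randomly chosen regions.
    centers = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(n_iter):
        assign = quantize(features, centers)
        for c in range(k):
            members = features[assign == c]
            if len(members):  # keep the old center if a cluster becomes empty
                centers[c] = members.mean(axis=0)
    return centers, quantize(features, centers)
```

An image is then represented by the blob indices of its regions, for example `quantize(image_region_features, centers)`.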
Visual Vocabulary from Ontologies
• Image regions are organized in the hierarchy based on the image annotation
• Image attributes of child nodes are related to the parent node's image attributes
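One way to picture this organization is sketched below: each region is credited to its annotation word and to every ancestor of that word in the IS-A hierarchy, and each concept gets the mean feature vector of its regions, giving one candidate blob per concept. The `parent` dictionary, the single word label per region, and the function names are hypothetical simplifications, not details taken from the slides.

```python
import numpy as np

def ancestors(word, parent):
    """Yield the word and each of its ancestors up to the root of the hierarchy."""
    node = word
    while node is not None:
        yield node
        node = parent.get(node)  # None once the root is passed

def concept_centers(features, words, parent):
    """Return {concept: mean feature vector of regions at or below that concept}."""
    sums, counts = {}, {}
    for f, w in zip(features, words):
        for concept in ancestors(w, parent):
            sums[concept] = sums.get(concept, 0) + f
            counts[concept] = counts.get(concept, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}
```

Because a parent's center averages over all regions of its children, parent and child attributes stay related by construction. The resulting centers could serve as the initial blobs for K-means.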
Using Ontologies in Translation Models for Automatic Image Annotation
• Ontology-induced visual vocabulary
  • Annotation word hierarchy used in selecting the initial set of blobs for K-means clustering
• Ontology-weighted K-means clustering
  • Weight the cluster membership of image regions in the estimation of cluster centers (blobs)
Notation
• n(w,c) – number of image regions in cluster c associated with word w
• n(c) – number of image regions in cluster c
• f(r) – feature vector for region r
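The weighting formula itself is not preserved in the slide text, so the sketch below shows only one plausible reading built from the quantities defined above: each region r with word w in cluster c is weighted by n(w,c)/n(c), and the cluster center is the weighted mean of the f(r). Both the weight and the one-word-per-region simplification are assumptions, not the authors' stated method.

```python
import numpy as np
from collections import Counter

def weighted_centers(features, words, assign, k):
    """Recompute cluster centers as weighted means of region features.
    Assumed weight of a region with word w in cluster c: n(w,c) / n(c)."""
    centers = np.zeros((k, features.shape[1]))
    for c in range(k):
        idx = np.flatnonzero(assign == c)
        if len(idx) == 0:
            continue  # empty cluster: leave its center at zero in this sketch
        n_c = len(idx)                           # n(c)
        n_wc = Counter(words[i] for i in idx)    # n(w,c) for each word w
        weights = np.array([n_wc[words[i]] / n_c for i in idx])
        centers[c] = (weights[:, None] * features[idx]).sum(axis=0) / weights.sum()
    return centers
```

Under this reading, regions whose word dominates a cluster pull the center toward themselves, while regions with rare words in that cluster contribute less.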
Image Annotation by Hierarchical Classification
• Based on the hierarchical approach to text classification (McCallum et al., 1998)
• Statistical back-off model induced by the hierarchy derived from the annotation word ontology
• Given an image I represented by its blob sequence, the model estimates the probability P(w|I) of each word w
• Assuming a Bernoulli model for annotations, the blob likelihood given a word is estimated from the training set
Notation
• V – visual vocabulary
• T – training set of annotated images
• W – set of annotation words
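Since the slide's equations are not preserved, the following is a generic multivariate Bernoulli naive Bayes sketch of the flat (non-hierarchical) case: blob likelihoods are estimated from blob presence in training images annotated with each word, and P(w|I) follows from Bayes' rule. The Laplace smoothing, the prior estimate, and the function names are assumptions for illustration.

```python
import numpy as np

def train_bernoulli(X, Y, alpha=1.0):
    """X: (images, |V|) binary blob presence; Y: (images, |W|) binary annotations.
    Returns (log_prior, theta) with theta[w, b] = P(blob b present | word w)."""
    n_w = Y.sum(axis=0)  # number of training images annotated with each word
    # Laplace-smoothed estimate keeps every theta strictly between 0 and 1.
    theta = (Y.T @ X + alpha) / (n_w[:, None] + 2 * alpha)
    log_prior = np.log((n_w + alpha) / (n_w.sum() + alpha * len(n_w)))
    return log_prior, theta

def word_posteriors(x, log_prior, theta):
    """Return P(w | image) for a binary blob-presence vector x of length |V|."""
    log_lik = x @ np.log(theta).T + (1 - x) @ np.log(1 - theta).T
    scores = log_prior + log_lik
    scores -= scores.max()  # subtract the max for numerical stability
    p = np.exp(scores)
    return p / p.sum()
```

The hierarchical variant replaces `theta` for each word with an estimate smoothed along the word's path in the ontology.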
Image Annotation using Hierarchical Classification (contd.)
• The IS-A hierarchy among annotation words is used to estimate the blob-likelihood probability
• Feature weights learned using the EM algorithm
[Figure: excerpt of the IS-A hierarchy with nodes ROOT, …, animal, feline, cat and the words tiger, cougar, leopard, lion, lynx]
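A minimal sketch of how such a back-off could work, in the spirit of shrinkage for text classification: the blob likelihood for a word is a weighted mixture of the estimates at each node on its path to the root, and the mixture weights are fitted with EM on held-out images. The array layout, the use of held-out data, and the function names are assumptions; the slides do not give the exact formulation.

```python
import numpy as np

def em_shrinkage_weights(level_theta, heldout, n_iter=50):
    """level_theta: (levels, |V|) P(blob present | node) for each node on the
    path from a word to the root, all strictly between 0 and 1.
    heldout: (n, |V|) binary blob-presence vectors of held-out images for the word.
    Returns one mixture weight per level, summing to 1."""
    levels = level_theta.shape[0]
    lam = np.full(levels, 1.0 / levels)
    # Likelihood of every held-out blob observation under every level.
    lik = np.where(heldout[None] == 1,
                   level_theta[:, None, :],
                   1 - level_theta[:, None, :])
    for _ in range(n_iter):
        # E-step: responsibility of each level for each observation.
        resp = lam[:, None, None] * lik
        resp /= resp.sum(axis=0, keepdims=True)
        # M-step: new weights are the normalised total responsibilities.
        lam = resp.sum(axis=(1, 2))
        lam /= lam.sum()
    return lam

def shrunk_theta(level_theta, lam):
    """Blob likelihoods smoothed along the hierarchy path."""
    return lam @ level_theta
```

For a word such as tiger, the rows of `level_theta` would hold the estimates for tiger, cat, feline, animal and so on up to ROOT, so sparse leaf estimates borrow strength from their ancestors.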
Experiments
• Corel Data Set
  • Annotated images using pre-processed data from (Duygulu et al., 2002)
  • 4500 images annotated using 374 words
  • 4000 for training; 500 for testing
• Image Representation
  • Image segmentation using N-cuts (Duygulu et al., 2002)
  • 36 different image features represent each image region
• Ontology: WordNet
  • Hierarchy with 714 unique concepts was induced from the 374 annotation words
Image Annotation Evaluation
Annotation systems predict P(w|I)
• A cut-off or threshold is required to assign annotations
  • Unnormalized: take the top 5 words
  • Normalized: take the top m words, where m is the number of annotations for I
Metrics
• Number of words with positive recall
• Mean per-word precision and recall over
  • All words in the dictionary
  • A selected set of words
    • Retrieved: words retrieved using the method
    • Common: words predicted by all annotation systems
    • Union: all words predicted by at least one annotation system
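The evaluation protocol can be sketched as follows: pick the top m words per image from the predicted scores, then compute per-word precision and recall over the test images and count the words with positive recall. The function names and the convention of scoring 0 when a word is never predicted (or never present) are assumptions for illustration.

```python
def annotate(scores, m=5):
    """Return the m highest-scoring words; scores maps word -> P(w | I)."""
    return set(sorted(scores, key=scores.get, reverse=True)[:m])

def evaluate(predicted, truth, vocabulary):
    """predicted, truth: lists of word sets, one per test image.
    Returns (mean per-word precision, mean per-word recall,
    number of words with positive recall) over the given vocabulary."""
    precisions, recalls, positive = [], [], 0
    for w in vocabulary:
        n_pred = sum(w in p for p in predicted)
        n_true = sum(w in t for t in truth)
        n_correct = sum(w in p and w in t for p, t in zip(predicted, truth))
        # Assumed convention: 0 when the denominator is empty.
        precisions.append(n_correct / n_pred if n_pred else 0.0)
        recalls.append(n_correct / n_true if n_true else 0.0)
        positive += int(n_correct > 0)
    n = len(vocabulary)
    return sum(precisions) / n, sum(recalls) / n, positive
```

Passing the full dictionary as `vocabulary` gives the all-words figures; passing the retrieved, common, or union word sets gives the selected-set figures.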
Results: Translation Models and Ontologies
• Precision/recall numbers are averages over a "pooled" set of 42 words
• Observations
  • Using ontologies increases the number of words predicted with positive recall
  • Hierarchy-based initial clusters attach better semantics to clusters
  • Results for ontology-induced clusters are based on "one blob per concept"
Results: Classification Approaches and Ontologies
Comparing flat classification versus hierarchical classification for image annotation
• Precision/recall numbers correspond to using the KM-500 visual vocabulary
• Observations
  • Improved precision (10%) and recall (14%) values
  • Increase in the number of annotations with positive recall
  • Hierarchy derived from the annotation ontology results in improved performance
Results: Hierarchical Classification with Ontology-induced Visual Vocabularies
• Hierarchical approach improves precision/recall values on different visual vocabularies
• ONT-714 has improved positive recall numbers
• Ontologies defined on text annotations provide a good framework for developing hierarchical models for image features
Results: Comparing Translation and Classification Approaches
• Comparison based on common annotation words predicted by different models
• Significant improvement in recall using classification approaches
Ontologies in Automatic Image Annotation
• Experimental Results
  • Ontology in translation model
    • 19.5% increase in average precision
    • 13% increase in average recall
  • Ontology in classification
    • 10% increase in average precision
    • 14% increase in average recall
Using word hierarchies improves annotation results when they are used
• as a source for selecting initial blobs, and
• as a framework for hierarchical classification
Summary and Future Work
• Proposed methods for using ontologies in automatic image annotation
  • Translation models: defining the visual vocabulary
  • Hierarchical classification models: providing the hierarchy for models defined on image features
• Explore the use of ontologies in other approaches to automatic image annotation
  • Discriminative models
• Exploit the dependence between annotation words in automatic image annotation
  • Correlation between the annotation words of an image can be exploited
Summary and Future Work (contd.)
• Utilize hierarchical organization of concepts and language models on image blobs to develop multi-modal ontologies
• Use multi-modal ontologies in Q/A
Multimedia Ontology: Example Node
• Transportation WordNet hierarchy with multimedia data