How Many Words is a Picture Worth? Automatic Caption Generation for News Images

How Many Words is a Picture Worth? Automatic Caption Generation for News Images Yansong Feng and Mirella Lapata Ashish Bagate

What this paper is about • Explore the feasibility of automatic caption generation for images in the news domain • Why the news domain in particular: training data is easily and abundantly available

Why • Lots of digital images available on the Web • Improved searching • Analysis of the image • Keyword-only searches are ambiguous • Targeted queries using longer search strings • Web accessibility

General Approach • Two-step process • Analyze the image and build a representation of it • Run the text generation engine on the image representation to produce a natural language description

Related Work • Hede et al. – not practical because of the controlled data set and the manually created database • Yao et al. – based on just the image • Elzer et al. – what the graphic depicts, little emphasis on graphics generation • These methods rely on some background information or terminologies

Problem Formulation • For a given image I and document D, generate a caption C • Training data consists of document, image, caption tuples • Caption generation is a difficult task even for humans • A good caption must be succinct and informative, clearly identify the subject of the picture, and draw the reader into the article

Overview of the method • Similar to the headline generation task • Get the training data (it will be noisy) • Follows a two-stage approach • Get keywords for the image (image annotation model) • Generate the caption from those image keywords • Use image features to produce faithful and meaningful descriptions of the images

Image Annotation • Probabilistic model: well suited to noisy data • Compute SIFT descriptors of the images • Form visual words by k-means clustering • Get the keywords with LDA • dmix: a bag of words representing image, document, and caption together
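
The visual-word step above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: toy 2-D points stand in for 128-D SIFT descriptors, the centroids are assumed to come from an earlier k-means run, and the names `nearest_centroid`, `bag_of_visual_words`, and `d_mix` are hypothetical. The LDA step is not shown.

```python
import math
from collections import Counter


def nearest_centroid(descriptor, centroids):
    """Return the index of the centroid closest to the descriptor (Euclidean)."""
    return min(range(len(centroids)),
               key=lambda k: math.dist(descriptor, centroids[k]))


def bag_of_visual_words(descriptors, centroids):
    """Quantize each local descriptor to a visual word and count occurrences."""
    return Counter(f"vw{nearest_centroid(d, centroids)}" for d in descriptors)


# Hypothetical toy data: 2-D points stand in for SIFT descriptors, and the
# centroids are assumed to be the output of a previous k-means clustering.
centroids = [(0.0, 0.0), (10.0, 10.0)]
descriptors = [(0.5, 1.0), (9.0, 11.0), (1.0, 0.0)]
visual_words = bag_of_visual_words(descriptors, centroids)

# One mixed bag of words: visual words pooled with document/caption tokens.
d_mix = visual_words + Counter("the president spoke".split())
```

Treating quantized descriptors as ordinary tokens is what lets a text-oriented topic model such as LDA operate over image and text jointly.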

Extractive Caption Generation • Not much linguistic analysis is needed • The caption is the sentence from the document that is maximally similar to the description keywords
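
A minimal sketch of extractive selection, assuming the image keywords are already available and using a Jaccard-style word overlap as the similarity. The tokenizer, the function names, and the example sentences are hypothetical and chosen only for illustration.

```python
def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (w.strip(".,!?;:\"'").lower() for w in text.split())
    return [t for t in tokens if t]


def word_overlap(keywords, sentence):
    """Jaccard-style overlap between the keyword set and the sentence's words."""
    kw, words = set(keywords), set(tokenize(sentence))
    union = kw | words
    return len(kw & words) / len(union) if union else 0.0


def extract_caption(keywords, sentences):
    """Pick the document sentence most similar to the image keywords."""
    return max(sentences, key=lambda s: word_overlap(keywords, s))


# Hypothetical keywords (as an annotation model might return) and document.
keywords = ["minister", "parliament", "vote"]
sentences = [
    "The weather was mild on Tuesday.",
    "The minister lost the vote in parliament.",
    "Markets closed higher.",
]
caption = extract_caption(keywords, sentences)
```

Swapping `word_overlap` for another scoring function changes the similarity measure without touching the selection logic.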

Types of Similarities • Word Overlap • Cosine Similarity • Probabilistic Similarity • KL divergence – similarity between an image and a sentence is measured by the extent to which they share the same topic distributions
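
Two of the measures listed above can be sketched as follows, assuming bag-of-words counts for cosine similarity and already-inferred topic distributions for KL divergence. The function names and the smoothing constant `eps` are assumptions for illustration, not details from the paper.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two distributions over the same topics.

    eps is a hypothetical floor that avoids division by zero when q
    assigns no mass to a topic that p uses.
    """
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


image_words = Counter(["minister", "vote", "vote"])
sentence_words = Counter(["vote", "parliament"])
cos = cosine_similarity(image_words, sentence_words)

# Hypothetical topic distributions for an image and a candidate sentence.
image_topics = [0.7, 0.2, 0.1]
sentence_topics = [0.6, 0.3, 0.1]
kl = kl_divergence(image_topics, sentence_topics)
```

Note that KL divergence is a dissimilarity: a smaller value means the image and the sentence share more similar topic distributions, so the best sentence minimizes it.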

Issues with Extractive Caption Generation • No single sentence can represent the image • Selected caption sentences might be longer than the average sentence in the document • May not be catchy

Abstractive Caption Generation • Word based model • Adapted from headline generation • Caption = the sequence of words that maximizes P

Abstractive Caption Generation • Phrase based model • Caption = the sequence of words that maximizes P

Evaluation…

Evaluation

Thanks!
