This presentation discusses advanced methods for object recognition, focusing on Bag-of-Features models and their applications. It explores the concept of visual vocabulary, where images are represented as collections of features, akin to bags of words in text processing. Key algorithms like K-means clustering and classifiers such as Support Vector Machines (SVM) are highlighted for categorizing images. The use of these models facilitates the representation and classification of images from various categories, enhancing capabilities in computer vision and machine learning.
790-133 Recognizing People, Objects, & Actions Tamara Berg Object Recognition – BoF models
Topic Presentations • Hopefully you have met your topic presentation group members? • Group 1 – see me to run through slides this week, or Monday at the latest (I’m traveling Thursday/Friday). Send me links to 2-3 papers for the class to read. • Sign up for the class Google group (790-133). To find the group, go to groups.google.com and search for 790-133 (sorted by date). Use this to post/answer questions related to the class.
Object Bag-of-features models Bag of ‘features’ source: Svetlana Lazebnik
Exchangeability • De Finetti’s theorem of exchangeability (the “bag of words” theorem): the joint probability distribution underlying the data is invariant to permutations of the observations.
Origin 2: Bag-of-words models • Orderless document representation: frequencies of words from a dictionary (Salton & McGill, 1983) • Example: US Presidential Speeches tag cloud (http://chir.ag/phernalia/preztags/) source: Svetlana Lazebnik
Bag of words for text • Represent documents as a “bag of words”
Example • Doc1 = “the quick brown fox jumped” • Doc2 = “brown quick jumped fox the” Would a bag of words model represent these two documents differently?
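A quick sketch makes the answer concrete: a bag-of-words model keeps only word frequencies, so the two documents above reduce to the same representation. (Plain Python; `Counter` is just one convenient way to build the bag.)

```python
from collections import Counter

doc1 = "the quick brown fox jumped"
doc2 = "brown quick jumped fox the"

# A bag of words is a word-frequency count; word order is discarded.
bow1 = Counter(doc1.split())
bow2 = Counter(doc2.split())

print(bow1 == bow2)  # True: the model cannot tell these documents apart
```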
Bag of words for images • Represent images as a “bag of features”
Bag of features: outline • Extract features source: Svetlana Lazebnik
Bag of features: outline • Extract features • Learn “visual vocabulary” source: Svetlana Lazebnik
Bag of features: outline • Extract features • Learn “visual vocabulary” • Represent images by frequencies of “visual words” source: Svetlana Lazebnik
… 2. Learning the visual vocabulary Clustering Slide credit: Josef Sivic
… 2. Learning the visual vocabulary Visual vocabulary Clustering Slide credit: Josef Sivic
K-means clustering (reminder) • Want to minimize sum of squared Euclidean distances between points xi and their nearest cluster centers mk • Algorithm: • Randomly initialize K cluster centers • Iterate until convergence: • Assign each data point to the nearest center • Recompute each cluster center as the mean of all points assigned to it source: Svetlana Lazebnik
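The two alternating steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production clusterer; the function name, iteration count, and seeding scheme are my own choices.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Minimal K-means sketch: alternate assignment and mean-update steps."""
    rng = np.random.default_rng(seed)
    # Randomly pick k distinct data points as the initial cluster centers.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign each point to the nearest center (squared Euclidean distance).
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute each center as the mean of the points assigned to it.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels
```

Each iteration can only decrease the sum of squared distances, which is why the loop converges (possibly to a local minimum, hence the random restart strategies used in practice).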
Example visual vocabulary Fei-Fei et al. 2005
Image Representation • For a query image: • Extract features • Associate each feature with the nearest cluster center (visual word) in the visual vocabulary • Accumulate visual word frequencies over the image
….. 3. Image representation frequency codewords source: Svetlana Lazebnik
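The quantization-plus-histogram step can be sketched as follows. The function name and the choice to normalize the histogram are illustrative; the slides only specify accumulating visual-word frequencies.

```python
import numpy as np

def bag_of_features(descriptors, vocabulary):
    """Quantize each descriptor to its nearest visual word and return a
    normalized frequency histogram over the vocabulary (the codewords)."""
    # Squared Euclidean distance from every descriptor to every visual word.
    d = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d.argmin(axis=1)  # index of the nearest visual word per feature
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()  # frequencies sum to 1
```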
….. 4. Image classification CAR frequency codewords Given the bag-of-features representations of images from different classes, how do we learn a model for distinguishing them? source: Svetlana Lazebnik
Image Categorization What is this? helicopter Choose from many categories
Image Categorization • SVM/NB – Csurka et al. (Caltech 4/7) • Nearest Neighbor – Berg et al. (Caltech 101) • Kernel + SVM – Grauman et al. (Caltech 101) • Multiple Kernel Learning + SVMs – Varma et al. (Caltech 101) • … What is this? Choose from many categories
Visual Categorization with Bags of Keypoints Gabriella Csurka, Christopher R. Dance, Lixin Fan, Jutta Willamowski, Cédric Bray
Data • Images in 7 classes: faces, buildings, trees, cars, phones, bikes, books • Caltech 4 dataset: faces, airplanes, cars (rear and side), motorbikes, background
Method Steps: • Detect and describe image patches. • Assign patch descriptors to a set of predetermined clusters (a visual vocabulary). • Construct a bag of keypoints, which counts the number of patches assigned to each cluster. • Apply a classifier (SVM or Naïve Bayes), treating the bag of keypoints as the feature vector. • Determine which category or categories to assign to the image.
Bag-of-Keypoints Approach Interesting Point Detection Key Patch Extraction Feature Descriptors Bag of Keypoints Multi-class Classifier Slide credit: Yun-hsueh Liu
SIFT Descriptors Slide credit: Yun-hsueh Liu
Bag of Keypoints (1) • Construction of a vocabulary • K-means clustering finds “centroids” (over all the descriptors found in all the training images) • Define the “vocabulary” as the set of centroids, where each centroid represents a “word”. Slide credit: Yun-hsueh Liu
Bag of Keypoints (2) • Histogram • Counts the number of occurrences of different visual words in each image Slide credit: Yun-hsueh Liu
Multi-class Classifier • In this paper, classification is based on conventional machine learning approaches: • Support Vector Machine (SVM) • Naïve Bayes Slide credit: Yun-hsueh Liu
Reminder: Linear SVM • Decision boundary: wᵀx + b = 0 • Margin bounded by the hyperplanes wᵀx + b = 1 and wᵀx + b = -1 • Support vectors are the training points lying on the margin hyperplanes. Slide credit: Jinwei Gu
Nonlinear SVMs: The Kernel Trick • Map inputs into a higher-dimensional feature space via x → φ(x); the discriminant function becomes g(x) = wᵀφ(x) + b = Σᵢ αᵢ yᵢ K(xᵢ, x) + b • No need to know this mapping explicitly, because we only use dot products of feature vectors in both training and testing. • A kernel function is a function that corresponds to a dot product of two feature vectors in some expanded feature space: K(xᵢ, xⱼ) = φ(xᵢ)ᵀφ(xⱼ) Slide credit: Jinwei Gu
Nonlinear SVMs: The Kernel Trick • Examples of commonly used kernel functions: • Linear kernel: K(xᵢ, xⱼ) = xᵢᵀxⱼ • Polynomial kernel: K(xᵢ, xⱼ) = (xᵢᵀxⱼ + 1)ᵖ • Gaussian (Radial Basis Function, RBF) kernel: K(xᵢ, xⱼ) = exp(-‖xᵢ - xⱼ‖² / 2σ²) • Sigmoid: K(xᵢ, xⱼ) = tanh(κ xᵢᵀxⱼ + θ) Slide credit: Jinwei Gu
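These kernels are easy to write down directly. A small sketch, with illustrative hyperparameter defaults (p, σ, κ, θ are my choices, not values from the slides):

```python
import numpy as np

def linear_kernel(x, y):
    # K(x, y) = x . y
    return x @ y

def polynomial_kernel(x, y, p=2):
    # K(x, y) = (x . y + 1)^p
    return (x @ y + 1) ** p

def rbf_kernel(x, y, sigma=1.0):
    # K(x, y) = exp(-||x - y||^2 / (2 sigma^2)); equals 1 when x == y
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def sigmoid_kernel(x, y, kappa=1.0, theta=0.0):
    # K(x, y) = tanh(kappa * x . y + theta)
    return np.tanh(kappa * (x @ y) + theta)
```

Any of these can be plugged into the discriminant g(x) = Σᵢ αᵢ yᵢ K(xᵢ, x) + b without ever computing φ explicitly.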
SVM for image classification • Train k binary 1-vs-all SVMs (one per class) • For a test instance, evaluate with each classifier • Assign the instance to the class with the largest SVM output
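The 1-vs-all scheme can be sketched as below. As a stand-in for a dedicated SVM solver, this trains each binary classifier by full-batch subgradient descent on the regularized hinge loss (which is one way to fit a linear SVM); function names and hyperparameters are illustrative.

```python
import numpy as np

def train_ova_svms(X, y, k, epochs=200, lr=0.1, lam=0.01):
    """Train k binary 1-vs-all linear SVMs (one per class) by subgradient
    descent on the L2-regularized hinge loss."""
    n, d = X.shape
    W, b = np.zeros((k, d)), np.zeros(k)
    for c in range(k):
        t = np.where(y == c, 1.0, -1.0)        # class c vs. all other classes
        for _ in range(epochs):
            viol = t * (X @ W[c] + b[c]) < 1   # points violating the margin
            grad_w = lam * W[c] - (t[viol][:, None] * X[viol]).sum(axis=0) / n
            grad_b = -t[viol].sum() / n
            W[c] -= lr * grad_w
            b[c] -= lr * grad_b
    return W, b

def predict_ova(X, W, b):
    """Evaluate every classifier; assign each instance to the class
    with the largest SVM output."""
    return (X @ W.T + b).argmax(axis=1)
```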
Naïve Bayes Model C – class label, F1 … Fn – features. We only specify (parameters): • P(C) – prior over class labels • P(Fi | C) – how each feature depends on the class
Example: Slide from Dan Klein
Percentage of documents in training set labeled as spam/ham Slide from Dan Klein
In the documents labeled as spam, occurrence percentage of each word (e.g. # times “the” occurred/# total words). Slide from Dan Klein
In the documents labeled as ham, occurrence percentage of each word (e.g. # times “the” occurred/# total words). Slide from Dan Klein
Classification • Choose the class that maximizes: c* = argmaxc P(c) ∏i P(fi | c)
Classification • In practice
Classification • In practice • Multiplying lots of small probabilities can result in floating point underflow
Classification • In practice • Multiplying lots of small probabilities can result in floating point underflow • Since log(xy) = log(x) + log(y), we can sum log probabilities instead of multiplying probabilities.
Classification • In practice • Multiplying lots of small probabilities can result in floating point underflow • Since log(xy) = log(x) + log(y), we can sum log probabilities instead of multiplying probabilities. • Since log is a monotonic function, the class with the highest score does not change.
Classification • In practice • Multiplying lots of small probabilities can result in floating point underflow • Since log(xy) = log(x) + log(y), we can sum log probabilities instead of multiplying probabilities. • Since log is a monotonic function, the class with the highest score does not change. • So, what we usually compute in practice is: c* = argmaxc [ log P(c) + Σi log P(fi | c) ]
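The log-space decision rule above is a one-liner once the parameters are stored as log-probabilities. A minimal sketch (the function name and array layout, with one row of word log-likelihoods per class, are my own conventions):

```python
import numpy as np

def nb_classify(hist, log_prior, log_lik):
    """Naive Bayes in log space: score each class c with
    log P(c) + sum_i count_i * log P(word_i | c), then take the argmax.
    Summing logs avoids the underflow of multiplying many small probabilities.
    hist: (V,) word counts; log_prior: (k,); log_lik: (k, V)."""
    return int(np.argmax(log_prior + hist @ log_lik.T))
```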
Naïve Bayes C – class label, F1 … Fn – features. We only specify (parameters): • P(C) – prior over class labels • P(Fi | C) – how each feature depends on the class
Naïve Bayes Parameters Problem: Categorize images as one of k object classes using a Naïve Bayes classifier: • Classes: object categories (face, car, bicycle, etc.) • Features: images represented as histograms of visual words; F1 … Fn are visual words. • P(C): treated as uniform. • P(Fi | C): learned from training data (images labeled with their category); the probability of a visual word given an image category.
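Learning P(visual word | class) amounts to counting word occurrences per class. The slides do not specify the exact estimator, so this sketch uses add-alpha (Laplace) smoothing, a common choice that keeps unseen words from getting zero probability; the function name and alpha default are illustrative.

```python
import numpy as np

def estimate_word_likelihoods(histograms, labels, k, alpha=1.0):
    """Estimate P(visual word | class) from labeled training histograms.
    histograms: (n_images, V) visual-word counts; labels: (n_images,).
    Laplace (add-alpha) smoothing gives unseen words nonzero mass."""
    V = histograms.shape[1]
    counts = np.zeros((k, V))
    for c in range(k):
        # Total count of each visual word over all images of class c.
        counts[c] = histograms[labels == c].sum(axis=0)
    # Smoothed relative frequencies; each row sums to 1.
    return (counts + alpha) / (counts.sum(axis=1, keepdims=True) + alpha * V)
```

Taking logs of these rows (and of the class prior) yields exactly the parameters needed by the log-space classification rule.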