This presentation discusses advanced methods for object recognition, focusing on Bag-of-Features models and their applications. It explores the concept of visual vocabulary, where images are represented as collections of features, akin to bags of words in text processing. Key algorithms like K-means clustering and classifiers such as Support Vector Machines (SVM) are highlighted for categorizing images. The use of these models facilitates the representation and classification of images from various categories, enhancing capabilities in computer vision and machine learning.
790-133 Recognizing People, Objects, & Actions Tamara Berg Object Recognition – BoF models
Topic Presentations • Hopefully you have met your topic presentation group members? • Group 1 – see me to run through slides this week, or Monday at the latest (I’m traveling Thursday/Friday). Send me links to 2-3 papers for the class to read. • Sign up for the class Google group (790-133). To find the group, go to groups.google.com and search for 790-133 (sorted by date). Use this to post/answer questions related to the class.
Object Bag-of-features models Bag of ‘features’ source: Svetlana Lazebnik
Exchangeability • De Finetti’s theorem of exchangeability (the “bag of words” theorem): the joint probability distribution underlying the data is invariant to permutations of the observations.
Origin 2: Bag-of-words models • Orderless document representation: frequencies of words from a dictionary (Salton & McGill, 1983) • Example: US Presidential Speeches tag cloud (http://chir.ag/phernalia/preztags/) source: Svetlana Lazebnik
Bag of words for text • Represent documents as a “bag of words”
Example • Doc1 = “the quick brown fox jumped” • Doc2 = “brown quick jumped fox the” Would a bag of words model represent these two documents differently?
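A quick sketch makes the answer concrete: a bag-of-words model keeps only word frequencies, so the two documents above reduce to the same representation. (Plain Python; `Counter` is just one convenient way to build the bag.)

```python
from collections import Counter

doc1 = "the quick brown fox jumped"
doc2 = "brown quick jumped fox the"

# A bag of words is a word-frequency count; word order is discarded.
bow1 = Counter(doc1.split())
bow2 = Counter(doc2.split())

print(bow1 == bow2)  # True: the model cannot tell these documents apart
```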
Bag of words for images • Represent images as a “bag of features”
Bag of features: outline • Extract features source: Svetlana Lazebnik
Bag of features: outline • Extract features • Learn “visual vocabulary” source: Svetlana Lazebnik
Bag of features: outline • Extract features • Learn “visual vocabulary” • Represent images by frequencies of “visual words” source: Svetlana Lazebnik
… 2. Learning the visual vocabulary Clustering Slide credit: Josef Sivic
… 2. Learning the visual vocabulary Visual vocabulary Clustering Slide credit: Josef Sivic
K-means clustering (reminder) • Want to minimize sum of squared Euclidean distances between points xi and their nearest cluster centers mk • Algorithm: • Randomly initialize K cluster centers • Iterate until convergence: • Assign each data point to the nearest center • Recompute each cluster center as the mean of all points assigned to it source: Svetlana Lazebnik
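The two alternating steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production clusterer; the function name, iteration count, and seeding scheme are my own choices.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Minimal K-means sketch: alternate assignment and mean-update steps."""
    rng = np.random.default_rng(seed)
    # Randomly pick k distinct data points as the initial cluster centers.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign each point to the nearest center (squared Euclidean distance).
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute each center as the mean of the points assigned to it.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels
```

Each iteration can only decrease the sum of squared distances, which is why the loop converges (possibly to a local minimum, hence the random restart strategies used in practice).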
Example visual vocabulary Fei-Fei et al. 2005
Image Representation • For a query image: • Extract features • Associate each feature with the nearest cluster center (visual word) in the visual vocabulary • Accumulate visual word frequencies over the image
….. 3. Image representation frequency codewords source: Svetlana Lazebnik
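The quantization-plus-histogram step can be sketched as follows. The function name and the choice to normalize the histogram are illustrative; the slides only specify accumulating visual-word frequencies.

```python
import numpy as np

def bag_of_features(descriptors, vocabulary):
    """Quantize each descriptor to its nearest visual word and return a
    normalized frequency histogram over the vocabulary (the codewords)."""
    # Squared Euclidean distance from every descriptor to every visual word.
    d = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d.argmin(axis=1)  # index of the nearest visual word per feature
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()  # frequencies sum to 1
```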
….. 4. Image classification CAR frequency codewords Given the bag-of-features representations of images from different classes, how do we learn a model for distinguishing them? source: Svetlana Lazebnik
Image Categorization What is this? helicopter Choose from many categories
Image Categorization • SVM/NB – Csurka et al. (Caltech 4/7) • Nearest Neighbor – Berg et al. (Caltech 101) • Kernel + SVM – Grauman et al. (Caltech 101) • Multiple Kernel Learning + SVMs – Varma et al. (Caltech 101) • … What is this? Choose from many categories
Visual Categorization with Bags of Keypoints Gabriella Csurka, Christopher R. Dance, Lixin Fan, Jutta Willamowski, Cédric Bray
Data • Images in 7 classes: faces, buildings, trees, cars, phones, bikes, books • Caltech 4 dataset: faces, airplanes, cars (rear and side), motorbikes, background
Method Steps: • Detect and describe image patches. • Assign patch descriptors to a set of predetermined clusters (a visual vocabulary). • Construct a bag of keypoints, which counts the number of patches assigned to each cluster. • Apply a classifier (SVM or Naïve Bayes), treating the bag of keypoints as the feature vector. • Determine which category or categories to assign to the image.
Bag-of-Keypoints Approach Interesting Point Detection Key Patch Extraction Feature Descriptors Bag of Keypoints Multi-class Classifier Slide credit: Yun-hsueh Liu
SIFT Descriptors Slide credit: Yun-hsueh Liu
Bag of Keypoints (1) • Construction of a vocabulary • K-means clustering finds “centroids” (over all the descriptors found in all the training images) • Define the “vocabulary” as the set of centroids, where each centroid represents a “word”. Slide credit: Yun-hsueh Liu
Bag of Keypoints (2) • Histogram • Counts the number of occurrences of different visual words in each image Slide credit: Yun-hsueh Liu
Multi-class Classifier • In this paper, classification is based on conventional machine learning approaches: • Support Vector Machine (SVM) • Naïve Bayes Slide credit: Yun-hsueh Liu
Reminder: Linear SVM • Decision boundary: wᵀx + b = 0 • Margin bounded by the hyperplanes wᵀx + b = 1 and wᵀx + b = -1 • Support vectors are the training points lying on the margin hyperplanes. Slide credit: Jinwei Gu
Nonlinear SVMs: The Kernel Trick • Map inputs into a higher-dimensional feature space via x → φ(x); the discriminant function becomes g(x) = wᵀφ(x) + b = Σᵢ αᵢ yᵢ K(xᵢ, x) + b • No need to know this mapping explicitly, because we only use dot products of feature vectors in both training and testing. • A kernel function is a function that corresponds to a dot product of two feature vectors in some expanded feature space: K(xᵢ, xⱼ) = φ(xᵢ)ᵀφ(xⱼ) Slide credit: Jinwei Gu
Nonlinear SVMs: The Kernel Trick • Examples of commonly used kernel functions: • Linear kernel: K(xᵢ, xⱼ) = xᵢᵀxⱼ • Polynomial kernel: K(xᵢ, xⱼ) = (xᵢᵀxⱼ + 1)ᵖ • Gaussian (Radial Basis Function, RBF) kernel: K(xᵢ, xⱼ) = exp(-‖xᵢ - xⱼ‖² / 2σ²) • Sigmoid: K(xᵢ, xⱼ) = tanh(κ xᵢᵀxⱼ + θ) Slide credit: Jinwei Gu
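These kernels are easy to write down directly. A small sketch, with illustrative hyperparameter defaults (p, σ, κ, θ are my choices, not values from the slides):

```python
import numpy as np

def linear_kernel(x, y):
    # K(x, y) = x . y
    return x @ y

def polynomial_kernel(x, y, p=2):
    # K(x, y) = (x . y + 1)^p
    return (x @ y + 1) ** p

def rbf_kernel(x, y, sigma=1.0):
    # K(x, y) = exp(-||x - y||^2 / (2 sigma^2)); equals 1 when x == y
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def sigmoid_kernel(x, y, kappa=1.0, theta=0.0):
    # K(x, y) = tanh(kappa * x . y + theta)
    return np.tanh(kappa * (x @ y) + theta)
```

Any of these can be plugged into the discriminant g(x) = Σᵢ αᵢ yᵢ K(xᵢ, x) + b without ever computing φ explicitly.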
SVM for image classification • Train k binary 1-vs-all SVMs (one per class) • For a test instance, evaluate with each classifier • Assign the instance to the class with the largest SVM output
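The 1-vs-all scheme can be sketched as below. As a stand-in for a dedicated SVM solver, this trains each binary classifier by full-batch subgradient descent on the regularized hinge loss (which is one way to fit a linear SVM); function names and hyperparameters are illustrative.

```python
import numpy as np

def train_ova_svms(X, y, k, epochs=200, lr=0.1, lam=0.01):
    """Train k binary 1-vs-all linear SVMs (one per class) by subgradient
    descent on the L2-regularized hinge loss."""
    n, d = X.shape
    W, b = np.zeros((k, d)), np.zeros(k)
    for c in range(k):
        t = np.where(y == c, 1.0, -1.0)        # class c vs. all other classes
        for _ in range(epochs):
            viol = t * (X @ W[c] + b[c]) < 1   # points violating the margin
            grad_w = lam * W[c] - (t[viol][:, None] * X[viol]).sum(axis=0) / n
            grad_b = -t[viol].sum() / n
            W[c] -= lr * grad_w
            b[c] -= lr * grad_b
    return W, b

def predict_ova(X, W, b):
    """Evaluate every classifier; assign each instance to the class
    with the largest SVM output."""
    return (X @ W.T + b).argmax(axis=1)
```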
Naïve Bayes Model C – class label, F1 … Fn – features. We only specify (parameters): • P(C) – prior over class labels • P(Fi | C) – how each feature depends on the class
Example: Slide from Dan Klein
Percentage of documents in training set labeled as spam/ham Slide from Dan Klein
In the documents labeled as spam, occurrence percentage of each word (e.g. # times “the” occurred/# total words). Slide from Dan Klein
In the documents labeled as ham, occurrence percentage of each word (e.g. # times “the” occurred/# total words). Slide from Dan Klein
Classification • Choose the class that maximizes: c* = argmaxc P(c) ∏i P(fi | c)
Classification • In practice
Classification • In practice • Multiplying lots of small probabilities can result in floating point underflow
Classification • In practice • Multiplying lots of small probabilities can result in floating point underflow • Since log(xy) = log(x) + log(y), we can sum log probabilities instead of multiplying probabilities.
Classification • In practice • Multiplying lots of small probabilities can result in floating point underflow • Since log(xy) = log(x) + log(y), we can sum log probabilities instead of multiplying probabilities. • Since log is a monotonic function, the class with the highest score does not change.
Classification • In practice • Multiplying lots of small probabilities can result in floating point underflow • Since log(xy) = log(x) + log(y), we can sum log probabilities instead of multiplying probabilities. • Since log is a monotonic function, the class with the highest score does not change. • So, what we usually compute in practice is: c* = argmaxc [ log P(c) + Σi log P(fi | c) ]
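The log-space decision rule above is a one-liner once the parameters are stored as log-probabilities. A minimal sketch (the function name and array layout, with one row of word log-likelihoods per class, are my own conventions):

```python
import numpy as np

def nb_classify(hist, log_prior, log_lik):
    """Naive Bayes in log space: score each class c with
    log P(c) + sum_i count_i * log P(word_i | c), then take the argmax.
    Summing logs avoids the underflow of multiplying many small probabilities.
    hist: (V,) word counts; log_prior: (k,); log_lik: (k, V)."""
    return int(np.argmax(log_prior + hist @ log_lik.T))
```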
Naïve Bayes C – class label, F1 … Fn – features. We only specify (parameters): • P(C) – prior over class labels • P(Fi | C) – how each feature depends on the class
Naïve Bayes Parameters Problem: Categorize images as one of k object classes using a Naïve Bayes classifier: • Classes: object categories (face, car, bicycle, etc.) • Features: images represented as histograms of visual words; F1 … Fn are visual words. • P(C): treated as uniform. • P(Fi | C): learned from training data (images labeled with their category); the probability of a visual word given an image category.
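Learning P(visual word | class) amounts to counting word occurrences per class. The slides do not specify the exact estimator, so this sketch uses add-alpha (Laplace) smoothing, a common choice that keeps unseen words from getting zero probability; the function name and alpha default are illustrative.

```python
import numpy as np

def estimate_word_likelihoods(histograms, labels, k, alpha=1.0):
    """Estimate P(visual word | class) from labeled training histograms.
    histograms: (n_images, V) visual-word counts; labels: (n_images,).
    Laplace (add-alpha) smoothing gives unseen words nonzero mass."""
    V = histograms.shape[1]
    counts = np.zeros((k, V))
    for c in range(k):
        # Total count of each visual word over all images of class c.
        counts[c] = histograms[labels == c].sum(axis=0)
    # Smoothed relative frequencies; each row sums to 1.
    return (counts + alpha) / (counts.sum(axis=1, keepdims=True) + alpha * V)
```

Taking logs of these rows (and of the class prior) yields exactly the parameters needed by the log-space classification rule.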