CVPR 09 Paper Review

Presenter: 谢术富

Paper List • Feng Tang, Suk Hwan Lim, Nelson L. Chang, and Hai Tao, A Novel Feature Descriptor Invariant to Complex Brightness Changes, CVPR 09. • Ali Farhadi, Ian Endres, Derek Hoiem, David Forsyth, Describing Objects by their Attributes, CVPR 09.

A Novel Feature Descriptor Invariant to Complex Brightness Changes • Authors: Feng Tang, Suk Hwan Lim, Nelson L. Chang, and Hai Tao • Feng Tang: a researcher in the Multimedia Interaction and Understanding Lab (MIUL), HP Labs, http://users.soe.ucsc.edu/~tang/ • Hai Tao: University of California, Santa Cruz, http://users.soe.ucsc.edu/~tao/

Abstract • We describe a novel and robust feature descriptor called ordinal spatial intensity distribution (OSID) which is invariant to any monotonically increasing brightness changes. Many traditional features are invariant to intensity shift or affine brightness changes but cannot handle more complex nonlinear brightness changes, which often occur due to the nonlinear camera response, variations in capture device parameters, temporal changes in the illumination, and viewpoint-dependent illumination and shadowing. • A configuration of spatial patch sub-divisions is defined, and the descriptor is obtained by computing a 2-D histogram in the intensity ordering and spatial sub-division spaces. Extensive experiments show that the proposed descriptor significantly outperforms many state-of-the-art descriptors such as SIFT, GLOH, and PCA-SIFT under complex brightness changes. Moreover, the experiments demonstrate the proposed descriptor’s superior performance even in the presence of image blur, viewpoint changes, and JPEG compression. • The proposed descriptor has far reaching implications for many applications in computer vision including motion estimation, object tracking/recognition, image classification/retrieval, 3D reconstruction, and stereo.

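Summary • The paper proposes a new, robust feature descriptor, the ordinal spatial intensity distribution (OSID), which is invariant to any monotonically increasing brightness change. • Many traditional features are invariant to intensity shifts or affine brightness changes but cannot handle more complex nonlinear brightness changes, which often arise from nonlinear camera response, temporal changes in illumination, and viewpoint-dependent illumination and shadowing. • The patch is divided into spatial sub-regions, and the descriptor is obtained by computing a 2-D histogram over intensity order and spatial sub-division.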

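Summary • Extensive experiments show that the proposed descriptor outperforms SIFT, GLOH, and PCA-SIFT under complex brightness changes, and that it performs well even under image blur, viewpoint changes, and JPEG compression. • The descriptor has many applications in computer vision, such as motion estimation, object tracking and recognition, image classification and retrieval, 3D reconstruction, and stereo.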

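Motivation • Complex brightness changes occur in real environments, whereas some features (e.g., SIFT) are invariant only to affine brightness changes (I2 = k1*I1 + k2). • Changes in illumination or capture parameters alter pixel values, but if the brightness change within a local patch is monotonically increasing, the ordering among the pixels is preserved. • The paper assumes that the brightness change function f is monotonically increasing for all pixels within a local patch: f(Ii) ≥ f(Ij) for all Ii ≥ Ij.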
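A minimal Python sketch of this assumption. The patch values, the gamma curve used as the nonlinear monotonic f, and all helper names are hypothetical. A mean/variance normalization stands in for an affine-invariant feature: it cancels I2 = k1*I1 + k2 but not a nonlinear f, while the intensity ranks survive both.

```python
import statistics

def ranks(values):
    """Rank of each pixel within the patch (0 = darkest); ties broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, idx in enumerate(order):
        r[idx] = rank
    return r

def affine_normalize(values):
    """Mean/variance normalization: cancels k1*I + k2 for k1 > 0, nothing more."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return [(v - m) / s for v in values]

def gamma_change(v, g=0.5):
    """Hypothetical nonlinear, monotonically increasing brightness change f(I)."""
    return 255.0 * (v / 255.0) ** g

patch = [10, 50, 30, 200, 120, 90]            # hypothetical pixel intensities
affine = [2.0 * v + 5.0 for v in patch]       # I2 = k1*I1 + k2
nonlinear = [gamma_change(v) for v in patch]  # I2 = f(I1), f monotonic

print(ranks(patch))                           # [0, 2, 1, 5, 4, 3]
print(ranks(nonlinear) == ranks(patch))       # True: the ordering survives f
```

The normalized affine patch matches the original, but the normalized gamma patch does not. The rank vector is unchanged in both cases, and this is the property OSID builds on.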

Basic idea • Quantize the pixels of a local patch by their intensity order, partition the patch spatially, and extract a 2-D histogram.

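Preprocessing • Gaussian smoothing, followed by keypoint localization. • Keypoint detection: any keypoint detector can be used, e.g., intensity extrema**. • Patch size: d x d (d = 41). **K. Mikolajczyk and C. Schmid, Scale and Affine invariant interest point detectors. In IJCV 60(1):63-86, 2004.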

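Ordinal and spatial partitioning • Ordinal: the pixels in the patch are sorted by intensity and divided into nbins bins of equal size (e.g., with 400 pixels and 5 bins, each bin holds 80 pixels: ranks 1~80, 81~160, 161~240, 241~320, 321~400). • If the paper's assumption holds, the intensity ordering of the pixels, and hence each pixel's bin, is essentially unchanged. • Spatial: the d x d patch is divided into npies sub-regions (a circular region split into pie slices); other spatial partitioning schemes can also be used. • This preserves the spatial structure of the patch, which a single histogram would lose.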
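The two labelings can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code. Ordinal bins are formed over intensity ranks with equal pixel counts per bin, which keeps the labels unchanged under any monotonic f. Pie slices are measured by angle around the patch center. For simplicity, corner pixels outside the inscribed circle are kept. Function names are hypothetical.

```python
import math

def ordinal_labels(pixels, nbins):
    """Sort pixels by intensity and split the ranks into nbins equal-size groups.
    Returns the ordinal bin (0..nbins-1) of every pixel."""
    n = len(pixels)
    order = sorted(range(n), key=lambda i: pixels[i])
    labels = [0] * n
    for rank, idx in enumerate(order):
        labels[idx] = rank * nbins // n   # e.g. 400 pixels, 5 bins -> 80 per bin
    return labels

def pie_labels(d, npies):
    """Assign each pixel of a d x d patch (row-major) to one of npies pie slices."""
    c = (d - 1) / 2.0
    labels = []
    for y in range(d):
        for x in range(d):
            angle = math.atan2(y - c, x - c) % (2 * math.pi)
            labels.append(int(angle / (2 * math.pi) * npies) % npies)
    return labels

counts = [ordinal_labels(list(range(400)), 5).count(b) for b in range(5)]
print(counts)  # [80, 80, 80, 80, 80]
```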

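2-D histogram computation • A 2-D histogram is computed for each patch: the x-axis is the ordinal (intensity-order) bin and the y-axis is the spatial sub-region. • When flattening the 2-D histogram into a 1-D vector, an adaptive starting point can be used (e.g., start from the darkest or the brightest sub-region). • The histogram has nbins x npies dimensions. • When patch sizes differ, the histogram can be normalized by the number of pixels.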
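Putting the steps together, a minimal Python sketch of an OSID-style descriptor might look like this. Several choices here are assumptions: rank-based ordinal bins, pie slices by angle, the "darkest" slice taken as the one with the lowest mean intensity rank (so the choice stays invariant to monotonic changes), and normalization by the pixel count. The function name is hypothetical. The input patch is assumed to be already smoothed.

```python
import math

def osid_like_descriptor(patch, nbins=8, npies=16):
    """patch: d x d list of lists of intensities.
    Returns nbins * npies values that sum to 1."""
    d = len(patch)
    pixels = [v for row in patch for v in row]
    n = len(pixels)

    # Ordinal label from the intensity rank (equal-size bins).
    order = sorted(range(n), key=lambda i: pixels[i])
    rank_of = [0] * n
    for rank, idx in enumerate(order):
        rank_of[idx] = rank
    ordinal = [r * nbins // n for r in rank_of]

    # Spatial label: pie slice around the patch center (row-major order).
    c = (d - 1) / 2.0
    spatial = []
    for y in range(d):
        for x in range(d):
            angle = math.atan2(y - c, x - c) % (2 * math.pi)
            spatial.append(int(angle / (2 * math.pi) * npies) % npies)

    # 2-D histogram: one row per slice, one column per ordinal bin.
    hist = [[0] * nbins for _ in range(npies)]
    rank_sum = [0] * npies
    count = [0] * npies
    for i in range(n):
        s = spatial[i]
        hist[s][ordinal[i]] += 1
        rank_sum[s] += rank_of[i]
        count[s] += 1

    # Adaptive start: rotate the slices so the darkest one (lowest mean rank) comes first.
    mean_rank = [rank_sum[s] / count[s] if count[s] else float("inf") for s in range(npies)]
    start = min(range(npies), key=lambda s: mean_rank[s])
    hist = hist[start:] + hist[:start]

    # Flatten to 1-D and normalize by the number of pixels.
    return [v / n for row in hist for v in row]

patch = [[(x * 7 + y * 13) % 50 for x in range(9)] for y in range(9)]
desc = osid_like_descriptor(patch)
print(len(desc))  # 128
```

Rotating whole slices, instead of re-sorting them, keeps their circular neighbourhood order in the flattened vector.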

Experiments • Code and dataset: http://www.robots.ox.ac.uk/~vgg/research/affine/ • The image sets contain various geometric and photometric changes (viewpoint change, image blur, illumination change, JPEG compression). • Detector: • Hessian affine region detector**. **K. Mikolajczyk and C. Schmid, Scale and Affine invariant interest point detectors. In IJCV 60(1):63-86, 2004.

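Experiments • Evaluation metric: • recall = #correct matches / #correspondences • 1-precision = #false matches / (#correct matches + #false matches) • correspondences: the number of ground-truth keypoint pairs. • correct match: a pair of keypoints whose descriptor distance is below a threshold and that is a ground-truth correspondence.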
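The two metrics can be computed as below. This is a minimal Python sketch that assumes each candidate match is given as a (descriptor distance, is-ground-truth) pair. The input format, the toy values, and the function name are hypothetical. Sweeping the threshold traces a recall vs. 1-precision curve like the ones in the comparisons.

```python
def recall_1_precision(matches, n_correspondences, threshold):
    """matches: list of (descriptor_distance, is_ground_truth) for candidate pairs.
    A pair counts as a match when its distance is below `threshold`."""
    accepted = [is_gt for dist, is_gt in matches if dist < threshold]
    n_correct = sum(1 for is_gt in accepted if is_gt)
    n_false = len(accepted) - n_correct
    recall = n_correct / n_correspondences
    # 1-precision is undefined when nothing is accepted; 0.0 is reported by convention here.
    one_minus_precision = n_false / (n_correct + n_false) if accepted else 0.0
    return recall, one_minus_precision

matches = [(0.1, True), (0.2, True), (0.3, False), (0.5, True), (0.6, False)]
for t in (0.15, 0.4, 0.7):
    print(t, recall_1_precision(matches, n_correspondences=4, threshold=t))
```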

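Parameter selection • Three parameters (Gaussian kernel, number of ordinal bins, number of spatial bins): • Gaussian kernel: 5x5, sigma = 1. • Best values: ordinal bins = 8, spatial bins = 16.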
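A short Python sketch of these settings. It builds a normalized 5x5 Gaussian kernel with sigma = 1 and applies it with border replication. The padding mode and the function names are assumptions, not taken from the paper. It also shows the resulting descriptor length: 8 ordinal bins x 16 spatial bins = 128 values, the same length as the standard SIFT descriptor.

```python
import math

def gaussian_kernel(size=5, sigma=1.0):
    """Normalized size x size Gaussian kernel (weights sum to 1)."""
    half = size // 2
    k = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
          for x in range(-half, half + 1)]
         for y in range(-half, half + 1)]
    total = sum(sum(row) for row in k)
    return [[v / total for v in row] for row in k]

def smooth(image, kernel):
    """Filter a 2-D list with `kernel`, replicating border pixels.
    The Gaussian is symmetric, so correlation and convolution coincide."""
    h, w = len(image), len(image[0])
    half = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(-half, half + 1):
                for kx in range(-half, half + 1):
                    yy = min(max(y + ky, 0), h - 1)
                    xx = min(max(x + kx, 0), w - 1)
                    acc += kernel[ky + half][kx + half] * image[yy][xx]
            out[y][x] = acc
    return out

ORDINAL_BINS, SPATIAL_BINS = 8, 16
DESCRIPTOR_DIM = ORDINAL_BINS * SPATIAL_BINS  # 8 x 16 = 128 values per patch
```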

Comparison with other descriptors: illumination change • Comparison of OSID against other descriptors under illumination changes using the “Leuven” dataset. (a) Matching performance between the 1st and the 4th image. (b) Matching performance between the 1st and the 6th image.

Comparison with other descriptors: image blur • Comparison of OSID against other descriptors under image blur using the “Bike” dataset. (a) Matching performance between the 1st and the 2nd image. (b) Matching performance between the 1st and the 6th image.

Comparison with other descriptors: viewpoint change • Comparison of OSID against other descriptors under viewpoint changes using the “Wall” dataset. (a) Matching performance between the 1st and the 2nd image. (b) Matching performance between the 1st and the 4th image.

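Conclusion • OSID is invariant to monotonic brightness changes. • The ordinal-spatial histogram preserves texture and structure information. • Experiments confirm the descriptor's effectiveness.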

Describing Objects by their Attributes • Authors: Ali Farhadi, Ian Endres, Derek Hoiem, David Forsyth • Ali Farhadi • PhD student, University of Illinois at Urbana-Champaign (UIUC) • http://www.cs.uiuc.edu/homes/afarhad2/ • David Forsyth • http://luthuli.cs.uiuc.edu/~daf/ • Author of “Computer Vision, A Modern Approach”. • Publications: 1987 to present.

Abstract • We propose to shift the goal of recognition from naming to describing. Doing so allows us not only to name familiar objects, but also: to report unusual aspects of a familiar object (“spotty dog”, not just “dog”); to say something about unfamiliar objects (“hairy and four-legged”, not just “unknown”); and to learn how to recognize new objects with few or no visual examples. • Rather than focusing on identity assignment, we make inferring attributes the core problem of recognition. These attributes can be semantic (“spotty”) or discriminative (“dogs have it but sheep do not”). Learning attributes presents a major new challenge: generalization across object categories, not just across instances within a category. • In this paper, we also introduce a novel feature selection method for learning attributes that generalize well across categories. We support our claims by thorough evaluation that provides insights into the limitations of the standard recognition paradigm of naming and demonstrates the new abilities provided by our attribute based framework.

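Summary • The paper proposes shifting the goal of recognition from naming to describing. This makes it possible not only to name familiar objects, but also to report unusual attributes of familiar objects (“spotty dog”, not just “dog”), to describe unfamiliar objects (“hairy and four-legged”, not just “unknown”), and to learn to recognize new objects from few or even no visual examples. • Inferring attributes, rather than assigning identity, becomes the core problem of recognition. Attributes can be semantic (“spotty”) or discriminative (“dogs have it but sheep do not”). Learning attributes poses a new challenge: generalizing across object categories, not just across instances within a category. • The paper also introduces a new feature selection method that generalizes well across categories. A thorough evaluation shows the limitations of the traditional naming paradigm and the new abilities of the attribute-based framework.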

Problems addressed • Traditional approach: naming objects. • This paper's approach: describe objects of unknown categories; report atypical attributes of known classes; learn models of new object categories from pure textual description.

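Motivation • Inferring attributes makes it possible to describe and compare objects, and even to categorize them more easily. • Faced with a new kind of object, we can describe it (e.g., “hairy and four-legged”) instead of simply failing to recognize it (“unknown”). • We can also report what is unusual about an object of a known class (e.g., “spotty dog”), and learn to recognize object categories from a simple description alone. • The paper assumes objects are already accurately localized and focuses on “what is this?” rather than “where is it?”.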

Basic framework (figure): from low-level image features to attributes such as shape, material, and parts.

Feature extraction • Base features (bag of words); final feature dimension: 9751. • Texture descriptor: texton filter bank with 256 k-means centers. • HOG spatial pyramid: 8x8 blocks, 4-pixel step size, 1000 k-means centers. • Edge: Canny detector, with orientation quantized into 8 bins. • Color: LAB values quantized into 128 k-means centers. • Semantic attributes: three main groups, useful for describing object categories. • Shape: 2D boxy, 3D boxy, or cylindrical. • Part: has head, leg, arm, and wing. • Material: wood, furry, glass, shiny. • Discriminative attributes: distinguish objects of different categories. • Auxiliary discriminative attributes are introduced**: the data are split into two groups (e.g., cats on one side and dogs on the other, without caring which side motorbikes fall on), and an attribute classifier is learned on each split with a linear SVM. SVMs trained on many such splits are then used to learn and select features. • Number: 1000. **Ali Farhadi, David A. Forsyth, and Ryan White. Transfer learning in sign language. In CVPR, 2007.
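The splitting step for discriminative attributes can be sketched in Python as below. The class list, the range of 1 to 5 classes per side, and the function names are assumptions for illustration. The linear SVM trained on each split is not shown. Examples whose class falls in neither group get no label, just as motorbikes are ignored in the cat/dog split.

```python
import random

def random_class_split(classes, rng, max_per_side=5):
    """Choose two disjoint, non-empty groups of classes; every other class is left out."""
    shuffled = list(classes)
    rng.shuffle(shuffled)
    k_pos = rng.randint(1, min(max_per_side, len(shuffled) - 1))
    k_neg = rng.randint(1, min(max_per_side, len(shuffled) - k_pos))
    return set(shuffled[:k_pos]), set(shuffled[k_pos:k_pos + k_neg])

def split_labels(example_classes, pos, neg):
    """+1 / -1 for examples whose class is in one of the two groups, None if ignored."""
    return [1 if c in pos else -1 if c in neg else None for c in example_classes]

classes = ["cat", "dog", "sheep", "horse", "car", "motorbike", "bus", "train"]
rng = random.Random(0)
splits = [random_class_split(classes, rng) for _ in range(1000)]  # 1000 discriminative attributes
print(split_labels(["cat", "dog", "motorbike"], {"cat"}, {"dog"}))  # [1, -1, None]
```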

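Learning to Recognize Semantic Attributes • Problem with learning attribute classifiers on the whole training set: • A “wheel” classifier trained on cars, motorbikes, buses, and trains may learn “metallic” instead of “wheel”, because wheels are surrounded by metal. • The classifier learns a feature correlated with the target (“metallic” for “wheel”) rather than the feature we intend. • This causes problems when the training and test data do not share the same strong correlations.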

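Feature selection • Goal: decorrelate the predictions of different attribute classifiers. • Implementation: to learn a “wheel” classifier, select features that separate cars with wheels from cars without wheels (and likewise within each other class); then pool all selected features and train on the full training set. • Method: L1-regularized logistic regression**. **Andrew Y. Ng. Feature selection, l1 vs. l2 regularization, and rotational invariance. In ICML, 2004.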
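The within-class selection step can be sketched in pure Python as below. This is an illustration under stated assumptions, not the authors' code. L1-regularized logistic regression is fitted by plain proximal gradient descent (ISTA). The data layout, regularization weight, step size, and function names are hypothetical. The toy data encode the “wheel” vs. “metallic” example: feature 0 is wheel evidence, feature 1 is “metallic” (constant within each vehicle class), and feature 2 is noise.

```python
import math

def l1_logistic_regression(X, y, lam=0.05, lr=0.5, iters=2000):
    """ISTA fit of L1-regularized logistic regression; y holds 0/1 labels.
    Returns (weights, bias); the bias is not penalized."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-(b + sum(wj * xj for wj, xj in zip(w, xi)))))
            err = p - yi
            for j in range(d):
                gw[j] += err * xi[j] / n
            gb += err / n
        # Gradient step on the logistic loss, then soft-thresholding (prox of the L1 term).
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
        w = [math.copysign(max(abs(wj) - lr * lam, 0.0), wj) for wj in w]
    return w, b

def select_features_within_classes(data, lam=0.05):
    """data: class name -> (X, y), where y marks the attribute within that class.
    Returns the union of features with non-zero weight in any per-class fit."""
    selected = set()
    for X, y in data.values():
        w, _ = l1_logistic_regression(X, y, lam=lam)
        selected |= {j for j, wj in enumerate(w) if wj != 0.0}
    return selected

vehicle = ([[1.0, 1.0, 0.5], [1.0, 1.0, -0.5], [-1.0, 1.0, 0.5], [-1.0, 1.0, -0.5]], [1, 1, 0, 0])
print(select_features_within_classes({"car": vehicle, "bus": vehicle}))  # {0}
```

Within a single vehicle class, “metallic” carries no information about wheels, so the L1 penalty drives its weight to zero. The union of the selected features would then be used to train the final classifier on the full training set.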

Datasets • Annotating object attributes involves some ambiguity. • Annotation was done by the authors and by Amazon Mechanical Turk annotators. Agreement: among the authors 84.3%; among the annotators 81.4%; between authors and annotators 84.1%. • a-PASCAL • PASCAL VOC 2008 (20 classes): people, bird, cat, cow, dog, horse, sheep, aeroplane, bicycle, boat, bus, car, motorbike, train, bottle, chair, dining table, potted plant, sofa, and tv/monitor. • 150 to 1000 objects per class (over 5000 instances of people); 64 attributes. • a-Yahoo • 12 classes: wolf, zebra, goat, donkey, monkey, statue of people, centaur, bag, building, jet ski, carriage, and mug. • Related to the a-PASCAL classes but also different from them (e.g., wolf vs. dog). • http://vision.cs.uiuc.edu/attributes/
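Agreement figures like these can be computed as simple percent agreement over (object, attribute) labels. The slide does not say how agreement was measured, so this definition, the data layout (one row of binary attribute labels per object), and the toy labels are assumptions.

```python
def percent_agreement(labels_a, labels_b):
    """Percentage of (object, attribute) entries on which two annotators agree."""
    pairs = [(a, b) for row_a, row_b in zip(labels_a, labels_b) for a, b in zip(row_a, row_b)]
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)

annotator_1 = [[1, 0, 1, 1], [0, 0, 1, 0]]  # hypothetical: 2 objects x 4 attributes
annotator_2 = [[1, 1, 1, 1], [0, 0, 0, 0]]
print(percent_agreement(annotator_1, annotator_2))  # 75.0
```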

Experiments • Describing Objects • Assigning attributes • Unusual attributes • Naming • Naming familiar objects • Learning to Identify New Objects • Learning New Categories from Textual Description • Across-category generalization • Semantics of Learned Attributes • Localization • Correlation

Assigning attributes • Metric: the area under the ROC curve (AUC). • When both training and testing on a-PASCAL, the proposed method shows no clear advantage.
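For a single attribute classifier, the AUC equals the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counting one half. A minimal Python sketch of that rank-based computation follows. The function name and toy scores are hypothetical, and averaging over attributes is not shown.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```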

Attribute prediction for across-category protocols • Feature selection improves generalization, e.g., for “taillight”, “cloth”, and “rein” in panel (a) and for “wing”, “door”, “headlight”, and “taillight” in panel (b).

Selected positively predicted attributes • (Figure, including examples of wrong predictions.)

Absence of typical attributes • Over the whole dataset, 752 attributes that are expected for the predicted class are reported as not visible in the images; 68.2% of these reports are correct.

Reporting the presence of atypical attributes • 951 predicted attributes are not expected for the predicted class; 47.3% of them are correct.

Naming familiar objects • (Figure: mean per-class accuracy and overall accuracy.) • The attribute-based representation does not help significantly in the traditional naming task, but it offers new capabilities. • Bounding boxes are provided, so the results cannot be compared directly with previous methods.
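The two accuracy measures can differ a lot when one class dominates, as “people” does in a-PASCAL with over 5000 instances. A minimal Python sketch with hypothetical toy labels: overall accuracy weights every object equally, while mean per-class accuracy weights every class equally.

```python
from collections import Counter

def overall_accuracy(y_true, y_pred):
    """Fraction of all test objects that are named correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_per_class_accuracy(y_true, y_pred):
    """Average of the per-class accuracies, so every class counts the same."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

y_true = ["person"] * 8 + ["cat"] * 2
y_pred = ["person"] * 8 + ["person", "cat"]
print(overall_accuracy(y_true, y_pred))         # 0.9
print(mean_per_class_accuracy(y_true, y_pred))  # 0.75
```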

Learning to Identify New Objects • With 40 training examples, our method performs about as well as the baseline does with 200. • Our method can recognize new classes with notably fewer training examples than classifiers trained on base features. • Learning New Categories from Textual Description: new categories are learned by describing them to the algorithm, e.g., this new class is “furry”, “four legged”, “has snout”, and “has head”. The object description is specified by a list of positive attributes, giving a binary attribute vector. (Slide figure: 32.5%.)
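One way to name an object of a class known only by its description is to compare the predicted attribute probabilities with each class's binary attribute vector. The slide does not give the decision rule, so this sketch makes several assumptions. It uses a naive-Bayes-style log-likelihood, treats attributes missing from the description as absent, and uses hypothetical class names and probabilities.

```python
import math

def classify_by_description(attr_probs, descriptions, eps=1e-6):
    """attr_probs: predicted probability of each attribute for one object.
    descriptions: class name -> binary attribute vector (1 = listed as present).
    Returns the class whose description best explains the predictions."""
    def log_likelihood(desc):
        total = 0.0
        for p, a in zip(attr_probs, desc):
            p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
            total += math.log(p) if a else math.log(1.0 - p)
        return total
    return max(descriptions, key=lambda c: log_likelihood(descriptions[c]))

# Attribute order: furry, four legged, has snout, has head, has wheel (hypothetical).
descriptions = {"new_animal": [1, 1, 1, 1, 0], "new_vehicle": [0, 0, 0, 0, 1]}
print(classify_by_description([0.8, 0.7, 0.6, 0.9, 0.1], descriptions))  # new_animal
```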

Localization • Colored points mark features with a high positive response from the attribute classifiers. • Classifiers trained on all features (without feature selection) may not capture the semantics we expect.

Conclusion • The first work in computer vision to provide the ability to describe objects and to learn from descriptions. • Cross-category generalization: select features that predict the attribute within each class. • Generalization problem: one would expect a person detector trained on the INRIA dataset to work well on PASCAL-08, yet the detector learns as much about dataset biases as about the objects themselves.

Further readings • Christoph H. Lampert, Hannes Nickisch, Stefan Harmeling, “Learning To Detect Unseen Object Classes by Between-Class Attribute Transfer”, CVPR 09. • Neeraj Kumar, Alexander C. Berg, Peter N. Belhumeur, Shree K. Nayar, “Attribute and Simile Classifiers for Face Verification”, ICCV 09.
