
CVPR 09 Paper Review






Presentation Transcript


  1. CVPR 09 Paper Review Presenter: 阚美娜 Date: 2009-9-18

  2. Paper List • Quick-read papers • Symmetric Two Dimensional Linear Discriminant Analysis (2D LDA), Dijun Luo, Chris Ding, Heng Huang (#1421) • Bundling Features for Large Scale Partial-Duplicate Web Image Search, Zhong Wu, Qifa Ke, Michael Isard, Jian Sun (#1780) • In-depth paper • Unsupervised Maximum Margin Feature Selection with Manifold Regularization, Bin Zhao, James Kwok, Fei Wang, Changshui Zhang (#1189)

  3. Paper #1 • Title • Symmetric Two Dimensional Linear Discriminant Analysis (2D LDA) • Author • Dijun Luo, Chris Ding, Heng Huang • Affiliation • University of Texas at Arlington

  4. Abstract • Traditional LDA converts a high-dimensional object (e.g., an image) into a 1D vector before processing. The more recently proposed 2D LDA instead performs dimensionality reduction directly on the 2D image matrix, without flattening it into a vector, but its objective function has remained an open problem. • Therefore, this paper • proposes a symmetric formulation of 2D LDA that resolves the ambiguity of the 2D LDA objective function • gives an effective method for optimizing this symmetric 2D LDA objective • Results on UMIST, CMU PIE, and Yale B show that the proposed method outperforms other 2D LDA methods in both classification performance and objective value

  5. Classic LDA (1D LDA) • Target: transform high-dimensional data into a low-dimensional space in which classification becomes easier • Transformation: a linear projection applied to the sample set (a standard form is sketched below) • Objective function: no ambiguity!
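For reference, a standard statement of 1D LDA in conventional notation; the symbols below (W, S_b, S_w) are the usual choices and are not taken from the slides themselves:

```latex
u_i = W^\top x_i,
\qquad
\max_{W}\; J(W)=\frac{\operatorname{tr}\!\left(W^\top S_b W\right)}
                     {\operatorname{tr}\!\left(W^\top S_w W\right)}
```

Here S_b and S_w are the between-class and within-class scatter matrices of the vectorized samples. Because there is only a single projection W, the criterion can be written in essentially one way, which is why the slide notes that no ambiguity arises.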

  6. 2D LDA • Transformation: a bilinear projection of the image matrix (sketched below) • Ambiguity • Because an image matrix is not symmetric, many different objective functions can be written!
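A common way to write the 2D LDA transformation, and to see where the ambiguity comes from; the notation (L, R, X_i) is conventional rather than copied from the slides:

```latex
Y_i = L^\top X_i R
```

Scatter matrices for the projected data Y_i can be formed either as \(\sum_i (Y_i-\bar Y)(Y_i-\bar Y)^\top\) or as \(\sum_i (Y_i-\bar Y)^\top(Y_i-\bar Y)\), and the trace-ratio criterion can be set up with respect to L for a fixed R, with respect to R for a fixed L, or in several combined forms. Since X_i is not symmetric, these choices are not equivalent, which is the ambiguity slide 7 refers to.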

  7. 2D LDA • The many possible objective functions. Which one should be optimized?

  8. Symmetric Bilinear Transformation • Motivation • If the images were symmetric, the ambiguity above would not exist • Symmetric Bilinear Transformation: the bilinear form still holds, so optimizing (L, R) is equivalent to optimizing a single combined projection

  9. Symmetric Bilinear Transformation • The 2D LDA objective function:

  10. Symmetric Bilinear Transformation • Symmetric data representation → objective function → ambiguity removed

  11. Solving Symmetric 2D LDA • Conventional approach • the two variables (L, R) are optimized separately, which introduces inconsistency • Symmetric 2D LDA • both are optimized simultaneously, so no inconsistency arises • solved by gradient ascent

  12. Solving Symmetric 2D LDA
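To make the gradient-ascent idea on slide 11 concrete, here is a minimal NumPy sketch for maximizing a generic trace-ratio criterion over a single projection matrix. The function names, step size, and QR re-orthonormalization are illustrative assumptions; this is not the paper's exact algorithm, which optimizes its symmetric objective jointly over (L, R):

```python
import numpy as np

def trace_ratio(W, Sb, Sw):
    """Generic trace-ratio criterion J(W) = tr(W^T Sb W) / tr(W^T Sw W)."""
    return np.trace(W.T @ Sb @ W) / np.trace(W.T @ Sw @ W)

def maximize_trace_ratio(Sb, Sw, k, steps=500, lr=1e-2, seed=0):
    """Gradient ascent on the trace-ratio criterion over a d x k projection W.

    Illustrative only: it shows the gradient-ascent mechanics on a generic
    between/within scatter pair (Sb, Sw), not the paper's symmetric 2D LDA update.
    """
    rng = np.random.default_rng(seed)
    d = Sb.shape[0]
    W, _ = np.linalg.qr(rng.standard_normal((d, k)))   # random orthonormal start
    for _ in range(steps):
        num = np.trace(W.T @ Sb @ W)
        den = np.trace(W.T @ Sw @ W)
        # d/dW tr(W^T A W) = 2 A W for symmetric A; quotient rule for the ratio.
        grad = (2.0 * Sb @ W * den - 2.0 * Sw @ W * num) / den ** 2
        W, _ = np.linalg.qr(W + lr * grad)             # ascend, then re-orthonormalize
    return W
```

Alternating schemes instead fix R, solve for L, then fix L and solve for R, which is the source of the inconsistency the slide mentions; joint gradient ascent avoids that alternation.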

  13. Experimental results – UMIST • Methods compared: PCA, 2D SVD, Ye's 2D LDA, Ours

  14. Experimental results – UMIST • The images are split into k groups; 1 group is used for training and the remaining k-1 groups for testing

  15. Experimental results – CMU PIE and Yale B

  16. Conclusion • Resolved the ambiguity of the 2D LDA objective function • Proposed an effective optimization method • 2D LDA is a bilinear discriminant analysis method, and tensor-based LDA methods can also use the approach of this paper • Future work: multi-linear discriminant analysis

  17. Paper #2 • Title • Bundling Features for Large Scale Partial-Duplicate Web Image Search • Author • Zhong Wu, Qifa Ke, Michael Isard, Jian Sun • Affiliation • Microsoft Research

  18. Abstract • In current image retrieval systems, an image is represented by a bag of visual words obtained by quantizing high-dimensional local descriptors, and scalable methods are used for large-scale image indexing and retrieval. However, the bag-of-visual-words representation • loses discriminative power through feature quantization • ignores the geometric relationships among visual words • These geometric relationships can improve performance substantially, but they are expensive to compute. This paper proposes a new approach in which • image features are bundled into local groups, which are more discriminative than individual features • simple and robust geometric constraints are enforced within each group • On web image search over a dataset of 1,000,000+ images, the method improves over the baseline by 49% and is comparable to full geometric verification; on a duplicate-confirmation experiment, it improves over the baseline by 77% and over full geometric verification by 24%

  19. Related Work • The two most popular kinds of features in current image retrieval • SIFT • keypoint + SIFT descriptor computed from the region centered at the keypoint • MSER (Maximally Stable Extremal Region) • affine-covariant stable region + SIFT descriptor computed from the region

  20. Bundled Feature • SIFT feature: • MSER region: • Bundled feature: • i.e., an MSER region is used to bundle several SIFT features together • allows partial matching (subset match) • robust to occlusion

  21. Bundled Feature

  22. Matching Bundled Features • For any two bundled features p and q (already quantized into visual words) • Matching score: • Membership term: number of visual words common to p and q • Geometric term: penalizes pairs of common visual words whose relative order in q differs from that in p; the experiments use the X-Y coordinates (a sketch of this score follows the example on the next slide)

  23. Matching Bundled Features • Example 1: the left bundle ordered by geometric position: 1<2<3<4; corresponding order on the right: 1<2<4<5 → geometric term = 0 • Example 2: the left bundle ordered by geometric position: 1<2<3<4; corresponding order on the right: 5<2<1<4 → geometric term = -2
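A minimal sketch of this matching score, assuming each bundled feature is just a list of (visual_word_id, x, y) tuples. The function name `match_score`, the weight `lam`, and the pairwise order-violation count are illustrative assumptions, not the paper's implementation; in particular the paper's exact counting rule (which yields the -2 in the second example above) may differ from this pairwise count:

```python
def match_score(p, q, lam=1.0):
    """Score two bundled features p and q.

    p, q: lists of (visual_word_id, x, y) tuples, one per SIFT feature
    in the bundle.  Returns membership term + lam * geometric term.
    """
    # Membership term: number of visual words the two bundles share.
    common = sorted({w for w, _, _ in p} & {w for w, _, _ in q})

    # Keep one position per common word in each bundle.
    pos_p = {w: (x, y) for w, x, y in p if w in common}
    pos_q = {w: (x, y) for w, x, y in q if w in common}

    # Geometric term: minus the number of pairs of common words whose
    # relative order (along x or y) differs between the two bundles.
    violations = 0
    for i in range(len(common)):
        for j in range(i + 1, len(common)):
            a, b = common[i], common[j]
            for axis in (0, 1):  # 0 = x order, 1 = y order
                dp = pos_p[a][axis] - pos_p[b][axis]
                dq = pos_q[a][axis] - pos_q[b][axis]
                if dp * dq < 0:  # relative order flipped between p and q
                    violations += 1

    return len(common) + lam * (-violations)
```

For Example 1 above this gives a score of 3 (three common words, no order violations), matching the geometric term of 0 on the slide.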

  24. Experimental results • Dataset • 1,000,000 images • 780 manually labeled images in 19 categories used as ground truth • Evaluation • mean average precision (mAP) • Baseline: bag of features, soft quantization = 4, enhanced with Hamming embedding (HE)

  25. Example results: query image, baseline approach, our approach

  26. Paper #3 • Title • Unsupervised Maximum Margin Feature Selection with Manifold Regularization • Author • Bin Zhao¹, James Kwok², Fei Wang¹, Changshui Zhang¹ • Affiliation • Tsinghua University¹ • HKUST²

  27. Author Introduction 1 • Bin Zhao • Advisor: Changshui Zhang • Education • B.S.: Tsinghua University, Bachelor of Engineering in Automation, 2002-2006 • M.S.: Tsinghua University, candidate for the Master of Engineering degree in Machine Learning, 2006-2009 • Publications • Bin Zhao, James Kwok, Fei Wang, Changshui Zhang. Unsupervised Maximum Margin Feature Selection with Manifold Regularization. (CVPR 09) • Bin Zhao, James Kwok, Changshui Zhang. Multiple Kernel Clustering. The 9th SIAM International Conference on Data Mining (SDM 09). • Bin Zhao, Fei Wang, Changshui Zhang. Block Quantized Support Vector Ordinal Regression. IEEE Transactions on Neural Networks (TNN 09).

  28. Author Introduction 1 (cont.) • Bin Zhao, Fei Wang, Changshui Zhang. Maximum Margin Embedding. Proceedings of the 8th IEEE International Conference on Data Mining (ICDM 08), Pisa, Italy. To appear, 2008. • Bin Zhao, Fei Wang, Changshui Zhang. CutS3VM: A Fast Semi-Supervised SVM Algorithm. The 14th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD 08), Las Vegas, Nevada. 2008. pp. 830-838. • Bin Zhao, Fei Wang, Changshui Zhang. Efficient Multiclass Maximum Margin Clustering. The 25th International Conference on Machine Learning (ICML 08), Helsinki, Finland. 2008. pp. 1248-1255. • Bin Zhao, Fei Wang, Changshui Zhang. Efficient Maximum Margin Clustering Via Cutting Plane Algorithm. The 8th SIAM International Conference on Data Mining (SDM 08), Atlanta, Georgia. 2008. pp. 751-762. (Oral) • Bin Zhao, Fei Wang, Changshui Zhang, Yangqiu Song. Active Model Selection for Graph Based Semi-Supervised Learning. The 33rd International Conference on Acoustics, Speech, and Signal Processing (ICASSP 08), Las Vegas, Nevada. 2008. pp. 1881-1884. (Oral) • Bin Zhao, Fei Wang, Changshui Zhang. Smoothness Maximization via Gradient Descents. The 32nd International Conference on Acoustics, Speech, and Signal Processing (ICASSP 07), Honolulu, Hawaii. 2007. pp. II-609-II-612.

  29. Author Introduction 2 • James Kwok (郭天佑) • Biography • B.Sc., University of Hong Kong • Ph.D., Hong Kong University of Science and Technology • Lucent Technologies Bell Labs • Hong Kong Baptist University, Assistant Professor • Hong Kong University of Science and Technology, 2000, Associate Professor • Associate Editor for TNN and the Neurocomputing journal • Research Interests • Kernel Methods • Machine Learning • Pattern Recognition • Artificial Neural Networks

  30. Author Introduction 2 (cont.) • Award • TNN Outstanding 2004 Paper Award • Group • Artificial Intelligence Group @ HKUST • http://www.cse.ust.hk/aigroup

  31. Author Introduction 3 • Fei Wang • Education • B.Sc.: Xidian University, 2003 • Ph.D.: Tsinghua University, 2003-2008 • Postdoctoral Researcher @ Florida International University, 2008-now • Publications • Fei Wang, Changshui Zhang, Tao Li. Clustering with Local and Global Regularization. (TKDE) 09. • Fei Wang, Xin Wang. Neighborhood Discriminative Tensor Mapping. Neurocomputing. 09. • Fei Wang, Changshui Zhang. Label Propagation Through Linear Neighborhoods. IEEE Transactions on Knowledge and Data Engineering (TKDE). 08. • Fei Wang, Changshui Zhang. Semi-supervised Learning Based on Generalized Point Charge Models. (TNN) 08.

  32. Author Introduction 3 (cont.) • Fei Wang, Xin Wang, Tao Li. Generalized Cluster Aggregation. (IJCAI 09, Oral) • Fei Wang, Bin Zhang, Ta-Hsin Li, Wen jun Yin, Jin Dong, Tao Li. Preference Learning with Extreme Examples. (IJCAI 09, Oral) • Fei Wang, Shijun Wang, Changshui Zhang, Ole Winther. Semi-Supervised Mean Fields. (AISTATS 07) • Fei Wang, Changshui Zhang. On Discriminative Semi-supervised Classification. (AAAI 08, Oral) • Fei Wang, Tao Li, Gang Wang, Changshui Zhang. Semi-supervised Classification Using Local and Global Regularization. (AAAI 08, Oral) • Fei Wang, Changshui Zhang. Label Propagation Through Linear Neighborhoods. (ICML 06, Oral)

  33. Author Introduction 4 • Changshui Zhang (张长水) • Professor, Department of Automation, Tsinghua University • Research interests • pattern recognition, machine learning, artificial intelligence, computer vision, image processing, evolutionary computation, complex networks • Research projects • fingerprint recognition, face recognition, biometric recognition, industrial circuit-board inspection, BBS reply-network analysis

  34. Abstract • Feature selection plays a fundamental role in many pattern recognition problems. However, most efforts have been focused on the supervised scenario, while unsupervised feature selection remains as a rarely touched research topic. • In this paper, we propose Manifold-Based Maximum Margin Feature Selection (M3FS) to select the most discriminative features for clustering. • M3FS targets to find those features that would result in the maximal separation of different clusters and incorporates manifold information by enforcing smoothness constraint on the clustering function. • Specifically, we define scale factor for each feature to measure its relevance to clustering, and irrelevant features are identified by assigning zero weights. • Feature selection is then achieved by the sparsity constraints on scale factors. Computationally, M3FS is formulated as an integer programming problem and we propose a cutting plane algorithm to efficiently solve it. Experimental results on both toy and real-world data sets demonstrate its effectiveness.

  35. Abstract (Chinese version of slide 34) • Feature selection plays an important role in many pattern recognition problems. However, most work has focused on the supervised scenario, while unsupervised feature selection remains a rarely studied topic. • This paper proposes M3FS, Manifold-Based Maximum Margin Feature Selection. • M3FS looks for the features that maximize the separation between different clusters, and incorporates manifold information by imposing a smoothness constraint on the clustering function. Specifically, a scale factor is defined for each feature to measure its relevance to clustering; features whose weight is zero are considered irrelevant. • Feature selection is then achieved through sparsity constraints on the scale factors. Computationally, M3FS can be written as an integer optimization problem, and a cutting plane algorithm is proposed to solve it efficiently. Experiments on toy and real-world data sets show that it is effective.

  36. Outline • Related work • MMC • Two-class M3FS • Multi-class M3FS • Experimental results

  37. Related Work on Feature Selection • Why feature selection • improves the generalization ability of the classifier • reduces computational cost and speeds up testing • avoids collecting many irrelevant or redundant features • models built from fewer features are easier to interpret • Feature selection methods • Filter • scores each feature with a criterion that is independent of the downstream classifier • simple and fast, but lower performance • Wrapper • scores features or candidate feature subsets by the predictive performance of the downstream classifier • high performance, but slow • Embedded • classifier training and feature selection are integrated • more efficient; the concrete method is tightly coupled to the classifier

  38. Related Work on Feature Selection • Unsupervised feature selection methods • Existing filter, wrapper, and embedded methods are mostly based on generative models • When the generative model's assumptions do not match the data, performance degrades; moreover, in supervised learning discriminative models are generally considered better than generative ones, and among discriminative models, margin-based methods (e.g., SVM) have been very successful. • Building on Maximum Margin Clustering (MMC, Xu et al., NIPS 2005), this paper proposes a manifold-based MMC feature selection method.

  39. Outline • Related work • MMC • Two-class M3FS • Multi-class M3FS • Experimental results

  40. Maximum Margin Clustering • MMC extends SVM to (unsupervised) clustering. In unsupervised learning the class labels are unknown, so MMC searches for a labeling together with a hyperplane classifier such that, over all possible labelings, the resulting margin is maximized. • Take the two-class case as an example:

  41. MMC – the two-class problem • Given a set of samples, MMC seeks an optimal labeling such that an SVM trained with these labels has the maximum margin. This can be formalized as the problem below. • The last constraint is a class balance constraint that rules out trivial solutions; l is a constant controlling the allowed class imbalance.
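A standard way to write this two-class MMC problem (conventional notation following Xu et al.; the slide's own symbols may differ):

```latex
\min_{y \in \{-1,+1\}^n}\; \min_{w,\,b,\,\xi \ge 0}\;
\frac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{n}\xi_i
\quad \text{s.t.}\quad
y_i\,(w^\top x_i + b) \ge 1-\xi_i,
\qquad
-\ell \;\le\; \sum_{i=1}^{n} y_i \;\le\; \ell .
```

The outer minimization over the labels y is what makes the problem non-convex, and the final constraint is the class balance constraint mentioned on the slide.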

  42. Outline • Related work • MMC • Two-class M3FS • Multi-class M3FS • Experiments

  43. Unsupervised Maximum Margin Feature Selection with Manifold Regularization • The two-class problem • Goal 1: • find a feature subset such that the resulting clusters are maximally separated • Goal 2: • exploit manifold information during feature selection

  44. M3FS – the two-class problem • Goal 1: • find a feature subset such that the resulting clusters are maximally separated • Steps: • extend MMC by associating each feature k with a learnable scale factor that measures the feature's relevance to clustering • when learning finishes, features whose scale factor is 0 are regarded as irrelevant • the final discriminant function is given below
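One natural way to write such a scaled discriminant function (an illustrative form; the paper's exact parameterization of the scale factors may differ):

```latex
f(x) \;=\; \sum_{k=1}^{d} \sigma_k\, w_k\, x^{(k)} + b,
\qquad \sigma_k \ge 0,
```

where \(x^{(k)}\) is the k-th feature of x and \(\sigma_k\) is the scale factor of feature k; a feature with \(\sigma_k = 0\) contributes nothing to f and is treated as irrelevant.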

  45. M3FS – the two-class problem • Goal 2: • exploit manifold information during feature selection • Realization: • make the decision function f(x) smooth over the whole data manifold, by adding a manifold regularization term.
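In its standard graph-Laplacian form, such a manifold regularizer reads (again conventional notation, not necessarily the paper's):

```latex
\Omega(f) \;=\; \sum_{i,j} W_{ij}\,\bigl(f(x_i)-f(x_j)\bigr)^2
\;=\; \mathbf{f}^\top L\, \mathbf{f},
```

where \(W_{ij}\) are the weights of a neighborhood graph built on the data, L is the corresponding graph Laplacian, and \(\mathbf{f}=(f(x_1),\dots,f(x_n))^\top\). Penalizing \(\Omega(f)\) forces f to vary slowly between nearby points, which is how the manifold information enters the objective.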

  46. M3FS – the two-class problem • M3FS can then be written as an MMC-style problem with a feature selection term and a manifold regularization term (a sketch follows below):
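Purely as a reading aid, the ingredients listed so far can be assembled into an objective of the following shape; this is only a plausible sketch of the structure, not the paper's exact formulation (the sparsity budget s and trade-off \(\lambda\) are illustrative symbols):

```latex
\min_{y,\;\sigma \ge 0}\;\min_{w,\,b,\,\xi \ge 0}\;
\frac{1}{2}\lVert w\rVert^2 + C\sum_i \xi_i
+ \lambda\, \mathbf{f}^\top L\, \mathbf{f}
\quad\text{s.t.}\quad
y_i\, f(x_i) \ge 1-\xi_i,\;\;
\sum_k \sigma_k \le s,\;\;
-\ell \le \sum_i y_i \le \ell,
```

with \(f(x)=\sum_k \sigma_k w_k x^{(k)} + b\) as above: the margin part comes from MMC, the sparsity constraint on the scale factors performs the feature selection, and the Laplacian term is the manifold regularizer.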

  47. M3FS – the two-class problem
