Adaptive Graph Construction and Dimensionality Reduction

Presentation Transcript


1. Adaptive Graph Construction and Dimensionality Reduction
Songcan Chen, Lishan Qiao, Limei Zhang
http://parnec.nuaa.edu.cn/
{s.chen, qiaolishan, zhanglimei}@nuaa.edu.cn
2009-11-06

2. Outline
• Why construct a graph?
• Typical graph construction
  • Review & Challenges
• Our works
  • (I) Task-independent graph construction (related work: Sparsity Preserving Projections)
  • (II) Task-dependent graph construction (related work: Soft LPP and Entropy-regularized LPP)
• Discussion and Next Work

3. Outline
• Why construct a graph?
• Typical graph construction
  • Review & Challenges
• Our works
  • (I) Task-independent graph construction (related work: Sparsity Preserving Projections)
  • (II) Task-dependent graph construction (related work: Soft LPP and Entropy-regularized LPP)
• Discussion and Next Work

4. Why construct a graph?
A graph characterizes the geometry of data (e.g., an underlying manifold) and thus plays a central role in data analysis and machine learning: for example, in dimensionality reduction, semi-supervised learning, and spectral clustering.

5. Why construct a graph? Dimensionality reduction
[Figure: data points sampled from a Swiss roll, their 4-NN graph, and the resulting 2D embedding]
• Nonlinear manifold learning, e.g., Laplacian Eigenmaps, LLE, ISOMAP
• Linearized variants, e.g., LPP, NPE, and so on
• (Semi-)supervised and/or tensorized extensions, too numerous to mention one by one
(A minimal code sketch of this graph-then-embed pipeline follows.)
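As a concrete illustration of the pipeline in the figure above, here is a minimal sketch using scikit-learn; this is not the authors' code, and the parameter choices (n_neighbors=4, the noise level) are assumptions for demonstration only.

```python
# Minimal sketch (not the authors' code): build a 4-NN graph on a Swiss
# roll and embed it in 2D with Laplacian Eigenmaps (spectral embedding).
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import SpectralEmbedding
from sklearn.neighbors import kneighbors_graph

X, _ = make_swiss_roll(n_samples=1000, noise=0.05, random_state=0)

# Step 1: graph construction -- connect each point to its 4 nearest neighbors.
W = kneighbors_graph(X, n_neighbors=4, mode="connectivity", include_self=False)
W = W.maximum(W.T)  # symmetrize: k-NN relations are not symmetric in general

# Step 2: embedding -- eigenvectors of the graph Laplacian give 2D coordinates.
Y = SpectralEmbedding(n_components=2, affinity="precomputed").fit_transform(W.toarray())
print(Y.shape)  # (1000, 2)
```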

6. Why construct a graph? Dimensionality reduction
[Figure: projection directions found by PCA and LDA]
• Many classical DR algorithms, e.g., PCA (unsupervised) and LDA (supervised)
According to [1], most of the current dimensionality reduction algorithms can be unified under a graph embedding framework.
[1] S. Yan, D. Xu, B. Zhang, H. Zhang, Q. Yang, S. Lin, Graph embedding and extensions: a general framework for dimensionality reduction. IEEE Trans. Pattern Anal. Mach. Intell. 29(1) (2007) 40-51.
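For reference, the unified objective of the graph embedding framework in [1] can be written as follows (notation slightly adapted here):

```latex
% Unified graph embedding objective of [1], notation slightly adapted:
% W is the intrinsic graph, L = D - W its Laplacian, and B is either a
% constraint matrix or the Laplacian of a penalty graph.
y^{*} = \arg\min_{y^{\top} B y = c} \; \sum_{i \ne j} \| y_i - y_j \|^{2} W_{ij}
      = \arg\min_{y^{\top} B y = c} \; 2\, y^{\top} L y .
```

Different choices of W and B recover PCA, LDA, LPP, and so on, which is why graph construction sits at the core of this family.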

7. Why construct a graph? Semi-supervised learning
[Figure: data points with a 4-NN graph; the transductive setting (e.g., label propagation) vs. the inductive setting (e.g., manifold regularization)]
• Typical graph-based semi-supervised algorithms:
  • Local and global consistency
  • Label propagation
  • Manifold regularization
  • …
"Graph is at the heart of the graph-based semi-supervised learning methods" [1].
[1] X. Zhu, Semi-supervised learning literature survey. Technical Report, 2008.
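To make the transductive idea concrete, here is a hedged sketch of label propagation in the spirit of the local-and-global-consistency method (Zhou et al., NIPS 2004); the function name and the value alpha=0.99 are illustrative assumptions, not taken from the slides.

```python
# Hedged sketch of graph-based label propagation (local-and-global-
# consistency style); illustrative only.
import numpy as np

def propagate_labels(W, Y, alpha=0.99):
    """W: (n, n) symmetric affinity matrix; Y: (n, c) one-hot labels with
    all-zero rows for unlabeled points. Returns soft label scores F."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    S = D_inv_sqrt @ W @ D_inv_sqrt                  # normalized affinity
    n = W.shape[0]
    # Closed form of the iteration F <- alpha * S @ F + (1 - alpha) * Y:
    return (1 - alpha) * np.linalg.solve(np.eye(n) - alpha * S, Y)

# A point i is then labeled by F[i].argmax().
```

Note that the quality of the result depends entirely on the affinity matrix W, which is exactly the slide's point.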

8. Why construct a graph? Spectral clustering
[Figure: a data set with clustering structure and one with manifold structure]
• Typical graph-based clustering algorithms:
  • Graph cut
  • Normalized cut
  • …
"Ncut on a kNN graph does something systematically different than Ncut on an ε-neighborhood graph! … [This] shows that graph clustering criteria cannot be studied independently of the kind of graph they are applied to." [1]
[1] M. Maier, U. von Luxburg, Influence of graph construction on graph-based clustering measures. NIPS, 2008.
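For concreteness, a hedged sketch of Ncut-style spectral clustering on a precomputed affinity matrix; this is the standard normalized-Laplacian recipe, not any cited paper's exact procedure.

```python
# Hedged sketch of Ncut-style spectral clustering on a precomputed
# affinity matrix W (dense numpy array); illustrative only.
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering(W, n_clusters):
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L_sym = np.eye(W.shape[0]) - D_inv_sqrt @ W @ D_inv_sqrt  # normalized Laplacian
    # Eigenvectors of the smallest eigenvalues span the relaxed cut indicators.
    _, U = eigh(L_sym, subset_by_index=[0, n_clusters - 1])
    U /= np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(U)
```

Running this with a kNN-based W versus an ε-neighborhood-based W can give systematically different partitions, which is Maier and von Luxburg's observation.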

9. Why construct a graph? Summary
• Dimensionality reduction: linear/nonlinear, local/nonlocal, parametric/nonparametric
• Semi-supervised learning: transductive/inductive
• Spectral clustering: clustering structure/manifold structure
A well-designed graph tends to result in good performance [1]. How can we construct a good graph? What is the right graph for a given data set?
[1] S. I. Daitch, J. A. Kelner, D. A. Spielman, Fitting a graph to vector data. ICML, 2009.

10. Why construct a graph? Summary
Generally speaking, despite its importance, "graph construction has not been studied extensively" [1], and "the way to establish high-quality graphs is still an open problem" [2].
[1] X. Zhu, Semi-supervised learning literature survey. Technical Report, 2008.
[2] W. Liu and S.-F. Chang, Robust Multi-class Transductive Learning with Graphs. CVPR, 2009.

11. Why construct a graph? Summary
Fortunately, the graph construction problem has attracted increasing attention, especially this year (2009). For example, graphs have been constructed by:
• sparse representation [1,2,3], i.e., the l1-graph;
• minimizing the weighted sum of the squared distances from each vertex to the weighted average of its neighbors [4];
• b-matching [5];
• a symmetry-favored criterion, assuming the graph is doubly stochastic [6];
• learning the projection transform and the graph weights simultaneously [7].
[1] L. Qiao, S. Chen, X. Tan, Sparsity preserving projections with applications to face recognition. Pattern Recognition, 2009. (Received 21 July 2008)
[2] S. Yan, H. Wang, Semi-supervised Learning by Sparse Representation. SDM, 2009.
[3] E. Elhamifar and R. Vidal, Sparse Subspace Clustering. CVPR, 2009.
[4] S. I. Daitch, J. A. Kelner, D. A. Spielman, Fitting a graph to vector data. ICML, 2009.
[5] T. Jebara, J. Wang, S. Chang, Graph Construction and b-Matching for Semi-Supervised Learning. ICML, 2009.
[6] W. Liu and S.-F. Chang, Robust Multi-class Transductive Learning with Graphs. CVPR, 2009.
[7] L. Qiao, S. Chen, L. Zhang, A Simultaneous Learning Framework for Dimensionality Reduction and Graph Construction. Submitted, 2009.

12. Outline
• Why construct a graph?
• Typical graph construction
  • Review & Challenges
• Our works
  • (I) Task-independent graph construction (related work: Sparsity Preserving Projections)
  • (II) Task-dependent graph construction (related work: Soft LPP and Entropy-regularized LPP)
• Discussion and Next Work

13. Typical graph construction: Review
[Diagram: a basic flow for graph-based machine learning: graph construction → edge weight assignment → learning tasks (dimensionality reduction, spectral clustering, semi-supervised learning, spectral kernel learning, …)]
Two basic characteristics:
• Task-independent
• Two steps: graph construction, then edge weight assignment

14. Typical graph construction: Review
Two basic criteria for the graph construction step:
• k-nearest-neighbor criterion (left figure)
• ε-ball neighborhood criterion (right figure)
(A code sketch of both constructions follows.)
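A minimal sketch of the two criteria using scikit-learn helpers; the parameter values (k=5, ε=0.5) are illustrative assumptions.

```python
# Minimal sketch of the two basic construction criteria; illustrative only.
import numpy as np
from sklearn.neighbors import kneighbors_graph, radius_neighbors_graph

X = np.random.RandomState(0).randn(200, 2)

# k-nearest-neighbor criterion: connect each point to its k closest points.
A_knn = kneighbors_graph(X, n_neighbors=5, mode="connectivity")
A_knn = A_knn.maximum(A_knn.T)  # k-NN adjacency is not symmetric in general

# epsilon-ball criterion: connect pairs whose distance is below epsilon.
A_eps = radius_neighbors_graph(X, radius=0.5, mode="connectivity")
```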

15. Typical graph construction: Review
Several basic ways to assign edge weights:
• Gaussian function (heat kernel)
• Inverse Euclidean distance
• Local reconstructive relationship (as involved in LLE)

16. Typical graph construction: Review
Several basic ways to assign edge weights:
• Gaussian function (heat kernel)
• Inverse Euclidean distance
• Local reconstructive relationship (as involved in LLE)
• Non-negative local reconstruction [1]
(The first two schemes are sketched in code below.)
[1] F. Wang and C. S. Zhang, Label propagation through linear neighborhoods. NIPS, 2006.
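A minimal sketch of the first two weighting schemes; the heat-kernel width t=1.0 is an illustrative assumption, and in practice the dense matrices below are masked by the k-NN or ε-ball adjacency so only neighboring pairs keep nonzero weight.

```python
# Minimal sketch of two common edge-weight assignments; illustrative only.
import numpy as np
from sklearn.metrics import pairwise_distances

def heat_kernel_weights(X, t=1.0):
    """Gaussian (heat kernel) weights: W_ij = exp(-||x_i - x_j||^2 / t)."""
    return np.exp(-pairwise_distances(X, metric="sqeuclidean") / t)

def inverse_distance_weights(X):
    """Inverse Euclidean distance weights: W_ij = 1 / ||x_i - x_j||."""
    D = pairwise_distances(X)
    np.fill_diagonal(D, np.inf)          # no self-loops: 1/inf = 0
    return 1.0 / np.maximum(D, 1e-12)
```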

17. Typical graph construction: Challenges
Typical constructions implicitly assume:
• Few degrees of freedom
• Little noise
• Sufficient sampling (abundant samples)
• The smoothness or clustering assumption holds
However, they work well only when these conditions are strictly satisfied. In practice, the conditions are often violated, as the following slides show.

18. Typical graph construction: Challenges ①~⑤
① Tens or hundreds of degrees of freedom
• Recent research [1] estimated that the face subspace has at least 100 dimensions.
• More complex composite objects?
② Noise and other corruptions
[Figure: Euclidean distances between face images: 0.84×10³, 0.92×10³, 1.90×10³]
The locality preserving criterion may not work well in this scenario, especially when only a few training samples are available.
[1] M. Meytlis, L. Sirovich, On the dimensionality of face space. IEEE TPAMI, 2007, 29(7): 1262-1267.

19. Typical graph construction: Challenges ①~⑤
③ Insufficient samples
[Figure: two sets of data points and their kNN graphs under sparse sampling]

20. Typical graph construction: Challenges ①~⑤
④ Sensitivity to neighborhood size
[Figure: another example, on the Wine data set, with 15 and with 5 training samples per class]
This also illustrates that, in fact, there are no reliable methods for assigning appropriate values to the parameters k and ε in the unsupervised scenario, or when only a few labeled samples are available [1].
[1] D. Y. Zhou, O. Bousquet, T. N. Lal, J. Weston, B. Scholkopf, Learning with local and global consistency. NIPS, 2004.

21. Typical graph construction: Challenges ①~⑤
⑤ Others, for example:
• The lingering "curse of dimensionality"
• Fixed neighborhood size
• Independence from subsequent learning tasks
Dimensionality reduction aims mainly at overcoming the "curse of dimensionality", yet locality preserving algorithms construct their graphs with the nearest-neighbor criterion, which itself suffers from that curse. This seems to be a paradox. Let's try to address these problems.

22. Outline
• Why construct a graph?
• Typical graph construction
  • Review & Challenges
• Our works
  • (I) Task-independent graph construction (related work: Sparsity Preserving Projections)
  • (II) Task-dependent graph construction (related work: Soft LPP and Entropy-regularized LPP)
• Discussion and Next Work

23. Our work (I): Task-independent graph construction
[Diagram: the same flow chart; our work (I) targets the graph construction and edge weight assignment stages, while our work (II) couples them with the learning tasks]

24. Our work (I): Motivation
• PCA: simple, but ignores local structure.
• LLE: considers locality, but relies on a fixed neighborhood size and an artificially defined neighborhood, and is difficult to tune.

25. Our work (I): From L0 to L1
[Figure: the solution of the L2-minimization problem (left) and of the L1-minimization problem (right)]
If the solution sought is sparse enough, the solution of the L0-minimization problem equals the solution of the L1-minimization problem [1]. (A small basis pursuit sketch follows.)
[1] D. Donoho, Compressed sensing. IEEE Trans. Inform. Theory, 52(4) (2006) 1289-1306.
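A minimal basis pursuit sketch of the L1 route: min ||s||_1 subject to x = Ds, solved as a linear program; the dictionary size and sparsity level are illustrative assumptions.

```python
# Minimal basis pursuit sketch: min ||s||_1 s.t. x = D s, solved as a
# linear program by splitting s = u - v with u, v >= 0; illustrative only.
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(D, x):
    """D: (d, n) dictionary, x: (d,) signal; returns minimum-L1 coefficients."""
    d, n = D.shape
    c = np.ones(2 * n)            # sum(u) + sum(v) equals ||s||_1 at the optimum
    A_eq = np.hstack([D, -D])     # enforce D @ (u - v) = x
    res = linprog(c, A_eq=A_eq, b_eq=x, bounds=[(0, None)] * (2 * n))
    return res.x[:n] - res.x[n:]

rng = np.random.RandomState(0)
D = rng.randn(30, 100)
s_true = np.zeros(100)
s_true[[5, 40, 77]] = [1.0, -2.0, 0.5]
s_hat = basis_pursuit(D, D @ s_true)
print(np.allclose(s_hat, s_true, atol=1e-6))  # typically True: exact recovery
```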

26. Our work (I): Modeling & Algorithms
Three algorithmic routes:
① Nonsmooth optimization: subgradient-based algorithms [1]; for p=2 the problem can also be recast as an SOCP.
② Quasi-LASSO: for p=2, LASSO with many available algorithms such as LARS [2]; for p=1, linear programming (see next page).
③ L1-ball-constrained optimization [3] (e.g., SLEP: Sparse Learning with Efficient Projections, http://www.public.asu.edu/~jye02/Software/SLEP/index.htm).
[1] Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, 2003.
[2] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, Least angle regression. Annals of Statistics, 2004, 32(2): 407-451.
[3] J. Liu, J. Ye, Efficient Euclidean Projections in Linear Time. ICML, 2009.
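For the p=2 (LASSO) route, off-the-shelf solvers apply; here is a minimal scikit-learn sketch (the slide's own pointers are LARS [2] and SLEP [3]). The dictionary, the alpha value, and the sparsity level are illustrative assumptions.

```python
# Minimal LASSO sketch for the p = 2 route; illustrative only.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
D = rng.randn(50, 200)                   # dictionary with 200 atoms
x = D[:, 3] + 0.01 * rng.randn(50)       # noisy 1-sparse signal

# Solves min_s 1/(2*n_samples) * ||x - D s||_2^2 + alpha * ||s||_1.
s = Lasso(alpha=0.01, max_iter=10000).fit(D, x).coef_
print(np.flatnonzero(np.abs(s) > 1e-3))  # active atoms, ideally just [3]
```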

27. Our work (I): Modeling (example, p=1)
[Figure: (left) a sub-block of the weight matrix constructed by the above model; (right) the optimal t for 3 different samples (YaleB)]
Incorporate priors into the graph construction process!

28. Our work (I): Modeling (example, p=2)
[Figure: the L1-norm neighborhood and its weights (sparse, adaptive, discriminative, outlier-insensitive) versus the conventional k-neighborhood and its weights, which puts samples from different classes into one patch]
[1] X. Tan, L. Qiao, W. Gao, and J. Liu, Robust Faces Manifold Modeling: Most Expressive vs. Most Sparse Criterion. Subspace 2009 Workshop, in conjunction with ICCV 2009, Kyoto, Japan.

29. Our work (I): SPP (Sparsity Preserving Projections)
• The optimal sparse coefficients describe the sparse reconstructive relationship among samples.
• So we expect to preserve this relationship in the low-dimensional space.
• More specifically, the objective takes the form sketched below.
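A hedged reconstruction of the SPP objective (notation may differ from the paper): with s_i the sparse coefficients reconstructing x_i from the other samples, and S = [s_1, …, s_n] collecting them column-wise,

```latex
% Hedged reconstruction of the SPP objective; notation may differ from
% the paper. Preserve each sample's sparse reconstruction x_i ~ X s_i
% after projection by w:
\min_{w} \sum_{i=1}^{n} \big( w^{\top} x_{i} - w^{\top} X s_{i} \big)^{2}
\quad \Longleftrightarrow \quad
\max_{w} \frac{w^{\top} X \,(S + S^{\top} - S S^{\top})\, X^{\top} w}{w^{\top} X X^{\top} w}.
```

Like LPP and NPE, this reduces to a generalized eigenvalue problem, but the "neighborhood" is selected by the sparse representation rather than pre-fixed.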

30. Our work (I): Experiments: Toy
[Figure: the toy data and their 1D images under 4 different DR algorithms (PCA, LPP, NPE, SPP), illustrating the effects of insufficient sampling and of an additional prior]

31. Our work (I): Experiments: Wine
Wine data set from UCI: 178 samples, 3 classes, 13 features.
[Figure: basic statistics of the Wine data set, and its 2D projections under PCA, LPP, NPE, and SPP]

32. Our work (I): Experiments: Face
[Figure: sample images from the Yale, AR, and Extended Yale B face data sets]

33. Our work (I): Experiments: Face
[Figure: results on Yale, AR (fixed split), AR (random split), and Extended Yale B]

34. Our work (I): Experiments: Face
[Figure: further face experiment results]

35. Our work (I): Related works [1][2][3]
[Diagram: the flow chart again, with our work (I) at the graph construction and edge weight assignment stages]
Other extensions? From graph construction to data-dependent regularization, …
[1] L. Qiao, S. Chen, and X. Tan, Sparsity preserving projections with applications to face recognition. Pattern Recognition, 2009. (Received 21 July 2008)
[2] E. Elhamifar and R. Vidal, Sparse Subspace Clustering. CVPR, 2009.
[3] S. Yan and H. Wang, Semi-supervised Learning by Sparse Representation. SDM, 2009.

36. Our work (I): Extensions
• Semi-supervised classification
• Semi-supervised dimensionality reduction

37. Our work (I): Extensions
SPDA: Sparsity Preserving Discriminant Analysis
• Applied to the single-labeled face recognition problem
• Compared with supervised LDA, unsupervised SPP, and semi-supervised SDA
[Figure: results for E1 (1 labeled and 2 unlabeled samples) and E2 (1 labeled and 30 unlabeled samples)]

38. Our work (I): Summary
Pros:
• Adaptive "neighborhood" size
• Simpler parameter selection
• Fewer training samples needed
• Easier incorporation of prior knowledge (though not entirely insensitive to noise)
• Stronger discriminating power
Cons:
• Higher computational cost

39. Outline
• Why construct a graph?
• Typical graph construction
  • Review & Challenges
• Our works
  • (I) Task-independent graph construction (related work: Sparsity Preserving Projections)
  • (II) Task-dependent graph construction (related work: Soft LPP and Entropy-regularized LPP)
• Discussion and Next Work

40. Our work (II): Task-dependent graph construction
[Diagram: the flow chart again; our work (I) covers graph construction and edge weight assignment, our work (II) couples them with the learning task]
Task-independent graph construction:
• Advantage: applicable to any graph-based learning task
• Disadvantage: does not necessarily help the subsequent learning task
Can we unify graph construction and the learning task? How?

41. Our work (II): Motivation (cont'd)
Furthermore, take LPP as an example:
• Step 1: Graph construction (k-nearest-neighbor criterion)
• Step 2: Edge weight assignment
• Step 3: Projection direction learning
(The standard formulation is recalled below.)
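For reference, the classical LPP steps in formulas (this is the standard He-Niyogi formulation, not specific to our work); t is the heat-kernel width, D the diagonal degree matrix, and L = D - W the graph Laplacian:

```latex
% Step 2: heat-kernel weights on the k-NN graph.
W_{ij} =
\begin{cases}
\exp\!\big( -\|x_i - x_j\|^{2} / t \big) & \text{if } x_i, x_j \text{ are neighbors},\\
0 & \text{otherwise};
\end{cases}
% Step 3: the projection a minimizing the locality-preserving objective
% is given by a generalized eigenproblem (D_{ii} = \sum_j W_{ij}, L = D - W):
\min_{a} \sum_{i,j} \big( a^{\top} x_i - a^{\top} x_j \big)^{2} W_{ij}
\;\;\Rightarrow\;\;
X L X^{\top} a = \lambda \, X D X^{\top} a .
```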

42. Our work (II): Motivation (cont'd)
[Figure: results with 5 and with 15 training samples per class]
• In LPP, the "local geometry" is completely determined by the artificially pre-fixed neighborhood graph.
• As a result, its performance may drop severely when given a "bad" graph. Unfortunately, it is generally difficult to judge in advance whether a graph is good or not, especially in the unsupervised scenario.
• So we want the graph to be adjustable. How should it be adjusted?

43. Our work (II): Motivation (cont'd)
• LPP seeks a low-dimensional representation that preserves the local geometry of the original data.
• Locality preserving power is potentially related to discriminating power [1].
• Locality preserving power is expressed by minimizing the LPP objective function.
A natural question: can we obtain more locality preserving power, or discriminating power, by minimizing the objective function further? A key issue is how to characterize such power formally.
Our idea: optimize the graph and learn the projections simultaneously within a unified objective function.
[1] D. Cai, X. F. He, J. W. Han, and H. J. Zhang, Orthogonal Laplacianfaces for face recognition. IEEE Transactions on Image Processing, 2006, 15(11): 3608-3614.

44. Our work (II): Modeling: SLPP (Soft LPP, or SLAPP)
From LPP to Soft LPP, four modeling ingredients:
① Regard the graph entries S_ij as new optimization variables, i.e., the graph is adjustable instead of pre-fixed. Note that S is not constrained to be symmetric.
② m (> 1): a new parameter that controls the uncertainty of S_ij and lets us obtain a closed-form solution. Without it, we would get a singular solution in which each row of S has exactly one element equal to 1 and all others equal to 0.
③ New constraints that avoid degenerate solutions and give the graph a natural probabilistic interpretation.
④ d_ii is removed from this constraint mainly to make the optimization tractable.
(A hedged reconstruction of the resulting objective follows.)
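A hedged reconstruction of the Soft LPP model implied by ①-④ above; the notation is mine and may differ from the submitted paper ([7] of slide 11):

```latex
% Hedged reconstruction of Soft LPP: graph S and projections W are
% optimized jointly; m > 1 softens the otherwise 0/1 neighbor assignment.
\min_{W,\,S} \;\sum_{i,j} \big\| W^{\top} x_i - W^{\top} x_j \big\|^{2} \, S_{ij}^{\,m}
\quad \text{s.t.} \quad \sum_{j \ne i} S_{ij} = 1, \;\; S_{ij} \ge 0,
```

together with an LPP-style normalization constraint on W. Each row of S then behaves like a probability distribution over candidate neighbors, which is the "natural probability explanation" in ③.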

45. Our work (II): Algorithm
• The problem is non-convex with respect to the variables jointly.
• We solve it by alternating iterative optimization.
• Fortunately, we obtain a closed-form solution at each step:
  • Step 1: Fix S, calculate W by a generalized eigen-problem.
  • Step 2: Fix W, update the graph S. The update is a normalized inverse Euclidean distance!
(A code sketch of the alternation follows.)
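A hedged sketch of the alternating iteration as reconstructed from the slides (not the authors' code); the S-update solves min_S Σ_j d_ij S_ij^m subject to Σ_j S_ij = 1, S_ij ≥ 0, whose closed form is exactly the normalized inverse distance mentioned above. The defaults m=3 and n_iter=20 are assumptions.

```python
# Hedged sketch of the SLPP alternating iteration; illustrative only.
import numpy as np
from scipy.linalg import eigh

def slpp(X, dim, m=3.0, n_iter=20, eps=1e-12):
    """X: (d, n) data matrix; returns projections W (d, dim) and graph S (n, n)."""
    d, n = X.shape
    S = np.full((n, n), 1.0 / (n - 1))
    np.fill_diagonal(S, 0.0)
    for _ in range(n_iter):
        # Step 1: fix S, solve the LPP-style generalized eigenproblem.
        A = 0.5 * (S ** m + (S ** m).T)       # symmetrized effective weights
        Dg = np.diag(A.sum(axis=1))
        L = Dg - A
        _, V = eigh(X @ L @ X.T, X @ Dg @ X.T + eps * np.eye(d))
        W = V[:, :dim]                        # smallest generalized eigenvalues
        # Step 2: fix W, update S in closed form (normalized inverse distance).
        Y = W.T @ X
        d2 = ((Y[:, :, None] - Y[:, None, :]) ** 2).sum(axis=0)
        inv = (d2 + eps) ** (-1.0 / (m - 1.0))
        np.fill_diagonal(inv, 0.0)            # exclude self-neighbors
        S = inv / inv.sum(axis=1, keepdims=True)
    return W, S
```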

46. Our work (II): Algorithm
[Algorithm listing: the full SLPP procedure]

47. Our work (II): Modeling: ELPP
ELPP: Entropy-regularized LPP. Replacing the exponent m with an entropy regularizer on S yields a graph update that is a normalized heat-kernel distance!
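A hedged reconstruction of the ELPP graph update, assuming the entropy term replaces SLPP's exponent m (the parameter γ and the notation are mine): minimizing each row of S in closed form gives a softmax, i.e., the normalized heat kernel the slide mentions.

```latex
% Hedged reconstruction of the ELPP S-update, with projected distances
% d_{ij} = || W^T x_i - W^T x_j ||^2:
\min_{S} \;\sum_{i,j} d_{ij} S_{ij} + \gamma \sum_{i,j} S_{ij} \ln S_{ij}
\quad \text{s.t.} \quad \sum_{j \ne i} S_{ij} = 1, \; S_{ij} \ge 0
\;\;\Longrightarrow\;\;
S_{ij} = \frac{\exp(-d_{ij}/\gamma)}{\sum_{k \ne i} \exp(-d_{ik}/\gamma)} .
```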

48. Our work (II): ELPP: Algorithm
[Algorithm listing: the full ELPP procedure]

49. Our work (II): Convergence
• Convergence follows from Cauchy's convergence rule: the objective value is bounded and monotonically non-increasing under the alternating updates.
• The procedure is an instance of block-coordinate (gradient) descent!

50. Our work (II): Experiments: Wine
[Figure: results on the Wine data set for LPP and for SLPP(1), SLPP(3), SLPP(5), SLPP(7), SLPP(9)]
