
Feature Extraction








  1. Feature Extraction (Speaker: 虞台文)

  2. Content • Principal Component Analysis (PCA) • Factor Analysis • Fisher’s Linear Discriminant Analysis • Multiple Discriminant Analysis

  3. Feature Extraction Principal Component Analysis (PCA)

  4. Principal Component Analysis • It is a linear procedure to find the direction in input space where most of the energy of the input lies. • Feature Extraction • Dimension Reduction • It is also called the (discrete) Karhunen-Loève transform, or the Hotelling transform.

  5. The Basic Concept • [Figure: projection of x onto the direction w, giving w^T x] • Assume the data x (a random vector) has zero mean. • PCA finds a unit vector w that captures the largest amount of variance of the data; that is, it maximizes E[(w^T x)^2] subject to ||w|| = 1.

  6. The Method • Covariance matrix: C = E[x x^T] (x is zero-mean). • Remark: C is symmetric and positive semidefinite.

  7. The Method • Maximize w^T C w subject to w^T w = 1. • The method of Lagrange multipliers: define L(w, λ) = w^T C w - λ(w^T w - 1). • The extreme point, say w*, satisfies ∂L/∂w = 0.

  8. The Method • Maximize w^T C w subject to w^T w = 1. • Setting ∂L/∂w = 2Cw - 2λw = 0 gives Cw = λw.

  9. Discussion • At the extreme points, Cw = λw: w is an eigenvector of C and λ is its corresponding eigenvalue. • Let w1, w2, …, wd be the eigenvectors of C whose corresponding eigenvalues are λ1 ≥ λ2 ≥ … ≥ λd. • They are called the principal components of C. • Their significance can be ordered according to their eigenvalues.

  10. Discussion • Let w1, w2, …, wd be the eigenvectors of C whose corresponding eigenvalues are λ1 ≥ λ2 ≥ … ≥ λd. • They are called the principal components of C. • Their significance can be ordered according to their eigenvalues. • Since C is symmetric and positive semidefinite, its eigenvectors can be chosen to be mutually orthogonal. • They hence form a basis of the feature space. • For dimensionality reduction, choose only a few of them.
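The procedure on slides 5 through 10 (form the covariance matrix, take its eigenvectors ordered by eigenvalue, and project onto the leading ones) can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the lecture; the function name pca and the variable names are ours.

  import numpy as np

  def pca(data, n_components):
      """Project the data onto its leading principal components."""
      x = data - data.mean(axis=0)          # assume zero-mean data, as on slide 5
      C = np.cov(x, rowvar=False)           # covariance matrix C (slide 6)
      eigvals, eigvecs = np.linalg.eigh(C)  # C is symmetric, so eigh is appropriate
      order = np.argsort(eigvals)[::-1]     # sort so that lambda_1 >= ... >= lambda_d
      W = eigvecs[:, order[:n_components]]  # principal components w_1, ..., w_k
      return x @ W                          # dimensionality reduction by projection

  # Example: reduce 5-dimensional data to 2 dimensions
  z = pca(np.random.randn(100, 5), n_components=2)
  print(z.shape)                            # (100, 2)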

  11. Applications • Image Processing • Signal Processing • Compression • Feature Extraction • Pattern Recognition

  12. Example Projecting the data onto the most significant axis will facilitate classification. This also achieves dimensionality reduction.

  13. Issues • [Figure: the most significant component obtained using PCA vs. the most significant component for classification] • PCA is effective for identifying the multivariate signal distribution. • Hence, it is good for signal reconstruction. • But it may be inappropriate for pattern classification.

  14. Whitening • Whitening is a process that transforms a zero-mean random vector, say x = (x1, x2, …, xn)^T, into z = (z1, z2, …, zn)^T with zero mean and identity covariance. • z is said to be white or sphered. • This implies that all of its elements are uncorrelated and have unit variance. • However, it does not imply that its elements are independent.

  15. Whitening Transform • Let V be a whitening transform; then Cz = V Cx V^T = I. • Decompose Cx as Cx = E D E^T. Clearly, D is a diagonal matrix (of eigenvalues) and E is an orthonormal matrix (of eigenvectors). • Set V = D^(-1/2) E^T.

  16. Whitening Transform • If V is a whitening transform and U is any orthonormal matrix, show that UV (i.e., a further rotation) is also a whitening transform. • Proof: Cz = (UV) Cx (UV)^T = U (V Cx V^T) U^T = U I U^T = I.
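A minimal NumPy sketch of the whitening transform V = D^(-1/2) E^T from slide 15, assuming the data rows are samples of a zero-mean random vector; the names whitening_transform, x, and z are illustrative.

  import numpy as np

  def whitening_transform(x):
      """Return V such that z = V x (applied row-wise) has identity covariance."""
      Cx = np.cov(x, rowvar=False)
      d, E = np.linalg.eigh(Cx)                 # Cx = E D E^T: D diagonal, E orthonormal
      return np.diag(1.0 / np.sqrt(d)) @ E.T    # V = D^(-1/2) E^T

  # Correlated test data (rows are samples)
  rng = np.random.default_rng(0)
  x = rng.standard_normal((1000, 3)) @ np.array([[2.0, 0.0, 0.0],
                                                 [0.5, 1.0, 0.0],
                                                 [0.0, 0.3, 0.5]])
  x = x - x.mean(axis=0)
  V = whitening_transform(x)
  z = x @ V.T                                   # whitened data
  print(np.round(np.cov(z, rowvar=False), 2))   # approximately the identity matrix

Slide 16's result then says that U @ V for any orthonormal U would whiten the data equally well.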

  17. Why Whitening? • With PCA, we usually choose several major eigenvectors as the basis for representation. • This basis is efficient for reconstruction, but may be inappropriate for other applications, e.g., classification. • By whitening, we can rotate the basis to get more interesting features.

  18. Feature Extraction Factor Analysis

  19. What is a Factor? • If several variables correlate highly, they might measure aspects of a common underlying dimension. • These dimensions are called factors. • Factors are classification axes along which the measures can be plotted. • The greater the loading of variables on a factor, the more that factor explains the intercorrelations between those variables.

  20. Graph Representation • [Figure: variables plotted against two factor axes, Quantitative Skill (F1) and Verbal Skill (F2), with loadings between -1 and +1]

  21. What is Factor Analysis? • A method for investigating whether a number of variables of interest Y1, Y2, …, Yn are linearly related to a smaller number of unobservable factors F1, F2, …, Fm. • Used for data reduction and summarization. • A statistical approach to analyzing the interrelationships among a large number of variables and to explaining these variables in terms of their common underlying dimensions (factors).

  22. Example • What factors influence students' grades? • Observable: the grade data. • Unobservable: quantitative skill? verbal skill?

  23. The Model • y = Bf + ε • y: observation vector • B: factor-loading matrix • f: factor vector • ε: Gaussian noise vector

  25. The Model • Cy = E[y y^T] = B B^T + Q, where Q = E[ε ε^T] is diagonal. • Cy can be estimated from the data; B B^T + Q can be obtained from the model.

  26. The Model • Var(y_i) = Σk b_ik^2 + q_i. • The first term is the communality: the part of the variance explained by the common factors. • The second term, q_i, is the specific variance: the unexplained part.
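A small numeric sketch of the variance decomposition on slides 25 and 26, with a hypothetical loading matrix B and specific variances Q (both made up for illustration): each variable's variance splits into a communality, explained by the common factors, and a specific variance, left unexplained.

  import numpy as np

  B = np.array([[0.9, 0.1],            # hypothetical factor loadings: 3 variables, 2 factors
                [0.8, 0.3],
                [0.2, 0.7]])
  Q = np.diag([0.18, 0.27, 0.47])      # hypothetical specific variances

  Cy = B @ B.T + Q                     # model covariance: Cy = B B^T + Q (slide 25)
  communality = (B ** 2).sum(axis=1)   # explained part of Var(y_i) (slide 26)
  print(communality)                   # [0.82 0.73 0.53]
  print(np.diag(Cy))                   # communality + specific variance = 1.0 for each variable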

  27. Cy BBT + Q = Example

  28. Goal • Our goal is to minimize the discrepancy ||Cy - (BB^T + Q)||. • Hence, B and Q are chosen so that BB^T + Q reproduces Cy as closely as possible.

  29. Uniqueness • Is the solution unique? No: there are an infinite number of solutions, since if B* is a solution and T is an orthonormal transformation (rotation), then B*T is also a solution.
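Slide 29's non-uniqueness argument is easy to check numerically: for any orthonormal T, (B*T)(B*T)^T = B*B*^T, so the fit to Cy is unchanged. The matrices below are made up for illustration.

  import numpy as np

  B = np.array([[0.9, 0.1],
                [0.8, 0.3],
                [0.2, 0.7]])
  theta = 0.6
  T = np.array([[np.cos(theta), -np.sin(theta)],    # an orthonormal (rotation) matrix
                [np.sin(theta),  np.cos(theta)]])

  print(np.allclose(B @ B.T, (B @ T) @ (B @ T).T))  # True: BT fits Cy exactly as well as B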

  30. Example • [Two alternative loading matrices for the same Cy] • Which one is better?

  31. i2 i2 i1 i1 Left:each factor have nonzero loading for all variables. Example Right:each factor controls different variables.

  32. The Method • Determine the first set of loadings using the principal-component method.
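One common reading of the principal-component method on slide 32 is to keep the m largest eigenpairs of Cy and set each initial loading column to sqrt(lambda_k) times the corresponding eigenvector. The sketch below follows that reading; the names principal_component_loadings, B0, and Q0 and the numeric Cy are illustrative, not from the slides.

  import numpy as np

  def principal_component_loadings(Cy, m):
      """Initial loadings B and specific variances Q from the top m eigenpairs of Cy."""
      eigvals, eigvecs = np.linalg.eigh(Cy)
      order = np.argsort(eigvals)[::-1][:m]
      B = eigvecs[:, order] * np.sqrt(eigvals[order])   # column k scaled by sqrt(lambda_k)
      Q = np.diag(np.diag(Cy - B @ B.T))                # residual diagonal: specific variances
      return B, Q

  Cy = np.array([[1.0, 0.6, 0.2],
                 [0.6, 1.0, 0.3],
                 [0.2, 0.3, 1.0]])
  B0, Q0 = principal_component_loadings(Cy, m=2)
  print(np.round(B0, 2))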

  33. Cy Example

  34. Factor Rotation • Factor rotation: given the factor-loading matrix B and an orthonormal rotation matrix T, the rotated loading matrix is BT.

  35. Factor Rotation • Factor rotation: BT, where B is the factor-loading matrix and T is a rotation matrix. • Common rotation criteria: Varimax, Quartimax, Equimax, Orthomax, Oblimin.

  36. Varimax • Criterion: maximize V = Σk [ (1/p) Σj c_jk^4 - ((1/p) Σj c_jk^2)^2 ], where c_jk is the (j, k) entry of the rotated loading matrix BT and p is the number of variables, subject to T^T T = I.

  37. Varimax • Criterion: maximize V subject to T^T T = I. • Construct the Lagrangian by adjoining the orthogonality constraints with multipliers.

  38. Varimax • [Derivation of the stationarity conditions, written in terms of the quantities b_jk, c_jk, and d_k]

  39. Varimax • Define c_k as the kth column of the rotated loading matrix BT.

  40. Varimax • [Continuation of the derivation, written in terms of the kth column of the rotated loading matrix]

  41. Varimax • Goal: the criterion reaches its maximum once the iteration converges, i.e., the rotation matrix stops changing.

  42. Varimax • Goal: find the rotation T that maximizes the varimax criterion. • Initially: obtain B0 by whatever method, e.g., PCA, and set T0 as the initial approximation of the rotation matrix, e.g., T0 = I. • Iteratively execute the following procedure: evaluate the update quantities (this requires the current rotated loadings, B1); find the next rotation such that the criterion increases (details on the next slide); if it no longer increases, stop; otherwise repeat.

  43. Varimax (continued) • The update relation for the rotation matrix is obtained by pre-multiplying each side by its transpose; the initialization and the iterate / stop / repeat structure are as on the previous slide.

  44. Varimax • Criterion: maximize the varimax objective (restated for the update step).

  45. Varimax • [Final step: maximize the criterion with respect to the rotation, using the definitions above]
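Slides 36 through 45 outline an iterative varimax procedure whose equations are garbled in this transcript, so the sketch below uses the standard SVD-based varimax update as one concrete realisation, starting from B0 and T0 = I as on slide 42. The function name, the loading matrix, and the convergence tolerance are illustrative.

  import numpy as np

  def varimax(B, max_iter=100, tol=1e-6):
      """Return a rotation T that (approximately) maximizes the varimax criterion, and B @ T."""
      p, k = B.shape
      T = np.eye(k)                       # T0 = I (slide 42)
      last = 0.0
      for _ in range(max_iter):
          L = B @ T                       # current rotated loadings
          # Gradient of the varimax objective with respect to the rotated loadings
          G = B.T @ (L ** 3 - L * (L ** 2).mean(axis=0))
          U, s, Vt = np.linalg.svd(G)
          T = U @ Vt                      # nearest orthonormal matrix to the gradient step
          if s.sum() - last < tol:        # stop once the criterion no longer increases
              break
          last = s.sum()
      return T, B @ T

  B0 = np.array([[0.60,  0.60],           # a made-up loading matrix with no simple structure
                 [0.55,  0.62],
                 [0.62, -0.58],
                 [0.58, -0.61]])
  T, B_rot = varimax(B0)
  print(np.round(B_rot, 2))               # after rotation each variable loads mainly on one factor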

  46. Feature Extraction Fisher’s Linear Discriminant Analysis

  47. Main Concept • PCA seeks directions that are efficient for representation. • Discriminant analysis seeks directions that are efficient for discrimination.

  48. Classification Efficiencies on Projections

  49. Criterion (Two-Category) • [Figure: samples of two classes with means m1 and m2, projected onto a unit vector w, ||w|| = 1]

  50. Between-Class Scatter • [Figure: the class means m1 and m2 and a projection direction w, ||w|| = 1] • Between-class scatter matrix: SB = (m1 - m2)(m1 - m2)^T. • Projected between-class scatter: w^T SB w; the larger, the better.
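A small sketch of slide 50: form the between-class scatter matrix SB = (m1 - m2)(m1 - m2)^T from the class means and compare the projected scatter w^T SB w for two candidate unit directions. The synthetic two-class data and the candidate directions are made up for illustration.

  import numpy as np

  rng = np.random.default_rng(0)
  class1 = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))   # synthetic class 1
  class2 = rng.normal(loc=[3.0, 1.0], scale=0.5, size=(50, 2))   # synthetic class 2

  m1, m2 = class1.mean(axis=0), class2.mean(axis=0)
  S_B = np.outer(m1 - m2, m1 - m2)                  # between-class scatter matrix (slide 50)

  candidates = [np.array([0.0, 1.0]),               # a poor direction, nearly orthogonal to m1 - m2
                (m1 - m2) / np.linalg.norm(m1 - m2)]  # the direction of the mean difference
  for w in candidates:                              # each w is a unit vector, ||w|| = 1
      print(w, float(w @ S_B @ w))                  # larger projected scatter is better

Only the between-class part from slide 50 is shown here; the full Fisher criterion also takes within-class scatter into account.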
