
Enhancing Dimensionality Reduction with Linear Transformations: A Revised Approach

This project update by Mingyue Tan discusses dimensionality reduction through linear transformations. It focuses on key questions about cluster shapes, cluster densities, outliers, and which data coordinates account for the decomposition into clusters. Principal Component Analysis (PCA) is examined along with its limitations: sensitivity to outliers and a tendency to obscure clustering structure. A Weighted PCA approach is proposed to reduce the influence of outliers and emphasize cluster structure, laying the groundwork for further work in visualizing labeled data.





Presentation Transcript


  1. Dimensionality Reduction with Linear Transformations. Project update by Mingyue Tan, March 17, 2004

  2. Domain and Task. Data are labeled. Questions to answer: - What is the shape of the clusters? - Which clusters are dense/heterogeneous? - Which data coordinates account for the decomposition into clusters? - Which data points are outliers?

  3. Solution - Dimensionality Reduction 1. Project the high-dimensional points into a low-dimensional space while preserving the "essence" of the data - i.e. distances are preserved as well as possible 2. Solve the problems in low dimensions

  4. Principal Component Analysis • Intuition: find the axis that shows the greatest variation, and project all points onto this axis. [Figure: original axes f1, f2 with principal axes e1, e2]
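The slide gives no code; the following is a minimal numpy sketch of this intuition (function and variable names are illustrative): center the data, diagonalize the covariance matrix, and project onto the top-variance axes.

```python
import numpy as np

def pca(X, k=1):
    """Project the rows of X onto the top-k principal axes.

    X: (n, d) data matrix; returns the (n, k) projected coordinates.
    """
    Xc = X - X.mean(axis=0)            # center the data
    cov = Xc.T @ Xc / len(X)           # d x d covariance matrix
    vals, vecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    axes = vecs[:, ::-1][:, :k]        # top-k axes of greatest variation
    return Xc @ axes

# Points stretched along the line y = x: the first principal axis
# captures more variance than either original coordinate alone.
X = np.array([[0.0, 0.1], [1.0, 0.9], [2.0, 2.1], [3.0, 2.9]])
Z = pca(X, k=1)
```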

  5. Problem with PCA • Not robust - sensitive to outliers • Usually does not show clustering structure

  6. New Approach PCA - seeks a projection that maximizes the sum of squared pairwise distances Σ_{i&lt;j} ||y_i − y_j||² Weighted PCA - seeks a projection that maximizes the weighted sum Σ_{i&lt;j} w_ij ||y_i − y_j||² - flexibility: a bigger w_ij makes it more important to put points i and j apart

  7. Weighted PCA Varying w_ij gives: • Weights specified by user • Normalized PCA – robust towards outliers • Supervised PCA – shows cluster structures - If i and j belong to the same cluster, set w_ij = 0 - Maximize inter-cluster scatter
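As a sketch of the two slides above (not the project's Java implementation; names and the toy data are illustrative): the weighted objective Σ_ij w_ij ||y_i − y_j||² equals 2·tr(A Xᵀ L X Aᵀ) for the graph Laplacian L of the weights, so the optimal axes are top eigenvectors of Xᵀ L X. Setting w_ij = 0 within a cluster and 1 across clusters gives the supervised variant.

```python
import numpy as np

def weighted_pca(X, W, k=1):
    """Axes maximizing sum_ij w_ij * ||proj(x_i) - proj(x_j)||^2.

    With L = diag(W.sum(1)) - W (the Laplacian of the weight graph),
    the objective is 2 * tr(A X^T L X A^T), so the optimal k axes are
    the top-k eigenvectors of X^T L X.
    """
    L = np.diag(W.sum(axis=1)) - W
    vals, vecs = np.linalg.eigh(X.T @ L @ X)
    return vecs[:, ::-1][:, :k]

def supervised_weights(labels):
    """w_ij = 0 if i and j share a cluster label, 1 otherwise,
    so only inter-cluster scatter is maximized."""
    labels = np.asarray(labels)
    return (labels[:, None] != labels[None, :]).astype(float)

# Two clusters separated along y but spread along x: plain PCA would
# pick the x-axis (largest scatter); the supervised weights pick y.
X = np.array([[0.0, 0.0], [4.0, 0.2], [0.2, 3.0], [4.2, 3.2]])
W = supervised_weights([0, 0, 1, 1])
axis = weighted_pca(X - X.mean(axis=0), W, k=1)[:, 0]
```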

  8. Comparison – with outliers - PCA: Outliers typically govern the projection direction
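The outlier sensitivity on this slide is easy to demonstrate with standard PCA (a hedged toy example, not the dataset from the paper): a single distant point can rotate the leading principal axis away from the direction the inliers actually vary along.

```python
import numpy as np

def first_pc(X):
    """Leading eigenvector of the covariance of the centered data."""
    Xc = X - X.mean(axis=0)
    vals, vecs = np.linalg.eigh(Xc.T @ Xc)
    return vecs[:, -1]

# Inliers vary only along x; one far-away outlier sits along y.
inliers = np.array([[x, 0.05 * (-1) ** x] for x in range(8)], dtype=float)
with_outlier = np.vstack([inliers, [[3.5, 40.0]]])

pc_clean = first_pc(inliers)        # essentially the x-axis
pc_dirty = first_pc(with_outlier)   # dragged toward the outlier's y-axis
```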

  9. Comparison – cluster structure - Projections that maximize scatter ≠ Projections that separate clusters

  10. Summary

  11. Interface

  12. Interface - File

  13. Interface - task

  14. Interface - method

  15. Interface

  16. Milestones • Dataset assembled - same dataset used in the paper • Became familiar with NetBeans - implemented a preliminary interface (no functionality yet) • Rewrite PCA in Java (from an existing Matlab implementation) – partially done • Implement four new methods

  17. Reference • [1] Y. Koren and L. Carmel, "Visualization of Labeled Data Using Linear Transformations," Proc. IEEE Symposium on Information Visualization (InfoVis 2003), IEEE, pp. 121-128, 2003.
