
RobustMap: A Fast and Robust Algorithm for Dimension Reduction and Clustering

Lionel F. Lovett, II. Advisors: George Ostrouchov and Houssain Kettani. Computer Science and Mathematics Division, Oak Ridge National Laboratory. Summer 2005.


Presentation Transcript


  1. RobustMap: A Fast and Robust Algorithm for Dimension Reduction and Clustering
     Lionel F. Lovett, II
     Advisors: George Ostrouchov and Houssain Kettani
     Computer Science and Mathematics Division, Oak Ridge National Laboratory
     Summer 2005

  2. Why Dimension Reduction?
     • Data mining large databases
       • Number of items
       • Number of attributes (high dimensionality) with items
     • Visualization
       • Requires low-dimensional views (2 or 3)
     • Structure discovery
       • Patterns
       • Clusters
     • Fast similarity searching
       • Images, video, documents, character recognition, face recognition, DNA sequences
     • Data reduction

  3. RobustMap Uses Distances and Mimics PCA Like FastMap
     FastMap (Faloutsos and Lin, 1995):
     • Choose two very distant points as principal axis
     • Project onto orthogonal hyperplane
     • Repeat
     • Each axis is O(n), given distances
     • Distances are updated using the cosine law as needed
     • Result is a mapping as well as the transformation to map new items
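The FastMap steps listed on this slide can be sketched in a few lines of Python. This is a minimal illustration and not the presenters' code: the function names `choose_pivots` and `fastmap`, the use of Euclidean input points, and the `seed` parameter are assumptions made for the example.

```python
import math
import random

def choose_pivots(n, d2, rng):
    """Pivot heuristic: start at a random object, jump to the farthest one, repeat once."""
    a = rng.randrange(n)
    b = max(range(n), key=lambda i: d2(a, i))
    a = max(range(n), key=lambda i: d2(b, i))
    return a, b

def fastmap(points, k, seed=0):
    """Map each point to k coordinates using only pairwise distances (illustrative sketch)."""
    rng = random.Random(seed)
    n = len(points)
    coords = [[0.0] * k for _ in range(n)]

    def base_d2(i, j):
        # Squared distance in the original space (Euclidean here, by assumption).
        return sum((p - q) ** 2 for p, q in zip(points[i], points[j]))

    for axis in range(k):
        def d2(i, j, axis=axis):
            # Squared distance in the hyperplane orthogonal to all earlier axes.
            used = sum((coords[i][t] - coords[j][t]) ** 2 for t in range(axis))
            return max(base_d2(i, j) - used, 0.0)

        a, b = choose_pivots(n, d2, rng)
        dab2 = d2(a, b)
        if dab2 == 0.0:
            break  # no spread left to explain
        dab = math.sqrt(dab2)
        for i in range(n):
            # Cosine law: coordinate of object i along the pivot axis ab.
            coords[i][axis] = (d2(a, i) + dab2 - d2(b, i)) / (2.0 * dab)
    return coords
```

For points that already live in two dimensions, calling `fastmap(points, 2)` should reproduce all pairwise distances up to rounding, which makes a convenient sanity check.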

  4. Projection to Pivot Axis and to Orthogonal Hyperplane
     Given pivot axis ab, FastMap computes coordinates along the axis and projections onto the orthogonal hyperplane.
     [Figure: triangle on pivots a and b and object y, with distances d_{a,b}, d_{a,y}, d_{b,y} and coordinate c_y along the axis; objects y and z project to y' and z' on the orthogonal hyperplane, at distance d_{y',z'}.]
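For reference, the two quantities named on this slide follow from the cosine law as given by Faloutsos and Lin (1995). The symbols d_{y,z} and c_z (the original distance between y and z, and the coordinate of z along the axis) do not appear in the transcript and are added here for the formula.

```latex
% Coordinate of object y along the pivot axis ab (cosine law)
c_y = \frac{d_{a,y}^{2} + d_{a,b}^{2} - d_{b,y}^{2}}{2\, d_{a,b}}

% Distance between the projections y' and z' on the orthogonal hyperplane
d_{y',z'}^{2} = d_{y,z}^{2} - (c_y - c_z)^{2}
```

The second identity is what lets each new axis be computed from distances alone, without ever forming the projected points.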

  5. Problems with FastMap?
     OUTLIERS!
     • Outliers are points that, on average, are not close to the other members of their cluster.
     • When selecting points based on distance, FastMap considers all the points of a dataset.
     • Because it includes outliers, FastMap isn’t robust.

  6. FastMap Pivot Pair: Choosing Outliers
     • Axis does not represent the majority of the data
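A small synthetic example makes this failure mode concrete. The data below are invented for illustration (an elongated cluster along the x-axis plus a single far point); `farthest_pair` is a hypothetical helper that applies the farthest-point pivot heuristic.

```python
import math
import random

def farthest_pair(points, start):
    """Farthest-point heuristic: from `start`, jump to the farthest point, then once more."""
    def farthest_from(i):
        return max(range(len(points)), key=lambda j: math.dist(points[i], points[j]))
    b = farthest_from(start)
    a = farthest_from(b)
    return a, b

rng = random.Random(1)
# Main cluster: wide in x, narrow in y, so its natural principal axis is horizontal.
cluster = [(rng.gauss(0.0, 5.0), rng.gauss(0.0, 1.0)) for _ in range(200)]
outlier = (0.0, 100.0)  # a single far-away point, index 200
points = cluster + [outlier]

a, b = farthest_pair(points, start=0)
dx = abs(points[a][0] - points[b][0])
dy = abs(points[a][1] - points[b][1])
print("pivots:", a, b, "axis is mostly vertical:", dy > dx)
```

The outlier ends up as one of the two pivots, so the first axis points toward it rather than along the direction in which the 200 cluster points actually vary.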

  7. RobustMap: Clustering and Excluding Outliers
     • Compute n distances from random object
     • Take point of largest distance
     • Repeat
     • Clustering
     • Estimate distance distribution from two extreme points
     • Find probability of extreme points
     • Exclude most extreme cluster of low-probability points
     • Finish projection using remaining points
     • Diagnostic histogram and cluster plots
     Ratio Function
     • Uses only distances from pivots (2nd and 3rd)
     • Computes ratios: data fraction / probability of data
     • Looks for splits according to ratio threshold
     • Discards smaller portion
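The slide gives the ratio function only in outline, so the sketch below is one possible reading and not the presenters' method. Everything specific is an assumption: the name `exclude_outlying_cluster`, the use of distances from a single pivot, the normal model fitted with the median and the median absolute deviation (MAD), and the default threshold of 10.

```python
import math
import statistics

def normal_tail(t, mu, sigma):
    """P(X > t) for X ~ Normal(mu, sigma^2)."""
    return 0.5 * math.erfc((t - mu) / (sigma * math.sqrt(2.0)))

def exclude_outlying_cluster(dists, threshold=10.0):
    """Split distances from a pivot into (kept, discarded) index lists.

    Assumed reading of the slide: for each candidate split, compare the
    fraction of data beyond the split with the probability a fitted model
    assigns to that region. A large ratio means far more points sit out
    there than the model expects, so that tail is treated as an outlying
    cluster and discarded.
    """
    n = len(dists)
    mu = statistics.median(dists)
    mad = statistics.median([abs(d - mu) for d in dists])
    sigma = 1.4826 * mad  # MAD rescaled to match a normal standard deviation
    order = sorted(range(n), key=lambda i: dists[i])
    if sigma == 0.0:
        return order, []
    # Only consider splits above the median, so the discarded tail is the smaller portion.
    for rank in range(n // 2 + 1, n):
        t = dists[order[rank]]
        fraction = (n - rank) / n                      # data fraction at or beyond t
        probability = max(normal_tail(t, mu, sigma), 1e-300)  # guard against underflow
        if fraction / probability > threshold:
            return order[:rank], order[rank:]
    return order, []
```

With 200 distances near 10 and a separate group of 10 distances near 30, this sketch discards the far group and keeps nearly all of the main group; a few of the largest main-group distances may also be trimmed, depending on the threshold.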

  8. Dataset Generator
     • Generates clustered data from a mixture of multivariate normal densities
     • There are five parameters
       • Number of dimensions
       • Number of clusters
       • Cluster variability
       • Cluster mixing proportions
       • Seed for random number generator
     • Other RobustMap parameters
       • Number of dimensions to extract
       • Quantile of trimmed max
       • Ratio threshold for outlying cluster extraction
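A generator with the five listed parameters might look like the following sketch. It is not the presenters' generator: the function name, the extra `n_items` argument, the spherical (equal-variance, uncorrelated) clusters, and the uniform placement of cluster centers in [-10, 10] are all assumptions made for illustration.

```python
import random

def generate_clustered_data(n_items, n_dims, n_clusters, variability, proportions, seed):
    """Draw points from a mixture of spherical normal clusters (illustrative sketch).

    variability  standard deviation shared by every cluster and dimension (assumed)
    proportions  mixing weights, one per cluster
    seed         seed for the random number generator, for reproducible data
    """
    rng = random.Random(seed)
    # Cluster centers placed uniformly at random in a box (assumed).
    centers = [[rng.uniform(-10.0, 10.0) for _ in range(n_dims)]
               for _ in range(n_clusters)]
    # Assign each item to a cluster according to the mixing proportions.
    labels = rng.choices(range(n_clusters), weights=proportions, k=n_items)
    # Sample each item around its cluster center.
    points = [[rng.gauss(c, variability) for c in centers[label]]
              for label in labels]
    return points, labels

points, labels = generate_clustered_data(
    n_items=500, n_dims=20, n_clusters=3,
    variability=1.0, proportions=[0.6, 0.3, 0.1], seed=2005)
```

A small mixing proportion for one cluster, as in the usage above, is a simple way to produce the kind of outlying cluster that the exclusion step is meant to detect.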

  9. Results
     • RobustMap identifies and excludes outlying clusters
     • RobustMap performs dimension reduction
     • RobustMap exploits robust statistics
     • RobustMap exploits fast machine learning algorithms (runtime O(nk))

  10. Decomposition of Climate Model Run Data with PCA (EOF) and with RobustMap
     • 135-year CCM3 run at T42 resolution, CO2 increase to 3x
     • Average monthly surface temperature, 1620 x 2500 matrix (Putman, Drake, Ostrouchov, 2000)
     • RobustMap (RM 1 to RM 4) compared with PCA (PC 1 to PC 4, out of PC 1 to PC 1620): 1000x faster!
     • Concise 4-d summary of 135-year run
     • Winter warming more severe than summer warming
     [Figure: image vectors and amplitude-in-time plots for PC 1 to PC 4 and RM 1 to RM 4.]

  11. Future Plans
     • Compute ratio threshold from probability theory
     • Create loop for remaining clusters
     • Develop better probability theory for RobustMap
     • Add application context visualization

  12. Applications
     • Searching
       • Images and multimedia databases
       • String databases (spelling, typing, and OCR error correction)
       • Medical databases
     • Data mining and visualization
       • Medical databases (ECGs, X-rays, MRI brain scans)
       • Demographic data
       • Time series
       • Business, commerce, and financial data
       • Climate, astrophysics, chemistry, and biology data

  13. Questions?
