
RadViz Extensions with Applications

RadViz Extensions with Applications. Dissertation Defense John Sharko October 26, 2009. Committee. Prof. Georges Grinstein (Advisor) Prof. Kenneth Marx Prof. Haim Levkowitz Dr. Patrick Hoffman Dr. Alex Gee. Outline. Introduction RadViz Cluster Ensembles Fuzzy Clusters Methodology


Presentation Transcript


  1. RadViz Extensions with Applications Dissertation Defense John Sharko October 26, 2009

  2. Committee • Prof. Georges Grinstein (Advisor) • Prof. Kenneth Marx • Prof. Haim Levkowitz • Dr. Patrick Hoffman • Dr. Alex Gee

  3. Outline • Introduction • RadViz • Cluster Ensembles • Fuzzy Clusters • Methodology • Contributions • Recommendations

  4. RadViz Example

  5. Description of Traditional RadViz Each dimension in a dataset is represented by a point, called an anchor point, on the circumference of a circle. Each record in the dataset is positioned as if it were being pulled by a spring attached to each anchor point where the strength of the spring is proportional to that record’s coordinate or value for the dimension related to that anchor point.

  6. RadViz Example: All Coordinate Values Equal

  7. RadViz Example: Two Coordinate Values Equal

  8. RadViz Example: Range of Coordinate Values

  9. RadViz Example: Range of Coordinate Values

  10. Terminology • Dimensional Anchor (Anchor Point) – point on the circle representing a dimension • Point – representation of record(s) within the circle

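  11. RadViz Mathematical Formulation
      $$x_i = \frac{\sum_{j=1}^{d} a_{i,j}\cos\theta_j}{\sum_{j=1}^{d} a_{i,j}}, \qquad y_i = \frac{\sum_{j=1}^{d} a_{i,j}\sin\theta_j}{\sum_{j=1}^{d} a_{i,j}}, \qquad i = 1, \dots, n$$
      where: • $x_i$ and $y_i$ are the resulting transformed coordinates for record $i$ • $\theta_j$ is the angular position on the circle corresponding to dimension $j$ • $a_{i,j}$ is the value for dimension $j$ for record $i$ • $d$ is the number of dimensions and $n$ the number of records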
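The symbols on this slide match the standard RadViz mapping, in which each record is placed at the average of the anchor positions weighted by the record's values. The following is a minimal sketch under that assumption. The function name `radviz_project`, the default of evenly spaced anchors, and the choice to put an all-zero record at the center are illustrative and not taken from the slides. Values are assumed to be non-negative and already normalized.

```python
import math

def radviz_project(records, angles=None):
    """Map d-dimensional records to 2-D RadViz coordinates.

    records: list of equal-length lists of non-negative values
             (assumed already normalized, e.g. to [0, 1]).
    angles:  anchor angles in radians, one per dimension; by default
             the anchors are spaced evenly around the unit circle.
    """
    d = len(records[0])
    if angles is None:
        angles = [2 * math.pi * j / d for j in range(d)]
    points = []
    for row in records:
        total = sum(row)
        if total == 0:
            # Every spring has zero strength: place the record at the center.
            points.append((0.0, 0.0))
            continue
        x = sum(a * math.cos(t) for a, t in zip(row, angles)) / total
        y = sum(a * math.sin(t) for a, t in zip(row, angles)) / total
        points.append((x, y))
    return points

# Equal values pull equally toward every anchor, so the record sits at the center;
# a record with a single non-zero value sits on that dimension's anchor.
print(radviz_project([[0.5, 0.5, 0.5, 0.5], [1.0, 0.0, 0.0, 0.0]]))
```

Because the weights are divided by their sum, only the relative sizes of a record's values matter, which is why all records with equal coordinate values land on the same point.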

  12. Impact of Exchanging Dimensional Anchors [Figure: record (1, 0, 1, 0) shown in two RadViz layouts with anchors A, B, C, D placed in different orders around the circle]
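The effect of exchanging anchors can be checked numerically for the record (1, 0, 1, 0). The sketch below assumes evenly spaced anchors and the usual weighted-average RadViz mapping. The helper `project` and the two anchor orders "ABCD" and "ACBD" are hypothetical choices for illustration, since the exact orders drawn on the slide are not recoverable from the transcript.

```python
import math

def project(record, anchor_order):
    """Place one record, given the order of anchor labels around the circle.

    Assumes record values are listed in alphabetical anchor order (A, B, C, D)
    and that at least one value is non-zero.
    """
    value = dict(zip(sorted(anchor_order), record))
    d = len(anchor_order)
    total = sum(record)
    x = y = 0.0
    for k, name in enumerate(anchor_order):
        theta = 2 * math.pi * k / d  # k-th position on the circle
        x += value[name] * math.cos(theta)
        y += value[name] * math.sin(theta)
    return x / total, y / total

record = (1, 0, 1, 0)
# A and C sit opposite each other, so their pulls cancel at the center.
print("ABCD:", project(record, "ABCD"))
# A and C are adjacent, so the record moves to the midpoint between them.
print("ACBD:", project(record, "ACBD"))
```

The same record therefore carries no visible information in the first layout and is clearly displaced in the second, which is the motivation for choosing anchor positions deliberately.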

  13. Example of Repositioning Anchor Points Using Layout Algorithm [Figure: the same dataset before repositioning and after repositioning]

  14. Multiple Clustered Datasets • Clustering algorithms are heuristic, not optimal • Different clustering algorithms tend to generate different clusters

  15. Sample Multiple Clustered Dataset

  16. Stable Group of Records

  17. Uniquely Clustered Record

  18. Fuzzy Clusters • A record belongs to multiple clusters • Varying strengths of association
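As a concrete illustration of "varying strengths of association", the sketch below computes one record's membership in each cluster from its distances to the cluster centers, using the standard fuzzy c-means membership formula. This is an assumption: the slides do not say how the membership strengths were obtained. The function name `fuzzy_memberships` and the fuzzifier default `m=2.0` are illustrative.

```python
import math

def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy c-means style membership of one point in each cluster.

    m > 1 is the fuzzifier; the returned memberships sum to 1.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to that cluster
    # (shared equally if several centers coincide with the point).
    if any(dist == 0 for dist in dists):
        hits = [1.0 if dist == 0 else 0.0 for dist in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dk / dj) ** power for dj in dists) for dk in dists]

# A point twice as far from the second center as from the first.
print(fuzzy_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)]))
```

Each record then has one value between 0 and 1 per cluster, in contrast to a single cluster number.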

  19. Cluster Ensemble vs. Fuzzy Clustering

  20. Using RadViz to Analyze Multiple Clustered Datasets • RadViz typically deals with real numbers • Cluster numbers are arbitrary labels, so plotting them as numeric values does not work • How do you produce a meaningful RadViz visualization?

  21. Flattening of Categorical Data • Break up each original dimension into multiple dimensions • Each new dimension represents a value of the original dimension

  22. Flattening a Dimension • Original dimensions: Manufacturer, Model, Type, Price • Flattened dimensions: Manufacturer, Model, Small, Large, Sporty, Van, Price • Original record: (Cadillac, Deville, Large, 33) • Flattened record: (Cadillac, Deville, 0, 1, 0, 0, 33)
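The car example can be reproduced with a few lines of code. This is a minimal sketch of flattening one categorical dimension into 0/1 indicator dimensions. The function name `flatten_dimension` and the tuple representation of a record are assumptions made for illustration.

```python
def flatten_dimension(record, index, categories):
    """Replace the categorical value at `index` with one 0/1 column per category."""
    value = record[index]
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    indicators = tuple(1 if value == c else 0 for c in categories)
    return record[:index] + indicators + record[index + 1:]

car_types = ("Small", "Large", "Sporty", "Van")
original = ("Cadillac", "Deville", "Large", 33)
print(flatten_dimension(original, 2, car_types))
# ('Cadillac', 'Deville', 0, 1, 0, 0, 33)
```

The order of `car_types` fixes the order of the new dimensions, so the same order has to be used for every record.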

  23. Flattening Multi Cluster Dataset • Original dimensions: Algorithm A, Algorithm B, Algorithm C • Flattened dimensions: A: 1, 2; B: 1, 2, 3; C: 1, 2, 3, 4 • Sample record: (2, 1, 4) • Flattened record: (0, 1, 1, 0, 0, 0, 0, 0, 1)
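The same idea applied to cluster assignments: each cluster set contributes one indicator dimension per cluster, so a record's cluster numbers become a vector of zeros and ones. The sketch below assumes clusters are numbered from 1 within each cluster set. The function names and the "A-1" style anchor labels are illustrative, modeled on labels such as KM1-2 and HC-6 used later in the slides.

```python
def flatten_cluster_labels(labels, clusters_per_set):
    """Turn one cluster number per cluster set into 0/1 indicator values."""
    flat = []
    for label, k in zip(labels, clusters_per_set):
        if not 1 <= label <= k:
            raise ValueError(f"cluster {label} is outside 1..{k}")
        flat.extend(1 if label == c else 0 for c in range(1, k + 1))
    return tuple(flat)

def anchor_names(set_names, clusters_per_set):
    """Name each flattened dimension, e.g. 'A-1', 'A-2', 'B-1', ..."""
    return [f"{name}-{c}"
            for name, k in zip(set_names, clusters_per_set)
            for c in range(1, k + 1)]

sizes = (2, 3, 4)  # clusters produced by algorithms A, B and C
print(anchor_names("ABC", sizes))
print(flatten_cluster_labels((2, 1, 4), sizes))
# (0, 1, 1, 0, 0, 0, 0, 0, 1)
```

Every flattened record has exactly one 1 per cluster set, so in RadViz each record is pulled equally toward one anchor from each set.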

  24. Simple Example • Iris dataset • Three cluster sets • KM1: K-means clustering with 1000 iterations • KM2: K-means clustering with 100,000 iterations • HC: hierarchical clustering • Ten clusters per cluster set

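  25. Flattened Multi-cluster Iris Dataset [Figure: RadViz of the flattened dataset; points colored by KM1 cluster (color scale 1 to 10); anchor HC-6 labeled]

  26. Flattened Multi-cluster Iris Dataset - Jittered [Figure: the same display with jittered points; points colored by KM1 cluster (color scale 1 to 10); anchor HC-6 labeled]

  27. Flattened Multi-cluster Iris Dataset [Figure: RadViz of the flattened dataset; points colored by KM1 cluster (color scale 1 to 10); anchor HC-6 labeled]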

  28. Repositioning Dimensional Anchors • Move points away from the center • Separate points • Increase displayed information content

  29. Class Discrimination Layout Algorithm • Select a dimension that classifies the records • Assign each dimension to the class with the highest values with respect to the other classes • Move the dimensional anchors assigned to the same class next to each other to form a classification sector
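The three steps can be sketched as follows. This is a simplified illustration, not the dissertation's implementation: it assumes that "highest values" is measured by the per-class mean of each dimension, whereas the actual algorithm may use a different statistic. The function name `class_discrimination_order` and the sample data are hypothetical.

```python
from collections import defaultdict

def class_discrimination_order(records, classes, dim_names):
    """Group dimensions into classification sectors and return the anchor order.

    Assumption: a dimension is assigned to the class whose records have the
    highest mean value on that dimension.
    """
    sums = defaultdict(lambda: [0.0] * len(dim_names))
    counts = defaultdict(int)
    for row, cls in zip(records, classes):
        counts[cls] += 1
        for j, v in enumerate(row):
            sums[cls][j] += v
    means = {cls: [s / counts[cls] for s in sums[cls]] for cls in sums}

    class_labels = sorted(means)
    sectors = {cls: [] for cls in class_labels}
    for j, name in enumerate(dim_names):
        best = max(class_labels, key=lambda cls: means[cls][j])
        sectors[best].append(name)

    # Anchors of the same sector are placed next to each other on the circle.
    order = [name for cls in class_labels for name in sectors[cls]]
    return sectors, order

records = [[0.9, 0.1, 0.8, 0.2], [0.8, 0.2, 0.9, 0.1],
           [0.1, 0.9, 0.2, 0.7], [0.2, 0.8, 0.1, 0.9]]
classes = [1, 1, 2, 2]
sectors, order = class_discrimination_order(records, classes, ["d1", "d2", "d3", "d4"])
print(sectors)
print(order)
```

With the anchors of a sector adjacent, records of the matching class are pulled in one direction instead of toward the center.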

  30. Example of Class Discrimination Layout Algorithm [Figure: anchor layout before and after; after the algorithm, the anchors are grouped into Classification Sector 1 and Classification Sector 2 for classes 1 and 2]

  31. After Repositioning Dimensional Anchors [Figure: points colored by KM1 cluster; point size shows cluster size (legend: 30, 20, 10, 5 records)]

  32. After Repositioning Dimensional Anchors [Figure: points colored by KM1 cluster; point size shows cluster size (legend: 30, 20, 10, 5 records); anchor KM1-2 labeled]

  33. After Repositioning Dimensional Anchors [Figure: points colored by KM1 cluster; point size shows cluster size (legend: 30, 20, 10, 5 records)]

  34. After Repositioning Dimensional Anchors [Figure: points colored by KM1 cluster; point size shows cluster size (legend: 30, 20, 10, 5 records)]

  35. After Repositioning Dimensional Anchors [Figure: points colored by KM1 cluster; point size shows cluster size (legend: 30, 20, 10, 5 records)]

  36. After Repositioning Dimensional Anchors [Figure: points colored by KM1 cluster; point size shows cluster size (legend: 30, 20, 10, 5 records)]

  37. After Repositioning Dimensional Anchors [Figure: points colored by KM1 cluster; point size shows cluster size (legend: 30, 20, 10, 5 records)]

  38. Moving Similar Classification Sectors Close to Each Other • Dimensions have been grouped together into classification sectors • Determine which record classes are most similar to each other using Euclidean distances • Move those classification sectors closer to each other using a greedy algorithm • Records will tend to be moved away from the center
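One way to read these steps is as a greedy nearest-neighbor ordering of the sectors. The sketch below is an assumption about the details, which the slide does not give: class similarity is taken as the Euclidean distance between class centroids, and the chain starts from the first class supplied. The function name `greedy_sector_order` and the sample centroids are hypothetical.

```python
import math

def greedy_sector_order(centroids):
    """Order classification sectors so that similar classes end up adjacent.

    centroids: dict mapping class label -> mean vector of that class's records.
    Greedy rule (an assumption): repeatedly append the unplaced class whose
    centroid is closest to the most recently placed class.
    """
    remaining = dict(centroids)
    current = next(iter(remaining))  # arbitrary starting class
    order = [current]
    del remaining[current]
    while remaining:
        nearest = min(remaining,
                      key=lambda c: math.dist(centroids[current], remaining[c]))
        order.append(nearest)
        del remaining[nearest]
        current = nearest
    return order

centroids = {1: (0.0, 0.0), 2: (10.0, 0.0), 3: (1.0, 0.0), 4: (9.0, 0.0)}
print(greedy_sector_order(centroids))
```

A greedy chain like this is fast but not guaranteed to give the best circular arrangement, since each choice only considers the most recently placed class.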

  39. Repositioning Classification Sectors [Figure: RadViz after repositioning the classification sectors; points colored by KM1 cluster (color scale 1 to 10); anchors labeled by cluster set and cluster number (KM1-1 to KM1-10, KM2-1 to KM2-10, HC-1 to HC-10)]

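  40. Interpreting Vectorized RadViz [Figure: Iris classes Setosa, Versicolor, Virginica; labeled dimensions Petal length and Sepal length]

  41. Interpreting VRV [Figure: Iris classes Setosa, Versicolor, Virginica; labeled dimensions Petal length and Sepal length]

  42. Interpreting VRV [Figure: Iris classes Setosa, Versicolor, Virginica; labeled dimensions Petal length and Sepal length]

  43. Interpreting VRV [Figure: Iris classes Setosa, Versicolor, Virginica; labeled dimensions Petal length and Sepal length]

  44. Interpreting VRV [Figure: Iris classes Setosa, Versicolor, Virginica; labeled dimensions Petal length and Sepal length]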

  45. Salamander Gene Expression Levels [Figure: expression levels over time]

  46. Salamander Class 9 Genes [Figure: genes Nvg00226, Nvg00111, Nvg00155, Nvg00091]

  47. Salamander Class 9 Genes • Nvg00111: “key” gene; CXC chemokine, ligand 10 • Nvg00226: no homology • Nvg00155: keratin type II cytoskeletal • Nvg00091: annexin

  48. Fuzzy Clusters

  49. Description of Fuzzy Clusters • K-means clustering algorithm used • Four clusters • Applied to Iris dataset
