Object Orie’d Data Analysis, Last Time • Kernel Embedding • Embed data in higher dimensional manifold • Gives greater flexibility to linear methods • Support Vector Machines • Aimed at very non-Gaussian Data • E.g. from Kernel Embedding • Distance Weighted Discrimination • HDLSS Improvement of SVM
Support Vector Machines Graphical View, using Toy Example: • Find separating plane • To maximize distance from data to plane • In particular, the smallest such distance • Closest data points are called support vectors • Gap between them is called the margin
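A minimal sketch of this picture in code, assuming scikit-learn (the slides show figures, not code): fit a nearly hard-margin linear SVM to separable 2-d toy data and read off the plane, the support vectors, and the margin.

```python
# Linear SVM on a separable 2-d toy sample; large C approximates the
# hard-margin separating plane of the graphical view above.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.7, (20, 2)),   # class -1 cloud
               rng.normal(+2.0, 0.7, (20, 2))])  # class +1 cloud
y = np.array([-1] * 20 + [1] * 20)

svm = SVC(kernel="linear", C=1e6).fit(X, y)

w, b = svm.coef_[0], svm.intercept_[0]
print("support vectors:\n", svm.support_vectors_)
print("margin (distance from plane to closest points):", 1.0 / np.linalg.norm(w))
```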
Support Vector Machines Graphical View, using Toy Example:
Support Vector Machines Forgotten last time, Important Extension: Multi-Class SVMs Hsu & Lin (2002) Lee, Lin, & Wahba (2002) • Defined for “implicit” version • “Direction Based” variation???
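A hedged note in code: off-the-shelf solvers already implement multi-class schemes of the kind compared in Hsu & Lin (2002); e.g. scikit-learn's SVC uses one-vs-one internally. A sketch on synthetic 3-class data (this is the library's built-in scheme, not the "direction based" variation mentioned above):

```python
# One-vs-one multi-class SVM on three synthetic 2-d classes.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
X = rng.normal(size=(60, 2)) + np.repeat(centers, 20, axis=0)
y = np.repeat([0, 1, 2], 20)

clf = SVC(kernel="linear", decision_function_shape="ovo").fit(X, y)
print(clf.predict([[3.5, 0.5]]))  # expect class 1
```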
Support Vector Machines Also forgotten last time: Toy examples illustrating Explicit vs. Implicit Kernel Embedding, as well as the effect of the window width σ on Gaussian kernel embedding
SVMs, Comput’n & Embedding For an “Embedding Map” $\Phi$, e.g. Explicit Embedding: Maximize (the standard SVM dual, computed on the explicitly embedded data): $\sum_i \alpha_i - \tfrac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \,\Phi(x_i)^\top \Phi(x_j)$ Get classification function: $f(x) = \mathrm{sign}\big(\sum_i \alpha_i y_i \,\Phi(x_i)^\top \Phi(x) + b\big)$ • Straightforward application of embedding • But loses inner product advantage
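One way to make the explicit version concrete, as a sketch under assumptions (the slides' exact embedding map is not shown, so here each point is mapped to its vector of Gaussian kernel evaluations at the training points, and a plain linear SVM is run in that feature space):

```python
# Explicit embedding sketch: feature j of a point x is K(x, x_j) over the
# training points x_j, for the Gaussian kernel with window sigma; then a
# linear SVM is fit on those explicit coordinates.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(2)
X_train = rng.normal(size=(80, 2))
y_train = (np.linalg.norm(X_train, axis=1) > 1.0).astype(int)  # ring-like "target" classes

def explicit_embed(X, centers, sigma):
    # feature j of x is exp(-||x - c_j||^2 / (2 sigma^2))
    return rbf_kernel(X, centers, gamma=1.0 / (2.0 * sigma**2))

sigma = 1.0
clf = LinearSVC(max_iter=20000).fit(explicit_embed(X_train, X_train, sigma), y_train)
```

Note that all inner products are now computed in the (large) explicit feature space, which is exactly the lost advantage the slide mentions.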
SVMs, Comput’n & Embedding Implicit Embedding: Maximize: $\sum_i \alpha_i - \tfrac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \,K(x_i, x_j)$ Get classification function: $f(x) = \mathrm{sign}\big(\sum_i \alpha_i y_i \,K(x_i, x) + b\big)$ where $K(x, x') = \Phi(x)^\top \Phi(x')$, e.g. the Gaussian kernel $K(x, x') = e^{-\|x - x'\|^2 / (2\sigma^2)}$ • Still defined only via inner products • Retains optimization advantage • Thus used very commonly • Comparison to explicit embedding? • Which is “better”???
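The implicit version for comparison (same synthetic toy data as the explicit sketch above): the Gaussian kernel is handed straight to the dual solver, so only the inner products $K(x_i, x_j)$ are ever formed.

```python
# Implicit embedding: the kernel is passed directly to the SVM dual solver;
# the embedded coordinates are never materialized.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X_train = rng.normal(size=(80, 2))
y_train = (np.linalg.norm(X_train, axis=1) > 1.0).astype(int)

sigma = 1.0
clf_implicit = SVC(kernel="rbf", gamma=1.0 / (2.0 * sigma**2), C=1.0).fit(X_train, y_train)
```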
Support Vector Machines Target Toy Data set:
Support Vector Machines Explicit Embedding, window σ = 0.1:
Support Vector Machines Explicit Embedding, window σ = 1:
Support Vector Machines Explicit Embedding, window σ = 10:
Support Vector Machines Explicit Embedding, window σ = 100:
Support Vector Machines Notes on Explicit Embedding: • Too small σ: poor generalizability • Too big σ: misses important regions • Classical lessons from kernel smoothing • Surprisingly large “reasonable region” • I.e. parameter less critical (sometimes?) Also explore projections (in kernel space)
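These window-width lessons can be checked mechanically; a hedged sweep on synthetic data (the implicit solver is used purely for convenience; the numbers are illustrative, not the slides' results):

```python
# Sweep sigma over the slides' range and compare training vs. held-out
# accuracy: tiny sigma overfits (poor generalizability), huge sigma
# underfits (misses important regions), and a wide middle range works.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)

def make_target(n):
    X = rng.normal(size=(n, 2))
    return X, (np.linalg.norm(X, axis=1) > 1.0).astype(int)

X_train, y_train = make_target(200)
X_test, y_test = make_target(200)

for sigma in [0.1, 1.0, 10.0, 100.0]:
    clf = SVC(kernel="rbf", gamma=1.0 / (2.0 * sigma**2)).fit(X_train, y_train)
    print(f"sigma={sigma:6.1f}  train={clf.score(X_train, y_train):.2f}"
          f"  test={clf.score(X_test, y_test):.2f}")
```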
Support Vector Machines Kernel space projection, window σ = 0.1:
Support Vector Machines Kernel space projection, window σ = 1:
Support Vector Machines Kernel space projection, window σ = 10:
Support Vector Machines Kernel space projection, window σ = 100:
Support Vector Machines Notes on Kernel space projection: • Too small σ: • Great separation • But recall, poor generalizability • Too big σ: no longer separable • As above: • Classical lessons from kernel smoothing • Surprisingly large “reasonable region” • I.e. parameter less critical (sometimes?)
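The kernel-space projections shown above can be computed without ever forming the embedding: for the SVM direction $w$ in kernel space, $f(x) = \langle w, \Phi(x)\rangle + b$ is exactly the fitted decision function. A hedged sketch on the same kind of synthetic data:

```python
# Projections of the embedded data onto the kernel-space SVM direction are
# (up to the offset b) the decision-function values of the fitted SVM.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 2))
y = (np.linalg.norm(X, axis=1) > 1.0).astype(int)

sigma = 1.0
clf = SVC(kernel="rbf", gamma=1.0 / (2.0 * sigma**2)).fit(X, y)
proj = clf.decision_function(X)  # one projected coordinate per data point
print(proj[y == 0].mean(), proj[y == 1].mean())  # class means along the direction
```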
Support Vector Machines Implicit Embedding, window σ = 0.1:
Support Vector Machines Implicit Embedding, window σ = 0.5:
Support Vector Machines Implicit Embedding, window σ = 1:
Support Vector Machines Implicit Embedding, window σ = 10:
Support Vector Machines Notes on Implicit Embedding: • Similar large vs. small σ lessons • Range of “reasonable results” seems to be smaller (note the different range of windows) • Much different “edge” behavior Interesting topic for future work…
Distance Weighted Discrim’n 2-d Visualization: • Pushes plane away from data • All points have some influence
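For reference, a hedged statement of the DWD criterion (after Marron, Todd and Ahn (2007); the notation is introduced here, not taken from the slide): every point's slack-adjusted residual $r_i$ enters through $1/r_i$, which is what pushes the plane away from all of the data rather than only the support vectors.

```latex
% DWD: minimize the sum of inverse residuals, with slack for non-separable data
\min_{w,\,b,\,\xi}\; \sum_{i=1}^n \frac{1}{r_i} \;+\; C \sum_{i=1}^n \xi_i
\quad\text{subject to}\quad
r_i = y_i\,(w^\top x_i + b) + \xi_i > 0,\qquad \xi_i \ge 0,\qquad \|w\| \le 1.
```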
Distance Weighted Discrim’n References for more on DWD: • Current paper: Marron, Todd and Ahn (2007) • Links to more papers: Ahn (2007) • JAVA Implementation of DWD: caBIG (2006) • SDPT3 Software: Toh (2007)
Batch and Source Adjustment Recall from Class Notes 8/28/07 • For Stanford Breast Cancer Data (C. Perou) • Analysis in Benito et al (2004) Bioinformatics, 20, 105-114. https://genome.unc.edu/pubsup/dwd/ • Adjust for Source Effects • Different sources of mRNA • Adjust for Batch Effects • Arrays fabricated at different times
Why not adjust using SVM? • Major Problem: Proj’d Distrib’al Shape • Triangular dist’ns (oppositely skewed) • Does not allow a sensible rigid shift
Why not adjust using SVM? • Nicely Fixed by DWD • Projected Dist’ns near Gaussian • Sensible to shift
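The adjustment itself is just a rigid shift along the discriminating direction; a minimal sketch, assuming some solver supplies a DWD direction `w` (the helper name `dwd_direction` is hypothetical, standing in for e.g. the caBIG implementation cited above):

```python
# Rigid batch shift along a discriminating direction w: move each batch so
# its projected mean along w sits at 0, leaving all other directions alone.
import numpy as np

def adjust_batches(X1, X2, w):
    w = w / np.linalg.norm(w)
    X1_adj = X1 - (X1 @ w).mean() * w  # shift batch 1 along w
    X2_adj = X2 - (X2 @ w).mean() * w  # shift batch 2 along w
    return X1_adj, X2_adj

# w = dwd_direction(X1, X2)  # hypothetical call to a DWD solver
```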
Why not adjust by means? • DWD is complicated: value added? • Xuxin Liu example… • Key is the sizes of biological subtypes • Differing ratios trip up the mean • But DWD is more robust (although still not perfect)
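A hedged one-dimensional caricature of this point (numbers invented for illustration): both batches contain the same two subtypes, but in different proportions, so matching overall batch means mis-shifts the subtypes.

```python
# Two batches, same subtypes (centered at 0 and 10), different mixing
# ratios: the overall means differ by 8 even though no batch effect exists,
# so mean-matching would introduce a spurious 8-unit shift.
import numpy as np

batch1 = np.concatenate([np.zeros(90), np.full(10, 10.0)])  # 90% subtype A
batch2 = np.concatenate([np.zeros(10), np.full(90, 10.0)])  # 10% subtype A
print(batch1.mean(), batch2.mean())  # 1.0 vs. 9.0
```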
Why not adjust by means? Next time: Work in before and after, slides like 138-141 from DWDnormPreso.ppt In Research/Bioinf/caBIG
Why not adjust by means? DWD is robust against non-proportional subtypes… Mathematical Statistical Question: Is there mathematics behind this? (will answer next time…)
DWD in Face Recognition • Face Images as Data (with M. Benito & D. Peña) • Male – Female Difference? • Discrimination Rule? • Represented as long vector of pixel gray levels • Registration is critical
DWD in Face Recognition, (cont.) • Registered Data • Shifts and scale • Manually chosen • To align eyes and mouth • Still large variation • See males vs. females???
DWD in Face Recognition, (cont.) • DWD Direction • Good separation • Images “make sense” • Garbage at ends? (extrapolation effects?)
DWD in Face Recognition, (cont.) • Unregistered Version • Much blurrier • Since features don’t properly line up • Nonlinear Variation • But DWD still works • Can see M-F differ’ce?
DWD in Face Recognition, (cont.) • Interesting summary: • Jump between means (in DWD direction) • Clear separation of Maleness vs. Femaleness
DWD in Face Recognition, (cont.) • Fun Comparison: • Jump between means (in SVM direction) • Also distinguishes Maleness vs. Femaleness • But not as well as DWD
DWD in Face Recognition, (cont.) Analysis of difference: Project onto normals • SVM has “small gap” (feels noise artifacts?) • DWD “more informative” (feels real structure?)
DWD in Face Recognition, (cont.) • Current Work: • Focus on “drivers”: (regions of interest) • Relation to Discr’n? • Which is “best”? • Lessons for human perception?
Outcomes Data Breast Cancer Study (C. M. Perou): • Outcome of interest = death or survival • Connection with gene expression? Approach: • Treat death vs. survival during study as “classes” • Find “direction that best separates the classes”
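A hedged sketch of this approach on synthetic stand-in data (a linear SVM appears here only as a placeholder for whichever discrimination method, e.g. DWD, is actually preferred):

```python
# Code survival status as class labels, fit a linear discrimination rule,
# and keep its normal vector as the "direction that best separates the
# classes"; projections onto it summarize each patient.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(5)
X_expr = rng.normal(size=(50, 200))     # 50 patients x 200 genes (synthetic)
died = rng.integers(0, 2, size=50)      # 1 = death during study, 0 = survival

clf = LinearSVC(max_iter=20000).fit(X_expr, died)
direction = clf.coef_[0] / np.linalg.norm(clf.coef_[0])
scores = X_expr @ direction             # projections onto the direction
```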
Outcomes Data Find “direction that best separates the classes”