This article discusses improvements to the Margin Trees framework for high-dimensional, multi-class classification, particularly gene-expression-based cancer-type classification. Key corrections and clarifications to Tibshirani and Hastie's original work are noted, emphasizing the need for clear definitions of single and complete linkage. The proposed approach integrates a hierarchical tree structure with max-margin classifiers, yielding interpretable models that facilitate the identification of significant features and help domain experts understand gene-cancer relationships.
Margin Trees for High-dimensional Classification (Tibshirani and Hastie)
Errata (confirmed by Tibshirani)
• Section 2(a), on the property of single linkage: M should be M0.
• Section 2.1, near the last line of the second paragraph: "at least" should be "at most".
• The statements about complete/single linkage are misleading. In fact, the authors use the standard definitions of complete/single linkage, except that the distance metric is replaced by the margin between pairwise classes. (I traced their code to confirm this.)
Targeted Problem
• Multi-class: #classes >> 2
• High-dimensional, few samples: #features >> #data, hence linearly separable
• Accuracy is already good; what is needed is an interpretable model
• Example: micro-array data
• Features: gene expression measurements
• Classes: types of cancer
• Instances: patients
Decision function: sign(βᵀx + β₀)
Learn a Highly Interpretable Structure for Domain Experts
• Check certain genes
• Help create the link from gene to cancer
Higher Interpretability
• Multi-class problems reduce to binary: 1-vs-1 voting is not meaningful; a tree representation is
• Non-linearly-separable data: instead of a single non-linear classifier, use organized teams of linear classifiers
• Solution: Margin Tree = Hierarchical Tree (interpretation) + max-margin classifier (minimize risk) + Feature Selection (limited #features per split)
Using a Margin Tree
• Training: construct the tree structure, then train a max-margin classifier at each splitter.
• Testing: start from the root node and go down, following the predictions of the classifiers at the splitting points.
• Example: with the tree {1} vs {2,3}, then {2} vs {3}, the predictions Right, Right yield class 3. (A sketch of this loop follows below.)
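A minimal sketch of this train/predict loop, assuming scikit-learn's LinearSVC (with a large C to approximate a hard margin) as the max-margin classifier; TreeNode, train_node, and predict_one are illustrative names, not the authors' code.

```python
# Minimal margin-tree sketch (illustrative; not the authors' implementation).
import numpy as np
from sklearn.svm import LinearSVC

class TreeNode:
    def __init__(self, classes):
        self.classes = set(classes)  # class labels under this node
        self.left = None             # subtree for one side of the split
        self.right = None            # subtree for the other side
        self.clf = None              # max-margin classifier at this splitter

def train_node(node, X, y, split):
    """Train a binary max-margin classifier separating two class groups."""
    left, right = split              # e.g. ({1}, {2, 3})
    mask = np.isin(y, list(left | right))
    target = np.isin(y[mask], list(right)).astype(int)   # 0 = left, 1 = right
    node.clf = LinearSVC(C=1e6).fit(X[mask], target)     # large C ~ hard margin
    node.left, node.right = TreeNode(left), TreeNode(right)
    # a full builder would recurse until each leaf holds a single class

def predict_one(node, x):
    """Descend from the root, following each splitter's prediction."""
    while node.clf is not None:
        side = node.clf.predict(x.reshape(1, -1))[0]
        node = node.right if side == 1 else node.left
    (label,) = node.classes          # leaf holds exactly one class
    return label
```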
Tree Structure (1/2)
• Top-down construction: greedy
Greedy (1/3)
• Start from the root with all classes {1,2,3}.
• Find the maximum margin among all partitions: {1} vs {2,3}; {2} vs {1,3}; {3} vs {1,2}.
• In general, 2^(n-1) - 1 partitions! (See the sketch below.)
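A hedged sketch of one greedy step, enumerating all 2^(n-1) - 1 bipartitions and scoring each by the usual 2/||β|| width of a (near) hard-margin linear SVM; svm_margin and best_split are names of my own, not the authors'.

```python
from itertools import combinations
import numpy as np
from sklearn.svm import LinearSVC

def svm_margin(X, y, group_a, group_b):
    """Margin width of a (near) hard-margin linear SVM separating two
    class groups, using the standard 2 / ||beta|| geometry."""
    mask = np.isin(y, list(group_a) + list(group_b))
    target = np.isin(y[mask], list(group_b)).astype(int)
    clf = LinearSVC(C=1e6).fit(X[mask], target)   # large C ~ hard margin
    return 2.0 / np.linalg.norm(clf.coef_)

def best_split(X, y, classes):
    """One greedy step: the bipartition of `classes` with the largest margin."""
    classes = sorted(classes)
    fixed, rest = classes[0], classes[1:]   # pin one class: avoids mirror splits
    best_pair, best_margin = None, -np.inf
    for r in range(len(rest) + 1):
        for combo in combinations(rest, r):
            a = {fixed, *combo}
            b = set(classes) - a
            if not b:                       # skip the empty complement
                continue                    # leaves 2^(n-1) - 1 candidates
            m = svm_margin(X, y, a, b)
            if m > best_margin:
                best_pair, best_margin = (a, b), m
    return best_pair, best_margin
```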
Greedy (2/3)
• Repeat in the child nodes, e.g. the child {2,3} of the root {1,2,3}.
Greedy (3/3)
• Done!
• Warning: greedy does not necessarily lead to the global optimum, i.e., it may not find the globally maximal margin.
Tree Structure (2/2)
• Bottom-up tree: iteratively merge the closest groups.
• Single linkage: distance = margin of the nearest pair.
• Complete linkage: distance = margin of the farthest pair. (Sketch below.)
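Per the errata, this is just standard agglomerative clustering with the pairwise margin as the metric, so a sketch can lean on scipy's linkage; margin_linkage and the margin_fn hook are illustrative, with margin_fn standing for a pairwise-margin helper such as the svm_margin above.

```python
# Bottom-up construction: standard agglomerative clustering in which the
# "distance" between two classes is their pairwise SVM margin
# (per the errata: standard single/complete linkage, margin as the metric).
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

def margin_linkage(X, y, margin_fn, method="complete"):
    """margin_fn(X, y, {c1}, {c2}) -> pairwise margin width."""
    classes = np.unique(y)
    k = len(classes)
    D = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            D[i, j] = D[j, i] = margin_fn(X, y, {classes[i]}, {classes[j]})
    # squareform converts the symmetric matrix to the condensed vector
    # that scipy's linkage expects; method is "single" or "complete".
    return linkage(squareform(D), method=method)
```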
Complete-linkage Tree
• Height(subtree) = margin(the farthest pair of classes in the subtree) ≥ Margin(any cut through the subtree): a cut separating classes i and j has margin at most the pairwise margin M(i, j), since adding classes to a two-class problem can only shrink the margin.
• Therefore, when looking for a Margin > Height(subtree), never break the classes in that subtree.
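The bound can be stated compactly (a short derivation; the notation M(i, j) for pairwise margins and h(S) for the complete-linkage height of subtree S is mine, not the paper's):

```latex
% For any bipartition (A, B) that splits the classes of subtree S,
% pick i in A ∩ S and j in B ∩ S. Then
\[
  \operatorname{Margin}(A, B) \;\le\; M(i, j)
  \;\le\; \max_{p,\, q \in S} M(p, q) \;=\; h(S),
\]
% because the grouped problem A vs. B contains all the constraints of the
% two-class problem {i} vs. {j}, so its margin cannot be larger.
```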
Efficient Greedy Tree Construction
• Construct a complete-linkage tree T.
• Estimate a lower bound on the maximal margin: M0 = max Margin(individual class, rest).
• To find a margin ≥ M0, we only need to consider partitions among the subtrees whose height is below M0, e.g. {5,4,6}, {1}, {2,3} in the paper's figure.
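A small sketch of the bound M0; margin_lower_bound is an illustrative name, and margin_fn again stands for a group-margin helper like svm_margin above.

```python
# Lower bound on the best achievable margin at a node (illustrative sketch):
# the largest one-vs-rest margin over the classes present.
def margin_lower_bound(X, y, classes, margin_fn):
    """margin_fn(X, y, group_a, group_b) -> margin width, e.g. the
    svm_margin helper sketched in the greedy-step example above."""
    classes = set(classes)
    return max(margin_fn(X, y, {c}, classes - {c}) for c in classes)

# Per the height bound above, any complete-linkage subtree whose height is
# below this M0 can be kept whole while searching for the best split.
```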
• Comparable testing performances (also compared with 1-vs-1 voting)
• Complete-linkage tree: more balanced, hence more interpretable
Recall the cutting plane: Decision(x) = sign(βᵀx + β₀), with the plane βᵀx + β₀ = 0 as the cut. β holds the weights of the features in the decision function.
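For concreteness, here is where β and β₀ live in a fitted scikit-learn LinearSVC; this is a tooling assumption (the paper is library-agnostic), and X_split/y_split are placeholders for the data at one split.

```python
# Where the cutting plane's parameters live in a fitted LinearSVC
# (a tooling assumption; X_split, y_split are placeholder names).
from sklearn.svm import LinearSVC

clf = LinearSVC(C=1e6).fit(X_split, y_split)
beta = clf.coef_.ravel()      # feature weights: one beta_i per gene
beta0 = clf.intercept_[0]     # offset beta_0
# Decision(x) = sign(beta @ x + beta0); beta @ x + beta0 = 0 is the cut.
```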
Feature Selection
• Hard-thresholding at each split: discard the n features with low |βi| by setting βi = 0.
• n proportional to the margin: n = α|Margin|, with α chosen by cross-validation error.
• β is unavailable when using a non-linear kernel.
• Alternative method: L1-norm SVM, which forces βi to zero.
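A minimal sketch of the hard-thresholding step, zeroing the n smallest |βi|; tying n to the margin via n = α·Margin follows the slide, with α left as a free parameter here (the slide selects it by cross-validation).

```python
import numpy as np

def hard_threshold(beta, margin, alpha):
    """Zero out the n = alpha * margin smallest-|beta_i| weights at one split."""
    n = int(alpha * margin)              # number of features to discard
    if n <= 0:
        return beta
    order = np.argsort(np.abs(beta))     # indices by |beta_i|, ascending
    sparse = beta.copy()
    sparse[order[:n]] = 0.0              # discard the n weakest features
    return sparse
```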
Setting βi = 0 removes feature i from the cutting plane: the decision function becomes sign(βᵀx + β₀) with the discarded coordinates zeroed.
Discussion
• Good for multi-class, high-dimensional data.
• Bad for non-linearly-separable data: each node will contain impure data, and hence an impure β.
• Testing performance is comparable to traditional multi-class max-margin classifiers (SVMs).