This article discusses improvements to the Margin Trees framework for high-dimensional, multi-class classification, particularly gene-expression-based cancer-type classification. Key corrections and clarifications to Tibshirani and Hastie's original work are noted, emphasizing the need for clear definitions of single and complete linkage. The proposed approach integrates a hierarchical tree structure with max-margin classifiers, yielding interpretable models that facilitate the identification of significant features and help domain experts understand gene-cancer relationships.
Margin Trees for High-dimensional Classification (Tibshirani and Hastie)
Errata (confirmed by Tibshirani)
• Section 2(a), on the property of single linkage: M should be M0.
• Section 2.1, near the last line of the second paragraph: "at least" should be "at most".
• The statements about complete/single linkage are misleading. In fact, the authors use the standard definitions of complete/single linkage, except that the distance metric is replaced by the margin between pairwise classes. (I traced their code to confirm this.)
Targeted Problem
• Multi-class: #classes >> 2
• High-dimensional, few samples: #features >> #data, hence linearly separable
• Accuracy is already good; what is needed is an interpretable model
• Example: micro-array data
• Features: gene expression measurements
• Classes: types of cancer
• Instances: patients
Decision function: sign(βᵀx + β₀)
Learn a Highly Interpretable Structure for Domain Experts
• Check certain genes
• Help create the link from gene to cancer
Higher Interpretability
• Multi-class problems reduce to binary: 1-vs-1 voting is not meaningful; a tree representation is
• Non-linearly-separable data: instead of a single non-linear classifier, use organized teams of linear classifiers
• Solution: Margin Tree = Hierarchical Tree (interpretation) + max-margin classifier (minimize risk) + Feature Selection (limited #features per split)
Using a Margin Tree
• Training: construct the tree structure, then train a max-margin classifier at each splitter.
• Testing: start from the root node and go down, following the predictions of the classifiers at the splitting points.
• Example: with the tree {1} vs {2,3}, then {2} vs {3}, the predictions Right, Right yield class 3. (A sketch of this loop follows below.)
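A minimal sketch of this train/predict loop, assuming scikit-learn's LinearSVC (with a large C to approximate a hard margin) as the max-margin classifier; TreeNode, train_node, and predict_one are illustrative names, not the authors' code.

```python
# Minimal margin-tree sketch (illustrative; not the authors' implementation).
import numpy as np
from sklearn.svm import LinearSVC

class TreeNode:
    def __init__(self, classes):
        self.classes = set(classes)  # class labels under this node
        self.left = None             # subtree for one side of the split
        self.right = None            # subtree for the other side
        self.clf = None              # max-margin classifier at this splitter

def train_node(node, X, y, split):
    """Train a binary max-margin classifier separating two class groups."""
    left, right = split              # e.g. ({1}, {2, 3})
    mask = np.isin(y, list(left | right))
    target = np.isin(y[mask], list(right)).astype(int)   # 0 = left, 1 = right
    node.clf = LinearSVC(C=1e6).fit(X[mask], target)     # large C ~ hard margin
    node.left, node.right = TreeNode(left), TreeNode(right)
    # a full builder would recurse until each leaf holds a single class

def predict_one(node, x):
    """Descend from the root, following each splitter's prediction."""
    while node.clf is not None:
        side = node.clf.predict(x.reshape(1, -1))[0]
        node = node.right if side == 1 else node.left
    (label,) = node.classes          # leaf holds exactly one class
    return label
```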
Tree Structure (1/2)
• Top-down construction: greedy
Greedy (1/3)
• Start from the root with all classes {1,2,3}.
• Find the maximum margin among all partitions: {1} vs {2,3}; {2} vs {1,3}; {3} vs {1,2}.
• In general, 2^(n-1) - 1 partitions! (See the sketch below.)
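A hedged sketch of one greedy step, enumerating all 2^(n-1) - 1 bipartitions and scoring each by the usual 2/||β|| width of a (near) hard-margin linear SVM; svm_margin and best_split are names of my own, not the authors'.

```python
from itertools import combinations
import numpy as np
from sklearn.svm import LinearSVC

def svm_margin(X, y, group_a, group_b):
    """Margin width of a (near) hard-margin linear SVM separating two
    class groups, using the standard 2 / ||beta|| geometry."""
    mask = np.isin(y, list(group_a) + list(group_b))
    target = np.isin(y[mask], list(group_b)).astype(int)
    clf = LinearSVC(C=1e6).fit(X[mask], target)   # large C ~ hard margin
    return 2.0 / np.linalg.norm(clf.coef_)

def best_split(X, y, classes):
    """One greedy step: the bipartition of `classes` with the largest margin."""
    classes = sorted(classes)
    fixed, rest = classes[0], classes[1:]   # pin one class: avoids mirror splits
    best_pair, best_margin = None, -np.inf
    for r in range(len(rest) + 1):
        for combo in combinations(rest, r):
            a = {fixed, *combo}
            b = set(classes) - a
            if not b:                       # skip the empty complement
                continue                    # leaves 2^(n-1) - 1 candidates
            m = svm_margin(X, y, a, b)
            if m > best_margin:
                best_pair, best_margin = (a, b), m
    return best_pair, best_margin
```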
Greedy (2/3)
• Repeat in the child nodes, e.g. the child {2,3} of the root {1,2,3}.
Greedy (3/3)
• Done!
• Warning: greedy does not necessarily lead to the global optimum, i.e., it may not find the globally maximal margin.
Tree Structure (2/2)
• Bottom-up tree: iteratively merge the closest groups.
• Single linkage: distance = margin of the nearest pair.
• Complete linkage: distance = margin of the farthest pair. (Sketch below.)
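Per the errata, this is just standard agglomerative clustering with the pairwise margin as the metric, so a sketch can lean on scipy's linkage; margin_linkage and the margin_fn hook are illustrative, with margin_fn standing for a pairwise-margin helper such as the svm_margin above.

```python
# Bottom-up construction: standard agglomerative clustering in which the
# "distance" between two classes is their pairwise SVM margin
# (per the errata: standard single/complete linkage, margin as the metric).
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

def margin_linkage(X, y, margin_fn, method="complete"):
    """margin_fn(X, y, {c1}, {c2}) -> pairwise margin width."""
    classes = np.unique(y)
    k = len(classes)
    D = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            D[i, j] = D[j, i] = margin_fn(X, y, {classes[i]}, {classes[j]})
    # squareform converts the symmetric matrix to the condensed vector
    # that scipy's linkage expects; method is "single" or "complete".
    return linkage(squareform(D), method=method)
```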
Complete-linkage Tree
• Height(subtree) = margin(the farthest pair of classes in the subtree) ≥ Margin(any cut through the subtree): a cut separating classes i and j has margin at most the pairwise margin M(i, j), since adding classes to a two-class problem can only shrink the margin.
• Therefore, when looking for a Margin > Height(subtree), never break the classes in that subtree.
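The bound can be stated compactly (a short derivation; the notation M(i, j) for pairwise margins and h(S) for the complete-linkage height of subtree S is mine, not the paper's):

```latex
% For any bipartition (A, B) that splits the classes of subtree S,
% pick i in A ∩ S and j in B ∩ S. Then
\[
  \operatorname{Margin}(A, B) \;\le\; M(i, j)
  \;\le\; \max_{p,\, q \in S} M(p, q) \;=\; h(S),
\]
% because the grouped problem A vs. B contains all the constraints of the
% two-class problem {i} vs. {j}, so its margin cannot be larger.
```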
Efficient Greedy Tree Construction
• Construct a complete-linkage tree T.
• Estimate a lower bound on the maximal margin: M0 = max Margin(individual class, rest).
• To find a margin ≥ M0, we only need to consider partitions among the subtrees whose height is below M0, e.g. {5,4,6}, {1}, {2,3} in the paper's figure.
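A small sketch of the bound M0; margin_lower_bound is an illustrative name, and margin_fn again stands for a group-margin helper like svm_margin above.

```python
# Lower bound on the best achievable margin at a node (illustrative sketch):
# the largest one-vs-rest margin over the classes present.
def margin_lower_bound(X, y, classes, margin_fn):
    """margin_fn(X, y, group_a, group_b) -> margin width, e.g. the
    svm_margin helper sketched in the greedy-step example above."""
    classes = set(classes)
    return max(margin_fn(X, y, {c}, classes - {c}) for c in classes)

# Per the height bound above, any complete-linkage subtree whose height is
# below this M0 can be kept whole while searching for the best split.
```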
• Comparable testing performances (also compared with 1-vs-1 voting)
• Complete-linkage tree: more balanced, hence more interpretable
Recall the cutting plane: Decision(x) = sign(βᵀx + β₀), with the plane βᵀx + β₀ = 0 as the cut. β holds the weights of the features in the decision function.
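For concreteness, here is where β and β₀ live in a fitted scikit-learn LinearSVC; this is a tooling assumption (the paper is library-agnostic), and X_split/y_split are placeholders for the data at one split.

```python
# Where the cutting plane's parameters live in a fitted LinearSVC
# (a tooling assumption; X_split, y_split are placeholder names).
from sklearn.svm import LinearSVC

clf = LinearSVC(C=1e6).fit(X_split, y_split)
beta = clf.coef_.ravel()      # feature weights: one beta_i per gene
beta0 = clf.intercept_[0]     # offset beta_0
# Decision(x) = sign(beta @ x + beta0); beta @ x + beta0 = 0 is the cut.
```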
Feature Selection
• Hard-thresholding at each split: discard the n features with low |βi| by setting βi = 0.
• n proportional to the margin: n = α|Margin|, with α chosen by cross-validation error.
• β is unavailable when using a non-linear kernel.
• Alternative method: L1-norm SVM, which forces βi to zero.
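A minimal sketch of the hard-thresholding step, zeroing the n smallest |βi|; tying n to the margin via n = α·Margin follows the slide, with α left as a free parameter here (the slide selects it by cross-validation).

```python
import numpy as np

def hard_threshold(beta, margin, alpha):
    """Zero out the n = alpha * margin smallest-|beta_i| weights at one split."""
    n = int(alpha * margin)              # number of features to discard
    if n <= 0:
        return beta
    order = np.argsort(np.abs(beta))     # indices by |beta_i|, ascending
    sparse = beta.copy()
    sparse[order[:n]] = 0.0              # discard the n weakest features
    return sparse
```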
Setting βi = 0 removes feature i from the cutting plane: the decision function becomes sign(βᵀx + β₀) with the discarded coordinates zeroed.
Discussion
• Good for multi-class, high-dimensional data.
• Bad for non-linearly-separable data: each node will contain impure data, and hence an impure β.
• Testing performance is comparable to traditional multi-class max-margin classifiers (SVMs).