
Presentation Transcript


  1. 2004 Open Lecture at ISM Recent topics in machine learning: Boosting November 24 (Wed) to 26 (Fri), 2004 Open Lecture, Essentials of Statistical Mathematics: "Recent Topics in Machine Learning" Boost Learning Shinto Eguchi (The Institute of Statistical Mathematics; Department of Statistical Science, The Graduate University for Advanced Studies)

  2. Course contents • Boost learning: an overview of AdaBoost, a method for statistical pattern recognition, with a discussion of its strengths and weaknesses. • Applications to gene expression data, remote-sensing data, and other examples are introduced.

  3. Boost Learning (I) • 10:00-12:30, November 25 (Thu) • Boost learning algorithm: AdaBoost • AsymAdaBoost: asymmetric learning • EtaBoost: robust learning • GroupBoost: group learning

  4. Boost Learning (II) • 13:30-16:00, November 26 (Fri) • BridgeBoost: meta learning • LocalBoost: local learning • Statistical discussion: probabilistic framework; Bayes rule, Fisher's LDF, logistic regression; optimal classifier by AdaBoost

  5. Acknowledgements • Much of the material presented in this course includes results of joint work with the following collaborators, to whom I am grateful: • Noboru Murata (Science and Engineering, Waseda University) • Ryuei Nishii (Mathematics, Kyushu University) • Takafumi Kanamori (Mathematical and Computing Sciences, Tokyo Institute of Technology) • Takashi Takenouchi (The Institute of Statistical Mathematics) • Masanori Kawakita (Statistical Science, The Graduate University for Advanced Studies) • John B. Copas (Dept Stats, Univ of Warwick)

  6. The strength of weak learnability. Schapire, R. (1990) • Strong learnability: a concept class is strongly learnable if, given access to a source of examples of the unknown concept, the learner with high probability is able to output a hypothesis that is correct on all but an arbitrarily small fraction of the instances. • Weak learnability: the concept class is weakly learnable if the learner can produce a hypothesis that performs only slightly better than random guessing.
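A PAC-style formalization of the two notions may help here; the notation below is mine, not the slide's, and is only a sketch of the standard definitions.

    % Strong learnability: for every distribution D and every eps, delta > 0,
    % the learner outputs a hypothesis h such that
    \[
      \Pr\bigl[\,\mathrm{err}_D(h) \le \varepsilon\,\bigr] \;\ge\; 1-\delta,
      \qquad
      \mathrm{err}_D(h) = \Pr_{(x,y)\sim D}\bigl[h(x)\neq y\bigr].
    \]
    % Weak learnability: the same guarantee is required only for
    \[
      \mathrm{err}_D(h) \;\le\; \tfrac12-\gamma \quad\text{for some fixed }\gamma>0 .
    \]

Schapire (1990) showed that the two notions are equivalent, which is the theoretical seed of boosting.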

  7. Web pages on boosting • Boosting Research Site: http://www.boosting.org/ • Robert Schapire's home page: http://www.cs.princeton.edu/~schapire/ • Yoav Freund's home page: http://www1.cs.columbia.edu/~freund/ • John Lafferty: http://www-2.cs.cmu.edu/~lafferty/

  8. Statistical pattern recognition Recognition for Character, Image, Speaker, Signal, Face, Language,… Prediction for Weather, earthquake, disaster, finance, interest rates, company bankruptcy, credit, default, infection, disease, adverse effect Classification for Species, parentage, genomic type, gene expression, protein expression, system failure, machine trouble

  9. Multi-class classification • Class label • Feature vector • Discriminant function • Classification rule
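The formulas shown on this slide do not survive in the transcript; the following is a minimal sketch of the standard multi-class setup, with notation assumed rather than taken from the slide.

    % Class label and feature vector:
    \[
      y \in \{1,\dots,K\}, \qquad x \in \mathbb{R}^p .
    \]
    % Discriminant functions and the induced classification rule:
    \[
      F_k(x), \; k=1,\dots,K, \qquad
      \hat{y}(x) = \operatorname*{arg\,max}_{k} F_k(x).
    \]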

  10. Binary classification • Label • 0-normalization • Classification rule • Learn from a training dataset • Make a classification
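The formulas on this slide are likewise missing; below is a minimal sketch of the standard binary setup, with notation assumed (including the convention I take "0-normalization" to refer to).

    % Labels and training data:
    \[
      y \in \{+1,-1\}, \qquad
      \mathcal{D} = \{(x_1,y_1),\dots,(x_n,y_n)\}.
    \]
    % A single discriminant function suffices, e.g. by normalizing one class
    % score to zero (possibly what the slide calls 0-normalization):
    \[
      F(x) = F_{+1}(x) - F_{-1}(x), \qquad
      \hat{y}(x) = \operatorname{sign}\bigl(F(x)\bigr).
    \]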

  11. Statistical learning theory • Boost learning: boosting by filtering (Schapire, 1990); bagging, arcing (bootstrap) (Breiman, Friedman, Hastie); AdaBoost (Schapire, Freund, Bartlett, Lee) • Support vector machines: maximize margin, kernel space (Vapnik, Schölkopf)

  12. Class of weak machines • Stump class • Linear class • ANN class • SVM class • kNN class • Point: colorful character rather than universal character
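As an illustration of the simplest class of weak machines, here is a minimal decision stump in Python; the class name and interface are my own, not taken from the lecture.

    import numpy as np

    class DecisionStump:
        """Weak machine: thresholds a single feature and returns labels in {-1, +1}."""

        def fit(self, X, y, sample_weight):
            n, d = X.shape
            best_err = np.inf
            # Try every feature, threshold and sign; keep the weighted-error minimizer.
            for j in range(d):
                for thr in np.unique(X[:, j]):
                    for sign in (+1, -1):
                        pred = sign * np.where(X[:, j] > thr, 1, -1)
                        err = np.sum(sample_weight * (pred != y))
                        if err < best_err:
                            best_err, self.j, self.thr, self.sign = err, j, thr, sign
            return self

        def predict(self, X):
            return self.sign * np.where(X[:, self.j] > self.thr, 1, -1)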

  13. AdaBoost

  14. Learning algorithm • Final machine
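Slides 13 and 14 present the AdaBoost algorithm and the final machine as formulas that do not survive in this transcript. The Python sketch below implements the standard discrete AdaBoost loop over a fixed dictionary of weak machines; the predict interface and the stopping rule are my assumptions, not a reproduction of the slides.

    import numpy as np

    def adaboost(X, y, weak_machines, T):
        """Discrete AdaBoost with labels y in {-1, +1}; returns [(alpha_t, f_t)]."""
        n = len(y)
        w = np.full(n, 1.0 / n)                 # initial example weights w_1(i) = 1/n
        combined = []
        for t in range(T):
            # 2(a): select the weak machine with the smallest weighted error.
            f = min(weak_machines, key=lambda g: np.sum(w * (g.predict(X) != y)))
            pred = f.predict(X)
            eps = np.clip(np.sum(w * (pred != y)), 1e-10, 1 - 1e-10)
            if eps >= 0.5:                      # no better than random guessing: stop
                break
            # 2(b): coefficient alpha_t = (1/2) log((1 - eps) / eps).
            alpha = 0.5 * np.log((1 - eps) / eps)
            # 2(c): reweight the examples and renormalize.
            w *= np.exp(-alpha * y * pred)
            w /= w.sum()
            combined.append((alpha, f))
        return combined

    def final_machine(combined, X):
        """Final machine: sign of the weighted vote F(x) = sum_t alpha_t f_t(x)."""
        score = sum(alpha * f.predict(X) for alpha, f in combined)
        return np.where(score >= 0, 1, -1)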

  15. Simulation (complete separation) • Feature space [-1,1]×[-1,1] • Decision boundary

  16. Set of weak machines • Linear classification machines • Random generation
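The slide only states that the weak machines are linear classifiers generated at random; the sketch below shows one way this could look, with the sampling ranges chosen by me for the [-1,1]×[-1,1] feature space rather than taken from the lecture.

    import numpy as np

    class LinearMachine:
        """Weak machine: sign(a . x + b) with a fixed random direction and offset."""
        def __init__(self, a, b):
            self.a, self.b = a, b
        def predict(self, X):
            return np.where(X @ self.a + self.b > 0, 1, -1)

    def random_linear_machines(M, d=2, seed=None):
        """Generate M random linear classification machines on a d-dimensional space."""
        rng = np.random.default_rng(seed)
        machines = []
        for _ in range(M):
            a = rng.normal(size=d)
            a /= np.linalg.norm(a)              # unit normal vector of the boundary
            b = rng.uniform(-1.0, 1.0)          # offset within the feature square
            machines.append(LinearMachine(a, b))
        return machines

Combined with the earlier sketch, a run such as adaboost(X, y, random_linear_machines(1000), T=300) would reproduce the flavor of the simulation, though not its exact numbers.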

  17. Learning process (I) • Iter = 1, train err = 0.21 • Iter = 13, train err = 0.18 • Iter = 17, train err = 0.10 • Iter = 23, train err = 0.10 • Iter = 31, train err = 0.095 • Iter = 47, train err = 0.08

  18. Learning process (II) • Iter = 55, train err = 0.061 • Iter = 99, train err = 0.032 • Iter = 155, train err = 0.016

  19. Final stage • Contour of F(x) • Sign(F(x))

  20. Learning curve: training error over iterations 1, …, 277

  21. Characteristics • Update • Weighted error rates (least favorable)
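The update and error-rate formulas for this slide are missing from the transcript. A standard statement of the property, in notation I am assuming, is that after reweighting the machine just selected is reduced to random-guessing performance, i.e. it becomes least favorable:

    % Weighted error rate of a machine f under the round-t weights:
    \[
      \varepsilon_t(f) \;=\; \sum_{i=1}^{n} w_t(i)\, I\bigl(f(x_i)\neq y_i\bigr).
    \]
    % After the update w_{t+1}(i) \propto w_t(i)\exp(-\alpha_t y_i f_t(x_i)),
    % the selected machine f_t satisfies
    \[
      \varepsilon_{t+1}(f_t) \;=\; \tfrac12 .
    \]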

  22. Exponential loss • Update rule
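The loss and update formulas are missing here; the standard forms used with AdaBoost (notation assumed) are:

    % Empirical exponential loss of a combined machine F:
    \[
      L_{\exp}(F) \;=\; \frac{1}{n}\sum_{i=1}^{n} \exp\bigl(-y_i F(x_i)\bigr),
    \]
    % updated at each round by adding one weighted weak machine:
    \[
      F_t(x) \;=\; F_{t-1}(x) + \alpha_t f_t(x).
    \]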

  23. Sequential minimization
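Only fragments of this slide are legible ("where", "equality holds iff"). In the standard derivation, with notation assumed, minimizing the exponential loss in the coefficient for a fixed weak machine gives the AdaBoost coefficient, and the minimized value makes the equality condition explicit:

    % One-step minimization over the coefficient alpha for a fixed machine f:
    \[
      \min_{\alpha}\;\sum_{i=1}^{n} w_t(i)\, e^{-\alpha y_i f(x_i)}
      \;=\; 2\sqrt{\varepsilon_t(f)\bigl(1-\varepsilon_t(f)\bigr)}
      \;\le\; 1 ,
    \]
    % attained at
    \[
      \alpha \;=\; \frac12 \log\frac{1-\varepsilon_t(f)}{\varepsilon_t(f)} ,
    \]
    % with equality in the bound iff the weighted error is exactly 1/2, i.e. the
    % machine is no better than random guessing under the current weights.

This is presumably the sense of the slide's "equality holds iff" remark, though the exact statement on the slide may differ.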

  24. AdaBoost = minimum exp-loss
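A standard way to state this equivalence, in notation of my own choosing, is that the population minimizer of the exponential loss is half the log-odds, so its sign is the Bayes rule:

    \[
      F^{*}(x)
      \;=\; \operatorname*{arg\,min}_{F}\; \mathbb{E}\bigl[\exp(-yF(x))\bigr]
      \;=\; \frac12 \log \frac{P(y=+1\mid x)}{P(y=-1\mid x)} ,
      \qquad
      \operatorname{sign}\bigl(F^{*}(x)\bigr) = \text{Bayes rule}.
    \]

This connects the algorithmic view of AdaBoost with the probabilistic framework (Bayes rule, logistic regression) discussed in part II of the course.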

  25. Simulation (complete random)

  26. Overlearning of AdaBoost • Iter = 51, train err = 0.21 • Iter = 151, train err = 0.06 • Iter = 301, train err = 0.0

  27. Drawbacks of AdaBoost • 1. Unbalanced learning: AsymAdaBoost (balancing the false negatives/positives) • 2. Over-learning, even for a noisy dataset: EtaBoost (robustify against mislabelled examples), GroupBoost (relax the p >> n problems) • LocalBoost (extract spatial information) • BridgeBoost (combine different datasets)

  28. AsymBoost • A small modification of AdaBoost: step 2(b) is replaced by 2(b)' • The selection of k • The default choice of k

  29. Weighted errors by k

  30. Result of AsymBoost

  31. Eta-loss function regularized

  32. EtaBoost (b)

  33. A toy example

  34. AdaBoost vs EtaBoost

  35. Simulation (complete random) • Overlearning of AdaBoost • Iter = 51, train err = 0.21 • Iter = 301, train err = 0.0

  36. EtaBoost • Iter = 51, train err = 0.25 • Iter = 51, train err = 0.15 • Iter = 351, train err = 0.18

  37. Mis-labeled examples

  38. Comparison AdaBoost EtaBoost

  39. GroupBoost • Relax the over-learning of AdaBoost by group learning • Idea: in step 2(a) of AdaBoost, only the single best machine is selected and other good machines are cast off. Is there any wise way of grouping the G best machines?
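The following Python sketch only illustrates the grouping idea stated on the slide (take the G best machines of a round instead of the single best and combine them); it is not the lecture's actual GroupBoost update.

    import numpy as np

    def group_step(X, y, w, weak_machines, G):
        """Pick the G machines with the smallest weighted error and vote them equally."""
        errs = np.array([np.sum(w * (g.predict(X) != y)) for g in weak_machines])
        group = [weak_machines[i] for i in np.argsort(errs)[:G]]
        votes = sum(g.predict(X) for g in group)       # each prediction is in {-1, +1}
        pred = np.where(votes >= 0, 1, -1)             # simple majority vote within the group
        return group, pred

How the grouped vote enters the coefficient and the weight update is what the following slides (40-44) address.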

  40. Grouping machines

  41. GroupBoost

  42. Grouping jumps for the next

  43. Learning architecture • Grouping G machines

  44. AdaBoost and GroupBoost Update the weights

  45. From microarray data • Contest program from bioinformatics (BIP2003): http://contest.genome.ad.jp/ • Microarray data: number of genes p = 1000~100000, number of individuals n = 10~100

  46. Output http://genome-www.stanford.edu/cellcycle/
