A New Boosting Algorithm Using Input-Dependent Regularizer


Presentation Transcript


  1. A New Boosting Algorithm Using Input-Dependent Regularizer Rong Jin (1), Yan Liu (2), Luo Si (2), Jamie Carbonell (2), Alex G. Hauptmann (2); (1) Michigan State University, (2) Carnegie Mellon University

  2. Outline • Introduction of AdaBoost algorithm • Problems with AdaBoost • New boosting algorithm: input-dependent regularizer • Experiment • Conclusion and future work

  3. AdaBoost Algorithm (I) • Boost a weak classifier into a strong classifier by linearly combining an ensemble of weak classifiers • AdaBoost • Given: a weak classifier h(x) with a large classification error E(x,y)~P(x,y)[h(x) ≠ y] • Output: HT(x) = α1h1(x) + α2h2(x) + … + αThT(x) with a low classification error E(x,y)~P(x,y)[HT(x) ≠ y]
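
Written out in standard AdaBoost notation (the αt being the combination weights learned at each round), the combined classifier and the error it is meant to reduce are:

```latex
% Ensemble output and classification error (standard AdaBoost notation)
H_T(x) = \sum_{t=1}^{T} \alpha_t \, h_t(x),
\qquad
\mathrm{err}(H_T) = \mathbb{E}_{(x,y)\sim P(x,y)}\bigl[\,\mathbf{1}\{\operatorname{sign}(H_T(x)) \neq y\}\,\bigr]
```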

  4. AdaBoost Algorithm (II) • Sampling distribution: focus only on the examples that are misclassified or weakly classified by the previous weak classifiers • Combining weak classifiers: the combination constants are computed so as to minimize the training error • Choice of αt (see the sketch below)
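
For concreteness, here is a minimal sketch of the AdaBoost loop just described, using decision stumps as weak learners. This is an illustrative reconstruction of the standard algorithm, not the authors' code; labels are assumed to be in {-1, +1}.

```python
# Minimal AdaBoost sketch: reweight examples after each round and combine stumps linearly.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, T=50):
    n = len(y)
    D = np.full(n, 1.0 / n)                      # sampling distribution D_t
    learners, alphas = [], []
    for _ in range(T):
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=D)
        pred = h.predict(X)
        eps = np.sum(D[pred != y])               # weighted training error
        if eps == 0 or eps >= 0.5:               # stop if the stump is perfect or no better than chance
            break
        alpha = 0.5 * np.log((1 - eps) / eps)    # choice of alpha_t
        D *= np.exp(-alpha * y * pred)           # emphasize misclassified examples
        D /= D.sum()
        learners.append(h)
        alphas.append(alpha)
    return learners, alphas

def predict(learners, alphas, X):
    # H_T(x) = sum_t alpha_t * h_t(x); the predicted label is its sign
    H = sum(a * h.predict(X) for a, h in zip(alphas, learners))
    return np.sign(H)
```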

  5. Problem 1: Overfitting • AdaBoost seldom overfits • It not only minimizes the training error but also tends to maximize the classification margin (Onoda & Muller, 1998; Friedman et al., 1998) • AdaBoost does overfit when the data are noisy (Dietterich, 2000; Ratsch & Muller, 2000; Grove & Schuurmans, 1998) • The sampling distribution Dt(x) can put too much emphasis on noisy patterns • This is due to the "hard margin" criterion (Ratsch et al., 2000)

  6. Problem 1: Overfitting • Introduce regularization • Do not just minimize the training error • Typical solutions • Smoothing the combination constants (Schapire & Singer, 1998) • Epsilon boosting: equivalent to L1 regularization (Friedman & Tibshirani, 1998) • Boosting with a soft margin (Ratsch et al., 2000) • BrownBoost: a non-monotonic cost function (Freund, 2001)

  7. Problem 2: Why a Linear Combination? • Each weak classifier ht(x) is trained on a different sampling distribution Dt(x) • So each ht(x) is only good for particular types of input patterns • {ht(x)} is therefore a diverse ensemble • A linear combination cannot exploit the full strength of the diverse ensemble {ht(x)} • Solution: the combination constants should be input-dependent

  8. Input-Dependent Regularizer • Addresses both problems: overfitting and constant combination weights • Input-dependent regularizer • Main idea: a different combination form (sketched below)
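
The combination form itself appears only as an image in the original slides. As a hedged reconstruction of the idea (the exact expression should be checked against the paper), each weak classifier's weight is damped by a factor that shrinks as the partial ensemble Ht-1(x) becomes more confident, for example:

```latex
% Input-dependent combination (sketch; beta > 0 is an assumed regularization constant)
H_T(x) = \sum_{t=1}^{T} \alpha_t \, e^{-\left|\beta\, H_{t-1}(x)\right|} \, h_t(x)
```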

  9. Role of the Input-Dependent Factor • Regularizer • Prevents |HT(x)| from growing too fast • Theorem: if all αt are bounded by αmax, then |HT(x)| ≤ a ln(bT + c) • For the linear combination in AdaBoost, |HT(x)| ~ O(T) • Router • Input-dependent combination constant • The prediction of ht(x) is used only when |Ht-1(x)| is small • Consistent with the training procedure • ht(x) is trained on the examples where Ht-1(x) is uncertain
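
A quick numeric illustration of the growth claim (the constants here are made up, not from the slides): when each new increment is damped by exp(-β|H|), the running sum grows roughly logarithmically in T, whereas the undamped sum grows linearly.

```python
# Hypothetical illustration of |H_T(x)| <= a*ln(bT + c) versus O(T) growth.
import numpy as np

beta, alpha, T = 1.0, 1.0, 200
H_damped, H_linear = 0.0, 0.0
for _ in range(T):
    H_damped += alpha * np.exp(-beta * abs(H_damped))  # increment damped by the regularizer
    H_linear += alpha                                  # plain linear combination
print(f"damped: {H_damped:.2f} (~ log T)   linear: {H_linear:.2f} (~ T)")
```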

  10. WeightBoost Algorithm (1) • Similar to AdaBoost: minimize the exponential cost function • Training setup • hi(x): x → {1, -1}, a base (weak) classifier • HT(x): a weighted combination of the base classifiers • Goal: minimize the training error
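
The exponential cost referred to here is presumably the same objective AdaBoost minimizes, evaluated on the n training examples with the new combination HT(x) plugged in:

```latex
% Exponential cost over training examples (x_i, y_i)
\min \; \sum_{i=1}^{n} \exp\!\bigl(-y_i \, H_T(x_i)\bigr)
```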

  11. WeightBoost Algorithm (2) • Choice of αt: • Emphasize misclassified data patterns • Avoid overemphasis on noisy data patterns • As simple as AdaBoost! (training loop sketched below)
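
A rough sketch of how such a training loop could look, written to mirror the AdaBoost sketch above. The distribution update, the β value, and the damping factor below are reconstructions of the slide's idea, not the authors' exact update rules.

```python
# WeightBoost-style sketch: AdaBoost reweighting plus an input-dependent damping term.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def weightboost(X, y, T=50, beta=0.5):
    n = len(y)
    H = np.zeros(n)                                   # running ensemble output H_{t-1}(x_i)
    learners, alphas = [], []
    for _ in range(T):
        # Misclassification emphasis times damping where the ensemble is already confident,
        # which limits overemphasis on noisy patterns.
        D = np.exp(-y * H) * np.exp(-beta * np.abs(H))
        D /= D.sum()
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=D)
        pred = h.predict(X)
        eps = np.sum(D[pred != y])
        if eps == 0 or eps >= 0.5:
            break
        alpha = 0.5 * np.log((1 - eps) / eps)
        learners.append(h)
        alphas.append(alpha)
        # The new learner's vote counts less where the previous ensemble was already confident.
        H = H + alpha * np.exp(-beta * np.abs(H)) * pred
    return learners, alphas, beta
```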

  12. Empirical Studies • Datasets: eight UCI datasets with binary class labels • Methods compared against • AdaBoost • WeightDecay Boost: close to L2 regularization • Epsilon Boosting: related to L1 regularization

  13. Experiment 1: Effectiveness • Comparison with AdaBoost • WeightBoost performs better than the AdaBoost algorithm • In many cases, WeightBoost performs substantially better than AdaBoost

  14. Experiment 2: Beyond Regularization • Comparison with other regularized boosting algorithms: WeightDecay Boost and Epsilon Boost • Overall, WeightBoost performs slightly better than the other regularized boosting algorithms • In several cases, WeightBoost performs better than both of them

  15. Experiment 3: Resistance to Noise (results shown for 10% noise) • Randomly select 10%, 20%, and 30% of the training data and set their labels to random values • WeightBoost is more resistant to training noise than the AdaBoost algorithm • In several cases, where AdaBoost overfits the noisy labels, WeightBoost still performs well
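
The noise-injection step described above is easy to reproduce; this is a hypothetical helper (function name and seed are made up), not the authors' experimental code.

```python
# Randomly relabel a fraction of the training set with random {-1, +1} labels.
import numpy as np

def add_label_noise(y, fraction=0.1, seed=0):
    rng = np.random.default_rng(seed)
    y_noisy = np.asarray(y).copy()
    idx = rng.choice(len(y_noisy), size=int(fraction * len(y_noisy)), replace=False)
    y_noisy[idx] = rng.choice([-1, 1], size=len(idx))
    return y_noisy
```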

  16. Experiments with Text Categorization • Reuters-21578 corpus with the 10 most popular categories: WeightBoost improves performance on 7 out of 10 categories

  17. Conclusion and Future Work • Introduced an input-dependent regularizer into the combination form • Prevents |H(x)| from increasing too fast → resistant to training noise • "Routes" a test pattern to its appropriate classifiers → improves classification accuracy further than standard regularization • Future research issues • How should the regularization constant be determined? • Other input-dependent regularizers?
