
Regression Using Boosting






Presentation Transcript


  1. Regression Using Boosting
     Vishakh (vv2131@columbia.edu)
     Advanced Machine Learning, Fall 2006

  2. Introduction
     • Classification with boosting
       • Well-studied
       • Theoretical bounds and guarantees
       • Empirically tested
     • Regression with boosting
       • Rarely used
       • Some bounds and guarantees
       • Very little empirical testing

  3. Project Description
     • Study existing algorithms & formalisms:
       • AdaBoost.R (Freund & Schapire, 1997)
       • SquareLev.R (Duffy & Helmbold, 2002)
       • SquareLev.C (Duffy & Helmbold, 2002)
       • ExpLev (Duffy & Helmbold, 2002)
     • Verify effectiveness by testing on an interesting dataset:
       • Football Manager 2006

  4. A Few Notes
     • We want PAC-like guarantees.
     • Procedures can't be transferred directly from classification: simply re-weighting the distribution over iterations doesn't work.
     • We can instead modify the samples (the labels) and still remain consistent with the original function class.
     • This amounts to performing gradient descent on a potential function, as written out below.
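  To make the last two points concrete, one leveraging round can be written as gradient descent on a potential over the residuals. The notation (master hypothesis F, residuals r_i, potential \Phi, step size \alpha) is my own shorthand, not taken from the slides:

    % Relabel each instance by the negative gradient of the potential
    % over the residuals r_i = y_i - F(x_i):
    \tilde{y}_i = -\frac{\partial \Phi(r_1, \ldots, r_n)}{\partial F(x_i)}
    % Fit the base learner f to the relabeled sample, then step the master:
    F \leftarrow F + \alpha f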

  5. SquareLev.R
     • Squared-error regression.
     • Uses a regression algorithm as the base learner.
     • Modifies the labels, not the distribution.
     • The potential function is the variance of the residuals.
     • Each new label is proportional to the negative gradient of the potential function.
     • At each iteration, the mean squared error decreases by a multiplicative factor.
     • Squared error can be made arbitrarily small, as long as the correlation between residuals and predictions stays above a threshold.
     A code sketch follows.
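  A minimal sketch of a SquareLev.R-style round, assuming a shallow scikit-learn decision tree as the base regressor; the tree depth, round count, edge threshold, and closed-form line-search step size are illustrative choices, not prescriptions from Duffy & Helmbold:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def squarelev_r(X, y, n_rounds=50, min_edge=1e-3):
        F = np.zeros(len(y))          # master hypothesis on the training points
        ensemble = []
        for _ in range(n_rounds):
            r = y - F                 # residuals of the master
            y_tilde = r - r.mean()    # centered residuals: the negative gradient
                                      # of the variance-of-residuals potential
            base = DecisionTreeRegressor(max_depth=3).fit(X, y_tilde)
            f = base.predict(X)
            f = f - f.mean()
            norm = np.sqrt((y_tilde ** 2).sum() * (f ** 2).sum())
            if norm == 0.0:
                break
            edge = (y_tilde * f).sum() / norm   # correlation between residuals
            if edge <= min_edge:                # and predictions; stop if small
                break
            alpha = (y_tilde * f).sum() / (f ** 2).sum()  # line search on MSE
            F += alpha * f
            ensemble.append((alpha, base))
        # Note: a deployable predictor would add back the intercept y.mean(),
        # since every relabeling here is mean-centered.
        return ensemble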

  6. SquareLev.C
     • Squared-error regression.
     • Uses a base classifier.
     • Modifies both the labels and the distribution.
     • The potential function uses the residuals.
     • Each new label is the sign of the instance's residual.
     A code sketch follows.
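  A minimal sketch of a SquareLev.C-style round under the same assumptions (scikit-learn base learner, illustrative parameters); weighting the distribution by |residual| and the closed-form step size are my simplifications, not the exact quantities from the paper:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def squarelev_c(X, y, n_rounds=50):
        F = np.zeros(len(y))              # master hypothesis on training points
        ensemble = []
        for _ in range(n_rounds):
            r = y - F                     # residuals
            if np.allclose(r, 0.0):
                break
            D = np.abs(r) / np.abs(r).sum()   # distribution: weight by |residual|
            labels = np.where(r >= 0, 1, -1)  # new label: sign of the residual
            clf = DecisionTreeClassifier(max_depth=2)
            clf.fit(X, labels, sample_weight=D)
            h = clf.predict(X).astype(float)  # base predictions in {-1, +1}
            alpha = np.mean(r * h)            # minimizes sum((r - alpha*h)^2)
            if alpha <= 0:                    # classifier has no useful edge
                break
            F += alpha * h
            ensemble.append((alpha, clf))
        return ensemble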

  7. ExpLev
     • Attempts to get small residuals at every point.
     • Uses an exponential potential.
     • Where AdaBoost pushes all instances to a positive margin, ExpLev pushes all instances to have small residuals.
     • Uses a base regressor (range [-1,+1]) or a base classifier (outputs {-1,+1}).
     • The two-sided potential uses exponentials of the residuals.
     • The base learner must perform well on the relabeled instances.
     A code sketch follows.
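  A minimal, hedged sketch of an ExpLev-style round; the scale s, the fixed step alpha, and the clipped scikit-learn tree regressor are illustrative stand-ins for the choices made in Duffy & Helmbold's analysis:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def explev(X, y, n_rounds=50, s=1.0, alpha=0.1):
        F = np.zeros(len(y))
        ensemble = []
        for _ in range(n_rounds):
            r = y - F                        # residuals
            # Gradient of the two-sided potential exp(s*r) + exp(-s*r):
            # large on instances with large residuals, in either direction.
            g = np.exp(s * r) - np.exp(-s * r)
            if np.allclose(g, 0.0):
                break
            D = np.abs(g) / np.abs(g).sum()  # focus weight on large residuals
            labels = np.sign(g)              # push each residual toward zero
            base = DecisionTreeRegressor(max_depth=3)
            base.fit(X, labels, sample_weight=D)
            f = np.clip(base.predict(X), -1.0, 1.0)  # base range is [-1, +1]
            F += alpha * f
            ensemble.append((alpha, base))
        return ensemble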

  8. Naive Approach
     • Directly translate AdaBoost to the regression setting.
     • Threshold the squared error to decide which examples to reweight.
     • Used as a baseline against which to test the other approaches.
     A code sketch follows.
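  A minimal sketch of this naive baseline, assuming a scikit-learn tree as the base regressor; the threshold tau and the weighted-average combination rule are illustrative (AdaBoost.R-style schemes combine with a weighted median instead):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def naive_boost(X, y, n_rounds=50, tau=0.05):
        n = len(y)
        D = np.full(n, 1.0 / n)              # AdaBoost-style distribution
        ensemble = []
        for _ in range(n_rounds):
            base = DecisionTreeRegressor(max_depth=3)
            base.fit(X, y, sample_weight=D)
            # Thresholding: squared error above tau counts as a "mistake".
            mistakes = (y - base.predict(X)) ** 2 > tau
            eps = D[mistakes].sum()          # weighted error rate
            if eps == 0.0 or eps >= 0.5:
                break
            beta = eps / (1.0 - eps)
            D[~mistakes] *= beta             # downweight correct examples,
            D /= D.sum()                     # exactly as AdaBoost does
            ensemble.append((beta, base))
        return ensemble

    def predict_naive(ensemble, Xq):
        # Simplification: a log(1/beta)-weighted average of the base models.
        w = np.array([np.log(1.0 / b) for b, _ in ensemble])
        w /= w.sum()
        return sum(wi * m.predict(Xq) for wi, (_, m) in zip(w, ensemble))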

  9. Dataset
     • Data from Football Manager 2006, a very popular, statistically driven game.
     • Features are player attributes.
     • Labels are average performance ratings over a season.
     • Goal: predict performance levels and use the learned model to guide game strategy.

  10. Work so far
      • Conducted a survey of existing methods.
      • Studied the methods and their formal guarantees and bounds.
      • Implementation is still underway.

  11. Conclusions
      • Interesting approaches to, and analyses of, boosting for regression are available.
      • Real-world verification is so far insufficient.
      • Further work:
        • Regression on noisy data
        • Formal results under more relaxed assumptions
