Additive Models, Trees, and Related Methods - Advanced Techniques in Statistical Modeling
Explore Generalized Additive Models, logistic regression, tree-based methods like CART, MARS modeling, PRIM bump hunting, missing data handling, and computational considerations in statistical analysis. Discover algorithms, criteria, and applications in this comprehensive guide.
Additive Models, Trees, and Related Methods • 2006. 02. 17. • Partly based on Prof. Prem Goel's slides
9.1 Generalized Additive Models • Mean function: • fj: unspecified smooth (nonparametric) functions • Relate conditional mean of Y to an additive function of X’s via a link function g.
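Written out, the mean-function relation on this slide is (ESL eq. 9.1):

```latex
g\!\left[\mu(X)\right] = \alpha + f_1(X_1) + f_2(X_2) + \cdots + f_p(X_p),
\qquad \mu(X) = \mathrm{E}\!\left(Y \mid X_1, \ldots, X_p\right)
```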
Fitting Additive Models • Fit each fj with a scatterplot smoother, estimating all p functions simultaneously • For example, a cubic smoothing spline as the smoother • Criterion: penalized sum of squares (9.7) • An additive cubic spline model minimizes this criterion • Each fj is a cubic spline in the component Xj • Knots at each of the unique values xij
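The penalized sum of squares referred to as (9.7) is:

```latex
\mathrm{PRSS}(\alpha, f_1, \ldots, f_p) =
\sum_{i=1}^{N} \Big( y_i - \alpha - \sum_{j=1}^{p} f_j(x_{ij}) \Big)^{2}
+ \sum_{j=1}^{p} \lambda_j \int f_j''(t_j)^{2} \, dt_j
```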
The Backfitting Algorithm • Can accommodate other fitting methods in the same way, by specifying an appropriate smoothing operator Sj • For a large class of linear smoothers, backfitting is equivalent to a Gauss–Seidel algorithm for solving a linear system
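The backfitting cycle can be sketched in a few lines. Here a crude running-mean smoother stands in for the cubic smoothing spline Sj; the function names and the `window` parameter are illustrative, not from the text:

```python
import numpy as np

def running_mean_smoother(x, y, window=11):
    """Running-mean smoother over the sorted x values -- a crude
    stand-in for the cubic smoothing spline S_j (an assumption)."""
    order = np.argsort(x)
    ys = y[order]
    pad = window // 2
    padded = np.pad(ys, pad, mode='edge')          # extend the ends
    kernel = np.ones(window) / window
    smoothed_sorted = np.convolve(padded, kernel, mode='valid')
    out = np.empty_like(y, dtype=float)
    out[order] = smoothed_sorted                   # undo the sort
    return out

def backfit(X, y, n_iter=20, window=11):
    """Backfitting: cycle over predictors, smoothing the partial
    residual against each X_j and re-centering the fitted f_j."""
    N, p = X.shape
    alpha = y.mean()
    f = np.zeros((N, p))
    for _ in range(n_iter):
        for j in range(p):
            partial = y - alpha - f.sum(axis=1) + f[:, j]
            f[:, j] = running_mean_smoother(X[:, j], partial, window)
            f[:, j] -= f[:, j].mean()              # keep each f_j centered
    return alpha, f
```

With an additive ground truth, the cycle recovers most of the signal after a handful of sweeps.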
Additive Logistic Regression • For the logistic regression model and other generalized additive models, the appropriate criterion is a penalized log-likelihood. • To maximize it, the backfitting procedure is used in conjunction with a likelihood maximizer.
Local Scoring Algorithm for the Additive Logistic Regression
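A minimal sketch of local scoring for a one-term additive logistic model: each outer iteration forms the working response and weights, then replaces the weighted backfitting step with a single weighted kernel smooth. The smoother, the bandwidth, and the weight floor are illustrative assumptions, not the text's prescription:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def weighted_smoother(x, z, w, bandwidth=0.5):
    """Weighted Nadaraya-Watson kernel smoother -- a stand-in for
    the weighted smoothing spline of the text (an assumption)."""
    diffs = (x[:, None] - x[None, :]) / bandwidth
    K = np.exp(-0.5 * diffs ** 2)
    return (K @ (w * z)) / (K @ w)

def local_scoring(x, y, n_iter=10):
    """Local scoring: iteratively reweighted smoothing of the
    working response z, as in IRLS for logistic regression."""
    eta = np.zeros_like(y, dtype=float)
    for _ in range(n_iter):
        p = sigmoid(eta)
        w = np.clip(p * (1.0 - p), 1e-3, None)   # guard tiny weights
        z = eta + (y - p) / w                    # working response
        eta = weighted_smoother(x, z, w)
    return sigmoid(eta)
```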
9.2 Tree-Based Methods • Partition the feature space into a set of rectangles and fit a simple model in each one • Examples: CART and C4.5
Regression Tree • Assume a recursive binary partition • In each partition, Y is modeled by a different constant • For each split, choose the variable and split-point that minimize the sum of squares • Repeat within each resulting subset, until a minimum node size is reached
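The greedy split search can be sketched as follows (an O(pN²) illustration for clarity; real implementations sort once per variable and update the two sums incrementally):

```python
import numpy as np

def best_split(X, y):
    """Exhaustive search for the (variable, split-point) pair that
    minimizes the total within-node sum of squares."""
    N, p = X.shape
    best = (None, None, np.inf)          # (variable, split, RSS)
    for j in range(p):
        order = np.argsort(X[:, j])
        xs, ys = X[order, j], y[order]
        for i in range(1, N):
            if xs[i] == xs[i - 1]:
                continue                 # no split between tied values
            left, right = ys[:i], ys[i:]
            rss = (((left - left.mean()) ** 2).sum()
                   + ((right - right.mean()) ** 2).sum())
            if rss < best[2]:
                best = (j, (xs[i - 1] + xs[i]) / 2, rss)
    return best
```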
Regression Tree • How large should we grow the tree? • Cost-complexity pruning • Find the subtree that minimizes the cost-complexity criterion Cα(T) • Choose α adaptively by weakest-link pruning • Successively collapse the internal node giving the smallest per-node increase in RSS, until reaching the single-node tree • Among this sequence of trees there is one that minimizes the cost-complexity criterion • Estimate α by cross-validation
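The cost-complexity criterion being minimized is:

```latex
C_\alpha(T) = \sum_{m=1}^{|T|} N_m \, Q_m(T) + \alpha \, |T|,
\qquad
Q_m(T) = \frac{1}{N_m} \sum_{x_i \in R_m} \left( y_i - \hat{c}_m \right)^{2}
```

Here |T| is the number of terminal nodes, N_m the number of observations in region R_m, and α ≥ 0 governs the trade-off between tree size and fit.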
Classification Trees • The only changes are in the criteria for splitting nodes and for pruning the tree.
Node Impurity Measures • Cross-entropy and the Gini index are more sensitive than the misclassification rate to changes in the node probabilities • Either cross-entropy or the Gini index should be used when growing the tree • When pruning, any of the three can be used
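The three measures, computed directly from a node's class proportions:

```python
import numpy as np

def impurities(p):
    """Misclassification error, Gini index, and cross-entropy for a
    node with class proportions p (a direct transcription of the
    standard definitions)."""
    p = np.asarray(p, dtype=float)
    misclass = 1.0 - p.max()
    gini = np.sum(p * (1.0 - p))
    nz = p[p > 0]                        # avoid log(0)
    entropy = -np.sum(nz * np.log(nz))
    return misclass, gini, entropy
```

All three are maximized at equal proportions and fall as the node becomes purer; Gini and entropy fall strictly, which is what makes them preferable for growing.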
Other Issues • Instability • Hierarchical process: an error in an upper split is propagated down to all splits below it • Bagging reduces this variance • Lack of smoothness in the prediction surface • Can degrade performance in regression • MARS addresses this • ROC curves • By varying the relative sizes of the losses L01 and L10 in the loss matrix, one can trade off sensitivity against specificity
9.3 PRIM – Bump Hunting • Patient Rule Induction Method • Seeks boxes in which the response average is high • Not based on binary splits • The collection of rules is hard to interpret, although each individual rule is simple • "Patient": does not fragment the data as quickly as binary partitioning • This patience can help the top-down greedy algorithm find a better solution
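The top-down peeling pass can be sketched as follows; the bottom-up pasting step and the cross-validated choice among candidate boxes are omitted, and the parameter names are illustrative:

```python
import numpy as np

def prim_peel(X, y, alpha=0.1, min_support=0.1):
    """One peeling pass of PRIM: at each step shave roughly an alpha
    fraction of the remaining points off one face of the box, choosing
    the face whose removal leaves the highest response mean."""
    inside = np.ones(len(y), dtype=bool)
    while inside.mean() > min_support:
        best_mean, best_mask = -np.inf, None
        for j in range(X.shape[1]):
            lo, hi = np.quantile(X[inside, j], [alpha, 1.0 - alpha])
            for keep in (X[:, j] >= lo, X[:, j] <= hi):
                mask = inside & keep
                if mask.sum() > 0 and y[mask].mean() > best_mean:
                    best_mean, best_mask = y[mask].mean(), mask
        if best_mask is None or best_mask.sum() == inside.sum():
            break                         # nothing was peeled off
        inside = best_mask
    return inside
```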
9.4 MARS: Multivariate Adaptive Regression Splines • Basic element: a pair of piecewise-linear basis functions, (x − t)+ and (t − x)+, a "reflected pair" with knot t • Form a reflected pair for each input Xj, with knots at each observed value xij of that input • 2Np basis functions in total
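The candidate basis construction can be transcribed directly (function names are illustrative):

```python
import numpy as np

def reflected_pair(x, t):
    """The reflected pair of piecewise-linear basis functions
    (x - t)_+ and (t - x)_+ with knot t."""
    return np.maximum(x - t, 0.0), np.maximum(t - x, 0.0)

def mars_basis(X):
    """Build the full candidate set: one reflected pair per input
    variable per observed value, 2*N*p columns in total."""
    N, p = X.shape
    cols = []
    for j in range(p):
        for t in X[:, j]:
            pos, neg = reflected_pair(X[:, j], t)
            cols.extend([pos, neg])
    return np.column_stack(cols)          # shape (N, 2*N*p)
```

MARS then selects from this pool by forward stepwise addition of terms (and products of terms), which is not shown here.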
Other Issues • MARS for classification • Two classes: 0/1 coding and regression • More than two classes: optimal scoring (Section 12.5) • MARS vs. CART • Piecewise-linear basis functions vs. step functions • Multiplication of basis functions vs. splitting • Splits need not be binary
9.5 Hierarchical Mixtures of Experts • A tree with soft (probabilistic) splits: a gating network at each internal node and an expert model at each terminal node.
Hierarchical Mixtures of Experts • Estimation of parameters: EM algorithm • E-step: compute expectations of the gating probabilities • M-step: estimate the parameters in the expert networks by multiple logistic regression • HME vs. CART • Similar to CART with linear-combination splits • Soft splits are better at modeling a gradual response transition • There is no established method for finding a good tree topology for an HME
9.6 Missing Data • Key question: whether the missing-data mechanism has distorted the observed data • Missing at random (MAR): the mechanism may depend on the observed data, but not on the unobserved (missing) values • Missing completely at random (MCAR): the mechanism is independent of the data, observed or unobserved
Missing Data • Approaches, assuming MCAR: • Discard observations with any missing values • Rely on the learning algorithm to deal with missing values in its training phase • Impute all missing values before training
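The third option, in its simplest form: column-mean imputation of NaN entries (a minimal sketch; the text discusses richer schemes such as imputing from the other features):

```python
import numpy as np

def mean_impute(X):
    """Replace each missing entry (NaN) with its column mean
    computed from the observed values of that column."""
    X = X.astype(float).copy()
    col_means = np.nanmean(X, axis=0)      # per-column observed means
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X
```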
9.7 Computational Considerations • Additive model fitting: O(mpN + pN log N), where m is the number of backfitting iterations • Trees: O(pN log N) for the initial sorting and for the split computations • MARS: O(NM^2 + pM^2N), where M is the number of terms • HME: O(Np^2) for the regressions and O(Np^2K^2) for the K-class logistic regressions per M-step; the EM algorithm can take a long time to converge