
Fitting Models to Data


Presentation Transcript


  1. Fitting Models to Data: Linear and Quadratic Discriminant Analysis; Decision Trees
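The slides give no code for the two discriminant methods named above, so the following is a minimal sketch using lda() and qda() from the MASS package; the iris data and the random train/test split are illustrative assumptions, not part of the presentation.

    # Linear vs. quadratic discriminant analysis (MASS) on an assumed train/test split
    library(MASS)

    set.seed(1)
    train <- sample(nrow(iris), 100)                     # assumed split
    fit_lda <- lda(Species ~ ., data = iris[train, ])    # linear class boundaries
    fit_qda <- qda(Species ~ ., data = iris[train, ])    # quadratic class boundaries

    pred_lda <- predict(fit_lda, iris[-train, ])$class
    pred_qda <- predict(fit_qda, iris[-train, ])$class
    mean(pred_lda == iris$Species[-train])               # held-out accuracy, LDA
    mean(pred_qda == iris$Species[-train])               # held-out accuracy, QDA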

  2. AID: Automatic Interaction Detector. Association / Co-Occurrence

  3. CHAID

  4. CART: Classification and Regression Trees. The CART family is statistically oriented and built around the concept of impurity. Impurity measures how well the two classes are separated; ideally we would like to separate all the 0s and 1s. http://freakonometrics.hypotheses.org/1279
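As a concrete illustration of the impurity idea on this slide, the short function below computes the Gini index, one common CART impurity measure; the function name and the toy label vectors are assumptions used only for illustration.

    # Gini impurity of a vector of class labels: 0 for a pure node, maximal for a 50/50 mix
    gini <- function(y) {
      p <- table(y) / length(y)
      1 - sum(p^2)
    }

    gini(c(0, 0, 0, 1, 1, 1))   # 0.50 -> worst case for two classes
    gini(c(0, 0, 0, 0, 0, 1))   # 0.28 -> purer node
    gini(c(0, 0, 0, 0, 0, 0))   # 0.00 -> all 0s separated from all 1s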

  5. Fitting Models to Data

  6. Overfitting
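The overfitting slide carries no text in the transcript; as a stand-in, the sketch below grows an rpart tree until its leaves are pure and then prunes it back, so the gap between training and held-out accuracy can be inspected. The split, the cp values, and the iris data are assumptions.

    # Overfitting with a single decision tree: grow it out fully, then prune it back
    library(rpart)

    set.seed(1)
    train  <- sample(nrow(iris), 75)
    full   <- rpart(Species ~ ., data = iris[train, ], method = "class",
                    control = rpart.control(cp = 0, minsplit = 2))  # grow until leaves are pure
    pruned <- prune(full, cp = 0.05)                                # assumed pruning threshold

    acc <- function(fit, rows)
      mean(predict(fit, iris[rows, ], type = "class") == iris$Species[rows])
    c(train_full   = acc(full,   train), test_full   = acc(full,   -train),
      train_pruned = acc(pruned, train), test_pruned = acc(pruned, -train))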

  7. Bagging • Builds multiple decision trees by repeatedly resampling the training data with replacement • Fits a model to each bootstrap sample • Votes across the trees for a consensus prediction
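A hedged sketch of that recipe, done by hand with rpart so each step on the slide is visible; the number of trees, the iris data, and the resubstitution check at the end are assumptions for illustration (the ipred package offers a ready-made bagging() if preferred).

    # Bagging by hand: bootstrap resamples -> one tree per resample -> majority vote
    library(rpart)

    set.seed(1)
    B <- 25                                          # assumed number of bootstrap trees
    trees <- lapply(seq_len(B), function(b) {
      idx <- sample(nrow(iris), replace = TRUE)      # resample training data with replacement
      rpart(Species ~ ., data = iris[idx, ], method = "class")   # fit a model to each sample
    })

    # Vote across the trees for a consensus prediction
    votes  <- sapply(trees, function(t) as.character(predict(t, iris, type = "class")))
    bagged <- apply(votes, 1, function(v) names(which.max(table(v))))
    mean(bagged == iris$Species)                     # resubstitution accuracy of the ensemble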

  8. Boosting • Learns slowly • Given the current model, we fit a decision tree to the residuals (the misclassifications) from the model • We then add this new decision tree into the fitted function in order to update the residuals • Each of these trees can be rather small, with just a few terminal nodes, determined by the parameter d in the algorithm • By fitting small trees to the residuals, we slowly improve the fit in areas where the current model does not perform well
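The loop below sketches exactly that residual-fitting idea for a regression response: small rpart trees are fit to the current residuals and added into the fit with a shrinkage factor. The shrinkage value, the number of rounds, the depth d, and the simulated data are all assumptions.

    # Boosting sketch: repeatedly fit a small tree to the current residuals
    library(rpart)

    set.seed(1)
    n <- 200
    x <- runif(n)
    y <- sin(4 * x) + rnorm(n, sd = 0.3)       # assumed toy regression data
    df <- data.frame(x = x, y = y)

    lambda <- 0.1                              # shrinkage: learn slowly
    d      <- 1                                # small trees with few terminal nodes
    fhat   <- rep(0, n)                        # current fitted function
    resid  <- y                                # current residuals

    for (b in 1:100) {
      small_tree <- rpart(r ~ x, data = data.frame(x = x, r = resid),
                          control = rpart.control(maxdepth = d, cp = 0))
      fhat  <- fhat + lambda * predict(small_tree, df)   # add the new tree into the fitted function
      resid <- y - fhat                                  # update the residuals
    }
    mean((y - fhat)^2)                         # training MSE after 100 rounds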

  9. Random Forests

  10. http://www.stat.berkeley.edu/~breiman/RandomForests/
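Breiman and Cutler's randomForest package (listed on the last slide) implements the method documented at the URL above; the call below is a minimal usage sketch, with the iris data and the ntree value chosen only for illustration.

    # Random forest: bagged trees plus a random subset of predictors at each split
    library(randomForest)

    set.seed(1)
    rf <- randomForest(Species ~ ., data = iris,
                       ntree = 500,            # number of bootstrapped trees
                       importance = TRUE)      # track variable importance
    rf                                         # prints the out-of-bag (OOB) error estimate
    importance(rf)                             # importance of each predictor across the forest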

  11. Gradient Boosting
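The gbm package (also on the next slide's list) packages the residual-fitting loop from the boosting slide as gradient boosting; the 0/1 recoding of the response, the distribution, depth, shrinkage, and tree count below are illustrative assumptions.

    # Gradient boosting with gbm: many shallow trees, each fit to the current residuals
    library(gbm)

    set.seed(1)
    df  <- data.frame(y = as.numeric(iris$Species == "virginica"), iris[, 1:4])
    fit <- gbm(y ~ ., data = df,
               distribution = "bernoulli",     # 0/1 response
               n.trees = 1000,                 # boosting rounds
               interaction.depth = 2,          # the "d" parameter: small trees
               shrinkage = 0.01)               # learn slowly
    best <- gbm.perf(fit, method = "OOB")      # pick a number of trees
    head(predict(fit, df, n.trees = best, type = "response"))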

  12. Many Algorithms
  Decision Trees:
  • rpart (CART)
  • tree (CART)
  • ctree (conditional inference tree)
  • CHAID (chi-squared automatic interaction detection)
  • evtree (evolutionary algorithm)
  • mvpart (multivariate CART)
  • knnTree (nearest-neighbor-based trees)
  • RWeka (J4.8, M5', LMT)
  • LogicReg (Logic Regression)
  • BayesTree
  • TWIX (with extra splits)
  • party (conditional inference trees, model-based trees)
  Random Forests:
  • randomForest (CART-based random forests)
  • randomSurvivalForest (for censored responses)
  • party (conditional random forests)
  • gbm (tree-based gradient boosting)
  • mboost (model-based and tree-based gradient boosting)
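Most of the packages above share R's formula interface, so swapping algorithms is often a one-line change; the three calls below (rpart, party's ctree, and randomForest) are a small sampler under that assumption, using the iris data purely for illustration.

    # Same formula, three of the packages listed above
    library(rpart)
    library(party)
    library(randomForest)

    f <- Species ~ .
    m_cart  <- rpart(f, data = iris, method = "class")    # CART
    m_ctree <- ctree(f, data = iris)                       # conditional inference tree
    m_rf    <- randomForest(f, data = iris)                # CART-based random forest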
