
Boosting and Bagging For Fun and Profit


Presentation Transcript


  1. Boosting and Bagging For Fun and Profit • Hal Elkins, David Lucus, Keith Walker

  2. Ensemble Methods • Improve the predictive performance of a given statistical model-fitting technique • Run a base procedure many times while changing the input data • The final estimate is a linear or non-linear combination of the iteration estimates • Originally used in machine learning and in data and text mining • Attracting attention due to their relative simplicity and the popularity of bootstrapping • Are ensemble methods useful to academic researchers?
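A worked sketch of the combination step (our notation, not from the slides): given iteration estimates f̂_1, …, f̂_B of the same target, a linear ensemble predicts f̂_ens(x) = w_1·f̂_1(x) + … + w_B·f̂_B(x). Bagging takes w_b = 1/B (a plain average), while boosting builds the sum stagewise, so its weights are data-driven.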

  3. Bagging • Bootstrap aggregating for improving unstable estimation schemes – Breiman (1996) • Variance reduction for the base procedure – Bühlmann and Yu (2002) • Bagging requires user-specified input models • Step 1) construct a bootstrap sample (drawn with replacement) • Step 2) compute the estimator on the resample • Step 3) repeat and average • Bias of the base procedure is slightly increased
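The three steps translate almost line-for-line into code. A minimal sketch in Python with scikit-learn, assuming a regression tree as the user-specified base procedure; the data are synthetic and purely illustrative:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                        # toy predictors
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.3, size=200)

n_boot = 100
models = []
for _ in range(n_boot):
    # Step 1: construct a bootstrap sample with replacement
    idx = rng.integers(0, len(X), size=len(X))
    # Step 2: compute the estimator (fit the base procedure) on the resample
    models.append(DecisionTreeRegressor().fit(X[idx], y[idx]))
# Step 3 is the loop itself; the bagged prediction averages the iterations
y_bagged = np.mean([m.predict(X) for m in models], axis=0)
```

Averaging many trees grown on perturbed resamples is exactly the variance-reduction mechanism the slide cites from Bühlmann and Yu (2002).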

  4. Boosting • Proposed by Schapire (1990) and Freund (1995) • Nonparametric optimization – useful when we have no idea of a model • Bias reduction • Step 1) initialize – apply the base procedure • Step 2) compute the residuals and refit the base procedure to them • Step 3) repeat • Variance is increased
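For contrast with the bagging sketch, here is a minimal L2-boosting sketch in Python, assuming depth-one trees (stumps) as the base learner and squared-error loss; the shrinkage value and iteration count are illustrative:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)

nu, n_iter = 0.1, 100              # shrinkage and iteration count (illustrative)
F = np.full(len(y), y.mean())      # Step 1: initialize (here, a constant fit)
for _ in range(n_iter):
    resid = y - F                  # Step 2: compute the residuals...
    stump = DecisionTreeRegressor(max_depth=1).fit(X, resid)  # ...refit to them
    F += nu * stump.predict(X)     # Step 3: repeat, shrinking each update
```

Each pass chips away at what the current fit cannot explain, which is why boosting reduces bias but, run too long, can inflate variance as the slide notes.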

  5. Enterprise Miner: Bagging & Boosting

  6. Enterprise Miner: Bagging • Control: • Ensemble (under the Model menu) • Inputs • The outputs of other models • Regression • Decision Tree • Neural Networks • Settings • Limited, and very much a black box
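Enterprise Miner's Ensemble node is point-and-click, so as a hedged analogue only (not the EM internals), the same "combine the outputs of other models" idea can be sketched with scikit-learn's VotingRegressor:

```python
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.tree import DecisionTreeRegressor

# Average the predictions of three user-specified input models, mirroring
# the Ensemble node's inputs (regression, decision tree, neural network).
ensemble = VotingRegressor(estimators=[
    ("regression", LinearRegression()),
    ("tree", DecisionTreeRegressor(max_depth=4)),
    ("neural_net", MLPRegressor(max_iter=2000)),
])
# ensemble.fit(X_train, y_train)   # X_train / y_train are hypothetical
# y_hat = ensemble.predict(X_test) # averaged output of the input models
```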

  7. Enterprise Miner: Bagging • How to view output • Connect the Ensemble output to a Regression node • Use a Model Comparison node to: • Compare the bagged model with the input models • Note: • Ensemble will only outperform the input models if there is large disagreement among the input models (AAEM 6.1 course notes) • [Screenshots: Year 1 Analysis, Year 2 Analysis]
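That "large disagreement" condition can be checked directly. One way to gauge it outside EM (our own diagnostic, not from the manual) is the pairwise correlation of the input models' predictions; correlations near 1.0 mean the models agree and the ensemble has little to add:

```python
import numpy as np

def model_disagreement(predictions):
    """Pairwise correlations of the input models' predictions.
    Values near 1.0 mean the models agree; ensembling adds little."""
    P = np.asarray(predictions)      # shape: (n_models, n_observations)
    return np.corrcoef(P)

# e.g. model_disagreement([pred_regression, pred_tree, pred_nn])
# (the pred_* vectors are hypothetical held-out predictions)
```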

  8. Enterprise Miner: Boosting • Control • Gradient Boosting (under the Model menu) • Input • Data Partition or data set • Settings (many) • Assessment measures • Tree size settings • Iterations • Etc.
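For readers outside EM, similar dials exist in scikit-learn's GradientBoostingRegressor; the mapping below is our rough analogy, not an exact correspondence to the EM node, and the values are illustrative:

```python
from sklearn.ensemble import GradientBoostingRegressor

gbm = GradientBoostingRegressor(
    loss="squared_error",   # assessment measure
    max_depth=3,            # tree size setting
    n_estimators=500,       # iterations
    learning_rate=0.05,     # shrinkage applied to each iteration
    subsample=0.8,          # train each tree on a random subsample
)
```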

  9. Enterprise Miner: Boosting • Outputs • Lists variable importance • Lists the number of decision rules containing each variable • Connect to a Regression node for more information • Compare with other models • Use the Model Comparison node • Example • Gradient-boosting regression AIC: -5868.21 • Base regression AIC: -2866.43 • Base neural network AIC: -2981.86
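Both outputs have simple open-source counterparts. A hedged sketch, reusing the hypothetical gbm from above: the importance listing reads off the fitted model, and the AIC is computed under a Gaussian likelihood (one common form; the AIC EM reports may differ by additive constants):

```python
import numpy as np

# Variable-importance listing ("gbm" and the feature names are hypothetical):
# for name, imp in sorted(zip(feature_names, gbm.feature_importances_),
#                         key=lambda t: -t[1]):
#     print(f"{name:20s} {imp:.3f}")

def gaussian_aic(y, y_hat, n_params):
    """AIC under a Gaussian likelihood: n*ln(RSS/n) + 2k."""
    y, y_hat = np.asarray(y), np.asarray(y_hat)
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    return n * np.log(rss / n) + 2 * n_params
```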

  10. Enterprise Miner: Boosting • A boosting story (another data set) • Prediction of graduation from TTU • Data 2004–2007 (SAT, ACT, HS rank %, parent education, income level) • Texas census-level data, matched on the student's high-school county code (20+ variables) • After boosting, only one variable had predictive power for graduation from TTU

  11. Previous Use • Manescu, C., Starica, C.: Do corporate social responsibility scores explain and predict firm profitability? A case study on the publishers of the Dow Jones Sustainability Indexes. Working Paper, Gothenburg University (2009) • Used bagging and boosting to determine whether CSR measures affect ROA

  12. Model Comparisons – Data Courtesy Dr. Romi • Dependent variable: (ΔCSR Performance)_{i,t+n} • OLS: α + β1[CSO]_{i,t} + β2COMMITTEE_{i,t} + β3ΔSIZE_{i,t+1} + β4ΔROA_{i,t+1} + β5ΔFIN_{i,t+1} + β6ΔLEV_{i,t+1} + β7GLOBAL_{i,t} + β8CEOCHAIR_{i,t} + β9HIER_{i,t} + β10ESI_{i,t} + β11LITIGATION_{i,t} + β12EXPERT_{i,t} + ε • Boosting: α + β1[CSO]_{i,5} + β2COMMITTEE_{i,t} + β3GLOBAL_{i,t} + β4CEOCHAIR_{i,t} + β5HIER_{i,t} + ε • Bagging (all): α + β1[CSO]_{i,4} + β2[CSO]_{i,5} + β3COMMITTEE_{i,t} + β4GLOBAL_{i,t} + β5EXPERT_{i,t} + ε

  13. Is Either Useful to Us? • What we thought • Both useful in model selection and refinement • What we concluded • Bagging for settling among competing model specifications • Bagging helps reveal model disagreement among "black box" models • Boosting for grounded theory • Boosting as a starting point for model building

  14. References • Breiman, L.: Bagging predictors. Mach. Learn. 24, 123–140 (1996) • Bühlmann, P., Yu, B.: Analyzing bagging. Ann. Stat. 30, 927–961 (2002) • Freund, Y.: Boosting a weak learning algorithm by majority. Inform. Comput. 121, 256–285 (1995) • Schapire, R.E.: The strength of weak learnability. Mach. Learn. 5, 197–227 (1990) • Course Notes • George, J., et al.: Applied Analytics Using SAS Enterprise Miner 6.1. Course Notes (2009) • Working Paper • Manescu, C., Starica, C.: Do corporate social responsibility scores explain and predict firm profitability? A case study on the publishers of the Dow Jones Sustainability Indexes. Working Paper, Gothenburg University (2009)

  15. THANK YOU! • QUESTIONS?
