This presentation by Matt VanLandeghem and Grant Sorensen discusses model averaging as an alternative method of variable selection. Including too many parameters inflates the variance of predicted values, while including too few omits important predictors; together these form a variance/bias trade-off. The SAS procedure PROC GLMSELECT is highlighted because it reports variable importance as a selection frequency rather than as a p-value. The approach applies in fields including biology, econometrics, and finance, but it has pitfalls, such as sensitivity to correlated predictors. References for further reading and alternative approaches are also included.
Model averaging as an alternative method of variable selection
Matt VanLandeghem and Grant Sorensen
Problems with variable selection
• Too many parameters:
  • Lots of variance in predicted values
• Too few parameters:
  • Missing important parameters
• Variance/bias tradeoff
SAS Demo
• See SAS website
  • PROC GLMSELECT
  • Version 9.3 documentation (not 9.2)
  • http://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#statug_glmselect_sect037.htm
Benefits
• Variable importance represented as a selection frequency
  • Instead of p-value from F test
• Estimates based on several “good” models
• Distributions of parameter estimates
• All of these help us pick the most useful model
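To make "selection frequency" concrete, here is a minimal sketch in Python (not SAS, so it can be run without a SAS installation). It assumes that some selection routine has already been run on several bootstrap samples and has returned the list of effects it kept each time. The `selected_models` data and the function name `selection_frequencies` are hypothetical and do not come from PROC GLMSELECT output.

```python
from collections import Counter


def selection_frequencies(selected_models):
    """Return the percentage of resampled fits in which each effect was selected."""
    # set(model) guards against counting an effect twice within one fit.
    counts = Counter(effect for model in selected_models for effect in set(model))
    n_fits = len(selected_models)
    return {effect: 100.0 * count / n_fits for effect, count in counts.items()}


# Hypothetical result of running a selection method on five bootstrap samples.
selected_models = [
    ["x1", "x2"],
    ["x1", "x4"],
    ["x1", "x2", "x7"],
    ["x1"],
    ["x1", "x2"],
]

freqs = selection_frequencies(selected_models)

# Keep effects selected in at least 50% of the samples (an assumed cutoff,
# similar in spirit to a minimum selection percentage).
kept = sorted(effect for effect, pct in freqs.items() if pct >= 50.0)

for effect in sorted(freqs):
    print(f"{effect}: selected in {freqs[effect]:.0f}% of samples")
print("kept:", kept)
```

An effect that is selected in nearly every resample is a more stable candidate than one that appears only occasionally, which is the information a single p-value does not convey.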
Applications
• Any field where variable selection techniques are used
  • Biology (Burnham and Anderson 2002)
  • Atmospheric sciences (Sloughter et al. 2007)
  • Econometrics (LeSage and Parent 2007)
  • Finance (Pesaran et al. 2009)
  • Psychology (Wasserman 2000)
  • …and others
Pitfalls
• SAS implementation
  • GLMSELECT
  • Only GLMs
  • Experimental
• Sensitive to correlated predictors
  • e.g. Homework #4
• Extension of regression
  • Typical assumptions still apply
  • Not a “magic” solution
[Figure: Pitfall picture, labeled "GLM", "Correlation", "Assumptions"]
Alternatives
• Other SAS options
  • AIC or BIC from SAS procedure of choice
  • Model weights based on AIC or BIC
  • Averaged “by hand”
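Averaging "by hand" with AIC-based model weights can be sketched as follows, using the Akaike weights described by Burnham and Anderson (2002): each model's weight is proportional to exp(-Δ/2), where Δ is its AIC minus the smallest AIC in the candidate set. This is an illustrative Python sketch, not SAS output. The AIC values and coefficient estimates are hypothetical, and treating a coefficient as 0 in models that exclude the predictor is one convention among several, assumed here for simplicity.

```python
import math


def akaike_weights(aic_values):
    """Convert AIC values into model weights that sum to 1."""
    best = min(aic_values)
    # Relative likelihood of each model, given its AIC difference from the best.
    rel = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(rel)
    return [r / total for r in rel]


def averaged_estimate(estimates, weights):
    """Weighted average of one parameter's estimates across candidate models."""
    return sum(w * b for w, b in zip(weights, estimates))


# Hypothetical AIC values for three candidate models.
aic = [100.0, 102.0, 110.0]

# Hypothetical estimates of the x1 coefficient in each model.
# The third model excludes x1, so its coefficient is set to 0 (assumed convention).
beta_x1 = [0.50, 0.40, 0.0]

weights = akaike_weights(aic)
beta_avg = averaged_estimate(beta_x1, weights)

for a, w in zip(aic, weights):
    print(f"AIC={a:.1f}  weight={w:.3f}")
print(f"model-averaged x1 coefficient: {beta_avg:.3f}")
```

The same calculation works with BIC values in place of AIC; only the input list changes.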
References and Further Reading
• Burnham, K.P. and D.R. Anderson. 2002. Model selection and multimodel inference: a practical information-theoretic approach. Springer, New York.
• LeSage, J.P. and O. Parent. 2007. Bayesian model averaging for spatial econometric models. Geographical Analysis 39:241-267.
• Pesaran, M.H., C. Schleicher, and P. Zaffaroni. 2009. Model averaging in risk management with an application to futures markets. Journal of Empirical Finance 16:280-305.
• Sloughter, J.M., A.E. Raftery, T. Gneiting, and C. Fraley. 2007. Probabilistic quantitative precipitation forecasting using Bayesian model averaging. Monthly Weather Review 135:3209-3220.
• Wasserman, L. 2000. Bayesian model selection and model averaging. Journal of Mathematical Psychology 44:92-107.
• Whitney, M. and L. Ngo. 2004. Bayesian model averaging using SAS software. SUGI 29 Proceedings, Paper 203-29.
• Pitfall picture: http://www.retrogameoftheday.com/2009/10/retro-game-of-day-pitfall.html
• SAS model averaging webpage: http://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#statug_glmselect_sect026.htm
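SAS Code
ods graphics on;

/* Stepwise selection; choose=cv picks the step with the best cross validation score */
proc glmselect data=colstd seed=3 plots=all;
   model y = x1-x9 / selection=stepwise(choose=cv);
   /* Report selection percentages and parameter estimates for all effects, */
   /* then refit on 100 resamples using effects selected at least 50% of the time */
   modelAverage tables=(EffectSelectPct(all) ParmEst(all))
                refit(minpct=50 nsamples=100);
run;

ods graphics off;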