Stat 324 – Day 25

Stat 324 – Day 25: Variable Selection Techniques (cont.)

Variable selection
• Want to find the combination of variables that explains the most variability in the simplest possible model
• Look for variables that explain a higher percentage of the remaining unexplained variation (partial correlation coefficients)
• Can use automated procedures … with caution

Variable Selection
• Forward selection: Bring in the variable most highly correlated with the response, then the variable with the highest “partial correlation” given what is already in the model, and so on
• Backward elimination: Take out the least significant variable, refit the model, repeat
• Stepwise regression: Forward selection, but at each step consider removing any variables that are now insignificant
• Assumes variables are appropriate
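The forward selection idea above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the procedure any particular package runs: the helper names (`solve`, `rss`, `forward_selection`), the toy data, and the stopping rule are all made up here. The rule adds the candidate that removes the largest share of the remaining unexplained variation (its squared partial correlation) and stops when that share falls below a hypothetical cutoff `min_drop`; real software usually stops on a p-value, AIC, or BIC instead.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def rss(columns, y):
    """Residual sum of squares from least squares of y on an intercept plus the given columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = solve(xtx, xty)
    return sum((y[i] - sum(bj * xj for bj, xj in zip(beta, X[i]))) ** 2 for i in range(n))


def forward_selection(data, y, min_drop=0.05):
    """Greedy forward selection.

    min_drop is a hypothetical cutoff: stop when the best remaining candidate
    explains less than this fraction of the currently unexplained variation.
    """
    chosen, remaining = [], list(data)
    current = rss([], y)  # intercept-only model
    while remaining:
        trials = {name: rss([data[v] for v in chosen] + [data[name]], y)
                  for name in remaining}
        best = min(trials, key=trials.get)
        if current - trials[best] < min_drop * current:
            break
        chosen.append(best)
        remaining.remove(best)
        current = trials[best]
    return chosen


# Made-up toy data: y depends on x1 and x2; x3 is unrelated to the response.
data = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [1, -1, 1, -1, 1, -1, 1, -1],
    "x3": [0, 1, 0, -1, 0, 1, 0, -1],
}
noise = [0.1, -0.2, 0.05, 0.1, -0.1, 0.15, -0.05, -0.05]
y = [2 + 3 * a + 2 * b + e for a, b, e in zip(data["x1"], data["x2"], noise)]
selected = forward_selection(data, y)
print(selected)
```

Backward elimination would run the same loop in reverse, starting from all variables and dropping the one whose removal raises the RSS least.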

Stepwise Regression (Mixed)

Best Subsets

Last Time

Last Time: AIC vs. BIC
AIC
• tyer: 311.1
• tiyer: 311.9
• typer: 312.7
• tiyper: 313.9
BIC
• tyer: 322.4
• te: 322.7
• tye: 324.2
• ter: 324.6
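To see why the two criteria can rank models differently, here is a small sketch using one common form of AIC and BIC for a least squares fit with normal errors. Software packages differ in the additive constants they include and in how they count parameters, so these functions are an assumption and will not necessarily reproduce the values on the slide; the numbers in the example are hypothetical.

```python
import math


def aic(rss, n, k):
    """AIC (up to an additive constant) for a least squares fit.

    rss: residual sum of squares, n: sample size,
    k: number of estimated parameters (counting convention is an assumption).
    """
    return n * math.log(rss / n) + 2 * k


def bic(rss, n, k):
    """BIC (up to the same constant): the penalty per parameter is ln(n) instead of 2."""
    return n * math.log(rss / n) + k * math.log(n)


# Hypothetical numbers: a smaller model and a larger model fit to n = 50 cases.
n = 50
small = {"rss": 1000.0, "k": 3}
large = {"rss": 900.0, "k": 5}

aic_small, aic_large = aic(small["rss"], n, small["k"]), aic(large["rss"], n, large["k"])
bic_small, bic_large = bic(small["rss"], n, small["k"]), bic(large["rss"], n, large["k"])
print("AIC prefers:", "large" if aic_large < aic_small else "small")
print("BIC prefers:", "large" if bic_large < bic_small else "small")
```

Because ln(n) exceeds 2 once n is 8 or more, BIC charges more per parameter than AIC and tends to favor smaller models, which matches the pattern in the lists above.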

Practice problem
• I chose the model with 4 variables, in which SAT score is predicted by YEARS, EXPEND, RANK, and ln(takers), because this model had the lowest AIC. The states that are doing the best are the ones with the largest positive residuals, because a positive residual means the state performed better than the model predicted. New Hampshire is the state that is doing the best: it had the highest residual, about 59, which means it scored about 59 points better than the model predicted from YEARS, EXPEND, RANK, and ln(takers).

Other notes
• Insignificant terms
  • Doesn’t really hurt to leave them in the model as long as you clarify that they are not significant
  • vs. parsimony, adjusted R²
  • Could keep in by request of subject matter expert or for sake of completeness (e.g., lower order terms of polynomial, set of indicator variables, indicators in presence of interactions)

Want to adjust for the model size

PRESS and predicted R²
• Want the model that does the best job of predicting future observations.
• But what if you don’t have future observations?
• Internal validation: PRESS statistic – want the smallest value
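As a rough illustration of the PRESS idea, the sketch below leaves out each observation in turn, refits a one-predictor line to the rest, and sums the squared errors made when predicting the held-out point. Predicted R² is then taken as 1 − PRESS/SST. The function names and the data are hypothetical, and the example is limited to simple linear regression to keep it short.

```python
def fit_line(x, y):
    """Least squares intercept and slope for a single predictor."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - slope * xbar, slope


def press(x, y):
    """PRESS: sum of squared leave-one-out prediction errors."""
    total = 0.0
    for i in range(len(x)):
        # Refit without observation i, then predict it.
        b0, b1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - (b0 + b1 * x[i])) ** 2
    return total


def r2_predicted(x, y):
    """Predicted R-squared: 1 - PRESS / SST."""
    ybar = sum(y) / len(y)
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1 - press(x, y) / sst


# Made-up data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
print("PRESS:", press(x, y))
print("Predicted R^2:", r2_predicted(x, y))
```

PRESS is never smaller than the ordinary residual sum of squares, because each point is predicted by a model that did not get to see it; when comparing candidate models, the one with the smallest PRESS is preferred.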

Recap • Adding more variables to model
