09 Multiple Regression
Learning goals: Multiple regression
• Statistical model of multiple regression
• Multiple regression in R, including:
  • Multicollinearity
  • Influential points
  • Interactions between variables
  • Categorical variables (factors)
• Model validation and selection, information criteria (basic theory and R)
• Nonlinear regression: examples
Notation
X is an n × (p+1) matrix; we assume its rank is p+1 and that n > p+1.
Notation
What is the dimension of:
• Xβ
• y − Xβ
• (y − Xβ)ᵀ(y − Xβ)
How would grade, experience, and salary look in this notation? X is an n × (p+1) matrix.
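For reference, a minimal sketch of the model in this matrix notation (the standard linear-model setup, matching the dimensions above):

y = X\beta + \varepsilon, \qquad
y,\ \varepsilon \in \mathbb{R}^{n \times 1}, \quad
X \in \mathbb{R}^{n \times (p+1)}, \quad
\beta \in \mathbb{R}^{(p+1) \times 1},

so Xβ and y − Xβ are n × 1 vectors, and (y − Xβ)ᵀ(y − Xβ) is a 1 × 1 scalar (the residual sum of squares).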
Using the result from the previous slide: H is called the hat matrix, r is the vector of residuals, and I is the identity matrix.
(Optional) The proof of 9.12 is based on matrix calculations; see: https://web.stanford.edu/~mrosenfe/soc_meth_proj3/matrix_OLS_NYU_notes.pdf
For 9.13, first show that HH = H (idempotency).
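Assuming 9.12 and 9.13 refer to the usual expressions for the fitted values and residuals in terms of the hat matrix, the standard results are:

\hat{\beta} = (X^\top X)^{-1} X^\top y, \qquad
\hat{y} = X\hat{\beta} = \underbrace{X (X^\top X)^{-1} X^\top}_{H}\, y, \qquad
r = y - \hat{y} = (I - H)\, y .

Idempotency follows directly: HH = X(X^\top X)^{-1}X^\top X (X^\top X)^{-1}X^\top = X(X^\top X)^{-1}X^\top = H.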
What are the risks of using multiple predictors? How do we choose the best model?
Validation of the model
Why do we need adjusted R²?
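As a reminder, the standard definition of adjusted R² penalizes additional predictors, so it does not automatically increase when another variable is added:

R^2_{adj} = 1 - (1 - R^2)\,\frac{n-1}{n-p-1},

where n is the number of observations and p the number of predictors.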
[R output comparing Model 1 and Model 5]
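A minimal R sketch of this kind of comparison, with two hypothetical nested models lm1 and lm5; the formulas are illustrative and assume the Salaries data from the carData package:

  library(carData)   # Salaries data set (assumption about the course data)

  lm1 <- lm(salary ~ yrs.since.phd, data = Salaries)
  lm5 <- lm(salary ~ yrs.since.phd + yrs.service + rank + discipline + sex,
            data = Salaries)

  summary(lm1)$adj.r.squared   # adjusted R^2 of the small model
  summary(lm5)$adj.r.squared   # adjusted R^2 of the larger model
  anova(lm1, lm5)              # F-test comparing the nested models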
Information criterion
An information criterion balances the goodness of fit of an estimated model against its complexity, measured by the number of parameters. We assume that the variables follow a known distribution with an unknown parameter θ. In maximum likelihood estimation, the larger the likelihood L(θ̂) or, equivalently, the smaller the negative log-likelihood −log L(θ̂), the better the model fits.
AIC and BIC
• The Akaike information criterion (AIC)
• The Bayesian information criterion (BIC)
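The standard definitions, with k estimated parameters, n observations, and maximized likelihood L(θ̂):

AIC = 2k - 2\log L(\hat{\theta}), \qquad
BIC = k\log(n) - 2\log L(\hat{\theta}).

In R, both can be computed for fitted models with AIC() and BIC(), e.g. AIC(lm1, lm5) and BIC(lm1, lm5) for the hypothetical models from the sketch above; lower values indicate a better trade-off between fit and complexity.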
What is the expected salary for a female Prof in discipline B, 10 years after the PhD and 15 years of service?
What is the expected salary for a female AsstProf in discipline A, 0 years of service and 0 years after the PhD?
What do you think about yrs.since.phd and yrs.service?
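A minimal R sketch of how such predictions could be obtained, again assuming the Salaries data from carData (the model formula is illustrative, not necessarily the model used on the slides):

  library(carData)   # Salaries data set (assumption)

  fit <- lm(salary ~ rank + discipline + yrs.since.phd + yrs.service + sex,
            data = Salaries)

  # New cases matching the two questions above.
  newdata <- data.frame(
    rank          = c("Prof", "AsstProf"),
    discipline    = c("B", "A"),
    yrs.since.phd = c(10, 0),
    yrs.service   = c(15, 0),
    sex           = c("Female", "Female")
  )

  predict(fit, newdata = newdata)   # expected salaries for the two profiles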
Categorical variables in linear regression
salary = b0 + b1*grade + b2*years_of_experience + b3*gender + b4*field (humanities/science/art) + e
Coding (dummy variables):
salary = b0 + b1*grade + b2*years_of_experience + b3*is_male + b4*is_science + b5*is_art + e
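A short R sketch of how this dummy coding looks in practice; the data frame and variable names here are hypothetical, chosen to mirror the equation above:

  # Hypothetical data: 'field' is a categorical variable (factor) with 3 levels.
  df <- data.frame(
    salary              = c(52, 61, 58, 70, 66, 75),
    years_of_experience = c(2, 5, 4, 9, 7, 12),
    field               = factor(c("humanities", "science", "art",
                                   "science", "art", "humanities"))
  )

  fit <- lm(salary ~ years_of_experience + field, data = df)
  summary(fit)        # one coefficient per non-reference level of 'field'

  model.matrix(fit)   # shows the 0/1 dummy columns R creates automatically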
Multicollinearity "multicollinearity" refers to predictors that are correlated with other predictors. Warning signs: • A regression coefficient is not significant even though the variable should be highly correlated with Y. • When you add or delete an X variable, the regression coefficients change dramatically. • You see a negative regression coefficient when your response should increase along with X. • You see a positive regression coefficient when the response should decrease as X increases. • Your X variables have high pairwise correlations.
Multicollinearity: variance inflation factors (VIF)
vif(model)
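A minimal sketch of a VIF check; vif() comes from the car package, and the model again assumes the Salaries data:

  library(car)        # provides vif()
  library(carData)    # Salaries data set (assumption)

  fit <- lm(salary ~ rank + discipline + yrs.since.phd + yrs.service + sex,
            data = Salaries)

  vif(fit)    # for factor predictors car reports generalized VIFs (GVIF);
              # values well above roughly 5-10 point to problematic collinearity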
Influential points
summary(influence.measures(lm1))
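A short sketch of common influence diagnostics in base R; lm1 on the slide refers to the course's own model, the formula below is only illustrative:

  library(carData)    # Salaries data set (assumption)
  lm1 <- lm(salary ~ yrs.since.phd + yrs.service, data = Salaries)

  infl <- influence.measures(lm1)
  summary(infl)              # observations flagged by the usual cut-offs

  cooks.distance(lm1)        # Cook's distance per observation
  hatvalues(lm1)             # leverages (diagonal of the hat matrix H)

  plot(lm1, which = 4)       # Cook's distance plot
  plot(lm1, which = 5)       # residuals vs. leverage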
Interactions
How can we recognize interactions between variables?
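A minimal sketch of fitting and inspecting an interaction in R, again using the Salaries data as an assumed example:

  library(carData)    # Salaries data set (assumption)

  # 'a * b' expands to both main effects plus the interaction a:b.
  fit_int <- lm(salary ~ yrs.since.phd * discipline, data = Salaries)
  summary(fit_int)    # a clearly non-zero interaction term means the slope of
                      # yrs.since.phd differs between the two disciplines

  # Visual check: salary vs. yrs.since.phd shown separately per discipline.
  coplot(salary ~ yrs.since.phd | discipline, data = Salaries, panel = panel.smooth)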
Summary
• Use multiple regression in R.
• Interpret the output (also for categorical variables).
• Choose the important predictors.
• Check for multicollinearity, influential points (influence measures), and interactions.
• Discuss whether the linear model is an appropriate choice for modelling a given data set.
Admin
• Evaluation:
  • Exercises Intro. to Statistics Gr. 2 (Nr. 17217) – survey link: https://qmsl.uzh.ch/de/79VXV
  • Introduction to Statistics (Nr. 17216) – survey link: https://qmsl.uzh.ch/de/FX4UV
• Part A test exam
• No lecture on 23 April; the lecture with Roman Flury takes place on 30 April.
• No office hours on 30 April.