

## 09 Multiple Regression


**Learning goals**
Multiple regression
• Statistical model of multiple regression
• Multiple regression in R, including:
  • Multicollinearity
  • Influential points
  • Interactions between variables
  • Categorical variables (factors)
• Model validation and selection, information criteria (basic theory and R)
• Nonlinear regression: examples

**Notation**
X is an n × (p+1) matrix. We assume that its rank is p+1 and that n > p+1.

**Notation**
What is the dimension of:
• $X\beta$
• $y - X\beta$
• $(y - X\beta)^T (y - X\beta)$
What would grade, experience, and salary look like in this notation?
X is an n × (p+1) matrix.

**Using the result from the previous slide:**
$\hat{y} = X\hat{\beta} = Hy$ with $H = X(X^T X)^{-1} X^T$, and $r = y - \hat{y} = (I - H)y$.
H is called the hat matrix, r is the vector of residuals, and I is the identity matrix.
(Optional) The proof of 9.12 is based on matrix calculations. See: https://web.stanford.edu/~mrosenfe/soc_meth_proj3/matrix_OLS_NYU_notes.pdf
For 9.13, first show that HH = H (idempotency).

**What are the risks of multiple predictors?**
How to choose the best model?

**Validation of the model**
Why do we need adjusted R²?

**Model 1**

**Model 5**

**Information criterion**
An information criterion balances the goodness of fit of an estimated model against its complexity, measured by the number of parameters. We assume that the variables follow a known distribution with an unknown parameter θ. In maximum likelihood estimation, the larger the likelihood $L(\hat{\theta})$ or, equivalently, the smaller the negative log-likelihood $-\log L(\hat{\theta})$, the better the model fits.

**AIC and BIC**
The Akaike information criterion (AIC): $\mathrm{AIC} = 2k - 2\log L(\hat{\theta})$
The Bayesian information criterion (BIC): $\mathrm{BIC} = k \log n - 2\log L(\hat{\theta})$
Here k is the number of estimated parameters and n is the number of observations. Smaller values indicate a better trade-off between fit and complexity.

What is the expected salary for a female Prof in discipline B, 10 years after PhD, 15 years of service?
What is the expected salary for a female AsstProf in discipline A, 0 years after PhD, 0 years of service?
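The two salary questions above can be answered with `predict()` once a model has been fitted. The following is a minimal sketch, not the lecture's own code. It assumes that the slides use the `Salaries` data set from the `carData` package (the variable names `yrs.since.phd` and `yrs.service` match, but the source does not name the data set) and that an additive model with all five predictors is intended. The object names `lm_all`, `lm_small`, and `new_cases` are made up for illustration.

```r
# Assumption: the slides use the Salaries data set from the carData package
# (columns: rank, discipline, yrs.since.phd, yrs.service, sex, salary).
library(carData)

# Additive model with all five predictors; the slides may use another formula.
lm_all <- lm(salary ~ rank + discipline + yrs.since.phd + yrs.service + sex,
             data = Salaries)

# The two cases from the slide, coded with the factor levels of Salaries.
new_cases <- data.frame(
  rank          = factor(c("Prof", "AsstProf"), levels = levels(Salaries$rank)),
  discipline    = factor(c("B", "A"), levels = levels(Salaries$discipline)),
  yrs.since.phd = c(10, 0),
  yrs.service   = c(15, 0),
  sex           = factor(c("Female", "Female"), levels = levels(Salaries$sex))
)

# Expected salaries with 95% confidence intervals for the mean response.
pred <- predict(lm_all, newdata = new_cases, interval = "confidence")
print(pred)

# Adjusted R^2 and information criteria for comparing with a smaller model.
lm_small <- update(lm_all, . ~ . - yrs.service)
print(c(full = summary(lm_all)$adj.r.squared,
        small = summary(lm_small)$adj.r.squared))
print(AIC(lm_all, lm_small))
print(BIC(lm_all, lm_small))
```

Each row of `pred` holds the fitted value (`fit`) and the interval bounds (`lwr`, `upr`) for one of the two cases. For AIC and BIC, the model with the smaller value is preferred.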
What do you think about yrs.since.phd and yrs.service?

**Categorical variables in the linear regression**
`salary = b0 + b1*grade + b2*years_of_experience + b3*gender + b4*humanities/science/art + e`
Coding (dummy variables):
`salary = b0 + b1*grade + b2*years_of_experience + b3*is_men + b4*is_science + b5*is_art + e`

**Multicollinearity**
Multicollinearity refers to predictors that are correlated with other predictors.
Warning signs:
• A regression coefficient is not significant even though the variable should be highly correlated with Y.
• When you add or delete an X variable, the regression coefficients change dramatically.
• You see a negative regression coefficient when your response should increase along with X.
• You see a positive regression coefficient when the response should decrease as X increases.
• Your X variables have high pairwise correlations.

**Multicollinearity**
Variance inflation factors: `vif(model)`

**Influential points**
`summary(influence.measures(lm1))`

**Interactions**
How can we recognize existing interactions between variables?

**Summary**
• Use multiple regression in R.
• Interpret the output (also for categorical variables).
• Choose the important predictors.
• Check for multicollinearity, influential points, and interactions.
• Discuss whether the linear model is an appropriate choice for modelling the given data set.

**Admin**
• Evaluation:
  • Exercises Intro. to Statistics Gr.2 (No. 17217), survey link: https://qmsl.uzh.ch/de/79VXV
  • Introduction to Statistics (No. 17216), survey link: https://qmsl.uzh.ch/de/FX4UV
• Part A test exam
• No lecture on 23 April; lecture with Roman Flury on 30 April.
• No office hours on 30 April.
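The checks named in the summary (multicollinearity, influential points, interactions, and dummy coding of factors) can be sketched in a few lines of R. This is an illustration under assumptions, not the lecture's code: it again assumes the `Salaries` data set from the `carData` package, and the model formulas are chosen only for demonstration. The `vif(model)` call on the slide presumably comes from the `car` package; here the variance inflation factor is computed by hand so that only base R functions are needed.

```r
# Assumption: Salaries from carData is the data set behind the slides.
library(carData)

lm1 <- lm(salary ~ yrs.since.phd + yrs.service, data = Salaries)

# Variance inflation factor of one predictor: 1 / (1 - R^2), where R^2 comes
# from regressing that predictor on the remaining predictors.
r2_phd  <- summary(lm(yrs.since.phd ~ yrs.service, data = Salaries))$r.squared
vif_phd <- 1 / (1 - r2_phd)
print(vif_phd)

# Influential points: rows flagged by at least one influence measure.
infl    <- influence.measures(lm1)
flagged <- which(apply(infl$is.inf, 1, any))
print(length(flagged))

# Interaction: does the effect of yrs.since.phd differ between disciplines?
lm_add <- lm(salary ~ yrs.since.phd + discipline, data = Salaries)
lm_int <- lm(salary ~ yrs.since.phd * discipline, data = Salaries)
print(anova(lm_add, lm_int))  # F-test for the added interaction term

# Contrast (dummy) coding that R applies to the factor rank.
print(contrasts(Salaries$rank))
```

A large variance inflation factor for `yrs.since.phd` would support the suspicion that it and `yrs.service` carry overlapping information. A small p-value in the `anova()` comparison suggests that the interaction term improves the model.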
