Chapter 22: Building Multiple Regression Models



Presentation Transcript


  1. Chapter 22: Building Multiple Regression Models • Generalization of univariate linear regression models. • Each unit of data has a value of the dependent variable and values of p independent variables.

  2. Multiple Regression Model • Yi is the value of the dependent variable for the i-th unit. • The values xi1, xi2, …, xip are the values of the independent variables. • Zi is an unobservable error term:
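The model equation itself did not survive the transcript; a reconstruction consistent with the definitions above and the assumptions listed on a later slide (normal, independent errors with constant variance) is:

\[ Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + Z_i, \qquad Z_i \sim N(0, \sigma^2) \ \text{independently}. \]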

  3. Objectives • Estimate the regression coefficients β0, β1, …, βp. • Estimate σ (crucial for tests). • Test whether the regression coefficients β1, …, βp are all simultaneously zero (note that the intercept was left out). • Test whether some of the regression coefficients βq, …, βp are zero.
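For reference, the standard formulas behind these objectives (the course's own notation may differ slightly): the error standard deviation is estimated from the residual sum of squares, the simultaneous test uses the overall F statistic, and the subset test uses a partial F statistic.

\[ \hat\sigma^2 = \frac{SSE}{n - p - 1}, \qquad F = \frac{SSR/p}{SSE/(n - p - 1)} \ \text{ for } H_0: \beta_1 = \cdots = \beta_p = 0, \]

\[ F = \frac{\left(SSE_{\text{reduced}} - SSE_{\text{full}}\right)/(p - q + 1)}{SSE_{\text{full}}/(n - p - 1)} \ \text{ for } H_0: \beta_q = \cdots = \beta_p = 0. \]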

  4. Assumptions for Multiple Regression • Regression function is linear. • Error terms are independent. • Constant error variance. • Distribution of errors is normal.

  5. Context of your second project • Artificial data set, available on the web site. • Each data set is individual. • If you analyze the wrong data set, no credit! • Three dependent variables. • Three separate sections of your report! • Six independent variables. • 500 data points with replicated observations.

  6. Check Scatterplots • Use a scatterplot matrix to get a brief summary look. • Menu path: Graphs, Scatterplot, Matrix. • If Y vs. xi is flat and patternless, then your interpretation is that the regression coefficient of xi is zero. • Two of the dependent variables are random samples.
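The slide gives the menu path for the course's statistical package; for readers working outside that package, a minimal Python sketch of the same scatterplot-matrix check follows (the file name and column layout are hypothetical).

```python
# Scatterplot matrix as a quick summary look at all variables at once
# (hypothetical file name; the slides use menu-driven software instead).
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix

df = pd.read_csv("project_data.csv")          # 500 rows: 3 Y's and 6 x's assumed
scatter_matrix(df, figsize=(12, 12), diagonal="hist")
plt.show()
```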

  7. Strategy 1 • Enter all six independent variables (columns three through eight). • Menu path: Statistics, Regression, Linear. • Examine R2 (it is easier to use the sig of the F statistic). • If R2 is large (sig is small), then that dependent variable is not a random sample.
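As an illustration of Strategy 1 outside the menu-driven package, a minimal statsmodels sketch is shown below; the variable names and the DataFrame `df` are hypothetical, carried over from the scatterplot sketch above.

```python
# Regress one dependent variable on all six independent variables and read
# off R-squared and the significance of the overall F test.
import statsmodels.formula.api as smf

model = smf.ols("y1 ~ x1 + x2 + x3 + x4 + x5 + x6", data=df).fit()
print(model.rsquared)    # R-squared
print(model.f_pvalue)    # "sig" of the overall F statistic
print(model.summary())   # full output: ANOVA-style F test and coefficient table
```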

  8. Analysis of variance table • Three rows: regression, residual, and total. • Five columns • degrees of freedom • sum of squares • mean square • F • sig
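The entries of the table are related in the standard way (with p = 6 independent variables here):

\[ SST = SSR + SSE, \qquad df_{\text{reg}} = p, \quad df_{\text{res}} = n - p - 1, \quad df_{\text{tot}} = n - 1, \]

\[ MSR = \frac{SSR}{p}, \qquad MSE = \frac{SSE}{n - p - 1}, \qquad F = \frac{MSR}{MSE}, \qquad \text{sig} = P\!\left(F_{p,\,n-p-1} \ge F_{\text{obs}}\right). \]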

  9. Table of regression coefficients • Contains the OLS estimates. • The line (constant) refers to β0, the intercept. • There is a line for each variable in the model that refers to βq, the partial regression coefficient (slope) of the q-th independent variable.

  10. Table of regression coefficients • Five columns of numbers. • Two are labeled “unstandardized coefficients”. • The B column contains the OLS estimates. • The Std. Error column contains the estimated standard deviation (standard error) of each estimate.

  11. Table of regression coefficients • One column is the standardized coefficient: a scale-free coefficient often used in social science studies for comparison across studies. • There is a column for t. • As usual, t = (B - 0)/(se B). • There is a column for sig. • Interpret it as a p-value.
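In symbols (standard definitions; the standardized coefficient is often labeled “Beta” in regression output):

\[ t_q = \frac{\hat\beta_q - 0}{\widehat{se}(\hat\beta_q)} \ \text{ compared with the } t_{n-p-1} \text{ distribution}, \qquad \text{Beta}_q = \hat\beta_q \cdot \frac{s_{x_q}}{s_Y}. \]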

  12. Interpretation • There appears to be an association between an independent variable and the dependent variable if the observed significance level for that coefficient is small. • In your report, specify which dependent variables show associations and which independent variables are significant.

  13. Refinement of Model • Rerun regression using only those variables that appear to be significant. • Usually, the database of a study has many variables that have no association with the dependent variable. • Most clients prefer that these variables not be used. • There are some technical problems with this approach that are widely ignored.

  14. Partial correlation coefficient • The correlation between Y and X2, “controlling for” X1 (holding that variable “constant”). • It is given by the equation:
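The equation itself is missing from the transcript; the standard first-order partial correlation formula, presumably what the slide displayed, is:

\[ r_{Y X_2 \cdot X_1} = \frac{r_{Y X_2} - r_{Y X_1}\, r_{X_1 X_2}}{\sqrt{\left(1 - r_{Y X_1}^2\right)\left(1 - r_{X_1 X_2}^2\right)}}. \]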

  15. Strategy 2: Stepwise Regression • Let the computer do the work. • In the regression dialog box, specify Stepwise. • The computer will see whether additional variables can be added or already-entered variables deleted. • There are three basic strategies: forward selection, backward selection, and stepwise.

  16. Stepwise regression strategy • Find independent variable with largest correlation with Y. • Check whether that is significant. • If no, stop. • If yes, check second variable.

  17. Stepwise regression strategy • Find the independent variable with the highest partial correlation, controlling for the first. • If not significant, stop. • If significant, check for a third variable. • Find the independent variable with the highest partial correlation, controlling for the first two.

  18. Stepwise regression strategy • Check whether its addition is significant. • If no, stop. • If yes, see whether the first or second step variable still adds. • Continue iterating until there are no variables that can be added or deleted (a simplified code sketch follows below).
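A simplified forward-selection sketch of this strategy is given below; it only adds variables (the stepwise procedure described above also re-tests entered variables for removal), and the response and candidate names are hypothetical.

```python
# Greedy forward selection by individual t-test p-value (simplified sketch).
import statsmodels.formula.api as smf

def forward_select(df, response, candidates, alpha_enter=0.05):
    selected, remaining = [], list(candidates)
    while remaining:
        # p-value each remaining variable would get if added to the current model
        pvals = {}
        for var in remaining:
            formula = f"{response} ~ {' + '.join(selected + [var])}"
            pvals[var] = smf.ols(formula, data=df).fit().pvalues[var]
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha_enter:
            break                      # nothing left enters significantly
        selected.append(best)
        remaining.remove(best)
    return selected

# Example call (hypothetical column names):
# chosen = forward_select(df, "y1", ["x1", "x2", "x3", "x4", "x5", "x6"])
```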

  19. Using Stepwise Regression • Examine final model selected. • Note which variables are included. • Examine information for excluded variables. • Check whether there is any possibility that one of the variables left out might matter.

  20. Checking the Model • Residual plots. • Diagnostics. • Lack of Fit test. • More next class and after the exam.
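A minimal residuals-versus-fitted sketch, assuming the `model` object fitted in the earlier statsmodels example:

```python
# Residuals vs. fitted values; a patternless, even band is consistent with
# the linearity and constant-variance assumptions.
import matplotlib.pyplot as plt

plt.scatter(model.fittedvalues, model.resid)
plt.axhline(0, color="gray", linestyle="--")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()
```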

  21. Univariate Linear Regression Problem • Model: Y = β0 + β1X + ε. • Test: H0: β1 = 0. • Alternative: H1: β1 > 0. • The distribution of Y is normal under both the null and the alternative. • Under the null, var(Y) = σ0². • Under the alternative, β1 > 0 and var(Y) = σ1².

  22. Step 1: Choose the test statistic and specify its null distribution • Use conditions of the null to find:
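The display is missing from the transcript; a plausible reconstruction, treating the null error standard deviation σ0 as known so that the statistic is standard normal, is:

\[ Z = \frac{\hat\beta_1}{\sigma_0 \big/ \sqrt{\sum_i (x_i - \bar x)^2}} \sim N(0, 1) \quad \text{under } H_0. \]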

  23. Bringing sample size into regression design • The sample size n is hidden in the regression results. That is, let:
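A plausible reconstruction of the missing definition, which makes the sample size explicit:

\[ \sigma_X^2 = \frac{1}{n}\sum_i (x_i - \bar x)^2, \quad \text{so that} \quad \sqrt{\sum_i (x_i - \bar x)^2} = \sigma_X \sqrt{n} \quad \text{and} \quad Z = \frac{\hat\beta_1\, \sigma_X \sqrt{n}}{\sigma_0}. \]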

  24. Step 2: Define the critical value • For the univariate linear regression test:
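For a size-α right-sided test based on the standard normal null distribution above, the critical value is presumably:

\[ c = z_\alpha, \ \text{the upper-}\alpha \text{ point of } N(0,1) \quad (\text{e.g. } z_{0.05} \approx 1.645). \]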

  25. Step 3: Define the Rejection Rule • Each test is a right sided test, and so the rule is to reject when the test statistic is greater than the critical value.

  26. Step 4: Specify the Distribution of Test Statistic under Alternative • Use conditions of the null to find:
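A reconstruction consistent with the setup above (true slope β1 > 0 and error standard deviation σ1 under the alternative):

\[ \hat\beta_1 \sim N\!\left(\beta_1,\ \frac{\sigma_1^2}{n\,\sigma_X^2}\right), \qquad \text{so} \qquad Z = \frac{\hat\beta_1\, \sigma_X \sqrt{n}}{\sigma_0} \sim N\!\left(\frac{\beta_1\, \sigma_X \sqrt{n}}{\sigma_0},\ \frac{\sigma_1^2}{\sigma_0^2}\right). \]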

  27. Step 5: Define a Type II Error • For the univariate linear regression test:
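Here β denotes the Type II error probability (not a regression coefficient):

\[ \beta = P(\text{Type II error}) = P\!\left(Z \le z_\alpha \mid \text{the alternative is true}\right). \]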

  28. Step 6: Find β • For a univariate linear regression test:
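Using the alternative distribution of Z given above, this probability works out to (a reconstruction):

\[ \beta = \Phi\!\left(\frac{z_\alpha\, \sigma_0 - \beta_1\, \sigma_X \sqrt{n}}{\sigma_1}\right). \]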

  29. Step 7: Phrase requirement on β • That is, choose n so that (after algebraic simplification):
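Requiring β to be at most a target value β*, and writing z_{β*} for the corresponding upper standard normal point, the condition rearranges to (a reconstruction consistent with the remark on the next slide):

\[ n \ \ge\ \left(\frac{z_\alpha\,(\sigma_0/\sigma_X) + z_{\beta^*}\,(\sigma_1/\sigma_X)}{\beta_1}\right)^{\!2}. \]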

  30. Univariate Linear Regression • Note that the σ0 factor is changed to σ0/σX. • There is a similar adjustment for the alternative standard deviation (σ1 becomes σ1/σX).
