

  1. Chapter 11: Multiple Linear Regression. Introduction, Theory, SAS, Summary. By: Airelle, Bochao, Chelsea, Menglin, Reezan, Tim, Wu, Xinyi, Yuming

  2. Introduction

  3. Regression Analysis in the Making •  The earliest form of regression analysis was the method of least squares, published by Legendre in 1805 in the paper “Nouvelles méthodes pour la détermination des orbites des comètes.”1 •  Legendre used least squares to study the orbits of comets around the Sun • “Sur la méthode des moindres quarrés” (“On the method of least squares”) appears as an appendix to that paper1 Adrien-Marie Legendre2 1752-1833 1Firmin Didot, Paris, 1805. “Nouvelles méthodes pour la détermination des orbites des comètes”; “Sur la méthode des moindres quarrés” appears as an appendix. 2Picture from <http://www.superstock.com/stock-photos-images/1899-40028>

  4. Regression Analysis in the Making •  Gauss also developed the method of least squares, for the purpose of astronomical observations •  He presented it in 1809 in Theoria Motus Corporum Coelestium in Sectionibus Conicis Solem Ambientium1 and later expanded the theory in Theoria combinationis observationum erroribus minimis obnoxiae (Theory of the combination of observations least subject to errors). Johann Carl Friedrich Gauss2 1777-1855 Shown here on the 10 Deutsche Mark banknote! 1C.F. Gauss. Theoria Motus Corporum Coelestium in Sectionibus Conicis Solem Ambientium. (1809) 2Picture from <http://www.pictobrick.de/en/gallery_gauss.shtml>

  5. Why “regression”? Coined in the 19th century by Sir Francis Galton1 Sir Francis Galton2 1822-1911 1”Regression analysis." Wikipedia, The Free Encyclopedia. Wikimedia Foundation, Inc. 22 July 2004. Web. 20 Nov. 2013. 2Picture from <http://hu.wikipedia.org/wiki/Szineszt%C3%A9zia>

  6. Why “regression”? The term was used to describe how the heights of the descendants of tall ancestors tend to “regress” down toward the average height of the population; this is also known as “regression towards the mean.” 1”Regression analysis." Wikipedia, The Free Encyclopedia. Wikimedia Foundation, Inc. 22 July 2004. Web. 20 Nov. 2013. 2Picture from <http://en.wikipedia.org/wiki/File:Miles_Park_Romney_family.jpg>

  7. Fun Fact Before 1970, one run of linear regression could take up to 24 hours on an electromechanical desk calculator 1”Regression analysis." Wikipedia, The Free Encyclopedia. Wikimedia Foundation, Inc. 22 July 2004. Web. 20 Nov. 2013. 2Picture from <http://www.technikum29.de/en/computer/electro-mechanical>

  8. Uses of linear regression • Making predictions: fit the model using linear regression on an observed set of data and outcomes, then predict the next, unknown outcome • Correlating data: determine the relationship between two sets of data (where one is not necessarily “causal” to the other)
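  A hedged SAS sketch of the prediction use case (the data sets 'known' and 'new' and the variable names y, x1, x2 are made up for illustration): rows whose response y is missing are excluded from the fit but still receive predicted values in the OUTPUT data set.

     /* 'known' holds observed data; 'new' holds the same predictors with y left missing */
     data all;
        set known new;
     run;
     proc reg data=all;
        model y = x1 x2;              /* fit uses only rows with non-missing y      */
        output out=scored p=y_pred;   /* predicted values for all rows, old and new */
     run;
     quit;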

  9. Theory

  10. Multiple Linear Regression • Review simple linear regression, where we have only one predictor variable: $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, $i = 1, \dots, n$ • What if there is more than one predictor? • Multiple linear regression model • A generalization of simple linear regression that considers more than one independent variable: $y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \epsilon_i$, $i = 1, \dots, n$

  11. We fit a model of the form: $y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \epsilon_i$, $i = 1, \dots, n$ • $k \ge 2$ predictor variables • $\beta_0, \beta_1, \dots, \beta_k$: $k+1$ unknown parameters • $\epsilon_i$: a random error • Note: the model is called “linear” because it is linear in the $\beta$'s, not necessarily in the x's. For example: $y_i$ may be the salary of the ith person in the sample, $x_{i1}$ the years of experience, and $x_{i2}$ the years of education. Graph 1. Regression plane for the model with 2 predictor variables (source of Graph 1: http://reliawiki.org/index.php/Multiple_Linear_Regression_Analysis)

  12. Here, we assume that the random errors $\epsilon_i$ are independent $N(0, \sigma^2)$ r.v.'s. • The $Y_i$ are then independent r.v.'s with $E(Y_i) = \mu_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}$ and $\mathrm{Var}(Y_i) = \sigma^2$, $i = 1, 2, \dots, n$

  13. Fitting the Multiple Regression Model • Least Squares (LS) Fit: to find estimates $\hat\beta_0, \hat\beta_1, \dots, \hat\beta_k$ of the unknown parameters, we minimize $Q = \sum_{i=1}^{n} \left[ y_i - (\beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}) \right]^2$. We set the first partial derivatives of Q with respect to $\beta_0, \beta_1, \dots, \beta_k$ equal to zero: $\partial Q / \partial \beta_j = 0$, $j = 0, 1, \dots, k$

  14. Simplification leads to the following normal equations: $\sum_{i=1}^{n} y_i = n\hat\beta_0 + \hat\beta_1 \sum_{i} x_{i1} + \cdots + \hat\beta_k \sum_{i} x_{ik}$ and $\sum_{i=1}^{n} x_{ij} y_i = \hat\beta_0 \sum_{i} x_{ij} + \hat\beta_1 \sum_{i} x_{ij} x_{i1} + \cdots + \hat\beta_k \sum_{i} x_{ij} x_{ik}$, $j = 1, 2, \dots, k$. The resulting solutions are the least squares (LS) estimates of $\beta_0, \beta_1, \dots, \beta_k$ and are denoted by $\hat\beta_0, \hat\beta_1, \dots, \hat\beta_k$, respectively.

  15. Goodness of Fit of the Model • We use the residuals, defined by $e_i = y_i - \hat y_i$, $i = 1, 2, \dots, n$ • where the $\hat y_i$ are the fitted values: $\hat y_i = \hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_k x_{ik}$, $i = 1, 2, \dots, n$ • As an overall measure of the goodness of fit, we can use the error sum of squares $SSE = \sum_{i=1}^{n} e_i^2$ (which is the minimum value of Q). We compare this SSE to the total sum of squares $SST = \sum_{i=1}^{n} (y_i - \bar y)^2$. Define the regression sum of squares by SSR = SST - SSE. The ratio of SSR to SST is called the coefficient of multiple determination: $r^2 = SSR/SST = 1 - SSE/SST$. $r^2$ ranges between 0 and 1, with values closer to 1 representing better fits. Note that adding more predictor variables to a model generally increases $r^2$. The positive square root of $r^2$ is the multiple correlation coefficient r.
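  A hedged SAS sketch of where these quantities appear in standard output (the data set 'mydata' and variables y, x1, x2 are made up): the analysis-of-variance table printed by PROC REG lists the Model (SSR), Error (SSE), and Corrected Total (SST) sums of squares, along with R-Square = SSR/SST.

     /* 'mydata', y, x1, x2 are hypothetical names */
     proc reg data=mydata;
        model y = x1 x2;   /* ANOVA table: Model SS = SSR, Error SS = SSE, Corrected Total SS = SST */
     run;
     quit;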

  16. Multiple Regression Model in Matrix Notation • Let $\mathbf{Y} = (Y_1, \dots, Y_n)'$, $\mathbf{y} = (y_1, \dots, y_n)'$, and $\boldsymbol\epsilon = (\epsilon_1, \dots, \epsilon_n)'$ be the $n \times 1$ vectors of the r.v.'s, their observed values, and the random errors, respectively. Let $\mathbf{X}$ be the $n \times (k+1)$ matrix of the values of the predictor variables, whose ith row is $(1, x_{i1}, \dots, x_{ik})$.

  17. Let $\boldsymbol\beta = (\beta_0, \beta_1, \dots, \beta_k)'$ and $\hat{\boldsymbol\beta} = (\hat\beta_0, \hat\beta_1, \dots, \hat\beta_k)'$ be the $(k+1) \times 1$ vectors of unknown parameters and their LS estimates, respectively. Then the model can be written as $\mathbf{Y} = \mathbf{X}\boldsymbol\beta + \boldsymbol\epsilon$. The simultaneous linear equations of the normal equations can be written as $\mathbf{X}'\mathbf{X}\hat{\boldsymbol\beta} = \mathbf{X}'\mathbf{y}$. If the inverse of the matrix $\mathbf{X}'\mathbf{X}$ exists, then the solution is given by $\hat{\boldsymbol\beta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$.
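  A minimal SAS/IML sketch of this matrix formula (the data values below are made up; n = 5 observations and k = 2 predictors):

     proc iml;
        x1 = {1, 2, 3, 4, 5};               /* hypothetical predictor values                */
        x2 = {2, 1, 4, 3, 6};
        y  = {3, 4, 8, 9, 13};              /* hypothetical responses                       */
        n  = nrow(y);
        X  = j(n, 1, 1) || x1 || x2;        /* n x (k+1) design matrix, column of 1's first */
        beta_hat = inv(t(X)*X) * t(X) * y;  /* LS estimates: (X'X)^{-1} X'y                 */
        print beta_hat;
     quit;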

  18. Generalized linear model (GLM) The generalized linear model is a flexible generalization of ordinary linear regression that allows for response variables that have error distribution models other than a normal distribution. The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value. source :http://en.wikipedia.org/wiki/Generalized_linear_model
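  One common way to fit a GLM in SAS is PROC GENMOD; as a hedged sketch (the data set 'claims' and variables count, age, exposure are made up), a Poisson regression with a log link:

     /* hypothetical count-data example */
     proc genmod data=claims;
        model count = age exposure / dist=poisson link=log;   /* error distribution and link function */
     run;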

  19. Statistical Inference for Multiple Regression First, we assume that the $\epsilon_i$ are i.i.d. $N(0, \sigma^2)$. Then, in order to determine which predictor variables have statistically significant effects on the response variable, we need to test the hypotheses $H_{0j}: \beta_j = 0$ vs. $H_{aj}: \beta_j \ne 0$. If we reject $H_{0j}$, then $x_j$ is a significant predictor of y.

  20. It can be shown that each $\hat\beta_j$ is normally distributed with mean $\beta_j$ and variance $\sigma^2 v_{jj}$, where $v_{jj}$ is the jth diagonal entry (j = 0, 1, …, k) of the matrix $\mathbf{V} = (\mathbf{X}'\mathbf{X})^{-1}$. But how can we get the mean and variance?

  21. Mean: $E(\hat{\boldsymbol\beta}) = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\,E(\mathbf{Y}) = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}\boldsymbol\beta = \boldsymbol\beta$. Here, $\hat{\boldsymbol\beta}$ is unbiased, which is similar to the $\hat\beta_0$ and $\hat\beta_1$ in simple linear regression. Then we can get the following: $E(\hat\beta_j) = \beta_j$ for j = 0, 1, …, k.

  22. Variance: From the assumption $\mathrm{Cov}(\mathbf{Y}) = \sigma^2\mathbf{I}$, we can get $\mathrm{Cov}(\hat{\boldsymbol\beta}) = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\,\mathrm{Cov}(\mathbf{Y})\,\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}$. Let $v_{jj}$ be the jth diagonal entry (j = 0, 1, …, k) of the matrix $\mathbf{V} = (\mathbf{X}'\mathbf{X})^{-1}$; we can get $\mathrm{Var}(\hat\beta_j) = \sigma^2 v_{jj}$.

  23. Derive the pivotal quantity (PQ) for the inference on $\beta_j$: The unbiased estimator of the unknown error variance $\sigma^2$ is given by $s^2 = \dfrac{SSE}{n-(k+1)} = MSE$. Here, MSE is the error mean square and n-(k+1) is its degrees of freedom.

  24. Let $SE(\hat\beta_j) = s\sqrt{v_{jj}}$. It can be shown that $\dfrac{[n-(k+1)]\,s^2}{\sigma^2} \sim \chi^2_{n-(k+1)}$, and that $\hat\beta_j$ and $s^2$ are statistically independent. (Statistically independent: the occurrence of one event does not affect the outcome of the other event.) Recalling the definition of the t-distribution, we can obtain the pivotal quantity $T_j = \dfrac{\hat\beta_j - \beta_j}{SE(\hat\beta_j)} \sim t_{n-(k+1)}$.

  25. Confidence interval: A 100(1-α)% confidence interval on $\beta_j$ is given by $\hat\beta_j \pm t_{n-(k+1),\,\alpha/2}\,SE(\hat\beta_j)$. So the confidence interval for $\beta_j$ is $\left[\hat\beta_j - t_{n-(k+1),\,\alpha/2}\,s\sqrt{v_{jj}},\ \hat\beta_j + t_{n-(k+1),\,\alpha/2}\,s\sqrt{v_{jj}}\right]$, where $SE(\hat\beta_j) = s\sqrt{v_{jj}}$.

  26. Derivation of the hypothesis test for $H_{0j}: \beta_j = 0$ vs. $H_{aj}: \beta_j \ne 0$ at level α. Test statistic: $t_j = \dfrac{\hat\beta_j}{SE(\hat\beta_j)}$. Reject $H_{0j}$ if $|t_j| > t_{n-(k+1),\,\alpha/2}$.
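  A hedged SAS sketch for slides 25-26 (the data set 'mydata' and variables are made up): PROC REG prints each estimate, its standard error, t value, and p-value, and the CLB option adds the 100(1-α)% confidence limits on the β's.

     proc reg data=mydata;
        model y = x1 x2 / clb alpha=0.05;   /* CLB: confidence limits for the regression coefficients */
     run;
     quit;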

  27. Another hypothesis test, to determine whether the model is useful as a whole. $H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$, the null hypothesis, means that none of the predictors $x_j$ is related to y. $H_a$: at least one $\beta_j \ne 0$, which indicates that at least one of them is related. The test statistic is $F = \dfrac{MSR}{MSE}$, where $MSR = \dfrac{SSR}{k}$ and $MSE = \dfrac{SSE}{n-(k+1)}$.

  28. By using the formula for the coefficient of multiple determination, $r^2 = SSR/SST = 1 - SSE/SST$, we have $F = \dfrac{r^2/k}{(1-r^2)/[n-(k+1)]}$. We can see that F is an increasing function of $r^2$, and in this form it is used to test the statistical significance of $r^2$, which is equivalent to testing $H_0$. Reject $H_0$ if $F > f_{k,\,n-(k+1),\,\alpha}$.
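  A small worked illustration with made-up numbers: suppose $r^2 = 0.8$, $k = 3$, and $n = 20$. Then $F = \dfrac{0.8/3}{0.2/16} = \dfrac{0.2667}{0.0125} \approx 21.3$, which exceeds $f_{3,16,.05} \approx 3.24$, so $H_0$ is rejected at $\alpha = 0.05$.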

  29. Extra Sum of Squares Method for Testing Subsets of Parameters Consider the full model: $y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \epsilon_i$, $i = 1, \dots, n$, and the partial model: $y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{k-m} x_{i,k-m} + \epsilon_i$, $i = 1, \dots, n$, obtained by setting the last m coefficients to zero. To test whether the full model is significantly better than the partial model, we test $H_0: \beta_{k-m+1} = \cdots = \beta_k = 0$ vs. $H_a$: at least one of these $\beta_j \ne 0$.

  30. Since SST is fixed regardless of the particular model, we have $SSR_k - SSR_{k-m} = SSE_{k-m} - SSE_k$. Numerator df m: # of coefficients set to zero. Denominator df n-(k+1): the error df for the full model. So the extra sum of squares $SSE_{k-m} - SSE_k$ in the numerator represents the part of the variation in y that is accounted for by regression on the m additional predictors; it is divided by m to get an average contribution per term. The test statistic is $F = \dfrac{(SSE_{k-m} - SSE_k)/m}{SSE_k/[n-(k+1)]}$. Reject $H_0$ if $F > f_{m,\,n-(k+1),\,\alpha}$.
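  A hedged SAS sketch of the extra sum of squares (partial F) test (the data set 'mydata' and variables x1-x4 are made up; here we test whether x3 and x4 can be dropped): the TEST statement in PROC REG computes this F statistic.

     proc reg data=mydata;
        model y = x1 x2 x3 x4;            /* full model                              */
        drop34: test x3 = 0, x4 = 0;      /* partial F test of H0: beta3 = beta4 = 0 */
     run;
     quit;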

  31. ANOVA Table
  Source       df         SS     MS                      F
  Regression   k          SSR    MSR = SSR/k             F = MSR/MSE
  Error        n-(k+1)    SSE    MSE = SSE/(n-(k+1))
  Total        n-1        SST
  Links between ANOVA and the extra sum of squares method: taking m = k (i.e., the partial model contains only the intercept), the extra sum of squares F statistic reduces to the overall ANOVA F = MSR/MSE; with k = 1 it reduces to the F test of simple linear regression.

  32. Prediction of Future Observations Having fitted a multiple regression model, suppose that we want to predict the future value $Y^*$ of y for a specified vector of predictor variables $\mathbf{x}^* = (1, x_1^*, \dots, x_k^*)'$. (Notice that we have included 1 as the first component of the vector to correspond to the constant term in the model.)

  33. Prediction of Future Observations One way is to estimate the mean response $\mu^* = E(Y^*) = \mathbf{x}^{*\prime}\boldsymbol\beta$ by a confidence interval (CI). We already have the point estimate $\hat\mu^* = \mathbf{x}^{*\prime}\hat{\boldsymbol\beta} = \hat\beta_0 + \hat\beta_1 x_1^* + \cdots + \hat\beta_k x_k^*$.

  34. Prediction of Future Observations And $E(\hat\mu^*) = \mathbf{x}^{*\prime}\boldsymbol\beta = \mu^*$ and $\mathrm{Var}(\hat\mu^*) = \mathbf{x}^{*\prime}\,\mathrm{Cov}(\hat{\boldsymbol\beta})\,\mathbf{x}^* = \sigma^2\,\mathbf{x}^{*\prime}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}^*$.

  35. Prediction of Future Observations Replacing $\sigma^2$ by its estimate $s^2 = MSE$, which has n-(k+1) df, the pivotal quantity is $T = \dfrac{\hat\mu^* - \mu^*}{s\sqrt{\mathbf{x}^{*\prime}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}^*}} \sim t_{n-(k+1)}$. A level 100(1-α)% C.I. for $\mu^*$ is given by $\hat\mu^* \pm t_{n-(k+1),\,\alpha/2}\; s\sqrt{\mathbf{x}^{*\prime}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}^*}$.

  36. Prediction of Future Observations Another way is to predict $Y^*$ itself by a prediction interval (PI). We know $Y^* = \mathbf{x}^{*\prime}\boldsymbol\beta + \epsilon^*$. The prediction error, $Y^* - \hat Y^*$, is the difference between two independent random variables, with mean $E(Y^* - \hat Y^*) = 0$ and variance $\mathrm{Var}(Y^* - \hat Y^*) = \sigma^2\left[1 + \mathbf{x}^{*\prime}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}^*\right]$.

  37. Prediction of Future Observations Replacing $\sigma^2$ by its estimate $s^2 = MSE$, which has n-(k+1) df, the pivotal quantity is $T = \dfrac{Y^* - \hat Y^*}{s\sqrt{1 + \mathbf{x}^{*\prime}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}^*}} \sim t_{n-(k+1)}$. A level 100(1-α)% P.I. for $Y^*$ is given by $\hat y^* \pm t_{n-(k+1),\,\alpha/2}\; s\sqrt{1 + \mathbf{x}^{*\prime}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}^*}$.
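  A hedged SAS sketch for slides 32-37 (the data set 'mydata' and variable names are made up): the CLM option prints confidence limits for the mean response and CLI prints prediction limits for an individual future observation; the OUTPUT statement saves both sets of limits.

     proc reg data=mydata;
        model y = x1 x2 / clm cli;         /* CI for the mean response, PI for a new observation */
        output out=pred p=yhat
               lclm=lo_mean uclm=hi_mean   /* confidence limits for the mean response   */
               lcl=lo_pred  ucl=hi_pred;   /* prediction limits for a new observation   */
     run;
     quit;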

  38. Residual Analysis Recall that $\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol\beta} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} = \mathbf{H}\mathbf{y}$, where $\mathbf{H} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$ is called the hat matrix. The residual vector is $\mathbf{e} = \mathbf{y} - \hat{\mathbf{y}} = (\mathbf{I} - \mathbf{H})\mathbf{y}$.

  39. Residual Analysis Standardized residuals are given by $e_i^* = \dfrac{e_i}{s\sqrt{1 - h_{ii}}}$. Here $h_{ii}$ is the ith diagonal element of the hat matrix H. Large $|e_i^*|$ values indicate outlier observations.

  40. Residual Analysis Moreover, we conclude the ith observation is influential if its leverage $h_{ii}$ is large, e.g., if $h_{ii} > \dfrac{2(k+1)}{n}$, i.e., more than twice the average of all the $h_{ii}$, which equals $(k+1)/n$.
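  A hedged SAS sketch of these diagnostics (the data set 'mydata' and variables are made up): the OUTPUT statement stores residuals, standardized (internally studentized) residuals, and leverages, and the INFLUENCE option prints additional influence statistics.

     proc reg data=mydata;
        model y = x1 x2 / influence;                           /* prints leverages and influence measures */
        output out=diag r=resid student=std_resid h=leverage;  /* e_i, e_i/(s*sqrt(1-h_ii)), and h_ii     */
     run;
     quit;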

  41. Data Transformation Transformations of the variables (both y and the x's) are often necessary to satisfy the assumptions of linearity, normality, and constant variance. Many seemingly nonlinear models can be written as a multiple linear regression model after making a suitable transformation. Example: the multiplicative power model $y = \beta_0\, x_1^{\beta_1} x_2^{\beta_2} \cdots x_k^{\beta_k}\,\epsilon$.

  42. Data Transformation We can do the transformation by taking ln on both sides. Then we have $\ln y = \ln\beta_0 + \beta_1 \ln x_1 + \cdots + \beta_k \ln x_k + \ln\epsilon$. Let $y' = \ln y$, $\beta_0' = \ln\beta_0$, $x_j' = \ln x_j$, and $\epsilon' = \ln\epsilon$. We now have $y' = \beta_0' + \beta_1 x_1' + \cdots + \beta_k x_k' + \epsilon'$, which is a multiple linear regression model.
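  A hedged SAS sketch of this log-log transformation (the data set 'raw' and variables y, x1, x2 are made up):

     data trans;
        set raw;
        log_y  = log(y);     /* log() in SAS is the natural logarithm */
        log_x1 = log(x1);
        log_x2 = log(x2);
     run;
     proc reg data=trans;
        model log_y = log_x1 log_x2;   /* linear model on the transformed scale */
     run;
     quit;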

  43. Code, Tables, and Graphs

  44. Voting Example

  45. Voting Example • Setup: Data on individual state voting percentages for the winners of the last twelve (15) U.S. presidential elections. y = New York voting percentage (‘ny’), x1 = California voting percentage (‘ca’), x2 = South Carolina voting percentage (‘sc’), x3 = Wisconsin voting percentage (‘wi’) • Goal: See if there is any positive correlation between NY's and California's voting patterns (two traditionally Democratic states), or a negative correlation between NY's and South Carolina's (one Democratic, one Republican state). • Note: Wisconsin was included as a variable although its traditional stance is (seemingly) more ambiguous.
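  A hedged sketch of the fit described above (the data set name 'voting' is an assumption; the variable names ny, ca, sc, wi follow the slide):

     proc reg data=voting;
        model ny = ca sc wi;   /* NY percentage regressed on CA, SC, and WI percentages */
     run;
     quit;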

  46. Source: <http://www.presidency.ucsb.edu/elections.php>
