

  1. Chapter 11: Multiple Linear Regression. Introduction, Theory, SAS, Summary. By: Airelle, Bochao, Chelsea, Menglin, Reezan, Tim, Wu, Xinyi, Yuming

  2. Introduction

  3. Regression Analysis in the Making •  The earliest form of regression analysis was the method of least squares, published by Legendre in 1805 in the paper “Nouvelles méthodes pour la détermination des orbites des comètes.”1 •  Legendre used least squares to study the orbits of comets around the Sun • “Sur la méthode des moindres quarrés” (“On the method of least squares”) appears as an appendix to that paper1 Adrien-Marie Legendre2 1752-1833 1Firmin Didot, Paris, 1805. “Nouvelles méthodes pour la détermination des orbites des comètes”; “Sur la méthode des moindres quarrés” appears as an appendix. 2Picture from <http://www.superstock.com/stock-photos-images/1899-40028>

  4. Regression Analysis in the Making •  Gauss also developed the method of least squares, for the purpose of astronomical observations •  He presented it in 1809 in Theoria Motus Corporum Coelestium in Sectionibus Conicis Solem Ambientium1 and later expanded the theory in Theoria combinationis observationum erroribus minimis obnoxiae (Theory of the combination of observations least subject to errors). Johann Carl Friedrich Gauss2 1777-1855 Shown here on the 10 Deutsche Mark banknote! 1C.F. Gauss. Theoria Motus Corporum Coelestium in Sectionibus Conicis Solem Ambientium. (1809) 2Picture from <http://www.pictobrick.de/en/gallery_gauss.shtml>

  5. Why “regression”? Coined in the 19th century by Sir Francis Galton1 Sir Francis Galton2 1822-1911 1”Regression analysis." Wikipedia, The Free Encyclopedia. Wikimedia Foundation, Inc. 22 July 2004. Web. 20 Nov. 2013. 2Picture from <http://hu.wikipedia.org/wiki/Szineszt%C3%A9zia>

  6. Why “regression”? The term was used to describe how the heights of the descendants of tall ancestors tend to “regress” down toward the average height of the population; this is also known as “regression towards the mean.” 1”Regression analysis." Wikipedia, The Free Encyclopedia. Wikimedia Foundation, Inc. 22 July 2004. Web. 20 Nov. 2013. 2Picture from <http://en.wikipedia.org/wiki/File:Miles_Park_Romney_family.jpg>

  7. Fun Fact Before 1970, one run of linear regression could take up to 24 hours on an electromechanical desk calculator 1”Regression analysis." Wikipedia, The Free Encyclopedia. Wikimedia Foundation, Inc. 22 July 2004. Web. 20 Nov. 2013. 2Picture from <http://www.technikum29.de/en/computer/electro-mechanical>

  8. Uses of linear regression • Making predictions: fit the model using linear regression on an observed set of data and outcomes, then predict the next, unknown outcome • Correlating data: determine the relationship between two sets of data (where one is not necessarily “causal” to the other)
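  A hedged SAS sketch of the prediction use case (the data sets 'known' and 'new' and the variable names y, x1, x2 are made up for illustration): rows whose response y is missing are excluded from the fit but still receive predicted values in the OUTPUT data set.

     /* 'known' holds observed data; 'new' holds the same predictors with y left missing */
     data all;
        set known new;
     run;
     proc reg data=all;
        model y = x1 x2;              /* fit uses only rows with non-missing y      */
        output out=scored p=y_pred;   /* predicted values for all rows, old and new */
     run;
     quit;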

  9. Theory

  10. Multiple Linear Regression • Review simple linear regression, where we have only one predictor variable: $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, $i = 1, \dots, n$ • What if there is more than one predictor? • Multiple linear regression model • A generalization of simple linear regression that considers more than one independent variable: $y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \epsilon_i$, $i = 1, \dots, n$

  11. We fit a model of the form: $y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \epsilon_i$, $i = 1, \dots, n$ • $k \ge 2$ predictor variables • $\beta_0, \beta_1, \dots, \beta_k$: $k+1$ unknown parameters • $\epsilon_i$: a random error • Note: the model is called “linear” because it is linear in the $\beta$'s, not necessarily in the x's. For example: $y_i$ may be the salary of the ith person in the sample, $x_{i1}$ the years of experience, and $x_{i2}$ the years of education. Graph 1. Regression plane for the model with 2 predictor variables (source of Graph 1: http://reliawiki.org/index.php/Multiple_Linear_Regression_Analysis)

  12. Here, we assume that the random errors $\epsilon_i$ are independent $N(0, \sigma^2)$ r.v.'s. • The $Y_i$ are then independent r.v.'s with $E(Y_i) = \mu_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}$ and $\mathrm{Var}(Y_i) = \sigma^2$, $i = 1, 2, \dots, n$

  13. Fitting the Multiple Regression Model • Least Squares (LS) Fit: to find estimates $\hat\beta_0, \hat\beta_1, \dots, \hat\beta_k$ of the unknown parameters, we minimize $Q = \sum_{i=1}^{n} \left[ y_i - (\beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}) \right]^2$. We set the first partial derivatives of Q with respect to $\beta_0, \beta_1, \dots, \beta_k$ equal to zero: $\partial Q / \partial \beta_j = 0$, $j = 0, 1, \dots, k$

  14. Simplification leads to the following normal equations: $\sum_{i=1}^{n} y_i = n\hat\beta_0 + \hat\beta_1 \sum_{i} x_{i1} + \cdots + \hat\beta_k \sum_{i} x_{ik}$ and $\sum_{i=1}^{n} x_{ij} y_i = \hat\beta_0 \sum_{i} x_{ij} + \hat\beta_1 \sum_{i} x_{ij} x_{i1} + \cdots + \hat\beta_k \sum_{i} x_{ij} x_{ik}$, $j = 1, 2, \dots, k$. The resulting solutions are the least squares (LS) estimates of $\beta_0, \beta_1, \dots, \beta_k$ and are denoted by $\hat\beta_0, \hat\beta_1, \dots, \hat\beta_k$, respectively.

  15. Goodness of Fit of the Model • We use the residuals, defined by $e_i = y_i - \hat y_i$, $i = 1, 2, \dots, n$ • where the $\hat y_i$ are the fitted values: $\hat y_i = \hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_k x_{ik}$, $i = 1, 2, \dots, n$ • As an overall measure of the goodness of fit, we can use the error sum of squares $SSE = \sum_{i=1}^{n} e_i^2$ (which is the minimum value of Q). We compare this SSE to the total sum of squares $SST = \sum_{i=1}^{n} (y_i - \bar y)^2$. Define the regression sum of squares by SSR = SST - SSE. The ratio of SSR to SST is called the coefficient of multiple determination: $r^2 = SSR/SST = 1 - SSE/SST$. $r^2$ ranges between 0 and 1, with values closer to 1 representing better fits. Note that adding more predictor variables to a model generally increases $r^2$. The positive square root of $r^2$ is the multiple correlation coefficient r.
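  A hedged SAS sketch of where these quantities appear in standard output (the data set 'mydata' and variables y, x1, x2 are made up): the analysis-of-variance table printed by PROC REG lists the Model (SSR), Error (SSE), and Corrected Total (SST) sums of squares, along with R-Square = SSR/SST.

     /* 'mydata', y, x1, x2 are hypothetical names */
     proc reg data=mydata;
        model y = x1 x2;   /* ANOVA table: Model SS = SSR, Error SS = SSE, Corrected Total SS = SST */
     run;
     quit;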

  16. Multiple Regression Model in Matrix Notation • Let $\mathbf{Y} = (Y_1, \dots, Y_n)'$, $\mathbf{y} = (y_1, \dots, y_n)'$, and $\boldsymbol\epsilon = (\epsilon_1, \dots, \epsilon_n)'$ be the $n \times 1$ vectors of the r.v.'s, their observed values, and the random errors, respectively. Let $\mathbf{X}$ be the $n \times (k+1)$ matrix of the values of the predictor variables, whose ith row is $(1, x_{i1}, \dots, x_{ik})$.

  17. Let $\boldsymbol\beta = (\beta_0, \beta_1, \dots, \beta_k)'$ and $\hat{\boldsymbol\beta} = (\hat\beta_0, \hat\beta_1, \dots, \hat\beta_k)'$ be the $(k+1) \times 1$ vectors of unknown parameters and their LS estimates, respectively. Then the model can be written as $\mathbf{Y} = \mathbf{X}\boldsymbol\beta + \boldsymbol\epsilon$. The simultaneous linear equations of the normal equations can be written as $\mathbf{X}'\mathbf{X}\hat{\boldsymbol\beta} = \mathbf{X}'\mathbf{y}$. If the inverse of the matrix $\mathbf{X}'\mathbf{X}$ exists, then the solution is given by $\hat{\boldsymbol\beta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$.
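  A minimal SAS/IML sketch of this matrix formula (the data values below are made up; n = 5 observations and k = 2 predictors):

     proc iml;
        x1 = {1, 2, 3, 4, 5};               /* hypothetical predictor values                */
        x2 = {2, 1, 4, 3, 6};
        y  = {3, 4, 8, 9, 13};              /* hypothetical responses                       */
        n  = nrow(y);
        X  = j(n, 1, 1) || x1 || x2;        /* n x (k+1) design matrix, column of 1's first */
        beta_hat = inv(t(X)*X) * t(X) * y;  /* LS estimates: (X'X)^{-1} X'y                 */
        print beta_hat;
     quit;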

  18. Generalized linear model (GLM) The generalized linear model is a flexible generalization of ordinary linear regression that allows for response variables that have error distribution models other than a normal distribution. The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value. source :http://en.wikipedia.org/wiki/Generalized_linear_model
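  One common way to fit a GLM in SAS is PROC GENMOD; as a hedged sketch (the data set 'claims' and variables count, age, exposure are made up), a Poisson regression with a log link:

     /* hypothetical count-data example */
     proc genmod data=claims;
        model count = age exposure / dist=poisson link=log;   /* error distribution and link function */
     run;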

  19. Statistical Inference for Multiple Regression First, we assume that the $\epsilon_i$ are i.i.d. $N(0, \sigma^2)$. Then, in order to determine which predictor variables have statistically significant effects on the response variable, we need to test the hypotheses $H_{0j}: \beta_j = 0$ vs. $H_{aj}: \beta_j \ne 0$. If we reject $H_{0j}$, then $x_j$ is a significant predictor of y.

  20. It can be shown that each $\hat\beta_j$ is normally distributed with mean $\beta_j$ and variance $\sigma^2 v_{jj}$, where $v_{jj}$ is the jth diagonal entry (j = 0, 1, …, k) of the matrix $\mathbf{V} = (\mathbf{X}'\mathbf{X})^{-1}$. But how can we get the mean and variance?

  21. Mean: $E(\hat{\boldsymbol\beta}) = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\,E(\mathbf{Y}) = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}\boldsymbol\beta = \boldsymbol\beta$. Here, $\hat{\boldsymbol\beta}$ is unbiased, which is similar to the $\hat\beta_0$ and $\hat\beta_1$ in simple linear regression. Then we can get the following: $E(\hat\beta_j) = \beta_j$ for j = 0, 1, …, k.

  22. Variance: From the assumption $\mathrm{Cov}(\mathbf{Y}) = \sigma^2\mathbf{I}$, we can get $\mathrm{Cov}(\hat{\boldsymbol\beta}) = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\,\mathrm{Cov}(\mathbf{Y})\,\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}$. Let $v_{jj}$ be the jth diagonal entry (j = 0, 1, …, k) of the matrix $\mathbf{V} = (\mathbf{X}'\mathbf{X})^{-1}$; we can get $\mathrm{Var}(\hat\beta_j) = \sigma^2 v_{jj}$.

  23. Derive the pivotal quantity (PQ) for the inference on $\beta_j$: The unbiased estimator of the unknown error variance $\sigma^2$ is given by $s^2 = \dfrac{SSE}{n-(k+1)} = MSE$. Here, MSE is the error mean square and n-(k+1) is its degrees of freedom.

  24. Let $SE(\hat\beta_j) = s\sqrt{v_{jj}}$. It can be shown that $\dfrac{[n-(k+1)]\,s^2}{\sigma^2} \sim \chi^2_{n-(k+1)}$, and that $\hat\beta_j$ and $s^2$ are statistically independent. (Statistically independent: the occurrence of one event does not affect the outcome of the other event.) Recalling the definition of the t-distribution, we can obtain the pivotal quantity $T_j = \dfrac{\hat\beta_j - \beta_j}{SE(\hat\beta_j)} \sim t_{n-(k+1)}$.

  25. Confidence interval: A 100(1-α)% confidence interval on $\beta_j$ is given by $\hat\beta_j \pm t_{n-(k+1),\,\alpha/2}\,SE(\hat\beta_j)$. So the confidence interval for $\beta_j$ is $\left[\hat\beta_j - t_{n-(k+1),\,\alpha/2}\,s\sqrt{v_{jj}},\ \hat\beta_j + t_{n-(k+1),\,\alpha/2}\,s\sqrt{v_{jj}}\right]$, where $SE(\hat\beta_j) = s\sqrt{v_{jj}}$.

  26. Derivation of the hypothesis test for $H_{0j}: \beta_j = 0$ vs. $H_{aj}: \beta_j \ne 0$ at level α. Test statistic: $t_j = \dfrac{\hat\beta_j}{SE(\hat\beta_j)}$. Reject $H_{0j}$ if $|t_j| > t_{n-(k+1),\,\alpha/2}$.
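  A hedged SAS sketch for slides 25-26 (the data set 'mydata' and variables are made up): PROC REG prints each estimate, its standard error, t value, and p-value, and the CLB option adds the 100(1-α)% confidence limits on the β's.

     proc reg data=mydata;
        model y = x1 x2 / clb alpha=0.05;   /* CLB: confidence limits for the regression coefficients */
     run;
     quit;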

  27. Another hypothesis test, to determine whether the model is useful as a whole. $H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$, the null hypothesis, means that none of the predictors $x_j$ is related to y. $H_a$: at least one $\beta_j \ne 0$, which indicates that at least one of them is related. The test statistic is $F = \dfrac{MSR}{MSE}$, where $MSR = \dfrac{SSR}{k}$ and $MSE = \dfrac{SSE}{n-(k+1)}$.

  28. By using the formula for the coefficient of multiple determination, $r^2 = SSR/SST = 1 - SSE/SST$, we have $F = \dfrac{r^2/k}{(1-r^2)/[n-(k+1)]}$. We can see that F is an increasing function of $r^2$, and in this form it is used to test the statistical significance of $r^2$, which is equivalent to testing $H_0$. Reject $H_0$ if $F > f_{k,\,n-(k+1),\,\alpha}$.
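  A small worked illustration with made-up numbers: suppose $r^2 = 0.8$, $k = 3$, and $n = 20$. Then $F = \dfrac{0.8/3}{0.2/16} = \dfrac{0.2667}{0.0125} \approx 21.3$, which exceeds $f_{3,16,.05} \approx 3.24$, so $H_0$ is rejected at $\alpha = 0.05$.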

  29. Extra Sum of Squares Method for Testing Subsets of Parameters Consider the full model: $y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \epsilon_i$, $i = 1, \dots, n$, and the partial model: $y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{k-m} x_{i,k-m} + \epsilon_i$, $i = 1, \dots, n$, obtained by setting the last m coefficients to zero. To test whether the full model is significantly better than the partial model, we test $H_0: \beta_{k-m+1} = \cdots = \beta_k = 0$ vs. $H_a$: at least one of these $\beta_j \ne 0$.

  30. Since SST is fixed regardless of the particular model, we have $SSR_k - SSR_{k-m} = SSE_{k-m} - SSE_k$. Numerator df m: # of coefficients set to zero. Denominator df n-(k+1): the error df for the full model. So the extra sum of squares $SSE_{k-m} - SSE_k$ in the numerator represents the part of the variation in y that is accounted for by regression on the m additional predictors; it is divided by m to get an average contribution per term. The test statistic is $F = \dfrac{(SSE_{k-m} - SSE_k)/m}{SSE_k/[n-(k+1)]}$. Reject $H_0$ if $F > f_{m,\,n-(k+1),\,\alpha}$.
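  A hedged SAS sketch of the extra sum of squares (partial F) test (the data set 'mydata' and variables x1-x4 are made up; here we test whether x3 and x4 can be dropped): the TEST statement in PROC REG computes this F statistic.

     proc reg data=mydata;
        model y = x1 x2 x3 x4;            /* full model                              */
        drop34: test x3 = 0, x4 = 0;      /* partial F test of H0: beta3 = beta4 = 0 */
     run;
     quit;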

  31. ANOVA Table
  Source       df         SS     MS                      F
  Regression   k          SSR    MSR = SSR/k             F = MSR/MSE
  Error        n-(k+1)    SSE    MSE = SSE/(n-(k+1))
  Total        n-1        SST
  Links between ANOVA and the extra sum of squares method: taking m = k (i.e., the partial model contains only the intercept), the extra sum of squares F statistic reduces to the overall ANOVA F = MSR/MSE; with k = 1 it reduces to the F test of simple linear regression.

  32. Prediction of Future Observations Having fitted a multiple regression model, suppose that we want to predict the future value $Y^*$ of y for a specified vector of predictor variables $\mathbf{x}^* = (1, x_1^*, \dots, x_k^*)'$. (Notice that we have included 1 as the first component of the vector to correspond to the constant term in the model.)

  33. Prediction of Future Observations One way is to estimate the mean response $\mu^* = E(Y^*) = \mathbf{x}^{*\prime}\boldsymbol\beta$ by a confidence interval (CI). We already have the point estimate $\hat\mu^* = \mathbf{x}^{*\prime}\hat{\boldsymbol\beta} = \hat\beta_0 + \hat\beta_1 x_1^* + \cdots + \hat\beta_k x_k^*$.

  34. Prediction of Future Observations And $E(\hat\mu^*) = \mathbf{x}^{*\prime}\boldsymbol\beta = \mu^*$ and $\mathrm{Var}(\hat\mu^*) = \mathbf{x}^{*\prime}\,\mathrm{Cov}(\hat{\boldsymbol\beta})\,\mathbf{x}^* = \sigma^2\,\mathbf{x}^{*\prime}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}^*$.

  35. Prediction of Future Observations Replacing $\sigma^2$ by its estimate $s^2 = MSE$, which has n-(k+1) df, the pivotal quantity is $T = \dfrac{\hat\mu^* - \mu^*}{s\sqrt{\mathbf{x}^{*\prime}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}^*}} \sim t_{n-(k+1)}$. A level 100(1-α)% C.I. for $\mu^*$ is given by $\hat\mu^* \pm t_{n-(k+1),\,\alpha/2}\; s\sqrt{\mathbf{x}^{*\prime}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}^*}$.

  36. Prediction of Future Observations Another way is to predict $Y^*$ itself by a prediction interval (PI). We know $Y^* = \mathbf{x}^{*\prime}\boldsymbol\beta + \epsilon^*$. The prediction error, $Y^* - \hat Y^*$, is the difference between two independent random variables, with mean $E(Y^* - \hat Y^*) = 0$ and variance $\mathrm{Var}(Y^* - \hat Y^*) = \sigma^2\left[1 + \mathbf{x}^{*\prime}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}^*\right]$.

  37. Prediction of Future Observations Replacing $\sigma^2$ by its estimate $s^2 = MSE$, which has n-(k+1) df, the pivotal quantity is $T = \dfrac{Y^* - \hat Y^*}{s\sqrt{1 + \mathbf{x}^{*\prime}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}^*}} \sim t_{n-(k+1)}$. A level 100(1-α)% P.I. for $Y^*$ is given by $\hat y^* \pm t_{n-(k+1),\,\alpha/2}\; s\sqrt{1 + \mathbf{x}^{*\prime}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}^*}$.
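  A hedged SAS sketch for slides 32-37 (the data set 'mydata' and variable names are made up): the CLM option prints confidence limits for the mean response and CLI prints prediction limits for an individual future observation; the OUTPUT statement saves both sets of limits.

     proc reg data=mydata;
        model y = x1 x2 / clm cli;         /* CI for the mean response, PI for a new observation */
        output out=pred p=yhat
               lclm=lo_mean uclm=hi_mean   /* confidence limits for the mean response   */
               lcl=lo_pred  ucl=hi_pred;   /* prediction limits for a new observation   */
     run;
     quit;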

  38. Residual Analysis Recall that $\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol\beta} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} = \mathbf{H}\mathbf{y}$, where $\mathbf{H} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$ is called the hat matrix. The residual vector is $\mathbf{e} = \mathbf{y} - \hat{\mathbf{y}} = (\mathbf{I} - \mathbf{H})\mathbf{y}$.

  39. Residual Analysis Standardized residuals are given by $e_i^* = \dfrac{e_i}{s\sqrt{1 - h_{ii}}}$. Here $h_{ii}$ is the ith diagonal element of the hat matrix H. Large $|e_i^*|$ values indicate outlier observations.

  40. Residual Analysis Moreover, we conclude the ith observation is influential if its leverage $h_{ii}$ is large, e.g., if $h_{ii} > \dfrac{2(k+1)}{n}$, i.e., more than twice the average of all the $h_{ii}$, which equals $(k+1)/n$.
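  A hedged SAS sketch of these diagnostics (the data set 'mydata' and variables are made up): the OUTPUT statement stores residuals, standardized (internally studentized) residuals, and leverages, and the INFLUENCE option prints additional influence statistics.

     proc reg data=mydata;
        model y = x1 x2 / influence;                           /* prints leverages and influence measures */
        output out=diag r=resid student=std_resid h=leverage;  /* e_i, e_i/(s*sqrt(1-h_ii)), and h_ii     */
     run;
     quit;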

  41. Data Transformation Transformations of the variables (both y and the x's) are often necessary to satisfy the assumptions of linearity, normality, and constant variance. Many seemingly nonlinear models can be written as a multiple linear regression model after making a suitable transformation. Example: the multiplicative power model $y = \beta_0\, x_1^{\beta_1} x_2^{\beta_2} \cdots x_k^{\beta_k}\,\epsilon$.

  42. Data Transformation We can do the transformation by taking ln on both sides. Then we have $\ln y = \ln\beta_0 + \beta_1 \ln x_1 + \cdots + \beta_k \ln x_k + \ln\epsilon$. Let $y' = \ln y$, $\beta_0' = \ln\beta_0$, $x_j' = \ln x_j$, and $\epsilon' = \ln\epsilon$. We now have $y' = \beta_0' + \beta_1 x_1' + \cdots + \beta_k x_k' + \epsilon'$, which is a multiple linear regression model.
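  A hedged SAS sketch of this log-log transformation (the data set 'raw' and variables y, x1, x2 are made up):

     data trans;
        set raw;
        log_y  = log(y);     /* log() in SAS is the natural logarithm */
        log_x1 = log(x1);
        log_x2 = log(x2);
     run;
     proc reg data=trans;
        model log_y = log_x1 log_x2;   /* linear model on the transformed scale */
     run;
     quit;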

  43. Code, Tables, and Graphs

  44. Voting Example

  45. Voting Example • Setup: Data on individual state voting percentages for the winners of the last twelve (15) U.S. presidential elections. y = New York voting percentage (‘ny’), x1 = California voting percentage (‘ca’), x2 = South Carolina voting percentage (‘sc’), x3 = Wisconsin voting percentage (‘wi’) • Goal: See if there is any positive correlation between NY's and California's voting patterns (two traditionally Democratic states), or a negative correlation between NY's and South Carolina's (one Democratic, one Republican state). • Note: Wisconsin was included as a variable although its traditional stance is (seemingly) more ambiguous.
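  A hedged sketch of the fit described above (the data set name 'voting' is an assumption; the variable names ny, ca, sc, wi follow the slide):

     proc reg data=voting;
        model ny = ca sc wi;   /* NY percentage regressed on CA, SC, and WI percentages */
     run;
     quit;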

  46. Source: <http://www.presidency.ucsb.edu/elections.php>
