
Chapter 3 Section 3 Multicollinearity


Presentation Transcript


  1. Chapter 3 Section 3 Multicollinearity

  2. 3.1 The Background and Causes of Multicollinearity Usually, it is rare for explanatory variables to be completely unrelated to one another, especially when studying an economic issue that involves many independent variables. In that situation it is very difficult to find a group of independent variables that are unrelated to each other yet all have a significant influence on the dependent variable. Objectively speaking, when an economic phenomenon involves multiple factors, most of those factors will show some degree of correlation. As long as the correlation between them is only weak, we generally consider that the data meet the requirements of the linear regression model.

  3. Example 1 • To study the state of consumption in China, we might consider the average wage of workers, the per capita income of peasants, bank interest rates, the national retail price index, bond interest rates, currency in circulation, savings, pre-spending, etc. We would find that there are strong correlations among these variables.

  4. Example 2 • To establish a regression model for food production in one region, take food production as the dependent variable Y, fertilizer usage as independent variable x1, irrigated area as independent variable x2, and financial inputs to agriculture as independent variable x3. • We would then find that agricultural inputs have a strong correlation with both fertilizer usage and irrigated area.

  5. 3.2 The Impact of Multicollinearity on the Linear Regression Model • Suppose the linear regression model suffers from multicollinearity. This means that for the design matrix X there exist constants c0, c1, c2, …, cp, not all equal to 0, such that c0 + c1xi1 + c2xi2 + … + cpxip = 0, i = 1, 2, …, n. • This shows that there is complete collinearity between the variables; it also implies that one or more explanatory variables can be written as an exact linear combination, i.e. a precise linear function, of the other explanatory variables.

  6. At this point the design matrix X is no longer of full column rank; its rank is less than p + 1, so the matrix X'X is singular and its inverse does not exist. Therefore the least squares estimates of the regression parameters cannot be obtained. • From the following discussion of the two-variable regression, we can see that as the correlation between the independent variables increases, the variance of the estimators increases rapidly.
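
As a small illustration of the rank argument (a minimal sketch with made-up data, not from the text), the snippet below builds a design matrix whose last column is an exact linear combination of two others, then checks its rank and the (near-)singularity of X'X:

```python
import numpy as np

# Hypothetical data: x3 is an exact linear combination of x1 and x2,
# so the design matrix is not of full column rank.
rng = np.random.default_rng(0)
n = 20
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 2.0 * x1 - 3.0 * x2                       # perfect collinearity
X = np.column_stack([np.ones(n), x1, x2, x3])

print("rank of X:", np.linalg.matrix_rank(X))  # 3, although X has 4 columns
XtX = X.T @ X
print("det(X'X) ~", np.linalg.det(XtX))        # essentially 0: X'X is singular
print("cond(X'X) ~", np.linalg.cond(XtX))      # huge: the inverse is meaningless
```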

  7. We establish a model relating Y to the independent variables x1, x2. • Suppose that y, x1 and x2 have already been centered, so the constant term in the model is zero. • The equation is y = β1x1 + β2x2 + ε. • Writing L11 = Σxi1², L22 = Σxi2² and L12 = Σxi1xi2, the correlation coefficient between x1 and x2 is r12 = L12 / √(L11L22).

  8. The covariance matrix of (β̂1, β̂2)' is Cov(β̂) = σ²(X'X)⁻¹. So we get Var(β̂1) = σ² / [L11(1 − r12²)] and Var(β̂2) = σ² / [L22(1 − r12²)]. In conclusion, as the correlation between the independent variables increases (|r12| → 1), the variances of the estimators increase rapidly.
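
This formula can be checked numerically. The short sketch below plugs illustrative values (σ² = 1 and L11 = 1 are assumptions, not from the text) into Var(β̂1) = σ²/[L11(1 − r12²)] to show how quickly the variance grows as r12 approaches 1:

```python
import numpy as np

# Numeric check of Var(beta1_hat) = sigma^2 / (L11 * (1 - r12^2))
# with illustrative values sigma^2 = 1 and L11 = 1 (both assumptions).
sigma2, L11 = 1.0, 1.0
for r12 in [0.0, 0.5, 0.9, 0.99, 0.999]:
    var_beta1 = sigma2 / (L11 * (1.0 - r12 ** 2))
    print(f"r12 = {r12:>6}: Var(beta1_hat) = {var_beta1:10.2f}")
# The variance explodes as |r12| approaches 1.
```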

  9. 3.3 The Diagnosis of Multicollinearity • Under normal circumstances, when there is a strong linear relationship between the independent variables in a regression model, we often observe that the overall test of the regression equation is highly significant, yet some independent variables that are highly correlated with the dependent variable y cannot pass their individual significance tests, and some regression coefficients even have signs inconsistent with the actual situation. In such cases we suspect that multicollinearity exists among the independent variables.

  10. 3.3.1 Correlation Coefficient Measurement Method • The simple correlation coefficient r between two variables is an important indicator of linear correlation, so it can also be used to measure the collinearity between independent variables. • When the squared correlation coefficient r² between two independent variables is above 0.9, the collinearity problem between the independent variables is considered serious. • If the squared correlation coefficient r² between two explanatory variables is greater than the coefficient of determination R² between the dependent variable and the independent variables, the collinearity among the independent variables is harmful.
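
A minimal sketch of this check, using made-up variable names and data: it computes r12² between two independent variables and the coefficient of determination R² of the regression of y on them, so the two criteria described above can be compared.

```python
import numpy as np

# Illustrative data: x2 is nearly a linear function of x1, plus noise.
rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.1 * rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

# Squared simple correlation between the two independent variables.
r12 = np.corrcoef(x1, x2)[0, 1]
print("r12^2 =", r12 ** 2)

# Coefficient of determination R^2 of the regression of y on x1, x2.
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
R2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
print("R^2 =", R2)
# If r12^2 > 0.9 (or r12^2 > R^2), collinearity is suspected.
```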

  11. 3.3.2 The Method of Variance Inflation Factors • We regress each Xj on the other independent variables X1, …, Xj−1, Xj+1, …, Xk and calculate the coefficient of determination Rj² of this auxiliary regression. • We then construct the indicator VIFj = 1 / (1 − Rj²), called the variance inflation factor. • If the independent variable Xj is unrelated to the rest of the independent variables, the coefficient of determination of the auxiliary regression equals 0 and the variance inflation factor equals 1, indicating that there is no multicollinearity among the independent variables. On the contrary, a large VIF indicates multicollinearity among the variables. In general, if VIF is greater than 5, there is a serious multicollinearity problem among the independent variables.
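
The variance inflation factors can be computed directly from this definition, one auxiliary regression per variable. The sketch below uses only NumPy and made-up data; the function name vif is illustrative, not from the text.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X
    (columns are the independent variables, without an intercept column)."""
    n, k = X.shape
    vifs = []
    for j in range(k):
        xj = X[:, j]
        # Auxiliary regression of X_j on the remaining variables (plus intercept).
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, xj, rcond=None)
        resid = xj - others @ beta
        r2_j = 1.0 - resid @ resid / np.sum((xj - xj.mean()) ** 2)
        vifs.append(1.0 / (1.0 - r2_j))
    return np.array(vifs)

# Example with made-up, strongly correlated columns.
rng = np.random.default_rng(2)
x1 = rng.normal(size=50)
x2 = 0.9 * x1 + 0.2 * rng.normal(size=50)
x3 = rng.normal(size=50)
print(vif(np.column_stack([x1, x2, x3])))   # x1 and x2 get large VIFs
```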

  12. 3.3.3 Characteristic Root Test • By the properties of determinants, the determinant of a matrix equals the product of its characteristic roots. Thus, when the determinant |X'X| ≈ 0, the matrix X'X has at least one characteristic root that is approximately zero. • Conversely, it can be shown that when the matrix X'X has at least one characteristic root approximately equal to zero, there must be multicollinearity between the column vectors of X. (The proof can be found in "Applied Regression Analysis", P161.)

  13. Suppose the largest characteristic root of X'X is λm. Then ki = λm / λi, i = 0, 1, 2, …, p, is called the condition number of X'X. • The condition number measures the spread of the eigenvalues of the matrix, so it can be used to judge both the existence and the severity of multicollinearity. • Generally, if 0 < k < 10, the design matrix X has no serious multicollinearity; • if 10 ≤ k < 100, there is strong multicollinearity in X; • if k ≥ 100, there is severe multicollinearity among the independent variables.
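
The sketch below computes the eigenvalues of X'X and the condition numbers ki = λm/λi for made-up, partly collinear data; the data are illustrative and the thresholds in the comment follow the rules above.

```python
import numpy as np

# Eigenvalues and condition numbers k_i = lambda_max / lambda_i of X'X.
rng = np.random.default_rng(3)
n = 50
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.2 * rng.normal(size=n)   # strongly correlated with x1
X = np.column_stack([np.ones(n), x1, x2])

eigvals = np.linalg.eigvalsh(X.T @ X)      # X'X is symmetric
k = eigvals.max() / eigvals                # condition numbers
print("eigenvalues:", eigvals)
print("condition numbers:", k)
# Some k_i >= 100 signals severe multicollinearity; 10 <= k_i < 100 is strong.
```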

  14. 3.4 Methods for Eliminating Multicollinearity • 3.4.1 Increasing the Sample Size • If there are too few data points available to establish a regression model, it is easy for a multicollinearity problem to arise among the independent variables. • For example, in a regression model with two independent variables x1, x2 that have been centered, the variances of the estimators are Var(β̂1) = σ² / [L11(1 − r12²)] and Var(β̂2) = σ² / [L22(1 − r12²)], • where L11 = Σxi1², L22 = Σxi2², and r12 is the correlation coefficient of x1 and x2.

  15. From the above we can see that if r12 is fixed, then as the sample size n increases, L11 and L22 increase, the two variances decrease, and the multicollinearity problem is weakened. Therefore, increasing the sample size is one way of eliminating multicollinearity. • So when we deal with a real economic problem, we should choose a sample size n much larger than the number of independent variables p.
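
As a quick numerical check (made-up data, σ² = 1 assumed), the sketch below holds r12 roughly fixed at 0.9 and lets n grow, showing that L11 grows with n and Var(β̂1) = σ²/[L11(1 − r12²)] shrinks accordingly.

```python
import numpy as np

# With r12 held (roughly) fixed, Var(beta1_hat) = sigma^2 / (L11 (1 - r12^2))
# falls as n grows, because L11 = sum(x1_i^2) grows with n.
rng = np.random.default_rng(4)
sigma2 = 1.0
for n in [20, 100, 500, 2500]:
    x1 = rng.normal(size=n)
    x2 = 0.9 * x1 + np.sqrt(1 - 0.9 ** 2) * rng.normal(size=n)
    x1, x2 = x1 - x1.mean(), x2 - x2.mean()    # centre as in the text
    L11 = x1 @ x1
    r12 = np.corrcoef(x1, x2)[0, 1]
    print(f"n = {n:5d}: Var(beta1_hat) ~ {sigma2 / (L11 * (1 - r12 ** 2)):.4f}")
```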

  16. 3.4.2 Using Prior Information from Non-sample Data • See P75 for detailed information.

  17. 3.4.3 Eliminating Some Unimportant Independent Variables • When we build a model of an economic issue without strict selection criteria, we may include too many independent variables, among which multicollinearity then arises. In that case we must remove some of the less important variables from the model. • The method is: first remove the variable with the largest variance inflation factor and re-fit the equation; if multicollinearity remains, continue removing the independent variable with the largest variance inflation factor, until there is no multicollinearity problem left in the equation (a sketch of this procedure follows below).
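
One possible implementation of this stepwise elimination is sketched below. It uses the fact that the VIFs equal the diagonal elements of the inverse correlation matrix of the independent variables; the helper names and the data are illustrative, and the threshold of 5 follows Section 3.3.2.

```python
import numpy as np

def vifs(X):
    """VIF of each column of X, via the diagonal of the inverse correlation matrix."""
    R = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(R))

def drop_by_vif(X, names, threshold=5.0):
    """Repeatedly drop the column with the largest VIF until all VIFs <= threshold."""
    X, names = X.copy(), list(names)
    while X.shape[1] > 1:
        v = vifs(X)
        worst = int(np.argmax(v))
        if v[worst] <= threshold:
            break
        print(f"dropping {names[worst]} (VIF = {v[worst]:.1f})")
        X = np.delete(X, worst, axis=1)
        del names[worst]
    return X, names

# Example with made-up, partly collinear columns.
rng = np.random.default_rng(5)
x1 = rng.normal(size=80)
x2 = 0.95 * x1 + 0.1 * rng.normal(size=80)
x3 = rng.normal(size=80)
X, kept = drop_by_vif(np.column_stack([x1, x2, x3]), ["x1", "x2", "x3"])
print("kept:", kept)
```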

  18. 3.4.4 Using Biased Estimates of the Regression Coefficients - Ridge Regression • Although the best solutions to the multicollinearity problem are to increase the sample size or to use more non-sample information, this is often difficult to do in practice. When multicollinearity exists among the independent variables, the least squares estimator of the regression coefficients is not accurate. • The ridge regression estimator is an improvement on the least squares estimator.

  19. The estimator β̂(k) = (X'X + kI)⁻¹X'y is called the ridge regression estimator, where k ≥ 0 is a constant called the bias parameter. If k = 0, the ridge estimator is equal to the ordinary least squares estimator. • The ridge regression estimator is biased, but it can be shown that its variance is smaller than that of the least squares estimator. In fact, the bias and the variance of β̂(k) are both determined by the value of the bias parameter k. • The larger k is, the greater the bias of β̂(k) and the smaller its variance. • Therefore, the choice of k must balance the bias of β̂(k) against its variance.
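
A minimal sketch of the ridge estimator β̂(k) = (X'X + kI)⁻¹X'y with made-up, strongly collinear data; the function name and the grid of k values are illustrative.

```python
import numpy as np

def ridge(X, y, k):
    """Ridge estimator beta_hat(k) = (X'X + kI)^(-1) X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

# Made-up, strongly collinear data for illustration.
rng = np.random.default_rng(6)
n = 60
x1 = rng.normal(size=n)
x2 = 0.98 * x1 + 0.05 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)

for k in [0.0, 0.1, 1.0, 10.0]:
    print(f"k = {k:5.1f}: beta_hat = {ridge(X, y, k)}")
# k = 0 reproduces ordinary least squares; larger k shrinks the coefficients,
# trading a little bias for a smaller variance.
```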
