
Intermediate Social Statistics (中級社會統計)








  1. Intermediate Social Statistics, Lecture 14: Regression Diagnostics (社會統計)

  2. Where can regression analysis go wrong? • With statistical software making it so easy to run, multiple regression is often misused and abused • The problem usually lies in not understanding the assumptions behind multiple regression and the issues that can arise • Most of the problems arise when making causal inferences • Multiple regression models used purely for prediction are less vulnerable • The following draws on Paul Allison (1999)

  3. Where can regression analysis go wrong? • Model specification errors • Are important independent variables left out of the model? • Are irrelevant independent variables included? • Does the dependent variable affect any of the independent variables? • How well are the independent variables measured?

  4. Where can regression analysis go wrong? • Is the sample large enough to detect important effects? • Is the sample so large that trivial effects are statistically significant? • Do some variables mediate the effects of other variables? • Are some independent variables too highly correlated? • Is the sample biased?

  5. 14.1.1 Model Specification Errors: Omitting a Relevant Variable (模型設定錯誤-遺漏)

  6. Problems with misspecified independent variables • Why include a given independent variable in a multiple regression equation? • To estimate the effect of this IV on the DV • To control for this IV • Researchers often forget to include important control variables • What counts as an important control variable? • Does it have a causal effect on the DV? • Is it correlated with the main variables we care about? • If it is unrelated to the other IVs, there is no need to control for it • Controlling is how we isolate the net relationship

  7. Misspecified independent variables: omission • Consequences of omitting an important IV • The regression coefficients will be biased, either too high or too low • Without the control, the observed relationship between IV and DV may be spurious
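The consequences listed above can be demonstrated with a short simulation. This is a hypothetical sketch, not part of the lecture: the true coefficients (0.5 and 0.8), the correlation between X and the omitted variable Z, and all variable names are invented for illustration.

```python
# Omitted-variable bias: the true model is Y = 2 + 0.5*X + 0.8*Z + e, with
# X and Z positively correlated. Regressing Y on X alone inflates the slope
# on X by approximately 0.8 * Cov(X, Z) / Var(X).
import random

random.seed(42)
n = 100_000

# Generate X and a positively correlated omitted variable Z.
X = [random.gauss(0, 1) for _ in range(n)]
Z = [0.6 * x + random.gauss(0, 1) for x in X]
Y = [2 + 0.5 * x + 0.8 * z + random.gauss(0, 1) for x, z in zip(X, Z)]

def slope(x, y):
    """OLS slope of y on x: sample Cov(x, y) / Var(x)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

b_short = slope(X, Y)   # short regression omitting Z: biased upward
delta = slope(X, Z)     # slope of the omitted Z on the included X
print(f"short-regression slope on X: {b_short:.3f}")
print(f"true slope + bias term:      {0.5 + 0.8 * delta:.3f}")
```

Because Z enters the true model positively and is positively correlated with X, the short-regression slope lands near 0.5 + 0.8 x 0.6 rather than the true 0.5; flipping either sign would bias the estimate downward instead.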

  8. VARIABLE MISSPECIFICATION I: OMISSION OF A RELEVANT VARIABLE In this sequence and the next we will investigate the consequences of misspecifying the regression model in terms of explanatory variables, comparing the true model with the fitted model.

  9. To keep the analysis simple, we will assume that there are only two possibilities: either Y depends only on X2, or it depends on both X2 and X3.

  10. If Y depends only on X2 and we fit a simple regression model, we will not encounter any problems, assuming of course that the regression model assumptions are valid (correct specification, no problems).

  11. Likewise we will not encounter any problems if Y depends on both X2 and X3 and we fit the multiple regression (again, correct specification, no problems).

  12. In this sequence we will examine the consequences of fitting a simple regression when the true model is multiple.

  13. In the next one we will do the opposite and examine the consequences of fitting a multiple regression when the true model is simple.

  14. The omission of a relevant explanatory variable causes the regression coefficients to be biased (in general) and the standard errors to be invalid.

  15. VARIABLE MISSPECIFICATION I: OMISSION OF A RELEVANT VARIABLE We will now derive the expression for the bias mathematically. It is convenient to start by deriving an expression for the deviation of Yi about its sample mean, which can be expressed in terms of the deviations of X2, X3, and u about their sample means.

  16. Although Y really depends on X3 as well as X2, we make a mistake and regress Y on X2 only. The slope coefficient b2 is therefore the ratio of the sample covariance of X2 and Y to the sample variance of X2.

  17. We substitute for the Y deviations and simplify.

  18. Hence we have demonstrated that b2 has three components.

  19. To investigate unbiasedness, we take the expected value of b2. The first two terms are unaffected because they contain no random components, so we focus on the expectation of the error term.

  20. X2 is nonstochastic, so the denominator of the error term is nonstochastic and may be taken outside the expression for the expectation.

  21. In the numerator, the expectation of a sum is equal to the sum of the expectations (first expected value rule).

  22. In each product, the factor involving X2 may be taken out of the expectation because X2 is nonstochastic.

  23. By Assumption A.3, the expected value of u is 0. It follows that the expected value of the sample mean of u is also 0. Hence the expected value of the error term is 0.

  24. Thus we have shown that the expected value of b2 is equal to the true value plus a bias term. (The bias of an estimator is the difference between its expected value and the true value of the parameter being estimated.)

  25. As a consequence of the misspecification, the standard errors, t tests and F test are invalid.
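The equations on slides 15-24 were images and did not survive in the transcript. As a reconstruction of the standard result being described (with true model Y_i = b1 + b2*X2i + b3*X3i + u_i and Y regressed on X2 alone), the slope estimator decomposes as:

```latex
b_2 \;=\; \frac{\sum_i (X_{2i}-\bar{X}_2)(Y_i-\bar{Y})}{\sum_i (X_{2i}-\bar{X}_2)^2}
\;=\; \beta_2
\;+\; \beta_3\,\frac{\sum_i (X_{2i}-\bar{X}_2)(X_{3i}-\bar{X}_3)}{\sum_i (X_{2i}-\bar{X}_2)^2}
\;+\; \frac{\sum_i (X_{2i}-\bar{X}_2)(u_i-\bar{u})}{\sum_i (X_{2i}-\bar{X}_2)^2}
```

Taking expectations, the last term vanishes (since X2 is nonstochastic and E(u) = 0), leaving

```latex
E(b_2) \;=\; \beta_2 \;+\; \beta_3\,\frac{\sum_i (X_{2i}-\bar{X}_2)(X_{3i}-\bar{X}_3)}{\sum_i (X_{2i}-\bar{X}_2)^2}
```

so the bias has the sign of beta_3 times the sample covariance of X2 and X3, which is what the examples below exploit.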

  26. VARIABLE MISSPECIFICATION I: OMISSION OF A RELEVANT VARIABLE

. reg S ASVABC SM

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  2,   537) =  147.36
       Model |  1135.67473     2  567.837363           Prob > F      =  0.0000
    Residual |  2069.30861   537  3.85346109           R-squared     =  0.3543
-------------+------------------------------           Adj R-squared =  0.3519
       Total |  3204.98333   539  5.94616574           Root MSE      =   1.963

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |   .1328069   .0097389    13.64   0.000     .1136758     .151938
          SM |   .1235071   .0330837     3.73   0.000     .0585178    .1884963
       _cons |   5.420733   .4930224    10.99   0.000     4.452244    6.389222
------------------------------------------------------------------------------

We will illustrate the bias using an educational attainment model. To keep the analysis simple, we will assume that in the true model S (years of schooling) depends only on ASVABC (test score) and SM (mother's years of schooling). The output above shows the corresponding regression using EAEF Data Set 21.

  27. We will run the regression a second time, omitting SM. Before we do this, we will try to predict the direction of the bias in the coefficient of ASVABC.

  28. It is reasonable to suppose, as a matter of common sense, that b3 is positive. This assumption is strongly supported by the fact that its estimate in the multiple regression is positive and highly significant.

  29.

. cor SM ASVABC
(obs=540)

        |      SM   ASVABC
--------+------------------
     SM |  1.0000
 ASVABC |  0.4202   1.0000

The correlation between ASVABC and SM is positive, so the numerator of the bias term must be positive. The denominator is automatically positive since it is a sum of squares and there is some variation in ASVABC. Hence the bias should be positive.

  30. Here is the regression omitting SM.

. reg S ASVABC

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  1,   538) =  274.19
       Model |  1081.97059     1  1081.97059           Prob > F      =  0.0000
    Residual |  2123.01275   538  3.94612035           R-squared     =  0.3376
-------------+------------------------------           Adj R-squared =  0.3364
       Total |  3204.98333   539  5.94616574           Root MSE      =  1.9865

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |    .148084   .0089431    16.56   0.000     .1305165    .1656516
       _cons |   6.066225   .4672261    12.98   0.000     5.148413    6.984036
------------------------------------------------------------------------------

  31. As you can see, the coefficient of ASVABC is indeed higher when SM is omitted (.1481 versus .1328). Part of the difference may be due to pure chance, but part is attributable to the bias.

  32. Here is the regression omitting ASVABC instead of SM. We would expect b3 to be upwards biased: we anticipate that b2 is positive, and we know that both the numerator and the denominator of the other factor in the bias expression are positive.

. reg S SM

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  1,   538) =   80.93
       Model |  419.086251     1  419.086251           Prob > F      =  0.0000
    Residual |  2785.89708   538  5.17824736           R-squared     =  0.1308
-------------+------------------------------           Adj R-squared =  0.1291
       Total |  3204.98333   539  5.94616574           Root MSE      =  2.2756

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          SM |   .3130793   .0348012     9.00   0.000     .2447165    .3814422
       _cons |   10.04688   .4147121    24.23   0.000     9.232226    10.86153
------------------------------------------------------------------------------

  33. In this case the bias is quite dramatic: the coefficient of SM has more than doubled (.3131 versus .1235). The reason for the bigger effect is that the variation in SM is much smaller than that in ASVABC, while b2 and b3 are similar in size, judging by their estimates.

  34. Finally, we will investigate how R2 behaves when a variable is omitted. In the simple regression of S on ASVABC, R2 is 0.34, and in the simple regression of S on SM it is 0.13.

  35. Does this imply that ASVABC explains 34% of the variance in S and SM 13%? No, because the multiple regression reveals that their joint explanatory power is 0.35, not 0.47.

  36. In the second regression, ASVABC is partly acting as a proxy for SM, and this inflates its apparent explanatory power. Similarly, in the third regression, SM is partly acting as a proxy for ASVABC, again inflating its apparent explanatory power.

  37. However, it is also possible for omitted variable bias to lead to a reduction in the apparent explanatory power of a variable. This will be demonstrated using a simple earnings function model, supposing the logarithm of hourly earnings (LGEARN) to depend on S and EXP.

. reg LGEARN S EXP

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  2,   537) =  100.86
       Model |  50.9842581     2   25.492129           Prob > F      =  0.0000
    Residual |  135.723385   537  .252743734           R-squared     =  0.2731
-------------+------------------------------           Adj R-squared =  0.2704
       Total |  186.707643   539   .34639637           Root MSE      =  .50274

------------------------------------------------------------------------------
      LGEARN |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   .1235911   .0090989    13.58   0.000     .1057173     .141465
         EXP |   .0350826   .0050046     7.01   0.000     .0252515    .0449137
       _cons |   .5093196   .1663823     3.06   0.002     .1824796    .8361596
------------------------------------------------------------------------------

  38.

. cor S EXP
(obs=540)

        |       S      EXP
--------+------------------
      S |  1.0000
    EXP | -0.2179   1.0000

If we omit EXP from the regression, the coefficient of S should be subject to a downward bias: b3 is likely to be positive, the numerator of the other factor in the bias term is negative since S and EXP are negatively correlated, and the denominator is positive.

  39. For the same reasons, the coefficient of EXP in a simple regression of LGEARN on EXP should be downwards biased.

  40. As can be seen, the coefficients of S and EXP are indeed lower in the simple regressions (.1097 versus .1236, and .0203 versus .0351).

. reg LGEARN S

------------------------------------------------------------------------------
      LGEARN |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   .1096934   .0092691    11.83   0.000     .0914853    .1279014
       _cons |   1.292241   .1287252    10.04   0.000     1.039376    1.545107
------------------------------------------------------------------------------

. reg LGEARN EXP

------------------------------------------------------------------------------
      LGEARN |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         EXP |   .0202708   .0056564     3.58   0.000     .0091595     .031382
       _cons |    2.44941   .0988233    24.79   0.000     2.255284    2.643537
------------------------------------------------------------------------------

  41. A comparison of R2 for the three regressions shows that the sum of R2 in the simple regressions (0.2065 + 0.0233) is actually less than R2 in the multiple regression (0.2731).

. reg LGEARN S

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  1,   538) =  140.05
       Model |  38.5643833     1  38.5643833           Prob > F      =  0.0000
    Residual |   148.14326   538  .275359219           R-squared     =  0.2065
-------------+------------------------------           Adj R-squared =  0.2051
       Total |  186.707643   539   .34639637           Root MSE      =  .52475

. reg LGEARN EXP

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  1,   538) =   12.84
       Model |  4.35309315     1  4.35309315           Prob > F      =  0.0004
    Residual |   182.35455   538  .338948978           R-squared     =  0.0233
-------------+------------------------------           Adj R-squared =  0.0215
       Total |  186.707643   539   .34639637           Root MSE      =  .58219

  42. This is because the apparent explanatory power of S in the second regression has been undermined by the downwards bias in its coefficient. The same is true for the apparent explanatory power of EXP in the third equation.
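The pattern on slides 37-42 can be reproduced with simulated data. This is a hedged sketch, not the EAEF data: the coefficients, sample size, and the negative correlation between the two regressors are all invented, chosen only to mimic the S/EXP situation.

```python
# With negatively correlated regressors, each simple-regression slope is
# biased toward zero, so the two simple R-squared values can sum to LESS
# than the multiple-regression R-squared.
import random

random.seed(7)
n = 50_000
X2 = [random.gauss(0, 1) for _ in range(n)]
X3 = [-0.5 * x + random.gauss(0, 1) for x in X2]   # negatively correlated
Y = [1 + 0.6 * a + 0.6 * b + random.gauss(0, 1) for a, b in zip(X2, X3)]

def r2_simple(x, y):
    """R-squared of a simple regression = squared sample correlation."""
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def r2_multiple(x2, x3, y):
    """R-squared with two regressors, via the 2x2 normal equations."""
    m2, m3, my = sum(x2) / n, sum(x3) / n, sum(y) / n
    s22 = sum((a - m2) ** 2 for a in x2)
    s33 = sum((a - m3) ** 2 for a in x3)
    s23 = sum((a - m2) * (b - m3) for a, b in zip(x2, x3))
    s2y = sum((a - m2) * (b - my) for a, b in zip(x2, y))
    s3y = sum((a - m3) * (b - my) for a, b in zip(x3, y))
    det = s22 * s33 - s23 * s23
    b2 = (s33 * s2y - s23 * s3y) / det
    b3 = (s22 * s3y - s23 * s2y) / det
    ess = b2 * s2y + b3 * s3y          # explained sum of squares
    tss = sum((b - my) ** 2 for b in y)
    return ess / tss

r2_a, r2_b = r2_simple(X2, Y), r2_simple(X3, Y)
r2_joint = r2_multiple(X2, X3, Y)
print(r2_a, r2_b, r2_joint)
```

With this setup each simple slope is pulled toward zero by the omitted, negatively correlated regressor, so r2_a + r2_b falls short of r2_joint, just as 0.2065 + 0.0233 < 0.2731 in the earnings example.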

  43. Misspecified independent variables: omission • The correct regression model is: Y = α + βX + γZ + ε • Omitting the important explanatory variable Z, we instead fit: Y = α′ + β′X + ε′

  44.-49. Misspecified independent variables: omission [slides 44-49 consist of equations and figures not captured in this transcript]

  50. 14.1.2 Model Specification Errors: Including Irrelevant Independent Variables (模型設定錯誤-加入不相關變數)
