
6-4 Other Aspects of Regression



  1. 6-4 Other Aspects of Regression 6-4.1 Polynomial Models

  2. 6-4 Other Aspects of Regression 6-4.1 Polynomial Models

  3. 6-4 Other Aspects of Regression 6-4.1 Polynomial Models Suppose that we wanted to test the contribution of the second-order terms to this model. In other words, what is the value of expanding the model to include the additional terms?

  4. 6-4 Other Aspects of Regression Example 6-9
OPTIONS NOOVP NODATE NONUMBER;
DATA ex69;
   INPUT YIELD TEMP RATIO;
   TEMPC=TEMP-1212.5; RATIOC=RATIO-12.444;
   TEMRATC=TEMPC*RATIOC; TEMPCSQ=TEMPC**2; RATIOCSQ=RATIOC**2;
CARDS;
49.0 1300 7.5
50.2 1300 9.0
50.5 1300 11.0
48.5 1300 13.5
47.5 1300 17.0
44.5 1300 23.0
28.0 1200 5.3
31.5 1200 7.5
34.5 1200 11.0
35.0 1200 13.5
38.0 1200 17.0
38.5 1200 23.0
15.0 1100 5.3
17.0 1100 7.5
20.5 1100 11.0
29.5 1100 17.0
;
PROC REG DATA=ex69;
   MODEL YIELD=TEMPC RATIOC TEMRATC TEMPCSQ RATIOCSQ/VIF;
   PLOT R.*P.;
   TITLE 'QUADRATIC REGRESSION MODEL - FULL MODEL';
PROC REG DATA=ex69;
   MODEL YIELD=TEMPC RATIOC/VIF;
   PLOT R.*P.;
   TITLE 'LINEAR REGRESSION MODEL - REDUCED MODEL';
RUN; QUIT;
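As a cross-check on the SAS run, the full quadratic model can be re-fit in pure Python by solving the normal equations directly. This is a minimal sketch with no external libraries: the data rows are transcribed from the CARDS block above, and the centered terms mirror the SAS DATA step.

```python
# Re-fit of the full quadratic model in Example 6-9, solving the normal
# equations (X'X)b = X'y by Gaussian elimination.
data = [  # (YIELD, TEMP, RATIO) rows from the CARDS block
    (49.0, 1300, 7.5), (50.2, 1300, 9.0), (50.5, 1300, 11.0),
    (48.5, 1300, 13.5), (47.5, 1300, 17.0), (44.5, 1300, 23.0),
    (28.0, 1200, 5.3), (31.5, 1200, 7.5), (34.5, 1200, 11.0),
    (35.0, 1200, 13.5), (38.0, 1200, 17.0), (38.5, 1200, 23.0),
    (15.0, 1100, 5.3), (17.0, 1100, 7.5), (20.5, 1100, 11.0),
    (29.5, 1100, 17.0),
]

def solve(a, rhs):
    """Solve a*x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

# Design matrix with the same centered terms as the SAS DATA step.
X, y = [], []
for yld, temp, ratio in data:
    tc, rc = temp - 1212.5, ratio - 12.444
    X.append([1.0, tc, rc, tc * rc, tc * tc, rc * rc])
    y.append(yld)

p = len(X[0])
xtx = [[sum(xi[r] * xi[c] for xi in X) for c in range(p)] for r in range(p)]
xty = [sum(xi[r] * yi for xi, yi in zip(X, y)) for r in range(p)]
b = solve(xtx, xty)  # b[0] = intercept, b[1] = TEMPC coefficient, ...
```

If the fit is right, `b` reproduces the SAS parameter estimates up to rounding (intercept ≈ 36.434, TEMPC ≈ 0.1305, RATIOC ≈ 0.4800).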

  5. 6-4 Other Aspects of Regression QUADRATIC REGRESSION MODEL - FULL MODEL The REG Procedure Model: MODEL1 Dependent Variable: YIELD Number of Observations Read 16 Number of Observations Used 16 Analysis of Variance Sum of Mean Source DF Squares Square F Value Pr > F Model 5 2112.33724 422.46745 371.49 <.0001 Error 10 11.37214 1.13721 Corrected Total 15 2123.70937 Root MSE 1.06640 R-Square 0.9946 Dependent Mean 36.10625 Adj R-Sq 0.9920 CoeffVar 2.95351 Parameter Estimates Parameter Standard Variance Variable DF Estimate Error t Value Pr > |t| Inflation Intercept 1 36.43394 0.55288 65.90 <.0001 0 TEMPC 1 0.13048 0.00364 35.83 <.0001 1.13707 RATIOC 1 0.48005 0.05860 8.19 <.0001 1.45205 TEMRATC 1 -0.00733 0.00079928 -9.18 <.0001 1.37367 TEMPCSQ 1 0.00017820 0.00005854 3.04 0.0124 1.20061 RATIOCSQ 1 -0.02367 0.01019 -2.32 0.0425 1.71889

  6. 6-4 Other Aspects of Regression

  7. 6-4 Other Aspects of Regression LINEAR REGRESSION MODEL - REDUCED MODEL The REG Procedure Model: MODEL1 Dependent Variable: YIELD Number of Observations Read 16 Number of Observations Used 16 Analysis of Variance Sum of Mean Source DF Squares Square F Value Pr > F Model 2 1952.97853 976.48926 74.35 <.0001 Error 13 170.73085 13.13314 Corrected Total 15 2123.70937 Root MSE 3.62397 R-Square 0.9196 Dependent Mean 36.10625 Adj R-Sq 0.9072 CoeffVar 10.03695 Parameter Estimates Parameter Standard Variance Variable DF Estimate Error t Value Pr > |t| Inflation Intercept 1 36.10634 0.90599 39.85 <.0001 0 TEMPC 1 0.13396 0.01191 11.25 <.0001 1.05264 RATIOC 1 0.35106 0.16955 2.07 0.0589 1.05264

  8. 6-4 Other Aspects of Regression

  9. 6-4 Other Aspects of Regression 6-4.1 Polynomial Models
Partial F = [(SSE_R − SSE_F) / r] / [SSE_F / (n − p)]
where: SSE_R = SSE for the reduced model, SSE_F = SSE for the full model, r = number of β's set to zero in H0, and n − p = error degrees of freedom for the full model. For a given α, we reject H0 if: Partial F > tabled F with r numerator and n − p denominator degrees of freedom.
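Plugging the error sums of squares printed in the two SAS outputs above (full quadratic model and reduced linear model) into the partial F formula gives the test statistic for the three second-order terms:

```python
# Partial F-test for the second-order terms in Example 6-9, using the SSE
# values from the full and reduced SAS outputs above.
sse_r, df_r = 170.73085, 13   # reduced (linear) model
sse_f, df_f = 11.37214, 10    # full (quadratic) model
r = df_r - df_f               # 3 betas set to zero under H0

partial_f = ((sse_r - sse_f) / r) / (sse_f / df_f)   # about 46.7
```

Since the tabled F(0.05; 3, 10) is about 3.71, the second-order terms contribute significantly to the model.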

  10. 6-4 Other Aspects of Regression Example
OPTIONS NOOVP NODATE NONUMBER;
DATA BIDS;
   INFILE 'C:\users\myung\Documents\Teaching\학부과목\imen214-stats\bids.dat';
   INPUT PRICE QUANTITY BIDS;
   LOGPRICE=LOG(PRICE);
   QUANSQ=QUANTITY**2;
PROC REG DATA=BIDS;
   MODEL PRICE=QUANTITY/P CLM CLI XPX;
   PLOT PRICE*QUANTITY/PRED CONF;  /* SCATTER PLOT */
   PLOT R.*P.;
   TITLE 'LINEAR REGRESSION OF PRICE VS. QUANTITY';
PROC REG DATA=BIDS;
   MODEL LOGPRICE=QUANTITY/P CLM CLI XPX;
   PLOT LOGPRICE*QUANTITY/PRED CONF;  /* SCATTER PLOT */
   PLOT R.*P.;
   TITLE 'LINEAR REGRESSION OF LOGPRICE VS. QUANTITY';
PROC REG DATA=BIDS;
   MODEL PRICE=QUANTITY QUANSQ/P CLM CLI XPX;
   PLOT PRICE*QUANTITY;  /* SCATTER PLOT */
   PLOT R.*P.;
   TITLE 'QUADRATIC REGRESSION OF PRICE VS. QUANTITY';
RUN; QUIT;

  11. 6-4 Other Aspects of Regression LINEAR REGRESSION OF PRICE VS. QUANTITY The REG Procedure Model: MODEL1 Model Crossproducts X'X X'Y Y'Y Variable Intercept QUANTITY PRICE Intercept 30 273 2374.97 QUANTITY 273 3688.98 12492.46 PRICE 2374.97 12492.46 266887.1815 --------------------------------------------------------------------------------------------- LINEAR REGRESSION OF PRICE VS. QUANTITY The REG Procedure Model: MODEL1 Dependent Variable: PRICE Number of Observations Read 30 Number of Observations Used 30 Analysis of Variance Sum of Mean Source DF Squares Square F Value Pr > F Model 1 69039 69039 196.62 <.0001 Error 28 9831.89259 351.13902 Corrected Total 29 78871 Root MSE 18.73870 R-Square 0.8753 Dependent Mean 79.16567 Adj R-Sq 0.8709 CoeffVar 23.67024 Parameter Estimates Parameter Standard Variable DF Estimate Error t Value Pr > |t| Intercept 1 148.05523 5.98682 24.73 <.0001 QUANTITY 1 -7.57028 0.53989 -14.02 <.0001

  12. 6-4 Other Aspects of Regression LINEAR REGRESSION OF PRICE VS. QUANTITY The REG Procedure Model: MODEL1 Dependent Variable: PRICE Output Statistics Dependent Predicted Std Error Obs Variable Value Mean Predict 95% CL Mean 95% CL Predict Residual 1 153.3200 140.4849 5.5523 129.1115 151.8584 100.4509 180.5190 12.8351 2 74.1100 93.5492 3.5717 86.2330 100.8654 54.4737 132.6247 -19.4392 3 29.7200 21.6315 5.3423 10.6883 32.5748 -18.2824 61.5455 8.0885 4 54.6700 57.9689 3.7403 50.3072 65.6305 18.8272 97.1105 -3.2989 5 68.3900 77.6516 3.4229 70.6401 84.6631 38.6320 116.6712 -9.2616 6 119.0400 120.0452 4.4949 110.8378 129.2526 80.5718 159.5185 -1.0052 7 116.1400 135.1858 5.2599 124.4114 145.9601 95.3178 175.0537 -19.0458 8 146.4900 147.2982 5.9426 135.1253 159.4711 107.0298 187.5666 -0.8082 9 81.8100 89.0070 3.4925 81.8531 96.1610 49.9616 128.0525 -7.1970 10 19.5800 8.7620 6.0757 -3.6835 21.2076 -31.5897 49.1138 10.8180 11 141.0800 126.1014 4.7863 116.2970 135.9058 86.4846 165.7183 14.9786 12 101.7200 112.4749 4.1651 103.9432 121.0066 73.1537 151.7961 -10.7549 13 24.8800 16.3323 5.6378 4.7838 27.8808 -23.7518 56.4165 8.5477 14 19.4300 8.7620 6.0757 -3.6835 21.2076 -31.5897 49.1138 10.6680 15 39.6300 63.2681 3.6042 55.8853 70.6509 24.1800 102.3561 -23.6381 16 151.1300 135.9428 5.3010 125.0842 146.8013 96.0520 175.8336 15.1872 17 79.1800 92.7922 3.5565 85.5069 100.0774 53.7224 131.8619 -13.6122 18 204.9400 146.5412 5.8985 134.4586 158.6238 106.2999 186.7824 58.3988 19 81.0600 96.5773 3.6396 89.1220 104.0327 57.4755 135.6791 -15.5173 20 37.6200 61.7540 3.6396 54.2987 69.2094 22.6522 100.8558 -24.1340 21 17.1300 -3.3504 6.8070 -17.2939 10.5931 -44.1890 37.4882 20.4804 22 37.8100 46.6135 4.1345 38.1443 55.0826 7.3057 85.9212 -8.8035 23 130.7200 134.4287 5.2190 123.7382 145.1193 94.5833 174.2741 -3.7087 24 26.0700 8.0050 6.1204 -4.5321 20.5422 -32.3750 48.3851 18.0650 25 39.5900 36.7721 4.5657 27.4197 46.1245 -2.7353 76.2795 2.8179 26 66.2000 79.1657 3.4212 72.1576 86.1737 
40.1467 118.1847 -12.9657 27 160.2500 129.8866 4.9789 119.6878 140.0853 90.1703 169.6028 30.3634 28 19.3900 17.0894 5.5950 5.6286 28.5501 -22.9696 57.1483 2.3006 29 86.6000 113.9890 4.2276 105.3292 122.6487 74.6397 153.3382 -27.3890 30 47.2700 60.2400 3.6778 52.7063 67.7736 21.1231 99.3568 -12.9700 Sum of Residuals 0 Sum of Squared Residuals 9831.89259 Predicted Residual SS (PRESS) 11542

  13. 6-4 Other Aspects of Regression

  14. 6-4 Other Aspects of Regression

  15. 6-4 Other Aspects of Regression LINEAR REGRESSION OF LOGPRICE VS. QUANTITY The REG Procedure Model: MODEL1 Model Crossproducts X'X X'Y Y'Y Variable Intercept QUANTITY LOGPRICE Intercept 30 273 123.86106074 QUANTITY 273 3688.98 990.08475122 LOGPRICE 123.86106074 990.08475122 527.52302023 --------------------------------------------------------------------------------------------- LINEAR REGRESSION OF LOGPRICE VS. QUANTITY The REG Procedure Model: MODEL1 Dependent Variable: LOGPRICE Number of Observations Read 30 Number of Observations Used 30 Analysis of Variance Sum of Mean Source DF Squares Square F Value Pr > F Model 1 15.59165 15.59165 799.63 <.0001 Error 28 0.54596 0.01950 Corrected Total 29 16.13761 Root MSE 0.13964 R-Square 0.9662 Dependent Mean 4.12870 Adj R-Sq 0.9650 CoeffVar 3.38210 Parameter Estimates Parameter Standard Variable DF Estimate Error t Value Pr > |t| Intercept 1 5.16397 0.04461 115.75 <.0001 QUANTITY 1 -0.11377 0.00402 -28.28 <.0001

  16. 6-4 Other Aspects of Regression LINEAR REGRESSION OF LOGPRICE VS. QUANTITY The REG Procedure Model: MODEL1 Dependent Variable: LOGPRICE Output Statistics Dependent Predicted Std Error Obs Variable Value Mean Predict 95% CL Mean 95% CL Predict Residual 1 5.0325 5.0502 0.0414 4.9654 5.1350 4.7519 5.3485 -0.0177 2 4.3056 4.3449 0.0266 4.2903 4.3994 4.0537 4.6360 -0.0393 3 3.3918 3.2641 0.0398 3.1825 3.3456 2.9667 3.5615 0.1277 4 4.0013 3.8102 0.0279 3.7531 3.8673 3.5185 4.1018 0.1912 5 4.2252 4.1059 0.0255 4.0537 4.1582 3.8152 4.3967 0.1193 6 4.7795 4.7430 0.0335 4.6744 4.8116 4.4489 5.0372 0.0364 7 4.7548 4.9706 0.0392 4.8903 5.0509 4.6735 5.2677 -0.2158 8 4.9870 5.1526 0.0443 5.0619 5.2433 4.8525 5.4527 -0.1656 9 4.4044 4.2766 0.0260 4.2233 4.3299 3.9856 4.5676 0.1278 10 2.9745 3.0707 0.0453 2.9779 3.1634 2.7700 3.3714 -0.0962 11 4.9493 4.8340 0.0357 4.7610 4.9071 4.5388 5.1293 0.1153 12 4.6222 4.6293 0.0310 4.5657 4.6928 4.3363 4.9223 -0.007046 13 3.2141 3.1844 0.0420 3.0984 3.2705 2.8858 3.4831 0.0296 14 2.9668 3.0707 0.0453 2.9779 3.1634 2.7700 3.3714 -0.1039 15 3.6796 3.8898 0.0269 3.8348 3.9448 3.5985 4.1811 -0.2102 16 5.0181 4.9819 0.0395 4.9010 5.0629 4.6847 5.2792 0.0362 17 4.3717 4.3335 0.0265 4.2792 4.3878 4.0423 4.6246 0.0382 18 5.3227 5.1412 0.0440 5.0512 5.2313 4.8413 5.4411 0.1815 19 4.3952 4.3904 0.0271 4.3348 4.4459 4.0990 4.6817 0.004827 20 3.6275 3.8670 0.0271 3.8115 3.9226 3.5757 4.1584 -0.2395 21 2.8408 2.8887 0.0507 2.7848 2.9926 2.5843 3.1930 -0.0478 22 3.6326 3.6395 0.0308 3.5764 3.7026 3.3466 3.9324 -0.006937 23 4.8731 4.9592 0.0389 4.8795 5.0389 4.6623 5.2561 -0.0861 24 3.2608 3.0593 0.0456 2.9659 3.1527 2.7584 3.3602 0.2015 25 3.6786 3.4916 0.0340 3.4219 3.5613 3.1972 3.7860 0.1870 26 4.1927 4.1287 0.0255 4.0765 4.1809 3.8379 4.4195 0.0640 27 5.0767 4.8909 0.0371 4.8149 4.9669 4.5950 5.1869 0.1858 28 2.9648 3.1958 0.0417 3.1104 3.2812 2.8973 3.4943 -0.2311 29 4.4613 4.6520 0.0315 4.5875 4.7166 4.3588 4.9452 -0.1907 30 3.8559 3.8443 
0.0274 3.7881 3.9004 3.5528 4.1358 0.0116 Sum of Residuals 0 Sum of Squared Residuals 0.54596 Predicted Residual SS (PRESS) 0.63012
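Predictions from the log-scale fit above must be back-transformed to the price scale. A small sketch using the coefficients printed in the output (note that exponentiating the predicted mean of the log gives a median-type prediction, not the mean, for a lognormal response):

```python
import math

# Back-transforming the fitted line LOGPRICE = 5.16397 - 0.11377*QUANTITY
# (coefficients from the SAS output above) to the original price scale.
b0, b1 = 5.16397, -0.11377

def predict_logprice(quantity):
    return b0 + b1 * quantity

def predict_price(quantity):
    # exp of the predicted log; this estimates the median price
    return math.exp(predict_logprice(quantity))
```

For example, at QUANTITY = 10 the predicted log price is about 4.026, so the back-transformed price is roughly 56.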

  17. 6-4 Other Aspects of Regression

  18. 6-4 Other Aspects of Regression

  19. 6-4 Other Aspects of Regression QUADRATIC REGRESSION OF PRICE VS. QUANTITY The REG Procedure Model: MODEL1 Model Crossproducts X'X X'Y Y'Y Variable Intercept QUANTITY QUANSQ PRICE Intercept 30 273 3688.98 2374.97 QUANTITY 273 3688.98 57017.832 12492.46 QUANSQ 3688.98 57017.832 942040.4526 127145.9652 PRICE 2374.97 12492.46 127145.9652 266887.1815 --------------------------------------------------------------------------------------------- QUADRATIC REGRESSION OF PRICE VS. QUANTITY The REG Procedure Model: MODEL1 Dependent Variable: PRICE Number of Observations Read 30 Number of Observations Used 30 Analysis of Variance Sum of Mean Source DF Squares Square F Value Pr > F Model 2 74008 37004 205.45 <.0001 Error 27 4862.98599 180.11059 Corrected Total 29 78871 Root MSE 13.42053 R-Square 0.9383 Dependent Mean 79.16567 Adj R-Sq 0.9338 CoeffVar 16.95246 Parameter Estimates Parameter Standard Variable DF Estimate Error t Value Pr > |t| Intercept 1 169.38879 5.90606 28.68 <.0001 QUANTITY 1 -15.23747 1.51008 -10.09 <.0001 QUANSQ 1 0.39391 0.07500 5.25 <.0001

  20. 6-4 Other Aspects of Regression QUADRATIC REGRESSION OF PRICE VS. QUANTITY The REG Procedure Model: MODEL1 Dependent Variable: PRICE Output Statistics Dependent Predicted Std Error Obs Variable Value Mean Predict 95% CL Mean 95% CL Predict Residual 1 153.3200 154.5452 4.7936 144.7095 164.3809 125.3047 183.7857 -1.2252 2 74.1100 80.0994 3.6195 72.6729 87.5259 51.5789 108.6199 -5.9894 3 29.7200 24.7813 3.8728 16.8349 32.7278 -3.8790 53.4416 4.9387 4 54.6700 43.8449 3.7956 36.0569 51.6328 15.2281 72.4616 10.8251 5 68.3900 61.7498 3.8956 53.7568 69.7429 33.0766 90.4231 6.6402 6 119.0400 118.4028 3.2344 111.7664 125.0392 90.0778 146.7279 0.6372 7 116.1400 144.6235 4.1737 136.0599 153.1871 115.7860 173.4610 -28.4835 8 146.4900 167.8690 5.7838 156.0016 179.7363 137.8840 197.8540 -21.3790 9 81.8100 74.5022 3.7259 66.8572 82.1471 45.9240 103.0803 7.3078 10 19.5800 22.3824 5.0655 11.9889 32.7759 -7.0504 51.8152 -2.8024 11 141.0800 128.5129 3.4586 121.4166 135.6093 100.0766 156.9493 12.5671 12 101.7200 106.4742 3.1943 99.9201 113.0283 78.1683 134.7801 -4.7542 13 24.8800 23.5178 4.2632 14.7704 32.2652 -5.3748 52.4104 1.3622 14 19.4300 22.3824 5.0655 11.9889 32.7759 -7.0504 51.8152 -2.9524 15 39.6300 48.1415 3.8674 40.2062 56.0768 19.4843 76.7987 -8.5115 16 151.1300 146.0172 4.2535 137.2897 154.7448 117.1306 174.9039 5.1128 17 79.1800 79.1469 3.6383 71.6817 86.6120 50.6162 107.6775 0.0331 18 204.9400 166.3570 5.6639 154.7357 177.9784 136.4685 196.2455 38.5830 19 81.0600 83.9885 3.5410 76.7229 91.2541 55.5095 112.4676 -2.9285 20 37.6200 46.8745 3.8496 38.9757 54.7733 18.2274 75.5216 -9.2545 21 17.1300 22.2044 6.8875 8.0724 36.3365 -8.7469 53.1557 -5.0744 22 37.8100 35.9376 3.5916 28.5683 43.3069 7.4320 64.4433 1.8724 23 130.7200 143.2376 4.0968 134.8317 151.6435 114.4465 172.0287 -12.5176 24 26.0700 22.3122 5.1608 11.7231 32.9013 -7.1903 51.8147 3.7578 25 39.5900 30.5186 3.4799 23.3784 37.6588 2.0712 58.9659 9.0714 26 66.2000 63.3477 3.8824 55.3817 71.3138 34.6820 92.0135 
2.8523 27 160.2500 135.0878 3.7008 127.4944 142.6812 106.5234 163.6522 25.1622 28 19.3900 23.6747 4.1986 15.0598 32.2896 -5.1781 52.5275 -4.2847 29 86.6000 108.7969 3.1850 102.2617 115.3320 80.4954 137.0984 -22.1969 30 47.2700 45.6390 3.8296 37.7814 53.4967 17.0032 74.2748 1.6310 Sum of Residuals 0 Sum of Squared Residuals 4862.98599 Predicted Residual SS (PRESS) 6362.25589
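The quadratic coefficients in the output above define a parabola in quantity. A quick sketch of using them, including where the fitted curve bottoms out:

```python
# PRICE = 169.38879 - 15.23747*Q + 0.39391*Q^2, coefficients taken from
# the quadratic SAS output above.
b0, b1, b2 = 169.38879, -15.23747, 0.39391

def predict_price(q):
    return b0 + b1 * q + b2 * q * q

# The fitted parabola has its minimum at Q = -b1 / (2*b2), about 19.3,
# near the upper end of the observed quantities -- a caution against
# extrapolating the fitted upturn beyond the data.
q_min = -b1 / (2 * b2)
```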

  21. 6-4 Other Aspects of Regression

  22. 6-4 Other Aspects of Regression

  23. 6-4 Other Aspects of Regression 6-4.2 Categorical Regressors • Many problems may involve qualitative or categorical variables. • The usual method for incorporating the different levels of a qualitative variable into a regression model is to use indicator variables. • For example, to introduce the effect of two different operators into a regression model, we could define an indicator variable as x = 0 for operator 1 and x = 1 for operator 2.

  24. 6-4 Other Aspects of Regression Example 6-10
Y = gas mileage, x1 = engine displacement, x2 = horsepower,
x3 = 0 if automatic transmission, 1 if manual transmission.
Additive model: Y = β0 + β1x1 + β2x2 + β3x3 + ε.
If automatic (x3 = 0), then Y = β0 + β1x1 + β2x2 + ε; if manual (x3 = 1), then Y = (β0 + β3) + β1x1 + β2x2 + ε.
This is unreasonable when the effects of x1 and x2 differ by transmission type, because no interactions of x1 and x2 with x3 are in the model.
Interaction model: Y = β0 + β1x1 + β2x2 + β3x3 + β4x1x3 + β5x2x3 + ε.
If automatic (x3 = 0), then Y = β0 + β1x1 + β2x2 + ε; if manual (x3 = 1), then Y = (β0 + β3) + (β1 + β4)x1 + (β2 + β5)x2 + ε.
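The difference between the two transmission types can be sketched numerically. The coefficient values below are hypothetical, chosen only to illustrate how the interaction terms let the intercept and both slopes differ between the groups:

```python
# Interaction model: Y = b0 + b1*x1 + b2*x2 + b3*x3 + b4*x1*x3 + b5*x2*x3.
# All coefficient values here are hypothetical illustrations.
def mileage(x1, x2, x3, b):
    b0, b1, b2, b3, b4, b5 = b
    return b0 + b1*x1 + b2*x2 + b3*x3 + b4*x1*x3 + b5*x2*x3

b = (30.0, -0.05, -0.02, 2.0, -0.01, 0.005)  # hypothetical coefficients

# Automatic (x3=0): intercept b0, slopes b1 and b2.
# Manual (x3=1): intercept b0+b3, slopes b1+b4 and b2+b5.
gap = mileage(100, 50, 1, b) - mileage(100, 50, 0, b)
# gap = b3 + b4*100 + b5*50, so it depends on x1 and x2 -- exactly the
# displacement/horsepower dependence the additive model cannot express.
```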

  25. 6-4 Other Aspects of Regression Dummy Variables Many times a qualitative variable seems to be needed in a regression model. This can be accomplished by creating dummy variables or indicator variables. If a qualitative variable has k levels, you will need k − 1 dummy variables. Notice that in ANOVA, if a treatment had k levels it had k − 1 degrees of freedom. The ith dummy variable is defined as x_i = 1 if the observation is from level i, and x_i = 0 otherwise. This can be done automatically in PROC GLM by using the CLASS statement as we did in ANOVA. Any dummy variables defined with respect to a qualitative variable must be treated as a group. Individual t-tests are not meaningful. Partial F-tests must be performed on the group of dummy variables.
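The k − 1 dummy coding described above can be sketched as a small helper (hypothetical, with the last level taken as the reference, similar in spirit to what a CLASS statement does internally):

```python
# Build k-1 indicator columns for a categorical variable with k levels.
# The last level is taken as the reference and gets all zeros.
def make_dummies(values, levels):
    return [[1 if v == lvl else 0 for lvl in levels[:-1]] for v in values]

rows = make_dummies(["F", "G", "E", "G", "F"], levels=["F", "G", "E"])
# "F" -> [1, 0], "G" -> [0, 1], reference level "E" -> [0, 0]
```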

  26. 6-4 Other Aspects of Regression Example 6-11
OPTIONS NOOVP NODATE NONUMBER;
DATA EX611;
   INPUT FORM SCENT COLOR RESIDUE REGION QUALITY;
   IF REGION=1 THEN REGION1=0; ELSE REGION1=1;
   FR=FORM*REGION1;
   RR=RESIDUE*REGION1;
CARDS;
6.3 5.3 4.8 3.1 1 91
4.4 4.9 3.5 3.9 1 87
3.9 5.3 4.8 4.7 1 82
5.1 4.2 3.1 3.6 1 83
5.6 5.1 5.5 5.1 1 83
4.6 4.7 5.1 4.1 1 84
4.8 4.8 4.8 3.3 1 90
6.5 4.5 4.3 5.2 1 84
8.7 4.3 3.9 2.9 1 97
8.3 3.9 4.7 3.9 1 93
5.1 4.3 4.5 3.6 1 82
3.3 5.4 4.3 3.6 1 84
5.9 5.7 7.2 4.1 2 87
7.7 6.6 6.7 5.6 2 80
7.1 4.4 5.8 4.1 2 84
5.5 5.6 5.6 4.4 2 84
6.3 5.4 4.8 4.6 2 82
4.3 5.5 5.5 4.1 2 79
4.6 4.1 4.3 3.1 2 81
3.4 5.0 3.4 3.4 2 83
6.4 5.4 6.6 4.8 2 81
5.5 5.3 5.3 3.8 2 84
4.7 4.1 5.0 3.7 2 83
4.1 4.0 4.1 4.0 2 80
;
PROC REG DATA=EX611;
   MODEL QUALITY=FORM RESIDUE REGION1/R;
   TITLE 'MODEL WITH DUMMY VARIABLE';
PROC REG DATA=EX611;
   MODEL QUALITY=FORM RESIDUE REGION1 FR RR/R;
   TITLE 'INTERACTION MODEL WITH DUMMY VARIABLE';
RUN; QUIT;

  27. 6-4 Other Aspects of Regression MODEL WITH DUMMY VARIABLE The REG Procedure Model: MODEL1 Dependent Variable: QUALITY Number of Observations Read 24 Number of Observations Used 24 Analysis of Variance Sum of Mean Source DF Squares Square F Value Pr > F Model 3 339.74858 113.24953 23.05 <.0001 Error 20 98.25142 4.91257 Corrected Total 23 438.00000 Root MSE 2.21643 R-Square 0.7757 Dependent Mean 84.50000 Adj R-Sq 0.7420 CoeffVar 2.62300 Parameter Estimates Parameter Standard Variable DF Estimate Error t Value Pr > |t| Intercept 1 89.80615 2.99018 30.03 <.0001 FORM 1 1.81923 0.32599 5.58 <.0001 RESIDUE 1 -3.37945 0.68582 -4.93 <.0001 REGION1 1 -3.40619 0.91941 -3.70 0.0014

  28. 6-4 Other Aspects of Regression INTERACTION MODEL WITH DUMMY VARIABLE The REG Procedure Model: MODEL1 Dependent Variable: QUALITY Number of Observations Read 24 Number of Observations Used 24 Analysis of Variance Sum of Mean Source DF Squares Square F Value Pr > F Model 5 342.36626 68.47325 12.89 <.0001 Error 18 95.63374 5.31299 Corrected Total 23 438.00000 Root MSE 2.30499 R-Square 0.7817 Dependent Mean 84.50000 Adj R-Sq 0.7210 CoeffVar 2.72780 Parameter Estimates Parameter Standard Variable DF Estimate Error t Value Pr > |t| Intercept 1 88.25694 4.83957 18.24 <.0001 FORM 1 1.98252 0.42919 4.62 0.0002 RESIDUE 1 -3.21529 0.95251 -3.38 0.0034 REGION1 1 -1.70670 6.57164 -0.26 0.7980 FR 1 -0.64190 0.94343 -0.68 0.5049 RR 1 0.43032 1.89360 0.23 0.8228

  29. 6-4 Other Aspects of Regression Example
OPTIONS NOOVP NODATE NONUMBER;
DATA appraise;
   INPUT price units age size parking area cond$ @@;
   IF COND='F' THEN COND1=1; ELSE COND1=0;
   IF COND='G' THEN COND2=1; ELSE COND2=0;
CARDS;
90300 4 82 4635 0 4266 F    384000 20 13 17798 0 14391 G
157500 5 66 5913 0 6615 G   676200 26 64 7750 6 34144 E
165000 5 55 5150 0 6120 G   300000 10 65 12506 0 14552 G
108750 4 82 7160 0 3040 G   276538 11 23 5120 0 7881 G
420000 20 18 11745 20 12600 G   950000 62 71 21000 3 39448 G
560000 26 74 11221 0 30000 G    268000 13 56 7818 13 8088 F
290000 9 76 4900 0 11315 E  173200 6 21 5424 6 4461 G
323650 11 24 11834 8 9000 G 162500 5 19 5246 5 3828 G
353500 20 62 11223 2 13680 F    134400 4 70 5834 0 4680 E
187000 8 19 9075 0 7392 G   93600 4 82 6864 0 3840 F
110000 4 50 4510 0 3092 G   573200 14 10 11192 0 23704 E
79300 4 82 7425 0 3876 F    272000 5 82 7500 0 9542 E
;
PROC REG DATA=APPRAISE;
   MODEL PRICE=UNITS AGE AREA COND1 COND2/R;
   TITLE 'REDUCED MODEL WITH DUMMY VARIABLE';
RUN; QUIT;

  30. 6-4 Other Aspects of Regression REDUCED MODEL WITH DUMMY VARIABLE The REG Procedure Model: MODEL1 Dependent Variable: price Number of Observations Read 24 Number of Observations Used 24 Analysis of Variance Sum of Mean Source DF Squares Square F Value Pr > F Model 5 1.040122E12 2.080244E11 253.02 <.0001 Error 18 14799036255 822168681 Corrected Total 23 1.054921E12 Root MSE 28673 R-Square 0.9860 Dependent Mean 296193 Adj R-Sq 0.9821 CoeffVar 9.68067 Parameter Estimates Parameter Standard Variable DF Estimate Error t Value Pr > |t| Intercept 1 176940 24044 7.36 <.0001 units 1 7256.44825 1185.16990 6.12 <.0001 age 1 -1155.95291 258.66820 -4.47 0.0003 area 1 11.86451 1.54837 7.66 <.0001 COND1 1 -61240 22434 -2.73 0.0138 COND2 1 -61572 18756 -3.28 0.0041

  31. 6-4 Other Aspects of Regression REDUCED MODEL WITH DUMMY VARIABLE The REG Procedure Model: MODEL1 Dependent Variable: price Output Statistics Dependent Predicted Std Error Std Error Student Cook's Obs Variable Value Mean Predict Residual ResidualResidual -2-1 0 1 2 D 1 90300 100552 13370 -10252 25365 -0.404 | | | 0.008 2 384000 416212 11510 -32212 26262 -1.227 | **| | 0.048 3 157500 153841 10976 3659 26489 0.138 | | | 0.001 4 676200 696729 18464 -20529 21938 -0.936 | *| | 0.103 5 165000 160684 9531 4316 27043 0.160 | | | 0.001 6 300000 285448 13119 14552 25496 0.571 | |* | 0.014 7 108750 85674 14143 23076 24943 0.925 | |* | 0.046 8 276538 262106 9519 14432 27047 0.534 | |* | 0.006 9 420000 389183 11460 30817 26284 1.172 | |** | 0.044 10 950000 951227 26362 -1227 11278 -0.109 | | | 0.011 11 560000 574431 19411 -14431 21104 -0.684 | *| | 0.066 12 268000 241261 13868 26739 25097 1.065 | |** | 0.058 13 290000 288643 14681 1357 24630 0.0551 | | | 0.000 14 173200 187560 10295 -14360 26762 -0.537 | *| | 0.007 15 323650 274227 9146 49423 27176 1.819 | |*** | 0.062 16 162500 175105 10710 -12605 26598 -0.474 | | | 0.006 17 353500 351466 14254 2034 24879 0.0817 | | | 0.000 18 134400 180575 17168 -46175 22966 -2.011 | ****| | 0.377 19 187000 239159 10122 -52159 26827 -1.944 | ***| | 0.090 20 93600 95497 13315 -1897 25395 -0.0747 | | | 0.000 21 110000 123281 9601 -13281 27018 -0.492 | | | 0.005 22 573200 548207 20456 24993 20093 1.244 | |** | 0.267 23 79300 95924 13318 -16624 25393 -0.655 | *| | 0.020 24 272000 231646 15028 40354 24420 1.653 | |*** | 0.172 Sum of Residuals 0 Sum of Squared Residuals 14799036255 Predicted Residual SS (PRESS) 26406332001

  32. 6-4 Other Aspects of Regression Analysis of Covariance Suppose we have the following setup: for each treatment level we observe pairs of a covariate X and a response Y. [Table of Y and X values by treatment level.] Suppose X and Y are linearly related. We are interested in comparing the means of Y at the different levels of the treatment. Suppose a plot of the data looks like the following.

  33. 6-4 Other Aspects of Regression Why Use Covariates? Concomitant variables or covariates are used to adjust for factors that influence the Y measurements. In randomized block designs we did the same thing, but there we could control the value of the block variable. Now we assume we can measure the variable but not control it. The plot on the previous page demonstrates why we need covariates in some situations. If the covariate (X) were ignored, we would most likely conclude that treatment level 3 resulted in a larger mean than levels 1 and 4 but not different from level 2. If the linear relation is extended, we see that the value of Y in level 3 could very well be less than that of level 1, nearly equal to that of level 4, and surely less than that of level 2. One assumption we need, equivalent to the no-interaction assumption in two-way ANOVA, is that the slope of the linear relationship between X and Y is the same in each treatment level.

  34. 6-4 Other Aspects of Regression Checking for Equal Slopes The model we fit first lets each treatment level have its own intercept and slope: for treatment i (i = 1, …, r), Y-intercept = β0 + αi and slope = β1 + γi. The test of equal slopes is H0: γ1 = γ2 = … = γr = 0. If we fail to reject this, we return to the model without the interaction term and test H0: α1 = α2 = … = αr = 0 (no treatment differences after adjusting for X).
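One way to see the equal-slopes model concretely is through the design matrix it implies: an intercept, r − 1 treatment dummies, the covariate, and r − 1 dummy-by-covariate interaction columns. A sketch for r = 4 treatment levels (8 columns in all, i.e. 7 model degrees of freedom plus the intercept):

```python
# Row of the ANCOVA design matrix with interaction: intercept, r-1
# treatment dummies, covariate x, and dummy*x columns. The partial F-test
# on the interaction block is the test of equal slopes.
def ancova_row(treatment, x, levels):
    dummies = [1 if treatment == lvl else 0 for lvl in levels[:-1]]
    return [1] + dummies + [x] + [d * x for d in dummies]

levels = [1, 2, 3, 4]            # e.g. four glue formulations
row = ancova_row(2, 12.0, levels)
# columns: intercept | d1 d2 d3 | x | d1*x d2*x d3*x
```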

  35. 6-4 Other Aspects of Regression EXAMPLE Four different formulations of an industrial glue are being tested. The tensile strength of the glue is also related to the thickness. Five observations on strength (Y) and thickness (X) in 0.01 inches are obtained for each formulation. The data are shown in the following table.

  36. 6-4 Other Aspects of Regression Example
OPTIONS NOOVP NODATE NONUMBER;
DATA GLUE;
   INPUT FORMULA STRENGTH THICK @@;
CARDS;
1 46.5 13  1 45.9 14  1 49.8 12  1 46.1 12  1 44.3 14
2 48.7 12  2 49.0 10  2 50.1 11  2 48.5 12  2 45.2 14
3 46.3 15  3 47.1 14  3 48.9 11  3 48.2 11  3 50.3 10
4 44.7 16  4 43.0 15  4 51.0 10  4 48.1 12  4 48.6 11
;
PROC GLM DATA=GLUE;
   CLASS FORMULA;
   MODEL STRENGTH=FORMULA THICK FORMULA*THICK;
   OUTPUT OUT=OUT1 P=PRED R=RESID;
   TITLE 'ANALYSIS OF COVARIANCE WITH INTERACTION';
PROC PLOT DATA=OUT1;
   PLOT STRENGTH*THICK=FORMULA;
   PLOT PRED*THICK=FORMULA;
   PLOT RESID*PRED=FORMULA;
PROC GLM DATA=GLUE;
   CLASS FORMULA;
   MODEL STRENGTH=FORMULA THICK/SOLUTION;
   OUTPUT OUT=OUT2 P=PRED R=RESID;
   LSMEANS FORMULA/PDIFF STDERR;
   TITLE 'ANALYSIS OF COVARIANCE WITHOUT INTERACTION';
PROC PLOT DATA=OUT2;
   PLOT STRENGTH*THICK=FORMULA;
   PLOT PRED*THICK=FORMULA;
   PLOT RESID*PRED=FORMULA;
RUN; QUIT;

  37. 6-4 Other Aspects of Regression ANALYSIS OF COVARIANCE WITH INTERACTION The GLM Procedure Dependent Variable: STRENGTH Sum of Source DF Squares Mean Square F Value Pr > F Model 7 74.01777794 10.57396828 7.22 0.0016 Error 12 17.56772206 1.46397684 Corrected Total 19 91.58550000 R-Square CoeffVar Root MSE STRENGTH Mean 0.808182 2.546457 1.209949 47.51500 Source DF Type I SS Mean Square F Value Pr > F FORMULA 3 11.05750000 3.68583333 2.52 0.1076 THICK 1 59.56576027 59.56576027 40.69 <.0001 THICK*FORMULA 3 3.39451766 1.13150589 0.77 0.5312 Source DF Type III SS Mean Square F Value Pr > F FORMULA 3 2.80437055 0.93479018 0.64 0.6046 THICK 1 41.34340945 41.34340945 28.24 0.0002 THICK*FORMULA 3 3.39451766 1.13150589 0.77 0.5312

  38. 6-4 Other Aspects of Regression ANALYSIS OF COVARIANCE WITH INTERACTION
[PROC PLOT output: PRED vs. THICK, plotting symbol = value of FORMULA. NOTE: 7 observations hidden.]
[PROC PLOT output: STRENGTH vs. THICK, plotting symbol = value of FORMULA.]

  39. 6-4 Other Aspects of Regression ANALYSIS OF COVARIANCE WITHOUT INTERACTION The GLM Procedure Dependent Variable: STRENGTH Sum of Source DF Squares Mean Square F Value Pr > F Model 4 70.62326027 17.65581507 12.63 0.0001 Error 15 20.96223973 1.39748265 Corrected Total 19 91.58550000 R-Square CoeffVar Root MSE STRENGTH Mean 0.771118 2.487955 1.182152 47.51500 Source DF Type I SS Mean Square F Value Pr > F FORMULA 3 11.05750000 3.68583333 2.64 0.0876 THICK 1 59.56576027 59.56576027 42.62 <.0001 Source DF Type III SS Mean Square F Value Pr > F FORMULA 3 1.77104066 0.59034689 0.42 0.7397 THICK 1 59.56576027 59.56576027 42.62 <.0001 Standard Parameter Estimate Error t Value Pr > |t| Intercept 60.00712329 B 2.04941586 29.28 <.0001 FORMULA 1 -0.35801370 B 0.74829823 -0.48 0.6392 FORMULA 2 0.21006849 B 0.76349365 0.28 0.7870 FORMULA 3 0.47404110 B 0.75339742 0.63 0.5387 FORMULA 4 0.00000000 B . . . THICK -1.00993151 0.15469162 -6.53 <.0001 NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.

  40. 6-4 Other Aspects of Regression ANALYSIS OF COVARIANCE WITHOUT INTERACTION The GLM Procedure Least Squares Means STRENGTH Standard LSMEAN FORMULA LSMEAN Error Pr > |t| Number 1 47.0754623 0.5354766 <.0001 1 2 47.6435445 0.5381512 <.0001 2 3 47.9075171 0.5300869 <.0001 3 4 47.4334760 0.5314395 <.0001 4 Least Squares Means for effect FORMULA Pr > |t| for H0: LSMean(i)=LSMean(j) Dependent Variable: STRENGTH i/j 1 2 3 4 1 0.4722 0.2895 0.6392 2 0.4722 0.7298 0.7870 3 0.2895 0.7298 0.5387 4 0.6392 0.7870 0.5387 NOTE: To ensure overall protection level, only probabilities associated with pre-planned comparisons should be used.

  41. 6-4 Other Aspects of Regression ANALYSIS OF COVARIANCE WITHOUT INTERACTION
[PROC PLOT output: STRENGTH vs. THICK, plotting symbol = value of FORMULA.]
[PROC PLOT output: PRED vs. THICK, plotting symbol = value of FORMULA. NOTE: 4 observations hidden.]

  42. 6-4 Other Aspects of Regression
[PROC PLOT output: RESID vs. PRED for ANALYSIS OF COVARIANCE WITH INTERACTION, plotting symbol = value of FORMULA.]
[PROC PLOT output: RESID vs. PRED for ANALYSIS OF COVARIANCE WITHOUT INTERACTION, plotting symbol = value of FORMULA.]

  43. 6-4 Other Aspects of Regression 6-4.3 Variable Selection Procedures Best Subsets Regressions. Selection criteria: R2, MSE, and Cp.

  44. 6-4 Other Aspects of Regression 6-4.3 Variable Selection Procedures Backward Elimination: start with all regressors in the model; at each step, the regressor with the smallest absolute t-value is eliminated first, stopping when every remaining t-value exceeds the cutoff. Using Minitab's cutoff, the retained regressors are form, residue, and region.

  45. 6-4 Other Aspects of Regression 6-4.3 Variable Selection Procedures Forward Selection: start with no regressors in the model; at each step, the candidate with the largest absolute t-value is added first, stopping when no candidate exceeds the cutoff. Using Minitab's cutoff, the selected regressors are form, residue, region, and scent.

  46. 6-4 Other Aspects of Regression 6-4.3 Variable Selection Procedures Stepwise Regression: begins with forward-selection steps, each followed by backward-elimination checks; here t_in = t_out (the entry and removal cutoffs are equal). Using Minitab's cutoff, the selected regressors are form, residue, and region.
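The forward step can be sketched as a loop: at each stage, fit every one-variable extension of the current model, add the candidate that most reduces SSE if its F-to-enter statistic clears a cutoff, and stop otherwise. This toy version (pure Python, normal-equations least squares, and a fixed F-to-enter of 4.0 as an illustrative cutoff) is a sketch of the idea, not of any particular package's algorithm:

```python
# Toy forward selection by F-to-enter. rows are tuples of regressor
# values; candidates and chosen hold column indices into those tuples.
def sse_fit(rows, y, cols):
    # least squares with an intercept, via normal equations + elimination
    X = [[1.0] + [row[c] for c in cols] for row in rows]
    p, n = len(X[0]), len(X)
    a = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
         + [sum(X[i][j] * y[i] for i in range(n))] for j in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda j: abs(a[j][col]))
        a[col], a[piv] = a[piv], a[col]
        for j in range(col + 1, p):
            f = a[j][col] / a[col][col]
            for k in range(col, p + 1):
                a[j][k] -= f * a[col][k]
    b = [0.0] * p
    for j in range(p - 1, -1, -1):
        b[j] = (a[j][p] - sum(a[j][k] * b[k] for k in range(j + 1, p))) / a[j][j]
    return sum((y[i] - sum(X[i][k] * b[k] for k in range(p))) ** 2
               for i in range(n))

def forward_select(rows, y, candidates, f_in=4.0):
    chosen = []
    ybar = sum(y) / len(y)
    current = sum((v - ybar) ** 2 for v in y)   # SSE of intercept-only model
    while candidates:
        best_sse, best = min((sse_fit(rows, y, chosen + [c]), c)
                             for c in candidates)
        gain = current - best_sse
        if gain < 1e-9:                          # no real improvement
            break
        df = len(y) - len(chosen) - 2            # residual df after adding
        f_stat = float("inf") if best_sse < 1e-9 else gain / (best_sse / df)
        if f_stat < f_in:
            break
        chosen.append(best)
        candidates = [c for c in candidates if c != best]
        current = best_sse
    return chosen

# Tiny check: y depends on column 0 exactly, so only column 0 should enter.
rows = [(1, 1), (2, 1), (3, 2), (4, 2)]
y = [1.0, 2.0, 3.0, 4.0]
selected = forward_select(rows, y, [0, 1])
```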

  47. 6-4 Other Aspects of Regression Example
OPTIONS NODATE NOOVP NONUMBER;
DATA SALES;
   INFILE 'C:\users\myung\Documents\Teaching\학부과목\imen214-stats\ch06\sales.dat';
   INPUT SALES TIME POTENT ADVERT SHARE CHANGE ACCOUNTS WORKLOAD RATING;
PROC CORR DATA=SALES;
   VAR SALES;
   WITH TIME POTENT ADVERT SHARE CHANGE ACCOUNTS WORKLOAD RATING;
   TITLE 'CORRELATIONS OF DEPENDENT WITH INDEPENDENTS';
PROC CORR DATA=SALES;
   VAR TIME POTENT ADVERT SHARE CHANGE ACCOUNTS WORKLOAD RATING;
   TITLE 'CORRELATIONS BETWEEN INDEPENDENT VARIABLES';
PROC REG DATA=SALES;
   MODEL SALES=TIME POTENT ADVERT SHARE CHANGE ACCOUNTS WORKLOAD RATING/VIF R;
   TITLE 'REGRESSION MODEL WITH ALL VARIABLES';
PROC RSQUARE DATA=SALES CP;
   MODEL SALES=TIME POTENT ADVERT SHARE CHANGE ACCOUNTS WORKLOAD RATING/ADJRSQ RMSE SSE;
   TITLE 'ALL POSSIBLE REGRESSIONS';
PROC STEPWISE DATA=SALES;
   MODEL SALES=TIME POTENT ADVERT SHARE CHANGE ACCOUNTS WORKLOAD RATING/FORWARD;
PROC STEPWISE DATA=SALES;
   MODEL SALES=TIME POTENT ADVERT SHARE CHANGE ACCOUNTS WORKLOAD RATING/BACKWARD;
   TITLE 'STEPWISE REGRESSION USING BACKWARD ELIMINATION';
PROC STEPWISE DATA=SALES;
   MODEL SALES=TIME POTENT ADVERT SHARE CHANGE ACCOUNTS WORKLOAD RATING;
   TITLE 'STEPWISE REGRESSION THE STEPWISE TECHNIQUE';
PROC REG DATA=SALES;
   MODEL SALES=POTENT ADVERT SHARE ACCOUNTS/R;
   MODEL SALES=POTENT ADVERT SHARE CHANGE ACCOUNTS/R;
   MODEL SALES=TIME POTENT ADVERT SHARE CHANGE/R;
   MODEL SALES=TIME POTENT ADVERT SHARE ACCOUNTS/R;
   MODEL SALES=TIME POTENT ADVERT SHARE CHANGE WORKLOAD/R;
RUN; QUIT;

  48. 6-4 Other Aspects of Regression Example
CORRELATIONS OF DEPENDENT WITH INDEPENDENTS
Pearson Correlation Coefficients, N = 25; Prob > |r| under H0: Rho=0
SALES
TIME 0.62292 0.0009
POTENT 0.59781 0.0016
ADVERT 0.59618 0.0017
SHARE 0.48351 0.0143
CHANGE 0.48014 0.0151
ACCOUNTS 0.75399 <.0001
WORKLOAD -0.11722 0.5768
RATING 0.40188 0.0464
CORRELATIONS BETWEEN INDEPENDENT VARIABLES
Pearson Correlation Coefficients, N = 25; Prob > |r| under H0: Rho=0
TIME POTENT ADVERT SHARE CHANGE ACCOUNTS WORKLOAD RATING
TIME 1.00000 0.45397 0.24919 0.10621 0.27512 0.75782 -0.17932 0.10113
          0.0226 0.2297 0.6133 0.1832 <.0001 0.3911 0.6305
POTENT 0.45397 1.00000 0.17410 -0.21067 0.22570 0.47864 -0.25884 0.35870
     0.0226      0.4052 0.3121 0.2780 0.0155 0.2115 0.0783
ADVERT 0.24919 0.17410 1.00000 0.26446 0.34826 0.20004 -0.27223 0.41146
     0.2297 0.4052      0.2014 0.0880 0.3377 0.1880 0.0410
SHARE 0.10621 -0.21067 0.26446 1.00000 0.14686 0.40301 0.34935 -0.02356
     0.6133 0.3121 0.2014      0.4836 0.0458 0.0870 0.9110
CHANGE 0.27512 0.22570 0.34826 0.14686 1.00000 0.32344 -0.29839 0.49418
     0.1832 0.2780 0.0880 0.4836      0.1148 0.1474 0.0120
ACCOUNTS 0.75782 0.47864 0.20004 0.40301 0.32344 1.00000 -0.19885 0.22861
     <.0001 0.0155 0.3377 0.0458 0.1148      0.3406 0.2717
WORKLOAD -0.17932 -0.25884 -0.27223 0.34935 -0.29839 -0.19885 1.00000 -0.27691
     0.3911 0.2115 0.1880 0.0870 0.1474 0.3406      0.1802
RATING 0.10113 0.35870 0.41146 -0.02356 0.49418 0.22861 -0.27691 1.00000
     0.6305 0.0783 0.0410 0.9110 0.0120 0.2717 0.1802

  49. 6-4 Other Aspects of Regression Example REGRESSION MODEL WITH ALL VARIABLES The REG Procedure Model: MODEL1 Dependent Variable: SALES Number of Observations Read 25 Number of Observations Used 25 Analysis of Variance Sum of Mean Source DF Squares Square F Value Pr > F Model 8 38108913 4763614 23.30 <.0001 Error 16 3270636 204415 Corrected Total 24 41379549 Root MSE 452.12251 R-Square 0.9210 Dependent Mean 3374.56760 Adj R-Sq 0.8814 CoeffVar 13.39794 Parameter Estimates Parameter Standard Variance Variable DF Estimate Error t Value Pr > |t| Inflation Intercept 1 -1642.62908 768.07059 -2.14 0.0482 0 TIME 1 1.66830 1.97176 0.85 0.4100 3.43888 POTENT 1 0.03684 0.00826 4.46 0.0004 1.97767 ADVERT 1 0.15912 0.04722 3.37 0.0039 1.89317 SHARE 1 183.52338 68.76742 2.67 0.0168 3.35939 CHANGE 1 289.66240 196.48626 1.47 0.1598 1.55819 ACCOUNTS 1 6.49811 4.81615 1.35 0.1960 5.65732 WORKLOAD 1 25.67997 34.65316 0.74 0.4694 1.89904 RATING 1 15.01902 128.57870 0.12 0.9085 1.78590

  50. 6-4 Other Aspects of Regression All Possible Regressions This is the brute-force method of modeling. It is feasible if the number of independent variables is small (less than 10 or so) and the sample size is not too large. Some of the common quantities to look at are: 1) R-square should be large, and should show an adequate increase when an additional variable is added. 2) Adj R-square should not be much less than R-square; it should show an increase if a useful variable is added. 3) Mallows Cp should be approximately the number of parameters in the model (including the y-intercept). This is a good measure for narrowing down the possible models quickly; then use 1) and 2) to pick the final models. 4) The model should make sense. Note: Many of the better methods of model selection are too time-consuming to use on all possible regressions. A number of good models can be chosen first, and then the better methods applied to them.
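Mallows Cp from criterion 3) can be written directly. As a sanity check, plugging in the full sales model printed earlier (SSE = 3270636, MSE = 204415, n = 25, p = 9 parameters including the intercept) returns Cp ≈ p, as it must for the full model:

```python
# Mallows' Cp = SSE_p / MSE_full - (n - 2p), with p counting the intercept.
def mallows_cp(sse_p, mse_full, n, p):
    return sse_p / mse_full - (n - 2 * p)

cp_full = mallows_cp(3270636, 204415, 25, 9)   # about 9.0 for the full model
```

For a candidate subset, a Cp much larger than its p signals substantial bias from the omitted regressors.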
