
Simple linear regression and correlation analysis


Presentation Transcript


  1. Simple linear regression and correlation analysis • Regression • Correlation • Significance testing

  2. 1. Simple linear regression analysis • Simple regression describes the relationship between two variables • Two variables, generally Y = f(X) • Y = dependent variable (regressand) • X = independent variable (regressor)

  3. Simple linear regression • model: yi = f(xi) + ei • f(x) – regression equation • ei – random error (residual deviation), an independent random quantity • ei ~ N(0, σ²)

  4. Simple linear regression – straight line • ŷ = b0 + b1·x • b0 = constant (intercept) • b1 = coefficient of regression (slope)

  5. Parameter estimates → least squares condition • S = Σ(yi − ŷi)² → min., i.e. the differences of the actual y from the estimated ŷ are minimal • n is the number of observations (yi, xi) • the estimates are found by taking partial derivatives of the sum of squared deviations S with respect to the parameters b0, b1 and equating them to zero:

  6. Two approaches to parameter estimation using the least squares condition (shown for the straight-line equation) • Normal equation system for the straight line: Σyi = n·b0 + b1·Σxi and Σxi·yi = b0·Σxi + b1·Σxi² • Matrix computation approach: y = X·b + ε, with parameter estimate b = (Xᵀ·X)⁻¹·Xᵀ·y (see the sketch below) • y = dependent variable vector • X = independent variable matrix • b = vector of regression coefficients (straight line → b0 and b1) • ε = vector of random errors
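A minimal sketch of the matrix approach, in Python with numpy (an assumption; the slides do not prescribe software), on made-up illustrative data:

```python
import numpy as np

# made-up illustrative data (hypothetical values)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# design matrix X: a column of ones (for b0) and the regressor x (for b1)
X = np.column_stack([np.ones_like(x), x])

# least squares estimate b = (X'X)^(-1) X'y, solved as a linear system
b = np.linalg.solve(X.T @ X, X.T @ y)
print("b0 =", b[0], "b1 =", b[1])
```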

  7. Simple linear regression • observation yi • smoothed (fitted) values ŷi, also written yi′ • residual deviation ei = yi − ŷi • residual sum of squares SSE = Σei² • residual variance s² = SSE / (n − 2) (a sketch follows below)
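A short sketch of the residual quantities, assuming numpy and the same made-up data; np.polyfit fits the straight line:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1, b0 = np.polyfit(x, y, 1)      # fitted straight line
y_hat = b0 + b1 * x               # smoothed values
e = y - y_hat                     # residual deviations
sse = np.sum(e**2)                # residual sum of squares
s2 = sse / (len(x) - 2)           # residual variance (n - 2 degrees of freedom)
print("SSE =", sse, "s^2 =", s2)
```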

  8. Simple lin. reg. → dependence of Y on X • Straight line equation ŷ = b0 + b1·x • Normal equation system (slide 6) • Parameter estimates – computational formulas: b1 = (Σxi·yi − n·x̄·ȳ) / (Σxi² − n·x̄²), b0 = ȳ − b1·x̄ (see the sketch below)
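The computational formulas as a minimal Python sketch, assuming the same made-up data; it should reproduce the matrix result above:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

# computational formulas for the straight line of y on x
b1 = (np.sum(x * y) - n * x.mean() * y.mean()) / (np.sum(x**2) - n * x.mean()**2)
b0 = y.mean() - b1 * x.mean()
print("b0 =", b0, "b1 =", b1)
```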

  9. Simple lin. reg. → dependence of X on Y • Associated straight line equation x̂ = b0′ + b1′·y • Parameter estimates – computational formulas with the roles of x and y exchanged: b1′ = (Σxi·yi − n·x̄·ȳ) / (Σyi² − n·ȳ²), b0′ = x̄ − b1′·ȳ

  10. 2. Correlation analysis • correlation analysis measures the strength of dependence – coefficient of correlation r • |r| is in <0; 1> • |r| in <0; 0.33> – weak dependence • |r| in <0.34; 0.66> – medium strong dependence • |r| in <0.67; 1> – strong to very strong dependence • r² = coefficient of determination, the proportion (%) of the variance of Y that is explained by the effect of X (a sketch follows below)
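A minimal sketch of r and r² in Python, assuming numpy and the same made-up data; the classification thresholds are the ones from this slide:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

r = np.corrcoef(x, y)[0, 1]       # coefficient of correlation
r2 = r**2                         # coefficient of determination

# verbal classification of |r| from this slide
if abs(r) <= 0.33:
    strength = "weak"
elif abs(r) <= 0.66:
    strength = "medium strong"
else:
    strength = "strong to very strong"
print(r, r2, strength)
```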

  11. 3. Significance testing in simple regression

  12. Significance test of parameter b1 (straight line) (two-sided) • test criterion t = b1 / sb1 • sb1 = estimated standard error of parameter b1 • table value t(α/2)(n − 2) (two-sided) • if |test criterion| > table value → H0 is rejected and H1 is valid; if α > p-value → H0 is rejected (see the sketch after slide 13)

  13. Coefficient of regression – interval estimation • interval estimate for the unknown βi: bi ± t(α/2)(n − 2) · sbi
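A combined sketch of the b1 significance test (slide 12) and the interval estimate (slide 13), assuming scipy for the t quantiles and the same made-up data:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x
s2 = np.sum((y - y_hat)**2) / (n - 2)              # residual variance
s_b1 = np.sqrt(s2 / np.sum((x - x.mean())**2))     # standard error of b1

t_stat = b1 / s_b1                                 # test criterion, H0: beta1 = 0
alpha = 0.05
t_tab = stats.t.ppf(1 - alpha / 2, n - 2)          # table value (two-sided)
p_val = 2 * stats.t.sf(abs(t_stat), n - 2)

ci = (b1 - t_tab * s_b1, b1 + t_tab * s_b1)        # interval estimate for beta1
print(t_stat, t_tab, p_val, ci)
```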

  14. Significance test of the coefficient of correlation r (straight line) (two-sided) • test criterion t = r·√(n − 2) / √(1 − r²) • table value t(α/2)(n − 2) (two-sided) • if |test criterion| > table value → H0 is rejected and H1 is valid; if α > p-value → H0 is rejected (see the sketch below)
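A sketch of the test of r under the same assumptions:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

r = np.corrcoef(x, y)[0, 1]
t_stat = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)    # test criterion, H0: rho = 0
alpha = 0.05
t_tab = stats.t.ppf(1 - alpha / 2, n - 2)          # table value (two-sided)
p_val = 2 * stats.t.sf(abs(t_stat), n - 2)
print(t_stat, t_tab, p_val)
```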

  15. Coefficient of correlation – interval estimation • for small samples and non-normal distributions • Fisher Z-transformation • first r is converted to Z (from tables): Z = ½·ln((1 + r) / (1 − r)) • interval estimate for the unknown ρ: Z ± u(α/2) / √(n − 3) • in the last step the bounds Z1 and Z2 are converted back to r1 and r2 (a sketch follows below)
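A sketch of the Fisher Z interval, assuming scipy; the sample values r = 0.9 and n = 30 are made up:

```python
import numpy as np
from scipy import stats

r, n, alpha = 0.9, 30, 0.05              # made-up sample values

z = np.arctanh(r)                        # Fisher Z = 0.5 * ln((1 + r) / (1 - r))
u = stats.norm.ppf(1 - alpha / 2)        # normal quantile u(alpha/2)
z1 = z - u / np.sqrt(n - 3)
z2 = z + u / np.sqrt(n - 3)

r1, r2 = np.tanh(z1), np.tanh(z2)        # bounds Z1, Z2 converted back to r1, r2
print(r1, r2)
```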

  16. The summary ANOVA • decomposition of the total variability of Y: SST = SSR + SSE • regression: SSR = Σ(ŷi − ȳ)², df = 1, MSR = SSR / 1 • residual: SSE = Σ(yi − ŷi)², df = n − 2, MSE = SSE / (n − 2) • total: SST = Σ(yi − ȳ)², df = n − 1

  17. The summary ANOVA (alternatively) • test criterion F = MSR / MSE = (SSR / 1) / (SSE / (n − 2)) • table value F(α)(1; n − 2) • if test criterion > table value → H0 is rejected (a sketch follows below)
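A sketch of the summary ANOVA F-test under the same assumptions:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

ssr = np.sum((y_hat - y.mean())**2)      # regression sum of squares, df = 1
sse = np.sum((y - y_hat)**2)             # residual sum of squares, df = n - 2

f_stat = (ssr / 1) / (sse / (n - 2))     # test criterion F = MSR / MSE
alpha = 0.05
f_tab = stats.f.ppf(1 - alpha, 1, n - 2) # table value F(alpha)(1; n - 2)
print(f_stat, f_tab, f_stat > f_tab)
```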

  18. Multicollinearity • a relationship between (among) the independent variables • when there is an almost perfectly linear relationship among the independent variables (X1; X2 … XN), multicollinearity is high • the relationships need to be analyzed before the model is formed • the linear independence of the columns (variables) of X is disturbed

  19. Causes of multicollinearity • trends in time series, similar tendencies among the variables (regressors) • inclusion of exogenous variables, lags • use of 0/1 coding in the sample

  20. Consequences of multicollinearity • wrong sampling • the null hypothesis of a zero regression coefficient is not rejected even though it really should be • confidence intervals are wide • the regression coefficient estimates are strongly influenced by changes in the data • a regression coefficient can have the wrong sign • the regression equation is not suitable for prediction

  21. Testing of multicollinearity • paired coefficients of correlation – t-test • Farrar–Glauber test • test criterion χ² = −(n − 1 − (2m + 5)/6) · ln |R|, where |R| is the determinant of the correlation matrix of the m regressors • table value χ²(α) with m·(m − 1)/2 degrees of freedom • if test criterion > table value → H0 is rejected (a sketch follows below)
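A sketch of the Farrar–Glauber test as given above, on a made-up regressor matrix with deliberate near-collinearity; numpy and scipy are assumed:

```python
import numpy as np
from scipy import stats

# made-up regressor matrix: n observations of m = 3 variables,
# with X3 deliberately almost collinear with X1
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
X[:, 2] = X[:, 0] + 0.05 * rng.normal(size=50)

n, m = X.shape
R = np.corrcoef(X, rowvar=False)          # correlation matrix of the regressors

# Farrar-Glauber test criterion
chi2_stat = -(n - 1 - (2 * m + 5) / 6) * np.log(np.linalg.det(R))
df = m * (m - 1) / 2
chi2_tab = stats.chi2.ppf(0.95, df)       # table value, alpha = 0.05
print(chi2_stat, chi2_tab, chi2_stat > chi2_tab)
```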

  22. Elimination of multicollinearity • excluding variables • getting a new sample • re-formulating and re-thinking the model (the chosen variables) • transforming variables – recomputing the chosen variables (not total consumption, but consumption per capita, etc.)

  23. Regression diagnostics • quality of the data for the chosen model • suitability of the model for the chosen dataset • method conditions

  24. Data quality evaluation • A) outlying observations in the „y" set • studentized residuals: |SR| > 2 → outlying observation • an outlying observation need not be influential (an influential observation has a cardinal influence on the regression)

  25. Data quality evaluation • B) outlying observations in the „x" set • Hat Diag leverage hii – diagonal values of the hat matrix H, H = X·(XᵀX)⁻¹·Xᵀ • hii > 2p/n (a common cutoff, with p parameters and n observations) → outlying observation

  26. Data quality evaluation • C) influential observations • Cook's D (an influential observation influences the whole equation): Di > 4/n (a common cutoff) → influential observation • Welsch–Kuh DFFITS distance (an influential observation influences its own smoothed value): |DFFITS| > 2·√(p/n) (a common cutoff) → influential observation (a sketch follows below)
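A sketch of these diagnostics using statsmodels (an assumption; the slides name no tool), with the common cutoffs noted above; the data are made up and the last point is deliberately suspect:

```python
import numpy as np
import statsmodels.api as sm

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 30.0])  # last point suspect

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()
infl = res.get_influence()

n, p = len(y), X.shape[1]
sr = infl.resid_studentized_external      # studentized residuals, |SR| > 2
hii = infl.hat_matrix_diag                # leverage, hii > 2p/n
cook_d = infl.cooks_distance[0]           # Cook's D, Di > 4/n
dffits = infl.dffits[0]                   # |DFFITS| > 2*sqrt(p/n)

print(np.where(np.abs(sr) > 2)[0])                      # outlying in y
print(np.where(hii > 2 * p / n)[0])                     # outlying in x
print(np.where(cook_d > 4 / n)[0])                      # influential (Cook)
print(np.where(np.abs(dffits) > 2 * np.sqrt(p / n))[0]) # influential (DFFITS)
```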

  27. Method conditions • regression parameters can take any value in (−∞; +∞) • the regression model is linear in its parameters (if not linear – transform the data) • independence of the residuals • normal distribution of the residuals, N(0; σ²) (a sketch follows below)
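A short sketch of checking the residual conditions; the Shapiro–Wilk test for normality and the Durbin–Watson statistic for independence are common choices, though the slide does not name specific tests:

```python
import numpy as np
from scipy import stats
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1])

res = sm.OLS(y, sm.add_constant(x)).fit()
e = res.resid

# normality of residuals: Shapiro-Wilk, H0 = residuals are normally distributed
w_stat, p_norm = stats.shapiro(e)

# independence of residuals: Durbin-Watson (values near 2 suggest independence)
dw = durbin_watson(e)
print(p_norm, dw)
```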
