
Regression: (1) Simple Linear Regression



  1. Regression: (1) Simple Linear Regression Hal Whitehead BIOL4062 / 5062

  2. Regression • Purposes of regression • Simple linear regression • Formula • Assumptions • If assumptions hold, what can we do? • Testing assumptions • When assumptions do not hold

  3. Regression • One dependent variable: Y • Independent variables: X1, X2, X3, ...

  4. Purposes of Regression 1. Relationship between Y and X's 2. Quantitative prediction of Y 3. Relationship between Y and X controlling for C 4. Which of X's are most important? 5. Best mathematical model 6. Compare regression relationships: Y1 on X, Y2 on X 7. Assess interactive effects of X's

  5. Simple regression: one X • Multiple regression: two or more X's

  6. Simple linear regression Y = β0 + β1X + Error
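A minimal sketch of fitting this model by ordinary least squares in Python; the data are invented for illustration, but the estimator is the standard closed form:

```python
# Minimal OLS sketch for Y = b0 + b1*X + error.
# The data below are illustrative, not from the lecture.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Closed-form least-squares estimates
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```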

  7. Assumptions of simple linear regression 1. Existence 2. Independence 3. Linearity 4. Homoscedasticity 5. Normality 6. X measured without error

  8. Assumptions of simple linear regression 1. For any fixed value of X, Y is a random variable with a certain probability distribution having finite mean and variance (Existence) [Figure: probability distribution of Y at a fixed X]

  9. Assumptions of simple linear regression 2. The Y values are statistically independent of one another (Independence)

  10. Assumptions of simple linear regression 3. The mean value of Y given X is a straight line function of X (Linearity) [Figure: the means of Y fall on a straight line in X]

  11. Assumptions of simple linear regression 4. The variance of Y is the same for all X (Homoscedasticity) [Figure: Y distributions with equal spread at every X]

  12. Assumptions of simple linear regression 5. For any fixed value of X, Y has a normal distribution (Normality) [Figure: normal distribution of Y at each X]

  13. Assumptions of simple linear regression 6. There are no measurement errors in X (X measured without error)

  14. Assumptions of simple linear regression 1. Existence 2. Independence 3. Linearity 4. Homoscedasticity 5. Normality 6. X measured without error

  15. If assumptions hold, what can we do? 1. Estimate β0 (intercept), β1 (slope), together with measures of uncertainty 2. Describe quality of fit (variation of data around straight line) by estimate of σ² or r² 3. Tests of slope and intercept 4. Prediction and prediction bands 5. ANOVA Table

  16. Parameters estimated using least-squares • Age-specific pregnancy rates of female sperm whales (from Best et al. 1984 Rep. int. Whal. Commn. Spec. Issue) • Find the line that minimizes the sum of squared residuals

  17. 1. Estimate β0 (intercept), β1 (slope), together with measures of uncertainty • Age-specific pregnancy rates of female sperm whales (from Best et al. 1984 Rep. int. Whal. Commn. Spec. Issue)

  18. 1. Estimate β0 (intercept), β1 (slope), together with measures of uncertainty • β0 = 0.230 (SE 0.028) • 95% c.i.: 0.164; 0.296 • β1 = -0.0035 (SE 0.0009) • 95% c.i.: -0.0056; -0.0013
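A sketch of how such estimates, standard errors, and 95% intervals come out of a standard library fit. The age/pregnancy-rate numbers below are invented stand-ins, not Best et al.'s data, and scipy >= 1.6 is assumed for the intercept_stderr attribute:

```python
# Sketch: slope/intercept estimates, SEs, and 95% c.i.'s via scipy.
import numpy as np
from scipy import stats

age = np.array([10.0, 15, 20, 25, 30, 35, 40, 45, 50])
rate = np.array([0.21, 0.19, 0.17, 0.14, 0.13, 0.10, 0.09, 0.07, 0.06])

res = stats.linregress(age, rate)
tcrit = stats.t.ppf(0.975, df=len(age) - 2)   # two-sided 95%
print(f"b1 = {res.slope:.4f} (SE {res.stderr:.4f}), "
      f"95% c.i.: {res.slope - tcrit * res.stderr:.4f}; "
      f"{res.slope + tcrit * res.stderr:.4f}")
print(f"b0 = {res.intercept:.4f} (SE {res.intercept_stderr:.4f})")
```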

  19. 2. Describe quality of fit by estimate of σ² or r² • σ² = 0.0195 • r² = 0.679 • r² (adjusted) = 0.633 (proportion of variance accounted for by the regression)
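Adjusted r² follows from r², the sample size, and the number of predictors; a sketch (n = 9 is an assumption here, though it does reproduce the slide's 0.633):

```python
# Adjusted r-squared for a regression with p predictors (p = 1 here):
# r2_adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
def adjusted_r2(r2: float, n: int, p: int = 1) -> float:
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# n = 9 is assumed; it is consistent with the slide's values.
print(adjusted_r2(0.679, n=9))   # ~0.633
```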

  20. 3. Tests of slope and intercept a) Slope = 0 {Equivalent to r=0} b) Slope = Predetermined constant c) Intercept = 0 d) Intercept = Predetermined constant e) Compare slopes f) Compare intercepts {Assume same slope} (tests use t-distribution)

  21. 3a) Slope = 0 {Equivalent to r = 0} Does pregnancy rate change with age? H0: β1 = 0 H1: β1 ≠ 0 P = 0.006 Does pregnancy rate decline with age? H0: β1 = 0 H1: β1 < 0 P = 0.003
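A sketch of the underlying t-test using the slide's estimates; df = 7 (i.e. n = 9) is an assumption, chosen because it is consistent with the reported P-values:

```python
# t-test of H0: slope = 0, from the reported estimate and SE.
from scipy import stats

b1, se, df = -0.0035, 0.0009, 7     # df = n - 2; n = 9 assumed
t = b1 / se
print(f"two-sided P = {2 * stats.t.sf(abs(t), df):.3f}")           # ~0.006
print(f"one-sided P (H1: slope < 0) = {stats.t.cdf(t, df):.3f}")   # ~0.003
```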

  22. 3b) Slope = Predetermined constant β1 = 2.868 (SE 0.058) 95% c.i.: 2.752; 2.984 Does shape change with length? H0: β1 = 3 (i.e. weight = length³) H1: β1 ≠ 3 P < 0.05 Weights and lengths of cetacean species (Whitehead & Mann in Cetacean Societies 2000)
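The same t machinery handles a predetermined constant; a sketch with the slide's numbers (df = 46 is an assumption borrowed from the ANOVA slide later, which appears to describe this data set):

```python
# t-test of H0: slope = 3 vs H1: slope != 3.
from scipy import stats

b1, se, df = 2.868, 0.058, 46       # df assumed from the later ANOVA slide
t = (b1 - 3) / se                   # ~ -2.28
print(f"P = {2 * stats.t.sf(abs(t), df):.3f}")   # ~0.028, i.e. P < 0.05
```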

  23. 3c) Intercept = 0 β0 = 0.436 (SE 0.080) 95% c.i.: 0.276; 0.596 Is birth length proportional to length? H0: β0 = 0 H1: β0 ≠ 0 P < 0.001

  24. 3d) Intercept = Predetermined constant ?

  25. 3e) Compare slopes β1 (m) = 2.528 (SE 0.409) β1 (o) = 2.962 (SE 0.094) Does shape change differently with length for odontocetes and mysticetes? H0: β1 (m) = β1 (o) H1: β1 (m) ≠ β1 (o) P = 0.146 Weights and Lengths of Cetacean Species Whitehead & Mann 2000

  26. 3f) Compare intercepts {Assume same slope} β0 (m) = 2.528 (SE 0.409) β0 (o) = 2.962 (SE 0.094) Are odontocetes and mysticetes equally fat? H0: β0 (m) = β0 (o) H1: β0 (m) ≠ β0 (o) P = 0.781 [Figure: Log(Weight) vs Log(Length) by ORDER, m = mysticete, o = odontocete]

  27. 4. Prediction and prediction bands [Figure: 95% confidence bands for the regression line and 95% prediction bands] From: http://www.tufts.edu/~gdallal/slr.htm
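A sketch of the standard textbook band formulas; the prediction band adds 1 inside the square root, which is why it is wider than the confidence band:

```python
# 95% confidence band (mean of Y) and prediction band (a new Y) at x0.
import numpy as np
from scipy import stats

def bands(x, y, x0):
    n = len(x)
    sxx = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    b0 = y.mean() - b1 * x.mean()
    s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))
    t = stats.t.ppf(0.975, n - 2)
    yhat = b0 + b1 * x0
    conf = t * s * np.sqrt(1 / n + (x0 - x.mean()) ** 2 / sxx)
    pred = t * s * np.sqrt(1 + 1 / n + (x0 - x.mean()) ** 2 / sxx)
    return yhat, conf, pred
```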

  28. 5. ANOVA Table

  Analysis of Variance
  Source        Sum-of-Squares    df    Mean-Square    F-ratio    P
  Regression    286.27             1    286.27         2475.07    0.00
  Residual      5.32              46    0.12
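The table's internal arithmetic is easy to verify: each mean square is SS/df, and F is the ratio of the two mean squares. A sketch:

```python
# Reconstructing the ANOVA table's derived columns from the sums of squares.
ss_reg, df_reg = 286.27, 1
ss_res, df_res = 5.32, 46

ms_reg = ss_reg / df_reg        # 286.27
ms_res = ss_res / df_res        # ~0.116 (the slide rounds to 0.12)
print(f"F = {ms_reg / ms_res:.1f} on ({df_reg}, {df_res}) df")  # ~2475
```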

  29. If assumptions hold, what can we do? 1. Estimate β0 (intercept), β1 (slope), together with measures of uncertainty 2. Describe quality of fit (variation of data around straight line) by estimate of σ²or r² 3. Tests of slope and intercept 4. Prediction and prediction bands 5. ANOVA Table

  30. Testing assumptions: diagnostics • Use residuals to look at the assumptions of regression: e(i) = Y(i) [observed] - (β0 + β1X(i)) [expected]

  31. Residuals • Residual: e(i) = Y(i) - (β0 + β1X(i)) • Standardized residuals: e(i)/S {S is the standard deviation of the residuals, with degrees of freedom adjusted for the fitted parameters} • Studentized residuals: e(i) / [S √(1 - h(i))] {h(i) is the "leverage value" of observation i: h(i) = 1/n + (X(i) - X̄)² / [(n-1)S(X)²], where X̄ = ΣX(i)/n} • Jackknifed residuals: e(i) / [S(-i) √(1 - h(i))] {the residual standard deviation S(-i) is recalculated with observation i deleted}
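A sketch implementing the three scaled residuals for simple regression; the deleted-variance step uses the standard algebraic identity, so no refitting loop is needed:

```python
# Standardized, studentized, and jackknifed residuals for Y = b0 + b1*X.
import numpy as np

def scaled_residuals(x, y):
    n = len(x)
    sxx = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    b0 = y.mean() - b1 * x.mean()
    e = y - (b0 + b1 * x)                       # raw residuals
    s2 = np.sum(e ** 2) / (n - 2)               # residual variance, df = n - 2
    h = 1 / n + (x - x.mean()) ** 2 / sxx       # leverage values h(i)
    standardized = e / np.sqrt(s2)
    studentized = e / np.sqrt(s2 * (1 - h))
    # Residual variance with observation i deleted, via the usual identity
    s2_del = ((n - 2) * s2 - e ** 2 / (1 - h)) / (n - 3)
    jackknifed = e / np.sqrt(s2_del * (1 - h))
    return standardized, studentized, jackknifed
```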

  32. Use Residuals to: a) look for outliers which we may wish to remove b) examine normality c) check for linearity d) check for homoscedasticity e) check for some kinds of non-independence

  33. a) Using residuals to look for outliers

  34. Should outliers be removed? • Yes, if the “outlier” was probably not produced by the process being studied (measurement error, different species, ...) • No, if the “outlier” was probably produced by the process being studied (extreme specimen)

  35. b) Using residuals to examine normality • Lilliefors test for normality: P=0.62 • Lilliefors test for normality (excluding Bowhead whale): P=0.68
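One way to run this test is via statsmodels; a sketch on invented residuals (the lilliefors function is real, the numbers are not the lecture's):

```python
# Lilliefors test for normality of regression residuals.
import numpy as np
from statsmodels.stats.diagnostic import lilliefors

e = np.array([0.12, -0.05, 0.03, -0.11, 0.07, -0.02, 0.04, -0.08])  # illustrative
stat, p = lilliefors(e, dist='norm')
print(f"D = {stat:.3f}, P = {p:.2f}")   # large P: no evidence against normality
```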

  36. c) Using residuals to check for linearity

  37. d) Use residuals to check for homoscedasticity

  38. e) Use residuals to check for some kinds of non-independence • Durbin-Watson D statistic: 1.48 • values well below 2 indicate positive autocorrelation • First-order autocorrelation: 0.26 (Example: days spent following sperm whales)
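Both diagnostics are one-liners on the residual series; a sketch with invented residuals:

```python
# Durbin-Watson statistic and lag-1 (first-order) autocorrelation of residuals.
import numpy as np
from statsmodels.stats.stattools import durbin_watson

e = np.array([0.3, 0.2, -0.1, -0.3, -0.2, 0.1, 0.4, 0.2])  # illustrative
print(f"D = {durbin_watson(e):.2f}")                  # ~2 means no autocorrelation
print(f"lag-1 r = {np.corrcoef(e[:-1], e[1:])[0, 1]:.2f}")
```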

  39. Use Residuals to: a) look for outliers which we may wish to remove b) examine normality c) check for linearity d) check for homoscedasticity e) check for some kinds of non-independence

  40. Assumptions of simple linear regression 1. Existence 2. Independence 3. Linearity 4. Homoscedasticity 5. Normality 6. X measured without error

  41. When assumptions do not hold: 1. Existence: Forget it!

  42. When assumptions do not hold: 2. Independence: • collect data differently • reduce the size of the data set • add additional terms to the regression model (e.g. autocorrelation term, species effect) More of a problem for testing than for prediction

  43. When assumptions do not hold: 3. Linearity: • Transform X, Y, or both. e.g.: Log(Y) = β0 + β1 Log(X) + E • Polynomial regression: Y = β0 + β1X + β2X² + ... + E • Non-linear regression. e.g.: Y = c + EXP(β0 + β1X) + E • Piecewise linear regression: Y = β0 + β1X·[X > XK] + E, where [X > XK] = 1 if X > XK and 0 otherwise
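Sketches of two of these fixes on invented data: polynomial regression via numpy's polyfit, and the piecewise model built explicitly from the indicator [X > XK]:

```python
# Two of the linearity remedies above, on illustrative data.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(1, 10, 30)
y = 2 + 0.5 * x ** 2 + rng.normal(0, 1, 30)

# Polynomial regression: Y = b0 + b1*X + b2*X^2 + E
b2, b1, b0 = np.polyfit(x, y, deg=2)        # coefficients, highest degree first

# Piecewise linear regression with a known knot XK:
# Y = b0 + b1*X*[X > XK] + E
XK = 5.0
design = np.column_stack([np.ones_like(x), x * (x > XK)])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
```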

  44. Y = β0 + β1X·[X > XK] + E • Log(Y) = β0 + β1 Log(X) + E • Y = β0 + β1X + β2X² + ... + E • Y = c + EXP(β0 + β1X) + E

  45. Transformation to improve linearity

  46. When assumptions do not hold: 4. Homoscedasticity: • Transformations of the Y variable • Weighted regressions (if we know that some observations are more accurate than others)
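A weighted-regression sketch; the weights are assumed to be inverse variances, and numpy's polyfit expects weights on the 1/σ scale, hence the square root:

```python
# Weighted least squares when some observations are more precise than others.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.8, 8.3, 9.9])
w = np.array([1.0, 1.0, 4.0, 4.0, 4.0])   # inverse variances (assumed)

# np.polyfit weights residuals by w, so pass 1/sigma = sqrt(1/sigma^2)
b1, b0 = np.polyfit(x, y, deg=1, w=np.sqrt(w))
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```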

  47. Y-transformation to improve homoscedasticity

  48. When assumptions do not hold: 5. Normality: • Transformations of the Y variable • Non-normal error structures (e.g. Poisson) Small departures from normality are not especially important, unless doing a test
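One concrete non-normal error structure is a Poisson GLM, which statsmodels fits directly; a sketch on invented count data:

```python
# Poisson error structure via a GLM (log link), one alternative to normality.
import numpy as np
import statsmodels.api as sm

x = np.arange(1.0, 11.0)
counts = np.array([1, 2, 2, 4, 5, 7, 8, 12, 14, 18])   # illustrative counts

fit = sm.GLM(counts, sm.add_constant(x), family=sm.families.Poisson()).fit()
print(fit.params)   # intercept and slope on the log-link scale
```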

  49. When assumptions do not hold: 6. X measured without error: • Major axis regression • Reduced major axis, or geometric mean, regression

  50. Major axis regression: • Minimize sum of squares of perpendicular distances from observations to regression line • Only if variables are in same units {First principal component of covariance matrix}
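Both lines have short closed forms; a sketch of the major-axis fit (first principal component of the covariance matrix, as on the slide) and the reduced-major-axis fit:

```python
# Major axis and reduced major axis (geometric mean) regression.
import numpy as np

def major_axis(x, y):
    # First principal component of the 2x2 covariance matrix.
    eigvals, eigvecs = np.linalg.eigh(np.cov(x, y))
    v = eigvecs[:, np.argmax(eigvals)]
    slope = v[1] / v[0]
    return y.mean() - slope * x.mean(), slope

def reduced_major_axis(x, y):
    # Slope = sign(r) * SD(y)/SD(x); the line passes through the means.
    r = np.corrcoef(x, y)[0, 1]
    slope = np.sign(r) * np.std(y, ddof=1) / np.std(x, ddof=1)
    return y.mean() - slope * x.mean(), slope
```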
