
Biostatistics in Practice

Biostatistics in Practice. Session 5: Associations and Confounding. Peter D. Christenson Biostatistician http://gcrc.LABioMed.org/Biostat. Session 5 Preparation #1.


Presentation Transcript


  1. Biostatistics in Practice Session 5: Associations and Confounding Peter D. Christenson Biostatistician http://gcrc.LABioMed.org/Biostat

  2. Session 5 Preparation #1 1. We often hear news reports of "seasonally adjusted unemployment rates". Can you think of a logical way that this adjustment could be made?

  3. Session 5 Preparation #2 From Table 3, which shows "Unadjusted" and "Adjusted" results: What does “adjusted” mean? How is it done?

  4. Goal One of Session 5. Earlier: Compare means for a single measure among groups. Use t-test, ANOVA. Session 5: Relate two or more measures. Use correlation or regression. [Figure: scatter plot with slope ΔY/ΔX marked, from Qu et al (2005), JCEM 90:1563-1569.]

  5. Goal Two of Session 5. Try to isolate the effects of different characteristics on an outcome. Previous slide: [Diagram relating Gender and BMI to GH Peak.]

  6. Correlation. Visualize Y (vertical) by X (horizontal) scatter plot. • Pearson correlation, r, is used to measure association between two measures X and Y • Ranges from -1 (perfect inverse association) to 1 (perfect direct association) • Value of r does not depend on: • scales (units) of X and Y • which role X and Y assume, as in an X-Y plot • Value of r does depend on: • the ranges of X and Y • values chosen for X, if X is fixed & Y is measured

  7. Graphs and Values of Correlations

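  8. Logic for Value of Correlation. [Figure: scatter plot with regions marked - and +, the sign of the product (X-Xmean)(Y-Ymean) in each region.] $r = \frac{\sum (X - X_{mean})(Y - Y_{mean})}{\sqrt{\sum (X - X_{mean})^2 \, \sum (Y - Y_{mean})^2}}$ Statistical software gives r.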
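The slide-8 formula can be computed directly. The following is a minimal sketch in Python; the data values and the function name `pearson_r` are made up for illustration and are not from the presentation. It also checks the two invariance properties listed on slide 6.

```python
import math

def pearson_r(x, y):
    """Pearson correlation computed from the slide-8 formula."""
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    # Numerator: sum of cross-products of deviations from the means.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    # Denominator: square root of the product of the two sums of squares.
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
# r does not depend on units: rescaling X leaves it unchanged.
r_rescaled = pearson_r([xi / 100 for xi in x], y)
# r does not depend on which measure plays the X role.
r_swapped = pearson_r(y, x)
print(round(r, 4), round(r_rescaled, 4), round(r_swapped, 4))
```

All three printed values agree, which is the point of the "does not depend on" bullets.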

  9. Correlation Depends on Ranges of X & Y. [Figure: two scatter plots, A and B.] Graph B contains only the graph A points in the ellipse. Correlation is reduced in graph B. Thus: correlations for the same quantities X and Y may be quite different in different study populations.
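The effect of a restricted range can be seen in a small simulation. This is a sketch under assumptions: the population parameters (slope 2, noise SD 30, X from 0 to 100) are invented, and the restriction is applied to X only, which is simpler than the ellipse on the slide.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation of two equal-length lists."""
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# "Graph A": Y depends linearly on X plus noise (made-up parameters).
x_all = [rng.uniform(0, 100) for _ in range(2000)]
y_all = [2.0 * xi + rng.gauss(0, 30) for xi in x_all]

# "Graph B": keep only the points whose X lies in a narrow middle range.
subset = [(xi, yi) for xi, yi in zip(x_all, y_all) if 40 <= xi <= 60]
x_sub = [p[0] for p in subset]
y_sub = [p[1] for p in subset]

r_full = pearson_r(x_all, y_all)
r_restricted = pearson_r(x_sub, y_sub)
print(f"full range r = {r_full:.2f}, restricted range r = {r_restricted:.2f}")
```

The underlying X-Y relation is identical in both sets of points; only the range of X differs, and the restricted set gives a clearly smaller r.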

  10. Regression • Again: Y (vertical) by X (horizontal) scatterplot, as with correlation. See next slide. • X and Y now assume unique roles: • Y is an outcome, response, output, dependent variable. • X is an input, predictor, independent variable. • Regression analysis is used to: • Measure X-Y association, as with correlation. • Fit a straight line through the scatter plot, for: • Prediction of Y from X. • Estimation of Δ in Y for a unit change in X (slope = “effect” of X on Y).

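  11. Regression Example. [Figure: scatter plot with fitted line and two bands, labeled "Range for Individuals" and "Range for mean".] The fitted line minimizes $\sum e_i^2$, where $e_i$ is the vertical deviation of point i from the line. Statistical software gives all this info.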

  12. X-Y Association If slope=0 then X and Y are not associated. But the slope measured from a sample will never be 0. How different from 0 does a measured slope need to be in order to claim X and Y are associated? [ Side note: It turns out that slope=0 is equivalent to correlation r = 0. ]

  13. X-Y Association. Test slope=0 vs. slope≠0, with the rule: Claim association (slope≠0) if tc = |slope/SE(slope)| > t ≈ 2. With this rule there is a 5% chance of claiming an X-Y association that really does not exist. Note similarity to t-test for means: tc = |mean/SE(mean)|. Formula for SE(slope) is in statistics books.
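The rule on slide 13 can be sketched as follows. The slide does not give the formula for SE(slope); the code uses the standard least-squares one, SE(slope) = sqrt[(Σe²/(n-2)) / Σ(X-Xmean)²]. The data and the function name `slope_test` are hypothetical. Note that the cutoff of about 2 applies to moderately large samples; for small n the exact cutoff from the t distribution with n-2 degrees of freedom is larger.

```python
import math

def slope_test(x, y):
    """Return (slope, SE(slope), tc) for a simple linear regression of y on x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return slope, se_slope, abs(slope / se_slope)

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3.1, 4.8, 7.2, 8.9, 11.1, 12.8, 15.2, 16.9]

slope, se, tc = slope_test(x, y)
print(f"slope = {slope:.3f}, SE(slope) = {se:.3f}, tc = {tc:.1f}")
print("claim association" if tc > 2 else "do not claim association")
```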

  14. Example Software Output

      The regression equation is: Y = 81.6 + 2.16 X

      Predictor   Coeff    StdErr   T       P
      Constant    81.64    11.47    7.12    <0.0001
      X           2.1557   0.1122   19.21   <0.0001

      S = 21.72   R-Sq = 79.0%

      Predicted Values:
      X: 100   Fit: 297.21   SE(Fit): 2.17   95% CI: 292.89 - 301.52   95% PI: 253.89 - 340.52

      Notes: The Constant row refers to the intercept. T for X is 19.21 = 2.16/0.112, which should be between about -2 and 2 if the “true” slope = 0. Fit is the predicted y = 81.6 + 2.16(100). The 95% CI is the range of Ys with 95% assurance for the mean of all subjects with x=100; the 95% PI is the range for an individual with x=100.
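The annotations on slide 14 can be checked by hand. A short sketch in Python, using only numbers copied from the output above (the variable names are illustrative):

```python
# Values copied from the software output on slide 14.
intercept, slope, se_slope = 81.64, 2.1557, 0.1122
x_new = 100
fit_reported, se_fit = 297.21, 2.17
ci_low, ci_high = 292.89, 301.52

t_stat = slope / se_slope        # T column for predictor X
fit = intercept + slope * x_new  # predicted Y at X = 100

# Half-width of the 95% CI expressed in units of SE(Fit).
ci_multiplier = (ci_high - ci_low) / 2 / se_fit

print(round(t_stat, 2), round(fit, 2), round(ci_multiplier, 2))
```

The recomputed T and Fit agree with the reported values when the unrounded coefficients are used, and the confidence interval is roughly Fit plus or minus 2 times SE(Fit), the same "about 2" multiplier as in the slope test.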

  15. Goal Two of Session 5. Try to isolate the effects of different characteristics on an outcome. [Diagram relating Ethnicity and Age to Outcome.]

  16. Another Study Potential doping test for athletes. J Clin Endocrin Metab 2006 Nov; 91(11):4424-32.

  17. Study Goals: Outcomes are IGF-1 and Collagen Markers. Determine the relative and combined explanatory power of age, gender, BMI, ethnicity, and sport type on the markers. One conclusion is lack of differences between ethnic IGF-1 means, after adjustment for age, gender, and BMI (Fig 2). How are these adjustments made?

  18. Adjustment: For a Single Continuous Characteristic. We simulate data for Caucasians and Africans only for simplicity, to demonstrate attenuation of a 155-140=15 μg/L ethnic difference to a 160-157=3 μg/L ethnic difference. [Figure values: 160, 158, 140, 155.]

  19. Adjustment: For a Single Continuous Characteristic • Problem: • Want to compare groups on IGF-1. • Groups to be compared (ethnicities) have different mean ages, and IGF-1 tends to decrease with age. • Solution: • Make groups appear to have the same mean age.

  20. Adjustment: For a Single Continuous Characteristic • Solution: Make groups appear to have the same mean age. • To do this, • Find regression line predicting IGF-1 from age. • Move each subject parallel to the regression line to the mean age. This is the expected IGF-1 if this subject had been at the mean age. • Adjusted means are means of these adjusted individual values.
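The three steps on slide 20 can be sketched as below. The data are invented, not the study's, and one assumption is made loudly: the regression line is given a single slope shared by both groups (the pooled within-group slope), which is the usual analysis-of-covariance choice.

```python
def mean(values):
    return sum(values) / len(values)

# Hypothetical (age, IGF-1) pairs for two groups; not from the study.
groups = {
    "A": [(20, 190.0), (25, 181.0), (30, 168.0), (35, 161.0)],
    "B": [(35, 158.0), (40, 149.0), (45, 141.0), (50, 128.0)],
}

# Step 1: slope of IGF-1 on age, pooled within groups
# (assumption: the same slope applies to both groups).
num = den = 0.0
for rows in groups.values():
    age_mean = mean([a for a, _ in rows])
    igf_mean = mean([y for _, y in rows])
    num += sum((a - age_mean) * (y - igf_mean) for a, y in rows)
    den += sum((a - age_mean) ** 2 for a, _ in rows)
slope = num / den

# Step 2: move each subject parallel to the line to the overall mean age.
overall_age = mean([a for rows in groups.values() for a, _ in rows])
adjusted = {
    g: [y - slope * (a - overall_age) for a, y in rows]
    for g, rows in groups.items()
}

# Step 3: adjusted means are the means of the adjusted individual values.
for g, rows in groups.items():
    unadjusted_mean = mean([y for _, y in rows])
    print(g, "unadjusted:", round(unadjusted_mean, 2),
          "adjusted:", round(mean(adjusted[g]), 2))
```

In this made-up example the younger group has the higher raw mean, and most of the gap between the groups disappears once both are placed at the same mean age.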

  21. Adjustment: For a Single Continuous Characteristic. We have just described a special case of multiple regression, in which an outcome is estimated by multiple predictors. Simple Regression: Estimated IGF-1 = intercept + slope(age). Multiple Regression: Estimated IGF-1 = intercept + slope(age) + diff(indicator), where indicator = 0 if African, 1 if Caucasian.
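The multiple regression on slide 21 can be fitted with ordinary least squares. The sketch below solves the normal equations in plain Python so that it has no dependencies; in practice statistical software does this step. The data, the 0/1 coding of the two groups, and the helper names `solve` and `least_squares` are all hypothetical.

```python
def solve(a, b):
    """Solve the linear system a * coef = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def least_squares(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data: (age, indicator, IGF-1); indicator is 0 or 1 by group.
data = [
    (20, 1, 190.0), (25, 1, 181.0), (30, 1, 168.0), (35, 1, 161.0),
    (35, 0, 158.0), (40, 0, 149.0), (45, 0, 141.0), (50, 0, 128.0),
]
design = [[1.0, age, ind] for age, ind, _ in data]  # intercept, age, indicator
igf1 = [y for _, _, y in data]

b0, b1, b2 = least_squares(design, igf1)
print(f"Estimated IGF-1 = {b0:.2f} + {b1:.2f}(age) + {b2:.2f}(indicator)")
```

Here b1 is the slope for age and b2 is the estimated difference between the two groups at any fixed age, that is, the age-adjusted group difference.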

  22. Adjustment: For a Single Continuous Characteristic. Software: Select Regression or Analysis of Covariance, usually from a menu. Output: Values of b0, b1, b2 for IGF1 = b0 + b1(age) + b2(indicator).

  23. Multiple Regression: Geometric View. Multiple predictors may be continuous. Geometrically, this is fitting a slanted plane to a cloud of points: www.StatisticalPractice.com LHCY is the Y (homocysteine) to be predicted from the two X’s: LCLC (folate) and LB12 (B12). LHCY = b0 + b1 LCLC + b2 LB12 is the equation of the plane.

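  24. How Are Coefficients Interpreted? LHCY = b0 + b1 LCLC + b2 LB12, with LHCY the outcome and LCLC and LB12 the predictors. LB12 may have both an independent and an indirect (via LCLC) association with LHCY. [Diagram: LCLC linked to LHCY, labeled "b1 ?"; LB12 linked to LHCY, labeled "b2 ?"; LCLC and LB12 linked by "Correlation".]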

  25. Coefficients: Meaning of their Values. LHCY = b0 + b1 LCLC + b2 LB12, with LHCY the outcome and LCLC and LB12 the predictors. LHCY increases by b2 for a 1-unit increase in LB12 … if other factors (LCLC) remain constant, or … adjusting for other factors in the model (LCLC). May be physiologically impossible to maintain one predictor constant while changing the other by 1 unit.

  26. Another Example: HDL Cholesterol

      Output:

                  Coefficient   Std Error   t       Pr > |t|
      Intercept    1.16448      0.28804      4.04   <.0001
      AGE         -0.00092      0.00125     -0.74   0.4602
      BMI         -0.01205      0.00295     -4.08   <.0001
      BLC          0.05055      0.02215      2.28   0.0239
      PRSSY       -0.00041      0.00044     -0.95   0.3436
      DIAST        0.00255      0.00103      2.47   0.0147
      GLUM        -0.00046      0.00018     -2.50   0.0135
      SKINF        0.00147      0.00183      0.81   0.4221
      LCHOL        0.31109      0.10936      2.84   0.0051

      The predictors of log(HDL) are age, body mass index, blood vitamin C, systolic and diastolic blood pressures, skinfold thickness, and the log of total cholesterol. The equation is: Log(HDL) = 1.16 - 0.00092(Age) +…+ 0.311(LCHOL) www.StatisticalPractice.com

  27. HDL Example: Coefficients • Interpretation of coefficients on previous slide: • Need to use entire equation for making predictions. • Each coefficient measures the difference in expected LHDL between 2 subjects if the factor differs by 1 unit between the two subjects, and if all other factors are the same. E.g., expected LHDL is 0.012 lower in a subject whose BMI is 1 unit greater, but is the same as the other subject on other factors. Continued …

  28. HDL Example: Coefficients • Interpretation of coefficients two slides back: • P-values measure the association of a factor with Log(HDL), if other factors do not change. • This is sometimes expressed as “after accounting for other factors” or “adjusting for other factors”, and called its independent association. • SKINF probably is associated with Log(HDL) when considered alone. Its p=0.42 says that it adds no information for predicting Log(HDL) after accounting for other factors such as BMI.
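The reading of the HDL output on slides 26 to 28 can be sketched in a few lines. The numbers are copied from the slide-26 table; the dictionary layout and variable names are illustrative, and the "|t| > 2" cutoff is the rule of thumb from slide 13 rather than an exact test.

```python
# (coefficient, standard error, t) copied from the slide-26 output.
output = {
    "AGE":   (-0.00092, 0.00125, -0.74),
    "BMI":   (-0.01205, 0.00295, -4.08),
    "BLC":   (0.05055, 0.02215, 2.28),
    "PRSSY": (-0.00041, 0.00044, -0.95),
    "DIAST": (0.00255, 0.00103, 2.47),
    "GLUM":  (-0.00046, 0.00018, -2.50),
    "SKINF": (0.00147, 0.00183, 0.81),
    "LCHOL": (0.31109, 0.10936, 2.84),
}

# Predictors with |t| > 2: associated with log(HDL) after accounting
# for the other predictors in the model.
flagged = [name for name, (_, _, t) in output.items() if abs(t) > 2]
print(flagged)

# Expected difference in log(HDL) between two subjects who differ by
# 5 BMI units and are alike on every other predictor.
bmi_coef = output["BMI"][0]
bmi_effect = 5 * bmi_coef
print(round(bmi_effect, 5))
```

The flagged predictors are the same ones whose p-values in the table fall below 0.05, and SKINF is not among them, consistent with the remark about SKINF on slide 28.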
