
Section 3.4 (Agresti & Franklin)


Presentation Transcript


  1. Section 3.4 (Agresti & Franklin) What are some cautions in Analyzing Associations?

  2. Definitions • Association – exists between two variables if a particular value for one variable is more likely to occur with certain values of the other variable. Correlation is a type of association, between two numeric variables. • Extrapolation – using a regression line to predict y values for x values outside the observed range of data. • Lurking variable – an unobserved variable that influences the association between the variables of primary interest.

  3. Definitions (cont) • Confounding – occurs when two explanatory variables are both associated with a response variable and are also associated with each other, so that their separate effects on the response are hard to tell apart.

  4. Mean U.S. temperatures 1895-2002

  5. With trendline

  6. Extrapolation • The further we move from the observed data, the riskier extrapolation becomes.

  7. Regression Equation
     Regression Analysis: Temperature versus Year
     The regression equation is Temperature = 33.3 + 0.0100 Year
     Predictor      Coef   SE Coef     T      P
     Constant     33.257     4.645  7.16  0.000
     Year       0.010024  0.002383  4.21  0.000
     S = 0.772194   R-Sq = 14.3%   R-Sq(adj) = 13.5%

  8. Temp graph: Broken into 3 pieces • 1895 – 1929 • Increasing at about the mean rate (0.01 degree per year), but the slope is not statistically distinguishable from zero (P = 0.386) • 1930 – 1976 • Decreasing at about double the mean rate (0.02 degree per year) • 1977 – 2002 • Increasing at about six times the mean rate (0.06 degree per year)

  9. 1895 - 1929

  10. 1895 – 1929
      Regression Analysis: temp4 versus year4
      The regression equation is temp4 = 31.5 + 0.0109 year4
      Predictor     Coef  SE Coef     T      P
      Constant     31.50    23.67  1.33  0.192
      year4      0.01088  0.01238  0.88  0.386
      S = 0.739710   R-Sq = 2.3%   R-Sq(adj) = 0.0%

  11. 1930 - 1976

  12. 1930 - 1976
      Regression Analysis: temp2 versus year2
      The regression equation is temp2 = 98.4 - 0.0233 year2
      Predictor       Coef   SE Coef      T      P
      Constant       98.43     12.57   7.83  0.000
      year2      -0.023314  0.006436  -3.62  0.001
      S = 0.598467   R-Sq = 22.6%   R-Sq(adj) = 20.9%

  13. 1977 - 2002

  14. 1977 - 2002
      Regression Analysis: temp3 versus year3
      The regression equation is temp3 = - 66.9 + 0.0604 year3
      Predictor     Coef  SE Coef      T      P
      Constant    -66.89    41.12  -1.63  0.117
      year3      0.06038  0.02067   2.92  0.007
      S = 0.790427   R-Sq = 26.2%   R-Sq(adj) = 23.2%
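The four regression outputs can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it copies the slope and standard-error values from the slides, recomputes each T value as Coef / SE Coef, and compares each period's slope with the full-period slope. The dictionary layout and the function name `summarize` are assumptions made for this example.

```python
# Slope and standard error for the Year coefficient, copied from the slides.
fits = {
    "1895-2002": (0.010024, 0.002383),
    "1895-1929": (0.01088, 0.01238),
    "1930-1976": (-0.023314, 0.006436),
    "1977-2002": (0.06038, 0.02067),
}

overall_slope = fits["1895-2002"][0]


def summarize(fits, baseline):
    """Return {period: (t_statistic, slope_ratio)} for each fitted slope."""
    out = {}
    for period, (slope, se) in fits.items():
        t_stat = slope / se        # the T column is Coef divided by SE Coef
        ratio = slope / baseline   # slope relative to the full-period slope
        out[period] = (t_stat, ratio)
    return out


summary = summarize(fits, overall_slope)
for period, (t_stat, ratio) in summary.items():
    print(f"{period}: T = {t_stat:6.2f}, slope / overall slope = {ratio:6.2f}")
```

The recomputed T values should agree with the printed output up to rounding, and the ratios show how far each sub-period departs from the long-run trend.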

  15. Take a look again

  16. Extrapolation is dangerous • Trends may change suddenly • Our models must be viewed with a question mark, though they give us food for thought. What if we really are rising at 0.06 degree per year? 100 years = 6 degrees, 1000 years = 60 degrees!!
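The arithmetic behind this warning can be sketched in code. The example below is a rough illustration, not part of the original slides: it evaluates the full-period regression line from slide 7, flags any year outside the observed 1895 to 2002 range as an extrapolation, and projects the 0.06 degree per year rate forward. The function names and the choice of example years are assumptions.

```python
OBSERVED_YEARS = (1895, 2002)  # range of years covered by the data


def predict_temperature(year, intercept=33.257, slope=0.010024):
    """Evaluate the full-period fitted line (coefficients from slide 7)."""
    return intercept + slope * year


def is_extrapolation(year, observed=OBSERVED_YEARS):
    """True when the year lies outside the observed range of the data."""
    low, high = observed
    return year < low or year > high


def projected_change(slope_per_year, years_ahead):
    """Change implied by naively extending a constant slope."""
    return slope_per_year * years_ahead


for year in (1950, 2002, 2102, 3002):
    label = "extrapolation" if is_extrapolation(year) else "within data"
    print(f"{year}: {predict_temperature(year):.2f} degrees ({label})")

for years_ahead in (100, 1000):
    change = projected_change(0.06, years_ahead)
    print(f"{years_ahead} years at 0.06 degree per year: {change:.0f} degrees")
```

The code will happily return a number for the year 3002; nothing in the fitted line signals that the trend may have changed, which is why the range check has to be made explicitly.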

  17. Influential Outliers

  18. Be cautious of influential outliers • An observation is influential when it has a large effect on the results of linear regression • Two conditions must hold for an observation to be considered influential: • Its x-value is relatively high or low compared with the rest of the data • The observation is a regression outlier, falling far from the trend that the rest of the data follow
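A small numeric sketch can make the two conditions concrete. The data below are invented for illustration (ten points lying exactly on y = 2x + 1), and the helper `least_squares` is a plain textbook implementation written for this example. One added point is far from the trend but sits at the mean x; the other is far from the trend and has an extreme x.

```python
def least_squares(points):
    """Return (intercept, slope) of the least-squares line through points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data: ten points lying exactly on y = 2x + 1.
base = [(x, 2.0 * x + 1.0) for x in range(1, 11)]

# Far from the trend, but x equals the mean of the other x-values.
outlier_mid_x = base + [(5.5, 40.0)]

# Far from the trend AND with an extreme x-value.
outlier_far_x = base + [(30.0, 0.0)]

_, slope_base = least_squares(base)
_, slope_mid = least_squares(outlier_mid_x)
_, slope_far = least_squares(outlier_far_x)

print(f"no outlier:          slope = {slope_base:.2f}")
print(f"outlier at mean x:   slope = {slope_mid:.2f}")
print(f"outlier at extreme x: slope = {slope_far:.2f}")
```

In this constructed case the outlier at the mean x shifts the intercept but leaves the slope unchanged, while the outlier at an extreme x is enough to turn the fitted slope negative.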

  19. Example: • Regression by eye applet – watch the effect when the influential point is moved about. • Contrast with other outliers that are not influential.

  20. Correlation does not imply causation

  21. Frequent errors • Inferring that something is the simple, single cause of an effect when in fact it is only a contributory cause • Confounding variables: • Two explanatory variables are confounded if both are correlated with the response and also with each other. • Lurking variables. Example: Hormone replacement therapy (HRT) and heart disease.

  22. Lurking variables • Lurking variables are variables, usually unobserved, that are correlated with both of the variables in question. • When a response and an explanatory variable both correlate with a lurking variable, the response may correlate with the explanatory variable even if neither affects the other. • Take away: correlation between two variables DOES NOT mean those variables are directly connected, and certainly does not imply causation.
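A simulation can show how this happens. The sketch below uses synthetic data only: a hypothetical `lurking` variable drives both `explanatory` and `response`, which have no direct link to each other. It then compares the raw correlation with the correlation left after the lurking variable's linear effect is removed from both. All names, the seed, and the noise levels are assumptions chosen for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def residuals(ys, zs):
    """Residuals of ys after a least-squares regression on zs."""
    n = len(ys)
    mean_y, mean_z = sum(ys) / n, sum(zs) / n
    szz = sum((z - mean_z) ** 2 for z in zs)
    szy = sum((z - mean_z) * (y - mean_y) for z, y in zip(zs, ys))
    slope = szy / szz
    return [(y - mean_y) - slope * (z - mean_z) for y, z in zip(ys, zs)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

# Synthetic data: the lurking variable feeds both observed variables,
# which are otherwise unrelated.
lurking = [rng.gauss(0, 2) for _ in range(n)]
explanatory = [z + rng.gauss(0, 1) for z in lurking]
response = [z + rng.gauss(0, 1) for z in lurking]

r_raw = pearson(explanatory, response)
r_adjusted = pearson(residuals(explanatory, lurking),
                     residuals(response, lurking))

print(f"raw correlation:                 {r_raw:.2f}")
print(f"after removing lurking variable: {r_adjusted:.2f}")
```

The adjustment is only possible here because the simulated lurking variable is recorded; in real studies it is usually unobserved, which is what makes it dangerous.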

  23. Do We Really Know What Makes Us Healthy? • Example: a New York Times article on epidemiology and Hormone Replacement Therapy (HRT) for women • 1985: the Nurses’ Health Study, run out of the Harvard Medical School and the Harvard School of Public Health, reported that women taking estrogen had only a third as many heart attacks as women who had never taken the drug. • → Inference drawn: women were protected from heart attacks until they passed through menopause (estrogen bestowed the protection) • → This became the basis of the therapeutic wisdom for the next 17 years.

  24. New York Times article on HRT (cont) • The Women’s Health Initiative concluded in 2002 that H.R.T. caused far more harm than good • Why? Healthy-user bias (a lurking variable) • http://www.nytimes.com/2007/09/16/magazine/16epidemiology-t.html

  25. NYT (cont): Healthy User Bias • the problem is that people who faithfully engage in activities that are good for them — taking a drug as prescribed, for instance, or eating what they believe is a healthy diet — are fundamentally different from those who don’t. One thing epidemiologists have established with certainty, for example, is that women who take H.R.T. differ from those who don’t in many ways, virtually all of which associate with lower heart-disease risk: they’re thinner; they have fewer risk factors for heart disease to begin with; they tend to be more educated and wealthier; to exercise more; and to be generally more health conscious.

  26. Discovery of the lurking variable(s) • In 1987, Diana Petitti, an epidemiologist now at the University of Southern California, reported that she, too, had detected a reduced risk of heart-disease deaths among women taking H.R.T. in the Walnut Creek Study, a population of 16,500 women. When Petitti looked at all the data, however, she “found an even more dramatic reduction in death from homicide, suicide and accidents.” With little reason to believe that estrogen would ward off homicides or accidents, Petitti concluded that something else appeared to be “confounding” the association she had observed. “The same thing causing this obvious spurious association might also be contributing to the lower risk of coronary heart disease,” Petitti says today.

  27. Frequent errors (cont) • Inferring that a correlation of event A with event B means that event A CAUSES event B • Another missing (lurking) variable that correlates with both variables may be the culprit • One variable may be the cause and the other the effect, but sometimes the wrong variable is chosen for the cause.

  28. Which is the cause? • Women in 1985 taking HRT had lower heart attack rates • 17 years of HRT followed, arguably causing tens of thousands of deaths among American women • Subsequent research found HRT increased the risk of heart disease • Lurking variables were found • The most sophisticated statistics could not make up for the lack of judgement by researchers (who were notably qualified) • Clinical trials (2002) discredited HRT as a way to reduce heart attacks in women

  29. Another (short) example: • A recent article: Many agree that the decline of religion may be a cause of the decline of the family. But what if it’s the other way around? Mary Eberstadt speculates... (http://www.hoover.org/publications/policyreview/7827212.html)

  30. Correlation does not imply causation • Across the world’s nations, the number of TV sets per person (x) and the average life expectancy (y) have a high positive correlation. • Does this mean that we can improve the life expectancy of people in Rwanda by shipping them TV sets? • No – rich nations have longer life expectancy because they have better nutrition, clean water, and better health care (lurking variables) • There is no cause and effect between TV sets and life expectancy.

  31. Simpson’s Paradox • Is smoking actually beneficial to your health? • Proportions conditioned on Smoker variable: • See p. 132, Table 3.7 for raw numbers

  32. A lurking variable – age

  33. Simpson’s Paradox • The fact that the direction of an association between two variables can change after we include a third variable and analyze the data at separate levels of that variable.
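The reversal can be reproduced with a small table of counts. The numbers below are hypothetical and chosen only to show the effect; they are not the values from Table 3.7. Each entry is (deaths, total) for one age group and smoking status, and the helper names are assumptions made for this sketch.

```python
# Hypothetical counts, NOT the textbook's data: (deaths, total) per cell.
counts = {
    "younger": {"smoker": (10, 200), "nonsmoker": (4, 100)},
    "older": {"smoker": (30, 50), "nonsmoker": (110, 200)},
}


def group_rate(counts, group, status):
    """Death rate within one age group for one smoking status."""
    deaths, total = counts[group][status]
    return deaths / total


def pooled_rate(counts, status):
    """Death rate for one smoking status with the age groups combined."""
    deaths = sum(cells[status][0] for cells in counts.values())
    total = sum(cells[status][1] for cells in counts.values())
    return deaths / total


for group in counts:
    s = group_rate(counts, group, "smoker")
    ns = group_rate(counts, group, "nonsmoker")
    print(f"{group}: smokers {s:.0%}, nonsmokers {ns:.0%}")

print(f"pooled: smokers {pooled_rate(counts, 'smoker'):.0%}, "
      f"nonsmokers {pooled_rate(counts, 'nonsmoker'):.0%}")
```

Within each age group the smokers in this made-up table die at a higher rate, yet the pooled rates point the other way, because most of the smokers are in the younger group where death rates are low overall.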

  34. Take away (you too, Harvard researchers) • Be cautious about interpreting an association. Always be wary of lurking variables that may influence the association. • Correlation (association) does not imply causation. • N.B. : these are not mathematical errors, but errors in applying the mathematics. This is why the authors use the term “art” in their definition of statistics. It’s more than a science. Done well, it’s an art.
