Learn how to quantify the association between variables using the Pearson correlation coefficient and deviation score method. Understand the limitations, assumptions, and implications of correlation studies in inferring causality.
Correlation & Regression: Association & Prediction
Measuring association • Editorial and letter to the editor, Indianapolis Star, re: CDC data • Differing opinions regarding the degree of association • How to quantify the association between two variables • e.g., smoking deaths & tax • e.g., smoking percent & tax • e.g., smoking percent & smoking deaths
Lots of anecdotal & clinical relationships • Breastfeeding & IQ • Smoking & criminal behavior • Abortion & crime
Plot out the data: The Scattergram [figure: scattergram with labeled cases, e.g., Janet at (756, 3.8) and John]
Plot out the data: The Scattergram • Each point represents a pair of scores from a single subject (case)
Quantifying Relationships • Karl Pearson developed the technique • Also called: Pearson r • Pearson correlation coefficient • Pearson product-moment correlation coefficient (PMCC) • or simply r
Correlation • Correlation: how the score on one variable is related to the score on another variable • More specifically: • how relative performance on one variable is related to relative performance on another variable • i.e., how each score relates to its own mean and variability
Quantify relationship to the mean: Deviation Score • X = independent variable • Y = dependent variable • $X - \bar{X}$ (score on one variable relative to its mean; deviation score of X; x) • $Y - \bar{Y}$ (score on the other variable relative to its mean; deviation score of Y; y)
Calculation of r: deviation score method
$$r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \, \sum (Y_i - \bar{Y})^2}}$$
Calculation of r: deviation score method • $(X_i - \bar{X})$: deviation score of X, written x • Note: will be + or - for each case
Calculation of r: deviation score method • $(Y_i - \bar{Y})$: deviation score of Y, written y • Note: will be + or - for each case
Calculation of r: deviation score method • $(X_i - \bar{X})(Y_i - \bar{Y})$: product of paired deviation scores (product of x and y, written xy) • Note: the product will be + or - for each case
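Calculation of r: deviation score method • $\sum (X_i - \bar{X})(Y_i - \bar{Y})$: sum of the products of paired deviation scores (sum of xy) • This sum is the numerator of the covariance; dividing it by n - 1 gives the sample covariance • Note: its sign (+ or -) depends on ALL of the individual cases together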
Calculation of r: deviation score method (all steps combined)
$$r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \, \sum (Y_i - \bar{Y})^2}}$$
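A minimal Python sketch of the deviation score steps, assuming plain lists of paired scores. The function name `pearson_r_deviation` and the five-case data set are hypothetical, for illustration only.

```python
from math import sqrt


def pearson_r_deviation(xs, ys):
    """Pearson r by the deviation score method:
    r = sum(x*y) / sqrt(sum(x**2) * sum(y**2)), with x and y as deviation scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviation scores: each is + or - depending on the case
    dev_x = [xi - mean_x for xi in xs]
    dev_y = [yi - mean_y for yi in ys]
    # Product of paired deviation scores for each case (+ or -)
    products = [dx * dy for dx, dy in zip(dev_x, dev_y)]
    # Sum of products: its sign depends on all the cases together
    sum_xy = sum(products)
    # Sums of squared deviations for the denominator
    sum_x2 = sum(dx * dx for dx in dev_x)
    sum_y2 = sum(dy * dy for dy in dev_y)
    if sum_x2 == 0 or sum_y2 == 0:
        raise ValueError("r is undefined when a variable has no variability")
    return sum_xy / sqrt(sum_x2 * sum_y2)


# Hypothetical scores for five cases (not data from the slides)
X = [2, 4, 5, 6, 8]
Y = [3, 5, 4, 7, 9]
print(round(pearson_r_deviation(X, Y), 3))
```

Each stage mirrors one slide: deviation scores, their products, the sum of products, and finally division by the square root of the product of the two sums of squares.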
r by deviation score method (worked example): $\bar{X} = 8$, $\bar{Y} = 8$; column sums: 20, 20, 20
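The arithmetic behind this example, assuming the three column sums of 20 are $\sum xy$, $\sum x^2$, and $\sum y^2$ (since all three equal 20, the result does not depend on which is which):

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \, \sum y^2}} = \frac{20}{\sqrt{20 \times 20}} = \frac{20}{20} = 1.00$$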
r(T1, T2) = 1.00: perfect positive relationship (see scattergram on the next slide)
• T1 & T2: r = 1.00, perfect positive • T1 & T3: r = -1.00, perfect negative • T1 & T4: r = 0.00, no relationship
Possible values of r • Range from -1.00 to +1.00, including any value in between • The closer the value is to -1.00, the stronger the negative relationship between the two variables • The closer the value is to +1.00, the stronger the positive relationship between the two variables • Guess the correlation game
Possible values of r • Just what does an r value of +0.25 mean?
Factors limiting a PMCC • Homogeneous group • subjects are very similar on the variables • Unreliable measurement instrument/technique • measurements bounce all over the place • Nonlinear relationship • Pearson's r is based on linear relationships • Ceiling or floor effect in measurement • lots of scores clumped at the top or bottom, so there is little spread; this creates a problem similar to the homogeneous group [skewed data set(s)]
Assumptions of the PMCC • Measures are approximately normally distributed • check with a frequency distribution • The spread of Y scores is similar across the range of X (homoscedasticity) • check with a scatterplot • The relationship is linear • check with a scatterplot • The sample represents the population • Variables are measured on an interval or ratio scale
Not Causation, Only Association
Correlations and causality • Correlations only describe a relationship; they do not prove cause and effect • Correlation is a necessary, but not a sufficient, condition for determining causality • There are three requirements for inferring a causal relationship:
Correlations and causality • A statistically significant relationship between the variables • The causal variable occurred before the other variable • No other factors could account for the relationship • Correlational studies do not meet the third requirement and may not meet the second
Correlations and causality • If there is a relationship between A and B, it could be because • A -> B (A causes B) • A <- B (B causes A) • A <- C -> B (a third variable, C, causes both)
Smoking & low back pain (LBP): r = 0.45 [diagram: Smoking and Low Back Pain]
Smoking & LBP: r = 0.45 [diagram: Smoking ? Low Back Pain]
Smoking & LBP: r = 0.45 [diagram: Smoking ? Low Back Pain, with lifestyle factors (e.g., strength) as a possible third variable]
Interpreting r • r is not a proportion. • r = 0.25 does not mean one quarter similarity between the variables • r = 0.50 does not mean one half similarity between the variables • r describes the co-variability of the variables
Coefficient of Determination • r²: simply square the r value • The percentage of the variance in each variable that is explained by knowledge of the variance of the other variable • i.e., what percentage of the variance within Y is predicted by the variance within X?
Coefficient of Determination • (Shared Variation) • Correlation Coefficient Squared • Percentage of the variability among scores on one variable that can be attributed to differences in the scores on the other variable • The coefficient of determination is useful because it gives the proportion of the variance of one variable that is predictable from the other variable
Notes about r² • The coefficient of determination expresses shared variance • therefore 1 - r² is unexplained • r = 0.70 gives about 50% explained variance (why? because 0.70² = 0.49) • Always calculate r² to evaluate the extent of the correlation
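A short sketch converting r into explained (r²) and unexplained (1 - r²) variance. The helper name `shared_variance` and the list of r values are hypothetical choices for illustration.

```python
def shared_variance(r):
    """Return (r**2, 1 - r**2): explained and unexplained proportions of variance."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1.00 and +1.00")
    r2 = r * r
    return r2, 1.0 - r2


for r in (0.25, 0.50, 0.70, -0.70, 0.90):
    explained, unexplained = shared_variance(r)
    print(f"r = {r:+.2f}  r^2 = {explained:.2f}  "
          f"({explained:.0%} explained, {unexplained:.0%} unexplained)")
```

Squaring removes the sign, so r = -0.70 and r = +0.70 explain the same 49% of the variance, and r = +0.25 explains only about 6%.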
Use of Correlation • Reliability of a test/measure • relate test-retest scores • relate tester 1 to tester 2 • Validity of a test • heart rate (HR) and fitness (aerobic capacity) • Relate multiple dependent variables (do all measure the same construct?)
Cautions concerning r • Appropriate only for linear relationships (use Anxiety&Performance.sav) • Sensitive to range of talent • smaller range, lower r • Sensitive to sampling variation • smaller samples, less stable r • The r calculated from a sample is not the population r; it is only an estimate of it
Meyer et al., 2002. MSSE, 34(7), 1065-1070.
Adachi et al., 2002. Mechanoreceptors in the ACL contribute to the joint position sense. Acta Orthop Scand, 73(2), 330-334.