LSP 121 Introduction to Correlation
Correlation • The news is filled with examples of correlation • If you eat so many helpings of tomatoes… • One alcoholic beverage a day… • Driving faster than the speed limit… • Women who smoke during pregnancy… • Often, we can quantify correlation
How Do You Calculate Correlation in Excel? • Make an XY scatterplot of the data, putting one variable on the x-axis and one variable on the y-axis. • Select the two columns you wish to graph • Choose Insert > Scatter • Insert a linear trendline on the graph and include the R² value • Click one of the data points on the chart • Right-click, choose Add Trendline • Check boxes/buttons for: Linear, Display Equation, Display R² • Interpret the results • Try it with CigarettesBirthweight.xls
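For readers without Excel, the trendline and R² value that the steps above produce can be approximated with a short Python sketch. It fits an ordinary least-squares line, which is an assumption about what Excel's linear trendline computes. The function name `linear_trendline` and the sample numbers are hypothetical and are not taken from the course files.

```python
from statistics import mean

def linear_trendline(xs, ys):
    """Fit y = slope * x + intercept by least squares and return R^2 as well."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R^2 = 1 - (unexplained variation / total variation)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

# Made-up values for illustration only (NOT the contents of any course .xls file).
heights_in = [60, 62, 65, 67, 70, 72, 75]
weights_lb = [115, 126, 140, 152, 168, 175, 195]

slope, intercept, r_squared = linear_trendline(heights_in, weights_lb)
print(f"y = {slope:.3f}x + {intercept:.3f}, R^2 = {r_squared:.3f}")
```

The printed equation and R² play the same role as the "Display Equation" and "Display R²" options on the chart.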
Interpreting the Results • The higher the R² value, the stronger the linear relationship between the two variables • Crude estimate: R² > 0.5 • Most people say there is a correlation • R² < 0.3 • Most say correlation is essentially non-existent • R² between 0.3 and 0.5? • Gray area – further analysis is needed • If you only have a few data points, then you need a higher R² value before deciding whether or not there is a correlation
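The crude rule of thumb above can be written as a small helper. This is only a sketch of the slide's cutoffs (0.3 and 0.5); the function name `interpret_r_squared` is hypothetical, and the helper does not account for the number of data points.

```python
def interpret_r_squared(r_squared):
    """Apply the slide's crude cutoffs to an R^2 value between 0 and 1."""
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("R^2 must be between 0 and 1")
    if r_squared > 0.5:
        return "most people say there is a correlation"
    if r_squared < 0.3:
        return "correlation is essentially non-existent"
    return "gray area: further analysis is needed"

for value in (0.74, 0.41, 0.12):
    print(value, "->", interpret_r_squared(value))
```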
Examples: Are they correlated? • Look at: • CigarettesBirthweight.xls • SpeedLimits.xls (under Older Data) • HeightWeight.xls • Grades.xls (under Older Data) • WineConsumption.xls (under Older Data) • BreastCancerTemperature.xls
How Do We Calculate Correlation in SPSS/PASW? • In SPSS, click on Analyze -> Correlate -> Bivariate • Select the two columns of data you want to analyze (move them from the left box to the right box) • You can actually pick more than two columns, but we’ll keep it simple for now
How Do We Calculate Correlation in SPSS/PASW? • Make sure the checkbox for ‘Pearson Correlation Coefficients’ is checked • Click OK to run the correlation • You should get an output window something like the following slide
The correlation between height and weight is 0.861. • The Pearson Correlation value is not the same as Excel’s R-squared value; it can be positive or negative. For a linear trendline, R-squared is the square of the Pearson value, so it is never negative.
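To make the difference between the Pearson value and R² concrete, here is a minimal sketch that computes Pearson's r by hand. The function name `pearson_r` and the TV/grade numbers are made up for illustration; only the 0.861 figure comes from the slide.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient, a value between -1 and 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up values for illustration only (not data from the course files).
hours_of_tv = [0, 1, 2, 3, 4, 5]
grade = [95, 90, 88, 80, 79, 70]

r = pearson_r(hours_of_tv, grade)
print(f"r = {r:.3f} (negative), r squared = {r * r:.3f} (never negative)")

# The slide's height/weight value of 0.861 corresponds to this R^2:
print(round(0.861 ** 2, 3))  # 0.741
```

Squaring throws away the sign, which is why R² alone cannot tell you whether a correlation is positive or negative.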
Positive and Negative Correlation • Positive correlation: as the values of one variable increase, the values of a second variable increase (values from 0 to 1.0) • Negative correlation: as the values of one variable increase, the values of a second variable decrease (values from 0 to -1.0)
Positive vs. Negative Correlation • There is a negative correlation between TV viewing and class grades: students who spend more time watching TV tend to have lower grades (or, students with higher grades tend to spend less time watching TV). • There is a negative correlation between exercise and heart disease • There is a positive correlation between exercise and self-esteem
Positive and Negative Correlation on a Graph • [Figure: two example graphs, one labeled "Positive correlation" and one labeled "Negative correlation"]
How would you classify these correlations? • [Figure: three example graphs, labeled "Negative correlation", "Positive correlation", and "NO correlation"]
Positive and Negative Correlation • The strength of a correlation is its distance from 0, so a positive correlation is not necessarily stronger than a negative correlation • Which correlation is the greatest? -0.34, 0.72, -0.81, 0.40, -0.12
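One way to check an answer to the question above is to compare the values by absolute value, ignoring the sign. A minimal sketch, using the five values from the slide:

```python
correlations = [-0.34, 0.72, -0.81, 0.40, -0.12]

# Strength is the distance from 0, so rank by absolute value.
strongest = max(correlations, key=abs)
weakest = min(correlations, key=abs)

print("strongest:", strongest)  # strongest: -0.81
print("weakest:", weakest)      # weakest: -0.12
```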
Correlation vs. Causation • Correlation: Two concepts are related in some way. • Causation: Changing one of the factors also causes a change in the other factor. • Example: Smoking and cancer are correlated. They also have a causal relationship. • If you do something to increase smoking, you increase the chance of cancer • Example: Ice cream sales and crime rates are also correlated. However, they do NOT have a causal relationship. (Can you think why they are correlated?) • If you do something to increase ice cream sales, you do not see an increase in crime
What Can We Conclude? • If two variables are correlated, then we can predict one based on the other • But correlation does NOT imply causation! • It might be the case that having more education causes a person to earn a higher income. It might be the case that having higher income allows a person to go to school more. There could also be a third variable. Or a fourth. Or a fifth…
Causation (aka ‘Causality’) • Causation: One variable, A, actually causes a change in another variable, B. • Here are some examples of correlations that are also causal: • Increased smoking → increased likelihood of lung cancer • Increased exercise → decreased likelihood of heart disease • Key point: Many, many, many things in life are correlated. But this does not mean that one causes the other. • See next slide
Correlation does NOT imply causation! • OFTEN (very often!), two items that are correlated are falsely assumed to have a causal relationship. • Usually, the correlation is actually explained by a common underlying factor. That is, A may be correlated with B, but this is due to some other factor, C. • Example: None of these three correlations is a causal relationship. Can you identify the other factor? • As ice cream sales go up, so do crime rates • Summer! Crime tends to go up in the summer. Not surprisingly, more people buy ice cream in the summer as well. • People who wear top-hats live longer (an actual study from the Victorian era) • Income. Wealthier people wear top-hats and can also afford better health care, medicines, doctors, etc. • Hormone therapy for breast cancer decreases likelihood of heart disease • As with the previous example: socioeconomic status. Hormone therapy in and of itself increases the likelihood of heart disease! However, people who are wealthier are more likely to have better general medical care, resulting in early detection of breast cancer, proper treatments, etc. They are also more likely to be better educated about heart disease (eat better, exercise more, smoke less, etc.). So even though hormone therapy increases the likelihood of heart disease, on the whole, people on this therapy tend to have less heart disease.
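The ice cream example can be imitated with a small simulation. This is a sketch under stated assumptions: all numbers are simulated rather than real data, temperature stands in for "summer", and the names `pearson_r` and `residuals` are hypothetical helpers. Ice cream sales and crime are each generated from temperature alone, so neither one causes the other.

```python
import random
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient, a value between -1 and 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def residuals(ys, xs):
    """What is left of ys after removing a least-squares line on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return [y - (y_bar + slope * (x - x_bar)) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Simulated data: factor C (temperature) drives both A and B.
temperature = [rng.uniform(0, 35) for _ in range(500)]
ice_cream = [10 * t + rng.gauss(0, 40) for t in temperature]
crime = [2 * t + rng.gauss(0, 8) for t in temperature]

raw = pearson_r(ice_cream, crime)
adjusted = pearson_r(residuals(ice_cream, temperature),
                     residuals(crime, temperature))

print(f"ice cream vs. crime:            r = {raw:.2f}")
print(f"after accounting for temperature: r = {adjusted:.2f}")
```

In this simulation the first correlation is strong and the second is close to zero: once the common factor is accounted for, the apparent link between ice cream and crime disappears.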
Correlation vs. Causation • Do not confuse correlation with causation. • Just because two things are correlated (e.g. height and weight) does not mean that there is a causal relationship. • A causal relationship means that making a change in A will predictably cause a change in B • Giving somebody a top-hat will not make them live longer (see the top-hat example). • This is an example of where there is a correlation, but there is not causation. • Very important point – expect 1-2 exam questions on this idea!
What Can We Conclude? • Sheer coincidence – the two variables have nothing in common, but they produce a strong R or R² value • Both variables are changing over time – divorce rates are going up and so are drug offenses. Is an increase in divorce causing more people to use drugs (and get caught)?
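The "both variables are changing over time" case can also be simulated. In this sketch, two series are generated independently but both drift upward, which is enough to produce a strong correlation. The numbers are simulated, not real divorce or drug-offense statistics, and comparing period-to-period changes is just one simple check, chosen here as an assumption rather than a method from the course.

```python
import random
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient, a value between -1 and 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def changes(series):
    """Change from each time step to the next."""
    return [b - a for a, b in zip(series, series[1:])]

rng = random.Random(1)  # fixed seed so the run is repeatable

# Two simulated series that share nothing except an upward trend over time.
steps = range(400)
divorces = [100 + 3 * t + rng.gauss(0, 5) for t in steps]
drug_offenses = [50 + 2 * t + rng.gauss(0, 4) for t in steps]

raw = pearson_r(divorces, drug_offenses)
step_to_step = pearson_r(changes(divorces), changes(drug_offenses))

print(f"raw series:          r = {raw:.2f}")
print(f"step-to-step change: r = {step_to_step:.2f}")
```

The raw series are almost perfectly correlated only because both rise over time; their step-to-step changes show little or no relationship.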