Basic Quantitative Methods in the Social Sciences (AKA Intro Stats)
This lecture covers the fundamental concepts of correlation in quantitative methods for the social sciences. It works through research questions about the relationship between two variables, such as crime incidence and outdoor temperature, or pizza consumption and web surfing time. Evaluating these relationships requires understanding correlation coefficients and knowing how to interpret scatter plots. The session also covers covariance and the Pearson correlation coefficient (r) as a measure of linear relationship, illustrated with worked examples.
Presentation Transcript
Basic Quantitative Methods in the Social Sciences (AKA Intro Stats) • 02-250-01 • Lecture 9
Assignment Due and Course Evaluations • All four modules of the assignment are due in the first 5 minutes of class. NO assignment will be accepted after 4:05 PM. • Course evaluations will be completed during the first 10 minutes of class.
Correlation • We are often interested in knowing about the relationship between two variables. • Consider the following research questions: • Does the incidence of crime (X) vary with the outdoor temperature (Y) in Detroit? • Does pizza consumption (X) have anything to do with how much time one spends surfing the web (Y)? • Does severity of depression (X) vary as a function of Ecstasy use (Y)? • Does the occurrence of pimples (X) increase as air pollution (Y) increases in Windsor?
Correlation • These are all examples of relationships. • In each case, we are asking whether one variable (X) is related to another variable (Y). Stated differently: Are X and Y correlated? • More specifically: Are changes in one variable reliably accompanied by changes in the other? • “Correlation coefficients” can be calculated so that we can measure the degree to which two variables are related to each other.
Scatter Plot Used to Describe Correlation • We can plot the X and Y points on a Scatter plot. • We plot the Y scores on the vertical axis and the X scores on the horizontal axis. • We then can draw a straight line to try to represent or describe the points on our scatter plot.
Graphing Relationships • When our height and weight scores are plotted, we see some irregularity. • We can draw a straight line through these points to summarize the relationship. • The line provides an average statement about change in one variable associated with changes in the other variable. • r = .770
Correlation • [Scatter plot; axis labels: AGE, WEIGHT]
Imagine if…. • All of the dots fell exactly on the line? What would that mean? • All of the dots clustered close to the line, but few fell on the line – What would that mean? • The dots were widely dispersed around the line, such that the line is only a vague representation of how the scatterplot looks. What would that mean?
Correlation: Positive r • Let's look at some different scatter plots. • A positive relationship.
Correlation: Negative r • Let's look at some different scatter plots. • A negative relationship.
Correlation: No Relationship • Let's look at some different scatter plots. • No relationship.
What Direction Relationship Is Described in This Scatter Plot?
Logic Dictates… • We can measure the distance between each dot and the line. • If a perfect correlation (1.000) is represented by all of the dots falling on the line, while a line whose dots vary around it indicates a weaker correlation… • The degree to which the two variables are correlated can be thought of as the mean distance between the dots and the line. This is calculated algebraically.
Covariance • Conceptually, the correlation between X and Y is based on covariance – a statistic representing the degree to which two variables vary together. • Like variance, covariance is based on deviations from the mean. • r is calculated as • But wait! Just like calculating variance, there is an easier formula
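As a minimal sketch of the deviation-based idea on this slide, the Python below computes a sample covariance and then divides it by the two standard deviations, which is the usual textbook definition of r. The transcript does not preserve the slide's own formula, so treating that definition as the one intended here is an assumption. The scores are hypothetical, chosen only so that their sums match those in the lecture's first worked example, and the function names are illustrative.

```python
import math

# Hypothetical scores (not given in the lecture), for illustration only.
x = [1, 2, 3, 4, 5]
y = [3, 2, 3, 4, 5]

def covariance(xs, ys):
    """Sample covariance: summed cross-products of deviations, divided by N - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)) / (n - 1)

def pearson_r_from_deviations(xs, ys):
    """r = cov(X, Y) / (s_X * s_Y), the standard textbook definition."""
    s_x = math.sqrt(covariance(xs, xs))  # a variable's covariance with itself is its variance
    s_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (s_x * s_y)

cov_xy = covariance(x, y)
r = pearson_r_from_deviations(x, y)
print(round(cov_xy, 3), round(r, 3))  # 1.5 0.832
```

Because the N - 1 terms cancel in the ratio, the same r comes out whether the covariance and standard deviations are computed with N or with N - 1.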
The Pearson Product-Moment Correlation Coefficient (r) • r is a quantitative expression of the degree to which two variables are correlated in a linear relationship. • Linear relationship: The scatterplot points are clustered more or less symmetrically about a straight line, such that the line is an adequate representation of the relationship. • Non-linear or curvilinear relationship: The scatterplot points do not cluster around a straight line. • Example? Arousal/performance
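To see why r only captures linear relationships, here is a small sketch with hypothetical arousal and performance scores (invented for illustration, not from the lecture) that follow a perfect inverted-U pattern. The helper `pearson_r` is an illustrative name, not something from the course materials.

```python
import math

def pearson_r(xs, ys):
    """Pearson r from sums of squared deviations and cross-products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))  # sum of cross-products
    ss_x = sum((a - mean_x) ** 2 for a in xs)
    ss_y = sum((b - mean_y) ** 2 for b in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical inverted-U data: performance peaks at moderate arousal.
arousal = [1, 2, 3, 4, 5]
performance = [1, 4, 5, 4, 1]

r_curved = pearson_r(arousal, performance)
print(r_curved)  # 0.0
```

The two variables are strongly related, yet r is zero because the rising and falling halves of the curve cancel out. A near-zero r therefore rules out a linear relationship, not every relationship.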
Characteristics of r • r has two components: • The degree of relationship • The direction of relationship • r ranges from –1.000 to +1.000
The Pearson r • $r = \dfrac{\sum XY - \dfrac{(\sum X)(\sum Y)}{N}}{\sqrt{\left[\sum X^2 - \dfrac{(\sum X)^2}{N}\right]\left[\sum Y^2 - \dfrac{(\sum Y)^2}{N}\right]}}$ • Note: This formula really is the same as the one in the book, just slightly rearranged.
We Need: • Sum of the Xs: $\sum X$ • Sum of the Ys: $\sum Y$ • Sum of the Xs, squared: $(\sum X)^2$ • Sum of the Ys, squared: $(\sum Y)^2$ • Sum of the squared Xs: $\sum X^2$ • Sum of the squared Ys: $\sum Y^2$ • Sum of the Xs times the Ys: $\sum XY$ • Number of subjects: $N$
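The Pearson r • $r = \dfrac{57 - \dfrac{(15)(17)}{5}}{\sqrt{\left[55 - \dfrac{(15)^2}{5}\right]\left[63 - \dfrac{(17)^2}{5}\right]}}$
The Pearson r • $r = \dfrac{57 - \dfrac{255}{5}}{\sqrt{\left[55 - \dfrac{(15)^2}{5}\right]\left[63 - \dfrac{(17)^2}{5}\right]}}$
The Pearson r • $r = \dfrac{57 - 51}{\sqrt{\left[55 - \dfrac{(15)^2}{5}\right]\left[63 - \dfrac{(17)^2}{5}\right]}}$
The Pearson r • $r = \dfrac{6}{\sqrt{\left[55 - \dfrac{(15)^2}{5}\right]\left[63 - \dfrac{(17)^2}{5}\right]}}$
The Pearson r • $r = \dfrac{6}{\sqrt{\left[55 - \dfrac{225}{5}\right]\left[63 - \dfrac{289}{5}\right]}}$
The Pearson r • $r = \dfrac{6}{\sqrt{[55 - 45][63 - 57.8]}}$
The Pearson r • $r = \dfrac{6}{\sqrt{[10][5.2]}}$
The Pearson r • $r = \dfrac{6}{\sqrt{52}}$
The Pearson r • $r = \dfrac{6}{7.2111}$
The Pearson r • $r = .832$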
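The arithmetic in this worked example can be checked with a short sketch of the computational formula. The function name `pearson_r_from_sums` and its parameter names are illustrative choices, not from the lecture; the six input values are the ones used on the slides.

```python
import math

def pearson_r_from_sums(sum_x, sum_y, sum_x2, sum_y2, sum_xy, n):
    """Computational formula for r, taking the six quantities from the 'We Need' slide."""
    numerator = sum_xy - (sum_x * sum_y) / n   # 57 - 255/5 = 6
    ss_x = sum_x2 - sum_x ** 2 / n             # 55 - 225/5 = 10
    ss_y = sum_y2 - sum_y ** 2 / n             # 63 - 289/5 = 5.2
    return numerator / math.sqrt(ss_x * ss_y)  # 6 / sqrt(52)

r_example = pearson_r_from_sums(sum_x=15, sum_y=17, sum_x2=55, sum_y2=63, sum_xy=57, n=5)
print(round(r_example, 3))  # 0.832
```

Keeping the numerator and the two bracketed terms as separate variables mirrors the slide-by-slide steps, which makes it easier to spot where a hand calculation went wrong.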
Hypothesis Testing with Correlations • H0: ρ = 0 (ρ = “rho”, the population correlation coefficient) • Ha: ρ ≠ 0 (there is a significant relationship between X and Y) • Technically, you could do a one-tailed test for correlations (ρ < 0 or ρ > 0), but for our purposes we will always test whether there simply is a relationship; therefore, we will always do a two-tailed test for correlations. • Find the critical value for α = .05 with df = N − 2 (where N is the number of paired observations) in Table E.2, p. 440
The Pearson r • r = .832 • Is an r of .832 significant? • See Table E.2 (p. 440) for N − 2 df (5 − 2 = 3 df) and an alpha (α) of .05
The Pearson r • r = .832 • Is an r of .832 significant? • The “Critical r” = .878 • Our r = .832 is smaller than the critical r of .878, so we fail to reject H0. • Therefore, the correlation is NOT significant
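The decision rule can be sketched in a few lines. The lecture reads the critical r from Table E.2; the sketch below instead derives it from the two-tailed critical t for 3 df at α = .05 (3.182, taken from a standard t table, which is an assumption not stated in the lecture) using the standard conversion r = t / √(t² + df). The function names are illustrative.

```python
import math

def critical_r_from_t(t_crit, df):
    """Convert a critical t value into the equivalent critical r for the same df."""
    return t_crit / math.sqrt(t_crit ** 2 + df)

def is_significant(r, r_crit):
    """Two-tailed test: reject H0 (rho = 0) when |r| reaches the critical r."""
    return abs(r) >= r_crit

n_pairs = 5
df = n_pairs - 2          # df = N - 2 = 3
t_crit = 3.182            # assumed: two-tailed critical t, alpha = .05, df = 3 (standard t table)

r_crit = critical_r_from_t(t_crit, df)
print(round(r_crit, 3))               # 0.878
print(is_significant(0.832, r_crit))  # False
```

With only five pairs of scores, even a correlation as large as .832 falls short of the cutoff, which is why small samples need very strong correlations to reach significance.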
Popcorn Consumption • Researcher X hypothesizes that popcorn consumption varies as a function of stress. He gives a random sample of 5 people a self-report measure of stress that produces scores ranging from 1 (little or no stress) to 10 (very stressed), and then has them watch a movie. He measures how many kernels of popcorn each of them eats. Is popcorn consumption correlated with stress?
Are X & Y Correlated? • Stress Ratings • # of Kernels
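The Pearson r • $r = \dfrac{\sum XY - \dfrac{(\sum X)(\sum Y)}{N}}{\sqrt{\left[\sum X^2 - \dfrac{(\sum X)^2}{N}\right]\left[\sum Y^2 - \dfrac{(\sum Y)^2}{N}\right]}}$
We Need: • Sum of the Xs: $\sum X$ • Sum of the Ys: $\sum Y$ • Sum of the Xs, squared: $(\sum X)^2$ • Sum of the Ys, squared: $(\sum Y)^2$ • Sum of the squared Xs: $\sum X^2$ • Sum of the squared Ys: $\sum Y^2$ • Sum of the Xs times the Ys: $\sum XY$ • Number of subjects: $N$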
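The Pearson r • $r = \dfrac{256 - \dfrac{(29)(40)}{5}}{\sqrt{\left[189 - \dfrac{(29)^2}{5}\right]\left[370 - \dfrac{(40)^2}{5}\right]}}$
The Pearson r • $r = \dfrac{256 - \dfrac{1160}{5}}{\sqrt{\left[189 - \dfrac{(29)^2}{5}\right]\left[370 - \dfrac{(40)^2}{5}\right]}}$
The Pearson r • $r = \dfrac{256 - 232}{\sqrt{\left[189 - \dfrac{(29)^2}{5}\right]\left[370 - \dfrac{(40)^2}{5}\right]}}$
The Pearson r • $r = \dfrac{24}{\sqrt{\left[189 - \dfrac{(29)^2}{5}\right]\left[370 - \dfrac{(40)^2}{5}\right]}}$
The Pearson r • $r = \dfrac{24}{\sqrt{\left[189 - \dfrac{841}{5}\right]\left[370 - \dfrac{1600}{5}\right]}}$
The Pearson r • $r = \dfrac{24}{\sqrt{[189 - 168.2][370 - 320]}}$
The Pearson r • $r = \dfrac{24}{\sqrt{[20.8][50]}}$
The Pearson r • $r = \dfrac{24}{\sqrt{1040}}$
The Pearson r • $r = \dfrac{24}{32.2490}$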