Chapter 7 - Part 1

hayes

Presentation Transcript


  1. Chapter 7 - Part 1: Correlation

  2. Correlation Topics • Correlational research – what is it and how do you do “co-relational” research? • The three questions: (1) Is it a linear or curvilinear correlation? (2) Is it a positive or negative relationship? (3) How strong is the relationship? • Solving these questions with t scores and r, the estimated correlation coefficient derived from the tX and tY scores of individuals in a random sample.

  3. Correlational research – how to start. • To begin a correlational study, we select a population or, far more frequently, a random sample from a population. • We then obtain two scores from each individual, one score on each of two variables. These are usually variables that we think might be related to each other for interesting reasons. We call one variable X and the other Y.

  4. Rho and its estimate, r. • Since we use samples most of the time, we will mostly use the formulae and symbols for estimating the correlation in the population from a sample. • The actual correlation in the population is called “rho” or “Pearson’s rho.” • Your best estimate of rho, derived from a random sample, is called “r” or “Pearson’s r.” • Pearson invented the technique.

  5. Comparing tX & tY scores to compute r • We translate the raw scores on the X variable to t scores (called tX scores) and the raw scores on the Y variable to tY scores. • So each individual has a pair of scores, a tX score and a tY score. • You determine how similar or different the tX and tY scores in the pairs are, on average, by subtracting tY from tX in each pair, squaring each difference, summing the squared differences, and (kind of) averaging them.

  6. The estimated correlation coefficient, Pearson’s r • With a simple formula, you transform the average squared difference between the tX and tY scores into Pearson’s correlation coefficient, r. • Pearson’s r indicates, with a single number, both the direction and the strength of the relationship between the two variables in your sample. • r also estimates the correlation in the population from which the sample was drawn. • In Ch. 8, you will learn when you can use r that way.
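A minimal Python sketch of slides 5 and 6, under two assumptions the slides do not spell out: t scores are computed with the sample standard deviation (n − 1 in the denominator), and the "(kind of) averaging" step means dividing the sum of squared differences by n − 1. Under those assumptions, 1 minus half of that average is the usual Pearson r. The function names and the five pairs of scores are made up for illustration.

```python
from statistics import mean, stdev


def t_scores(values):
    """Convert raw scores to t scores: (score - sample mean) / s,
    where s is the sample standard deviation (n - 1 denominator)."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


def pearson_r_from_t(xs, ys):
    """Estimate Pearson's r from paired raw scores via the t-score route."""
    if len(xs) != len(ys):
        raise ValueError("each individual needs one X score and one Y score")
    n = len(xs)
    tx = t_scores(xs)
    ty = t_scores(ys)
    # Subtract tY from tX in each pair, square each difference, and sum.
    sum_sq_diff = sum((a - b) ** 2 for a, b in zip(tx, ty))
    # "(Kind of) averaging": divide by n - 1 rather than n (assumption).
    avg_sq_diff = sum_sq_diff / (n - 1)
    return 1 - 0.5 * avg_sq_diff


# Five made-up pairs of scores, purely for illustration.
x_scores = [1, 2, 3, 4, 5]
y_scores = [2, 4, 5, 4, 5]
print(round(pearson_r_from_t(x_scores, y_scores), 3))  # 0.775
```

As a cross-check, the product-moment formula Σ(X − X̄)(Y − Ȳ) / √(Σ(X − X̄)² · Σ(Y − Ȳ)²) gives 6/√60 ≈ 0.775 for the same five pairs; the two routes agree whenever t scores use n − 1.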

  7. Going from pairs of raw scores to r: Linearity, a preliminary question. • Once you have scores on two variables, you ask, “Is this a linear or curvilinear relationship?” If you mistake a curvilinear relationship for a linear one and then use the correlation to predict values of Y from values of X, you can wind up predicting that the average 70-year-old will be 13 feet tall! We don’t like making that kind of mistake. So you have to watch out for curvilinearity.

  8. Linearity vs. Curvilinearity • In a linear relationship, as scores on one variable go from low to high, scores on the other variable either generally increase or generally decrease. In a curvilinear relationship, as scores on one variable go from low to high, scores on the other variable change direction. They can go (1) down and then up, (2) up and then down, (3) up, down and then up again, (4) up or down and then flat, (5) and so on.

  9. Curvilinearity: An example • New furniture can be fairly expensive. However, it is hard to get very much for used furniture, unless it is very old. At some point such furniture, if in reasonably good condition, becomes antique and can be worth a good deal of money. • Thus the value of furniture goes from high to low and then, when enough time has passed, from low to very high. • GRAPH that relationship. See how the line changes direction.

  10. Examples of linear relationships. • For example, think of the relationship of the size of a pleasure boat (X) and its cost (Y). As one variable (boat size) increases, scores on the other variable (cost) also increase. • Another example of a linear relationship: the relationship between the size of a car and the number of miles per gallon it gets. In general, as cars get gradually larger (X), they tend to get fewer miles per gallon (Y).

  11. A curvilinear relationship • In a curvilinear relationship, as scores on the X variable go gradually from low to high, the Y variable changes direction. • For example, think of the relationship between age (X) and height (Y). • As age increases from 0-14 or so, height increases also. • But then people stop growing. As age increases, height stays the same. • Thus the Y variable, height, changes direction. It goes from gradually rising to flat. • If you graph age and height, the best fitting line is a curved line.
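Why curvilinearity matters for r: a small Python sketch (our own illustration, not from the slides) using an extreme curve, a U shape where Y equals X squared. Y is completely determined by X, yet Pearson's r, which measures only linear relationships, comes out at zero. The function name and data are hypothetical.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Product-moment Pearson r: sum of cross-products of deviations
    divided by the square root of the product of the sums of squares."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Y is a perfect function of X, but the relationship is a U-shaped curve.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]
print(pearson_r(xs, ys))  # r comes out as 0
```

The falling half of the U cancels the rising half, so a linear summary reports "no relationship" for data that are perfectly related.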

  12. Correlation Characteristics: Linear vs. Curvilinear. Which line best shows the relationship between age (X) and height (Y)? [Figure: a linear line and a curvilinear line drawn through age and height data.]

  13. Another non-linear relationship: shortstops and linemen. Great shortstops may be too small to be great football linemen. Baseball skill / football potential: • David: Terrible / Terrible • Ben: Very Poor / Average • Ed: Poor / Average • Frank: Average / Very Good • Chuck: Good / Excellent • Al: Very Good / Good • George: Excellent / Poor • Is this a linear relationship?

  14. Plot the dots! • To check whether a relationship is linear, make a graph and place the scores on it. • That’s what I mean by “Plot the dots.” • If you really want to know what is going on with data, Plot the dots! • Here is a graph for the baseball skills and football potential data.
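"Plot the dots" can be done even without graphing software. The Python sketch below is our own illustration: it codes the seven rating labels from slide 13 as 1 (Terrible) through 7 (Excellent), a hypothetical coding the slides do not use, prints a text scatterplot, and computes the linear r for comparison. All names in the code are our own.

```python
from statistics import mean

# Hypothetical numeric coding of the rating labels (1 = Terrible ... 7 = Excellent).
LEVELS = ["Terrible", "Very Poor", "Poor", "Average", "Good", "Very Good", "Excellent"]
CODE = {label: rank for rank, label in enumerate(LEVELS, start=1)}

# (player, baseball skill, football potential), as listed on slide 13.
PLAYERS = [
    ("David", "Terrible", "Terrible"),
    ("Ben", "Very Poor", "Average"),
    ("Ed", "Poor", "Average"),
    ("Frank", "Average", "Very Good"),
    ("Chuck", "Good", "Excellent"),
    ("Al", "Very Good", "Good"),
    ("George", "Excellent", "Poor"),
]


def plot_the_dots(points, size=len(LEVELS)):
    """Return the rows of a text scatterplot, highest Y value first.
    Each row starts with its Y code; '*' marks a dot, '.' an empty cell."""
    rows = []
    for y in range(size, 0, -1):
        cells = "".join("*" if (x, y) in points else "." for x in range(1, size + 1))
        rows.append(f"{y} {cells}")
    return rows


def pearson_r(xs, ys):
    """Product-moment Pearson r computed from deviation scores."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


baseball = [CODE[b] for _, b, _ in PLAYERS]
football = [CODE[f] for _, _, f in PLAYERS]
for row in plot_the_dots(set(zip(baseball, football))):
    print(row)
print("linear r =", round(pearson_r(baseball, football), 2))
```

The printed dots rise and then fall, and the linear r of about 0.43 summarizes that curve poorly.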

  15. When you plot the dots, is this linear? [Scatterplot: baseball skill on the X axis and football skill on the Y axis, both running from Terrible to Excellent. Chuck, Frank and Al are near the top; Ben and Ed are in the middle; George and David are near the bottom.] NO! It is best described by a curved line. It is a curvilinear relationship!

  16. After you know a correlation is linear, there are two other questions: the direction and the strength of the correlation. But first, a definition of high and low scores. • High scores are scores above the mean of each variable. They are represented by positive t scores. • Low scores are scores below the mean of each variable. They are represented by negative t scores.

  17. Positive relationships • In a positive relationship, as X scores gradually increase, Y scores tend to increase as well. Example: The longer a sailboat is, the more it tends to cost. As length goes up, price tends to go up. • In a positive correlation, X and Y scores tend to be on the same side of their respective means. Scores below the mean on X tend to be paired with scores below the mean on Y, and scores above the mean on X tend to be paired with scores above the mean on Y. • As a result, the tX and tY scores tend to be similar and the difference between them (tX – tY) tends to be small. • Since (tX – tY) is small, the squared difference between them, (tX – tY)², also tends to be small.

  18. In a positive correlation, the tX and tY scores are relatively __________, so the difference and the squared difference between the t scores in each pair tends to be ________.

  19. In a positive correlation, the tX and tY scores are relatively similar, so the difference and the squared difference between the t scores in each pair tends to be small (or, to put it another way, close to zero).

  20. Graphing a positive relationship. • In a positive correlation high scores on X tend to go with high scores on Y. On a graph, as the line runs from left to right, scores increase on the X axis. At the same time, Y scores also generally get higher. So, the line will tend to rise as it runs. • Remember from math, slope equals how far a line rises on the Y axis for each unit it moves from left to right or “runs” along the X axis. • If a line rises from left to right, “rise” is positive. Run is always positive. So a positive rise divided by an (always) positive run results in a positive slope. (That’s why we call it a “positive” correlation.)

  21. [Figure: Positive vs. negative scatterplot. Two sets of dots on axes running from -3 to +3, one labeled “Positive relationship,” the other “Negative relationship.”]

  22. [Figure: Graphic display of a strong POSITIVE correlation, on axes running from -3 to +3.]

  23. Negative relationships • In a negative relationship, as X scores gradually increase, Y scores tend to decrease. Example: The larger a car is, the fewer miles it tends to get for each gallon of gas. As size goes up, miles per gallon tends to go down. • In a negative correlation, X and Y scores tend to be on opposite sides of their respective means. • As a result, the tX and tY scores tend to be dissimilar and the difference between them (tX – tY) tends to be large. • Since (tX – tY) is large, the squared difference between them, (tX – tY)², also tends to be large.

  24. Graphing a negative relationship • In a negative correlation, high scores on X tend to go with low scores on Y. On a graph, as the line runs from left to right, scores increase on the X axis. At the same time, Y scores generally get lower. So, the line will tend to fall as it runs. • Remember from math, slope equals how far a line rises on the Y axis for each unit it moves from left to right or “runs” along the X axis. • If a line falls from left to right, “rise” is negative. Run is always positive. So a negative rise divided by an (always) positive run results in a negative slope. (That’s why we call it a “negative” correlation.)

  26. Summary: • When t scores are consistently more similar than different, we have a positive correlation. On a graph the dots will rise from your left to your right. So, a best fitting line will have a positive slope. • When t scores are consistently more different than similar, we have a negative correlation. On a graph the dots will fall from your left to your right. So, a best fitting line will have a negative slope.
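The summary can be made numerical. Assuming t scores use the sample standard deviation and the squared differences are averaged over n − 1 (our reading of slide 5), that average equals 2(1 − r): it falls below 2 when the t scores are more similar than different (positive correlation) and rises above 2 when they are more different than similar (negative correlation). A minimal Python sketch; the function names, tolerance, and data are our own.

```python
from statistics import mean, stdev


def t_scores(values):
    """t scores using the sample mean and sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def direction(xs, ys, tol=1e-9):
    """Classify a linear relationship by the (n - 1)-averaged squared
    difference between paired t scores: below 2 is positive, above 2 negative."""
    tx, ty = t_scores(xs), t_scores(ys)
    avg_sq = sum((a - b) ** 2 for a, b in zip(tx, ty)) / (len(xs) - 1)
    if avg_sq < 2 - tol:
        return "positive"
    if avg_sq > 2 + tol:
        return "negative"
    return "no linear relationship (r = 0)"


print(direction([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))  # positive
print(direction([1, 2, 3, 4, 5], [5, 4, 5, 2, 1]))  # negative
```

The tolerance only guards against floating-point noise when the average lands essentially on 2.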

  28. How strong is the relationship between the tX and tY scores? • Here the question is about the consistency with which tX and tY scores are either similar or dissimilar.

  29. t scores: sign and size • There are two aspects to the consistency of the relationship between tX and tY scores. • First, are the t scores consistently of the same sign (positive correlation) or of opposite signs (negative correlation)? • If they are almost always one way or the other, you have at least a moderately strong relationship. • On the other hand, if the t scores in a pair are sometimes on the same side of the mean and sometimes on opposite sides, you have a relatively weak correlation.

  30. t scores: sign and size • If there is a consistent pattern of same-signed t scores (positive correlation) or a consistent pattern of opposite-signed t scores (negative correlation), then whether the tX and tY scores are about the same distance from the mean comes into play. • The large majority of t scores (close to 90%, or about 87% when the scores are normally distributed) usually range from –1.50 to +1.50. • Given a consistent positive or negative correlation, the more similar in size the t scores, the stronger the correlation. This is especially true at the extremes (t < –1.50 or t > +1.50).
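Where the "close to 90%" figure comes from: if the scores are roughly normally distributed (an assumption), the share of t scores between –1.50 and +1.50 is the normal probability P(−1.5 < Z < 1.5) = erf(1.5/√2). A one-function Python check using only the standard library; the function name is our own.

```python
from math import erf, sqrt


def proportion_within(z):
    """Share of a normal distribution within +/- z standard deviations
    of its mean: P(-z < Z < z) = erf(z / sqrt(2))."""
    return erf(z / sqrt(2))


print(round(proportion_within(1.5), 4))  # 0.8664
```

So about 87% of normally distributed scores fall inside ±1.50, which is the sense in which "close to 90%" holds.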

  31. Positive correlations: • Perfect: tX and tY scores are all the same sign and are identical in size. • Strong: tX and tY scores are almost all the same sign and are fairly similar in size. • Moderate: tX and tY scores are predominantly the same sign. This is especially true for pairs in which one of the values is one or more standard deviations from the mean. Size may be fairly dissimilar. • Weak: tX and tY scores are a little more often the same sign than opposite in sign. Nothing can be said about size.

  32. Negative correlations: • Perfect: tX and tY scores are all of opposite sign and are identical in size. • Strong: tX and tY scores are almost all of opposite sign and are fairly similar in size. • Moderate: tX and tY scores are predominantly opposite in sign. This is especially true for pairs in which one of the values is one or more standard deviations from the mean. Size may be fairly dissimilar. • Weak: tX and tY scores are a little more often of opposite signs than of the same sign. Nothing can be said about size.

  33. Unrelated (independent) variables • When the size and sign of the tX scores bear no relationship to the size and sign of the tY scores, the variables are unrelated. • We also can call the variables “independent of” or “orthogonal to” each other. The three terms, unrelated, independent and orthogonal, are synonymous in this context.

  34. Graphing it on t axes: The strength of a relationship tells us approximately how the dots representing pairs of t scores will fall around a best-fitting line. • Perfect: scores fall exactly on a straight line whose slope will be +1.00 or –1.00. • Strong: most scores fall near a line whose slope will be close to +.750 or –.750. • Moderate: some scores are near the line, some are not. The slope of the line will be close to +.500 or –.500.

  35. Graphing it on t axes: The strength of a relationship tells us approximately how the dots representing pairs of t scores will fall around a best-fitting line. • Weak: some scores fall fairly close to the line, but others fall quite far from it. The slope of the line will be close to +.250 or –.250. • Independent: the scores are not close to the line and form a circular or square pattern. The best-fitting line will be the X axis, a line with a slope of 0.000.
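One way to see why the slopes on slides 34 and 35 track the strength of the relationship: when the dots are pairs of t scores, the least-squares line predicting tY from tX has a slope equal to r. The Python sketch below checks this numerically. "Best-fitting line" is taken here to mean the ordinary least-squares line, and the sample data and function names are made up for illustration.

```python
from statistics import mean, stdev


def t_scores(values):
    """t scores using the sample mean and sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line predicting ys from xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def pearson_r(xs, ys):
    """Product-moment Pearson r from deviation scores."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


x_scores = [1, 2, 3, 4, 5]
y_scores = [2, 4, 5, 4, 5]
slope = least_squares_slope(t_scores(x_scores), t_scores(y_scores))
print(round(slope, 3), round(pearson_r(x_scores, y_scores), 3))  # 0.775 0.775
```

A slope of 0 on t-score axes therefore corresponds to r = 0, the independent case.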

  36. [Figure: Strength of a relationship: Perfect. Axes run from -1.5 to +1.5.]

  37. [Figure: Strength of a relationship: Very Strong. Axes run from -3 to +3.]

  38. [Figure: Strength of a relationship: Moderate. Axes run from -3 to +3.]

  39. [Figure: Strength of a relationship: Independent. Axes run from -3 to +3.]

  40. Computing the correlation coefficient.

  41. Comparing apples to oranges? Use Z or t scores! • You can use correlation to look for a relationship between ANY two variables that you can measure on the same subject. • However, there may not be any relationship (the variables may be independent). • A correlation tells us whether scores are consistently similar on two measures, consistently different from each other, or show no real pattern.

  42. Comparing apples to oranges? Use t scores! • To compare scores on two different variables, you transform them into ZX and ZY scores if you are studying a population or tX and tY scores if you have a sample. • ZX and ZY scores (or tX and tY scores) can be directly compared to each other to see whether they are consistently similar, consistently quite different, or show no consistent pattern of similarity or difference.
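The only difference between the two conversions is the standard deviation used: Z scores divide by the population σ (denominator N), t scores by the sample s (denominator n − 1). A minimal Python sketch using the standard library's `statistics.pstdev` and `statistics.stdev`; the function names and scores are made up for illustration.

```python
from statistics import mean, pstdev, stdev


def z_scores(population):
    """Z scores: (score - mu) / sigma, where sigma divides by N."""
    mu, sigma = mean(population), pstdev(population)
    return [(v - mu) / sigma for v in population]


def t_scores(sample):
    """t scores: (score - sample mean) / s, where s divides by n - 1."""
    m, s = mean(sample), stdev(sample)
    return [(v - m) / s for v in sample]


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up scores; mean 5, sigma 2
print(z_scores(scores)[-1])            # 2.0, the Z for the score of 9
print(round(t_scores(scores)[-1], 3))  # 1.871, the t for the same score
```

Either way the converted scores are unit-free, which is what lets you set, say, hat size against drawing ability.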

  43. Comparing variables • Anxiety symptoms, e.g., heartbeat, with number of hours driving to class. • Hat size with drawing ability. • Math ability with verbal ability. • Number of children with IQ. • Turn them all into Z or t scores

  44. Pearson’s Correlation Coefficient • coefficient (noun): a number that serves as a measure of some property. • The correlation coefficient indexes BOTH the consistency and the direction of a correlation with a single number.

  45. rho – the population parameter • Pearson’s rho (ρ) is the parameter that characterizes the strength and direction of a linear relationship (and only a linear relationship) between two variables. To compute rho, you must have the entire population. Then you can compute sigma, mu, Z scores and rho. • The formula: $\rho = 1 - \frac{1}{2}\left(\frac{\sum (Z_X - Z_Y)^2}{N_P}\right)$, where $N_P$ is the number of pairs of Z scores in the population. • In English: The correlation coefficient equals 1 minus half the average squared distance between the pairs of Z scores.
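The rho formula translates directly into code. This Python sketch treats the lists it is given as the entire population (so σ uses N, via `statistics.pstdev`) and divides the summed squared Z-score differences by N_P. The function names and example values are our own.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Population Z scores: (score - mu) / sigma, sigma dividing by N."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def pearson_rho(xs, ys):
    """rho = 1 - 1/2 * (sum of (ZX - ZY)^2) / N_P for a whole population."""
    if len(xs) != len(ys):
        raise ValueError("every member of the population needs an X and a Y score")
    zx, zy = z_scores(xs), z_scores(ys)
    n_pairs = len(xs)  # N_P, the number of pairs of Z scores
    avg_sq_dist = sum((a - b) ** 2 for a, b in zip(zx, zy)) / n_pairs
    return 1 - 0.5 * avg_sq_dist


print(pearson_rho([1, 2, 3, 4], [2, 4, 6, 8]))  # perfect positive: about 1.0
print(pearson_rho([1, 2, 3, 4], [8, 6, 4, 2]))  # perfect negative: about -1.0
```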

  46. Pearson’s rho • When you have a perfect positive correlation, the Z scores will be identical in size and sign. So the average squared distance will be zero and rho = 1.000-1/2(0.000) = 1.000

  47. Pearson’s rho • When you have a perfect negative correlation, the Z scores will be identical in size and opposite in sign. It can be proven algebraically that the average squared distance in that case will be 4.000: rho = 1.000-1/2(4.000) = -1.000

  48. Pearson’s rho • When you have two totally independent variables, the average squared distance will be 2.000 (halfway between 0.000 and 4.000). rho = 1.000-1/2(2.000) = 0.000
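The three average squared distances quoted on slides 46 to 48 (0.000, 4.000 and 2.000) follow from one identity. The derivation below assumes population Z scores, so that the average squared Z score on each variable is 1, and takes rho to be the average product of paired Z scores; the slides do not state that definition, so treat it as our assumption.

$$
\frac{1}{N_P}\sum (Z_X - Z_Y)^2
= \frac{1}{N_P}\sum Z_X^2 + \frac{1}{N_P}\sum Z_Y^2 - \frac{2}{N_P}\sum Z_X Z_Y
= 1 + 1 - 2\rho
= 2 - 2\rho
$$

Solving for rho gives $\rho = 1 - \frac{1}{2}\left(\frac{\sum (Z_X - Z_Y)^2}{N_P}\right)$, and substituting $\rho = 1$, $-1$ and $0$ gives average squared distances of 0, 4 and 2.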

  49. Pearson’s Correlation Coefficient • Thus, rho varies from -1.000 (perfect negative correlation) to 0.000 (independent variables) to +1.000 (perfect positive correlation). • A negative value indicates a negative relationship; a positive value indicates a positive relationship. • Values of r (or rho) close to 1.000 or -1.000 indicate a strong (consistent) relationship; values close to 0.000 indicate a weak (inconsistent) relationship or independent variables.
