STAT131 Week 3 Lecture1b Correlation

STAT131 Week 3 Lecture1b Correlation. Anne Porter Email alp@uow.edu.au Phone: 42214058. Statistical Research Process. To come. Exploring & Describing Data Tools for Looking at Variation in Data Structures. Questions thus far. What is the shape of the data set (ie what is its distribution?)


Presentation Transcript


  1. STAT131 Week 3 Lecture 1b: Correlation. Anne Porter. Email: alp@uow.edu.au. Phone: 42214058

  2. Statistical Research Process

  3. To come: Exploring & Describing Data. Tools for Looking at Variation in Data Structures

  4. Questions thus far • What is the shape of the data set (i.e. what is its distribution)? • What is the centre of the data set? • What is the spread of the data set? • Are there any outliers (unusual data) in the data set? • Do we need to transform the data in some manner? New questions • Is there a relationship between cholesterol and cardiovascular disease? • Is there a correlation between intelligence and performance in exams?

  5. Questions about relationships: How are the variables measured? • Is there a relationship between cholesterol and cardiovascular disease? • Is there a correlation between intelligence and performance in exams? For correlation, the two variables are both measured on a quantitative scale, either interval or ratio.

  6. “two quantitative variables”

  7. Plot the height of a plant 1 to 6 weeks after planting. [Scatterplot: height in cm (0 to 3) against week (1 to 6)] Does one variable cause the other?

  9. Causality: Did time cause the plants to grow? No. Water, nutrients, sunshine, talking to the plants... may all have caused the plants to grow. • Relationships (correlations) do not provide strong evidence of causality • Strong evidence of causality is provided through well-designed experiments where there are different treatments

  10. Pearson’s Correlation r: Properties • Measures the strength and direction of a straight-line relationship • Maximum value of r is 1: all the points fall on a straight line, and an increase in one score is matched by an increase in the other • Minimum value of r is -1: all the points fall on a straight line, and an increase in one score is matched by a decrease in the other
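The two extreme values can be checked numerically. The following is a minimal sketch, not part of the lecture: the function name `pearson_r` and the two made-up straight lines are assumptions chosen only for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

weeks = [1, 2, 3, 4, 5, 6]
rising = [2 * w + 1 for w in weeks]    # hypothetical data: every point on an upward line
falling = [10 - 3 * w for w in weeks]  # hypothetical data: every point on a downward line

r_up = pearson_r(weeks, rising)
r_down = pearson_r(weeks, falling)
print(round(r_up, 6), round(r_down, 6))
```

Any scatter of the points away from the line would pull r away from 1 or -1 towards 0.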

  11. Pearson’s r properties

  12. Take care with the domain when measuring relationships. [Plot: Y against X, with the X axis divided into regions A, B and C] Just looking at B might suggest no relationship.

  13. Pearson’s Correlation r: Properties • r = 0 means no linear relationship • r = 0 does not mean there is no relationship, just that the relationship is not linear
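A small illustration of this point, using made-up data that is not from the lecture: Y is completely determined by X (Y = X squared), yet r comes out as 0 because the relationship is a symmetric curve rather than a straight line. The helper name `pearson_r` is an assumption for this sketch.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]  # hypothetical data: a perfect but non-linear relationship

r_curve = pearson_r(xs, ys)
print(r_curve)
```

This is why the scatterplot comes first: r alone would hide the curve.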

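  14. Method 1: Calculating r: $r = \frac{SS_{xy}}{\sqrt{SS_x \, SS_y}}$, where the sums of squares are $SS_x = \sum x^2 - \frac{(\sum x)^2}{n}$, $SS_y = \sum y^2 - \frac{(\sum y)^2}{n}$ and $SS_{xy} = \sum xy - \frac{\sum x \sum y}{n}$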

  15. An example: • Given the (x,y) pairs (0,0),(1,2),(2,4),(3,6),(4,4) • Is there a relationship between X and Y? • Step 1: Plot the data and identify (1) if there are any unusual data points, (2) if the relationship appears to be linear, (3) the approximate strength and direction of the relationship • Step 2: If there are outliers, (1) look to see what happens if the outliers are removed, (2) look to see what happens if a transformation is used • Step 3: Calculate r

  16. Step 1: Plot the relationship • Given the (x,y) pairs (0,0),(1,2),(2,4),(3,6),(4,4) • What is the nature of the relationship between X and Y? [Scatterplot: Y (0 to 6) against X (0 to 4)] r is positive, with Y increasing as X increases. r is between 0 and 1. Linear with one outlier OR a curve in the data. Artificial: too few points!

  17. Step 2: No outliers • Go on to calculate r • Using the formula below, what do we need to find?

  18. Step 3: Calculate r

  19. Step 3: Calculate r

          x    x^2     y    y^2    xy
          0      0     0      0     0
          1      1     2      4     2
          2      4     4     16     8
          3      9     6     36    18
          4     16     4     16    16
     Sum 10     30    16     72    44
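The column totals can be reproduced from the five (x,y) pairs given in the example. This is a minimal sketch; the variable names `pairs`, `rows` and `totals` are chosen here for illustration and are not from the lecture.

```python
# The (x, y) pairs from the worked example.
pairs = [(0, 0), (1, 2), (2, 4), (3, 6), (4, 4)]

# One row per pair, in the column order x, x^2, y, y^2, xy.
rows = [(x, x * x, y, y * y, x * y) for x, y in pairs]

# Column sums: sum of x, sum of x^2, sum of y, sum of y^2, sum of xy.
totals = tuple(sum(column) for column in zip(*rows))

for row in rows:
    print(*row, sep="\t")
print(*totals, sep="\t")
```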

  20. Step 3: Calculate r. $\sum x = 10$, $\sum x^2 = 30$, $\sum y = 16$, $\sum y^2 = 72$, $\sum xy = 44$. What else is needed to put into the formula?

  21. Step 3: Calculate r. $\sum x = 10$, $\sum x^2 = 30$, $\sum y = 16$, $\sum y^2 = 72$, $\sum xy = 44$. What else is needed to put into the formula? n =

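  22. Step 3: Calculate r. $\sum x = 10$, $\sum x^2 = 30$, $\sum y = 16$, $\sum y^2 = 72$, $\sum xy = 44$. What else is needed to put into the formula? $(\sum y)^2 = (16)^2 = 256$, $\sum x \sum y = 10 \times 16 = 160$, $(\sum x)^2 = (10)^2 = 100$, $n = 5$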

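  23. Step 3: Calculate r. $\sum x = 10$, $\sum x^2 = 30$, $\sum y = 16$, $\sum y^2 = 72$, $\sum xy = 44$, $(\sum y)^2 = (16)^2 = 256$, $(\sum x)^2 = (10)^2 = 100$, $\sum x \sum y = 160$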

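  24. Step 3: Calculate r. $\sum x = 10$, $\sum x^2 = 30$, $\sum y = 16$, $\sum y^2 = 72$, $\sum xy = 44$, $(\sum y)^2 = (16)^2 = 256$, $(\sum x)^2 = (10)^2 = 100$, $\sum x \sum y = 160$, so $r = \frac{44 - 160/5}{\sqrt{(30 - 100/5)(72 - 256/5)}}$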

  25. Step 3: Calculate r

  26. Step 3: Calculate r: $r = \frac{12}{\sqrt{10 \times 20.8}} = \frac{12}{\sqrt{208}} = 0.832$
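The arithmetic behind 0.832 can be traced from the totals found earlier. This is a sketch under one assumption: the names `ss_x`, `ss_y` and `ss_xy` for the three sums of squares are labels chosen here, since the lecture's own symbols did not survive in the transcript.

```python
from math import sqrt

n = 5
sum_x, sum_x2, sum_y, sum_y2, sum_xy = 10, 30, 16, 72, 44

# Sums of squares (names are assumptions, see the note above).
ss_x = sum_x2 - sum_x ** 2 / n       # 30 - 100/5
ss_y = sum_y2 - sum_y ** 2 / n       # 72 - 256/5
ss_xy = sum_xy - sum_x * sum_y / n   # 44 - 160/5

r = ss_xy / sqrt(ss_x * ss_y)
print(round(r, 3))
```

The three intermediate values correspond to the 10, 20.8 and 12 that appear in the slide's final fraction.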

  27. Method 2: SPSS correlation matrix (r, n, p). Once you know how to work it out by calculator, do it in SPSS. Do your assignment using SPSS.

  28. Method 2: Reading output (r, n, p) • r: Pearson’s r • n: number of (x,y) pairs • p: probability of getting an r of that size or more, for sample size 5, if indeed there were no correlation in the population
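To give a feel for where a p value of this kind can come from, here is a rough sketch. It assumes the usual two-tailed t test for a correlation, with n - 2 degrees of freedom; the slides do not state how SPSS arrives at its figure, so treat this as an assumption rather than a description of SPSS. The helper `t_cdf_3df` is a hypothetical name, and it uses the closed form of the t distribution that holds only for 3 degrees of freedom (n = 5).

```python
from math import atan, pi, sqrt

def t_cdf_3df(t):
    """CDF of Student's t with exactly 3 degrees of freedom (closed form)."""
    u = t / sqrt(3)
    return 0.5 + (u / (1 + u * u) + atan(u)) / pi

n = 5                  # number of (x, y) pairs in the worked example
r = 12 / sqrt(208)     # Pearson's r from the worked example

# Assumed test statistic: t = r * sqrt(n - 2) / sqrt(1 - r^2).
t_stat = r * sqrt(n - 2) / sqrt(1 - r * r)

# Two-tailed probability of a t at least this extreme if the population correlation were 0.
p_two_sided = 2 * (1 - t_cdf_3df(abs(t_stat)))
print(round(t_stat, 3), round(p_two_sided, 3))
```

With only five pairs, even a fairly large r leaves a noticeable probability of arising by chance, which is one reason the earlier slide calls the example artificial.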

  29. Method 3: Calculating r where and

  30. Method 4: Calculating r where

  31. Questions about Relationships • Scatterplot • Approximate direction and strength • Linear or non-linear • Outliers • Transformations, if necessary, to make the relationship linear • Calculate r to provide a measure • Comment on the nature of the relationship found

  32. Video Clips • Decisions through Data, Tape 3, Unit 13: examines correlation as a measure of similarity • Decisions through Data, Tape 3, Unit 14: examines correlation as a means of providing measures for other things that are difficult to measure • Decisions through Data, Tape 4, Unit 16: the question of causation

  33. Fitting a line through the points on the scatterplot • Correlation provides a measure of the strength of a relationship • When we want to describe the form of the relationship, we determine the best equation for a line through the data points • Next lecture: regression • Finding the least squares regression line
