
Introduction to Statistics

Introduction to Statistics. Correlation, Chapter 15. April 23-28, 2009. Classes #27-28. Correlation: a statistical technique that is used to measure and describe a relationship between two variables. For example: GPA and TD’s scored; statistics exam scores and amount of time spent studying.


Presentation Transcript


  1. Introduction to Statistics • Correlation • Chapter 15 • April 23-28, 2009 • Classes #27-28

  2. Correlation • A statistical technique that is used to measure and describe a relationship between two variables • For example: • GPA and TD’s scored • Statistics exam scores and amount of time spent studying

  3. Notation • A correlation requires two scores for each individual • One score from each of the two variables • They are normally identified as X and Y

  4. Three characteristics of the relationship between X and Y are being measured… • The direction of the relationship • Positive or negative, shown by the sign of the correlation • The form of the relationship • Usually linear form • The strength or consistency of the relationship • Perfect correlation = 1.00 (or -1.00); no consistency would be 0.00 • Therefore, the magnitude of a correlation measures the degree of relationship between two variables on a scale from 0.00 to 1.00, and the sign gives the direction, so the correlation itself ranges from -1.00 to +1.00.

  5. Assumptions • There are 3 main assumptions… • 1. The dependent and independent variables are normally distributed. We can check this by looking at the histograms for the two variables • 2. The relationship between X and Y is linear. We can check this by looking at the scattergram • 3. The relationship is homoscedastic. We can check homoscedasticity by looking at the scattergram and observing that the data points form a “roughly symmetrical, cigar-shaped pattern” about the regression line. • If the above 3 assumptions have been met, then we can use correlation and test r for significance

  6. Pearson r • The most commonly used correlation • Measures the degree of straight-line relationship • Computation: $r = \dfrac{SP}{\sqrt{(SS_X)(SS_Y)}}$
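
The quantities in the Pearson formula are the sum of products of deviations (SP) and the sums of squared deviations for each variable (SS). As a reminder, each has a definitional form and an equivalent computational form. The symbols $M_X$ and $M_Y$ for the sample means are notation assumed here; they do not appear in the slides.

```latex
\begin{align*}
SS_X &= \sum (X - M_X)^2 = \sum X^2 - \frac{(\sum X)^2}{n} \\
SS_Y &= \sum (Y - M_Y)^2 = \sum Y^2 - \frac{(\sum Y)^2}{n} \\
SP   &= \sum (X - M_X)(Y - M_Y) = \sum XY - \frac{(\sum X)(\sum Y)}{n}
\end{align*}
```

The computational forms on the right are the ones used in the worked example, since they need only the column totals.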

  7. Example 1

          X       X²        Y        Y²        XY
         30      900      160    25,600     4,800
         38    1,444      180    32,400     6,840
         52    2,704      180    32,400     9,360
         90    8,100      210    44,100    18,900
         95    9,025      240    57,600    22,800
        305   22,173      970   192,100    62,700
       (ΣX)    (ΣX²)     (ΣY)     (ΣY²)     (ΣXY)

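  8. Example 1
     $SS_X = \sum X^2 - \dfrac{(\sum X)^2}{n} = 22{,}173 - \dfrac{305^2}{5} = 22{,}173 - \dfrac{93{,}025}{5} = 22{,}173 - 18{,}605 = 3{,}568$
     $SS_Y = \sum Y^2 - \dfrac{(\sum Y)^2}{n} = 192{,}100 - \dfrac{970^2}{5} = 192{,}100 - \dfrac{940{,}900}{5} = 192{,}100 - 188{,}180 = 3{,}920$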

  9. Example 1
     $SP = \sum XY - \dfrac{(\sum X)(\sum Y)}{n} = 62{,}700 - \dfrac{(305)(970)}{5} = 62{,}700 - \dfrac{295{,}850}{5} = 62{,}700 - 59{,}170 = 3{,}530$

  10. Example 1
     $r = \dfrac{SP}{\sqrt{(SS_X)(SS_Y)}} = \dfrac{3{,}530}{\sqrt{(3{,}568)(3{,}920)}} = \dfrac{3{,}530}{\sqrt{13{,}986{,}560}} = \dfrac{3{,}530}{3{,}739.861} = .944$
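
The arithmetic in Example 1 can be checked with a short script. This is a minimal sketch using only the Python standard library; the function name `pearson_r` is chosen here for illustration and is not part of the slides.

```python
import math

# Scores copied from the Example 1 table.
x = [30, 38, 52, 90, 95]
y = [160, 180, 180, 210, 240]


def pearson_r(x, y):
    """Return (SS_X, SS_Y, SP, r) using the computational formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    ss_x = sum(v * v for v in x) - sum_x ** 2 / n          # sum of squares for X
    ss_y = sum(v * v for v in y) - sum_y ** 2 / n          # sum of squares for Y
    sp = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n  # sum of products
    r = sp / math.sqrt(ss_x * ss_y)
    return ss_x, ss_y, sp, r


ss_x, ss_y, sp, r = pearson_r(x, y)
print(f"SSX = {ss_x:.0f}, SSY = {ss_y:.0f}, SP = {sp:.0f}, r = {r:.3f}")
# prints: SSX = 3568, SSY = 3920, SP = 3530, r = 0.944
```

The printed values match the hand computation on the Example 1 slides.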

  11. Coefficient of Determination (r²) • The value r² is called the coefficient of determination because it measures the proportion of variability in one variable that can be determined from the relationship with the other variable • For example: • A correlation of r = .42 (or r = -.42) means that r² = .1764, so about 18% of the variability in the Y scores can be predicted from the relationship with the X scores
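
Because r is squared, the sign of the correlation drops out: a positive and a negative correlation of the same size account for the same share of variability. A small sketch of that calculation follows; the helper name `coefficient_of_determination` is hypothetical.

```python
def coefficient_of_determination(r):
    """Square the correlation to get the proportion of shared variability."""
    return r ** 2


for r in (0.42, -0.42):
    r_squared = coefficient_of_determination(r)
    print(f"r = {r:+.2f} -> r squared = {r_squared:.4f}")
```

Both lines report the same r squared, 0.1764.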

  12. Coefficient of Determination (r²) and Interpretation • The coefficient of determination is r² = .891. • Education, by itself, explains 89.1% of the variation in voter turnout.

  13. Example 2 • A researcher predicts that there is a high correlation between years of education and voter turnout • She chooses Alamosa, Boston, Chicago, Detroit, and NYC to test her theory

  14. Example 2 • The scores on each variable are displayed in table format: • Y = % Turnout • X = Years of Education

  15. Scatterplot • The relationship between X and Y is linear.

  16. Make a Computational Table

  17. Find Pearson’s r and Interpret:

  18. Pearson’s r • Had the correlation between % college educated and turnout been r = .32, the relationship would have been positive and weak to moderate. • Had the correlation between % college educated and turnout been r = -.12, the relationship would have been negative and weak.

  19. Find the Coefficient of Determination (r²) and Interpret:

  20. Hypothesis Testing with Pearson • We can have a two-tailed hypothesis: H0: ρ = 0.0; H1: ρ ≠ 0.0 • We can have a one-tailed hypothesis: H0: ρ = 0.0; H1: ρ < 0.0 (or ρ > 0.0) • Note that ρ (rho) is the population parameter, while r is the sample statistic

  21. Find r_critical • See Table B.6 (page 537) • You need to know the alpha level • You need to know the sample size • We will always use df = n - 2

  22. Find r_calculated • See previous slides for formulas

  23. Make your decision… • If |r_calculated| < r_critical, then retain H0 • If |r_calculated| > r_critical, then reject H0
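
The decision rule can be sketched in code. Table B.6 is not reproduced here, so this sketch instead derives the critical r from a critical t value using the standard relation r = t / sqrt(t² + df). The alpha level (.05, two-tailed) and the use of the Example 1 result (r = .944, n = 5) are assumptions made for illustration, and the function name `critical_r` is hypothetical.

```python
import math


def critical_r(t_critical, df):
    """Convert a critical t value into the equivalent critical r."""
    return t_critical / math.sqrt(t_critical ** 2 + df)


n = 5                 # number of X, Y pairs (assumed: Example 1)
r_calculated = 0.944  # assumed: the Example 1 result
df = n - 2
t_critical = 3.182    # standard t-table value: two-tailed, alpha = .05, df = 3

r_critical = critical_r(t_critical, df)
# Two-tailed test: compare the magnitude of r with the critical value.
decision = "Reject H0" if abs(r_calculated) > r_critical else "Retain H0"
print(f"df = {df}, r_critical = {r_critical:.3f}, decision: {decision}")
# prints: df = 3, r_critical = 0.878, decision: Reject H0
```

With only five pairs of scores the critical value is high, so a sample correlation must be very strong to count as significant.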

  24. Always include a brief summary of your results: • Was it positive or negative? • Was it significant? • Explain the correlation • Explain the variation • Coefficient of Determination (r²)

  25. Credits • http://campus.houghton.edu/orgs/psychology/stat15b.ppt#267,2,Review • http://publish.uwo.ca/~pakvis/Interval.ppt#276,17,Practical Example using Healey P. 418 Problem 15.1
