
Section 4.1

Presentation Transcript


  1. Section 4.1 Scatter Diagrams and Linear Correlation

  2. Scatter Diagram • A graph in which data pairs (x, y) are plotted as individual points on a grid with horizontal axis x and vertical axis y. • We call x the explanatory variable. • We call y the response variable.

  3. Paired data • x = phosphorus concentration at inlet • y = phosphorus concentration at outlet

  4. Scatter Diagram: Linear Correlation • The general trend of the points seems to follow a straight line segment.

  5. Non-Linear Correlation

  6. No Linear Correlation

  7. High Linear Correlation • Points lie close to a straight line.

  8. Moderate Linear Correlation

  9. Low Linear Correlation

  10. Perfect Linear Correlation

  11. Positive Linear Correlation

  12. Negative Linear Correlation

  13. Little or No Linear Correlation

  14. Questions Arising • Can we find a relationship between x and y? • How strong is the relationship? • The answer is that there is a mathematical measurement that describes the strength of the linear association between two variables. This measure is the sample correlation coefficient r.

  15. The Correlation Coefficient (r) • A numerical measurement that assesses the strength of a linear relationship between two variables x and y

  16. Properties of the Correlation Coefficient r • Also called the Pearson product-moment correlation coefficient, r is a unitless measurement between −1 and 1. • That is, $-1 \le r \le 1$.

  17. Properties of the Correlation Coefficient r • If r = 1, there is a perfect positive correlation.

  18. Properties of the Correlation Coefficient r • If r = 1, there is a perfect negative correlation.

  19. Properties of the Correlation Coefficient r • If r = 0, there is no linear correlation.
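One point the slide leaves implicit: r = 0 rules out only a linear trend, not every relationship (compare the non-linear scatter diagram in slide 5). As a minimal Python sketch (the helper name `pearson_r` is our own, not from the slides), points lying exactly on the parabola y = x² over a symmetric range give r = 0 even though y is completely determined by x.

```python
import math


def pearson_r(xs, ys):
    """Hypothetical helper: sample correlation r from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]  # y is an exact (but curved) function of x

r = pearson_r(xs, ys)
print(r)  # 0.0: no linear correlation, despite a perfect curved relationship
```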

  20. Properties of the Correlation Coefficient r • Positive values of r imply that as x increases, y tends to increase.

  21. Properties of the Correlation Coefficient r • Negative values of r imply that as x increases, y tends to decrease.

  22. Properties of the Correlation Coefficient r • The closer r is to −1 or +1, the better a line describes the relationship between the two variables x and y. • The value of r does not change when either variable is converted to different units.

  23. Properties of the Correlation Coefficient r • The value of r is the same regardless of which variable is the explanatory variable and which variable is the response variable. In other words, the value of r is the same for the pairs (x, y) as for the pairs (y, x).
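The unit-invariance and symmetry properties above can be checked numerically. The minimal Python sketch below is our own illustration: `pearson_r` is a helper written for it, the data are the car weights and mileages from slide 33, and the conversion factors (1 lb = 0.45359237 kg, 1 MPG ≈ 0.425144 km/L) are standard values we supply.

```python
import math


def pearson_r(xs, ys):
    """Hypothetical helper: sample correlation r from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Car weight (pounds) and highway gas mileage (MPG) from the example in slide 33
weight_lb = [3489, 3955, 3345, 3085, 4915, 4159, 4289, 3992]
mpg = [28, 25, 27, 29, 18, 21, 20, 26]

r = pearson_r(weight_lb, mpg)

# Same data in other units (conversion factors supplied for this sketch)
weight_kg = [w * 0.45359237 for w in weight_lb]
km_per_litre = [m * 0.425144 for m in mpg]
r_metric = pearson_r(weight_kg, km_per_litre)

# Same data with the explanatory and response roles swapped
r_swapped = pearson_r(mpg, weight_lb)

print(round(r, 4), round(r_metric, 4), round(r_swapped, 4))  # all three agree
```

Unit invariance holds because a change of units rescales a variable by a positive factor (possibly with a shift), which r standardizes away; multiplying a variable by a negative number would instead flip the sign of r.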

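  24. Computing the Correlation Coefficient r • Obtain a random sample of n data pairs (x, y). • Using the data pairs, compute Σx, Σy, Σx², Σy², and Σxy. • Use the following formula:

$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^{2} - \left(\sum x\right)^{2}}\,\sqrt{n\sum y^{2} - \left(\sum y\right)^{2}}}$$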
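The standard computational formula built from these five sums, r = [nΣxy − (Σx)(Σy)] / √([nΣx² − (Σx)²][nΣy² − (Σy)²]), can be transcribed into Python as a minimal sketch. It assumes the data arrive as a list of (x, y) tuples; the function names are illustrative, not from the text.

```python
import math


def r_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Computational formula for r, using n and the five column sums."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def r_from_pairs(pairs):
    """Compute Σx, Σy, Σx², Σy², Σxy from (x, y) pairs, then apply the formula."""
    n = len(pairs)
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    sum_x2 = sum(x * x for x, _ in pairs)
    sum_y2 = sum(y * y for _, y in pairs)
    sum_xy = sum(x * y for x, y in pairs)
    return r_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy)


# Points on the line y = 2x + 1: a perfect positive linear correlation
print(r_from_pairs([(1, 3), (2, 5), (3, 7), (4, 9)]))  # 1.0
```

The sum-based form suits hand and calculator work. For very large or nearly constant data, computing deviations from the means first is numerically safer, because subtracting two large, nearly equal quantities loses precision.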

  25. Example: Computing r

  26. Computing r • Interpretation of r: An r value of 0.975 indicates a strong positive correlation between the variables x and y.

  27. GUIDED EXERCISE • In one of the Boston city parks, there has been a problem with muggings in the summer months. A police officer took a random sample of 10 days (out of the 90-day summer) and compiled the following data. For each day, x represents the number of police officers on duty in the park and y represents the number of reported muggings on that day.

       x:  10  15  16   1   4   6  18  12  14   7
       y:   5   2   1   9   7   8   1   5   3   6

  28. GUIDED EXERCISE Cont. • Construct a scatter diagram of the (x, y) values. • From the scatter diagram, we expect r to be negative: the general trend is that large x values are associated with small y values, and vice versa. From left to right, the least-squares line goes down.

  29. GUIDED EXERCISE Cont. • Verify that Σx = 103, Σy = 47, Σx² = 1347, Σy² = 295, and Σxy = 343 (use a calculator). • Compute r. Alternatively, find the value of r directly by using a calculator.
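A quick Python check of the guided-exercise arithmetic: this sketch recomputes the five sums from the data in slide 27 and then applies the computational formula for r. The variable names are our own.

```python
import math

# Guided exercise data: x = officers on duty, y = reported muggings (10 sampled days)
x = [10, 15, 16, 1, 4, 6, 18, 12, 14, 7]
y = [5, 2, 1, 9, 7, 8, 1, 5, 3, 6]

n = len(x)
sum_x = sum(x)
sum_y = sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))
print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)  # 103 47 1347 295 343

# Computational formula for the sample correlation coefficient
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print(round(r, 3))  # -0.969
```

The negative value matches the downward trend expected from the scatter diagram.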

  30. Sample compared to Population Correlation • Sample correlation coefficient = r • Population correlation coefficient = ρ • ρ is the Greek letter rho.

  31. A Caution • The correlation coefficient measures the strength of the linear relationship between two variables. • A strong correlation does not imply a cause-and-effect relationship. • A correlation between two variables may be caused by other (either known or unknown) variables called lurking variables.

  32. Lurking Variable • A lurking variable is neither an explanatory nor a response variable. • A lurking variable may be responsible for changes in both x and y.
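A lurking variable is easy to demonstrate with a small simulation. The Python sketch below is entirely hypothetical: a made-up variable z drives both x and y, each gets its own independent random noise, and x has no effect on y at all. With these noise levels the correlation between x and y is theoretically about 0.92, which is exactly the pattern a lurking variable produces.

```python
import math
import random


def pearson_r(xs, ys):
    """Hypothetical helper: sample correlation r from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


random.seed(42)  # fixed seed so the simulation is reproducible

n = 1000
# Hypothetical lurking variable z; x and y each depend on z, never on each other
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 0.3) for zi in z]
y = [zi + random.gauss(0, 0.3) for zi in z]

r = pearson_r(x, y)
print(round(r, 2))  # strongly positive, even though x has no effect on y
```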

  33. Example: Correlation does not equal Causation! • You were given data on the weight of cars in pounds and their highway gas mileage. You found a linear regression equation and determined that your model was a good fit.

       Car Weight (pounds)   Gas Mileage (MPG)
       3489                  28
       3955                  25
       3345                  27
       3085                  29
       4915                  18
       4159                  21
       4289                  20
       3992                  26

  34. Example cont.: Correlation does not equal Causation! • So, you now state for the whole world to hear that heavier cars get less gas mileage. Right??? • Not necessarily. Your statement may be correct for this particular set of data, but it may not be a universal truth. • It may also be true that the weight of the car has nothing to do with the gas mileage. Perhaps some other factor is affecting the gas mileage. • Just because a correlation exists does not guarantee that the change in one of your variables is causing the change in the other variable.

  35. Example: Cause-Effect Relationship • During the months of March and April, the weekly weight of a growing puppy in New York was recorded. For the same time frame, the retail price of snowshoes in Alaska was recorded.

       Weekly Data Collection
       Puppy Weight, New York (pounds)   Snowshoe Price, Alaska
       8                                 $32.45
       8.5                               $32.95
       9                                 $33.45
       9.6                               $34.00
       10.1                              $34.50
       10.7                              $35.10
       11.5                              $35.63

  36. Example: Cause-Effect Relationship cont. • The data were examined and found to have a very strong linear correlation. So, this must mean that the weight increase of a puppy in New York is causing snowshoe prices in Alaska to increase. Of course this is not true! • The moral of this example is: "Be careful what you infer from your statistical analyses." Be sure your relationship makes sense. Also keep in mind that other factors may be involved in a cause-effect relationship.
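A short Python sketch computing r for the puppy and snowshoe data in slide 35 shows just how strong the correlation is, even though neither series can plausibly cause the other. Arguably, both simply rise steadily over the same weeks, so time plays the role of a lurking variable. The `pearson_r` helper is our own.

```python
import math


def pearson_r(xs, ys):
    """Hypothetical helper: sample correlation r from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Weekly data from slide 35
puppy_weight_lb = [8, 8.5, 9, 9.6, 10.1, 10.7, 11.5]
snowshoe_price = [32.45, 32.95, 33.45, 34.00, 34.50, 35.10, 35.63]

r = pearson_r(puppy_weight_lb, snowshoe_price)
print(round(r, 3))  # 0.998
```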

  37. Scatter Plots (calc)
     • A scatter plot is a graph used to determine whether there is a relationship between paired data.
     • In many real-life situations, scatter plots follow patterns that are approximately linear. If y tends to increase as x increases, the paired data are said to have a positive correlation. If y tends to decrease as x increases, the paired data are said to have a negative correlation. If the points show no linear pattern, the paired data are said to have little or no correlation.
     • To set up a scatter plot, first clear (or deactivate) any entries in Y=.
        1. Enter the X data values in L1 and the Y data values in L2, being careful that each X data value and its matching Y data value are entered on the same horizontal line.

  38. Scatter Plots cont. (calc)
        2. Activate the scatter plot. Press 2nd STAT PLOT and choose #1 PLOT 1. Be sure the plot is ON, the scatter plot icon is highlighted, the list of X data values is next to Xlist, and the list of Y data values is next to Ylist. Choose any of the three marks.
        3. To see the scatter plot, press ZOOM and #9 ZoomStat. Pressing TRACE and then the right arrow moves along the data points.
        4. To turn the scatter plot off when you are finished with this problem:
           Method 1: Go to the Y= screen. Arrow up onto the highlighted PLOT at the top of the screen. Press ENTER to turn it off.
           Method 2: Go to STAT PLOT (above Y=). Choose your PLOT location. Arrow to OFF. Press ENTER to turn it off.

  39. Scatter Plots cont. (calc) • Follow-up:
     * At this point, the graph may be examined for a positive, negative, or no correlation between the data.
     * A line of best fit can be calculated "manually":
        1. Select two points that you feel would give a line that fits the data.
        2. Using your knowledge of equations of lines and slope, write the equation of your line.
        3. Enter this equation into Y1 and graph.
        4. How well does the line "fit" the data?
        5. Use your line to make predictions.
     * Or a line of best fit can be calculated "using the calculator". See Line of Best Fit.

  40. Line of Best Fit (calc)
     • A line of best fit (or "trend" line) is a straight line that best represents the data on a scatter plot. This line may pass through some of the points, none of the points, or all of the points.
     • You can examine lines of best fit with:
        1. paper and pencil only,
        2. a combination of graphing calculator and paper and pencil, or
        3. solely the graphing calculator.

  41. Line of Best Fit cont. (calc)
     • Example: Is there a relationship between the fat grams and the total calories in fast food?

       Sandwich                       Total Fat (g)   Total Calories
       Hamburger                            9              260
       Cheeseburger                        13              320
       Quarter Pounder                     21              420
       Quarter Pounder with Cheese         30              530
       Big Mac                             31              560
       Arch Sandwich Special               31              550
       Arch Special with Bacon             34              590
       Crispy Chicken                      25              500
       Fish Fillet                         28              560
       Grilled Chicken                     20              440
       Grilled Chicken Light                5              300

  42. Line of Best Fit cont. (calc) Paper and Pencil Solution:
        1. Prepare a scatter plot of the data on graph paper.
        2. Find two points that you think will be on the "best-fit" line. Perhaps you chose the points (9, 260) and (30, 530). Different people may choose different points.
        3. Calculate the slope of the line through your two points (rounded to three decimal places).

  43. Line of Best Fit cont. (calc)
        4. Write the equation of the line. This equation can now be used to predict information that was not plotted in the scatter plot. For example, you can use the equation to find the total calories based upon 22 grams of fat.
           Equation:
           Prediction based on 22 grams of fat:
     • Different people may choose different points and arrive at different equations. All of them are "correct", but which one is actually the "best"? To determine the actual "best" fit, we will use a graphing calculator.
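A sketch of the paper-and-pencil method in Python, assuming the two points suggested on slide 42, (9, 260) and (30, 530), and rounding the slope to three decimal places as step 3 asks. Other choices of points give other lines; the function name `predict` is our own.

```python
# Two points read off the scatter plot (slide 42's suggestion; other choices are valid)
x1, y1 = 9, 260
x2, y2 = 30, 530

# Step 3: slope through the two points, rounded to three decimal places
slope = round((y2 - y1) / (x2 - x1), 3)

# Step 4: point-slope form y - y1 = m(x - x1), solved for the intercept
intercept = y1 - slope * x1


def predict(fat_grams):
    """Predicted total calories from the hand-drawn two-point line."""
    return slope * fat_grams + intercept


print(f"y = {slope}x + {intercept:.3f}")  # y = 12.857x + 144.287
print(round(predict(22), 1))              # 427.1
```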

  44. Line of Best Fit cont. (calc) Graphing Calculator Solution:
        1. Enter the data in the calculator lists. Place the data in L1 and L2: STAT, #1 Edit, and type the values into the lists.
        2. Prepare a scatter plot of the data. To set up the scatter plot, press 2nd STAT PLOT and choose the first icon. Then choose ZOOM #9 ZoomStat.

  45. Line of Best Fit cont. (calc)
        3. Have the calculator determine the line of best fit: STAT → CALC #4 LinReg(ax+b). Include the parameters L1, L2, Y1. (Y1 comes from VARS → Y-VARS, #1 Function, Y1.) You now have the values of a and b needed to write the equation of the actual line of best fit:
           y = 11.73128088x + 193.8521475
        4. Graph the line of best fit: simply press GRAPH. To get a predicted value within the window, press TRACE, then the up arrow, and type the desired value. The screen shows x = 22.
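To see where the calculator's numbers come from, this sketch computes the least-squares slope a and intercept b directly from the fast-food data on slide 41, using the standard formulas a = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b = ȳ − a·x̄. We are assuming, as is standard, that LinReg(ax+b) reports the ordinary least-squares line; under that assumption the code reproduces the coefficients shown on the slide.

```python
# Fast-food data from slide 41: total fat (g) and total calories
fat = [9, 13, 21, 30, 31, 31, 34, 25, 28, 20, 5]
calories = [260, 320, 420, 530, 560, 550, 590, 500, 560, 440, 300]

n = len(fat)
mean_x = sum(fat) / n
mean_y = sum(calories) / n

# Ordinary least-squares slope a and intercept b for y = ax + b
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(fat, calories))
sxx = sum((x - mean_x) ** 2 for x in fat)
a = sxy / sxx
b = mean_y - a * mean_x

print(f"y = {a:.8f}x + {b:.7f}")  # y = 11.73128088x + 193.8521475
print(round(a * 22 + b, 1))       # 451.9, the prediction for 22 g of fat
```

Unlike a two-point line, this fit uses every data pair, which is why everyone who runs it on the same data gets the same equation.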
